
Produces a large number of zombie processes #177

@beardnick

Description


Environment

prove

prove --version
TAP::Harness v3.43 and Perl v5.34.0

nginx

nginx -V
nginx version: openresty/1.25.3.1
built by gcc 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04) 
built with OpenSSL 3.2.0 23 Nov 2023
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-debug --with-cc-opt='-DNGX_LUA_USE_ASSERT -DNGX_LUA_ABORT_AT_PANIC -O2 -DAPISIX_RUNTIME_VER=1.2.0 -DNGX_GRPC_CLI_ENGINE_PATH=/usr/local/openresty/libgrpc_engine.so -DNGX_HTTP_GRPC_CLI_ENGINE_PATH=/usr/local/openresty/libgrpc_engine.so -DNGX_LUA_ABORT_AT_PANIC -I/usr/local/openresty/zlib/include -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl3/include' --add-module=../ngx_devel_kit-0.3.3 --add-module=../echo-nginx-module-0.63 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.33 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.09 --add-module=../srcache-nginx-module-0.33 --add-module=../ngx_lua-0.10.26 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.37 --add-module=../array-var-nginx-module-0.06 --add-module=../memc-nginx-module-0.20 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.9 --add-module=../ngx_stream_lua-0.0.14 --with-ld-opt='-Wl,-rpath,/usr/local/openresty/luajit/lib -Wl,-rpath,/usr/local/openresty/wasmtime-c-api/lib -L/usr/local/openresty/zlib/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl3/lib -Wl,-rpath,/usr/local/openresty/zlib/lib:/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl3/lib' --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../mod_dubbo-1.0.2 --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../ngx_multi_upstream_module-1.2.0 --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../apisix-nginx-module-1.16.0 --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../apisix-nginx-module-1.16.0/src/stream --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../apisix-nginx-module-1.16.0/src/meta --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../wasm-nginx-module-0.7.0 --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../lua-var-nginx-module-v0.5.3 
--add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../grpc-client-nginx-module-v0.5.0 --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../lua-resty-events-0.2.0 --with-poll_module --with-pcre-jit --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_v2_module --with-http_v3_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_auth_request_module --with-http_secure_link_module --with-http_random_index_module --with-http_gzip_static_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --with-compat --with-stream --without-pcre2 --with-http_ssl_module

apisix

apisix version
3.9.1

test-nginx: master branch

How to reproduce

Run the unit tests with prove. Many tests report errors such as "timeout when waiting for the process 78711 to exit":

prove -v -I ./test-nginx/lib -I./ t/plugin/openid-connect.t

ok 1 - t/plugin/openid-connect.t TEST 1: Sanity check with minimal valid configuration. - status code ok
ok 2 - t/plugin/openid-connect.t TEST 1: Sanity check with minimal valid configuration. - response_body - response is expected (repeated req 0, req 0)
ok 3 - t/plugin/openid-connect.t TEST 1: Sanity check with minimal valid configuration. - pattern "[error]" does not match a line in error.log (req 0)
t/plugin/openid-connect.t TEST 2: Missing `client_id`. - timeout when waiting for the process 78711 to exit at /workspace/test-nginx/lib/Test/Nginx/Util.pm line 681.
t/plugin/openid-connect.t TEST 2: Missing `client_id`. - WARNING: killing the child process 78711 with force... at /workspace/test-nginx/lib/Test/Nginx/Util.pm line 720.
ok 4 - t/plugin/openid-connect.t TEST 2: Missing `client_id`. - status code ok
ok 5 - t/plugin/openid-connect.t TEST 2: Missing `client_id`. - response_body - response is expected (repeated req 0, req 0)
ok 6 - t/plugin/openid-connect.t TEST 2: Missing `client_id`. - pattern "[error]" does not match a line in error.log (req 0)
t/plugin/openid-connect.t TEST 3: Wrong type for `client_id`. - timeout when waiting for the process 78899 to exit at /workspace/test-nginx/lib/Test/Nginx/Util.pm line 681.
t/plugin/openid-connect.t TEST 3: Wrong type for `client_id`. - WARNING: killing the child process 78899 with force... at /workspace/test-nginx/lib/Test/Nginx/Util.pm line 720.

Afterwards, many defunct (zombie) nginx processes are left behind:

ps -ef | grep nginx
root         785       1  0 08:40 ?        00:00:00 [nginx] <defunct>
root        2248       1  0 08:41 ?        00:00:00 [nginx] <defunct>
root        2885       1  0 08:42 ?        00:00:00 [nginx] <defunct>
root        4446       1  0 08:43 ?        00:00:00 [nginx] <defunct>
root        5007       1  0 08:44 ?        00:00:00 [nginx] <defunct>
root       19585       1  0 09:00 ?        00:00:00 [nginx] <defunct>
root       19770       1  0 09:00 ?        00:00:00 [nginx] <defunct>
root       21483       1  0 09:02 ?        00:00:00 [nginx] <defunct>
root       25649       1  0 09:07 ?        00:00:00 [nginx] <defunct>
root       27841       1  0 09:09 ?        00:00:00 [nginx] <defunct>
root       27842       1  0 09:09 ?        00:00:00 [nginx] <defunct>
root       27843       1  0 09:09 ?        00:00:00 [nginx] <defunct>
root       27989       1  0 09:09 ?        00:00:00 [nginx] <defunct>
root       27990       1  0 09:09 ?        00:00:00 [nginx] <defunct>
root       27991       1  0 09:09 ?        00:00:00 [nginx] <defunct>
root       28104       1  0 09:09 ?        00:00:00 [nginx] <defunct>
root       28105       1  0 09:09 ?        00:00:00 [nginx] <defunct>
root       28106       1  0 09:09 ?        00:00:00 [nginx] <defunct>
root       28243       1  0 09:09 ?        00:00:00 [nginx] <defunct>
root       28244       1  0 09:09 ?        00:00:00 [nginx] <defunct>
root       28245       1  0 09:09 ?        00:00:00 [nginx] <defunct>
root       29939       1  0 09:11 ?        00:00:00 [nginx] <defunct>

Possible cause

After each test, prove (through Test::Nginx) signals the nginx process to shut down and then waits for it to exit. However, nginx may exit before prove has reaped it with waitpid. The exited process then becomes a zombie, yet is_running still reports it as alive, because kill 0 succeeds as long as the process table entry exists. prove therefore keeps waiting until the timeout expires and then kills the process with force. The relevant shutdown code in Test::Nginx:

    if (defined $pid) {
        if ($ENV{TEST_NGINX_FAST_SHUTDOWN}) {
            if ($Verbose) {
                warn "sending TERM signal to $pid";
            }
            kill(SIGTERM, $pid);
        } else {
            if ($Verbose) {
                warn "sending QUIT signal to $pid";
            }
            kill(SIGQUIT, $pid);
        }
    }
    if ($Verbose) {
        warn "waitpid timeout: ", timeout();
    }
    my $timeout_val = timeout();
    while ($timeout_val > 0 && is_running($pid)) {
        waitpid($pid, WNOHANG);
        sleep 0.05;
        $timeout_val -= 0.05;
    }
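A minimal, self-contained sketch of the behaviour described above, assuming Linux and a stock Perl 5 (it is not part of Test::Nginx): a child that has exited but has not yet been reaped is a zombie, and kill 0 still succeeds on it, so a check like is_running keeps returning true until the parent calls waitpid.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # Child: exit at once, like an nginx process that shuts down quickly.
    exit 0;
}

# Parent: give the child time to exit, but do not reap it yet.
sleep 0.5;

# The child is now a zombie. Signal 0 still succeeds because its
# process table entry exists until the parent calls waitpid.
my $alive_before = kill(0, $pid) ? 1 : 0;

# Reap the zombie. Afterwards signal 0 fails (ESRCH).
my $reaped = waitpid($pid, 0);
my $alive_after = kill(0, $pid) ? 1 : 0;

print "kill 0 before waitpid: $alive_before\n";   # prints 1 on Linux
print "kill 0 after waitpid: $alive_after\n";     # prints 0
```

This is why the loop only terminates early when waitpid actually reaps the process; if it cannot, is_running stays true for the whole timeout.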

My workaround

I modified the is_running function so that it treats zombie processes as no longer running. With this change the tests run faster and produce far fewer timeout errors. However, it does not stop zombie nginx processes from accumulating.

The original function

    sub is_running ($) {
        my $pid = shift;
        return kill 0, $pid;
    }

The modified version

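    # Defined before is_running so Perl can check its ($) prototype
    # (avoids the "called too early to check prototype" warning).
    sub is_defunct ($) {
        my $pid = shift;
        # "ps -o stat=" prints only the state column, which starts with "Z" for a zombie.
        my $output = `ps -o stat= -p $pid`;
        return 0 unless defined $output;
        chomp($output);
        return $output =~ /^\s*Z/ ? 1 : 0;
    }

    sub is_running ($) {
        my $pid = shift;
        # kill 0 also succeeds for zombies, so exclude them explicitly.
        return kill(0, $pid) && !is_defunct($pid);
    }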
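An alternative to shelling out to ps could be to rely on the return value of waitpid itself. The sketch below is hypothetical: neither wait_for_exit nor proc_state exists in Test::Nginx, and the /proc/<pid>/stat fallback assumes Linux. waitpid($pid, WNOHANG) returns the PID once our own child has exited and been reaped, 0 while it is still running, and -1 when the process is not our child, for example after nginx has been re-parented to PID 1 as in the ps output above.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(sleep);

# Hypothetical helper (Linux only): return the one-letter state from
# /proc/<pid>/stat, or undef if the process no longer exists.
sub proc_state {
    my ($pid) = @_;
    open(my $fh, '<', "/proc/$pid/stat") or return undef;
    my $line = <$fh>;
    close($fh);
    return undef unless defined $line;
    # Format is "pid (comm) state ..."; comm may itself contain ')',
    # so the greedy .* anchors on the last ')' before the state letter.
    return $line =~ /^.*\)\s+(\S)/s ? $1 : undef;
}

# Hypothetical helper: wait up to $timeout seconds for $pid to exit.
# Returns 1 once the process has been reaped, is gone, or is a zombie
# we cannot reap; returns 0 if it is still running at the timeout.
sub wait_for_exit {
    my ($pid, $timeout) = @_;
    while ($timeout > 0) {
        my $ret = waitpid($pid, WNOHANG);
        # Our own child has exited and waitpid has just reaped it.
        return 1 if $ret == $pid;
        # -1 means $pid is not our child (e.g. re-parented to PID 1),
        # so waitpid cannot help; inspect /proc instead.
        if ($ret == -1) {
            my $state = proc_state($pid);
            return 1 if !defined $state || $state eq 'Z';
        }
        sleep 0.05;
        $timeout -= 0.05;
    }
    return 0;
}

# Usage: a child that exits quickly is detected and reaped (no zombie left).
my $quick = fork();
die "fork failed: $!" unless defined $quick;
if ($quick == 0) { sleep 0.2; exit 0; }
my $quick_done = wait_for_exit($quick, 5);

# A child that keeps running is reported as alive after the timeout.
my $slow = fork();
die "fork failed: $!" unless defined $slow;
if ($slow == 0) { sleep 10; exit 0; }
my $slow_done = wait_for_exit($slow, 0.3);
kill('KILL', $slow);
waitpid($slow, 0);

print "quick child gone: $quick_done\n";
print "slow child gone: $slow_done\n";
```

The design choice here is to let a successful waitpid both detect and reap the exit, so zombies of our own children never accumulate, and to fall back to reading the process state only when the process belongs to another parent.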
