nginx reload出现了core, 原因是 lua module初始化流程中有修改 ngx_cycle导致。

我们在回归测试时遇到这样一个问题:
nginx reload时，nginx master进程会收到子进程退出信号，并调用 ngx_process_get_status强制释放共享内存锁。

#0  0x000055e2ccdd6a9a in ngx_shmtx_force_unlock (mtx=0x68, pid=3086274) at src/core/ngx_shmtx.c:155
#1  0x000055e2cce03804 in ngx_unlock_mutexes (pid=3086274) at src/os/unix/ngx_process.c:647
#2  0x000055e2cce0371a in ngx_process_get_status () at src/os/unix/ngx_process.c:596
#3  0x000055e2cce03378 in ngx_signal_handler (signo=17, siginfo=0x7f43389fbcf0, ucontext=0x7f43389fbbc0)
    at src/os/unix/ngx_process.c:493


f1 && dump shared_memory list

(gdb) p i
$1 = 45
(gdb) p shm_zone[44]
$2 = {data = 0x7f433706fee0, shm = {addr = 0x7f428adb0000 "", size = 1048576, name = {len = 14, 
      data = 0x7f433706fe98 "cc_rule_shared"}, log = 0x7f43374d8080, exists = 0}, 
  init = 0x55e2ccf5b7a4 <ngx_http_lua_shared_memory_init>, tag = 0x55e2cd4bd500 <ngx_http_lua_module>, sync = 0x0, 
  noreuse = 0}
(gdb) p shm_zone[45]
$3 = {data = 0x7f4338123f78, shm = {addr = 0x0, size = 32768, name = {len = 11, data = 0x7f43378d1c41 "tlog_server"}, 
    log = 0x7f425f70c080, exists = 0}, init = 0x55e2ccfd144c <ngx_http_tlog_check_shm_zone_init>, 
  tag = 0x55e2cd4c2640 <ngx_http_tlog_upstream_module>, sync = 0x0, noreuse = 0}

上述的log对象理论上是同一个地址 (ngx_cycle->log)。 排除了内存越界等问题后（asan方式检查），通过在代码中埋点，发现在 child信号中获取到的ngx_cycle是一个正在初始化中的cycle。经过进一步排查，发现ngx_cycle的修改发生在  ngx_http_lua_shared_memory_init 中，我们通过对此处代码添加flag, 记录上述的core是否发生在 ngx_cycle修改前后，进一步确定了此问题。

ngx_http_lua_shared_memory_init

    if (lmcf->shm_zones_inited == lmcf->shm_zones->nelts
        && lmcf->init_handler && !ngx_test_config)
    {
        saved_cycle = ngx_cycle;

        in_update_cycle = 0;  // 我们添加了这几行代码，确定了core发生时，in_update_cycle为1
        ngx_cycle = ctx->cycle;
        in_update_cycle = 1;
        rc = lmcf->init_handler(ctx->log, lmcf, lmcf->lua);  // 如果这个地方耗时较长，很很容复现。我们的lua code文件数量比较多。
        ngx_cycle = saved_cycle;
        in_update_cycle = 0;

        if (rc != NGX_OK) {
            /* an error happened */
            return NGX_ERROR;
        }
    }


根本原因:
nginx收到reconfigure信号后，只是记录了一个reconfigure标志。在for循环中处理reload操作。 而child信号的处理会中断正在执行的流程，并访问ngx_cycle。 如果上述ngx_cycle被修改为一个正在初始化中的cycle, 就导致了问题出现。



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

nginx reload出现了core, 原因是 lua module初始化流程中有修改 ngx_cycle导致。 #2421

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

nginx reload出现了core, 原因是 lua module初始化流程中有修改 ngx_cycle导致。 #2421

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions