Skip to content

关于ngx.thread.spawn+ngx.location.capture+ngx.exit引发的内存泄露以及coredump问题 #2394

Open
@yyqbuct

Description

@yyqbuct

问题版本号

ngx_lua-0.10.13
最新的代码,应该也存在问题

问题复现用例

local fetch = function(uri)
    return ngx.location.capture(uri)
end
local t1 = ngx.thread.spawn(fetch, "/f1")-- 子请求f1(0.1s 返回结果)
local t2 = ngx.thread.spawn(fetch, "/f2")-- 子请求f2(0.2s 返回结果)
ngx.thread.wait(t1, t2)
ngx.say("example2")
ngx.exit(200)

出现异常时的堆栈(运行行为未定义,可能是segment fault,也可能是死循环)

lj_gc_step会进入死循环

  • gc_onestep 返回0
  • lim-=0一直大于0
  • g->gc.state不等于GCSPAUSE
    Image
    Image

分析

  • light thread t1 返回后,entry thread 调用了ngx.exit(),随后在ngx_http_lua_handle_exit函数中释放请求资源(由于r->main->count不等于0,所以请求资源是泄露了的),并解除了对light thread t2以及entry thread的引用(意味着会被GC回收)。
  • light thread t2 返回后,如果lua_state被回收,运行行为未定义

复现用例协程调度图

Image

如何修复

思路:防范于未然,调用ngx.exit时如果有未结束的capture,直接抛异常。
其实ngx.exit内部有检查是否能结束请求的逻辑

    if (ctx->no_abort
        && rc != NGX_ERROR
        && rc != NGX_HTTP_CLOSE
        && rc != NGX_HTTP_REQUEST_TIME_OUT
        && rc != NGX_HTTP_CLIENT_CLOSED_REQUEST)
    {
        return luaL_error(L, "attempt to abort with pending subrequests");
    }

这个检查在只有一个capture子请求时是生效的。如果如用例所示有多个capture时,就会失效。
解决办法是将no_abort的语义由bool类型改为对capture的计数,创建时+1,结束时-1,具体代码如下所示:

diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h
index 01ef2be..f8bfb2e 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h
@@ -500,6 +500,9 @@ typedef struct ngx_http_lua_ctx_s {
 
     int                      uthreads; /* number of active user threads */
 
+    int                     no_aborts; /* prohibit "world abortion" via ngx.exit()
+                                          and etc */
+
     uint16_t                 context;   /* the current running directive context
                                            (or running phase) for the current
                                            Lua chunk */
@@ -538,9 +541,6 @@ typedef struct ngx_http_lua_ctx_s {
 
     unsigned         buffering:1; /* HTTP 1.0 response body buffering flag */
 
-    unsigned         no_abort:1; /* prohibit "world abortion" via ngx.exit()
-                                    and etc */
-
     unsigned         header_sent:1; /* r->header_sent is not sufficient for
                                      * this because special header filters
                                      * like ngx_image_filter may intercept
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c
index 6ac2cbf..03883e1 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c
@@ -354,7 +354,7 @@ ngx_http_lua_ngx_exit(lua_State *L)
 #endif
     }
 
-    if (ctx->no_abort
+    if (ctx->no_aborts
         && rc != NGX_ERROR
         && rc != NGX_HTTP_CLOSE
         && rc != NGX_HTTP_REQUEST_TIME_OUT
@@ -508,7 +508,7 @@ ngx_http_lua_ffi_exit(ngx_http_request_t *r, int status, u_char *err,
 #endif

     }

 
-    if (ctx->no_abort
+    if (ctx->no_aborts
         && status != NGX_ERROR
         && status != NGX_HTTP_CLOSE
         && status != NGX_HTTP_REQUEST_TIME_OUT
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c
index 826a43c..f7d8de0 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c
@@ -616,7 +616,7 @@ ngx_http_lua_ngx_location_capture_multi(lua_State *L)
         ngx_array_destroy(extra_vars);
     }
 
-    ctx->no_abort = 1;
+    ctx->no_aborts++;
 
     return lua_yield(L, 0);
 }
@@ -987,7 +987,7 @@ ngx_http_lua_post_subrequest(ngx_http_request_t *r, void *data, ngx_int_t rc)
     if (pr_coctx->pending_subreqs == 0) {
         dd("all subrequests are done");
 
-        pr_ctx->no_abort = 0;
+        pr_ctx->no_aborts--;
         pr_ctx->resume_handler = ngx_http_lua_subrequest_resume;
         pr_ctx->cur_co_ctx = pr_coctx;
     }
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c
index f7a537e..8d8d0f8 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c
@@ -1411,8 +1411,8 @@ user_co_done:
 
                 dd("headers sent? %d", r->header_sent || ctx->header_sent);
 
-                if (ctx->no_abort) {
-                    ctx->no_abort = 0;
+                if (ctx->no_aborts) {
+                    ctx->no_aborts--;
                     return NGX_ERROR;
                 }
 
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h
index 7dcc6f7..c5f2d20 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h
@@ -245,7 +245,7 @@ void ngx_http_lua_cleanup_free(ngx_http_request_t *r,

 
 #define ngx_http_lua_check_if_abortable(L, ctx)                              \
-    if ((ctx)->no_abort) {                                                   \
+    if ((ctx)->no_aborts) {                                                   \
         return luaL_error(L, "attempt to abort with pending subrequests");   \
     }

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions