feat: Add error ratio-based circuit breaking policy to api-breaker plugin #12765
…ugin
- Add new 'unhealthy-ratio' policy that triggers circuit breaker based on error rate within sliding time window
- Implement three-state circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED
- Add configurable parameters: error_ratio, min_request_threshold, sliding_window_size, permitted_number_of_calls_in_half_open_state, success_ratio
- Maintain full backward compatibility with existing 'unhealthy-count' policy as default
- Add comprehensive test coverage for new functionality
- Update documentation in both Chinese and English
- Follow APISIX coding standards and testing conventions

This enhancement provides more intelligent circuit breaking for microservices architectures by considering error rates rather than just consecutive failure counts.
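For readers who want a picture of the CLOSED -> OPEN -> HALF_OPEN -> CLOSED cycle, here is a minimal Python sketch. It is not the plugin's Lua code. The class name `RatioBreaker`, the `open_seconds` parameter, the `>=` comparisons, and the window that simply resets once it has elapsed are all assumptions made for illustration; only the parameter names and the state names come from the commit message.

```python
import enum


class State(enum.Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class RatioBreaker:
    """Toy model of the three states; not the plugin's Lua implementation."""

    def __init__(self, error_ratio=0.5, min_request_threshold=10,
                 sliding_window_size=300, permitted_half_open_calls=3,
                 success_ratio=0.6, open_seconds=60):
        self.error_ratio = error_ratio
        self.min_request_threshold = min_request_threshold
        self.sliding_window_size = sliding_window_size
        self.permitted_half_open_calls = permitted_half_open_calls
        self.success_ratio = success_ratio
        self.open_seconds = open_seconds  # assumed fixed OPEN duration
        self.state = State.CLOSED
        self.window_start = 0.0
        self.total = 0
        self.failures = 0
        self.opened_at = 0.0
        self.probes_admitted = 0
        self.probes_done = 0
        self.probes_ok = 0

    def _trip(self, now):
        self.state = State.OPEN
        self.opened_at = now

    def _close(self, now):
        self.state = State.CLOSED
        self.window_start, self.total, self.failures = now, 0, 0

    def allow(self, now):
        """Return True if a request may be forwarded upstream at time `now`."""
        if self.state is State.OPEN:
            if now - self.opened_at < self.open_seconds:
                return False
            # The OPEN period has elapsed, so start probing.
            self.state = State.HALF_OPEN
            self.probes_admitted = self.probes_done = self.probes_ok = 0
        if self.state is State.HALF_OPEN:
            if self.probes_admitted >= self.permitted_half_open_calls:
                return False
            self.probes_admitted += 1
        return True

    def record(self, now, ok):
        """Record the outcome of a forwarded request."""
        if self.state is State.OPEN:
            return  # late response while OPEN: ignored in this sketch
        if self.state is State.HALF_OPEN:
            self.probes_done += 1
            self.probes_ok += 1 if ok else 0
            if self.probes_done >= self.permitted_half_open_calls:
                if self.probes_ok / self.probes_done >= self.success_ratio:
                    self._close(now)
                else:
                    self._trip(now)
            return
        # CLOSED: clear the counters once the window has elapsed.
        if now - self.window_start >= self.sliding_window_size:
            self.window_start, self.total, self.failures = now, 0, 0
        self.total += 1
        self.failures += 0 if ok else 1
        if (self.total >= self.min_request_threshold
                and self.failures / self.total >= self.error_ratio):
            self._trip(now)
```

A caller would check `allow(now)` before proxying and call `record(now, ok)` once the upstream status is known. The sketch is sequential; a real gateway would also need shared state across workers, which is out of scope here.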
Baoyuantop
left a comment
Thanks for your contribution! Based on the current configuration, we need to add some test cases:
- After the sliding window time (sliding_window_size) expires, are the statistics (total number of requests, number of failures) correctly cleared?
- Failure fallback in half-open state (Half-Open -> Open)
- Sending more requests than permitted_number_of_calls_in_half_open_state in half-open state
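The three requested scenarios can be stated precisely with a small Python model. This is only a sketch of the expected behaviour, not the Test::Nginx cases themselves. `WindowStats`, `HalfOpenGate`, and the scenario functions are hypothetical names, and the rule that the half-open verdict is taken once all permitted probes have completed is an assumption.

```python
class WindowStats:
    """Request/failure counters that are cleared once the window has elapsed."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.started_at = None
        self.total = 0
        self.failures = 0

    def record(self, now, ok):
        if self.started_at is None or now - self.started_at >= self.window_size:
            self.started_at, self.total, self.failures = now, 0, 0
        self.total += 1
        if not ok:
            self.failures += 1


class HalfOpenGate:
    """Admits at most `permitted` probes, then judges them by success ratio."""

    def __init__(self, permitted, success_ratio):
        self.permitted = permitted
        self.success_ratio = success_ratio
        self.admitted = 0
        self.done = 0
        self.succeeded = 0

    def try_admit(self):
        if self.admitted >= self.permitted:
            return False
        self.admitted += 1
        return True

    def record(self, ok):
        self.done += 1
        if ok:
            self.succeeded += 1

    def verdict(self):
        """Return None while probing, else the next state name."""
        if self.done < self.permitted:
            return None
        ratio = self.succeeded / self.done
        return "closed" if ratio >= self.success_ratio else "open"


def scenario_window_reset():
    stats = WindowStats(window_size=2)
    stats.record(0.0, ok=False)
    stats.record(1.0, ok=False)
    stats.record(2.5, ok=True)  # window elapsed: counters start over
    return stats.total, stats.failures


def scenario_half_open_fallback():
    gate = HalfOpenGate(permitted=3, success_ratio=0.6)
    for ok in (True, False, False):
        gate.try_admit()
        gate.record(ok)
    return gate.verdict()


def scenario_half_open_limit():
    gate = HalfOpenGate(permitted=3, success_ratio=0.6)
    return [gate.try_admit() for _ in range(5)]
```

In this model the first scenario leaves one request and zero failures in the counters, the second falls back to "open" because one success out of three probes is below 0.6, and the third rejects the fourth and fifth probes.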
t/plugin/api-breaker2.t
Outdated
=== TEST $((${1}+1)): hit route (return 200)
--- request
GET /api_breaker
--- response_body
hello world



=== TEST $((${1}+1)): hit route and return 500 (first failure)
--- request
GET /api_breaker?code=500
--- error_code: 500
--- response_body
fault injection!
We can make multiple requests in a single case; you can refer to the tests in api-breaker.t
Is there an official APISIX test image? It is very difficult to set up an environment for running the .t test files locally.
Hi @HaoTien, can the test run in the environment?
I ran into some issues when building the development environment with Dev Containers on macOS:
Installing https://luarocks.org/penlight-1.14.0-3.src.rock
Missing dependencies for penlight 1.14.0-3:
luafilesystem (not installed)
penlight 1.14.0-3 depends on luafilesystem (not installed)
Installing https://luarocks.org/luafilesystem-1.8.0-1.src.rock
Error: LuaRocks 3.12.0 bug (please report at https://github.com/luarocks/luarocks/issues).
Arch.: linux-aarch64
/usr/local/share/lua/5.1/luarocks/fetch.lua:139: attempt to concatenate local 'name' (a nil value)
stack traceback:
/usr/local/share/lua/5.1/luarocks/fetch.lua:196: in function 'fetch_url'
/usr/local/share/lua/5.1/luarocks/fetch.lua:85: in function 'fetch_caching'
/usr/local/share/lua/5.1/luarocks/fetch.lua:243: in function 'fetch_url_at_temp_dir'
/usr/local/share/lua/5.1/luarocks/fetch.lua:347: in function 'fetch_and_unpack_rock'
/usr/local/share/lua/5.1/luarocks/cmd/build.lua:66: in function 'build_rock'
/usr/local/share/lua/5.1/luarocks/cmd/build.lua:125: in function 'do_build'
/usr/local/share/lua/5.1/luarocks/cmd/build.lua:171: in function 'installer'
/usr/local/share/lua/5.1/luarocks/deps.lua:237: in function 'fulfill_dependency'
/usr/local/share/lua/5.1/luarocks/deps.lua:332: in function 'process_dependencies'
/usr/local/share/lua/5.1/luarocks/build.lua:404: in function 'build_rockspec'
...
/usr/local/share/lua/5.1/luarocks/cmd/build.lua:125: in function 'do_build'
/usr/local/share/lua/5.1/luarocks/cmd/build.lua:171: in function 'installer'
/usr/local/share/lua/5.1/luarocks/deps.lua:237: in function 'fulfill_dependency'
/usr/local/share/lua/5.1/luarocks/deps.lua:332: in function 'process_dependencies'
/usr/local/share/lua/5.1/luarocks/build.lua:404: in function 'do_build'
/usr/local/share/lua/5.1/luarocks/cmd/build.lua:171: in function </usr/local/share/lua/5.1/luarocks/cmd/build.lua:138>
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/luarocks/cmd.lua:794: in function 'run_command'
/usr/local/bin/luarocks:38: in main chunk
[C]: at 0xb537f8c98d94
make: *** [Makefile:134: deps] Error 99
[64668 ms] postCreateCommand from devcontainer.json failed with exit code 2. Skipping any further user-provided commands.
Hi @HaoTien, has the problem been resolved? Everything works fine when I try it locally. 🤔
I switched to the method described at https://apisix.apache.org/docs/apisix/build-apisix-dev-environment-on-mac/ to set up the macOS development environment. An error is reported on both the master and the release3.14 branches:
docker build -t apisix-dev-env -f example/build-dev-image.dockerfile .
Step 4/8 : RUN cpanm --notest Test::Nginx
...
--> Working on Net::HTTP
Fetching http://www.cpan.org/authors/id/O/OA/OALDERS/Net-HTTP-6.24.tar.gz ... OK
Configuring Net-HTTP-6.24 ... OK
Building Net-HTTP-6.24 ... OK
Successfully installed Net-HTTP-6.24
! Installing the dependencies failed: Installed version (6.22) of HTTP::Message is not in range '7.01', Installed version (6.22) of HTTP::Response is not in range '7.01', Installed version (6.22) of HTTP::Request is not in range '7.01'
! Bailing out the installation for libwww-perl-6.81.
--> Working on Test::Base
Fetching http://www.cpan.org/authors/id/I/IN/INGY/Test-Base-0.89.tar.gz ... OK
Configuring Test-Base-0.89 ... OK
==> Found dependencies: Spiffy
--> Working on Spiffy
Fetching http://www.cpan.org/authors/id/I/IN/INGY/Spiffy-0.46.tar.gz ... OK
Configuring Spiffy-0.46 ... OK
Building Spiffy-0.46 ... OK
Successfully installed Spiffy-0.46
Building Test-Base-0.89 ... OK
Successfully installed Test-Base-0.89
! Installing the dependencies failed: Module 'LWP::UserAgent' is not installed
! Bailing out the installation for Test-Nginx-0.30.
11 distributions installed
The command '/bin/sh -c cpanm --notest Test::Nginx' returned a non-zero code: 1
diff --git a/example/build-dev-image.dockerfile b/example/build-dev-image.dockerfile
index da0d827b..87526f3a 100644
--- a/example/build-dev-image.dockerfile
+++ b/example/build-dev-image.dockerfile
@@ -19,7 +19,7 @@ FROM ubuntu:20.04
# Install Test::Nginx
RUN apt update
-RUN apt install -y cpanminus make
+RUN apt install -y cpanminus make libwww-perl
RUN cpanm --notest Test::Nginx
# Install development utils

Hi @HaoTien, I tried this locally and it worked.
Hi @HaoTien, following up on the previous review comments. Please let us know if you have any updates. Thank you.
Hi, I updated the test cases and they all passed.
root@colima:/apisix# prove -Itest-nginx/lib -I./ t/plugin/api-breaker2.t
t/plugin/api-breaker2.t .. 1/? t/plugin/api-breaker2.t TEST 2: default configuration for unhealthy-ratio policy - timeout when waiting for the process 108706 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 2: default configuration for unhealthy-ratio policy - WARNING: killing the child process 108706 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 4/? t/plugin/api-breaker2.t TEST 3: bad error_ratio (too high) - timeout when waiting for the process 108713 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 3: bad error_ratio (too high) - WARNING: killing the child process 108713 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 7/? t/plugin/api-breaker2.t TEST 4: bad error_ratio (negative) - timeout when waiting for the process 108720 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 4: bad error_ratio (negative) - WARNING: killing the child process 108720 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 10/? t/plugin/api-breaker2.t TEST 5: bad min_request_threshold (zero) - timeout when waiting for the process 108727 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 5: bad min_request_threshold (zero) - WARNING: killing the child process 108727 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 13/? t/plugin/api-breaker2.t TEST 6: bad sliding_window_size (too small) - timeout when waiting for the process 108734 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 6: bad sliding_window_size (too small) - WARNING: killing the child process 108734 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 16/? t/plugin/api-breaker2.t TEST 7: bad sliding_window_size (too large) - timeout when waiting for the process 108741 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 7: bad sliding_window_size (too large) - WARNING: killing the child process 108741 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 19/? t/plugin/api-breaker2.t TEST 8: bad success_ratio (too high) - timeout when waiting for the process 108748 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 8: bad success_ratio (too high) - WARNING: killing the child process 108748 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 22/? t/plugin/api-breaker2.t TEST 9: bad half_open_max_calls (too large) - timeout when waiting for the process 108755 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 9: bad half_open_max_calls (too large) - WARNING: killing the child process 108755 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 25/? t/plugin/api-breaker2.t TEST 10: set route with unhealthy-ratio policy - timeout when waiting for the process 108762 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 10: set route with unhealthy-ratio policy - WARNING: killing the child process 108762 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 28/? t/plugin/api-breaker2.t TEST $((${1}+1)): test ratio-based circuit breaker functionality - timeout when waiting for the process 108769 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST $((${1}+1)): test ratio-based circuit breaker functionality - WARNING: killing the child process 108769 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 34/? t/plugin/api-breaker2.t TEST $((${1}+1)): wait for circuit breaker to enter half-open state - timeout when waiting for the process 108776 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST $((${1}+1)): wait for circuit breaker to enter half-open state - WARNING: killing the child process 108776 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 46/? t/plugin/api-breaker2.t TEST 16: test half-open state functionality - timeout when waiting for the process 108783 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 16: test half-open state functionality - WARNING: killing the child process 108783 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 49/? t/plugin/api-breaker2.t TEST 19: verify circuit breaker works with custom break_response_headers - timeout when waiting for the process 108790 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 19: verify circuit breaker works with custom break_response_headers - WARNING: killing the child process 108790 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 58/? t/plugin/api-breaker2.t TEST 20: trigger circuit breaker with custom headers (combined) - timeout when waiting for the process 108797 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 20: trigger circuit breaker with custom headers (combined) - WARNING: killing the child process 108797 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 63/? t/plugin/api-breaker2.t TEST 23: setup route for sliding window expiration test - timeout when waiting for the process 108804 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 23: setup route for sliding window expiration test - WARNING: killing the child process 108804 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 67/? t/plugin/api-breaker2.t TEST 24: test sliding window statistics reset after expiration - timeout when waiting for the process 108811 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 24: test sliding window statistics reset after expiration - WARNING: killing the child process 108811 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 70/? t/plugin/api-breaker2.t TEST 25: setup route for half-open failure fallback test - timeout when waiting for the process 108818 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 25: setup route for half-open failure fallback test - WARNING: killing the child process 108818 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 73/? t/plugin/api-breaker2.t TEST 26: test half-open state failure fallback to open state - timeout when waiting for the process 108825 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 26: test half-open state failure fallback to open state - WARNING: killing the child process 108825 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 76/? t/plugin/api-breaker2.t TEST 27: setup route for half-open request limit test - timeout when waiting for the process 108832 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 27: setup route for half-open request limit test - WARNING: killing the child process 108832 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 79/? t/plugin/api-breaker2.t TEST 28: test half-open state request limit enforcement and header check - timeout when waiting for the process 108839 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 28: test half-open state request limit enforcement and header check - WARNING: killing the child process 108839 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 82/? END - timeout when waiting for the process 108846 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
END - WARNING: killing the child process 108846 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. ok
All tests successful.
Files=1, Tests=84, 203 wallclock secs ( 0.04 usr 0.01 sys + 1.36 cusr 0.98 csys = 2.39 CPU)
Result: PASS
…open_state; insert some test cases
Hi @HaoTien, please fix the lint error.
feat: Add error ratio-based circuit breaking policy to api-breaker plugin
What this PR does / why we need it
This PR implements error ratio-based circuit breaking (unhealthy-ratio policy) for the api-breaker plugin, providing more intelligent and adaptive circuit breaking behavior based on error rates within a sliding time window, rather than just consecutive failure counts.

Closes #12763
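To make the contrast with consecutive failure counts concrete, here is a small Python illustration. The helper names and the alternating traffic pattern are made up for this example, and the count limit of 3 mentioned afterwards is hypothetical rather than taken from the plugin.

```python
def max_consecutive_failures(outcomes):
    """Length of the longest run of failed requests (False = failure)."""
    longest = run = 0
    for ok in outcomes:
        run = 0 if ok else run + 1
        longest = max(longest, run)
    return longest


def observed_error_ratio(outcomes):
    """Share of failed requests among all requests in the window."""
    return sum(1 for ok in outcomes if not ok) / len(outcomes)


# 20 requests that alternate failure and success.
outcomes = [False, True] * 10

longest_run = max_consecutive_failures(outcomes)
ratio = observed_error_ratio(outcomes)
```

Here the longest run of failures is 1, so a policy that waits for, say, 3 consecutive failures would never trip, while the observed error ratio is 0.5 over 20 requests, which reaches an error_ratio of 0.5 with a min_request_threshold of 10.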
Types of changes
Description
Current Limitations
New Features Added
unhealthy-ratio policy that triggers circuit breaker based on error rate within a sliding time window

New Configuration Parameters

| Parameter | Default |
| --- | --- |
| policy | "unhealthy-count" |
| unhealthy.error_ratio | 0.5 |
| unhealthy.min_request_threshold | 10 |
| unhealthy.sliding_window_size | 300 |
| unhealthy.permitted_number_of_calls_in_half_open_state | 3 |
| healthy.success_ratio | 0.6 |

Example Configuration

{
  "plugins": {
    "api-breaker": {
      "break_response_code": 503,
      "policy": "unhealthy-ratio",
      "max_breaker_sec": 60,
      "unhealthy": {
        "http_statuses": [500, 502, 503, 504],
        "error_ratio": 0.5,
        "min_request_threshold": 10,
        "sliding_window_size": 300,
        "permitted_number_of_calls_in_half_open_state": 3
      },
      "healthy": {
        "http_statuses": [200, 201, 202],
        "success_ratio": 0.6
      }
    }
  }
}

How Has This Been Tested?
Test Results
Files Modified
apisix/plugins/api-breaker.lua - Core plugin logic with new ratio-based policy
t/plugin/api-breaker2.t - New comprehensive test file for ratio-based circuit breaking
docs/en/latest/plugins/api-breaker.md - Updated English documentation
docs/zh/latest/plugins/api-breaker.md - Updated Chinese documentation

Checklist
Additional Notes
This implementation:
The feature addresses real-world use cases for:
Ready for review and feedback!