Skip to content

Conversation

@HaoTien
Copy link

@HaoTien HaoTien commented Nov 21, 2025

feat: Add error ratio-based circuit breaking policy to api-breaker plugin

What this PR does / why we need it

This PR implements error ratio-based circuit breaking (unhealthy-ratio policy) for the api-breaker plugin, providing more intelligent and adaptive circuit breaking behavior based on error rates within a sliding time window, rather than just consecutive failure counts.

Closes #12763

Types of changes

  • New feature (non-breaking change which adds functionality)
  • Documentation update

Description

Current Limitations

  • The existing failure count-based approach only considers consecutive failures
  • It doesn't account for the overall error rate in relation to total requests
  • May be too sensitive during low traffic periods or not sensitive enough during high traffic periods

New Features Added

  • Error ratio-based circuit breaking: New unhealthy-ratio policy that triggers circuit breaker based on error rate within a sliding time window
  • Configurable parameters: Support for error ratio threshold, minimum request threshold, sliding window size, etc.
  • Circuit breaker states: Proper implementation of CLOSED, OPEN, and HALF_OPEN states
  • Backward compatibility: Existing configurations continue to work without changes

New Configuration Parameters

Parameter Type Default Description
policy string "unhealthy-count" Circuit breaker policy
unhealthy.error_ratio number 0.5 Error rate threshold (0-1) to trigger circuit breaker
unhealthy.min_request_threshold integer 10 Minimum requests needed before evaluating error rate
unhealthy.sliding_window_size integer 300 Sliding window size in seconds for error rate calculation
unhealthy.permitted_number_of_calls_in_half_open_state integer 3 Number of permitted calls in half-open state
healthy.success_ratio number 0.6 Success rate threshold to close circuit breaker from half-open state

Example Configuration

{
  "plugins": {
    "api-breaker": {
      "break_response_code": 503,
      "policy": "unhealthy-ratio",
      "max_breaker_sec": 60,
      "unhealthy": {
        "http_statuses": [500, 502, 503, 504],
        "error_ratio": 0.5,
        "min_request_threshold": 10,
        "sliding_window_size": 300,
        "permitted_number_of_calls_in_half_open_state": 3
      },
      "healthy": {
        "http_statuses": [200, 201, 202],
        "success_ratio": 0.6
      }
    }
  }
}

How Has This Been Tested?

  • Schema validation tests for new parameters
  • Functional tests for error ratio calculation
  • Circuit breaker state transition tests
  • Integration tests with various traffic patterns
  • Backward compatibility tests
  • Performance tests to ensure no regression

Test Results

# Run the new test file
prove -I. -r t/plugin/api-breaker2.t

# Verify existing tests still pass
prove -I. -r t/plugin/api-breaker.t

Files Modified

  • apisix/plugins/api-breaker.lua - Core plugin logic with new ratio-based policy
  • t/plugin/api-breaker2.t - New comprehensive test file for ratio-based circuit breaking
  • docs/en/latest/plugins/api-breaker.md - Updated English documentation
  • docs/zh/latest/plugins/api-breaker.md - Updated Chinese documentation

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the CONTRIBUTING document
  • I have added tests to cover my changes
  • All new and existing tests passed
  • I have squashed my commits into logical units
  • My commit messages are in the proper format

Additional Notes

This implementation:

  • Maintains full backward compatibility - existing configurations work unchanged
  • Follows APISIX patterns - consistent with existing plugin architecture
  • Comprehensive testing - covers all scenarios and edge cases
  • Performance optimized - efficient sliding window implementation
  • Well documented - updated both English and Chinese docs

The feature addresses real-world use cases for:

  • High-traffic services with better error spike handling
  • Variable traffic patterns with adaptive behavior
  • Microservices architectures requiring precise circuit breaking
  • SLA-based circuit breaking with configurable error rates

Ready for review and feedback!

…ugin

- Add new 'unhealthy-ratio' policy that triggers circuit breaker based on error rate within sliding time window
- Implement three-state circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED
- Add configurable parameters: error_ratio, min_request_threshold, sliding_window_size, permitted_number_of_calls_in_half_open_state, success_ratio
- Maintain full backward compatibility with existing 'unhealthy-count' policy as default
- Add comprehensive test coverage for new functionality
- Update documentation in both Chinese and English
- Follow APISIX coding standards and testing conventions

This enhancement provides more intelligent circuit breaking for microservices architectures by considering error rates rather than just consecutive failure counts.
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. doc Documentation things enhancement New feature or request labels Nov 21, 2025
Copy link
Contributor

@Baoyuantop Baoyuantop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! Based on the current configuration, we need to add some test cases:

  1. After the sliding window time (sliding_window_size) expires, are the statistics (total number of requests, number of failures) correctly cleared?

  2. Failure fallback in half-open state (Half-Open -> Open)

  3. Sending more requests than permitted_number_of_calls_in_half_open_state in half-open state

Comment on lines 446 to 459
=== TEST $((${1}+1)): hit route (return 200)
--- request
GET /api_breaker
--- response_body
hello world



=== TEST $((${1}+1)): hit route and return 500 (first failure)
--- request
GET /api_breaker?code=500
--- error_code: 500
--- response_body
fault injection!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make multiple requests in a single case; you can refer to the tests in api-breaker.t

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask if there is an official test image of apisix? It is very difficult to set up the environment for testing .t files locally

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @HaoTien, can the test run in the environment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I encountered some issues when build development environment with Dev Containers on macOS:

Installing https://luarocks.org/penlight-1.14.0-3.src.rock

Missing dependencies for penlight 1.14.0-3:
luafilesystem (not installed)

penlight 1.14.0-3 depends on luafilesystem (not installed)
Installing https://luarocks.org/luafilesystem-1.8.0-1.src.rock

Error: LuaRocks 3.12.0 bug (please report at https://github.com/luarocks/luarocks/issues).
Arch.: linux-aarch64
/usr/local/share/lua/5.1/luarocks/fetch.lua:139: attempt to concatenate local 'name' (a nil value)
stack traceback:
/usr/local/share/lua/5.1/luarocks/fetch.lua:196: in function 'fetch_url'
/usr/local/share/lua/5.1/luarocks/fetch.lua:85: in function 'fetch_caching'
/usr/local/share/lua/5.1/luarocks/fetch.lua:243: in function 'fetch_url_at_temp_dir'
/usr/local/share/lua/5.1/luarocks/fetch.lua:347: in function 'fetch_and_unpack_rock'
/usr/local/share/lua/5.1/luarocks/cmd/build.lua:66: in function 'build_rock'
/usr/local/share/lua/5.1/luarocks/cmd/build.lua:125: in function 'do_build'
/usr/local/share/lua/5.1/luarocks/cmd/build.lua:171: in function 'installer'
/usr/local/share/lua/5.1/luarocks/deps.lua:237: in function 'fulfill_dependency'
/usr/local/share/lua/5.1/luarocks/deps.lua:332: in function 'process_dependencies'
/usr/local/share/lua/5.1/luarocks/build.lua:404: in function 'build_rockspec'
...
/usr/local/share/lua/5.1/luarocks/cmd/build.lua:125: in function 'do_build'
/usr/local/share/lua/5.1/luarocks/cmd/build.lua:171: in function 'installer'
/usr/local/share/lua/5.1/luarocks/deps.lua:237: in function 'fulfill_dependency'
/usr/local/share/lua/5.1/luarocks/deps.lua:332: in function 'process_dependencies'
/usr/local/share/lua/5.1/luarocks/build.lua:404: in function 'do_build'
/usr/local/share/lua/5.1/luarocks/cmd/build.lua:171: in function </usr/local/share/lua/5.1/luarocks/cmd/build.lua:138>
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/luarocks/cmd.lua:794: in function 'run_command'
/usr/local/bin/luarocks:38: in main chunk
[C]: at 0xb537f8c98d94
make: *** [Makefile:134: deps] Error 99
[64668 ms] postCreateCommand from devcontainer.json failed with exit code 2. Skipping any further user-provided commands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @HaoTien, has the problem been resolved? Everything works fine when I try it locally. 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://apisix.apache.org/docs/apisix/build-apisix-dev-environment-on-mac/ . I changed this method to set up the mac development environment. Whether I used the master or the release3.14 branch, an error was reported.

docker build -t apisix-dev-env -f example/build-dev-image.dockerfile .

Step 4/8 : RUN cpanm --notest Test::Nginx
...
--> Working on Net::HTTP
Fetching http://www.cpan.org/authors/id/O/OA/OALDERS/Net-HTTP-6.24.tar.gz ... OK
Configuring Net-HTTP-6.24 ... OK
Building Net-HTTP-6.24 ... OK
Successfully installed Net-HTTP-6.24
! Installing the dependencies failed: Installed version (6.22) of HTTP::Message is not in range '7.01', Installed version (6.22) of HTTP::Response is not in range '7.01', Installed version (6.22) of HTTP::Request is not in range '7.01'
! Bailing out the installation for libwww-perl-6.81.
--> Working on Test::Base
Fetching http://www.cpan.org/authors/id/I/IN/INGY/Test-Base-0.89.tar.gz ... OK
Configuring Test-Base-0.89 ... OK
==> Found dependencies: Spiffy
--> Working on Spiffy
Fetching http://www.cpan.org/authors/id/I/IN/INGY/Spiffy-0.46.tar.gz ... OK
Configuring Spiffy-0.46 ... OK
Building Spiffy-0.46 ... OK
Successfully installed Spiffy-0.46
Building Test-Base-0.89 ... OK
Successfully installed Test-Base-0.89
! Installing the dependencies failed: Module 'LWP::UserAgent' is not installed
! Bailing out the installation for Test-Nginx-0.30.
11 distributions installed
The command '/bin/sh -c cpanm --notest Test::Nginx' returned a non-zero code: 1

Copy link
Contributor

@Baoyuantop Baoyuantop Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff --git a/example/build-dev-image.dockerfile b/example/build-dev-image.dockerfile
index da0d827b..87526f3a 100644
--- a/example/build-dev-image.dockerfile
+++ b/example/build-dev-image.dockerfile
@@ -19,7 +19,7 @@ FROM ubuntu:20.04
 
 # Install Test::Nginx
 RUN apt update
-RUN apt install -y cpanminus make
+RUN apt install -y cpanminus make libwww-perl
 RUN cpanm --notest Test::Nginx
 
 # Install development utils

Hi @HaoTien, I tried this locally and it worked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @HaoTien, following up on the previous review comments. Please let us know if you have any updates. Thank you.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I updated the test cases and they all passed.

root@colima:/apisix# prove -Itest-nginx/lib -I./ t/plugin/api-breaker2.t
t/plugin/api-breaker2.t .. 1/? t/plugin/api-breaker2.t TEST 2: default configuration for unhealthy-ratio policy - timeout when waiting for the process 108706 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 2: default configuration for unhealthy-ratio policy - WARNING: killing the child process 108706 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 4/? t/plugin/api-breaker2.t TEST 3: bad error_ratio (too high) - timeout when waiting for the process 108713 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 3: bad error_ratio (too high) - WARNING: killing the child process 108713 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 7/? t/plugin/api-breaker2.t TEST 4: bad error_ratio (negative) - timeout when waiting for the process 108720 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 4: bad error_ratio (negative) - WARNING: killing the child process 108720 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 10/? t/plugin/api-breaker2.t TEST 5: bad min_request_threshold (zero) - timeout when waiting for the process 108727 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 5: bad min_request_threshold (zero) - WARNING: killing the child process 108727 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 13/? t/plugin/api-breaker2.t TEST 6: bad sliding_window_size (too small) - timeout when waiting for the process 108734 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 6: bad sliding_window_size (too small) - WARNING: killing the child process 108734 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 16/? t/plugin/api-breaker2.t TEST 7: bad sliding_window_size (too large) - timeout when waiting for the process 108741 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 7: bad sliding_window_size (too large) - WARNING: killing the child process 108741 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 19/? t/plugin/api-breaker2.t TEST 8: bad success_ratio (too high) - timeout when waiting for the process 108748 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 8: bad success_ratio (too high) - WARNING: killing the child process 108748 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 22/? t/plugin/api-breaker2.t TEST 9: bad half_open_max_calls (too large) - timeout when waiting for the process 108755 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 9: bad half_open_max_calls (too large) - WARNING: killing the child process 108755 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 25/? t/plugin/api-breaker2.t TEST 10: set route with unhealthy-ratio policy - timeout when waiting for the process 108762 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 10: set route with unhealthy-ratio policy - WARNING: killing the child process 108762 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 28/? t/plugin/api-breaker2.t TEST $((${1}+1)): test ratio-based circuit breaker functionality - timeout when waiting for the process 108769 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST $((${1}+1)): test ratio-based circuit breaker functionality - WARNING: killing the child process 108769 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 34/? t/plugin/api-breaker2.t TEST $((${1}+1)): wait for circuit breaker to enter half-open state - timeout when waiting for the process 108776 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST $((${1}+1)): wait for circuit breaker to enter half-open state - WARNING: killing the child process 108776 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 46/? t/plugin/api-breaker2.t TEST 16: test half-open state functionality - timeout when waiting for the process 108783 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 16: test half-open state functionality - WARNING: killing the child process 108783 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 49/? t/plugin/api-breaker2.t TEST 19: verify circuit breaker works with custom break_response_headers - timeout when waiting for the process 108790 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 19: verify circuit breaker works with custom break_response_headers - WARNING: killing the child process 108790 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 58/? t/plugin/api-breaker2.t TEST 20: trigger circuit breaker with custom headers (combined) - timeout when waiting for the process 108797 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 20: trigger circuit breaker with custom headers (combined) - WARNING: killing the child process 108797 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 63/? t/plugin/api-breaker2.t TEST 23: setup route for sliding window expiration test - timeout when waiting for the process 108804 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 23: setup route for sliding window expiration test - WARNING: killing the child process 108804 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 67/? t/plugin/api-breaker2.t TEST 24: test sliding window statistics reset after expiration - timeout when waiting for the process 108811 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 24: test sliding window statistics reset after expiration - WARNING: killing the child process 108811 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 70/? t/plugin/api-breaker2.t TEST 25: setup route for half-open failure fallback test - timeout when waiting for the process 108818 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 25: setup route for half-open failure fallback test - WARNING: killing the child process 108818 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 73/? t/plugin/api-breaker2.t TEST 26: test half-open state failure fallback to open state - timeout when waiting for the process 108825 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 26: test half-open state failure fallback to open state - WARNING: killing the child process 108825 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 76/? t/plugin/api-breaker2.t TEST 27: setup route for half-open request limit test - timeout when waiting for the process 108832 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 27: setup route for half-open request limit test - WARNING: killing the child process 108832 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 79/? t/plugin/api-breaker2.t TEST 28: test half-open state request limit enforcement and header check - timeout when waiting for the process 108839 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
t/plugin/api-breaker2.t TEST 28: test half-open state request limit enforcement and header check - WARNING: killing the child process 108839 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. 82/? END - timeout when waiting for the process 108846 to exit at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 668.
END - WARNING: killing the child process 108846 with force... at /usr/local/share/perl/5.30.0/Test/Nginx/Util.pm line 707.
t/plugin/api-breaker2.t .. ok    
All tests successful.
Files=1, Tests=84, 203 wallclock secs ( 0.04 usr  0.01 sys +  1.36 cusr  0.98 csys =  2.39 CPU)
Result: PASS

@Baoyuantop Baoyuantop added the wait for update wait for the author's response in this issue/PR label Dec 24, 2025
@Baoyuantop
Copy link
Contributor

Hi @HaoTien, please fix the lint error

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc Documentation things enhancement New feature or request size:XXL This PR changes 1000+ lines, ignoring generated files. wait for update wait for the author's response in this issue/PR

Projects

None yet

2 participants