8303612: runtime/StackGuardPages/TestStackGuardPagesNative.java fails with exit code 139 #25689

mz1999 · 2025-06-09T06:53:20Z

This pull request addresses an issue in runtime/StackGuardPages/TestStackGuardPagesNative where the native test component (exeinvoke.c) exhibited platform-dependent behavior and did not fully align with the intended test objectives for verifying stack guard page removal on thread detachment.

Summary of the Problem:

The test_native_overflow_initial scenario within TestStackGuardPagesNative showed inconsistent results:

On certain Linux distributions (e.g., CentOS 7), the test would hang and eventually time out during its second phase of stack allocation.
On other distributions (e.g., Ubuntu 24), the test would pass, but this pass was found to be coincidental, relying on an unintended SEGV_MAPERR to terminate a loop that should have had a defined exit condition.

The core issue was that the native code's second stack overflow attempt, designed to check for guard page removal, used an unbounded loop. Its termination (and thus the test's outcome) depended on platform-specific OS behavior regarding extensive stack allocation after guard pages are supposedly modified.

Test Objective Analysis:

The primary goal of TestStackGuardPagesNative, particularly for the initial thread (test_native_overflow_initial), is to:

Verify Guard Page Presence: Confirm that when a native thread is attached to the JVM, a deliberate stack overflow triggers a SIGSEGV with si_code = SEGV_ACCERR, indicating an active stack guard page.
Verify Guard Page Removal/Modification: After the thread detaches from the JVM via DetachCurrentThread(), confirm that the previously active stack guard page is no longer enforcing the same strict protection. This is ideally demonstrated by successfully allocating stack space up to the depth that previously caused the SEGV_ACCERR, without encountering any signal.

How the Original Implementation Deviated from the Test Intent:

The native do_overflow function, when invoked for the second phase (to check guard page removal), implemented an unconditional for(;;) loop.

Intended Logic vs. Actual Behavior: The test intended for this second phase to demonstrate that allocations up to the prior failure depth are now "clean" (no SEGV_ACCERR). However, the unbounded loop meant:
- On systems like CentOS 7, where deep stack allocation without an immediate SEGV_MAPERR was possible, this loop ran for an excessive duration, leading to a hang.
- On systems like Ubuntu 24, the loop was terminated by a SEGV_MAPERR. The existing result-checking logic in run_native_overflow incorrectly treated this SEGV_MAPERR as a "pass" condition for guard page removal, which is misleading. The true indication of successful guard page removal is the absence of any signal during the controlled second allocation phase.

This reliance on incidental, platform-dependent signal behavior to terminate the loop (and the misinterpretation of SEGV_MAPERR as a success) meant the test was not robustly verifying its core objective.

Proposed Solution:

This PR modifies the do_overflow function in exeinvoke.c to accurately reflect the test intent for both phases of stack overflow checking, ensuring deterministic behavior:

The do_overflow function is updated to use a single while loop whose condition correctly handles both the initial overflow check and the subsequent verification of guard page removal:

void do_overflow(){
  volatile int *p = NULL;
  // The loop condition is true if:
  // 1. It's the first run (_kp_rec_count == 0), causing the loop to run until SIGSEGV.
  // 2. It's the second run (_kp_rec_count > 0) AND the current allocation depth (_rec_count)
  //    is less than the depth that previously caused SIGSEGV (_kp_rec_count).
  while (_kp_rec_count == 0 || _rec_count < _kp_rec_count) {
    _rec_count++;
    p = (int*)alloca(128);
    _peek_value = p[0]; // Ensure memory is touched to trigger guard page if active
  }
  // - On the first run, this point is never reached due to longjmp from the signal handler.
  // - On the second run, if no SIGSEGV occurs, the loop completes when _rec_count == _kp_rec_count.
}

Impact and Clarity of Test Outcome with This Change:

Prevents Hangs: The primary impact of this change is the elimination of the potential for an infinite (or excessively long) loop during the second phase of do_overflow. The loop now has a defined boundary based on _kp_rec_count. This directly resolves the hang observed on platforms like CentOS 7.
Deterministic Execution of do_overflow: The do_overflow function will now consistently:
- In the first run: Execute until a SIGSEGV occurs and longjmp is called.
- In the second run: Execute allocations up to _kp_rec_count times. If no SIGSEGV occurs, it will complete this bounded loop and return normally. If a SIGSEGV does occur within these _kp_rec_count allocations, longjmp will be called.

This refinement ensures TestStackGuardPagesNative more accurately and reliably verifies the stack guard page lifecycle for native threads attached to the JVM.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Error

⚠️ OCA signatory status must be verified

Issue

JDK-8303612: runtime/StackGuardPages/TestStackGuardPagesNative.java fails with exit code 139 (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25689/head:pull/25689
$ git checkout pull/25689

Update a local copy of the PR:
$ git checkout pull/25689
$ git pull https://git.openjdk.org/jdk.git pull/25689/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25689

View PR using the GUI difftool:
$ git pr show -t 25689

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25689.diff

… with exit code 139 replace infinite loop with conditional in do_overflow function

bridgekeeper · 2025-06-09T06:54:22Z

Hi @mz1999, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user mz1999" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

openjdk · 2025-06-09T06:54:42Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-06-09T06:55:21Z

@mz1999 The following label will be automatically applied to this pull request:

hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

8303612: runtime/StackGuardPages/TestStackGuardPagesNative.java fails…

6ff0428

… with exit code 139 replace infinite loop with conditional in do_overflow function

bridgekeeper bot added the oca Needs verification of OCA signatory status label Jun 9, 2025

openjdk bot added the hotspot-runtime [email protected] label Jun 9, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

8303612: runtime/StackGuardPages/TestStackGuardPagesNative.java fails with exit code 139 #25689

8303612: runtime/StackGuardPages/TestStackGuardPagesNative.java fails with exit code 139 #25689

mz1999 commented Jun 9, 2025 •

edited by openjdk bot

Loading

Uh oh!

bridgekeeper bot commented Jun 9, 2025

Uh oh!

openjdk bot commented Jun 9, 2025

Uh oh!

openjdk bot commented Jun 9, 2025

Uh oh!

Uh oh!

8303612: runtime/StackGuardPages/TestStackGuardPagesNative.java fails with exit code 139 #25689

Are you sure you want to change the base?

8303612: runtime/StackGuardPages/TestStackGuardPagesNative.java fails with exit code 139 #25689

Conversation

mz1999 commented Jun 9, 2025 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress

Error

Issue

Reviewing

Uh oh!

bridgekeeper bot commented Jun 9, 2025

Uh oh!

openjdk bot commented Jun 9, 2025

Uh oh!

openjdk bot commented Jun 9, 2025

Uh oh!

Uh oh!

mz1999 commented Jun 9, 2025 •

edited by openjdk bot

Loading