Skip to content

8303612: runtime/StackGuardPages/TestStackGuardPagesNative.java fails with exit code 139 #25689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mz1999
Copy link

@mz1999 mz1999 commented Jun 9, 2025

This pull request addresses an issue in runtime/StackGuardPages/TestStackGuardPagesNative where the native test component (exeinvoke.c) exhibited platform-dependent behavior and did not fully align with the intended test objectives for verifying stack guard page removal on thread detachment.

Summary of the Problem:

The test_native_overflow_initial scenario within TestStackGuardPagesNative showed inconsistent results:

  • On certain Linux distributions (e.g., CentOS 7), the test would hang and eventually time out during its second phase of stack allocation.
  • On other distributions (e.g., Ubuntu 24), the test would pass, but this pass was found to be coincidental, relying on an unintended SEGV_MAPERR to terminate a loop that should have had a defined exit condition.

The core issue was that the native code's second stack overflow attempt, designed to check for guard page removal, used an unbounded loop. Its termination (and thus the test's outcome) depended on platform-specific OS behavior regarding extensive stack allocation after guard pages are supposedly modified.

Test Objective Analysis:

The primary goal of TestStackGuardPagesNative, particularly for the initial thread (test_native_overflow_initial), is to:

  1. Verify Guard Page Presence: Confirm that when a native thread is attached to the JVM, a deliberate stack overflow triggers a SIGSEGV with si_code = SEGV_ACCERR, indicating an active stack guard page.
  2. Verify Guard Page Removal/Modification: After the thread detaches from the JVM via DetachCurrentThread(), confirm that the previously active stack guard page is no longer enforcing the same strict protection. This is ideally demonstrated by successfully allocating stack space up to the depth that previously caused the SEGV_ACCERR, without encountering any signal.

How the Original Implementation Deviated from the Test Intent:

The native do_overflow function, when invoked for the second phase (to check guard page removal), implemented an unconditional for(;;) loop.

  • Intended Logic vs. Actual Behavior: The test intended for this second phase to demonstrate that allocations up to the prior failure depth are now "clean" (no SEGV_ACCERR). However, the unbounded loop meant:
    • On systems like CentOS 7, where deep stack allocation without an immediate SEGV_MAPERR was possible, this loop ran for an excessive duration, leading to a hang.
    • On systems like Ubuntu 24, the loop was terminated by a SEGV_MAPERR. The existing result-checking logic in run_native_overflow incorrectly treated this SEGV_MAPERR as a "pass" condition for guard page removal, which is misleading. The true indication of successful guard page removal is the absence of any signal during the controlled second allocation phase.

This reliance on incidental, platform-dependent signal behavior to terminate the loop (and the misinterpretation of SEGV_MAPERR as a success) meant the test was not robustly verifying its core objective.

Proposed Solution:

This PR modifies the do_overflow function in exeinvoke.c to accurately reflect the test intent for both phases of stack overflow checking, ensuring deterministic behavior:

The do_overflow function is updated to use a single while loop whose condition correctly handles both the initial overflow check and the subsequent verification of guard page removal:

void do_overflow(){
  volatile int *p = NULL;
  // The loop condition is true if:
  // 1. It's the first run (_kp_rec_count == 0), causing the loop to run until SIGSEGV.
  // 2. It's the second run (_kp_rec_count > 0) AND the current allocation depth (_rec_count)
  //    is less than the depth that previously caused SIGSEGV (_kp_rec_count).
  while (_kp_rec_count == 0 || _rec_count < _kp_rec_count) {
    _rec_count++;
    p = (int*)alloca(128);
    _peek_value = p[0]; // Ensure memory is touched to trigger guard page if active
  }
  // - On the first run, this point is never reached due to longjmp from the signal handler.
  // - On the second run, if no SIGSEGV occurs, the loop completes when _rec_count == _kp_rec_count.
}

Impact and Clarity of Test Outcome with This Change:

  • Prevents Hangs: The primary impact of this change is the elimination of the potential for an infinite (or excessively long) loop during the second phase of do_overflow. The loop now has a defined boundary based on _kp_rec_count. This directly resolves the hang observed on platforms like CentOS 7.
  • Deterministic Execution of do_overflow: The do_overflow function will now consistently:
    • In the first run: Execute until a SIGSEGV occurs and longjmp is called.
    • In the second run: Execute allocations up to _kp_rec_count times. If no SIGSEGV occurs, it will complete this bounded loop and return normally. If a SIGSEGV does occur within these _kp_rec_count allocations, longjmp will be called.

This refinement ensures TestStackGuardPagesNative more accurately and reliably verifies the stack guard page lifecycle for native threads attached to the JVM.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Error

 ⚠️ OCA signatory status must be verified

Issue

  • JDK-8303612: runtime/StackGuardPages/TestStackGuardPagesNative.java fails with exit code 139 (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25689/head:pull/25689
$ git checkout pull/25689

Update a local copy of the PR:
$ git checkout pull/25689
$ git pull https://git.openjdk.org/jdk.git pull/25689/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25689

View PR using the GUI difftool:
$ git pr show -t 25689

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25689.diff

… with exit code 139

replace infinite loop with conditional in do_overflow function
@bridgekeeper bridgekeeper bot added the oca Needs verification of OCA signatory status label Jun 9, 2025
@bridgekeeper
Copy link

bridgekeeper bot commented Jun 9, 2025

Hi @mz1999, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user mz1999" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

@openjdk
Copy link

openjdk bot commented Jun 9, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Jun 9, 2025

@mz1999 The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime [email protected] oca Needs verification of OCA signatory status
Development

Successfully merging this pull request may close these issues.

1 participant