
env_run: isolate adapter subprocesses from arc_env activation vars#883

Open
calvinp0 wants to merge 2 commits into main from env-run-isolation

Conversation

@calvinp0
Member

@calvinp0 calvinp0 commented Apr 30, 2026

Summary

ARC runs inside arc_env, but the AutoTST, GCN, and TorchANI adapters shell out to scripts that live in their own envs (tst_env, ts_gcn_env, tani_env). The previous code path was:

subprocess.run("source ~/.bashrc; <target_env_python> <script>", shell=True)

This invokes the target env's interpreter directly without deactivating arc_env first. ARC's exported activation vars (BABEL_LIBDIR, LD_LIBRARY_PATH, CONDA_PREFIX, ...) stay bound to arc_env's paths in the child, so shared libraries in the target env then resolve plugins against the wrong tree — producing ABI-mismatch crashes (most visibly OpenBabel plugin loading in tst_env).
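The leak described above is ordinary environment inheritance: a child started via subprocess sees the parent's exported variables unless something rebinds them. A minimal, self-contained demonstration (the BABEL_LIBDIR value is illustrative, not a real path):

```python
# A subprocess inherits the parent's exported environment variables;
# nothing rebinds them unless the target env's activation hooks run.
import os
import subprocess
import sys

os.environ["BABEL_LIBDIR"] = "/fake/arc_env/lib/openbabel"
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['BABEL_LIBDIR'])"],
    capture_output=True, text=True,
)
print(child.stdout.strip())  # the child sees the parent's (arc_env's) value
```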

What this PR does

  • Adds arc/job/env_run.py exposing run_in_conda_env(python, script, *args), which routes the subprocess through <launcher> run -n <env> so the target env's activate.d hooks fire and rebind the leaked vars to the correct paths.
  • Detects an available launcher in preference order: the active one per CONDA_EXE/MAMBA_EXE, then conda → mamba → micromamba on PATH. Selects the right stdio flag per launcher (--no-capture-output for conda/mamba, omitted for micromamba, which rejects it).
  • Wires the three callers (arc/job/adapters/torch_ani.py, arc/job/adapters/ts/autotst_ts.py, arc/job/adapters/ts/gcn_ts.py) over to the helper and drops the now-unused subprocess imports.
  • Adds from __future__ import annotations to autotst_script.py and tani_script.py so their PEP 604 union syntax (X | Y) imports cleanly under whatever Python version the target env ships (tst_env is 3.9; tani_env varies).
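The helper's shape, as the bullets above describe it, can be sketched as follows. This is a minimal sketch: env_name_from_python mirrors the snippet quoted later in the review thread, while build_argv is a hypothetical name standing in for the argv-assembly step (launcher detection is taken as already done):

```python
# Sketch of the run_in_conda_env plumbing described in the PR text.
# Not ARC's actual module; build_argv is a hypothetical helper name.
from pathlib import Path
from typing import List


def env_name_from_python(python_executable: str) -> str:
    """Derive the env name from a '<...>/envs/<env>/bin/python' path."""
    parts = Path(python_executable).resolve().parts
    if "envs" in parts:
        idx = parts.index("envs")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    raise ValueError(f"Cannot derive an env name from {python_executable!r}")


def build_argv(launcher: str, extra_flags: List[str], env_name: str,
               script_path: str, *script_args: str) -> List[str]:
    """Assemble '<launcher> run [flags] -n <env> python <script> <args...>'."""
    return [launcher, "run", *extra_flags, "-n", env_name,
            "python", script_path, *script_args]
```

Because the argv is a list and no shell is involved, script arguments need no quoting, and `<launcher> run -n <env>` fires the target env's activate.d hooks before the script starts.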

Test plan

  • AutoTST TS-search run completes without OpenBabel plugin errors
  • GCN TS-search run completes and produces guess xyz
  • TorchANI single-point job completes and writes output yml
  • Verified on a host where conda is the active launcher; verify also under mamba / micromamba if available

Copilot AI review requested due to automatic review settings April 30, 2026 08:07

Copilot AI left a comment


Pull request overview

This PR introduces a centralized helper for running adapter scripts inside their own conda-family environments to prevent arc_env activation variables (e.g., BABEL_LIBDIR, LD_LIBRARY_PATH, CONDA_PREFIX) from leaking into subprocesses and causing ABI/plugin-loading issues.

Changes:

  • Add arc/job/env_run.py with run_in_conda_env() that launches scripts via conda/mamba/micromamba run.
  • Update AutoTST, GCN, and TorchANI adapters to use run_in_conda_env() instead of subprocess.run(..., shell=True).
  • Add from __future__ import annotations to the AutoTST/TorchANI adapter scripts.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

  • arc/job/env_run.py: New helper for launcher detection and cross-env script execution via *mamba/conda run.
  • arc/job/adapters/ts/gcn_ts.py: Switch GCN subprocess invocation to run_in_conda_env().
  • arc/job/adapters/ts/autotst_ts.py: Switch AutoTST subprocess invocation to run_in_conda_env().
  • arc/job/adapters/torch_ani.py: Switch TorchANI subprocess invocation to run_in_conda_env().
  • arc/job/adapters/scripts/tani_script.py: Add future import intended to support postponed annotations.
  • arc/job/adapters/scripts/autotst_script.py: Add future import intended to support postponed annotations.
Comments suppressed due to low confidence (2)

arc/job/adapters/scripts/autotst_script.py:47

  • from __future__ import annotations only defers annotation evaluation; str | None in an annotation position parses fine on Python 3.9 (it is ordinary expression syntax and is never evaluated under the future-import). However, any X | Y union used outside annotations (e.g. in an isinstance() check or a runtime type alias) would still raise TypeError on Python < 3.10. If tst_env is 3.9, confirm the script's unions appear only in annotations, or fall back to typing.Optional/typing.Union there.
# tst_env is Python 3.9; this script uses PEP 604 union syntax
# (``str | None``) for parity with the rest of ARC. The future-import
# defers annotation evaluation so 3.9 doesn't choke at def-time.
from __future__ import annotations

import argparse
import numpy as np
import os
import yaml

from autotst.reaction import Reaction


def parse_command_line_arguments(command_line_args=None):
    """
    Parse command-line arguments.
    This uses the :mod:`argparse` module, which ensures that the command-line arguments are
    sensible, parses them, and returns them.
    """
    parser = argparse.ArgumentParser(description='AutoTST')
    parser.add_argument('reaction_label', metavar='Reaction', type=str, nargs=1,
                        help='a string representation of a reaction, e.g., CCCC+[O]O_[CH2]CCC+OO')
    parser.add_argument('output_path', metavar='The output path', type=str, nargs=1,
                        help='a string representation of the output path to store the results')
    parser.add_argument('-t', '--mkl_num_threads', metavar='The number of threads path', type=str, nargs=1,
                        help='a string representation of the number of threads to use', default='1')

    args = parser.parse_args(command_line_args)

    # Process args to set correct default values and format
    args.reaction_label = args.reaction_label[0]
    args.output_path = args.output_path[0]

    return args


def main(reaction_label: str | None = None,
         output_path: str | None = None,
         ) -> None:

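The script comment above is consistent with how PEP 563 works: under the future-import, each annotation is stored as its source string and never evaluated at def-time. A minimal check, runnable on any Python ≥ 3.7:

```python
# Under ``from __future__ import annotations`` (PEP 563), annotations are
# stored as strings, so ``str | None`` is never evaluated at def-time.
from __future__ import annotations


def main(reaction_label: str | None = None) -> None:
    pass


print(main.__annotations__["reaction_label"])  # 'str | None' (a plain string)
```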
arc/job/adapters/scripts/tani_script.py:88

  • Same caveat here: int | None in annotation positions is safe under the future-import, but any X | Y union evaluated at runtime (outside annotations) still fails on Python < 3.10. Either confirm tani_env's unions are annotation-only, or change them to typing.Optional/typing.Union so the script runs under older Pythons.
# tani_env's Python may predate PEP 604 (``X | Y`` unions); this
# future-import defers annotation evaluation so the script imports on
# any Python ≥3.7 regardless of the env's interpreter version.
from __future__ import annotations

import argparse
import os
import yaml

from ase import Atoms
from ase.constraints import FixInternals
from ase.optimize import BFGS
from ase.optimize.sciopt import Converged, OptimizerConvergenceError, SciPyFminBFGS, SciPyFminCG
import torch
import torchani
from torchani.ase import Calculator

elements = {'H': 1, 'He': 2, 'Li': 3, 'Be': 4, 'B': 5, 'C': 6, 'N': 7, 'O': 8,
 'F': 9, 'Ne': 10, 'Na': 11, 'Mg': 12, 'Al': 13, 'Si': 14, 'P': 15, 'S': 16, 'Cl': 17,
 'Ar': 18, 'K': 19, 'Ca': 20, 'Sc': 21, 'Ti': 22,'V': 23,'Cr': 24, 'Mn': 25, 'Fe': 26,
 'Co': 27, 'Ni': 28, 'Cu': 29, 'Zn': 30, 'Ga': 31, 'Ge': 32, 'As': 33, 'Se': 34, 'Br': 35,
 'Kr': 36, 'Rb': 37, 'Sr': 38, 'Y': 39, 'Zr': 40, 'Nb': 41, 'Mo': 42, 'Tc': 43, 'Ru': 44,
 'Rh': 45, 'Pd': 46, 'Ag': 47, 'Cd': 48, 'In': 49, 'Sn': 50, 'Sb': 51, 'Te': 52, 'I': 53,
 'Xe': 54, 'Cs': 55, 'Ba': 56, 'La': 57, 'Ce': 58, 'Pr': 59, 'Nd': 60, 'Pm': 61, 'Sm': 62,
 'Eu': 63, 'Gd': 64, 'Tb': 65, 'Dy': 66, 'Ho': 67, 'Er': 68, 'Tm': 69, 'Yb': 70, 'Lu': 71,
 'Hf': 72, 'Ta': 73, 'W': 74, 'Re': 75, 'Os': 76, 'Ir': 77, 'Pt': 78, 'Au': 79, 'Hg': 80, 
 'Tl': 81, 'Pb': 82, 'Bi': 83, 'Po': 84,'At': 85, 'Rn': 86, 'Fr': 87, 'Ra': 88, 'Ac': 89,
 'Th': 90, 'Pa': 91, 'U': 92, 'Np': 93, 'Pu': 94, 'Am': 95, 'Cm': 96, 'Bk': 97, 'Cf': 98,
 'Es': 99,'Fm': 100, 'Md': 101, 'No': 102,'Lr': 103, 'Rf': 104, 'Db': 105, 'Sg': 106,
 'Bh': 107, 'Hs': 108, 'Mt': 109, 'Ds': 110, 'Rg': 111, 'Cn': 112, 'Nh': 113, 'Fl': 114,
 'Mc': 115, 'Lv': 116, 'Ts': 117,'Og': 118}


def run_sp(xyz, device, model):
    """
    Run a single-point energy calculation.
    """
    coords, z_list = xyz_to_coords_and_element_numbers(xyz)
    coordinates = torch.tensor([coords], requires_grad=True, device=device)
    species = torch.tensor([z_list], device=device)
    energy = model((species, coordinates)).energies
    sp = hartree_to_si(energy.item(), kilo=True)
    return sp
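The hartree_to_si helper used by run_sp() is not shown in this excerpt; a plausible sketch, assuming it converts Hartree to J/mol (kJ/mol when kilo=True) using CODATA 2018 constants:

```python
# Hedged sketch of the ``hartree_to_si`` helper assumed by run_sp();
# ARC's actual implementation may differ. Constants per CODATA 2018.
E_H = 4.3597447222071e-18  # Hartree energy in J
N_A = 6.02214076e23        # Avogadro constant, 1/mol


def hartree_to_si(e: float, kilo: bool = False) -> float:
    """Convert an energy in Hartree to J/mol (kJ/mol if ``kilo``)."""
    factor = 0.001 if kilo else 1.0
    return e * E_H * N_A * factor
```

With these constants, 1 Hartree comes out to roughly 2625.5 kJ/mol.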


def to_Z(element: str) -> int | None:
    """
    Return the element number of an element.
    
    Args:
        element (str): the element symbol. For example: "H", "Br"
    
    Returns:
        The element number. For the example: 1, 35
    """
    return elements.get(element.capitalize(), None)


def run_force(xyz, device, model):
    """
    Compute the force matrix.
    """
    coords, z_list = xyz_to_coords_and_element_numbers(xyz)
    coordinates = torch.tensor([coords], requires_grad=True, device=device)
    species = torch.tensor([z_list], device=device)
    energy = model((species, coordinates)).energies
    derivative = torch.autograd.grad(energy.sum(), coordinates)[0]
    force = -derivative
    force = force.squeeze().numpy().tolist()
    return force


def run_opt(xyz,
            model,
            constraints: dict | None = None,
            fmax: float = 0.001,
            steps: int | None = None,
            engine: str = 'SciPyFminBFGS',
            ):
    """


Comment thread arc/job/env_run.py Outdated
Comment on lines +50 to +58
    parts = Path(python_executable).resolve().parts
    if "envs" in parts:
        idx = parts.index("envs")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    raise ValueError(
        f"Cannot derive an env name from {python_executable!r}; "
        "expected a path of the form '.../envs/<env>/bin/python'."
    )
Comment thread arc/job/env_run.py Outdated

ARC runs inside ``arc_env``. Several adapters (AutoTST, GCN, TorchANI)
shell out to scripts that live in their *own* envs (``tst_env``,
``ts_gcn_env``, ``tani_env``). Running the target env's ``python``
Comment thread arc/job/adapters/ts/gcn_ts.py Outdated
f'--ts_xyz_path {ts_path}']
command = '; '.join(commands)
output = subprocess.run(command, shell=True, executable='/bin/bash')
# Routed via `conda run -n ts_gcn_env` so arc_env's activation vars don't
Comment thread arc/job/env_run.py Outdated
Comment on lines +96 to +118
def run_in_conda_env(
    python_executable: str,
    script_path: str,
    *script_args: str,
    check: bool = False,
) -> subprocess.CompletedProcess:
    """Run ``python script_path *script_args`` inside the env that owns
    ``python_executable``, isolated from ARC's process env.

    Returns the :class:`subprocess.CompletedProcess`. ``check=True``
    raises on non-zero exit. Args are passed as a list, so no shell
    quoting concerns.
    """
    env_name = env_name_from_python(python_executable)
    launcher, extra_flags = _detect_launcher()
    argv = [
        launcher, "run", *extra_flags,
        "-n", env_name,
        "python", script_path,
        *script_args,
    ]
    logger.debug("env-run: %s", " ".join(argv))
    return subprocess.run(argv, check=check)
@calvinp0 calvinp0 force-pushed the env-run-isolation branch 2 times, most recently from 6f8ec55 to a34c20a Compare April 30, 2026 08:30
@alongd
Member

alongd commented Apr 30, 2026

The CI failure here is env drift in our tani_env; the fix is here (you can cherry-pick it).

Member

@alongd alongd left a comment


LGTM, thanks. One comment:

Comment thread arc/job/env_run.py Outdated
        *script_args,
    ]
    logger.debug("env-run: %s", " ".join(argv))
    return subprocess.run(argv, check=check)
Member


We should centralize the capture of stdout and stderr here. Let's put it all in this helper and log centrally, for better maintainability.

@calvinp0 calvinp0 force-pushed the env-run-isolation branch from c90be88 to 86b3fc2 Compare April 30, 2026 10:40
calvinp0 and others added 2 commits April 30, 2026 13:41
…or env isolation

ARC runs inside `arc_env`. The AutoTST, GCN, and TorchANI adapters
shell out to scripts that live in their *own* envs (`tst_env`,
`ts_gcn_env`, `tani_env`). The previous flow was:

    subprocess.run("source ~/.bashrc; <target_env_python> <script>",
                   shell=True)

which invokes the target env's interpreter directly without
deactivating arc_env first. ARC's exported activation vars
(`BABEL_LIBDIR`, `LD_LIBRARY_PATH`, `CONDA_PREFIX`, ...) stay bound to
arc_env's paths in the child, and shared libraries in the target env
then resolve plugins against the wrong tree — producing ABI-mismatch
crashes (most visibly OpenBabel plugin loading in tst_env).

Add `arc/job/env_run.py` exposing `run_in_conda_env(python, script,
*args)`. It:

- Derives the env *prefix* from the configured python path
  (`<prefix>/bin/python`). Using a prefix path rather than `-n <name>`
  avoids assuming the env lives under a literal `envs/` segment, so
  `CONDA_ENVS_PATH` overrides and bare-prefix layouts (e.g.
  `/scratch/conda_envs/<env>/bin/python`) work without special-casing.
- Detects an available launcher in preference order: the active one
  per `CONDA_EXE` / `MAMBA_EXE`, then `conda` → `mamba` → `micromamba`
  on PATH.
- Selects the right stdio flag per launcher: conda/mamba need
  `--no-capture-output` (so they forward child stdio rather than
  buffering internally); micromamba streams by default and rejects
  the flag, so it is omitted. Decided by launcher basename so symlinks
  and `MAMBA_EXE`-points-at-micromamba setups still get the right
  behavior.
- Captures stdout and stderr (`capture_output=True, text=True`) and
  logs centrally — `logger.warning` on non-zero return with the full
  argv and both streams, `logger.debug` on success — so call sites
  don't each re-implement capture and error reporting. The captured
  streams remain on the returned `CompletedProcess` for callers that
  want to inspect them.
- Passes args as a list — no shell, no quoting concerns.

Routing through `<launcher> run -p <prefix>` triggers the target env's
`activate.d` hooks, which is what rebinds the leaked activation vars
to the correct paths.
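The prefix-based derivation the commit message describes can be sketched as below: the env prefix is simply two levels above `<prefix>/bin/python`, so no literal `envs/` segment is assumed. The function name here is an assumption, not necessarily what env_run.py calls it:

```python
# Sketch of prefix derivation from a configured python path; works for
# both '<...>/envs/<env>/bin/python' and bare-prefix layouts such as
# '/scratch/conda_envs/<env>/bin/python'.
from pathlib import Path


def env_prefix_from_python(python_executable: str) -> str:
    p = Path(python_executable)
    if p.name.startswith("python") and p.parent.name == "bin":
        return str(p.parent.parent)
    raise ValueError(
        f"Expected a path of the form '<prefix>/bin/python', "
        f"got {python_executable!r}"
    )
```

The resulting prefix is then passed as `<launcher> run -p <prefix>` instead of `-n <name>`.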

Wire the three callers (`torch_ani.py`, `ts/autotst_ts.py`,
`ts/gcn_ts.py`) over to the helper and drop the now-unused
`subprocess` imports. Their `if returncode: logger.warning(...)`
blocks stay — they add job/species context the helper doesn't have.

Add `from __future__ import annotations` to `autotst_script.py` and
`tani_script.py`. Both use PEP 604 union syntax (`X | Y`) for parity
with the rest of ARC, but run under tst_env (Python 3.9) and tani_env
(version varies); the future-import defers annotation evaluation so
they import cleanly on any Python ≥3.7.

Tests in `env_run_test.py` cover prefix derivation for both standard
and `CONDA_ENVS_PATH` layouts, launcher-flag selection by basename,
the `CONDA_EXE`/`MAMBA_EXE` → PATH-lookup detection chain, argv
construction, and the success/failure logging branches.
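The detection chain and per-launcher flag selection described above can be sketched as follows. The injectable `env` and `which` parameters exist only to make the sketch self-contained and testable; they are not ARC's API:

```python
# Sketch of launcher detection: active CONDA_EXE/MAMBA_EXE first, then
# conda -> mamba -> micromamba on PATH. Flag choice keys off the
# launcher basename so symlinks and MAMBA_EXE-points-at-micromamba
# setups still get the right behavior.
from typing import Callable, List, Optional, Tuple


def detect_launcher(env: dict,
                    which: Callable[[str], Optional[str]],
                    ) -> Tuple[str, List[str]]:
    candidates = [
        env.get("CONDA_EXE"), env.get("MAMBA_EXE"),
        which("conda"), which("mamba"), which("micromamba"),
    ]
    for exe in candidates:
        if exe:
            name = exe.rsplit("/", 1)[-1]
            # conda/mamba need --no-capture-output to forward child
            # stdio; micromamba streams by default and rejects the flag.
            flags = [] if "micromamba" in name else ["--no-capture-output"]
            return exe, flags
    raise RuntimeError("No conda-family launcher found")
```

In ARC itself `env` would be `os.environ` and `which` would be `shutil.which`.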