env_run: isolate adapter subprocesses from arc_env activation vars #883
Conversation
Pull request overview
This PR introduces a centralized helper for running adapter scripts inside their own conda-family environments to prevent arc_env activation variables (e.g., BABEL_LIBDIR, LD_LIBRARY_PATH, CONDA_PREFIX) from leaking into subprocesses and causing ABI/plugin-loading issues.
Changes:
- Add `arc/job/env_run.py` with `run_in_conda_env()`, which launches scripts via `conda`/`mamba`/`micromamba run`.
- Update the AutoTST, GCN, and TorchANI adapters to use `run_in_conda_env()` instead of `subprocess.run(..., shell=True)`.
- Add `from __future__ import annotations` to the AutoTST/TorchANI adapter scripts.
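For context on the failure mode, here is a minimal illustration (not ARC code; the path below is a hypothetical example) of how a child process launched with plain `subprocess` inherits the parent's activation variables:

```python
import os
import subprocess

# Hypothetical arc_env activation value exported into ARC's own process.
os.environ["BABEL_LIBDIR"] = "/opt/miniconda3/envs/arc_env/lib/openbabel"

# By default the child inherits the parent's environment, so a script meant
# for another env still sees arc_env's value and may load the wrong plugins.
subprocess.run(
    ["python", "-c", "import os; print(os.environ.get('BABEL_LIBDIR'))"],
)
```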
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `arc/job/env_run.py` | New helper for launcher detection and cross-env script execution via `*mamba`/`conda run`. |
| `arc/job/adapters/ts/gcn_ts.py` | Switch GCN subprocess invocation to `run_in_conda_env()`. |
| `arc/job/adapters/ts/autotst_ts.py` | Switch AutoTST subprocess invocation to `run_in_conda_env()`. |
| `arc/job/adapters/torch_ani.py` | Switch TorchANI subprocess invocation to `run_in_conda_env()`. |
| `arc/job/adapters/scripts/tani_script.py` | Add future import intended to support postponed annotations. |
| `arc/job/adapters/scripts/autotst_script.py` | Add future import intended to support postponed annotations. |
Comments suppressed due to low confidence (2)
arc/job/adapters/scripts/autotst_script.py:47
`from __future__ import annotations` does not make PEP 604 union syntax (`str | None`) parse on Python 3.9 (it still raises `SyntaxError` at import time). If `tst_env` is 3.9, this script needs to replace `X | Y` annotations with `typing.Optional`/`typing.Union` (or require Python >= 3.10 in that env).
# tst_env is Python 3.9; this script uses PEP 604 union syntax
# (``str | None``) for parity with the rest of ARC. The future-import
# defers annotation evaluation so 3.9 doesn't choke at def-time.
from __future__ import annotations
import argparse
import numpy as np
import os
import yaml

from autotst.reaction import Reaction


def parse_command_line_arguments(command_line_args=None):
    """
    Parse command-line arguments.

    This uses the :mod:`argparse` module, which ensures that the command-line arguments are
    sensible, parses them, and returns them.
    """
    parser = argparse.ArgumentParser(description='AutoTST')
    parser.add_argument('reaction_label', metavar='Reaction', type=str, nargs=1,
                        help='a string representation of a reaction, e.g., CCCC+[O]O_[CH2]CCC+OO')
    parser.add_argument('output_path', metavar='The output path', type=str, nargs=1,
                        help='a string representation of the output path to store the results')
    parser.add_argument('-t', '--mkl_num_threads', metavar='The number of threads', type=str, nargs=1,
                        help='a string representation of the number of threads to use', default='1')
    args = parser.parse_args(command_line_args)

    # Process args to set correct default values and format
    args.reaction_label = args.reaction_label[0]
    args.output_path = args.output_path[0]
    return args


def main(reaction_label: str | None = None,
         output_path: str | None = None,
         ) -> None:
arc/job/adapters/scripts/tani_script.py:88
Same issue here: this file contains PEP 604 union annotations like `int | None`, which will not import on Python < 3.10 even with `from __future__ import annotations`. Either update the target env requirement to Python >= 3.10 or change these annotations to `typing.Optional`/`typing.Union` so `tani_env` can run under older Pythons.
# tani_env's Python may predate PEP 604 (``X | Y`` unions); this
# future-import defers annotation evaluation so the script imports on
# any Python ≥3.7 regardless of the env's interpreter version.
from __future__ import annotations
import argparse
import os
import yaml
from ase import Atoms
from ase.constraints import FixInternals
from ase.optimize import BFGS
from ase.optimize.sciopt import Converged, OptimizerConvergenceError, SciPyFminBFGS, SciPyFminCG
import torch
import torchani
from torchani.ase import Calculator
elements = {'H': 1, 'He': 2, 'Li': 3, 'Be': 4, 'B': 5, 'C': 6, 'N': 7, 'O': 8,
'F': 9, 'Ne': 10, 'Na': 11, 'Mg': 12, 'Al': 13, 'Si': 14, 'P': 15, 'S': 16, 'Cl': 17,
'Ar': 18, 'K': 19, 'Ca': 20, 'Sc': 21, 'Ti': 22, 'V': 23, 'Cr': 24, 'Mn': 25, 'Fe': 26,
'Co': 27, 'Ni': 28, 'Cu': 29, 'Zn': 30, 'Ga': 31, 'Ge': 32, 'As': 33, 'Se': 34, 'Br': 35,
'Kr': 36, 'Rb': 37, 'Sr': 38, 'Y': 39, 'Zr': 40, 'Nb': 41, 'Mo': 42, 'Tc': 43, 'Ru': 44,
'Rh': 45, 'Pd': 46, 'Ag': 47, 'Cd': 48, 'In': 49, 'Sn': 50, 'Sb': 51, 'Te': 52, 'I': 53,
'Xe': 54, 'Cs': 55, 'Ba': 56, 'La': 57, 'Ce': 58, 'Pr': 59, 'Nd': 60, 'Pm': 61, 'Sm': 62,
'Eu': 63, 'Gd': 64, 'Tb': 65, 'Dy': 66, 'Ho': 67, 'Er': 68, 'Tm': 69, 'Yb': 70, 'Lu': 71,
'Hf': 72, 'Ta': 73, 'W': 74, 'Re': 75, 'Os': 76, 'Ir': 77, 'Pt': 78, 'Au': 79, 'Hg': 80,
'Tl': 81, 'Pb': 82, 'Bi': 83, 'Po': 84, 'At': 85, 'Rn': 86, 'Fr': 87, 'Ra': 88, 'Ac': 89,
'Th': 90, 'Pa': 91, 'U': 92, 'Np': 93, 'Pu': 94, 'Am': 95, 'Cm': 96, 'Bk': 97, 'Cf': 98,
'Es': 99, 'Fm': 100, 'Md': 101, 'No': 102, 'Lr': 103, 'Rf': 104, 'Db': 105, 'Sg': 106,
'Bh': 107, 'Hs': 108, 'Mt': 109, 'Ds': 110, 'Rg': 111, 'Cn': 112, 'Nh': 113, 'Fl': 114,
'Mc': 115, 'Lv': 116, 'Ts': 117, 'Og': 118}
def run_sp(xyz, device, model):
    """
    Run a single-point energy calculation.
    """
    coords, z_list = xyz_to_coords_and_element_numbers(xyz)
    coordinates = torch.tensor([coords], requires_grad=True, device=device)
    species = torch.tensor([z_list], device=device)
    energy = model((species, coordinates)).energies
    sp = hartree_to_si(energy.item(), kilo=True)
    return sp


def to_Z(element: str) -> int | None:
    """
    Return the element number of an element.

    Args:
        element (str): the element symbol. For example: "H", "Br"

    Returns:
        The element number, or None if the symbol is not recognized. For the example: 1, 35
    """
    return elements.get(element.capitalize(), None)


def run_force(xyz, device, model):
    """
    Compute the force matrix.
    """
    coords, z_list = xyz_to_coords_and_element_numbers(xyz)
    coordinates = torch.tensor([coords], requires_grad=True, device=device)
    species = torch.tensor([z_list], device=device)
    energy = model((species, coordinates)).energies
    derivative = torch.autograd.grad(energy.sum(), coordinates)[0]
    force = -derivative
    force = force.squeeze().numpy().tolist()
    return force


def run_opt(xyz,
            model,
            constraints: dict | None = None,
            fmax: float = 0.001,
            steps: int | None = None,
            engine: str = 'SciPyFminBFGS',
            ):
    """
    parts = Path(python_executable).resolve().parts
    if "envs" in parts:
        idx = parts.index("envs")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    raise ValueError(
        f"Cannot derive an env name from {python_executable!r}; "
        "expected a path of the form '.../envs/<env>/bin/python'."
    )
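For illustration, the derivation above behaves roughly as follows in this revision (the paths are hypothetical examples, and this assumes `env_name_from_python` is importable from `arc/job/env_run.py` as shown in the diff):

```python
from arc.job.env_run import env_name_from_python

# Standard conda layout: the path component after 'envs' is the env name.
env_name_from_python("/home/user/miniconda3/envs/tst_env/bin/python")  # -> "tst_env"

# No 'envs' segment in the path: falls through to the ValueError raised above.
env_name_from_python("/usr/bin/python")  # raises ValueError
```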
ARC runs inside ``arc_env``. Several adapters (AutoTST, GCN, TorchANI)
shell out to scripts that live in their *own* envs (``tst_env``,
``ts_gcn_env``, ``tani_env``). Running the target env's ``python``
                    f'--ts_xyz_path {ts_path}']
        command = '; '.join(commands)
        output = subprocess.run(command, shell=True, executable='/bin/bash')
# Routed via `conda run -n ts_gcn_env` so arc_env's activation vars don't
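For comparison, the replacement call takes roughly this shape (a sketch only; the variable names and script path here are hypothetical placeholders, not necessarily those used in `gcn_ts.py`):

```python
from arc.job.env_run import run_in_conda_env

# Hypothetical placeholders; the adapter gets these from its settings/job.
ts_gcn_python = "/home/user/miniconda3/envs/ts_gcn_env/bin/python"
gcn_script_path = "/path/to/arc/job/adapters/scripts/gcn_script.py"
ts_path = "TS0.xyz"

# Arguments are passed as a flat list: no '; '.join() of shell commands
# and no quoting concerns.
output = run_in_conda_env(ts_gcn_python, gcn_script_path,
                          '--ts_xyz_path', ts_path)
```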
def run_in_conda_env(
    python_executable: str,
    script_path: str,
    *script_args: str,
    check: bool = False,
) -> subprocess.CompletedProcess:
    """Run ``python script_path *script_args`` inside the env that owns
    ``python_executable``, isolated from ARC's process env.

    Returns the :class:`subprocess.CompletedProcess`. ``check=True``
    raises on non-zero exit. Args are passed as a list, so no shell
    quoting concerns.
    """
    env_name = env_name_from_python(python_executable)
    launcher, extra_flags = _detect_launcher()
    argv = [
        launcher, "run", *extra_flags,
        "-n", env_name,
        "python", script_path,
        *script_args,
    ]
    logger.debug("env-run: %s", " ".join(argv))
    return subprocess.run(argv, check=check)
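As a usage sketch (the paths and the `--yml_path` flag are hypothetical, and `logger` is the module logger used above), an adapter would invoke the helper with a flat argument list and inspect the returned `CompletedProcess`:

```python
result = run_in_conda_env(
    "/home/user/miniconda3/envs/tani_env/bin/python",
    "/path/to/arc/job/adapters/scripts/tani_script.py",
    "--yml_path", "input.yml",  # hypothetical flag; the real script defines its own CLI
)
if result.returncode:
    # Caller-side context (job/species info) can still be logged by the adapter.
    logger.warning("TorchANI script exited with code %s", result.returncode)
```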
Force-pushed from 6f8ec55 to a34c20a.
The CI failure here is env drift in our `tani_env`; the fix is here (can cherry-pick it).
        *script_args,
    ]
    logger.debug("env-run: %s", " ".join(argv))
    return subprocess.run(argv, check=check)
We should centralize the capture of stdout and stderr here. Let's put it all in this helper and log it, for better maintainability.
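One way to act on this suggestion (a sketch under the assumption that the helper keeps the signature shown above and reuses `env_name_from_python`, `_detect_launcher`, `logger`, and `subprocess` from that module; not necessarily the final code) is to capture both streams inside the helper and do the logging in one place:

```python
def run_in_conda_env(python_executable: str,
                     script_path: str,
                     *script_args: str,
                     check: bool = False,
                     ) -> subprocess.CompletedProcess:
    env_name = env_name_from_python(python_executable)
    launcher, extra_flags = _detect_launcher()
    argv = [launcher, "run", *extra_flags, "-n", env_name,
            "python", script_path, *script_args]
    # Centralized capture: callers get stdout/stderr on the returned object
    # and do not have to re-implement failure logging at every call site.
    completed = subprocess.run(argv, capture_output=True, text=True)
    if completed.returncode:
        logger.warning("env-run failed: %s\nstdout:\n%s\nstderr:\n%s",
                       " ".join(argv), completed.stdout, completed.stderr)
        if check:
            completed.check_returncode()
    else:
        logger.debug("env-run succeeded: %s", " ".join(argv))
    return completed
```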
Force-pushed from c90be88 to 86b3fc2.
…or env isolation
ARC runs inside `arc_env`. The AutoTST, GCN, and TorchANI adapters
shell out to scripts that live in their *own* envs (`tst_env`,
`ts_gcn`, `tani_env`). The previous flow was:
subprocess.run("source ~/.bashrc; <target_env_python> <script>",
shell=True)
which invokes the target env's interpreter directly without
deactivating arc_env first. ARC's exported activation vars
(`BABEL_LIBDIR`, `LD_LIBRARY_PATH`, `CONDA_PREFIX`, ...) stay bound to
arc_env's paths in the child, and shared libraries in the target env
then resolve plugins against the wrong tree — producing ABI-mismatch
crashes (most visibly OpenBabel plugin loading in tst_env).
Add `arc/job/env_run.py` exposing `run_in_conda_env(python, script,
*args)`. It:
- Derives the env *prefix* from the configured python path
(`<prefix>/bin/python`). Using a prefix path rather than `-n <name>`
avoids assuming the env lives under a literal `envs/` segment, so
`CONDA_ENVS_PATH` overrides and bare-prefix layouts (e.g.
`/scratch/conda_envs/<env>/bin/python`) work without special-casing.
- Detects an available launcher in preference order: the active one
per `CONDA_EXE` / `MAMBA_EXE`, then `conda` → `mamba` → `micromamba`
on PATH.
- Selects the right stdio flag per launcher: conda/mamba need
`--no-capture-output` (so they forward child stdio rather than
buffering internally); micromamba streams by default and rejects
the flag, so it is omitted. Decided by launcher basename so symlinks
and `MAMBA_EXE`-points-at-micromamba setups still get the right
behavior.
- Captures stdout and stderr (`capture_output=True, text=True`) and
logs centrally — `logger.warning` on non-zero return with the full
argv and both streams, `logger.debug` on success — so call sites
don't each re-implement capture and error reporting. The captured
streams remain on the returned `CompletedProcess` for callers that
want to inspect them.
- Passes args as a list — no shell, no quoting concerns.
Routing through `<launcher> run -p <prefix>` triggers the target env's
`activate.d` hooks, which is what rebinds the leaked activation vars
to the correct paths.
Wire the three callers (`torch_ani.py`, `ts/autotst_ts.py`,
`ts/gcn_ts.py`) over to the helper and drop the now-unused
`subprocess` imports. Their `if returncode: logger.warning(...)`
blocks stay — they add job/species context the helper doesn't have.
Add `from __future__ import annotations` to `autotst_script.py` and
`tani_script.py`. Both use PEP 604 union syntax (`X | Y`) for parity
with the rest of ARC, but run under tst_env (Python 3.9) and tani_env
(version varies); the future-import defers annotation evaluation so
they import cleanly on any Python ≥3.7.
Tests in `env_run_test.py` cover prefix derivation for both standard
and `CONDA_ENVS_PATH` layouts, launcher-flag selection by basename,
the `CONDA_EXE`/`MAMBA_EXE` → PATH-lookup detection chain, argv
construction, and the success/failure logging branches.
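A minimal sketch of the prefix derivation and launcher/flag selection described above (the function names and error type here are illustrative assumptions, not necessarily what `env_run.py` uses):

```python
import os
import shutil
from pathlib import Path


def _env_prefix_from_python(python_executable: str) -> str:
    # '<prefix>/bin/python' -> '<prefix>'; no 'envs/' segment is assumed,
    # so CONDA_ENVS_PATH overrides and bare-prefix layouts also work.
    return str(Path(python_executable).resolve().parent.parent)


def _pick_launcher() -> tuple:
    # Prefer the active launcher, then fall back to whatever is on PATH.
    candidates = [os.environ.get("CONDA_EXE"), os.environ.get("MAMBA_EXE"),
                  shutil.which("conda"), shutil.which("mamba"),
                  shutil.which("micromamba")]
    launcher = next((c for c in candidates if c), None)
    if launcher is None:
        raise FileNotFoundError("No conda/mamba/micromamba launcher found.")
    # conda/mamba buffer child stdio unless --no-capture-output is passed;
    # micromamba streams by default and rejects that flag. Decide by basename.
    flags = [] if Path(launcher).name == "micromamba" else ["--no-capture-output"]
    return launcher, flags


# argv would then be: [launcher, "run", *flags, "-p", prefix, "python", script, *args]
```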
Summary

ARC runs inside `arc_env`, but the AutoTST, GCN, and TorchANI adapters shell out to scripts that live in their own envs (`tst_env`, `ts_gcn_env`, `tani_env`). The previous code path ran the scripts via `subprocess.run("source ~/.bashrc; <target_env_python> <script>", shell=True)`, which invokes the target env's interpreter directly without deactivating `arc_env` first. ARC's exported activation vars (`BABEL_LIBDIR`, `LD_LIBRARY_PATH`, `CONDA_PREFIX`, ...) stay bound to `arc_env`'s paths in the child, so shared libraries in the target env then resolve plugins against the wrong tree, producing ABI-mismatch crashes (most visibly OpenBabel plugin loading in `tst_env`).

What this PR does

- Adds `arc/job/env_run.py` exposing `run_in_conda_env(python, script, *args)`, which routes the subprocess through `<launcher> run -n <env>` so the target env's `activate.d` hooks fire and rebind the leaked vars to the correct paths.
- Detects an available launcher in preference order: the active one per `CONDA_EXE`/`MAMBA_EXE`, then `conda` → `mamba` → `micromamba` on PATH. Selects the right stdio flag per launcher (`--no-capture-output` for conda/mamba, omitted for micromamba, which rejects it).
- Wires the three callers (`arc/job/adapters/torch_ani.py`, `arc/job/adapters/ts/autotst_ts.py`, `arc/job/adapters/ts/gcn_ts.py`) over to the helper and drops the now-unused `subprocess` imports.
- Adds `from __future__ import annotations` to `autotst_script.py` and `tani_script.py` so their PEP 604 union syntax (`X | Y`) imports cleanly under whatever Python version the target env ships (`tst_env` is 3.9; `tani_env` varies).

Test plan
- `conda` is the active launcher; verify also under `mamba`/`micromamba` if available
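For reference, a sketch of the style of unit test described for `env_run_test.py` (the test name, patched attributes, and CLI flag are assumptions, not the actual test file):

```python
import unittest
from unittest import mock

from arc.job import env_run


class EnvRunSketchTest(unittest.TestCase):

    def test_argv_routes_through_launcher_run(self):
        # Force a known launcher/flag pair and intercept the subprocess call.
        with mock.patch.object(env_run, "_detect_launcher",
                               return_value=("conda", ["--no-capture-output"])), \
             mock.patch.object(env_run.subprocess, "run") as mock_run:
            env_run.run_in_conda_env("/opt/conda/envs/tani_env/bin/python",
                                     "tani_script.py", "--yml_path", "in.yml")
        argv = mock_run.call_args[0][0]
        self.assertEqual(argv[:3], ["conda", "run", "--no-capture-output"])
        self.assertIn("tani_script.py", argv)


if __name__ == "__main__":
    unittest.main()
```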