[Question] Does DrugEx v3 support R-group enumeration with specified attachment points? #25

Open
guosijia opened this issue May 29, 2025 · 0 comments

Question

We are trying to perform R-group enumeration with DrugEx v3 and would like to know whether it supports specifying attachment points for fragment-based molecule generation.

Background

We understand that DrugEx v3 can generate molecules based on input fragments, but we're uncertain about its capability to handle specific attachment points for R-group enumeration tasks.

Current Approach and Issues

Approach 1: Using fragments with dummy atoms ([*:1])

Input scaffold:

O=C(CCC(F)(F)F)NC(c1cnn2cc(C(C3CCC(F)(F)CC3)[*:1])nc2c1)C1CC1

Command used:

python -m drugex.dataset \
    -b ${base_dir} \
    -i scaffold_with_star.tsv \
    -mc SMILES \
    -o rgroup_data \
    -mt graph \
    -s

Issue: The generated graph data file (rgroup_data_graph.txt) contains only the column header row and no data rows, so molecule generation cannot proceed.

File content example:

C0 C1 C2 C3 C4 ... C399
(no data rows)
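
For anyone reproducing this, the empty output can be confirmed with a short check like the one below. It is only a sketch using the Python standard library, not part of DrugEx. The helper name count_data_rows is hypothetical, and the path passed on the command line is assumed to point at wherever rgroup_data_graph.txt was written under the base directory.

```python
import sys


def count_data_rows(path):
    """Count non-empty lines after the first (header) line of a text file."""
    with open(path, encoding="utf-8") as handle:
        lines = [line for line in handle if line.strip()]
    # The first non-empty line is treated as the header (assumption).
    return max(len(lines) - 1, 0)


if __name__ == "__main__":
    # Hypothetical usage: python check_rows.py path/to/rgroup_data_graph.txt
    n_rows = count_data_rows(sys.argv[1])
    print(f"{n_rows} data row(s) found")
```

In our case this reports zero data rows for the file produced by Approach 1.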

Approach 2: Removing dummy atoms from fragments

Modified scaffold:

O=C(CCC(F)(F)F)NC(c1cnn2cc(C(C3CCC(F)(F)CC3))nc2c1)C1CC1

Issue: While this approach generates valid graph data and molecules successfully, most generated molecules do not grow from our intended attachment point. The model seems to modify the scaffold at random positions rather than the specific location where the [*:1] was originally placed.
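
For reference, the modified scaffold is the original SMILES with the [*:1] token deleted. A minimal sketch of that preprocessing step is shown below. It is plain string handling, not a DrugEx or RDKit call, and the function name strip_attachment_points is hypothetical. Deleting the token as text is only safe when the dummy atom is a terminal atom joined by an implicit single bond, as it is in our scaffold.

```python
import re

# Matches a mapped dummy atom such as [*:1] or [*:12].
DUMMY_ATOM = re.compile(r"\[\*:\d+\]")


def strip_attachment_points(smiles):
    """Remove mapped dummy atoms from a SMILES string by text substitution.

    Assumes each dummy atom is terminal and attached through an implicit
    single bond; explicit bond symbols or ring closures on the dummy atom
    are not handled.
    """
    return DUMMY_ATOM.sub("", smiles)


scaffold = "O=C(CCC(F)(F)F)NC(c1cnn2cc(C(C3CCC(F)(F)CC3)[*:1])nc2c1)C1CC1"
print(strip_attachment_points(scaffold))
```

This also shows why the positional information is lost in Approach 2: once the token is removed, nothing in the input marks where the R-group should grow.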

Questions

  1. Does DrugEx v3 natively support R-group enumeration with specified attachment points?

  2. Is there a correct way to handle dummy atoms ([*:1], [*:2], etc.) in DrugEx input fragments?

  3. If attachment point specification is not directly supported, what would be the recommended workflow for R-group enumeration tasks?

  4. Are there any plans to support explicit attachment point specification in future versions?

Expected Behavior

We would like to:

  • Input a scaffold with clearly marked attachment points (e.g., [*:1])
  • Generate molecules that grow specifically from these marked positions
  • Maintain the core scaffold structure while only modifying the R-groups at specified locations
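
To make the input convention concrete, the sketch below shows how we identify the marked attachment points in a scaffold SMILES before passing it on. It uses only a regular expression and is not a DrugEx API. The function name attachment_labels is hypothetical.

```python
import re

# Captures the map number of a dummy atom written as [*:n].
ATTACHMENT_POINT = re.compile(r"\[\*:(\d+)\]")


def attachment_labels(smiles):
    """Return the sorted, de-duplicated attachment point labels in a SMILES string."""
    return sorted({int(label) for label in ATTACHMENT_POINT.findall(smiles)})


scaffold = "O=C(CCC(F)(F)F)NC(c1cnn2cc(C(C3CCC(F)(F)CC3)[*:1])nc2c1)C1CC1"
print(attachment_labels(scaffold))
```

An empty list would mean the scaffold carries no marked positions, which is the situation Approach 2 ends up in.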

Environment

  • DrugEx version: v3.4.5
  • Python version: 3.8+
  • Operating System: Ubuntu

Any guidance or clarification would be greatly appreciated!
