Skip to content

Conversation

@b0d0nne11
Copy link
Contributor

Fixes #714.

Fixes ins or dup variants spanning the intron/exon or exon/intron boundary where the splice site & region remain completely intact.

@b0d0nne11 b0d0nne11 requested a review from a team as a code owner January 23, 2024 19:25
@b0d0nne11 b0d0nne11 force-pushed the 714-splice-region-preserved branch from 0530e68 to 7f18cfd Compare January 23, 2024 19:27
@b0d0nne11
Copy link
Contributor Author

We found some examples of duplications where the original logic here didn't shift the variant far enough to to get the expected result. It turned out that for these variants it's not possible to write the shifted version as a duplication since in a duplication the alt will always follow the ref. I've added some logic to rewrite these shifted variants as insertions before attempting to map them back to var_ps and added tests to include these cases.

@gostachowiak
Copy link

gostachowiak commented Feb 7, 2024

High-level explanation of this pull request:

  • HGVS nomenclature has the 3' shifting rule. so all cdots and pdots are shifted to the right
  • However, 3' shifting is arbitrary and a necessary evil for nomenclature purposes. But biology doesn't care about the 3' shifting rule
  • Consider an example: positive strand gene, first 8 bases of the intron is duplicated
  • cdot would be +1_+8dup
  • currently no pdot would be calculated because both positions have an offset
  • but now consider the biology-- after the duplication, there are 2 splice sites on the left side of the intron. Which is more likely to be used for splicing?
  • We obviously can't know for sure, but it seems to me that the most logical assumption is that it will use the "inner" splice site for splicing, leaving the extra inserted material within the coding sequence, resulting in a frameshift pdot. Why is this the best assumption?
    • When that splice site is used, the entire intronic sequence is totally intact
    • Seems "safer" to calculate a pdot for this to rescue the variant-- otherwise downstream applications will most likely be filtering out this variant because it's an insertion after the 8th position in the intron
  • To handle this type of situation in the most general way possible, this is the approach:
    • calculate pdot the normal way.
    • if empty, shift the cdot in the REVERSE direction, and calculate the pdot again. if you get a result, use it
    • basically, if you can get a pdot with either forward or reverse shifting, that means the entire intron is intact and we should bring in the pdot
  • note that the 3' shifting rule is still respected both for the cdot and pdot nomenclature
    • reverse shifting is only used as a tool when calculating pdot from cdot, which does not violate HGVS nomenclature

@gostachowiak
Copy link

I also wanted to mention that we discovered this because some fraction of FLT3 ITDs currently get missed when using the hgvs package (including one added to the unit tests). So it is a high impact issue.

@b0d0nne11 b0d0nne11 force-pushed the 714-splice-region-preserved branch 2 times, most recently from f2f89da to 08cea20 Compare February 8, 2024 15:34
@github-actions
Copy link

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Mar 11, 2024
@ahwagner ahwagner removed the stale Issue is stale and subject to automatic closing label Mar 11, 2024
@github-actions
Copy link

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Apr 11, 2024
@b0d0nne11
Copy link
Contributor Author

@reece or @ahwagner can you remove the stale label here? We would still like to get this merged is possible. Thanks!

@jsstevenson jsstevenson added keep alive exempt issue from staleness checks and removed stale Issue is stale and subject to automatic closing labels Apr 12, 2024
@b0d0nne11
Copy link
Contributor Author

We found a case where trying to map the shifted variant causes an HGVSInvalidVariantError. I've added logic to handle this and a test case. The variant is NM_182758.2:c.2953-31_2953-26dup. As part of the shifting procedure, mapping this to the g type yielded an unexpected transformation to NC_000015.9:g.53815545_53815550delinsC that caused problems with later steps. I'm simply handling the error here since we don't want to consider invalid variants.

@b0d0nne11 b0d0nne11 force-pushed the 714-splice-region-preserved branch from 08cea20 to 13f7e36 Compare May 24, 2024 16:24
@b0d0nne11
Copy link
Contributor Author

Also rebased on main

@b0d0nne11 b0d0nne11 force-pushed the 714-splice-region-preserved branch from 13f7e36 to efdf6d0 Compare June 4, 2024 18:17
@b0d0nne11
Copy link
Contributor Author

Rebased on main

@b0d0nne11
Copy link
Contributor Author

Made a few non-functional edits:

  • Rebased on main
  • Moved tests to an isolated test file
  • Squashed some commits

@b0d0nne11 b0d0nne11 force-pushed the 714-splice-region-preserved branch 2 times, most recently from fd853be to 10613ad Compare September 18, 2024 20:34
@korikuzma korikuzma removed the keep alive exempt issue from staleness checks label Dec 1, 2024
@github-actions
Copy link

github-actions bot commented Jan 1, 2025

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@b0d0nne11
Copy link
Contributor Author

b0d0nne11 commented Sep 4, 2025

Hi @korikuzma @ahwagner et all, I've removed the flags on sequence variants that I added previously as we found a way to handle this in the logic on our end. I've also rebased on main so clear up any conflicts. I think this is ready for a review when you guys have time. Thanks!

@github-actions
Copy link

This issue is stale because it has been open 180 days with no activity.
Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Sep 12, 2025
@b0d0nne11
Copy link
Contributor Author

Hi again @korikuzma @ahwagner @reece, can I get the stale label removed here? We would still like to get this merged if possible. Also if you have any feedback that would be great. Thanks!

@reece reece removed keep alive exempt issue from staleness checks stale Issue is stale and subject to automatic closing labels Oct 1, 2025
@reece
Copy link
Member

reece commented Oct 1, 2025

Hi @b0d0nne11 - I removed the stale label.

I noticed that there are conflicts with main. We'll need to resolve those in order to review.

I'm going to be without a computer for a week (😱) but I'll take a look when I return.

@b0d0nne11
Copy link
Contributor Author

Hey @reece, thanks! I rebased on master and took care of the merge conflicts. I'll also be out next week so no rush. Whenever you have time would be great.

@github-actions
Copy link

This issue is stale because it has been open 180 days with no activity.
Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Oct 11, 2025
@b0d0nne11
Copy link
Contributor Author

b0d0nne11 commented Oct 13, 2025

Hi @reece, looks like this got marked stale again. Can you take that off?

@reece reece removed the stale Issue is stale and subject to automatic closing label Oct 14, 2025
@github-actions
Copy link

This issue is stale because it has been open 180 days with no activity.
Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Oct 21, 2025
Copy link
Member

@reece reece left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@b0d0nne11- This is a great contribution! Thank you! I especially like the shift functions to explore alternative representations. And I appreciate your care in enabling users to opt into new behavior without impinging on existing functionality.

We're working toward a 2.0 release in December. I'd love to include this feature.

prevalidation_level = EXTRINSIC
replace_reference = True
ins_at_boundary_is_intronic = True
shift_over_boundary = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@b0d0nne11: Would you please add a short comment above each of these describing what they do? Also note that False + DEFAULT correspond to hgvs 1.x code behavior and the team's interpretation of HGVS Nomenclature.

return alt_data

def _get_variant_region(self):
def get_variant_region(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complexity of these cases warrants explicit tests. Would you please implement those? I think that it might be enough to roll these cases into the cases block of test_714.py with an explicit call to get_variant_region().

ID00049 NC_000010.10:g.89692921dupA NM_000314.4:c.405dupA NP_000305.3:p.(Cys136Metfs*44)
ID00050 NC_000010.10:g.89692923_89692939delGTGCATATTTATTACAT NM_000314.4:c.407_423delGTGCATATTTATTACAT NP_000305.3:p.(Cys136Serfs*38)
ID00051 NC_000010.10:g.89712015C>A NM_000314.4:c.633C>A NP_000305.3:p.(Cys211*)
ID00052 NC_000010.10:g.89685314dupT NM_000314.4:c.209dupT NP_000305.3:p.?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment why these tests were removed. It scares me to remove tests because it implies that the tests were wrong and therefore that the code was passing something that it should have. These particular tests were contributed by some extremely experienced clinical geneticists at Invitae (in ~2015).

To be clear, we should remove them if they're wrong. I just want to know that rationale explicitly. And, while I can see the two deleted tests might be questionable, the rationale that I would use would implicate most or all of these p.? tests, not just these two.

var.gene = symbol
return var

def _var_c_shifts(self, var_c, alt_ac, alt_aln_method):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests for the _var_c_shifts and _var_g_shifts methods.

@reece reece requested a review from andreasprlic October 27, 2025 18:31
@github-actions github-actions bot removed the stale Issue is stale and subject to automatic closing label Oct 28, 2025
duplications as insertions so that the variant is shifted farther
than would normally be possible using the HGVS notation."""
var_g = copy.deepcopy(var_g)
normalizer = hgvs.normalizer.Normalizer(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than creating this here on every call of _var_g_shift_with_rewrite, should we move this (perhaps even as an optional argument) into the VariantMapper.__init__()?

normalizer = hgvs.normalizer.Normalizer(
self.hdp, alt_aln_method=alt_aln_method, validate=False, shuffle_direction=shuffle_direction
)
var_g = normalizer.normalize(var_g)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not test this, but I suspect that calling normalize() is slow. What would happen if we would not call the normalizer here?

@andreasprlic
Copy link
Member

Overall I am very happy with this PR, and I would like to approve it, once Reece's comments have been addressed (mostly about testing). This is an improvement that shows great attention to detail and that solves an important edge case in a way that can be configured (and the default is compliant with HGVS nomenclature). Thanks for putting in all this effort! This is greatly appreciated.

@github-actions
Copy link

github-actions bot commented Nov 6, 2025

This issue is stale because it has been open 180 days with no activity.
Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Nov 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale Issue is stale and subject to automatic closing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

c_to_p at intron/exon boundary where splice region is preserved

7 participants