-
Notifications
You must be signed in to change notification settings - Fork 98
Fix ins or dups where splice region is preserved #719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix ins or dups where splice region is preserved #719
Conversation
0530e68 to
7f18cfd
Compare
|
We found some examples of duplications where the original logic here didn't shift the variant far enough to to get the expected result. It turned out that for these variants it's not possible to write the shifted version as a duplication since in a duplication the alt will always follow the ref. I've added some logic to rewrite these shifted variants as insertions before attempting to map them back to |
|
High-level explanation of this pull request:
|
|
I also wanted to mention that we discovered this because some fraction of FLT3 ITDs currently get missed when using the hgvs package (including one added to the unit tests). So it is a high impact issue. |
f2f89da to
08cea20
Compare
|
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
|
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
|
We found a case where trying to map the shifted variant causes an |
08cea20 to
13f7e36
Compare
|
Also rebased on main |
13f7e36 to
efdf6d0
Compare
|
Rebased on main |
efdf6d0 to
eaf2ef6
Compare
|
Made a few non-functional edits:
|
fd853be to
10613ad
Compare
|
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
7a6c23f to
236cdd0
Compare
|
Hi @korikuzma @ahwagner et all, I've removed the flags on sequence variants that I added previously as we found a way to handle this in the logic on our end. I've also rebased on main so clear up any conflicts. I think this is ready for a review when you guys have time. Thanks! |
|
This issue is stale because it has been open 180 days with no activity. |
|
Hi again @korikuzma @ahwagner @reece, can I get the stale label removed here? We would still like to get this merged if possible. Also if you have any feedback that would be great. Thanks! |
|
Hi @b0d0nne11 - I removed the stale label. I noticed that there are conflicts with main. We'll need to resolve those in order to review. I'm going to be without a computer for a week (😱) but I'll take a look when I return. |
236cdd0 to
6303c8f
Compare
|
Hey @reece, thanks! I rebased on master and took care of the merge conflicts. I'll also be out next week so no rush. Whenever you have time would be great. |
|
This issue is stale because it has been open 180 days with no activity. |
|
Hi @reece, looks like this got marked stale again. Can you take that off? |
|
This issue is stale because it has been open 180 days with no activity. |
reece
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@b0d0nne11- This is a great contribution! Thank you! I especially like the shift functions to explore alternative representations. And I appreciate your care in enabling users to opt into new behavior without impinging on existing functionality.
We're working toward a 2.0 release in December. I'd love to include this feature.
| prevalidation_level = EXTRINSIC | ||
| replace_reference = True | ||
| ins_at_boundary_is_intronic = True | ||
| shift_over_boundary = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@b0d0nne11: Would you please add a short comment above each of these describing what they do? Also note that False + DEFAULT correspond to hgvs 1.x code behavior and the team's interpretation of HGVS Nomenclature.
| return alt_data | ||
|
|
||
| def _get_variant_region(self): | ||
| def get_variant_region(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The complexity of these cases warrants explicit tests. Would you please implement those? I think that it might be enough to roll these cases into the cases block of test_714.py with an explicit call to get_variant_region().
| ID00049 NC_000010.10:g.89692921dupA NM_000314.4:c.405dupA NP_000305.3:p.(Cys136Metfs*44) | ||
| ID00050 NC_000010.10:g.89692923_89692939delGTGCATATTTATTACAT NM_000314.4:c.407_423delGTGCATATTTATTACAT NP_000305.3:p.(Cys136Serfs*38) | ||
| ID00051 NC_000010.10:g.89712015C>A NM_000314.4:c.633C>A NP_000305.3:p.(Cys211*) | ||
| ID00052 NC_000010.10:g.89685314dupT NM_000314.4:c.209dupT NP_000305.3:p.? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please comment why these tests were removed. It scares me to remove tests because it implies that the tests were wrong and therefore that the code was passing something that it should have. These particular tests were contributed by some extremely experienced clinical geneticists at Invitae (in ~2015).
To be clear, we should remove them if they're wrong. I just want to know that rationale explicitly. And, while I can see the two deleted tests might be questionable, the rationale that I would use would implicate most or all of these p.? tests, not just these two.
| var.gene = symbol | ||
| return var | ||
|
|
||
| def _var_c_shifts(self, var_c, alt_ac, alt_aln_method): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests for the _var_c_shifts and _var_g_shifts methods.
| duplications as insertions so that the variant is shifted farther | ||
| than would normally be possible using the HGVS notation.""" | ||
| var_g = copy.deepcopy(var_g) | ||
| normalizer = hgvs.normalizer.Normalizer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than creating this here on every call of _var_g_shift_with_rewrite, should we move this (perhaps even as an optional argument) into the VariantMapper.__init__()?
| normalizer = hgvs.normalizer.Normalizer( | ||
| self.hdp, alt_aln_method=alt_aln_method, validate=False, shuffle_direction=shuffle_direction | ||
| ) | ||
| var_g = normalizer.normalize(var_g) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not test this, but I suspect that calling normalize() is slow. What would happen if we would not call the normalizer here?
|
Overall I am very happy with this PR, and I would like to approve it, once Reece's comments have been addressed (mostly about testing). This is an improvement that shows great attention to detail and that solves an important edge case in a way that can be configured (and the default is compliant with HGVS nomenclature). Thanks for putting in all this effort! This is greatly appreciated. |
|
This issue is stale because it has been open 180 days with no activity. |
Fixes #714.
Fixes ins or dup variants spanning the intron/exon or exon/intron boundary where the splice site & region remain completely intact.