Added findInText for ArXivIdentifier #14760
jablib/src/main/java/org/jabref/model/entry/identifier/ArXivIdentifier.java
Can you reuse the existing regex pattern? I think it can be easily reduced, and
The parse() method requires strict validation, but that is not the case for findInText(), since it is only responsible for locating an arXiv identifier. Is there a possibility of modifying the existing pattern? If so, it would be good to adjust it accordingly.
Yes, the
You’re right, the strictness mainly comes from how the pattern is used (matches() in parse() vs find() in findInText()), not from the pattern itself. The existing pattern in parse() is primarily designed for full-string validation, which is why I initially used a simpler pattern for searching inside text. I agree that the same pattern could be reused with find() in findInText(), and I’m happy to refactor it that way if you think it would be cleaner.
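To illustrate the matches() versus find() distinction discussed here, the following is a minimal sketch using java.util.regex. The pattern is a simplified, hypothetical one for new-style arXiv identifiers and is not the regex actually used in JabRef; the class and method names are likewise made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    // Hypothetical, simplified pattern (NOT JabRef's actual regex):
    // four digits, a dot, four or five digits, and an optional version suffix.
    static final Pattern ARXIV_ID = Pattern.compile("\\d{4}\\.\\d{4,5}(v\\d+)?");

    // Strict use: the whole input must be an identifier.
    static boolean isExactId(String input) {
        return ARXIV_ID.matcher(input).matches();
    }

    // Lenient use: return the first identifier found anywhere in the input, or null.
    static String firstIdIn(String input) {
        Matcher matcher = ARXIV_ID.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String text = "see https://arxiv.org/html/2503.08641v1#bib.bib5";
        System.out.println(isExactId("2503.08641v1")); // true
        System.out.println(isExactId(text));           // false
        System.out.println(firstIdIn(text));           // 2503.08641v1
    }
}
```

The same compiled Pattern serves both purposes; only the Matcher method decides whether surrounding text is tolerated.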
@InAnYan
Hmm, I would look into the code that automatically determines the identifier type. Which method does it use?
Yeah, it would also be useful if you changed this method
@InAnYan
Since your PR is not merged, your changes should be in
Your pull request conflicts with the target branch. Please merge with your code. For a step-by-step guide to resolve merge conflicts, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line.
8501ef9 to 596e358
User description
Closes #14659
This change improves arXiv identifier detection when pasting arXiv URLs that include URL fragments, such as links copied from arxiv.org HTML pages. JabRef now correctly recognizes these identifiers and fetches the corresponding entries. Unit tests were added to cover the fixed behavior.
Steps to test
Open JabRef.
Create a new empty library or open any existing library.
Use BibTeX → New entry from plain text (or paste into the search / fetch dialog).
Paste an arXiv URL copied from an arXiv HTML page, for example: https://arxiv.org/html/2503.08641v1#bib.bib5
Confirm that JabRef correctly detects the identifier as arXiv and fetches the corresponding entry.
Mandatory checks
CHANGELOG.md in a way that is understandable for the average user (if change is visible to the user)
PR Type
Bug fix, Tests
Description
Added findInText() method to ArXivIdentifier for robust identifier extraction
Handles arXiv URLs with fragments by stripping them before parsing
Uses regex pattern to extract identifiers from various URL formats
Updated CompositeIdFetcher and Identifier to use new method
Added comprehensive unit tests covering edge cases
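The approach summarized in this description (strip the fragment, try a strict parse, then fall back to a regex search) could look roughly like the sketch below. It is an illustration under assumptions, not the code from this PR: the class name, the simplified pattern, and the use of Optional<String> in place of ArXivIdentifier are all hypothetical. It also uses split("#", 2) rather than split("#"), because plain split drops trailing empty strings and returns an empty array for an input of just "#".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArXivFindInTextSketch {
    // Assumed, simplified pattern (not JabRef's): optional scheme, optional
    // arxiv.org/abs|html|pdf prefix, new-style identifier, optional version.
    private static final Pattern ARXIV = Pattern.compile(
            "(?:https?://)?(?:arxiv\\.org/(?:abs|html|pdf)/)?(\\d{4}\\.\\d{4,5}(?:v\\d+)?)");

    // Stand-in for a strict parse(): the whole input must match.
    static Optional<String> parse(String input) {
        Matcher matcher = ARXIV.matcher(input.trim());
        return matcher.matches() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    // Lenient lookup: locate an identifier anywhere in free text.
    static Optional<String> findInText(String text) {
        if (text == null || text.isBlank()) {
            return Optional.empty();
        }
        // Drop everything from the first '#' on; limit 2 keeps index 0 valid for "#".
        String withoutFragment = text.split("#", 2)[0].trim();

        // Try the strict parse first.
        Optional<String> direct = parse(withoutFragment);
        if (direct.isPresent()) {
            return direct;
        }
        // Fall back to searching inside the text, then parse the matched part.
        Matcher matcher = ARXIV.matcher(withoutFragment);
        return matcher.find() ? parse(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // prints Optional[2503.08641v1]
        System.out.println(findInText("https://arxiv.org/html/2503.08641v1#bib.bib5"));
        // prints Optional.empty
        System.out.println(findInText("no identifier here"));
    }
}
```

Note that cutting at the first '#' also discards any text after it, so an identifier that only appears after a '#' in free text would not be found by this sketch.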
Diagram Walkthrough
flowchart LR
A["User pastes arXiv URL<br/>with fragment"] -->|"e.g., arxiv.org/html/...#bib"| B["ArXivIdentifier.findInText()"]
B -->|"Strip fragment"| C["Clean text"]
C -->|"Try direct parse"| D{Success?}
D -->|"Yes"| E["Return identifier"]
D -->|"No"| F["Apply regex pattern"]
F -->|"Match found"| G["Parse matched text"]
G --> E
F -->|"No match"| H["Return empty"]
File Walkthrough
CompositeIdFetcher.java
Update to use new findInText method
jablib/src/main/java/org/jabref/logic/importer/CompositeIdFetcher.java
ArXivIdentifier.parse() to ArXivIdentifier.findInText() in performSearchById() method
ArXivIdentifier.java
Add findInText method with fragment handling
jablib/src/main/java/org/jabref/model/entry/identifier/ArXivIdentifier.java
findInText() static method for robust identifier extraction
split("#")[0]
matching
arxiv.org/abs, arxiv.org/html, arxiv.org/pdf, and plain identifiers
v1) as optional component
Identifier.java
Update Identifier factory method
jablib/src/main/java/org/jabref/model/entry/identifier/Identifier.java
from() method to use ArXivIdentifier.findInText() instead of parse()
ArXivIdentifierTest.java
Add comprehensive findInText tests
jablib/src/test/java/org/jabref/model/entry/identifier/ArXivIdentifierTest.java
findInTextFindsArxivFromHtmlUrlWithFragment()
findInTextFindsArxivInsideText()
findInTextReturnsEmptyForNonArxivText()
CHANGELOG.md
Document arXiv identifier improvement
CHANGELOG.md
fragments
findInText for ArXivIdentifier #14659