Add citation context extraction from PDF Related Work sections #14602

Muskan244 · 2025-12-14T13:07:22Z

This PR implements citation context extraction from PDF Related Work sections. When viewing an entry with an attached PDF, users can extract citations and their surrounding context, match them to library entries, and automatically add the context descriptions to cited entries' comment fields in the format [sourceCitationKey]: context description.
Changes:

Add new "Show tab 'Citation contexts'" setting in Entry editor preferences
Create CitationContextIntegrationService to orchestrate PDF extraction and matching
Create PdfSectionExtractor to identify Related Work and References sections
Create CitationContextExtractor to parse citation markers and surrounding text
Add UI component to display extraction results with clickable apply functionality
Contexts are written to cited entries' comment-{username} field when applied
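
A minimal sketch of the comment format described above, assuming a plain string-based helper. `CitationCommentFormatter` and its methods are hypothetical names for illustration, not the PR's `CitationCommentWriter` API, and skipping duplicate lines is an assumption rather than confirmed behavior.

```java
import java.util.Optional;

/** Hypothetical helper; names are illustrative and not taken from the PR. */
final class CitationCommentFormatter {

    private CitationCommentFormatter() {
    }

    /** Builds the user-specific field name, e.g. "comment-alice". */
    static String commentFieldName(String username) {
        return "comment-" + username;
    }

    /** Formats one line as "[sourceCitationKey]: context description". */
    static String formatContext(String sourceCitationKey, String context) {
        return "[" + sourceCitationKey + "]: " + context.strip();
    }

    /** Appends the line to an existing comment; exact duplicate lines are skipped (assumption). */
    static String appendContext(Optional<String> existingComment, String line) {
        if (existingComment.isEmpty() || existingComment.get().isBlank()) {
            return line;
        }
        String existing = existingComment.get();
        if (existing.lines().anyMatch(line::equals)) {
            return existing; // already present, leave the comment unchanged
        }
        return existing + "\n" + line;
    }
}
```

With this sketch, applying the same context twice would leave a single line in the comment field.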

Steps to test

Open JabRef and load a library with at least one entry that has a PDF attached
Go to Options → Preferences → Entry editor and ensure "Show tab 'Citation contexts'" is enabled
Open an entry that has an academic PDF attached (ideally one with a Related Work or Literature Review section)
Click on the Citation contexts tab in the entry editor
Click "Extract from this PDF" button
Wait for extraction to complete - you should see a table with:
- Citation markers found (e.g., "(Smith 2020)", "[1]")
- Cited entry (matched library entry or "Not found")
- Context text (the surrounding sentences)
- Status (Existing, New entry, or Unmatched)
Select the contexts you want to apply using the checkboxes
Click "Apply selected" button
Check the cited entries in your library - they should now have a comment-{yourusername} field containing the context in format: [SourcePdfKey]: description text
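
To illustrate the two marker styles listed in the steps above, here is a rough sketch that finds "[1]"-style and "(Smith 2020)"-style markers and pairs each with its sentence. The class name, the regular expressions, and the naive sentence splitting are assumptions for illustration; the PR's `CitationContextExtractor` may work differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the extractor implemented in the PR. */
final class MarkerSketch {

    // Numeric markers such as "[1]" or "[12]".
    private static final Pattern NUMERIC = Pattern.compile("\\[\\d{1,3}\\]");

    // Author-year markers such as "(Smith 2020)" or "(Smith et al., 2020)".
    private static final Pattern AUTHOR_YEAR =
            Pattern.compile("\\([A-Z][A-Za-z-]+(?: et al\\.)?,? \\d{4}[a-z]?\\)");

    /** A marker together with the sentence it was found in. */
    record Found(String marker, String sentence) {
    }

    static List<Found> find(String text) {
        List<Found> result = new ArrayList<>();
        // Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            for (Pattern pattern : List.of(NUMERIC, AUTHOR_YEAR)) {
                Matcher matcher = pattern.matcher(sentence);
                while (matcher.find()) {
                    result.add(new Found(matcher.group(), sentence));
                }
            }
        }
        return result;
    }
}
```

The sentence split is deliberately simple and would break on abbreviations followed by a space; a real extractor needs something more robust.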

Mandatory checks

I own the copyright of the code submitted and I license it under the MIT license
I manually tested my changes in running JabRef (always required)
I added JUnit tests for changes (if applicable)
I added screenshots in the PR description (if change is visible to the user)
I described the change in CHANGELOG.md in a way that is understandable for the average user (if change is visible to the user)
[/] I checked the user documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request updating file(s) in https://github.com/JabRef/user-documentation/tree/main/en.

Implements Issue JabRef#14085: Extract citation contexts from academic PDFs and add them to cited entries' comment fields.

Changes:
- Add Citation contexts tab in entry editor with extraction workflow UI
- Create CitationContextExtractor to parse citation markers from PDF text
- Create PdfSectionExtractor to identify Related Work and References sections
- Create PdfReferenceParser to parse bibliography entries from PDFs
- Create CitationMatcher to match citation markers to reference entries
- Create LibraryEntryResolver to match references to library entries
- Create CitationCommentWriter to write contexts to comment-{username} field
- Add CitationContext and ReferenceEntry data models
- Add preference to enable/disable Citation contexts tab
- Display clickable extraction results with match status in table UI
- Add new cited entries to library when applying contexts

jablib/src/main/java/org/jabref/logic/citation/contextextractor/CitationCommentWriter.java

…_en.properties and Replace custom calculateSimilarity method with existing StringSimilarity class in CitationCommentWriter

jablib/src/main/java/org/jabref/logic/citation/contextextractor/CitationContextExtractor.java

jablib/src/main/java/org/jabref/logic/citation/contextextractor/CitationMatcher.java

jablib/src/main/java/org/jabref/logic/citation/contextextractor/LibraryEntryResolver.java

jablib/src/main/java/org/jabref/model/citation/CitationContext.java

jabgui/src/main/java/org/jabref/gui/preferences/JabRefGuiPreferences.java

…o use assertInstanceOf instead of assertTrue with instanceof check

@NonNull

- Use AuthorListParser to extract first author family name in CitationMatcher and LibraryEntryResolver instead of custom regex
- Extract inline regex patterns as constants (BRACKETS_PATTERN, WHITESPACE_PATTERN) in CitationMatcher
- Replace Objects.requireNonNull with jspecify @NonNull annotations in CitationContext record

…oading and fallback

…bref into fix-for-issue-14085

jablib/src/main/java/module-info.java

Muskan244 · 2025-12-18T14:27:12Z

Future work.

Can you also add a functionality to create new entries based on the text?

Add a new tab "Related work text"

Just to clarify, for this PR should I focus only on the current changes, and treat the “Related work text” tab as a follow-up, or do you want it included here as well?

@NonNull

…Objects.requireNonNull with @NonNull, simplify getUsername() to return String, update tests accordingly, remove null-related tests, and remove tinylog logging

…bref into fix-for-issue-14085

koppor · 2025-12-18T15:26:36Z

Future work.

Can you also add a functionality to create new entries based on the text?

Add a new tab "Related work text"

Just to clarify, for this PR should I focus only on the current changes, and treat the “Related work text” tab as a follow-up, or do you want it included here as well?

Follow up. Needs more thought....

Knowing that https://github.com/koppor/magic-merge-commit exists, you could start in parallel...

Muskan244 · 2025-12-18T16:03:00Z

Future work.
Can you also add a functionality to create new entries based on the text?
Add a new tab "Related work text"

Just to clarify, for this PR should I focus only on the current changes, and treat the “Related work text” tab as a follow-up, or do you want it included here as well?

Follow up. Needs more thought....

Knowing that koppor/magic-merge-commit exists, you could start in parallel...

Got it, I’ll focus on the current changes for this PR and treat the “Related work text” tab as a follow-up and start exploring it in parallel.

github-actions · 2025-12-22T03:38:07Z

Your pull request conflicts with the target branch.

Please merge with your code. For a step-by-step guide to resolve merge conflicts, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line.

koppor · 2025-12-22T15:08:15Z

/review

qodo-code-review · 2025-12-22T15:08:44Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶
13109 - Partially compliant
Compliant requirements:
Non-compliant requirements:
- Make `org.jabref.logic.pseudonymization.Pseudonymization` available via the CLI.
- Provide a CLI user experience similar to the consistency check command.
- Use `org.jabref.cli.CheckConsistency` as implementation reference for the CLI command structure and behavior.
Requires further human verification:
⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue
The author-key matching branch checks the wrong optional (`authorYearMatch`) before returning `authorKeyMatch`, which can prevent author-key matches from ever being returned and can also log incorrect diagnostics.

    Optional<ReferenceEntry> authorYearMatch = matchAuthorYearMarker(normalizedMarker, references);
    if (authorYearMatch.isPresent()) {
        LOGGER.debug("Found author-year match for '{}'", citationMarker);
        return authorYearMatch;
    }

    Optional<ReferenceEntry> authorKeyMatch = matchAuthorKeyMarker(normalizedMarker, references);
    if (authorYearMatch.isPresent()) {
        LOGGER.debug("Found author-key match for '{}'", citationMarker);
        return authorKeyMatch;
    }

UX/Logic
The `Apply selected` button enablement only depends on whether the table has items, not on whether any row is selected and matched; this can lead to a clickable action that immediately shows “No selection” and feels broken. Consider binding disable state to “any selected & matched rows” instead.

    Button applyButton = new Button(Localization.lang("Apply selected"));
    applyButton.setGraphic(IconTheme.JabRefIcons.ADD.getGraphicNode());
    applyButton.setOnAction(e -> applySelectedContexts());
    applyButton.setDisable(true);

    resultsTable.getItems().addListener((javafx.collections.ListChangeListener<ExtractedContextRow>) change -> {
        applyButton.setDisable(resultsTable.getItems().isEmpty());
    });

Performance
Regex patterns are recompiled inside `extractMarker` (`Pattern.compile(...)` calls), which is avoidable overhead when parsing many references. Consider reusing the existing static patterns (or making new ones static finals) to reduce allocations and improve throughput.

    private String extractMarker(String text, int index) {
        Matcher numericBracketedMatcher = Pattern.compile("^\\s\\[(\\d{1,3})\\]").matcher(text);
        if (numericBracketedMatcher.find()) {
            return "[" + numericBracketedMatcher.group(1) + "]";
        }

        Matcher numericDottedMatcher = Pattern.compile("^\\s(\\d{1,3})\\.\\s").matcher(text);
        if (numericDottedMatcher.find()) {
            return "[" + numericDottedMatcher.group(1) + "]";
        }
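
For the first focus area above (the author-key branch testing `authorYearMatch` a second time), one possible way to avoid this class of copy-paste error is to chain the strategies with `Optional.or`, so no intermediate variable can be checked by mistake. This is a sketch only: references are plain strings here and both stand-in matchers are simplified assumptions, not the PR's `CitationMatcher` logic.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; the matchers below are simplified stand-ins. */
final class MarkerMatchingSketch {

    /** Stand-in: "Smith 2020" matches a reference containing both tokens. */
    static Optional<String> matchAuthorYearMarker(String marker, List<String> references) {
        String[] parts = marker.split("\\s+");
        if (parts.length != 2) {
            return Optional.empty();
        }
        return references.stream()
                         .filter(ref -> ref.contains(parts[0]) && ref.contains(parts[1]))
                         .findFirst();
    }

    /** Stand-in: "Doe19" matches a reference that starts with "[Doe19]". */
    static Optional<String> matchAuthorKeyMarker(String marker, List<String> references) {
        return references.stream()
                         .filter(ref -> ref.startsWith("[" + marker + "]"))
                         .findFirst();
    }

    /** Tries author-year first and falls back to author-key only if that is empty. */
    static Optional<String> match(String marker, List<String> references) {
        return matchAuthorYearMarker(marker, references)
                .or(() -> matchAuthorKeyMarker(marker, references));
    }
}
```

The chained form drops the per-strategy debug logging shown in the reviewed snippet; if that logging matters, the minimal fix is to check `authorKeyMatch.isPresent()` in the second `if`.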

Muskan244 · 2025-12-30T12:32:31Z

Hi! Just checking in to see if there's anything more I should do.

…bref into fix-for-issue-14085

palukku · 2025-12-31T02:50:34Z

Minor detail: if I add a PDF after seeing the "no pdf attached" error message and switch back to the Citation contexts tab, it still shows the same message. I have to deselect the entry and reselect it to update the citation contexts.

But idk if this is your fault or the design of JavaFX, just a thing I stumbled upon.

palukku

First feedback when looking through.
Didn't have the time to think through most of the newly created classes and logic till now. I will try to look into those next year xD

palukku · 2025-12-31T02:54:48Z

jabgui/src/main/java/org/jabref/gui/preferences/ai/AiTabViewModel.java

+            Map.entry(AiTemplate.CITATION_CONTEXT_EXTRACTION_SYSTEM_MESSAGE, new SimpleStringProperty()),
+            Map.entry(AiTemplate.CITATION_CONTEXT_EXTRACTION_USER_MESSAGE, new SimpleStringProperty())
    );


I don't see them in the AI settings templates

palukku · 2025-12-31T03:20:25Z

jablib/src/main/resources/l10n/JabRef_en.properties

+Found\ %0\ citation\ context(s),\ but\ none\ could\ be\ matched\ to\ library\ entries.\ Ensure\ the\ cited\ papers\ are\ in\ your\ library\ with\ matching\ author\ names\ and\ years.=Found %0 citation context(s), but none could be matched to library entries. Ensure the cited papers are in your library with matching author names and years.
+Found\ %0\ citation\ context(s)...=Found %0 citation context(s)...
+Found\ %0\ citation\ context(s)\:\ %1\ matched,\ %2\ unmatched.\ Select\ which\ to\ apply.=Found %0 citation context(s): %1 matched, %2 unmatched. Select which to apply.
+New\ entry=New entry


This is a duplicate; we already have "New\ Entry".

palukku · 2025-12-31T03:24:02Z

jablib/src/main/java/org/jabref/model/pdf/PdfDocumentSections.java

+    private static final List<String> CITATION_RELEVANT_SECTIONS = List.of(
+            "related work",
+            "literature review",
+            "background",
+            "previous work",
+            "state of the art",
+            "related studies",
+            "theoretical background",
+            "prior work"
+    );


Maybe this could be configurable so I can use it in other languages as well (could be a follow-up PR, that's fine).
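
A rough sketch of what a configurable variant might look like, assuming the heading list is passed in (for example from a preference) instead of being hard-coded. `SectionHeadingMatcher` and its behavior are hypothetical and not part of the PR.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of a heading matcher with a caller-supplied heading list. */
final class SectionHeadingMatcher {

    private final List<String> headings;

    SectionHeadingMatcher(List<String> headings) {
        // Normalize once so lookups are case-insensitive.
        this.headings = headings.stream()
                                .map(heading -> heading.strip().toLowerCase(Locale.ROOT))
                                .toList();
    }

    boolean isCitationRelevant(String headingLine) {
        // Drop a leading section number such as "2." or "3.1" before comparing.
        String normalized = headingLine.strip()
                                       .replaceFirst("^\\d+(?:\\.\\d+)*\\.?\\s+", "")
                                       .toLowerCase(Locale.ROOT);
        return headings.stream().anyMatch(normalized::startsWith);
    }
}
```

A user could then supply, say, `List.of("related work", "Verwandte Arbeiten")` to cover English and German headings.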

palukku · 2025-12-31T03:31:18Z

jablib/src/main/java/org/jabref/model/citation/ReferenceEntry.java

+        Objects.requireNonNull(rawText, "Raw text cannot be null");
+        Objects.requireNonNull(marker, "Marker cannot be null");
+        Objects.requireNonNull(authors, "Authors optional cannot be null");
+        Objects.requireNonNull(title, "Title optional cannot be null");
+        Objects.requireNonNull(year, "Year optional cannot be null");
+        Objects.requireNonNull(journal, "Journal optional cannot be null");
+        Objects.requireNonNull(volume, "Volume optional cannot be null");
+        Objects.requireNonNull(pages, "Pages optional cannot be null");
+        Objects.requireNonNull(doi, "DOI optional cannot be null");
+        Objects.requireNonNull(url, "URL optional cannot be null");


We use jspecify for nullness checks: https://devdocs.jabref.org/decisions/0052-jspecify-nullable-annotations.html

palukku · 2025-12-31T03:36:32Z

jablib/src/main/java/org/jabref/logic/citation/contextextractor/PdfReferenceParser.java

+            case NUMERIC_BRACKETED ->
+                    references.addAll(splitByPattern(normalizedText, Pattern.compile("(?=\\[\\d{1,3}\\])")));
+            case NUMERIC_DOTTED ->
+                    references.addAll(splitByPattern(normalizedText, Pattern.compile("(?=(?:^|\\n)\\d{1,3}\\.\\s)")));
+            case AUTHOR_YEAR ->
+                    references.addAll(splitByBlankLinesOrIndentation(normalizedText));
+            case AUTHOR_KEY ->
+                    references.addAll(splitByPattern(normalizedText, Pattern.compile("(?=\\[[A-Z][a-zA-Z]+\\d{2,4}[a-z]?\\])")));


Can't you extract those as constants too? Or why can't you reuse already existing patterns like AUTHOR_KEY_MARKER_PATTERN?
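
As a sketch of that suggestion, the inline `Pattern.compile(...)` calls could become `static final` constants that are compiled once. The two regular expressions are copied from the diff above; the class, constant, and method names are hypothetical, and whether an existing constant such as AUTHOR_KEY_MARKER_PATTERN can be reused depends on its definition, which is not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch; only the regular expressions come from the PR diff. */
final class ReferenceSplitSketch {

    // Compiled once at class initialization instead of on every call.
    private static final Pattern NUMERIC_BRACKETED_SPLIT =
            Pattern.compile("(?=\\[\\d{1,3}\\])");
    private static final Pattern AUTHOR_KEY_SPLIT =
            Pattern.compile("(?=\\[[A-Z][a-zA-Z]+\\d{2,4}[a-z]?\\])");

    static List<String> splitNumericBracketed(String text) {
        return split(text, NUMERIC_BRACKETED_SPLIT);
    }

    static List<String> splitAuthorKey(String text) {
        return split(text, AUTHOR_KEY_SPLIT);
    }

    /** Splits before each marker, then trims and drops empty pieces. */
    private static List<String> split(String text, Pattern pattern) {
        return Arrays.stream(pattern.split(text))
                     .map(String::strip)
                     .filter(piece -> !piece.isEmpty())
                     .toList();
    }
}
```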

calixtus · 2025-12-31T09:00:58Z

Minor detail: if I add a PDF after seeing the "no pdf attached" error message and switch back to the Citation contexts tab, it still shows the same message. I have to deselect the entry and reselect it to update the citation contexts.

But idk if this is your fault or the design of JavaFX, just a thing I stumbled upon.

Just means that somewhere a listener is missing

Muskan244 added 2 commits December 14, 2025 18:08

Add CHANGELOG.md entry

69aed45

github-actions bot added the labels "good third issue" and "status: changes-required" on Dec 14, 2025