-
Couldn't load subscription status.
- Fork 2
feat: added support for modernbert embeddings #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 1862dfd in 1 minute and 18 seconds
More details
- Looked at
853lines of code in7files - Skipped
1files when reviewing. - Skipped posting
6drafted comments based on config settings.
1. .pre-commit-config.yaml:20
- Draft comment:
The pytest pre-commit hook is commented out. Consider enabling it to ensure tests are run before commits, which helps catch issues early. - Reason this comment was not posted:
Confidence changes required:50%
The PR introduces a new feature for ModernBERT embeddings, but the pre-commit hook for pytest is commented out. This could lead to untested code being merged.
2. gptme_rag/cli.py:28
- Draft comment:
Consider moving the default_persist_dir to a configuration file or environment variable for better flexibility and maintainability. - Reason this comment was not posted:
Confidence changes required:30%
The PR introduces a new feature for ModernBERT embeddings, but the pre-commit hook for pytest is commented out. This could lead to untested code being merged.
3. gptme_rag/cli.py:207
- Draft comment:
The warning about changing the embedding model is useful, but consider logging it as well for better traceability. - Reason this comment was not posted:
Confidence changes required:20%
The PR introduces a new feature for ModernBERT embeddings, but the pre-commit hook for pytest is commented out. This could lead to untested code being merged.
4. gptme_rag/embeddings.py:42
- Draft comment:
Consider making the batch size configurable to allow users to optimize for their specific hardware capabilities. - Reason this comment was not posted:
Confidence changes required:20%
The PR introduces a new feature for ModernBERT embeddings, but the pre-commit hook for pytest is commented out. This could lead to untested code being merged.
5. gptme_rag/indexing/indexer.py:55
- Draft comment:
Theembedding_functionparameter is not type-annotated. Consider adding type annotations for better code clarity and type checking. - Reason this comment was not posted:
Confidence changes required:20%
The PR introduces a new feature for ModernBERT embeddings, but the pre-commit hook for pytest is commented out. This could lead to untested code being merged.
6. gptme_rag/indexing/indexer.py:144
- Draft comment:
The logic for determining if a collection needs to be recreated is complex. Consider refactoring this into a separate method for clarity and maintainability. - Reason this comment was not posted:
Confidence changes required:30%
The PR introduces a new feature for ModernBERT embeddings, but the pre-commit hook for pytest is commented out. This could lead to untested code being merged.
Workflow ID: wflow_kzTR8OfiVC2gW1WH
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on c5a5e11 in 18 seconds
More details
- Looked at
18lines of code in1files - Skipped
0files when reviewing. - Skipped posting
0drafted comments based on config settings.
Workflow ID: wflow_nUhRL9vtds2H2f56
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #9 +/- ##
==========================================
- Coverage 50.39% 46.60% -3.80%
==========================================
Files 9 10 +1
Lines 1127 1311 +184
==========================================
+ Hits 568 611 +43
- Misses 559 700 +141 ☔ View full report in Codecov by Sentry. |
Probably needs a lot of cleanup, I kinda just let gptme run with it.
Important
Added support for ModernBERT embeddings, updated CLI and indexing functionalities, and enhanced documentation and testing.
gptme_rag/embeddings.py.cli.pyto include options forembedding-functionanddevice.index()andsearch()functions incli.pyto utilize ModernBERT.ChunkMergerandSearchOutputFormatterclasses incli.pyfor improved chunk handling and output formatting.Indexerclass inindexer.pyto integrate ModernBERT embeddings.DocumentProcessorindocument_processor.pyto log chunk creation details..pre-commit-config.yamlto comment out local pytest hook.sentence-transformersto dependencies inpyproject.toml.README.mdto include instructions for using ModernBERT embeddings.test_context_assembler.pyto ensure context assembly handles token limits and includes all components.This description was created by
for c5a5e11. It will automatically update as commits are pushed.