unique_id_from_tool: clarify values and usage #12463

Merged
@@ -37,8 +37,8 @@ $ docker compose build --build-arg uid=1000
|`unittests/scans/<parser_dir>/{many_vulns,no_vuln,one_vuln}.json` | Sample files containing meaningful data for unit tests. The minimal set.
|`unittests/tools/test_<parser_name>_parser.py` | Unit tests of the parser.
|`dojo/settings/settings.dist.py` | If you want to use a modern hashcode based deduplication algorithm
|`docs/content/en/connecting_your_tools/parsers/<file/api>/<parser_file>.md` | Documentation, what kind of file format is required and how it should be obtained


## Factory contract

@@ -145,7 +145,7 @@ Very bad example:
Various file formats are handled through libraries. To keep DefectDojo slim and avoid extending the attack surface, keep the number of libraries to a minimum and use other parsers as examples.

#### defusedXML in favour of lxml
Because XML is an insecure format by default, data from XML output must be parsed in a secure way. After an evaluation, we decided that defusedxml is the library parsers will use to parse XML files from now on, as it is rated more secure. We will therefore only accept PRs that use the defusedxml library.

### Not all attributes are mandatory

@@ -232,7 +232,8 @@ Bad example (DIY):

By default a new parser uses the 'legacy' deduplication algorithm documented at https://documentation.defectdojo.com/usage/features/#deduplication-algorithms

Please use a pre-defined deduplication algorithm where applicable.
Please use a pre-defined deduplication algorithm where applicable. When using the `unique_id_from_tool` or `vuln_id_from_tool` fields in the hash code configuration, it is important that these values are unique for the finding and constant over time across subsequent scans. If this is not the case, the values can still be useful to set on the finding model without using them for deduplication.
The values must come directly from the report and must not be calculated internally by the parser.
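
As a rough illustration of the rule above, the sketch below copies both identifiers straight from the report instead of deriving them in the parser. It is not DefectDojo code: the `Finding` dataclass is a simplified stand-in for `dojo.models.Finding`, and the report layout with its `issues`, `issue_id`, and `rule_id` keys is a hypothetical format assumed for the example.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    """Simplified stand-in for dojo.models.Finding (illustration only)."""

    title: str
    severity: str
    unique_id_from_tool: Optional[str] = None
    vuln_id_from_tool: Optional[str] = None


def get_findings(report_text: str) -> List[Finding]:
    """Build findings from a hypothetical JSON report."""
    findings = []
    for item in json.loads(report_text)["issues"]:
        findings.append(
            Finding(
                title=item["title"],
                severity=item["severity"],
                # Copied verbatim from the report. Not a hash, counter, or
                # other value computed by the parser.
                unique_id_from_tool=item.get("issue_id"),
                vuln_id_from_tool=item.get("rule_id"),
            )
        )
    return findings


# Hypothetical report content, used only to exercise the sketch.
REPORT = json.dumps(
    {
        "issues": [
            {"issue_id": "a1b2", "rule_id": "R-100", "title": "SQL injection", "severity": "High"},
            {"issue_id": "c3d4", "rule_id": "R-200", "title": "Weak cipher", "severity": "Medium"},
        ]
    }
)

first_scan = get_findings(REPORT)
second_scan = get_findings(REPORT)
```

Because the identifiers are taken from the report, importing the same report twice yields the same `unique_id_from_tool` values, which is what deduplication across subsequent scans relies on. Whether the tool itself keeps those identifiers stable between scans still has to be checked per tool.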

## Unit tests

@@ -366,4 +367,3 @@ Please add a new .md file in [`docs/content/en/connecting_your_tools/parsers`] w
* A link to the scanner itself - (e.g. GitHub or vendor link)

Here is an example of a completed Parser documentation page: [https://github.com/DefectDojo/django-DefectDojo/blob/master/docs/content/en/connecting_your_tools/parsers/file/acunetix.md](https://github.com/DefectDojo/django-DefectDojo/blob/master/docs/content/en/connecting_your_tools/parsers/file/acunetix.md)

18 changes: 18 additions & 0 deletions dojo/db_migrations/0229_alter_finding_unique_id_from_tool.py
@@ -0,0 +1,18 @@
# Generated by Django 5.1.8 on 2025-05-19 16:14

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('dojo', '0228_alter_jira_username_password'),
    ]

    operations = [
        migrations.AlterField(
            model_name='finding',
            name='unique_id_from_tool',
            field=models.CharField(blank=True, help_text='Vulnerability technical id from the source tool. Allows to track unique vulnerabilities over time across subsequent scans.', max_length=500, null=True, verbose_name='Unique ID from tool'),
        ),
    ]
2 changes: 1 addition & 1 deletion dojo/models.py
@@ -2562,7 +2562,7 @@ class Finding(models.Model):
                                           blank=True,
                                           max_length=500,
                                           verbose_name=_("Unique ID from tool"),
                                           help_text=_("Vulnerability technical id from the source tool. Allows to track unique vulnerabilities."))
                                           help_text=_("Vulnerability technical id from the source tool. Allows to track unique vulnerabilities over time across subsequent scans."))
    vuln_id_from_tool = models.CharField(null=True,
                                         blank=True,
                                         max_length=500,
2 changes: 2 additions & 0 deletions dojo/settings/settings.dist.py
@@ -1433,6 +1433,8 @@ def saml2_attrib_map_format(din):
# legacy one with multiple conditions (default mode)
DEDUPE_ALGO_LEGACY = "legacy"
# based on dojo_finding.unique_id_from_tool only (for checkmarx detailed, or sonarQube detailed for example)
# When using the `unique_id_from_tool` or `vuln_id_from_tool` fields for deduplication, it's important that these are unique for the finding and constant over time across subsequent scans.
# If this is not the case, the values can still be useful to set on the finding model without using them for deduplication.
DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL = "unique_id_from_tool"
# based on dojo_finding.hash_code only
DEDUPE_ALGO_HASH_CODE = "hash_code"
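
A minimal sketch of how these constants might be used to pick an algorithm per parser. The constant values are copied from the lines above. The shape of the `DEDUPLICATION_ALGORITHM_PER_PARSER` mapping is an assumption, and the scan type names and the `algorithm_for` helper are hypothetical names introduced only for this example.

```python
# Constant values copied from the settings lines above.
DEDUPE_ALGO_LEGACY = "legacy"
DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL = "unique_id_from_tool"
DEDUPE_ALGO_HASH_CODE = "hash_code"

# Assumed shape: scan type name -> algorithm constant.
# Both scan type names are hypothetical placeholders.
DEDUPLICATION_ALGORITHM_PER_PARSER = {
    # Only suitable if the tool's ids are unique and stable across scans.
    "Example Tool Scan": DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL,
    "Another Example Scan": DEDUPE_ALGO_HASH_CODE,
}


def algorithm_for(scan_type: str) -> str:
    """Hypothetical helper: unlisted parsers fall back to the legacy algorithm."""
    return DEDUPLICATION_ALGORITHM_PER_PARSER.get(scan_type, DEDUPE_ALGO_LEGACY)
```

A parser whose tool does not emit stable identifiers can still populate `unique_id_from_tool` on the finding while being mapped to a different algorithm here.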