Description
Describe the bug
In some cases, the populate_purldb pipeline fails when run on a project that previously ran load_sbom. The error is:
Invalid purl: type is a required argument.
Traceback:
  File "/opt/scancodeio/aboutcode/pipeline/__init__.py", line 199, in execute
    step(self)
  File "/opt/scancodeio/scanpipe/pipelines/populate_purldb.py", line 48, in populate_purldb_with_discovered_dependencies
    purldb.populate_purldb_with_discovered_dependencies(
  File "/opt/scancodeio/scanpipe/pipes/purldb.py", line 369, in populate_purldb_with_discovered_dependencies
    packages = [{"purl": purl} for purl in get_unique_resolved_purls(project)]
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/scanpipe/pipes/purldb.py", line 298, in get_unique_resolved_purls
    return {str(PackageURL(*values)) for values in distinct_combinations}
                ^^^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.12/site-packages/packageurl/__init__.py", line 369, in __new__
    raise ValueError(f"Invalid purl: {key} is a required argument.")
The PURLs in the SBOM look like this: pkg:pypi/[email protected]
System configuration
- Which version of ScanCode.io are you running?
- 34.9.5 + custom patch for RQ worker
- Are you running the app using Docker?
- No; custom helm charts running on Kubernetes
- On which OS?
- Linux
- What inputs are you using?
- An SBOM containing python packages generated with cdxgen
- Which pipeline are you running?
load_sbom
populate_purldb
To Reproduce
Update 2025-03-26:
- It appears that any SBOM that results in dependencies being identified by load_sbom will trigger this error.
- The Package URL and type columns are not populated in the table shown by ScanCode.io (see screenshot), which may indicate that the internal model is not properly populated or that populate_purldb is accessing the wrong fields.
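To illustrate the suspected failure mode, here is a minimal sketch that uses only the standard library. It assumes the rows passed to PackageURL(*values) are (type, namespace, name, version) tuples and that some of them have an empty type. build_purl is a hypothetical stand-in that mimics the required-argument check seen in the traceback; it is not the real packageurl API. unique_purls_skipping_incomplete is an illustrative guard, not ScanCode.io code, and the package name and version are made up.

```python
from typing import Iterable, Optional


def build_purl(
    purl_type: Optional[str],
    namespace: Optional[str],
    name: Optional[str],
    version: Optional[str],
) -> str:
    """Hypothetical stand-in for the required-argument check in the traceback."""
    for key, value in (("type", purl_type), ("name", name)):
        if not value:
            raise ValueError(f"Invalid purl: {key} is a required argument.")
    # Join only the non-empty path components.
    path = "/".join(part for part in (purl_type, namespace, name) if part)
    purl = f"pkg:{path}"
    if version:
        purl += f"@{version}"
    return purl


def unique_purls_skipping_incomplete(
    rows: Iterable[tuple],
) -> tuple[set[str], list[tuple]]:
    """Build unique purls, collecting rows that lack a required field."""
    purls: set[str] = set()
    skipped: list[tuple] = []
    for row in rows:
        try:
            purls.add(build_purl(*row))
        except ValueError:
            # Assumption: skipping and reporting is preferable to failing the step.
            skipped.append(row)
    return purls, skipped


# Made-up rows: the second one imitates a dependency whose type was never set.
rows = [
    ("pypi", "", "requests", "2.31.0"),
    ("", "", "requests", "2.31.0"),
]
purls, skipped = unique_purls_skipping_incomplete(rows)
```

If the hypothesis holds, a single row with an empty type is enough to abort the whole set comprehension in get_unique_resolved_purls. A guard like this would only hide the symptom; the underlying question is why load_sbom leaves those fields empty.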
Original:
Not quite clear what the cause is. We are using an SBOM containing pypi packages that trigger the exception after the following steps:
- Use Load Packages from SBOM in DejaCode. It will create a project and pipeline in ScanCode.io
- Once the packages have been imported into DejaCode, manually add the populate_purldb pipeline
- The intention was to make PurlDB gather information on all identified packages
- Instead of a success the job fails within seconds due to the error above
Expected behavior
The expected behavior is that populate_purldb works with regular PURLs found in SBOMs, whether for PyPI or other package managers.
Ideally this would also run by default if PurlDB is configured, since otherwise manual intervention is required whenever these pipelines are triggered from DejaCode. If the SBOM contains only the PURL and no download URL, it is essential that PurlDB is populated so that Improve Packages from PurlDB has data to work with.