Exception when running populate_purldb on project where previous run was load_sbom #1644

Open
@ghsa-retrieval

Describe the bug
In some cases the populate_purldb pipeline appears to fail when it is run on a project whose previous pipeline run was load_sbom.

Invalid purl: type is a required argument.

Traceback:
  File "/opt/scancodeio/aboutcode/pipeline/__init__.py", line 199, in execute
    step(self)
  File "/opt/scancodeio/scanpipe/pipelines/populate_purldb.py", line 48, in populate_purldb_with_discovered_dependencies
    purldb.populate_purldb_with_discovered_dependencies(
  File "/opt/scancodeio/scanpipe/pipes/purldb.py", line 369, in populate_purldb_with_discovered_dependencies
    packages = [{"purl": purl} for purl in get_unique_resolved_purls(project)]
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/scanpipe/pipes/purldb.py", line 298, in get_unique_resolved_purls
    return {str(PackageURL(*values)) for values in distinct_combinations}
                ^^^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.12/site-packages/packageurl/__init__.py", line 369, in __new__
    raise ValueError(f"Invalid purl: {key} is a required argument.")

The PURLs in the SBOM look like this: pkg:pypi/[email protected]
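
The traceback suggests that at least one of the field tuples passed to PackageURL(*values) has an empty type. The following is a minimal sketch of that failure mode, not the ScanCode.io or packageurl code: PurlFields and build_purl are hypothetical stand-ins that only mimic the required-argument check visible in the traceback, and the package name and version are illustrative values.

```python
from typing import NamedTuple, Optional


class PurlFields(NamedTuple):
    """Hypothetical stand-in for the fields passed to PackageURL."""

    type: Optional[str] = None
    namespace: Optional[str] = None
    name: Optional[str] = None
    version: Optional[str] = None


def build_purl(fields: PurlFields) -> str:
    """Build a purl string, mimicking the check seen in the traceback."""
    # Assumption: "type" and "name" are the required arguments.
    for key in ("type", "name"):
        if not getattr(fields, key):
            raise ValueError(f"Invalid purl: {key} is a required argument.")

    purl = f"pkg:{fields.type}/"
    if fields.namespace:
        purl += f"{fields.namespace}/"
    purl += fields.name
    if fields.version:
        purl += f"@{fields.version}"
    return purl


if __name__ == "__main__":
    # A row with every field set builds a purl as expected.
    print(build_purl(PurlFields("pypi", "", "requests", "2.31.0")))

    # A row whose type column is empty reproduces the reported message.
    try:
        build_purl(PurlFields("", "", "requests", "2.31.0"))
    except ValueError as error:
        print(error)
```

If this reading is right, the PURL strings in the SBOM can be valid while the separate type field stored for each dependency is still empty.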

System configuration

  • Which version of ScanCode.io are you running?
    • 34.9.5 + custom patch for RQ worker
  • Are you running the app using Docker?
    • No; custom helm charts running on Kubernetes
  • On which OS?
    • Linux
  • What inputs are you using?
    • An SBOM containing python packages generated with cdxgen
  • Which pipeline are you running?
    • load_sbom
    • populate_purldb

To Reproduce
Update 2025-03-26:

  • It appears that any SBOM that results in dependencies being identified by load_sbom will trigger this error
  • The Package URL and type columns are not populated in the table shown by ScanCode.io (see screenshot), which may indicate that the internal model is not properly populated or that populate_purldb is accessing the wrong fields
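
One way to check the unpopulated-columns hypothesis is to split the dependency field tuples into those that can form a PURL and those that cannot. This is only a sketch under assumptions: partition_rows is a hypothetical helper, the (type, namespace, name, version) tuple layout is assumed rather than taken from the ScanCode.io models, and the sample rows are made up.

```python
def partition_rows(rows):
    """Split field tuples by whether both type and name are set.

    Each row is assumed to be a (type, namespace, name, version) tuple.
    """
    complete, incomplete = [], []
    for row in rows:
        purl_type, _namespace, name, _version = row
        if purl_type and name:
            complete.append(row)
        else:
            incomplete.append(row)
    return complete, incomplete


if __name__ == "__main__":
    sample_rows = [
        ("pypi", "", "requests", "2.31.0"),  # usable
        ("", "", "requests", "2.31.0"),      # type column not populated
        (None, None, "flask", None),         # type column not populated
    ]
    usable, skipped = partition_rows(sample_rows)
    print(f"{len(usable)} usable, {len(skipped)} with missing type or name")
```

A non-empty second list would point at the stored dependency fields rather than at the PURL strings in the SBOM.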

Original:
Not quite clear what the cause is. We are using an SBOM containing pypi packages that trigger the exception after the following steps:

  1. Use Load Packages from SBOM in DejaCode. It will create a project and pipeline in ScanCode.io
  2. Once the packages have been imported into DejaCode, manually add the pipeline populate_purldb
  • The intention was to make PurlDB gather information on all identified packages
  • Instead of a success the job fails within seconds due to the error above

Expected behavior
The expected behavior is that populate_purldb works with regular PURLs found in SBOMs, whether they are for pypi or other package managers.

Ideally this would also run by default if PurlDB is configured, since otherwise manual intervention is required whenever these pipelines are triggered from DejaCode. If the SBOM contains no download URL, only the PURL, then it is essential that PurlDB is populated so that Improve Packages from PurlDB has data to work with.

Screenshots
Image

Labels: bug
