Commit 7bb4499

Restructure pipelines for verbosity (#1074)
* Restructure pipelines for verbosity

Remove scan_codebase_packages pipeline, and restructure inspect_packages pipeline into load_sbom and resolve_packages pipelines.

Reference: #1035
Reference: #1034
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>

* Refactor functions and improve docstrings

Reference: #1074
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>

* Add unittests for new functions

Signed-off-by: Ayan Sinha Mahapatra <[email protected]>

* Update docs and add CHANGELOG entry

Signed-off-by: Ayan Sinha Mahapatra <[email protected]>

* Improve docstrings for pipelines

Suggested-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>

---------

Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
1 parent 100b64e commit 7bb4499

18 files changed, +434 -164 lines changed

CHANGELOG.rst

Lines changed: 14 additions & 0 deletions
@@ -17,6 +17,20 @@ v33.2.0 (unreleased)
 
   https://github.com/nexB/scancode.io/issues/1071
 
+- Rename pipeline for consistency and precision:
+   * scan_codebase_packages: inspect_packages
+
+  Restructure the inspect_manifest pipeline into:
+   * load_sbom: for loading SPDX/CycloneDX SBOMs and ABOUT files
+   * resolve_dependencies: for resolving package dependencies
+   * inspect_packages: gets package data from package manifests/lockfiles
+
+  A data migration is included to facilitate the migration of existing data.
+  Only the new names are available in the web UI but the REST API and CLI are backward
+  compatible with the old names.
+  https://github.com/nexB/scancode.io/issues/1034
+  https://github.com/nexB/scancode.io/discussions/1035
+
 v33.1.0 (2024-02-02)
 --------------------
 

docs/automation.rst

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ automation methods such as a cron job or a git hook::
         "https://github.com/nexB/scancode.io/archive/refs/tags/v32.4.0.zip",
     ]
     PIPELINES = [
-        "scan_codebase_package",
+        "inspect_packages",
         "find_vulnerabilities",
     ]
     EXECUTE_NOW = True

docs/built-in-pipelines.rst

Lines changed: 16 additions & 8 deletions
@@ -72,6 +72,22 @@ Load Inventory
     :members:
     :member-order: bysource
 
+.. _pipeline_load_sbom:
+
+Load SBOM
+---------
+.. autoclass:: scanpipe.pipelines.load_sbom.LoadSBOM()
+    :members:
+    :member-order: bysource
+
+.. _pipeline_resolve_dependencies:
+
+Resolve Dependencies
+--------------------
+.. autoclass:: scanpipe.pipelines.resolve_dependencies.ResolveDependencies()
+    :members:
+    :member-order: bysource
+
 .. _pipeline_map_deploy_to_develop:
 
 Map Deploy To Develop
@@ -126,14 +142,6 @@ Scan Codebase
     :members:
     :member-order: bysource
 
-.. _pipeline_scan_codebase_package:
-
-Scan Codebase Package
----------------------
-.. autoclass:: scanpipe.pipelines.scan_codebase_packages.ScanCodebasePackages()
-    :members:
-    :member-order: bysource
-
 .. _pipeline_scan_single_package:
 
 Scan Single Package

docs/faq.rst

Lines changed: 14 additions & 5 deletions
@@ -25,18 +25,27 @@ Here are some general guidelines based on different input scenarios:
 
 - If you have a **Docker image** as input, use the
   :ref:`analyze_docker_image <pipeline_analyze_docker_image>` pipeline.
-- For a full **codebase compressed as an archive**, choose the
+- For a full **codebase compressed as an archive**, optionally also with
+  its **pre-resolved dependencies**, and want to detect all the packages
+  present linked with their respective files, use the
   :ref:`scan_codebase <pipeline_scan_codebase>` pipeline.
-- If you have a **single package archive**, opt for the
+- If you have a **single package archive**, and you want to get information
+  on licenses, copyrights and package metadata for it, opt for the
   :ref:`scan_single_package <pipeline_scan_single_package>` pipeline.
 - When dealing with a **Linux root filesystem** (rootfs), the
   :ref:`analyze_root_filesystem_or_vm_image <pipeline_analyze_root_filesystem>` pipeline
   is the appropriate choice.
 - For processing the results of a **ScanCode-toolkit scan** or **ScanCode.io scan**,
   use the :ref:`load_inventory <pipeline_load_inventory>` pipeline.
-- When you have **manifest files**, such as a
-  **CycloneDX BOM, SPDX document, lockfile**, etc.,
-  use the :ref:`inspect_packages <pipeline_inspect_packages>` pipeline.
+- When you want to import **SPDX/CycloneDX SBOMs or ABOUT files** into a project,
+  use the :ref:`load_sbom <pipeline_load_sbom>` pipeline.
+- When you have **lockfiles or other package manifests** in a codebase and you want to
+  resolve packages from their package requirements, use the
+  :ref:`resolve_dependencies <pipeline_resolve_dependencies>` pipeline.
+- When you have application **package archives/codebases** and optionally also
+  their **pre-resolved dependencies** and you want to **inspect packages**
+  present in the package manifests and dependency, use the
+  :ref:`inspect_packages <pipeline_inspect_packages>` pipeline.
 - For scenarios involving both a **development and deployment codebase**, consider using
   the :ref:`map_deploy_to_develop <pipeline_map_deploy_to_develop>` pipeline.
 
scanpipe/apps.py

Lines changed: 1 addition & 0 deletions
@@ -178,6 +178,7 @@ def get_new_pipeline_name(pipeline_name):
             "inspect_manifest": "inspect_packages",
             "deploy_to_develop": "map_deploy_to_develop",
             "scan_package": "scan_single_package",
+            "scan_codebase_packages": "inspect_packages",
         }
         if new_name := pipeline_old_names_mapping.get(pipeline_name):
             warnings.warn(
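The hunk above only shows part of get_new_pipeline_name. As a rough, standalone sketch of how such an old-name lookup could behave (the function name resolve_pipeline_name, the warning text, and the DeprecationWarning category are assumptions, not taken from the commit; only the mapping entries visible in the hunk are reproduced):

```python
import warnings

# Mapping entries copied from the visible part of the hunk; the real
# mapping in scanpipe/apps.py may contain more entries above line 178.
PIPELINE_OLD_NAMES = {
    "inspect_manifest": "inspect_packages",
    "deploy_to_develop": "map_deploy_to_develop",
    "scan_package": "scan_single_package",
    "scan_codebase_packages": "inspect_packages",
}


def resolve_pipeline_name(pipeline_name):
    """Return the current pipeline name, warning when an old name is used."""
    if new_name := PIPELINE_OLD_NAMES.get(pipeline_name):
        # Assumed warning category and message, for illustration only.
        warnings.warn(
            f"Pipeline name {pipeline_name!r} is deprecated, "
            f"use {new_name!r} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    # Names that are not in the mapping are passed through unchanged.
    return pipeline_name
```

A lookup of this shape would let callers of the REST API and CLI keep using scan_codebase_packages while the web UI lists only the new names, as the CHANGELOG entry describes.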
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# Generated by Django 5.0.1 on 2024-02-09 15:05
+
+from django.db import migrations
+
+
+pipeline_old_names_mapping = {
+    "scan_codebase_packages": "inspect_packages",
+}
+
+
+def rename_pipelines_data(apps, schema_editor):
+    Run = apps.get_model("scanpipe", "Run")
+    for old_name, new_name in pipeline_old_names_mapping.items():
+        Run.objects.filter(pipeline_name=old_name).update(pipeline_name=new_name)
+
+
+def reverse_rename_pipelines_data(apps, schema_editor):
+    Run = apps.get_model("scanpipe", "Run")
+    for old_name, new_name in pipeline_old_names_mapping.items():
+        Run.objects.filter(pipeline_name=new_name).update(pipeline_name=old_name)
+
+
+class Migration(migrations.Migration):
+    dependencies = [
+        ("scanpipe", "0052_run_selected_groups"),
+    ]
+
+    operations = [
+        migrations.RunPython(
+            rename_pipelines_data,
+            reverse_code=reverse_rename_pipelines_data,
+        ),
+    ]
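To see what the forward and reverse functions of this data migration do to run names, here is a small in-memory sketch that needs no Django. The mapping is taken from the migration; the helper names and the sample list of run names are made up for illustration.

```python
# Mapping copied from the migration; everything else here is illustrative.
PIPELINE_OLD_NAMES = {"scan_codebase_packages": "inspect_packages"}


def rename(run_names, mapping):
    """Forward step: replace every old name with its new name."""
    return [mapping.get(name, name) for name in run_names]


def reverse_rename(run_names, mapping):
    """Reverse step: replace every new name with the old name."""
    reverse = {new: old for old, new in mapping.items()}
    return [reverse.get(name, name) for name in run_names]


# Hypothetical run names stored before the migration is applied.
runs = ["scan_codebase_packages", "inspect_packages", "scan_codebase"]

migrated = rename(runs, PIPELINE_OLD_NAMES)
rolled_back = reverse_rename(migrated, PIPELINE_OLD_NAMES)
```

In this sketch, a run that was already named inspect_packages before the forward step also comes back as scan_codebase_packages after the reverse step, because both steps filter on the name alone. The reverse function in the migration filters the same way, so rolling back may not restore the exact pre-migration names.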

scanpipe/pipelines/inspect_packages.py

Lines changed: 24 additions & 60 deletions
@@ -21,32 +21,23 @@
 # Visit https://github.com/nexB/scancode.io for support and download.
 
 from scanpipe.pipelines.scan_codebase import ScanCodebase
-from scanpipe.pipes import resolve
-from scanpipe.pipes import update_or_create_package
+from scanpipe.pipes import scancode
 
 
 class InspectPackages(ScanCodebase):
     """
-    Inspect a codebase manifest files and resolve their associated packages.
+    Inspect a codebase for packages and pre-resolved dependencies.
 
-    Supports resolved packages for:
-    - Python: using nexB/python-inspector, supports requirements.txt and
-      setup.py manifests as input
+    This pipeline inspects a codebase for application packages
+    and their dependencies using package manifests and dependency
+    lockfiles. It does not resolve dependencies, it does instead
+    collect already pre-resolved dependencies from lockfiles, and
+    direct dependencies (possibly not resolved) as found in
+    package manifests' dependency sections.
 
-    Supports:
-    - BOM: SPDX document, CycloneDX BOM, AboutCode ABOUT file
-    - Python: requirements.txt, setup.py, setup.cfg, Pipfile.lock
-    - JavaScript: yarn.lock lockfile, npm package-lock.json lockfile
-    - Java: Java JAR MANIFEST.MF, Gradle build script
-    - Ruby: RubyGems gemspec manifest, RubyGems Bundler Gemfile.lock
-    - Rust: Rust Cargo.lock dependencies lockfile, Rust Cargo.toml package manifest
-    - PHP: PHP composer lockfile, PHP composer manifest
-    - NuGet: nuspec package manifest
-    - Dart: pubspec manifest, pubspec lockfile
-    - OS: FreeBSD compact package manifest, Debian installed packages database
-
-    Full list available at https://scancode-toolkit.readthedocs.io/en/
-    doc-update-licenses/reference/available_package_parsers.html
+    See documentation for the list of supported package manifests and
+    dependency lockfiles:
+    https://scancode-toolkit.readthedocs.io/en/stable/reference/available_package_parsers.html
     """
 
     @classmethod
@@ -55,46 +46,19 @@ def steps(cls):
             cls.copy_inputs_to_codebase_directory,
             cls.extract_archives,
             cls.collect_and_create_codebase_resources,
+            cls.flag_empty_files,
             cls.flag_ignored_resources,
-            cls.get_manifest_inputs,
-            cls.get_packages_from_manifest,
-            cls.create_resolved_packages,
+            cls.scan_for_application_packages,
         )
 
-    def get_manifest_inputs(self):
-        """Locate all the manifest files from the project's input/ directory."""
-        self.manifest_resources = resolve.get_manifest_resources(self.project)
-
-    def get_packages_from_manifest(self):
-        """Get packages data from manifest files."""
-        self.resolved_packages = []
-
-        if not self.manifest_resources.exists():
-            self.project.add_warning(
-                description="No manifests found for resolving packages",
-                model="get_packages_from_manifest",
-            )
-            return
-
-        for resource in self.manifest_resources:
-            if packages := resolve.resolve_packages(resource.location):
-                self.resolved_packages.extend(packages)
-            else:
-                self.project.add_error(
-                    description="No packages could be resolved for",
-                    model="get_packages_from_manifest",
-                    details={"path": resource.path},
-                )
-
-    def create_resolved_packages(self):
-        """Create the resolved packages and their dependencies in the database."""
-        for package_data in self.resolved_packages:
-            package_data = resolve.set_license_expression(package_data)
-            dependencies = package_data.pop("dependencies", [])
-            update_or_create_package(self.project, package_data)
-
-            for dependency_data in dependencies:
-                resolved_package = dependency_data.get("resolved_package")
-                if resolved_package:
-                    resolved_package.pop("dependencies", [])
-                    update_or_create_package(self.project, resolved_package)
+    def scan_for_application_packages(self):
+        """
+        Scan resources for package information to add DiscoveredPackage
+        and DiscoveredDependency objects from detected package data.
+        """
+        # `assemble` is set to False because here in this pipeline we
+        # only detect package_data in resources and create
+        # Package/Dependency instances directly instead of assembling
+        # the packages and assigning files to them
+        scancode.scan_for_application_packages(self.project, assemble=False)
+        scancode.process_package_data(self.project)
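The pipelines in this commit declare their work as an ordered tuple returned by a steps() classmethod. The following toy sketch shows one way such a declaration could be executed in order. MiniPipeline and MiniInspectPackages are hypothetical stand-ins, not ScanCode.io's Pipeline base class, and the step bodies are placeholders.

```python
class MiniPipeline:
    """Toy base class: runs the callables returned by steps() in order."""

    def __init__(self):
        self.executed = []

    @classmethod
    def steps(cls):
        raise NotImplementedError

    def execute(self):
        for step in self.steps():
            # Each step is a plain function looked up on the class,
            # so the instance is passed explicitly.
            step(self)
            self.executed.append(step.__name__)
        return self.executed


class MiniInspectPackages(MiniPipeline):
    """Placeholder steps named after two of the steps in the diff above."""

    @classmethod
    def steps(cls):
        return (
            cls.flag_empty_files,
            cls.scan_for_application_packages,
        )

    def flag_empty_files(self):
        self.flagged = True

    def scan_for_application_packages(self):
        # Relies on the earlier step having run, to show ordering matters.
        self.scanned = self.flagged


pipeline = MiniInspectPackages()
order = pipeline.execute()
```

Keeping the step list as data makes changes like this commit's (adding flag_empty_files, swapping three steps for one) a matter of editing the tuple rather than the control flow.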

scanpipe/pipelines/scan_codebase_packages.py renamed to scanpipe/pipelines/load_sbom.py

Lines changed: 30 additions & 12 deletions
@@ -21,15 +21,18 @@
 # Visit https://github.com/nexB/scancode.io for support and download.
 
 from scanpipe.pipelines.scan_codebase import ScanCodebase
-from scanpipe.pipes import scancode
+from scanpipe.pipes import resolve
 
 
-class ScanCodebasePackages(ScanCodebase):
+class LoadSBOM(ScanCodebase):
     """
-    Scan a codebase for PURLs without assembling full packages/dependencies.
+    Load package data from one or more SBOMs.
 
-    This Pipeline is intended for gathering PURL information from a
-    codebase without the overhead of full package assembly.
+    Supported SBOMs:
+    - SPDX document
+    - CycloneDX BOM
+    Other formats:
+    - AboutCode .ABOUT files for package curations.
     """
 
     @classmethod
@@ -40,12 +43,27 @@ def steps(cls):
             cls.collect_and_create_codebase_resources,
             cls.flag_empty_files,
             cls.flag_ignored_resources,
-            cls.scan_for_application_packages,
+            cls.get_sbom_inputs,
+            cls.get_packages_from_sboms,
+            cls.create_packages_from_sboms,
         )
 
-    def scan_for_application_packages(self):
-        """Scan unknown resources for packages information."""
-        # `assemble` is set to False because here in this pipeline we
-        # only detect package_data in resources without creating
-        # Package/Dependency instances, to get all the purls from a codebase.
-        scancode.scan_for_application_packages(self.project, assemble=False)
+    def get_sbom_inputs(self):
+        """Locate all the SBOMs among the codebase resources."""
+        self.manifest_resources = resolve.get_manifest_resources(self.project)
+
+    def get_packages_from_sboms(self):
+        """Get packages data from SBOMs."""
+        self.packages = resolve.get_packages(
+            project=self.project,
+            package_registry=resolve.sbom_registry,
+            manifest_resources=self.manifest_resources,
+            model="get_packages_from_sboms",
+        )
+
+    def create_packages_from_sboms(self):
+        """Create the packages and dependencies from the SBOM, in the database."""
+        resolve.create_packages_and_dependencies(
+            project=self.project,
+            packages=self.packages,
+        )
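The body of resolve.get_packages is not part of this page, so the following is only a guess at the general shape of a registry-driven collector. It is modeled on the removed get_packages_from_manifest step, which extended a list per resource and recorded an error when nothing was resolved. The suffix-keyed registry, the resolver functions, and the report_error callback are all hypothetical.

```python
def get_packages(resource_paths, registry, report_error):
    """Collect package dicts from each path, reporting paths that yield none."""
    packages = []
    for path in resource_paths:
        # Pick the first resolver whose suffix matches; None if unsupported.
        resolver = next(
            (fn for suffix, fn in registry.items() if path.endswith(suffix)),
            None,
        )
        found = resolver(path) if resolver else []
        if found:
            packages.extend(found)
        else:
            report_error(f"No packages could be resolved for {path}")
    return packages


# Hypothetical resolvers returning canned data instead of parsing files.
toy_sbom_registry = {
    ".spdx.json": lambda path: [{"name": "from-spdx", "source": path}],
    ".cdx.json": lambda path: [{"name": "from-cyclonedx", "source": path}],
}

errors = []
packages = get_packages(
    ["bom.spdx.json", "bom.cdx.json", "notes.txt"],
    toy_sbom_registry,
    errors.append,
)
```

Passing the registry as an argument is what would allow one collector function to serve both load_sbom and resolve_dependencies with different sets of resolvers.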

scanpipe/pipelines/populate_purldb.py

Lines changed: 0 additions & 25 deletions
@@ -22,7 +22,6 @@
 
 from scanpipe.pipelines import Pipeline
 from scanpipe.pipes import purldb
-from scanpipe.pipes import scancode
 
 
 class PopulatePurlDB(Pipeline):
@@ -36,7 +35,6 @@ def steps(cls):
         return (
             cls.populate_purldb_with_discovered_packages,
             cls.populate_purldb_with_discovered_dependencies,
-            cls.populate_purldb_with_detected_purls,
         )
 
     def populate_purldb_with_discovered_packages(self):
@@ -50,26 +48,3 @@ def populate_purldb_with_discovered_dependencies(self):
         purldb.populate_purldb_with_discovered_dependencies(
             project=self.project, logger=self.log
         )
-
-    def populate_purldb_with_detected_purls(self):
-        """Add DiscoveredPackage to PurlDB."""
-        no_packages_and_no_dependencies = all(
-            [
-                not self.project.discoveredpackages.exists(),
-                not self.project.discovereddependencies.exists(),
-            ]
-        )
-        # Even when there are no packages/dependencies, resource level
-        # package data could be detected (i.e. when we detect packages,
-        # but skip the assembly step that creates
-        # package/dependency instances)
-        if no_packages_and_no_dependencies:
-            packages = scancode.get_packages_with_purl_from_resources(self.project)
-            purls = [{"purl": package.purl} for package in packages]
-
-            self.log(f"Populating PurlDB with {len(purls):,d} " "detected PURLs"),
-            purldb.feed_purldb(
-                packages=purls,
-                chunk_size=100,
-                logger=self.log,
-            )
