Document matching algorithm interface #593

@tschmidtb51

We need to document the matching algorithm interface for CSAF asset matching systems and CSAF SBOM matching systems. Both work similarly:

Input

match(product_tree, asset_database_connection, matching_threshold)

 - resp. - 
 
match(product_tree, sbom_database_connection, matching_threshold)

Output

for each product_id in product_tree:
   a list of tuples (asset_id, probability, matching_reason)

 - resp. - 

for each product_id in product_tree:
   a list of tuples (sbom_component_id, probability, matching_reason)
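A minimal Python sketch of the input/output shape described above. The names `Match`, `MatchResult` and `collect_product_ids` are hypothetical and only serve as illustration; the `match` body is a placeholder that returns an empty candidate list per product_id. The traversal covers only `full_product_names` and `branches`; products introduced via `relationships` are left out for brevity.

```python
from typing import Any, Dict, List, NamedTuple


class Match(NamedTuple):
    """One candidate for a product_id (hypothetical type name)."""
    candidate_id: str     # asset_id or sbom_component_id
    probability: float    # assumed to be in the range 0.0 .. 1.0
    matching_reason: str  # free-text explanation of why it matched


# Output shape: product_id -> list of (id, probability, matching_reason)
MatchResult = Dict[str, List[Match]]


def collect_product_ids(product_tree: Dict[str, Any]) -> List[str]:
    """Collect product_ids from full_product_names and nested branches."""
    ids = [p["product_id"] for p in product_tree.get("full_product_names", [])]

    def walk(branches: List[Dict[str, Any]]) -> None:
        for branch in branches:
            if "product" in branch:
                ids.append(branch["product"]["product_id"])
            walk(branch.get("branches", []))

    walk(product_tree.get("branches", []))
    return ids


def match(product_tree: Dict[str, Any],
          database_connection: Any,
          matching_threshold: float) -> MatchResult:
    """Placeholder with the proposed signature; no real matching happens."""
    return {pid: [] for pid in collect_product_ids(product_tree)}
```

The same signature would serve both variants; only the kind of database connection and the meaning of `candidate_id` differ.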

Note 1: The invoking function needs to store the context, as product IDs are local to a CSAF document.
Note 2: This covers only the case of one CSAF document matched against one or many assets/SBOMs. The view of many matched CSAF documents for one asset/SBOM is also part of the requirements. It should be easy to generate that view once the matching is done, just by selecting the corresponding query criteria over all matches.
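To illustrate both notes, here is a small in-memory sketch. It assumes the caller tags every match with an identifier of the CSAF document (for example the value of /document/tracking/id); `StoredMatch`, `store_matches` and `documents_for_asset` are hypothetical names, and a real system would most likely use a database query instead of a list.

```python
from typing import Dict, List, NamedTuple, Tuple


class StoredMatch(NamedTuple):
    document_id: str      # assumption: e.g. /document/tracking/id
    product_id: str       # only unique within that document
    asset_id: str
    probability: float
    matching_reason: str


def store_matches(store: List[StoredMatch],
                  document_id: str,
                  result: Dict[str, List[Tuple[str, float, str]]]) -> None:
    """Persist one document's match result together with its context."""
    for product_id, candidates in result.items():
        for asset_id, probability, reason in candidates:
            store.append(
                StoredMatch(document_id, product_id, asset_id, probability, reason)
            )


def documents_for_asset(store: List[StoredMatch], asset_id: str) -> List[str]:
    """Inverse view: all documents with at least one match for the asset."""
    return sorted({m.document_id for m in store if m.asset_id == asset_id})
```

Because product_id values such as "CSAFPID-0001" can repeat across documents, the pair (document_id, product_id) is what identifies a product in the store.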

Matching

Algorithm - priorities:

  1. Match based on product_identification_helper. Different ones might imply a different confidence: An sbom_url or serial_number might be stronger than a cpe.
  2. Match based on the categorized strings (value of name) in the branches (e.g. vendor, product_name, product_version).
  3. Match on the human-readable full_product_name_t/name.

The algorithm may stop as soon as it has produced a sufficient result; it can, but does not have to, go through all steps.
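The priorities and the early exit could be sketched roughly as follows. Everything here is an assumption made for illustration: the flattened product and asset dictionaries (including the `branch_names` key), the confidence weights 1.0 / 0.9 / 0.8 / 0.6, the `sufficient` cut-off, and the use of `difflib` for the human-readable name comparison.

```python
from difflib import SequenceMatcher


def by_helper(product, asset):
    """Step 1: product_identification_helper (weights are assumptions)."""
    helper = product.get("product_identification_helper", {})
    if asset.get("serial_number") in helper.get("serial_numbers", []):
        return 1.0, "product_identification_helper: serial_number"
    if helper.get("cpe") and helper["cpe"] == asset.get("cpe"):
        return 0.9, "product_identification_helper: cpe"
    return 0.0, ""


def by_branches(product, asset):
    """Step 2: categorized branch names, e.g. vendor, product_name."""
    names = product.get("branch_names", {})  # hypothetical flattened form
    hits = [cat for cat, value in names.items() if asset.get(cat) == value]
    if not hits:
        return 0.0, ""
    return 0.8 * len(hits) / len(names), "branches: " + ", ".join(sorted(hits))


def by_name(product, asset):
    """Step 3: similarity of the human-readable names."""
    ratio = SequenceMatcher(
        None, product.get("name", "").lower(), asset.get("name", "").lower()
    ).ratio()
    return 0.6 * ratio, "full_product_name_t/name similarity"


STEPS = (by_helper, by_branches, by_name)


def match_product(product, assets, sufficient=0.9):
    """Run the steps in priority order; stop once a result is sufficient."""
    best = {}
    for step in STEPS:
        for asset in assets:
            probability, reason = step(product, asset)
            if probability > best.get(asset["id"], (0.0, ""))[0]:
                best[asset["id"]] = (probability, reason)
        if any(p >= sufficient for p, _ in best.values()):
            break  # early exit: later, weaker steps are skipped
    return sorted(
        ((aid, p, r) for aid, (p, r) in best.items()), key=lambda t: -t[1]
    )
```

In this sketch a serial number hit ends the run after the first step, while a product without identification helpers falls through to the branch and name comparisons.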

Edit: Experience shows that we also want two additions. The matching_threshold lets us fine-tune the lowest probability for which results are returned: a matching_threshold of 0 returns every asset/SBOM component together with its match probability (which might be 0 if the two are completely different). The matching_reason provides insight into the confidence and helps with debugging (a direct match on a serial number would potentially be better than a match on the human-readable string).
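A short sketch of how the matching_threshold might be applied to a candidate list. `apply_threshold` is a hypothetical helper name, and treating the threshold as inclusive (>=) is an assumption; it is what makes a threshold of 0 return every candidate, including those with probability 0.

```python
from typing import List, Tuple

Candidate = Tuple[str, float, str]  # (id, probability, matching_reason)


def apply_threshold(candidates: List[Candidate],
                    matching_threshold: float) -> List[Candidate]:
    """Keep candidates whose probability is at least the threshold."""
    return [c for c in candidates if c[1] >= matching_threshold]


candidates = [
    ("asset-1", 1.0, "serial_number"),
    ("asset-2", 0.4, "full_product_name_t/name"),
    ("asset-3", 0.0, "no similarity"),
]
strict = apply_threshold(candidates, 0.5)      # only asset-1 remains
everything = apply_threshold(candidates, 0.0)  # all three, including 0.0
```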
