We need to document the matching algorithm interface for CSAF asset matching systems and CSAF SBOM matching systems. Both work similarly:
Input
match(product_tree, asset_database_connection, matching_threshold)
- resp. -
match(product_tree, sbom_database_connection, matching_threshold)
Output
for each product_id in product_tree:
a list of tuples (asset_id, probability, matching_reason)
- resp. -
for each product_id in product_tree:
a list of tuples (sbom_component_id, probability, matching_reason)
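One possible way to pin down the interface above in code is sketched below. It is only an illustration: the names Match, Matcher, target_id and database_connection are assumptions made for this sketch (target_id stands for either asset_id or sbom_component_id), and the product tree is typed loosely as a dictionary.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class Match:
    """One entry of the per-product result list (hypothetical type)."""

    target_id: str        # asset_id or sbom_component_id
    probability: float    # assumed to lie in [0.0, 1.0]
    matching_reason: str  # e.g. "serial_number", "cpe", "branches", "name"


class Matcher(Protocol):
    """Shape shared by the asset matcher and the SBOM matcher."""

    def match(
        self,
        product_tree: dict[str, Any],
        database_connection: Any,
        matching_threshold: float,
    ) -> dict[str, list[Match]]:
        """Return, for each product_id in product_tree, its list of matches."""
        ...
```

Typing both matchers against the same shape would let the invoking function treat asset and SBOM matching uniformly; only the database connection and the meaning of target_id differ.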
Note 1: The invoking function needs to store the context, as product IDs are local to a single CSAF document.
Note 2: This covers only the view from one CSAF document to one or many assets/SBOMs. The reverse view, from one asset/SBOM to the many CSAF documents matched against it, is also part of the requirements. It should be easy to generate that view once the matching is done, simply by querying all stored matches with the corresponding criteria.
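The two notes can be illustrated with a small sketch of how the invoking function might persist results. Everything here is an assumption made for illustration: the table layout, the document_id key used to identify a CSAF document, and the helper names store_matches and documents_for_target. SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# document_id is a hypothetical key for the CSAF document; product_id is only
# unique within that document, so both are stored together (Note 1).
conn.execute(
    """CREATE TABLE matches (
           document_id TEXT NOT NULL,
           product_id TEXT NOT NULL,
           target_id TEXT NOT NULL,
           probability REAL NOT NULL,
           matching_reason TEXT NOT NULL
       )"""
)


def store_matches(conn, document_id, result):
    """Persist one match() result together with its document context."""
    rows = [
        (document_id, product_id, target_id, probability, reason)
        for product_id, matches in result.items()
        for (target_id, probability, reason) in matches
    ]
    conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?, ?)", rows)


def documents_for_target(conn, target_id):
    """Reverse view (Note 2): all documents/products matched to one asset or SBOM component."""
    cur = conn.execute(
        "SELECT document_id, product_id, probability, matching_reason "
        "FROM matches WHERE target_id = ? "
        "ORDER BY probability DESC, document_id",
        (target_id,),
    )
    return cur.fetchall()


# Two different documents reuse the same document-local product_id.
store_matches(conn, "doc-a", {"CSAFPID-0001": [("asset-7", 0.95, "serial_number")]})
store_matches(conn, "doc-b", {"CSAFPID-0001": [("asset-7", 0.60, "name")]})
print(documents_for_target(conn, "asset-7"))
```

Because each stored row carries the document context, the same product_id from two documents stays distinguishable, and the many-documents-to-one-asset view is a plain selection over the stored matches.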
Matching
Algorithm - priorities:
- Match based on product_identification_helper. Different helpers might imply different confidence: an sbom_url or serial_number might be stronger than a cpe.
- Match based on the categorized strings (value of name) in the branches (e.g. vendor, product_name, product_version).
- Match on the human-readable full_product_name_t/name.
The algorithm may stop once it has produced a sufficient result; it can, but does not have to, go through all steps.
Edit: Experience shows that we also want two additions. The matching_threshold lets us fine-tune the lowest probability for which results are returned; a matching_threshold of 0 would return, for each asset/SBOM component, the probability with which it matched (which might be 0 if the two are completely different). The matching_reason provides insight into the confidence and helps debugging (a direct match on a serial number would potentially be better than a match on the human-readable string).
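A minimal sketch of the prioritized steps, the threshold and the reason might look as follows. It rests on several assumptions that are not part of the interface above: products are already flattened out of the product tree into dictionaries with the hypothetical keys helpers, branches and name; assets are plain dictionaries rather than a database connection; helper values are single strings; comparison is exact equality; and the weights are placeholders that would need tuning.

```python
# Hypothetical confidence per step, strongest helper first.
HELPER_WEIGHTS = {"serial_number": 1.0, "sbom_url": 0.95, "cpe": 0.8}
BRANCH_WEIGHT = 0.6
NAME_WEIGHT = 0.4
CATEGORIES = ("vendor", "product_name", "product_version")


def score(product, asset):
    """Return (probability, matching_reason) from the first step that hits."""
    # Step 1: product_identification_helper.
    for helper, weight in HELPER_WEIGHTS.items():
        value = product.get("helpers", {}).get(helper)
        if value is not None and value == asset.get(helper):
            return weight, helper
    # Step 2: categorized branch names must all be present and equal.
    branches = product.get("branches", {})
    if branches and all(
        branches.get(c) is not None and branches.get(c) == asset.get(c)
        for c in CATEGORIES
    ):
        return BRANCH_WEIGHT, "branches"
    # Step 3: human-readable full_product_name_t/name, compared case-insensitively.
    name = product.get("name")
    if name and name.casefold() == str(asset.get("name", "")).casefold():
        return NAME_WEIGHT, "name"
    return 0.0, "no_match"


def match_products(products, assets, matching_threshold):
    """Map each product_id to [(asset_id, probability, matching_reason), ...]."""
    result = {}
    for product in products:
        scored = [(asset["asset_id"], *score(product, asset)) for asset in assets]
        kept = [m for m in scored if m[1] >= matching_threshold]
        result[product["product_id"]] = sorted(kept, key=lambda m: m[1], reverse=True)
    return result


CPE = "cpe:2.3:a:example:controller:1.0:*:*:*:*:*:*:*"
products = [{
    "product_id": "CSAFPID-0001",
    "name": "Example Controller 1.0",
    "helpers": {"cpe": CPE},
    "branches": {"vendor": "Example", "product_name": "Controller", "product_version": "1.0"},
}]
assets = [
    {"asset_id": "A1", "cpe": CPE},
    {"asset_id": "A2", "name": "example controller 1.0"},
    {"asset_id": "A3", "name": "Other Device"},
]
print(match_products(products, assets, 0.5))
print(match_products(products, assets, 0.0))
```

With a threshold of 0.5 only the cpe match on A1 is returned; with a threshold of 0 every asset appears with its probability, including A3 with 0.0. Returning from score at the first hit is one way to realize "may stop once it has produced a sufficient result"; a real matcher might instead combine evidence from several steps.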