[Security] Fix CRITICAL vulnerability: V-001 #6219
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Security Fix
This PR addresses a CRITICAL severity vulnerability detected by our security scanner.
Security Impact Assessment
Evidence: Proof-of-Concept Exploitation Demo
This demonstration shows how the vulnerability could be exploited to help you understand its severity and prioritize remediation.
How This Vulnerability Can Be Exploited
The vulnerability in
mediapipe/model_maker/python/vision/object_detector/dataset_util.pyinvolves the use of MD5 hashing for data integrity checks, such as verifying the contents of training datasets or cached model artifacts. An attacker can exploit this by generating MD5 collisions—creating two different datasets (one legitimate, one malicious) that produce the same hash—allowing the malicious dataset to bypass validation and be used in model training, potentially leading to poisoned object detection models that misclassify objects in production applications. To demonstrate this, the following Python script uses known MD5 collision techniques (based on the 2004 Wang et al.The vulnerability in
mediapipe/model_maker/python/vision/object_detector/dataset_util.pyinvolves the use of MD5 hashing for data integrity checks, such as verifying the contents of training datasets or cached model artifacts. An attacker can exploit this by generating MD5 collisions—creating two different datasets (one legitimate, one malicious) that produce the same hash—allowing the malicious dataset to bypass validation and be used in model training, potentially leading to poisoned object detection models that misclassify objects in production applications.To demonstrate this, the following Python script uses known MD5 collision techniques (based on the 2004 Wang et al. attack) to create two colliding byte sequences. It then simulates interaction with MediaPipe's
dataset_util.pyby importing and using its hashing function (assuming it's exposed or can be invoked via the Model Maker API). In a real attack, an attacker could provide a malicious dataset file that collides with a legitimate one, tricking the system into accepting it during dataset preparation for object detector training. This PoC assumes the attacker has access to provide or modify dataset files (e.g., via a compromised pipeline or user input), which is plausible in ML workflows where datasets are uploaded or shared.Exploitation Impact Assessment
Vulnerability Details
V-001mediapipe/model_maker/python/vision/object_detector/dataset_util.pyChanges Made
This automated fix addresses the vulnerability by applying security best practices.
Files Modified
mediapipe/model_maker/python/core/utils/file_util.pymediapipe/model_maker/python/text/text_classifier/dataset.pymediapipe/model_maker/python/vision/object_detector/dataset_util.pyVerification
This fix has been automatically verified through:
🤖 This PR was automatically generated.