Skip to content

Conversation

@Sanktrip
Copy link
Collaborator

  • Users are now able to configure PII settings

Closes #220

Sankalpa Tripathee and others added 30 commits January 21, 2026 09:41
Implemented regex pattern functionality
@github-actions
Copy link
Contributor

Code Review: PII Settings Feature

Thank you for implementing the PII settings feature! This is a valuable addition that allows users to configure guardrails for personally identifiable information. Below is my detailed review.


🔴 Critical Issues (Must Fix)

1. Example Workflows Using Local References

Files: All example YAML files in examples/ directory
Issue: Changed from uses: h2oai/h2ogpte-action@v0.2.2-beta to uses: ./

This breaks the examples for external users who copy these workflows. The ./ reference only works within this repository during testing.

Required Change:

# Revert to:
uses: h2oai/h2ogpte-action@v0.2.2-beta
# Or update to the new version tag when released

2. Unsafe YAML Parsing Without Documentation

File: src/core/services/h2ogpte/h2ogpte.ts (Lines 436-439)
Issue: Commented out JSON parsing and replaced with yaml.load() without explanation

// This is commented out:
// const guardrailsSettingsPayload = JSON.parse(guardrailsSettings);

// But this is used instead:
const res = GuardRailsSchema.safeParse(yaml.load(guardrailsSettings));

Concerns:

  • Changes the expected input format from JSON to YAML without documenting why
  • yaml.load() can execute arbitrary code with untrusted input (security risk)
  • Commented code should be removed if not needed

Required Changes:

  1. Either remove the commented JSON.parse code or document why YAML is preferred
  2. Consider using yaml.safeLoad() instead of yaml.load() for security
  3. Update documentation to clarify that both JSON and YAML formats are supported

🟡 Major Issues (Should Fix)

1. Breaking API Change

File: src/core/data/utils/attachment-upload.ts
Issue: Function signature changed from uploadAttachmentsToH2oGPTe(attachmentUrlMap) returning string|null to uploadAttachmentsToH2oGPTe(collectionId, attachmentUrlMap) returning void

This is a breaking change. While the internal caller has been updated, consider:

  • Documenting this as a breaking change in release notes
  • Checking if any external code depends on this function

2. Poor Error Messages

File: src/core/services/h2ogpte/h2ogpte.ts (Lines 443-446)

if (!res.success) {
  throw new Error("Invalid guardrails settings");
}

Required Change:

if (!res.success) {
  throw new Error(`Invalid guardrails settings: ${res.error.message}`);
}

This provides users with actionable information about what's wrong with their configuration.

3. Missing Error Handling for Collection Cleanup

File: src/entrypoint.ts (Lines 70-77)

If createGuardRailsSettings() fails after createCollection() succeeds, the collection is left orphaned.

Suggested Fix:

let collectionId: string | null = null;
try {
  collectionId = await createCollection();
  
  await createGuardRailsSettings(
    collectionId,
    process.env.GUARDRAILS_SETTINGS,
  );
  
  await uploadAttachmentsToH2oGPTe(
    collectionId,
    githubData.attachmentUrlMap,
  );
} catch (error) {
  if (collectionId) {
    await deleteCollection(collectionId).catch(() => {
      // Log but don't throw - original error is more important
    });
  }
  throw error;
}

🔵 Minor Issues (Nice to Fix)

1. Unexplained Timeout Change

File: action.yml (Line 44)
Changed default from 3600s to 3400s without explanation. Consider reverting or documenting the reason.

2. Missing Null Check

File: src/entrypoint.ts (Line 71)
createCollection() could return null, but this isn't validated before use.

Suggested Fix:

collectionId = await createCollection();
if (!collectionId) {
  throw new Error("Failed to create collection");
}

3. Debug Logging Issue

File: src/core/services/h2ogpte/h2ogpte.ts (Line 449)

// Current (won't display object properly):
core.debug(`Guardrails settings payload: ${guardrailsSettingsPayload}`);

// Better:
core.debug(`Guardrails settings payload: ${JSON.stringify(guardrailsSettingsPayload)}`);

4. Strict Schema May Cause Future Issues

File: src/core/services/h2ogpte/types.ts (Line 125)

Using .strict() means any extra properties will fail validation. If the API adds new fields in the future, this will break. Consider:

  • Using .passthrough() to allow extra fields
  • Or documenting that only these exact fields are allowed

5. Example Workflows Need Documentation

Files: All example YAML files
The hardcoded guardrails settings lack comments explaining what each setting does. Add inline comments like:

guardrails_settings: |
  {
    "presidio_labels_to_flag": ["CREDIT_CARD", "IBAN_CODE"],  # Flag credit cards and bank codes
    "pii_labels_to_flag": ["ACCOUNTNUMBER", "CREDITCARDNUMBER"],  # Flag account numbers
    "pii_detection_parse_action": "redact",  # Redact PII during parsing
    "pii_detection_llm_input_action": "redact",  # Redact PII before sending to LLM
    "pii_detection_llm_output_action": "redact"  # Redact PII in LLM responses
  }

✅ Positive Aspects

  1. Excellent Test Coverage - 318 lines of comprehensive tests covering edge cases, error scenarios, and various configurations
  2. Good Use of Zod - Runtime type validation provides safety for API inputs
  3. Improved Modularity - Separating collection creation from attachment upload improves code organization
  4. Proper Dependencies - Added necessary packages (js-yaml, zod) with TypeScript definitions

Summary

Overall Assessment: This PR adds valuable PII protection functionality, but has several issues that need to be addressed before merging.

Priority Actions:

  1. ✅ Fix the example workflow references (critical for users)
  2. ✅ Address the YAML parsing security concern
  3. ✅ Improve error messages for better debugging
  4. ✅ Add error handling for collection cleanup

Stats:

  • 🔴 Critical Issues: 6
  • 🟡 Major Issues: 3
  • 🔵 Minor Issues: 10
  • ✅ Positive Aspects: 4

Please address the critical and major issues before merging. The minor issues can be addressed in follow-up PRs if needed. Great work on the comprehensive test coverage! 🎉

Copy link
Collaborator

@MillenniumForce MillenniumForce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Sankalpa!! I added some comments.

One final addition, would you be able to adjust the configuration docs and usage examples please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enterprise Settings: PII

2 participants