
Allow admins to customize Content Safety violation messages with Markdown support #989

Description

@paullizer

Summary

Admins should be able to customize the user-facing language shown when a chat message is blocked by Content Safety. The customized message should support Markdown so organizations can provide clearer policy language, links to internal acceptable-use policies, escalation instructions, or localized wording.

Today, the blocked-message text is hard-coded in the chat backend and is used whenever Azure Content Safety blocks a user message. The chat UI already renders safety messages through the existing Markdown + DOMPurify path, so this feature should expose the message as an admin-configurable setting while preserving sanitized rendering.

User Value

Organizations often need their own compliance, legal, HR, or security wording for safety violations. A configurable message lets admins:

  • Match the organization's tone and policy language.
  • Add links to internal acceptable-use or responsible AI policies.
  • Explain what the user should do next.
  • Support environment-specific or localized wording without code changes.

Proposed Behavior

Add an Admin Settings field under the existing Content Safety section that lets admins configure the blocked-message template.

The template should:

  • Support Markdown.
  • Have a safe default that matches or improves on the current behavior.
  • Allow approved placeholders for violation context, such as:
    • {reasons}
    • {triggered_categories}
    • {blocklist_matches}
    • {violation_link}
  • Be used when a user message is blocked by Content Safety in chat.
  • Fall back to the default message if the admin-provided template is blank or malformed (for example, if it references a placeholder that is not on the approved list).
  • Render safely in the chat UI using the existing local Markdown and sanitization assets.
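A minimal sketch of the substitution and fallback rules above, assuming the backend fills the template with Python's `str.format`. The names `DEFAULT_VIOLATION_TEMPLATE`, `ALLOWED_PLACEHOLDERS` and `render_violation_message` are hypothetical, and the default copy only approximates today's hard-coded message. Because `str.format` also supports attribute access, indexing, conversions and format specs, the sketch parses the template with `string.Formatter().parse` and accepts only bare, approved placeholder names; anything else, including unbalanced braces, falls back to the default.

```python
import string

# Hypothetical default that approximates the current hard-coded message.
DEFAULT_VIOLATION_TEMPLATE = (
    "Your message was blocked by Content Safety.\n\n"
    "**Reason:** {reasons}\n\n"
    "**Triggered categories:** {triggered_categories}\n\n"
    "**Blocklist matches:** {blocklist_matches}"
)

# Only these placeholder names may appear in an admin-provided template.
ALLOWED_PLACEHOLDERS = {
    "reasons",
    "triggered_categories",
    "blocklist_matches",
    "violation_link",
}


def _placeholders_are_allowed(template: str) -> bool:
    """Return True if every {field} is a bare, allowlisted placeholder name."""
    try:
        for _literal, field, format_spec, conversion in string.Formatter().parse(template):
            if field is None:
                continue  # this chunk is literal text only
            # Reject positional fields ({}), attribute/index access
            # ({reasons.x}, {reasons[0]}), format specs and conversions.
            if field not in ALLOWED_PLACEHOLDERS or format_spec or conversion:
                return False
    except ValueError:
        # Unbalanced braces such as "{reasons" or a single "}".
        return False
    return True


def render_violation_message(template, context: dict) -> str:
    """Fill the admin template, falling back to the default if it is unusable."""
    values = {name: str(context.get(name, "")) for name in ALLOWED_PLACEHOLDERS}
    if not isinstance(template, str) or not template.strip():
        return DEFAULT_VIOLATION_TEMPLATE.format(**values)
    if not _placeholders_are_allowed(template):
        return DEFAULT_VIOLATION_TEMPLATE.format(**values)
    return template.format(**values)
```

Admins can still write literal braces by doubling them (`{{` and `}}`), which is standard `str.format` behavior. Escaping the placeholder values themselves is a separate step that would run before they are passed in as `context`.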

Acceptance Criteria

  • Admin Settings includes a Content Safety violation message template field in the existing Content Safety section.
  • The setting is persisted with the rest of app settings and has a documented default value.
  • When Content Safety blocks a user message, the persisted safety chat message uses the configured template instead of the hard-coded copy.
  • Markdown in the configured message renders in the chat UI for both newly blocked messages and previously loaded safety messages.
  • Rendered Markdown is sanitized before insertion into the DOM.
  • The implementation does not add CDN-hosted JavaScript or frontend runtime dependencies.
  • Placeholder values are escaped/sanitized and cannot inject unsafe HTML or scripts.
  • Blank or malformed templates fall back to the default violation message.
  • Functional test coverage verifies default behavior, customized Markdown behavior, placeholder substitution, and fallback behavior.
  • Admin/settings documentation is updated to describe the template and supported placeholders.
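For the requirement that placeholder values cannot inject HTML or scripts, one option is to escape each value before it is substituted, so that text taken from a blocked message or blocklist (for example `<script>` or `[x](javascript:...)`) renders as literal text. `escape_placeholder_value` is a hypothetical helper, not existing code: it HTML-escapes `&`, `<` and `>` with `html.escape(..., quote=False)`, collapses newlines so a value cannot start a new Markdown block, and backslash-escapes Markdown punctuation, which CommonMark permits for any ASCII punctuation character. DOMPurify would still sanitize the rendered output as a second layer.

```python
import html
import re

# Markdown punctuation to backslash-escape inside placeholder values.
_MARKDOWN_SPECIAL = re.compile(r"([\\`*_{}\[\]()#+\-.!|~])")


def escape_placeholder_value(value) -> str:
    """Make a placeholder value render as literal text inside Markdown."""
    text = "" if value is None else str(value)
    # Collapse all whitespace runs (including newlines) into single spaces
    # so a value cannot open a heading, list or code block.
    text = " ".join(text.split())
    # Escape &, < and > so raw HTML tags never reach the Markdown renderer.
    text = html.escape(text, quote=False)
    # Backslash-escape Markdown punctuation so links, images and emphasis
    # cannot be formed from the value.
    return _MARKDOWN_SPECIAL.sub(r"\\\1", text)
```

`quote=False` is deliberate: escaping quotes would produce entities such as `&#x27;`, and the Markdown step would then escape the `#` and break the entity.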

Notes

Relevant implementation areas:

  • Backend hard-coded blocked message: application/single_app/route_backend_chats.py
  • Safety message persistence helper: application/single_app/route_backend_chats.py
  • Default settings: application/single_app/functions_settings.py
  • Admin Content Safety settings UI: application/single_app/templates/admin_settings.html
  • Admin settings JS behavior: application/single_app/static/js/admin/admin_settings.js
  • Safety message rendering path: application/single_app/static/js/chat/chat-messages.js
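To keep the setting persisted with the rest of the app settings and give it a documented default, a sketch along these lines could sit with the other defaults in `functions_settings.py`. The key name `content_safety_violation_message_template` and both helpers are assumptions for illustration; the actual settings loader may already merge defaults in its own way. The second helper normalizes the admin textarea value, since line breaks may arrive as `\r\n` (for example, when the field is submitted as a regular HTML form).

```python
from typing import Any, Dict, Optional

# Hypothetical setting key and default copy.
TEMPLATE_KEY = "content_safety_violation_message_template"
DEFAULT_VIOLATION_TEMPLATE = (
    "Your message was blocked by Content Safety.\n\n"
    "**Reason:** {reasons}\n\n"
    "**Triggered categories:** {triggered_categories}\n\n"
    "**Blocklist matches:** {blocklist_matches}"
)

DEFAULT_SETTINGS: Dict[str, Any] = {
    TEMPLATE_KEY: DEFAULT_VIOLATION_TEMPLATE,
}


def merge_with_defaults(persisted: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return persisted settings with any missing keys filled from the defaults.

    Settings saved before this feature existed have no template key, so they
    pick up the documented default instead of raising KeyError later.
    """
    merged = dict(DEFAULT_SETTINGS)
    merged.update(persisted or {})
    return merged


def clean_template_from_form(raw: Optional[str]) -> str:
    """Normalize the admin textarea value before saving it."""
    text = (raw or "").replace("\r\n", "\n").replace("\r", "\n")
    # Whitespace-only input is stored as "" so the chat backend can fall
    # back to the default message when it renders a blocked message.
    return text.strip()
```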

The current implementation builds a blocked message like:

  • Your message was blocked by Content Safety.
  • Reason
  • Triggered categories
  • Blocklist matches
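The current message lists a reason, the triggered categories and the blocklist matches. Assuming the backend has these as lists of strings when Content Safety blocks a message, a hypothetical helper like `build_violation_context` could flatten them into the string values the placeholders expect. Deduplicating and showing `None` for an empty list are presentation choices made for this sketch, not a description of today's output.

```python
from typing import Dict, Iterable, Optional


def build_violation_context(
    reasons: Iterable[str],
    triggered_categories: Iterable[str],
    blocklist_matches: Iterable[str],
    violation_link: Optional[str] = None,
) -> Dict[str, str]:
    """Flatten Content Safety results into placeholder values for a template."""

    def join(items: Iterable[str]) -> str:
        # Drop empty entries and duplicates, keeping first-seen order.
        unique = list(dict.fromkeys(str(item) for item in items if item))
        return ", ".join(unique) if unique else "None"

    return {
        "reasons": join(reasons),
        "triggered_categories": join(triggered_categories),
        "blocklist_matches": join(blocklist_matches),
        "violation_link": violation_link or "",
    }
```

Per the acceptance criteria, each value would still be escaped before it is substituted into an admin template.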

The existing chat UI already routes safety messages through marked and DOMPurify, so this feature should reuse that local sanitized Markdown rendering path rather than introducing new frontend assets.

Open Questions

  • Should admins be able to preview the rendered Markdown in Admin Settings?
  • Should placeholders be required to preserve violation details, or should admins be allowed to hide category/blocklist details from end users?
  • Should the message support multiple templates in the future, such as separate templates for category severity blocks vs. blocklist matches?
  • Should there be a maximum template length?

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    Status
    Pending Evaluation

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
