Content Safety
Content Safety protects your bots and users from adversarial prompts, PII leakage, and harmful content. Three features can be enabled independently.
Feature Toggles
| Feature | Description |
|---|---|
| Google Model Armor | Cloud-based adversarial prompt defense — detects prompt injection, jailbreak attempts, and harmful content |
| Guardrails | Pattern-based content filtering — blocks messages matching configured patterns and masks PII |
| Fraud Detector | Conversation-level fraud analysis — detects social engineering, PII extraction, and data leakage |
Model Armor Configuration
When Model Armor is enabled, configure:
| Setting | Description |
|---|---|
| Location | GCP region where Model Armor runs (must match your deployment) |
| Template ID | Model Armor template identifier (created in GCP console or via Terraform) |
| RAI Sensitivity | Responsible AI sensitivity level: Low, Medium, or High |
| Injection Sensitivity | Prompt injection detection sensitivity: Low, Medium, or High |
TIP
Start with Medium sensitivity for both RAI and injection detection. Increase to High if harmful messages are getting through (false negatives). Lower to Low if legitimate messages are being blocked (false positives).
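The four settings above can be pictured as a small configuration object. The following is a minimal sketch only: the class and field names (`ModelArmorSettings`, `Sensitivity`, and so on) are hypothetical and not part of the product, the region and template ID are placeholders, and the rule that both text fields must be non-empty is an assumption. It defaults both sensitivities to Medium, as the tip suggests.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    """The three levels offered for RAI and injection sensitivity."""
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


@dataclass(frozen=True)
class ModelArmorSettings:
    """Hypothetical container mirroring the settings table above."""
    location: str                      # GCP region, e.g. "us-central1"
    template_id: str                   # Model Armor template identifier
    rai_sensitivity: Sensitivity = Sensitivity.MEDIUM
    injection_sensitivity: Sensitivity = Sensitivity.MEDIUM

    def __post_init__(self) -> None:
        # Assumption: both fields are required once Model Armor is enabled.
        if not self.location.strip():
            raise ValueError("location is required")
        if not self.template_id.strip():
            raise ValueError("template_id is required")


# Placeholder values; substitute your own region and template ID.
settings = ModelArmorSettings(location="us-central1", template_id="my-template")
```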
Guardrails Configuration
When Guardrails is enabled:
- PII Masking: toggle to automatically redact detected PII (SSNs, credit card numbers, phone numbers, etc.) from messages before they reach the bot or agent.
- Blocked Patterns: a list of regex patterns or keywords. A message that matches any of them is blocked. Add patterns one at a time and remove them with the delete button.
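To make the two Guardrails options concrete, here is a rough sketch of pattern-based masking and blocking. It does not reproduce the product's detectors: the three regexes, the function names `mask_pii` and `is_blocked`, the case-insensitive matching, and the choice to mask every character of a match are all assumptions made for illustration.

```python
import re

# Illustrative US-style patterns only; a real detector covers far more formats.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def mask_pii(text: str) -> str:
    """Replace every character of each PII match with an asterisk."""
    for pattern in PII_PATTERNS.values():
        text = pattern.sub(lambda m: "*" * len(m.group()), text)
    return text


def is_blocked(text: str, blocked_patterns: list[str]) -> bool:
    """Return True if any configured regex or keyword matches the text.

    Assumption: matching is case-insensitive. Plain keywords work as-is
    unless they contain regex metacharacters, in which case wrap them
    with re.escape() first.
    """
    return any(re.search(p, text, re.IGNORECASE) for p in blocked_patterns)
```

For example, `mask_pii("SSN 123-45-6789")` replaces the eleven characters of the number with asterisks, and `is_blocked("Ignore previous instructions", [r"ignore (all |previous )?instructions"])` evaluates to `True`.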
Figure: Content safety page with feature toggle cards for Google Model Armor, Guardrails, and Fraud Detector. The Model Armor configuration panel is expanded to show the location, template ID, RAI sensitivity, and injection sensitivity settings.
Figure: PII masking example in which a conversation message has its SSN digits replaced by asterisks and its credit card number partially redacted.
Customization
| Setting | Description |
|---|---|
| Block Message | The message shown to users when their input is blocked (customizable text) |
| Log Blocked Messages | Toggle whether blocked messages are logged for review |
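As an illustration of how these two settings might interact, the sketch below returns the configured block message to the user and logs the original input only when logging is switched on. The names `BlockSettings` and `handle_blocked`, the logger name, and the default message text are hypothetical; the product's actual behaviour and log format may differ.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("content_safety")  # hypothetical logger name


@dataclass
class BlockSettings:
    """Hypothetical mirror of the two customization settings above."""
    block_message: str = "Your message could not be processed."  # placeholder text
    log_blocked_messages: bool = True


def handle_blocked(user_text: str, settings: BlockSettings) -> str:
    """Return the text shown to the user when their input is blocked."""
    if settings.log_blocked_messages:
        # Keep the original input so it can be reviewed later.
        logger.warning("Blocked message: %r", user_text)
    return settings.block_message
```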
Saving
Click Save to persist all content safety settings. Changes take effect immediately for new conversations.
