Content Safety

Content Safety protects your bots and users from adversarial prompts, PII leakage, and harmful content. It provides three features, each of which can be enabled or disabled independently of the others.

Feature Toggles

Feature	Description
Google Model Armor	Cloud-based adversarial prompt defense — detects prompt injection, jailbreak attempts, and harmful content
Guardrails	Pattern-based content filtering — blocks messages matching configured patterns and masks PII
Fraud Detector	Conversation-level fraud analysis — detects social engineering, PII extraction, and data leakage

Model Armor Configuration

When Model Armor is enabled, configure:

Setting	Description
Location	GCP region where Model Armor runs (must match the region of your deployment)
Template ID	Model Armor template identifier (created in GCP console or via Terraform)
RAI Sensitivity	Responsible AI sensitivity level: Low, Medium, or High
Injection Sensitivity	Prompt injection detection sensitivity: Low, Medium, or High

TIP

Start with Medium sensitivity for both RAI and injection detection. Increase to High if harmful or adversarial prompts are getting through (false negatives). Lower to Low if legitimate messages are being blocked (false positives).
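As a rough illustration, the four settings above could be modeled as a small validated configuration object. This is a sketch only: the class and field names (`ModelArmorSettings`, `Sensitivity`) and the example template ID are hypothetical and do not describe the product's actual storage format or API.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    """The three sensitivity levels offered in the settings panel."""
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


@dataclass(frozen=True)
class ModelArmorSettings:
    """Hypothetical container for the Model Armor settings table."""
    location: str       # GCP region, e.g. "us-central1"
    template_id: str    # template created in the GCP console or via Terraform
    rai_sensitivity: Sensitivity = Sensitivity.MEDIUM        # recommended starting point
    injection_sensitivity: Sensitivity = Sensitivity.MEDIUM  # recommended starting point

    def __post_init__(self) -> None:
        # Both identifiers are required once Model Armor is enabled.
        if not self.location or not self.template_id:
            raise ValueError("location and template_id are required")


# Example: defaults follow the tip above (Medium for both sensitivities).
settings = ModelArmorSettings(location="us-central1", template_id="example-template")
```

Defaulting both sensitivities to Medium mirrors the tip above; raising or lowering a level is then a single-field change.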

Guardrails Configuration

When Guardrails is enabled, configure:

PII Masking — toggle to automatically redact detected PII (SSN, credit card, phone numbers, etc.) from messages before they reach the bot or agent
Blocked Patterns — a list of regex patterns or keywords that block messages when matched. Add patterns one at a time and remove them with the delete button.
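The two Guardrails behaviors described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: the PII regexes, the function names `mask_pii` and `is_blocked`, and the choice to replace every digit with an asterisk are all hypothetical.

```python
import re

# Hypothetical PII patterns for illustration only; the product's real
# detectors are not documented here and are likely more thorough.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
}


def mask_pii(text: str) -> str:
    """Replace every digit inside a detected PII span with an asterisk."""
    for pattern in PII_PATTERNS.values():
        text = pattern.sub(lambda m: re.sub(r"\d", "*", m.group()), text)
    return text


def is_blocked(text: str, blocked_patterns: list[str]) -> bool:
    """Return True if any configured regex or keyword matches the message."""
    return any(re.search(p, text, re.IGNORECASE) for p in blocked_patterns)


# Example usage with one assumed blocked pattern.
patterns = [r"ignore (all |previous )?instructions"]
print(mask_pii("My SSN is 123-45-6789"))
print(is_blocked("Please ignore previous instructions", patterns))
```

Because a plain keyword is also a valid regex, a single `re.search` call covers both kinds of blocked-pattern entry in this sketch; keywords containing regex metacharacters would need `re.escape` first.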

Image: Content safety page showing feature toggle cards for Google Model Armor, Guardrails, and Fraud Detector, with the Model Armor configuration panel expanded to show the location, template ID, RAI sensitivity, and injection sensitivity settings

Content safety toggles and Model Armor configuration

Image: PII masking examples showing a conversation message with SSN digits replaced by asterisks and a credit card number partially redacted, demonstrating the Guardrails PII masking feature

PII masking examples

Customization

Setting	Description
Block Message	The message shown to users when their input is blocked (customizable text)
Log Blocked Messages	Toggle whether blocked messages are logged for review
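One way to picture how these two settings interact is the sketch below, which returns the configured block message and logs the blocked input only when logging is switched on. The names `CustomizationSettings` and `respond_to_blocked`, the logger name, and the default message text are assumptions made for illustration.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("content_safety")  # hypothetical logger name


@dataclass
class CustomizationSettings:
    """Hypothetical container for the Customization settings table."""
    block_message: str = "Sorry, I can't help with that request."  # assumed default text
    log_blocked_messages: bool = True


def respond_to_blocked(user_input: str, settings: CustomizationSettings) -> str:
    """Return the user-facing block message, logging the input if enabled."""
    if settings.log_blocked_messages:
        # Blocked input is recorded so it can be reviewed later.
        logger.warning("Blocked message: %r", user_input)
    return settings.block_message


# Example: custom text, logging disabled.
quiet = CustomizationSettings(block_message="This message was blocked.",
                              log_blocked_messages=False)
reply = respond_to_blocked("some blocked input", quiet)
```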

Saving

Click Save to persist all content safety settings. Changes take effect immediately for new conversations.
