Voice Input Node
The Voice Input node listens for the caller's speech, transcribes it using automatic speech recognition (ASR), and stores the resulting text in a session variable. It supports configurable timeouts, language selection, and custom vocabulary hints.
When to Use
- You need to capture spoken input from a caller in a voice bot.
- You want to transcribe free-form speech for processing by an LLM or condition node.
- You need to capture a specific piece of information (name, account number) via voice.
- You want to use speech context hints to improve recognition accuracy for domain-specific terms.
Configuration
| Property | Description | Default |
|---|---|---|
| ASR Integration | Optional override for the ASR integration. Leave empty to use the tenant default. | Empty |
| Output Variable | The session variable to store the transcribed text. | voice_input |
| Language | The ASR language. Available: en-US, en-GB, es-ES, es-MX, fr-FR, de-DE, it-IT, pt-BR, ja-JP, zh-CN. | en-US |
Advanced Settings
| Property | Description | Default |
|---|---|---|
| Listen Timeout | Maximum time to listen for speech (1,000 to 60,000 ms). | 10000 (10 seconds) |
| Silence Timeout | How long to wait after the caller stops speaking before finalizing input (500 to 10,000 ms). | 2000 (2 seconds) |
| Single Utterance | Stop listening after the first pause in speech. | true |
| Speech Context Hints | A list of words or phrases to boost recognition accuracy (e.g., product names, account number formats). | Empty |
Figure: Voice Input node config panel showing the ASR integration selector, output variable field, language dropdown (en-US, en-GB, es-ES, and others), listen timeout and silence timeout inputs, and speech context hints list.
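The defaults and allowed ranges from the two tables above can be sketched as a small data structure. This is only an illustration: the class name `VoiceInputConfig`, its field names, and the `validate` helper are hypothetical and do not reflect the product's actual configuration schema. Only the default values, ranges, and language list are taken from the tables.

```python
from dataclasses import dataclass, field
from typing import List

# Language codes listed in the Configuration table.
SUPPORTED_LANGUAGES = {
    "en-US", "en-GB", "es-ES", "es-MX", "fr-FR",
    "de-DE", "it-IT", "pt-BR", "ja-JP", "zh-CN",
}


@dataclass
class VoiceInputConfig:
    """Hypothetical model of the node's settings (field names are assumptions)."""
    asr_integration: str = ""            # empty means "use the tenant default"
    output_variable: str = "voice_input"
    language: str = "en-US"
    listen_timeout_ms: int = 10_000      # documented range: 1,000 to 60,000 ms
    silence_timeout_ms: int = 2_000      # documented range: 500 to 10,000 ms
    single_utterance: bool = True
    speech_context_hints: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means every check passed."""
        problems = []
        if self.language not in SUPPORTED_LANGUAGES:
            problems.append(f"unsupported language: {self.language}")
        if not 1_000 <= self.listen_timeout_ms <= 60_000:
            problems.append("listen_timeout_ms must be between 1000 and 60000")
        if not 500 <= self.silence_timeout_ms <= 10_000:
            problems.append("silence_timeout_ms must be between 500 and 10000")
        return problems


# Example: a British English node with a shorter silence timeout and two hints.
config = VoiceInputConfig(
    language="en-GB",
    silence_timeout_ms=1_500,
    speech_context_hints=["ACME Pro", "account number"],
)
print(config.validate())  # prints []
```

Checking ranges up front, rather than at call time, makes an out-of-range timeout visible before the flow is ever run.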
Channel Behavior
| Channel | Behavior |
|---|---|
| Voice | Listens for speech via the configured ASR provider |
| Web / SMS / WhatsApp | Waits for a text message from the user (ASR is skipped) |
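The channel table amounts to a simple decision: voice channels go through ASR, text channels do not. A minimal sketch of that decision follows. The function name `uses_asr` and the lowercase channel identifiers are assumptions made for illustration, not names from the product.

```python
# Hypothetical channel identifiers; the product's real values may differ.
VOICE_CHANNELS = {"voice"}
TEXT_CHANNELS = {"web", "sms", "whatsapp"}


def uses_asr(channel: str) -> bool:
    """Return True if the node would listen via ASR on this channel."""
    name = channel.strip().lower()
    if name in VOICE_CHANNELS:
        return True
    if name in TEXT_CHANNELS:
        # On text channels the node waits for a typed message instead.
        return False
    raise ValueError(f"unknown channel: {channel!r}")
```

Either way, the result ends up in the same output variable, so downstream nodes need no channel-specific logic.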
Handles
| Handle | Direction | Description |
|---|---|---|
| Input | In | Receives execution from the previous node |
| Output | Out | Continues to the next node after speech is transcribed |
TIP
Add speech context hints for domain-specific vocabulary that ASR may struggle with, such as product names, medical terms, or unusual proper nouns. Hints can noticeably improve transcription accuracy for these terms.
WARNING
If the caller does not speak within the listen timeout, the output variable will be empty. Add a Condition node after the Voice Input node to handle the timeout case.
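The check performed by that Condition node can be sketched in a few lines. This is an illustration only: the session is modeled as a plain dictionary, and the function name `route_after_voice_input` and the branch labels `"continue"` and `"reprompt"` are hypothetical. Treating whitespace-only text as empty is also an assumption.

```python
def route_after_voice_input(session: dict, output_variable: str = "voice_input") -> str:
    """Pick the next branch based on whether any speech was captured."""
    # A missing, None, or blank value is treated as "the caller said nothing".
    text = (session.get(output_variable) or "").strip()
    return "continue" if text else "reprompt"


print(route_after_voice_input({"voice_input": ""}))             # prints reprompt
print(route_after_voice_input({"voice_input": "account 1234"}))  # prints continue
```

In a real flow, the reprompt branch would typically loop back to the Voice Input node with a limit on the number of retries.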
