Voice Input Node

The Voice Input node listens for the caller's speech, transcribes it using automatic speech recognition (ASR), and stores the resulting text in a session variable. It supports configurable timeouts, language selection, and custom vocabulary hints.

When to Use

You need to capture spoken input from a caller in a voice bot.
You want to transcribe free-form speech for processing by an LLM or condition node.
You need to capture a specific piece of information (name, account number) via voice.
You want to use speech context hints to improve recognition accuracy for domain-specific terms.

Configuration

Property	Description	Default
ASR Integration	Optional override for the ASR integration. Leave empty to use the tenant default.	Empty
Output Variable	The session variable in which the transcribed text is stored.	`voice_input`
Language	The ASR language. Available: en-US, en-GB, es-ES, es-MX, fr-FR, de-DE, it-IT, pt-BR, ja-JP, zh-CN.	`en-US`

Advanced Settings

Property	Description	Default
Listen Timeout	Maximum time to listen for speech (1,000 to 60,000 ms).	`10000` (10 seconds)
Silence Timeout	How long to wait after the caller stops speaking before finalizing input (500 to 10,000 ms).	`2000` (2 seconds)
Single Utterance	Stop listening after the first pause in speech.	`true`
Speech Context Hints	A list of words or phrases to boost recognition accuracy (e.g., product names, account number formats).	Empty

[Image: Voice Input node configuration panel showing the ASR integration selector, output variable field, language dropdown, listen timeout and silence timeout inputs, and speech context hints list]

Voice Input configuration with language selector
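
The documented defaults and ranges can be summarized as a small validation sketch. This is only an illustration: the `VoiceInputConfig` class, its field names, and the `validate` function are hypothetical and not part of the product; only the default values, the language list, and the millisecond ranges come from the tables above.

```python
from dataclasses import dataclass, field
from typing import List

# Languages listed in the Configuration table.
SUPPORTED_LANGUAGES = {
    "en-US", "en-GB", "es-ES", "es-MX", "fr-FR",
    "de-DE", "it-IT", "pt-BR", "ja-JP", "zh-CN",
}


@dataclass
class VoiceInputConfig:
    """Hypothetical container mirroring the documented properties and defaults."""
    asr_integration: str = ""            # empty means "use the tenant default"
    output_variable: str = "voice_input"
    language: str = "en-US"
    listen_timeout_ms: int = 10000       # documented range: 1,000 to 60,000 ms
    silence_timeout_ms: int = 2000       # documented range: 500 to 10,000 ms
    single_utterance: bool = True
    speech_context_hints: List[str] = field(default_factory=list)


def validate(config: VoiceInputConfig) -> List[str]:
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    if config.language not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language: {config.language}")
    if not 1000 <= config.listen_timeout_ms <= 60000:
        problems.append("listen timeout must be between 1,000 and 60,000 ms")
    if not 500 <= config.silence_timeout_ms <= 10000:
        problems.append("silence timeout must be between 500 and 10,000 ms")
    return problems
```

With the defaults, `validate(VoiceInputConfig())` returns an empty list, while a listen timeout of 500 ms is reported as out of range.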

Channel Behavior

Channel	Behavior
Voice	Listens for speech via the configured ASR provider
Web / SMS / WhatsApp	Waits for a text message from the user (ASR is skipped)
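
The channel split above might be pictured as a simple dispatch. This is a minimal sketch, not the platform's implementation: `capture_input`, the channel keys, and the `transcribe` and `wait_for_text` callables are all hypothetical, and the sketch assumes that text channels write to the same output variable as voice.

```python
VOICE_CHANNEL = "voice"
TEXT_CHANNELS = {"web", "sms", "whatsapp"}


def capture_input(channel, session, transcribe, wait_for_text,
                  output_variable="voice_input"):
    """Store the user's input in the session, choosing the source by channel.

    transcribe and wait_for_text are caller-supplied functions (hypothetical)
    that return the recognized speech or the received text message.
    """
    key = channel.lower()
    if key == VOICE_CHANNEL:
        text = transcribe()        # voice: run ASR
    elif key in TEXT_CHANNELS:
        text = wait_for_text()     # text channels: ASR is skipped
    else:
        raise ValueError(f"unknown channel: {channel}")
    # Assumption: both paths store their result in the same output variable.
    session[output_variable] = text
    return text
```

For example, calling `capture_input("SMS", session, transcribe, wait_for_text)` never invokes `transcribe`, so a flow built for voice can run unchanged on a text channel.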

Handles

Handle	Direction	Description
Input	In	Receives execution from the previous node
Output	Out	Continues to the next node after speech is transcribed

TIP

Add speech context hints for domain-specific vocabulary that ASR may struggle with, such as product names, medical terms, or unusual proper nouns. Hints can noticeably improve transcription accuracy for those terms.

WARNING

If the caller does not speak within the listen timeout, the output variable will be empty. Add a Condition node after the Voice Input node to handle the timeout case.
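
The check such a Condition node performs could look like the following sketch. The `next_step` function, the branch names, and the retry limit are hypothetical and shown only to illustrate the logic; treating whitespace-only text as empty is also an assumption.

```python
def next_step(session, output_variable="voice_input", attempts=0, max_attempts=2):
    """Decide which branch to take after a Voice Input node has run.

    Returns "continue" when speech was captured, "reprompt" when the
    variable is empty and retries remain, and "fallback" otherwise.
    """
    # Assumption: a missing variable or whitespace-only text counts as empty.
    text = (session.get(output_variable) or "").strip()
    if text:
        return "continue"
    if attempts < max_attempts:
        return "reprompt"
    return "fallback"
```

Capping the number of re-prompts keeps a silent caller from looping indefinitely; the fallback branch could, for instance, hand the call to an agent.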
