
Voice Input Node

The Voice Input node listens for the caller's speech, transcribes it using automatic speech recognition (ASR), and stores the resulting text in a session variable. It supports configurable timeouts, language selection, and custom vocabulary hints.

When to Use

  • You need to capture spoken input from a caller in a voice bot.
  • You want to transcribe free-form speech for processing by an LLM or condition node.
  • You need to capture a specific piece of information, such as a name or account number, by voice.
  • You want to use speech context hints to improve recognition accuracy for domain-specific terms.

Configuration

Property | Description | Default
ASR Integration | Optional override for the ASR integration. Leave empty to use the tenant default. | Empty
Output Variable | The session variable that stores the transcribed text. | voice_input
Language | The ASR language. Available: en-US, en-GB, es-ES, es-MX, fr-FR, de-DE, it-IT, pt-BR, ja-JP, zh-CN. | en-US
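As a minimal sketch, the basic settings can be modeled as a Python dataclass that applies the documented defaults and rejects languages outside the supported list. The class name `VoiceInputConfig` and its field names are hypothetical and not part of the platform's API; the node itself is configured through its config panel.

```python
from dataclasses import dataclass
from typing import Optional

# Languages listed in the Configuration table.
SUPPORTED_LANGUAGES = frozenset({
    "en-US", "en-GB", "es-ES", "es-MX", "fr-FR",
    "de-DE", "it-IT", "pt-BR", "ja-JP", "zh-CN",
})


@dataclass
class VoiceInputConfig:
    """Hypothetical model of the Voice Input node's basic settings."""

    # None stands for "leave empty and use the tenant default ASR integration".
    asr_integration: Optional[str] = None
    output_variable: str = "voice_input"
    language: str = "en-US"

    def __post_init__(self) -> None:
        # Reject languages the node does not list as available.
        if self.language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported ASR language: {self.language!r}")
        # The transcript must be stored somewhere.
        if not self.output_variable:
            raise ValueError("output_variable must not be empty")
```

For example, `VoiceInputConfig(language="es-MX")` keeps the tenant-default ASR integration and the `voice_input` variable while switching recognition to Mexican Spanish.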

Advanced Settings

Property | Description | Default
Listen Timeout | Maximum time to listen for speech (1,000 to 60,000 ms). | 10,000 ms (10 seconds)
Silence Timeout | How long to wait after the caller stops speaking before finalizing input (500 to 10,000 ms). | 2,000 ms (2 seconds)
Single Utterance | When enabled, stops listening after the first pause in speech. | true
Speech Context Hints | A list of words or phrases that boost recognition accuracy (for example, product names or account number formats). | Empty
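A hypothetical validator for the advanced settings, assuming the documented millisecond ranges are inclusive at both ends. `AdvancedVoiceSettings` and `_check_range` are illustrative names only, and the whitespace cleanup of hints is an assumption rather than documented platform behavior.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Documented ranges in milliseconds, assumed inclusive at both ends.
LISTEN_TIMEOUT_RANGE_MS: Tuple[int, int] = (1_000, 60_000)
SILENCE_TIMEOUT_RANGE_MS: Tuple[int, int] = (500, 10_000)


def _check_range(name: str, value: int, bounds: Tuple[int, int]) -> None:
    """Raise ValueError if value falls outside the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside {low}-{high} ms")


@dataclass
class AdvancedVoiceSettings:
    """Hypothetical model of the Voice Input node's advanced settings."""

    listen_timeout_ms: int = 10_000
    silence_timeout_ms: int = 2_000
    single_utterance: bool = True
    speech_context_hints: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        _check_range("listen_timeout_ms", self.listen_timeout_ms, LISTEN_TIMEOUT_RANGE_MS)
        _check_range("silence_timeout_ms", self.silence_timeout_ms, SILENCE_TIMEOUT_RANGE_MS)
        # Strip whitespace and drop blank hints (an assumption, not documented behavior).
        self.speech_context_hints = [
            hint.strip() for hint in self.speech_context_hints if hint.strip()
        ]
```

For example, `AdvancedVoiceSettings(listen_timeout_ms=15_000, speech_context_hints=["Acme Pro"])` is accepted, while a listen timeout of 500 ms is rejected because it falls below the 1,000 ms minimum.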
Voice Input configuration with language selector

Channel Behavior

Channel | Behavior
Voice | Listens for speech via the configured ASR provider.
Web / SMS / WhatsApp | Waits for a text message from the user (ASR is skipped).
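The channel split can be pictured as a single branch that always writes to the same output variable. In this sketch, the channel identifiers (`"voice"`, `"web"`, `"sms"`, `"whatsapp"`), the `run_voice_input` function, and the `transcribe` / `read_text_message` callbacks are hypothetical stand-ins for whatever the platform uses internally.

```python
from typing import Callable, MutableMapping

# Hypothetical identifiers for the text-based channels.
TEXT_CHANNELS = frozenset({"web", "sms", "whatsapp"})


def run_voice_input(
    channel: str,
    session: MutableMapping[str, str],
    output_variable: str,
    transcribe: Callable[[], str],
    read_text_message: Callable[[], str],
) -> None:
    """Store the user's input in session[output_variable] based on the channel."""
    if channel == "voice":
        # Voice calls: capture speech through the configured ASR provider.
        text = transcribe()
    elif channel in TEXT_CHANNELS:
        # Web, SMS, and WhatsApp: ASR is skipped and the user's text is used as-is.
        text = read_text_message()
    else:
        raise ValueError(f"unknown channel: {channel!r}")
    session[output_variable] = text
```

Because both branches write to the same variable, downstream nodes can read `voice_input` without caring which channel the conversation arrived on.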

Handles

Handle | Direction | Description
Input | In | Receives execution from the previous node.
Output | Out | Continues to the next node once input is captured or the listen timeout expires.

TIP

Add speech context hints for domain-specific vocabulary that ASR may struggle with, such as product names, medical terms, or unusual proper nouns. Hints can significantly improve transcription accuracy for these terms.

WARNING

If the caller does not speak within the listen timeout, the output variable will be empty. Add a Condition node after the Voice Input node that checks for an empty value and handles the timeout case, for example by re-prompting the caller.
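A minimal sketch of the check such a Condition node would perform, assuming the session behaves like a key-value mapping and that an unset or whitespace-only variable should be treated the same as an empty one. The function name `route_after_voice_input` and the branch labels `no_input` and `has_input` are hypothetical.

```python
from typing import Mapping


def route_after_voice_input(
    session: Mapping[str, str], output_variable: str = "voice_input"
) -> str:
    """Hypothetical Condition-node logic: pick the next branch after Voice Input."""
    transcript = session.get(output_variable, "")
    # Empty (or whitespace-only) means the caller did not speak before the listen timeout.
    if not transcript.strip():
        return "no_input"  # e.g. re-prompt the caller or hand off to an agent
    return "has_input"  # continue to the LLM or other processing branch
```

Treating a missing variable like an empty one keeps the routing safe even if the variable was never written.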

OmniBots AI Bot Platform