WebSocket Streaming
The Kallglot WebSocket API provides real-time bidirectional streaming for audio data, transcripts, and translations. Connect to the WebSocket URL provided when you create a session.
Connection
Connect to the WebSocket URL from the session creation response.
Authentication
Include the session token in the Authorization header or as a query parameter.
Message Format
All messages are JSON-encoded. Each message has a `type` field that indicates the message type.
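As a minimal sketch of the envelope described above, the helpers below serialize and parse messages and check for the `type` field. The function names `encode_message` and `decode_message` are illustrative and not part of the Kallglot API.

```python
import json


def encode_message(msg_type: str, **fields) -> str:
    """Serialize a message as JSON with the required `type` field."""
    return json.dumps({"type": msg_type, **fields})


def decode_message(raw: str) -> dict:
    """Parse a JSON message and verify that it carries a string `type`."""
    message = json.loads(raw)
    if not isinstance(message, dict) or not isinstance(message.get("type"), str):
        raise ValueError("message is missing a string 'type' field")
    return message
```

A receive loop can then branch on `decode_message(raw)["type"]` to route transcripts, translations, status events, and errors to separate handlers.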
Client → Server Messages
Audio Data
Send audio data to be processed:
| Field | Type | Description |
|---|---|---|
| `type` | string | Always `audio` |
| `data` | string | Base64-encoded audio data |
| `encoding` | string | Audio encoding: `pcm_16`, `pcm_8`, `mulaw`, `alaw` |
| `sample_rate` | number | Sample rate in Hz (8000, 16000, 24000, 48000) |
| `channels` | number | Number of channels (1 for mono, 2 for stereo) |
For best transcription quality, use 16-bit PCM at 16 kHz, mono. Send audio in chunks of 100 to 200 ms.
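The field table above can be turned into a small message builder. This is a sketch, not an official client: the function name `build_audio_message` is illustrative, and it assumes every field in the table is sent with each chunk.

```python
import base64
import json

SUPPORTED_ENCODINGS = {"pcm_16", "pcm_8", "mulaw", "alaw"}
SUPPORTED_SAMPLE_RATES = {8000, 16000, 24000, 48000}


def build_audio_message(audio: bytes, encoding: str = "pcm_16",
                        sample_rate: int = 16000, channels: int = 1) -> str:
    """Build a JSON `audio` message from raw audio bytes."""
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if channels not in (1, 2):
        raise ValueError("channels must be 1 (mono) or 2 (stereo)")
    return json.dumps({
        "type": "audio",
        # The payload is Base64 text, so it is safe inside JSON.
        "data": base64.b64encode(audio).decode("ascii"),
        "encoding": encoding,
        "sample_rate": sample_rate,
        "channels": channels,
    })
```

The defaults match the recommended format (16-bit PCM, 16 kHz, mono), so a call such as `build_audio_message(chunk)` is enough for the common case.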
Control Messages
| Action | Description |
|---|---|
| `mute` | Stop processing incoming audio |
| `unmute` | Resume processing incoming audio |
| `pause_transcription` | Stop generating transcripts |
| `resume_transcription` | Resume generating transcripts |
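The actions above could be sent with a helper like the following. The exact wire format of a control message is not shown on this page, so the envelope here is an assumption: a `type` of `control` with the action name in an `action` field. Check it against the API before relying on it.

```python
import json

CONTROL_ACTIONS = {"mute", "unmute", "pause_transcription", "resume_transcription"}


def build_control_message(action: str) -> str:
    """Build a control message for one of the documented actions."""
    if action not in CONTROL_ACTIONS:
        raise ValueError(f"unknown control action: {action!r}")
    # Assumption: control messages use type "control" and an "action" field.
    return json.dumps({"type": "control", "action": action})
```

Validating the action locally surfaces typos before a message is sent.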
End Stream
Signal that you’re done sending audio.
Server → Client Messages
Transcript
Real-time transcription results:
| Field | Type | Description |
|---|---|---|
| `type` | string | Always `transcript` |
| `id` | string | Segment identifier |
| `speaker` | string | Speaker identification (`agent`, `customer`, or channel ID) |
| `language` | string | Detected language code |
| `text` | string | Transcribed text |
| `is_final` | boolean | Whether this is the final version of this segment |
| `start_time` | number | Segment start time in seconds |
| `confidence` | number | Transcription confidence (0-1) |
Interim results (`is_final: false`) are sent as the user speaks. Final results replace interim results with the same `id`.
Translation
Translated text.
Audio Response
Translated speech audio (for bidirectional translation mode).
Status Events
Session status updates:
| Status | Description |
|---|---|
| `connecting` | Establishing provider connection |
| `connected` | Session is fully active |
| `provider_disconnected` | Phone call ended |
| `ending` | Session is ending |
| `ended` | Session has ended |
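One way to act on these statuses is to track the latest one and gate audio sending on it. This sketch assumes that a status event carries the value in a `status` field, which this page does not confirm. The `SessionState` class and the rule of sending audio only while `connected` are illustrative choices.

```python
KNOWN_STATUSES = ("connecting", "connected", "provider_disconnected", "ending", "ended")


class SessionState:
    """Remember the most recent session status reported by the server."""

    def __init__(self) -> None:
        self.status = None

    def apply(self, message: dict) -> None:
        # Assumption: the status value lives in a "status" field.
        status = message.get("status")
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        self.status = status

    @property
    def can_send_audio(self) -> bool:
        # "connected" is the only status described as fully active.
        return self.status == "connected"

    @property
    def is_closed(self) -> bool:
        return self.status == "ended"
```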
Error Events
Connection Example
Browser Example
Connection Lifecycle
Best Practices
Use appropriate audio chunk sizes
Send audio in 100-200ms chunks for optimal latency and transcription quality. Smaller chunks increase network overhead; larger chunks increase perceived latency.
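To translate a chunk duration into a byte count, multiply the sample rate, channel count, bytes per sample, and duration. The sketch below assumes 2 bytes per sample for `pcm_16` and 1 byte per sample for `pcm_8`, `mulaw`, and `alaw`. The helper names are illustrative.

```python
# Assumed sample widths: 16-bit PCM is 2 bytes; the 8-bit formats are 1 byte.
BYTES_PER_SAMPLE = {"pcm_16": 2, "pcm_8": 1, "mulaw": 1, "alaw": 1}


def chunk_size_bytes(sample_rate: int, channels: int = 1,
                     encoding: str = "pcm_16", chunk_ms: int = 100) -> int:
    """Return the number of bytes in `chunk_ms` milliseconds of audio."""
    return sample_rate * channels * BYTES_PER_SAMPLE[encoding] * chunk_ms // 1000


def split_into_chunks(audio: bytes, size: int) -> list:
    """Split audio into consecutive chunks of at most `size` bytes."""
    return [audio[i:i + size] for i in range(0, len(audio), size)]
```

For the recommended format (16-bit PCM, 16 kHz, mono), a 100 ms chunk is 3200 bytes and a 200 ms chunk is 6400 bytes.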
Handle interim results correctly
Interim results (`is_final: false`) should be displayed but may be revised. Replace them when you receive the final result with the same `id`.
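The replacement rule can be sketched as a store keyed by segment `id`, using the `id`, `text`, and `is_final` fields of transcript messages. The `TranscriptView` class is illustrative. Ignoring an interim update that arrives after a segment was finalized is a defensive choice made here, not documented behavior.

```python
class TranscriptView:
    """Keep the latest text per segment, in order of first appearance."""

    def __init__(self) -> None:
        self._segments = {}  # dicts preserve insertion order

    def apply(self, message: dict) -> None:
        current = self._segments.get(message["id"])
        # Defensive choice: never let a late interim overwrite a final result.
        if current is not None and current["is_final"] and not message["is_final"]:
            return
        self._segments[message["id"]] = {
            "text": message["text"],
            "is_final": message["is_final"],
        }

    def render(self) -> str:
        return " ".join(segment["text"] for segment in self._segments.values())
```

Because updating an existing key keeps its position, a revised segment stays in place in the rendered transcript instead of moving to the end.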
Implement reconnection logic
WebSocket connections may drop due to network issues. Reconnect with exponential backoff, then resume streaming.
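A minimal backoff sketch follows. The base delay, growth factor, cap, and attempt count are illustrative values, not Kallglot recommendations, and `connect` stands for whatever function your client uses to open the WebSocket.

```python
import random
import time


def backoff_delays(base: float = 0.5, factor: float = 2.0, max_delay: float = 30.0,
                   attempts: int = 6, jitter: float = 0.0, rng=random.random) -> list:
    """Return capped exponential delays, optionally with proportional jitter."""
    delays = []
    for attempt in range(attempts):
        delay = min(max_delay, base * factor ** attempt)
        delay += delay * jitter * rng()  # adds up to `jitter` times the delay
        delays.append(delay)
    return delays


def reconnect(connect, delays, sleep=time.sleep):
    """Call `connect` until it succeeds, sleeping after each failed attempt."""
    last_error = None
    for delay in delays:
        try:
            return connect()
        except OSError as exc:  # ConnectionError is a subclass of OSError
            last_error = exc
            sleep(delay)
    raise ConnectionError("gave up reconnecting") from last_error
```

A non-zero `jitter` spreads out reconnection attempts so that many clients dropped at the same moment do not all retry in lockstep.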
Buffer audio during disconnection
If the connection drops briefly, buffer audio data and send it when reconnected to avoid gaps in transcription.
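One way to buffer during a brief disconnection is a bounded queue that is flushed on reconnect. The `AudioBuffer` class and the default limit of 50 chunks (about 5 seconds at 100 ms per chunk) are illustrative. Dropping the oldest audio when the buffer is full is a design choice made for this sketch.

```python
from collections import deque


class AudioBuffer:
    """Hold recent audio chunks while the connection is down."""

    def __init__(self, max_chunks: int = 50) -> None:
        # A deque with maxlen silently discards the oldest item when full.
        self._chunks = deque(maxlen=max_chunks)

    def push(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def drain(self) -> list:
        """Return the buffered chunks in order and empty the buffer."""
        chunks = list(self._chunks)
        self._chunks.clear()
        return chunks
```

Bounding the buffer keeps memory use predictable and avoids sending a long backlog of stale audio after an extended outage.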
Handle back-pressure
If you’re sending audio faster than it can be processed, you may receive a `throttle` control message. Pause sending until you receive a `resume` message.
Error Codes
| Code | Description | Recoverable |
|---|---|---|
| `audio_format_invalid` | Unsupported audio format | Yes |
| `audio_sample_rate_invalid` | Unsupported sample rate | Yes |
| `token_expired` | Session token has expired | No |
| `session_ended` | Session has already ended | No |
| `rate_limit_exceeded` | Too many messages | Yes |
| `internal_error` | Server error | Retry |
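The Recoverable column can drive a simple client-side decision. In this sketch the action names `continue`, `close`, and `retry` are illustrative labels, and treating unknown codes as non-recoverable is a conservative assumption.

```python
# Recoverable column from the table above, keyed by error code.
ERROR_RECOVERY = {
    "audio_format_invalid": "yes",
    "audio_sample_rate_invalid": "yes",
    "token_expired": "no",
    "session_ended": "no",
    "rate_limit_exceeded": "yes",
    "internal_error": "retry",
}

ACTIONS = {"yes": "continue", "no": "close", "retry": "retry"}


def next_action(code: str) -> str:
    """Map an error code to what the client should do next."""
    # Assumption: codes not in the table are treated as non-recoverable.
    return ACTIONS[ERROR_RECOVERY.get(code, "no")]
```

For a recoverable error such as `audio_format_invalid`, the client still needs to correct the cause (here, the audio format) before sending more messages.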