WebSocket Streaming
The Kallglot WebSocket API provides real-time bidirectional streaming for audio data, transcripts, and translations. Connect to the WebSocket URL provided when you create a session.
Connection
Connect to the WebSocket URL from the session creation response.
Authentication
Include the session token as a query parameter. Stream tokens expire after 5 minutes and are validated only when the WebSocket connects.
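As a rough illustration of attaching the token, the sketch below appends it to a WebSocket URL using only the Python standard library. The parameter name `token`, the example host, and the token value are placeholders rather than documented values; use the URL and parameter name returned by session creation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_stream_token(ws_url, token, param="token"):
    """Return ws_url with the stream token added as a query parameter.

    `param` is an assumed name; substitute the one the API expects.
    """
    parts = urlsplit(ws_url)
    # Preserve any query parameters already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, token))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL and token, for illustration only.
url = with_stream_token(
    "wss://stream.example.invalid/v1/sessions/sess_123", "tok_abc"
)
print(url)
```

Because the token is checked only at connect time, it makes sense to build this URL immediately before opening the socket rather than caching it.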
Message Format
All messages are JSON-encoded. Each message has a type field that indicates the message type.
Client → Server Messages
Audio Data
Send audio data to be processed:
| Field | Type | Description |
|---|---|---|
| type | string | Always audio.input |
| sequence | number | Sequence number to preserve order |
| timestamp_ms | number | Client timestamp in milliseconds |
| speaker | string | Identifies the speaker |
| audio.encoding | string | Must be mulaw |
| audio.sample_rate_hz | number | Must be 8000 |
| audio.payload | string | Base64-encoded mu-law audio (one channel) |
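The fields above might be assembled as in the following sketch, which encodes 16-bit PCM samples to G.711 mu-law and wraps them in an audio.input message. It assumes the dotted field names (`audio.encoding` and so on) map to a nested `audio` object and that `timestamp_ms` is wall-clock epoch time; neither is confirmed here. The helper names are illustrative.

```python
import base64
import json
import time

BIAS = 0x84   # 132, the standard G.711 mu-law bias
CLIP = 32635  # largest magnitude that still fits after adding the bias


def linear_to_mulaw(sample):
    """Encode one signed 16-bit PCM sample as one G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = (magnitude >> 7).bit_length() - 1     # segment number, 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F  # 4 bits inside the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def build_audio_input(pcm_samples, sequence, speaker):
    """Build an audio.input message from 8 kHz mono 16-bit PCM samples."""
    mulaw = bytes(linear_to_mulaw(s) for s in pcm_samples)
    return json.dumps({
        "type": "audio.input",
        "sequence": sequence,
        # Assumption: epoch milliseconds are acceptable as the client timestamp.
        "timestamp_ms": int(time.time() * 1000),
        "speaker": speaker,
        # Assumption: dotted field names describe a nested "audio" object.
        "audio": {
            "encoding": "mulaw",
            "sample_rate_hz": 8000,
            "payload": base64.b64encode(mulaw).decode("ascii"),
        },
    })


# 800 samples of silence is 100 ms of audio at 8 kHz.
message = build_audio_input([0] * 800, sequence=1, speaker="agent")
```

Audio captured at a higher rate would need low-pass filtering and resampling to 8 kHz before this step, which the sketch does not cover.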
API v1 currently accepts inbound stream audio only as mu-law (encoding: mulaw) at 8 kHz mono. Any other format triggers a session.error with code invalid_audio_format. Send chunks roughly 100–200 ms apart for predictable latency (typical telephony framing).
Ping
When the client sends a ping message, the server replies with { "type": "pong", "session_id": "sess_..." }.
End session (WebSocket)
To end the session from the streaming client, send a session.end message. The server replies with { "type": "session.ended", "session_id": "...", "reason": "explicit_end" }, then closes the receive loop.
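A minimal sketch of the two control messages and their replies follows. The outgoing shapes are assumptions: only the type names ping and session.end appear in this page, so the sketch sends nothing but the type field. The helper names are illustrative.

```python
import json


def make_ping():
    # Assumed shape: a bare type field with no other properties.
    return json.dumps({"type": "ping"})


def make_session_end():
    # Assumed shape: a bare type field with no other properties.
    return json.dumps({"type": "session.end"})


def classify_control(raw):
    """Classify a server message as 'alive', 'closed', or 'other'."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "pong":
        return "alive"
    if kind == "session.ended":
        return "closed"
    return "other"


reply = '{"type": "session.ended", "session_id": "sess_1", "reason": "explicit_end"}'
print(classify_control(reply))
```

A client would typically stop sending audio.input as soon as it has sent session.end, and close its socket once it sees the 'closed' result.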
Typical flow: send audio.input messages carrying microphone audio, ping occasionally, and send session.end when finished. Muting or pausing transcripts is not available as separate socket commands today.
Server → Client Messages
Transcript
Real-time transcription results:
| Field | Type | Description |
|---|---|---|
| type | string | transcript.partial or transcript.final |
| speaker | string | Speaker identification (agent, customer, or channel ID) |
| language | string | Detected language code |
| text | string | Transcribed text |
| sequence | number | Incrementing segment index (transcript.final only) |
| confidence | number | Detection confidence when available (transcript.final only) |
| timestamp | string | Segment timestamp (ISO 8601; transcript.final only) |
| translation | object | Present on transcript.final when translation ran; { "language", "text" } |
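One way a client might consume these messages is sketched below: partials overwrite a per-speaker preview, and finals are stored by sequence. Keying the preview by speaker is a design choice of this sketch, not something the API prescribes, and the class name is illustrative.

```python
import json


class TranscriptView:
    """Keeps committed segments separate from in-progress text."""

    def __init__(self):
        self.finals = {}   # sequence -> final text
        self.pending = {}  # speaker -> latest partial text

    def handle(self, raw):
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "transcript.partial":
            # Partials carry no sequence, so each one replaces the last.
            self.pending[msg["speaker"]] = msg["text"]
        elif kind == "transcript.final":
            self.finals[msg["sequence"]] = msg["text"]
            # The final supersedes whatever partial was on screen.
            self.pending.pop(msg["speaker"], None)

    def committed_text(self):
        return " ".join(self.finals[k] for k in sorted(self.finals))


view = TranscriptView()
view.handle('{"type": "transcript.partial", "speaker": "customer", "text": "hel"}')
view.handle('{"type": "transcript.final", "speaker": "customer", "text": "hello", "sequence": 1}')
print(view.committed_text())
```

Sorting by sequence when rendering keeps the transcript ordered even if finals are processed out of order.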
Interim results (transcript.partial) stream while audio is processed. Final results (transcript.final) commit a segment to the session transcript once Kallglot has a stable recognition result, and include the translation payload when translation is configured.
Audio Response
Translated speech audio (for bidirectional translation mode).
Status Events
Session status updates:
| Type | Description |
|---|---|
| session.ready | Session is fully active and provider connected |
| session.ended | Session has ended |
Session error events
| Field | Description |
|---|---|
| type | Always session.error |
| session_id | Session id |
| error.code | Machine-readable code (for example unknown message type uses invalid_message; oversize payloads use message_too_large; bad audio encoding uses codes from errors) |
| error.message | Human-readable explanation |
Connection Example
Browser Example
Provide downsampleEncodeMulawFrames (or reuse your PSTN codecs) so the browser emits mono 8 kHz mu-law Base64 payloads; PCM at 48 kHz is not accepted directly.
Connection Lifecycle
Best Practices
Use appropriate audio chunk sizes
Send audio in 100–200 ms chunks for optimal latency and transcription quality. Smaller chunks increase network overhead; larger chunks increase perceived latency.
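To make the chunk guidance concrete: mu-law at 8 kHz mono uses one byte per sample, so 100 ms is 800 bytes and 200 ms is 1600 bytes before Base64 encoding. The sketch below splits an encoded buffer accordingly; the function names are illustrative.

```python
SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 1  # mu-law stores each sample in a single byte


def chunk_size_bytes(chunk_ms):
    """Number of mu-law bytes that cover chunk_ms of 8 kHz mono audio."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000


def split_chunks(mulaw, chunk_ms=100):
    """Yield consecutive chunks; the last one may be shorter."""
    size = chunk_size_bytes(chunk_ms)
    for start in range(0, len(mulaw), size):
        yield mulaw[start:start + size]


# 250 ms of audio splits into 100 ms, 100 ms, and a 50 ms remainder.
sizes = [len(c) for c in split_chunks(bytes(2000), chunk_ms=100)]
print(sizes)
```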
Handle interim results correctly
Interim payloads use type: transcript.partial; treat them as non-final UI hints until transcript.final arrives with a sequence.
Implement reconnection logic
WebSocket connections may drop due to network issues. Implement exponential backoff reconnection and resume streaming.
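A reconnection schedule along these lines could be computed as in the sketch below, which uses exponential backoff with full jitter. The base delay, cap, and attempt count are illustrative defaults, not values recommended by Kallglot.

```python
import random


def backoff_delays(base=0.5, cap=30.0, attempts=6, rng=random.random):
    """Yield one wait time in seconds per reconnection attempt.

    The ceiling doubles each attempt up to `cap`; the actual delay is a
    random fraction of that ceiling so clients do not retry in lockstep.
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        yield ceiling * rng()


# With the jitter pinned to 1.0 the ceilings themselves are visible.
ceilings = list(backoff_delays(rng=lambda: 1.0))
print(ceilings)
```

Since stream tokens expire after 5 minutes and are validated when the WebSocket connects, a reconnect that happens late in a session may need a fresh token before it can succeed.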
Buffer audio during disconnection
If the connection drops briefly, buffer audio data and send it when reconnected to avoid gaps in transcription.
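A bounded buffer is one way to hold audio during a short outage without growing memory indefinitely. In the sketch below the limit of 50 chunks (about 5 seconds at 100 ms per chunk) is an arbitrary illustration, and the class name is hypothetical.

```python
from collections import deque


class AudioBacklog:
    """Holds recent audio chunks while the socket is down."""

    def __init__(self, max_chunks=50):
        # A deque with maxlen silently discards the oldest chunk when full.
        self._chunks = deque(maxlen=max_chunks)

    def push(self, chunk):
        self._chunks.append(chunk)

    def drain(self):
        """Yield buffered chunks oldest first, emptying the backlog."""
        while self._chunks:
            yield self._chunks.popleft()


backlog = AudioBacklog(max_chunks=3)
for i in range(5):
    backlog.push(bytes([i]))
replayed = list(backlog.drain())
print(replayed)
```

Dropping the oldest audio favors getting back to live speech quickly; an application that values completeness over latency might prefer a larger limit.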
Socket error payloads
These appear on session.error as error.code while the WebSocket connection stays open:
| Code | When |
|---|---|
| invalid_audio_format | audio.input encoding or sample rate is not mu-law @ 8000 Hz |
| invalid_message | Payload is not valid JSON |
| message_too_large | Incoming WebSocket message exceeds the documented size ceiling |
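A client might translate these codes into actionable log lines as sketched below. The sketch assumes `error.code` and `error.message` live in a nested `error` object, and the hint strings are this example's own wording rather than official guidance.

```python
import json

# Suggested client-side reactions; wording is illustrative only.
HINTS = {
    "invalid_audio_format": "re-encode audio as mu-law at 8000 Hz",
    "invalid_message": "check that the payload is valid JSON",
    "message_too_large": "send smaller audio chunks",
}


def describe_error(raw):
    """Return 'code: hint' for a session.error message, else None."""
    msg = json.loads(raw)
    if msg.get("type") != "session.error":
        return None
    # Assumption: code and message are nested under "error".
    error = msg.get("error", {})
    code = error.get("code", "unknown")
    hint = HINTS.get(code, error.get("message", "no details provided"))
    return f"{code}: {hint}"


sample = json.dumps({
    "type": "session.error",
    "session_id": "sess_1",
    "error": {"code": "message_too_large", "message": "Message too large"},
})
print(describe_error(sample))
```

Codes that are not in the table fall back to the server's own error.message, so newly introduced codes still produce a readable line.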