WebSocket Streaming

The Kallglot WebSocket API provides real-time bidirectional streaming for audio data, transcripts, and translations. Connect to the WebSocket URL provided when you create a session.

Connection

Connect to the WebSocket URL from the session creation response:
wss://stream.kallglot.com/v1/sessions/{session_id}/connect

Authentication

Include the session token in the Authorization header or as a query parameter:
// Header authentication (Node.js, e.g. with the ws package)
const ws = new WebSocket(session.stream.url, {
  headers: {
    'Authorization': `Bearer ${session.stream.token}`
  }
});

// Query parameter (browsers cannot set custom WebSocket headers)
const ws = new WebSocket(`${session.stream.url}?token=${session.stream.token}`);

Message Format

All messages are JSON-encoded. Each message carries a type field identifying what kind of message it is.

Client → Server Messages

Audio Data

Send audio data to be processed:
{
  "type": "audio",
  "data": "base64-encoded-audio-data",
  "encoding": "pcm_16",
  "sample_rate": 16000,
  "channels": 1
}
| Field | Type | Description |
|---|---|---|
| type | string | Always `audio` |
| data | string | Base64-encoded audio data |
| encoding | string | Audio encoding: `pcm_16`, `pcm_8`, `mulaw`, `alaw` |
| sample_rate | number | Sample rate in Hz (8000, 16000, 24000, or 48000) |
| channels | number | Number of channels (1 for mono, 2 for stereo) |
For best transcription quality, use 16-bit PCM at 16kHz mono. Send audio in chunks of 100-200ms.
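To size those chunks, multiply sample rate, bytes per sample, channel count, and duration. A back-of-the-envelope sketch (chunkBytes is an illustrative helper, not part of the API):

```javascript
// Bytes needed for one chunk of `ms` milliseconds of raw PCM audio.
// bytesPerSample is 2 for pcm_16, 1 for pcm_8 / mulaw / alaw.
function chunkBytes(sampleRate, bytesPerSample, channels, ms) {
  return Math.round(sampleRate * bytesPerSample * channels * (ms / 1000));
}

// 100 ms of 16-bit PCM at 16 kHz mono:
chunkBytes(16000, 2, 1, 100); // 3200 bytes
```

So at the recommended format, each audio message should carry roughly 3200-6400 bytes of raw audio before base64 encoding.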

Control Messages

{
  "type": "control",
  "action": "mute"
}
| Action | Description |
|---|---|
| mute | Stop processing incoming audio |
| unmute | Resume processing incoming audio |
| pause_transcription | Stop generating transcripts |
| resume_transcription | Resume generating transcripts |
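A small client-side helper can validate the action before sending, so typos fail loudly instead of being silently ignored. A sketch (controlMessage is illustrative, not part of the API; ws is assumed to be an open connection):

```javascript
const CONTROL_ACTIONS = new Set([
  'mute', 'unmute', 'pause_transcription', 'resume_transcription'
]);

// Build the JSON payload for a control message, rejecting unknown actions.
function controlMessage(action) {
  if (!CONTROL_ACTIONS.has(action)) {
    throw new Error(`Unknown control action: ${action}`);
  }
  return JSON.stringify({ type: 'control', action });
}

// Usage: ws.send(controlMessage('mute'));
```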

End Stream

Signal that you’re done sending audio:
{
  "type": "end"
}

Server → Client Messages

Transcript

Real-time transcription results:
{
  "type": "transcript",
  "id": "seg_001",
  "speaker": "customer",
  "language": "de",
  "text": "Ich habe eine Frage",
  "is_final": false,
  "start_time": 3.5,
  "confidence": 0.94
}
| Field | Type | Description |
|---|---|---|
| type | string | Always `transcript` |
| id | string | Segment identifier |
| speaker | string | Speaker identification (agent, customer, or channel ID) |
| language | string | Detected language code |
| text | string | Transcribed text |
| is_final | boolean | Whether this is the final version of this segment |
| start_time | number | Segment start time in seconds |
| confidence | number | Transcription confidence (0-1) |
Interim results (is_final: false) are sent as the user speaks. Final results replace interim results with the same id.
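One way to apply that replacement rule is to key segments by id and let finals win over interims. A minimal sketch (TranscriptStore is illustrative, not part of the API):

```javascript
// Keep the latest text per segment; a final result replaces any interim
// result with the same id, and late interims never overwrite a final.
class TranscriptStore {
  constructor() {
    this.segments = new Map();
  }

  apply(msg) {
    const existing = this.segments.get(msg.id);
    if (existing && existing.is_final && !msg.is_final) return;
    this.segments.set(msg.id, msg);
  }

  // Concatenated text of all segments, in arrival order.
  text() {
    return [...this.segments.values()].map(s => s.text).join(' ');
  }
}
```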

Translation

Translated text:
{
  "type": "translation",
  "id": "seg_001",
  "source_language": "de",
  "target_language": "en",
  "source_text": "Ich habe eine Frage",
  "text": "I have a question",
  "is_final": true
}
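Since translations carry the same segment id as their source transcript, the two streams can be joined client-side. A sketch (pairSegments is an illustrative helper, not part of the API):

```javascript
// Join final transcripts with their translations by shared segment id.
function pairSegments(transcripts, translations) {
  const byId = new Map(translations.map(t => [t.id, t]));
  return transcripts
    .filter(t => t.is_final)
    .map(t => ({
      id: t.id,
      original: t.text,
      translated: byId.get(t.id)?.text ?? null  // null until translation arrives
    }));
}
```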

Audio Response

Translated speech audio (for bidirectional translation mode):
{
  "type": "audio",
  "id": "audio_001",
  "data": "base64-encoded-audio-data",
  "encoding": "pcm_16",
  "sample_rate": 24000,
  "segment_id": "seg_001"
}
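To play these messages back in a browser, the base64 pcm_16 payload has to be decoded into float samples first. A sketch of the decode step (decodePcm16 is illustrative; the playback snippet in the comments is a hypothetical counterpart to the playAudio call in the connection example below):

```javascript
// Decode a base64 pcm_16 payload into Float32 samples in [-1, 1],
// suitable for copying into a Web Audio AudioBuffer channel.
function decodePcm16(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const pcm16 = new Int16Array(bytes.buffer);
  const floats = new Float32Array(pcm16.length);
  for (let i = 0; i < pcm16.length; i++) floats[i] = pcm16[i] / 32768;
  return floats;
}

// Hypothetical browser playback using the message's sample_rate:
//   const ctx = new AudioContext();
//   const buf = ctx.createBuffer(1, floats.length, message.sample_rate);
//   buf.copyToChannel(floats, 0);
//   const src = ctx.createBufferSource();
//   src.buffer = buf;
//   src.connect(ctx.destination);
//   src.start();
```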

Status Events

Session status updates:
{
  "type": "status",
  "status": "connected",
  "message": "Provider connection established"
}
| Status | Description |
|---|---|
| connecting | Establishing provider connection |
| connected | Session is fully active |
| provider_disconnected | Phone call ended |
| ending | Session is ending |
| ended | Session has ended |

Error Events

{
  "type": "error",
  "code": "audio_format_invalid",
  "message": "Unsupported audio encoding 'mp3'. Use pcm_16, mulaw, or alaw.",
  "recoverable": true
}

Connection Example

import WebSocket from 'ws';

const session = await createSession();

const ws = new WebSocket(session.stream.url, {
  headers: {
    'Authorization': `Bearer ${session.stream.token}`
  }
});

ws.on('open', () => {
  console.log('Connected to Kallglot stream');
});

ws.on('message', (data) => {
  const message = JSON.parse(data);

  switch (message.type) {
    case 'transcript':
      if (message.is_final) {
        console.log(`[${message.speaker}] ${message.text}`);
      }
      break;

    case 'translation':
      console.log(`Translation: ${message.text}`);
      break;

    case 'audio':
      // Play translated audio
      playAudio(Buffer.from(message.data, 'base64'));
      break;

    case 'status':
      console.log(`Status: ${message.status}`);
      break;

    case 'error':
      console.error(`Error: ${message.message}`);
      if (!message.recoverable) {
        ws.close();
      }
      break;
  }
});

ws.on('close', (code, reason) => {
  console.log(`Connection closed: ${code} - ${reason}`);
});

ws.on('error', (error) => {
  console.error('WebSocket error:', error);
});

// Send audio data
function sendAudio(audioBuffer) {
  ws.send(JSON.stringify({
    type: 'audio',
    data: audioBuffer.toString('base64'),
    encoding: 'pcm_16',
    sample_rate: 16000,
    channels: 1
  }));
}

// End stream gracefully
function endStream() {
  ws.send(JSON.stringify({ type: 'end' }));
}

Browser Example

// Create session (call your backend)
const session = await fetch('/api/sessions', {
  method: 'POST',
  body: JSON.stringify({ mode: 'bidirectional_translation' })
}).then(r => r.json());

// Connect to WebSocket
const ws = new WebSocket(`${session.stream.url}?token=${session.stream.token}`);

// Get microphone access
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);

// Note: createScriptProcessor is deprecated; prefer an AudioWorklet for
// off-main-thread processing in production. It is used here for brevity.
const processor = audioContext.createScriptProcessor(4096, 1, 1);

source.connect(processor);
processor.connect(audioContext.destination);

processor.onaudioprocess = (e) => {
  const inputData = e.inputBuffer.getChannelData(0);
  const pcm16 = new Int16Array(inputData.length);

  // Convert Float32 samples in [-1, 1] to 16-bit PCM.
  for (let i = 0; i < inputData.length; i++) {
    pcm16[i] = Math.max(-32768, Math.min(32767, inputData[i] * 32768));
  }

  if (ws.readyState === WebSocket.OPEN) {
    // Build the base64 payload byte by byte; spreading the whole buffer
    // into String.fromCharCode can overflow the call stack on large chunks.
    const bytes = new Uint8Array(pcm16.buffer);
    let binary = '';
    for (let i = 0; i < bytes.length; i++) {
      binary += String.fromCharCode(bytes[i]);
    }

    ws.send(JSON.stringify({
      type: 'audio',
      data: btoa(binary),
      encoding: 'pcm_16',
      sample_rate: 16000,
      channels: 1
    }));
  }
};

Best Practices

- Send audio in 100-200ms chunks for optimal latency and transcription quality. Smaller chunks increase network overhead; larger chunks increase perceived latency.
- Interim results (is_final: false) should be displayed but may be revised. Replace them when you receive the final result with the same id.
- WebSocket connections may drop due to network issues. Implement reconnection with exponential backoff and resume streaming.
- If the connection drops briefly, buffer audio data and send it when reconnected to avoid gaps in transcription.
- If you're sending audio faster than it can be processed, you may receive a throttle control message. Pause sending until you receive a resume message.
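The reconnection and buffering practices above can be combined in one wrapper. A sketch under stated assumptions (ResilientStream is illustrative; connect is assumed to return a fresh WebSocket-like object for the session URL):

```javascript
// Exponential-backoff reconnection with audio buffering during outages.
class ResilientStream {
  constructor(connect, maxDelayMs = 30000) {
    this.connect = connect;
    this.maxDelayMs = maxDelayMs;
    this.attempt = 0;
    this.buffer = [];   // audio messages queued while disconnected
    this.ws = null;
    this.open();
  }

  open() {
    this.ws = this.connect();
    this.ws.onopen = () => {
      this.attempt = 0;
      // Flush audio buffered during the outage, in order.
      for (const msg of this.buffer.splice(0)) this.ws.send(msg);
    };
    this.ws.onclose = () => {
      // 1s, 2s, 4s, ... capped at maxDelayMs.
      const delay = Math.min(this.maxDelayMs, 1000 * 2 ** this.attempt++);
      setTimeout(() => this.open(), delay);
    };
  }

  sendAudio(msg) {
    if (this.ws && this.ws.readyState === 1 /* OPEN */) this.ws.send(msg);
    else this.buffer.push(msg);
  }
}
```

Buffered audio is sent as soon as the socket reopens, so brief drops produce no transcript gap; long outages may still lose context on the server side.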

Error Codes

| Code | Description | Recoverable |
|---|---|---|
| audio_format_invalid | Unsupported audio format | Yes |
| audio_sample_rate_invalid | Unsupported sample rate | Yes |
| token_expired | Session token has expired | No |
| session_ended | Session has already ended | No |
| rate_limit_exceeded | Too many messages | Yes |
| internal_error | Server error | Retry |