WebSocket Streaming

The Kallglot WebSocket API provides real-time bidirectional streaming for audio data, transcripts, and translations. Connect to the WebSocket URL provided when you create a session.

Connection

Connect to the WebSocket URL from the session creation response:
wss://api.kallglot.com/v1/sessions/{session_id}/connect?token={stream_token}

Authentication

Include the session token as a query parameter:
const ws = new WebSocket(`${session.stream.url}?token=${session.stream.token}`);
Stream tokens expire after 5 minutes and are validated only when the WebSocket connects.
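As a minimal sketch, the connect URL can be built with the standard URL API instead of string concatenation, so the token is percent-encoded and is never appended twice. buildStreamUrl is a hypothetical helper name. Whether session.stream.url already carries a token parameter is not stated here, so the sketch handles both cases.

```javascript
// Hypothetical helper: attach the stream token as the `token` query parameter.
function buildStreamUrl(streamUrl, streamToken) {
  const url = new URL(streamUrl);
  // set() replaces an existing `token` value instead of adding a duplicate
  url.searchParams.set('token', streamToken);
  return url.toString();
}
```

Usage would look like new WebSocket(buildStreamUrl(session.stream.url, session.stream.token)), opened soon after session creation because the token is short-lived.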

Message Format

All messages are JSON-encoded. Each message has a type field that indicates the message type.
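Since every frame is JSON with a type field, a client might validate that envelope once before dispatching on the type. The sketch below assumes text frames; parseMessage is a hypothetical helper name, and returning null for unusable frames is a design choice of this example.

```javascript
// Parse one incoming frame and check the envelope described above.
// Returns the parsed object, or null when the frame is not usable.
function parseMessage(raw) {
  // Node's `ws` delivers Buffers, browsers deliver strings
  const text = typeof raw === 'string' ? raw : raw.toString();
  let message;
  try {
    message = JSON.parse(text);
  } catch {
    return null; // not JSON
  }
  if (message === null || typeof message !== 'object' || typeof message.type !== 'string') {
    return null; // JSON, but no string `type` field
  }
  return message;
}
```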

Client → Server Messages

Audio Data

Send audio data to be processed:
{
  "type": "audio.input",
  "sequence": 1729481920,
  "timestamp_ms": 1729481920000,
  "speaker": "customer",
  "audio": {
    "encoding": "mulaw",
    "sample_rate_hz": 8000,
    "payload": "base64-encoded-audio-data"
  }
}
| Field | Type | Description |
| --- | --- | --- |
| type | string | Always audio.input |
| sequence | number | Sequence number to preserve order |
| timestamp_ms | number | Client timestamp in milliseconds |
| speaker | string | Identifies the speaker |
| audio.encoding | string | Must be mulaw |
| audio.sample_rate_hz | number | Must be 8000 |
| audio.payload | string | Base64-encoded mu-law audio (one channel) |
API v1 currently accepts inbound stream audio only as mu-law (encoding: mulaw) at 8 kHz mono. Any other format is rejected with a session.error event whose error.code is invalid_audio_format. Send one chunk roughly every 100–200 ms for predictable latency (typical telephony framing).
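The chunk size follows from the format: 8 kHz mono mu-law is one byte per sample, so 8 bytes per millisecond, and a 160 ms chunk is 1280 bytes. The sketch below splits already-encoded mu-law bytes into audio.input messages. The helper name, the 160 ms default, and the simple counter used for sequence are assumptions of this example.

```javascript
const BYTES_PER_MS = 8; // 8000 samples/s x 1 byte per mu-law sample / 1000 ms

// Split raw mu-law bytes (Buffer or Uint8Array) into audio.input messages.
// `startSequence` and the incrementing counter are assumptions of this sketch.
function toAudioInputMessages(mulawBytes, { chunkMs = 160, speaker = 'customer', startSequence = 1 } = {}) {
  const chunkBytes = chunkMs * BYTES_PER_MS;
  const messages = [];
  for (let offset = 0, seq = startSequence; offset < mulawBytes.length; offset += chunkBytes, seq++) {
    const chunk = mulawBytes.subarray(offset, offset + chunkBytes);
    messages.push({
      type: 'audio.input',
      sequence: seq,
      timestamp_ms: Date.now(),
      speaker,
      audio: {
        encoding: 'mulaw',
        sample_rate_hz: 8000,
        payload: Buffer.from(chunk).toString('base64'),
      },
    });
  }
  return messages;
}
```

In a live stream each message would be sent with ws.send(JSON.stringify(message)) as soon as its chunk is available rather than collected in an array first.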

Ping

{
  "type": "ping"
}
Server responds with { "type": "pong", "session_id": "sess_..." }.
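One way to use ping and pong is as a liveness check: send a ping periodically and treat the connection as stale if no pong has arrived within some window. The sketch below keeps only the bookkeeping. createHeartbeat, the 10 second default, and the injectable clock are assumptions of this example, not part of the API.

```javascript
// Track pong arrivals so the caller can decide when a connection looks dead.
// `send` is any function that writes a text frame, e.g. (text) => ws.send(text).
function createHeartbeat(send, { timeoutMs = 10000, now = Date.now } = {}) {
  let lastPongAt = now();
  return {
    // Send the ping message shown above
    ping() {
      send(JSON.stringify({ type: 'ping' }));
    },
    // Feed every parsed server message through this
    handleMessage(message) {
      if (message.type === 'pong') lastPongAt = now();
    },
    // True when no pong has been seen for longer than timeoutMs
    isStale() {
      return now() - lastPongAt > timeoutMs;
    },
  };
}
```

A caller might invoke ping() from a setInterval and close or reconnect the socket when isStale() returns true.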

End session (WebSocket)

To end the session from the streaming client:
{
  "type": "session.end"
}
The server persists the session-end side effects, responds with { "type": "session.ended", "session_id": "...", "reason": "explicit_end" }, and then closes its receive loop, so later client messages are not processed.
Typical flow: audio.input to send microphone audio, ping occasionally, session.end when finished. Muting and pausing transcripts are not available as separate socket commands today.
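A client that wants to end cleanly can send session.end and wait for the session.ended confirmation before closing the socket. The sketch below assumes a Node ws-style socket with on, off, and send. The helper name and the 5 second fallback timeout are assumptions of this example.

```javascript
// Ask the server to end the session and call `onDone` once it confirms,
// or after `timeoutMs` if no session.ended arrives (the timeout is an assumption).
function requestSessionEnd(ws, onDone, timeoutMs = 5000) {
  const onMessage = (data) => {
    let message;
    try {
      message = JSON.parse(data.toString());
    } catch {
      return; // ignore frames that are not JSON
    }
    if (message.type !== 'session.ended') return;
    cleanup();
    onDone({ confirmed: true, reason: message.reason });
  };
  const timer = setTimeout(() => {
    cleanup();
    onDone({ confirmed: false, reason: 'timeout' });
  }, timeoutMs);
  // Stop the timer and detach the listener so onDone fires at most once
  function cleanup() {
    clearTimeout(timer);
    ws.off('message', onMessage);
  }
  ws.on('message', onMessage);
  ws.send(JSON.stringify({ type: 'session.end' }));
}
```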

Server → Client Messages

Transcript

Real-time transcription results:
{
  "type": "transcript.partial",
  "speaker": "customer",
  "language": "de",
  "text": "Ich habe eine "
}
{
  "type": "transcript.final",
  "sequence": 1,
  "speaker": "customer",
  "language": "de",
  "text": "Ich habe eine Frage",
  "confidence": 0.94,
  "timestamp": "2026-03-26T11:03:58.902000Z",
  "translation": {
    "language": "en",
    "text": "I have a question"
  }
}
| Field | Type | Description |
| --- | --- | --- |
| type | string | transcript.partial or transcript.final |
| speaker | string | Speaker identification (agent, customer, or channel ID) |
| language | string | Detected language code |
| text | string | Transcribed text |
| sequence | number | Incrementing segment index (transcript.final only) |
| confidence | number | Detection confidence when available (transcript.final only) |
| timestamp | string | Segment timestamp (ISO 8601; transcript.final only) |
| translation | object | Present on transcript.final when translation ran; { "language", "text" } |
Interim results (transcript.partial) stream while audio is being processed. A transcript.final message closes a segment of the session transcript once Kallglot has a stable recognition result, and it includes the translation payload when translation is configured.
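The partial and final messages can be folded into a small client-side transcript state. The sketch below assumes that a newer partial replaces the previous partial for the same speaker and that a final replaces that speaker's pending partial. Both are assumptions of this example, as is the applyTranscriptEvent name.

```javascript
// Fold transcript events into { finals: [...], partials: { [speaker]: text } }.
function applyTranscriptEvent(state, message) {
  if (message.type === 'transcript.partial') {
    // Assumption: a newer partial supersedes the previous one for that speaker
    state.partials[message.speaker] = message.text;
  } else if (message.type === 'transcript.final') {
    // The final segment replaces whatever interim text was on screen
    delete state.partials[message.speaker];
    state.finals.push(message);
    // Keep finals ordered by their incrementing segment index
    state.finals.sort((a, b) => a.sequence - b.sequence);
  }
  return state;
}
```

A UI could render state.finals as the committed transcript and state.partials as greyed-out text that may still change.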

Audio Response

Translated speech audio (for bidirectional translation mode):
{
  "type": "audio.output",
  "sequence": 12,
  "timestamp_ms": 1729482055000,
  "speaker": "translated_agent",
  "audio": {
    "encoding": "mulaw",
    "sample_rate_hz": 8000,
    "payload": "base64-encoded-mulaw-bytes"
  }
}
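Most playback paths expect linear PCM, so the mu-law payload usually needs decoding first. The sketch below uses the standard G.711 mu-law expansion to turn an audio.output message into 8 kHz mono 16-bit samples. The function names are hypothetical, and Buffer makes this a Node-side example.

```javascript
// Decode one G.711 mu-law byte to a signed 16-bit PCM sample.
function mulawByteToPcm16(byte) {
  const u = ~byte & 0xff;           // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode an audio.output message into 8 kHz mono PCM16 samples.
function decodeAudioOutput(message) {
  const bytes = Buffer.from(message.audio.payload, 'base64');
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    pcm[i] = mulawByteToPcm16(bytes[i]);
  }
  return pcm;
}
```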

Status Events

Session status updates:
{
  "type": "session.ready",
  "session_id": "sess_01HXYZ"
}
| Type | Description |
| --- | --- |
| session.ready | Session is fully active and provider connected |
| session.ended | Session has ended |

Session error events

{
  "type": "session.error",
  "session_id": "sess_01HXYZ123456789",
  "error": {
    "code": "invalid_audio_format",
    "message": "Only mulaw audio at 8000Hz is supported in API v1"
  }
}
| Field | Description |
| --- | --- |
| type | Always session.error |
| session_id | Session ID |
| error.code | Machine-readable code. For example, an unknown message type yields invalid_message, an oversize payload yields message_too_large, and unsupported audio yields invalid_audio_format (see Socket error payloads) |
| error.message | Human-readable explanation |

Connection Example

import WebSocket from 'ws';

// createSession() stands in for your own call to the session creation endpoint
const session = await createSession();

const ws = new WebSocket(`${session.stream.url}?token=${session.stream.token}`);

ws.on('open', () => {
  console.log('Connected to Kallglot stream');
});

ws.on('message', (data) => {
  // ws delivers frames as Buffers; decode to text before parsing
  const message = JSON.parse(data.toString());

  switch (message.type) {
    case 'transcript.partial':
      console.log(`[Partial] ${message.speaker}: ${message.text}`);
      break;

    case 'transcript.final':
      console.log(`[${message.speaker}] ${message.text}`);
      if (message.translation) {
        console.log(`  → ${message.translation.text}`);
      }
      break;

    case 'audio.output':
      // Play translated audio (playAudio is your own helper; the bytes are 8 kHz mu-law)
      playAudio(Buffer.from(message.audio.payload, 'base64'));
      break;

    case 'session.ready':
      console.log('Session ready');
      break;

    case 'session.error':
      console.error(`Error [${message.error.code}]: ${message.error.message}`);
      break;

    case 'session.ended':
      console.log('Session ended:', message.reason);
      ws.close();
      break;
  }
});

ws.on('close', (code, reason) => {
  console.log(`Connection closed: ${code} - ${reason}`);
});

ws.on('error', (error) => {
  console.error('WebSocket error:', error);
});

// Send audio data
function sendAudio(audioBuffer) {
  ws.send(JSON.stringify({
    type: 'audio.input',
    sequence: Date.now(),
    timestamp_ms: Date.now(),
    speaker: 'customer',
    audio: {
      payload: audioBuffer.toString('base64'),
      encoding: 'mulaw',
      sample_rate_hz: 8000
    }
  }));
}

// End session from the client (mirrors REST POST /v1/sessions/:id/end)
function endSession() {
  ws.send(JSON.stringify({ type: 'session.end' }));
}

Browser Example

// Create session (call your backend)
const session = await fetch('/api/sessions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ mode: 'bidirectional_translation' })
}).then(r => r.json());

// Connect to WebSocket
const ws = new WebSocket(`${session.stream.url}?token=${session.stream.token}`);

// Get microphone access
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(4096, 1, 1);

source.connect(processor);
processor.connect(audioContext.destination);

processor.onaudioprocess = (e) => {
  const inputData = e.inputBuffer.getChannelData(0);
  const pcm16 = new Int16Array(inputData.length);

  for (let i = 0; i < inputData.length; i++) {
    pcm16[i] = Math.max(-32768, Math.min(32767, inputData[i] * 32768));
  }

  const mulawBase64 = downsampleEncodeMulawFrames(pcm16);
  if (!mulawBase64 || ws.readyState !== WebSocket.OPEN) {
    return;
  }

  ws.send(JSON.stringify({
    type: 'audio.input',
    sequence: Date.now(),
    timestamp_ms: Date.now(),
    speaker: 'customer',
    audio: {
      payload: mulawBase64,
      encoding: 'mulaw',
      sample_rate_hz: 8000
    }
  }));
};
Provide downsampleEncodeMulawFrames yourself (or reuse your PSTN codecs) so the browser emits mono 8 kHz mu-law Base64 payloads. Raw PCM is not accepted directly, whether at the 16 kHz captured above or at a browser default such as 48 kHz.
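One possible shape for that helper is sketched below. It assumes the input is 16 kHz PCM16, as in the AudioContext above, halves the rate by averaging each pair of samples, and applies the standard G.711 mu-law compression. The pair averaging is a crude low-pass filter chosen for brevity, so a production encoder would likely use a proper resampler.

```javascript
const MULAW_BIAS = 0x84;
const MULAW_CLIP = 32635;

// Encode one signed 16-bit PCM sample as a G.711 mu-law byte.
function pcm16ToMulawByte(sample) {
  const sign = (sample >> 8) & 0x80;
  let magnitude = sign ? -sample : sample;
  if (magnitude > MULAW_CLIP) magnitude = MULAW_CLIP;
  magnitude += MULAW_BIAS;
  // Find the segment: position of the highest set bit between bit 14 and bit 7
  let exponent = 7;
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Sketch of the helper used above: 16 kHz PCM16 in, Base64 8 kHz mu-law out.
// Returns null when there are not enough samples to produce any output.
function downsampleEncodeMulawFrames(pcm16) {
  const outLength = Math.floor(pcm16.length / 2);
  if (outLength === 0) return null;
  let binary = '';
  for (let i = 0; i < outLength; i++) {
    // Average each pair of samples: crude low-pass plus 2:1 decimation
    const averaged = (pcm16[2 * i] + pcm16[2 * i + 1]) >> 1;
    binary += String.fromCharCode(pcm16ToMulawByte(averaged));
  }
  return btoa(binary);
}
```

With the 4096-sample buffer used above, each call yields 2048 mu-law bytes, or 256 ms of audio at 8 kHz. That is slightly above the recommended range, so a smaller processor buffer may be preferable.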

Connection Lifecycle

Best Practices

Send audio in 100–200 ms chunks for the best balance of latency and transcription quality. Smaller chunks increase network overhead; larger chunks increase perceived latency.
Interim payloads use type: transcript.partial; treat them as non-final UI hints until transcript.final arrives with a sequence.
WebSocket connections may drop due to network issues. Reconnect with exponential backoff and resume streaming. Because stream tokens are validated at connect time, a reconnect needs a token that has not expired.
If the connection drops briefly, buffer audio data and send it once reconnected to avoid gaps in transcription.
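The reconnection and buffering advice can be sketched as two small pieces: a capped exponential delay and a bounded queue for audio produced while disconnected. The names, the 500 ms base, the 30 s cap, and the 50-message limit (about 8 s of audio at 160 ms per chunk) are all assumptions of this example. Random jitter is commonly added to the delay and is left out here for brevity.

```javascript
// Delay before reconnect attempt `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Bounded FIFO for audio.input messages produced while disconnected.
// When full, the oldest message is dropped so memory stays bounded.
function createAudioBuffer(maxMessages = 50) {
  const queue = [];
  return {
    push(message) {
      queue.push(message);
      if (queue.length > maxMessages) queue.shift();
    },
    // Pass every buffered message to `send` in order, emptying the buffer
    flush(send) {
      while (queue.length > 0) send(queue.shift());
    },
    get size() {
      return queue.length;
    },
  };
}
```

On a close event the client would wait backoffDelayMs(attempt), reconnect, and call flush((m) => ws.send(JSON.stringify(m))) once the new socket is open.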

Socket error payloads

These appear on session.error as error.code while the WebSocket connection stays open:
| Code | When |
| --- | --- |
| invalid_audio_format | audio.input encoding or sample rate is not mu-law @ 8000 Hz |
| invalid_message | Payload is not valid JSON |
| message_too_large | Incoming WebSocket message exceeds the documented size ceiling |
Invalid or expired stream tokens and ended sessions normally fail during connect via WebSocket close codes (see Error Codes).
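Because these errors leave the socket open, a client can map each code to a local reaction instead of reconnecting. The sketch below shows one such mapping. The reaction labels and the log-and-continue fallback are assumptions of this example, not behavior defined by the API.

```javascript
// Suggested client reaction per error.code; the reactions are assumptions of this sketch.
const SOCKET_ERROR_ACTIONS = new Map([
  ['invalid_audio_format', 'fix-encoder'],       // re-encode as mu-law @ 8000 Hz before sending more
  ['invalid_message', 'fix-client'],             // the client sent something the server rejected
  ['message_too_large', 'send-smaller-chunks'],  // split audio into shorter chunks
]);

// Pick a reaction for a session.error message; unknown codes are only logged.
function actionForSessionError(message) {
  const code = message.error ? message.error.code : undefined;
  return SOCKET_ERROR_ACTIONS.get(code) || 'log-and-continue';
}
```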