WebSocket Streaming

The Kallglot WebSocket API provides real-time bidirectional streaming for audio data, transcripts, and translations. Connect to the WebSocket URL provided when you create a session.

Connection

Connect to the WebSocket URL from the session creation response:
wss://stream.kallglot.com/v1/sessions/{session_id}/connect

Authentication

Include the session token in the Authorization header or as a query parameter:
// Header authentication (Node.js, e.g. with the ws package)
const ws = new WebSocket(session.stream.url, {
  headers: {
    'Authorization': `Bearer ${session.stream.token}`
  }
});

// Query parameter (browsers cannot set custom WebSocket headers)
const ws = new WebSocket(`${session.stream.url}?token=${session.stream.token}`);

Message Format

All messages are JSON-encoded. Each message carries a type field identifying what kind of message it is.

Client → Server Messages

Audio Data

Send audio data to be processed:
{
  "type": "audio",
  "data": "base64-encoded-audio-data",
  "encoding": "pcm_16",
  "sample_rate": 16000,
  "channels": 1
}
| Field | Type | Description |
|---|---|---|
| type | string | Always `audio` |
| data | string | Base64-encoded audio data |
| encoding | string | Audio encoding: `pcm_16`, `pcm_8`, `mulaw`, `alaw` |
| sample_rate | number | Sample rate in Hz (8000, 16000, 24000, or 48000) |
| channels | number | Number of channels (1 for mono, 2 for stereo) |
For best transcription quality, use 16-bit PCM at 16kHz mono. Send audio in chunks of 100-200ms.
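To size those chunks, multiply sample rate, bytes per sample, channel count, and duration. A back-of-the-envelope sketch (chunkBytes is an illustrative helper, not part of the API):

```javascript
// Bytes needed for one chunk of `ms` milliseconds of raw PCM audio.
// bytesPerSample is 2 for pcm_16, 1 for pcm_8 / mulaw / alaw.
function chunkBytes(sampleRate, bytesPerSample, channels, ms) {
  return Math.round(sampleRate * bytesPerSample * channels * (ms / 1000));
}

// 100 ms of 16-bit PCM at 16 kHz mono:
chunkBytes(16000, 2, 1, 100); // 3200 bytes
```

So at the recommended format, each audio message should carry roughly 3200-6400 bytes of raw audio before base64 encoding.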

Control Messages

{
  "type": "control",
  "action": "mute"
}
| Action | Description |
|---|---|
| mute | Stop processing incoming audio |
| unmute | Resume processing incoming audio |
| pause_transcription | Stop generating transcripts |
| resume_transcription | Resume generating transcripts |
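A small client-side helper can validate the action before sending, so typos fail loudly instead of being silently ignored. A sketch (controlMessage is illustrative, not part of the API; ws is assumed to be an open connection):

```javascript
const CONTROL_ACTIONS = new Set([
  'mute', 'unmute', 'pause_transcription', 'resume_transcription'
]);

// Build the JSON payload for a control message, rejecting unknown actions.
function controlMessage(action) {
  if (!CONTROL_ACTIONS.has(action)) {
    throw new Error(`Unknown control action: ${action}`);
  }
  return JSON.stringify({ type: 'control', action });
}

// Usage: ws.send(controlMessage('mute'));
```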

End Stream

Signal that you’re done sending audio:
{
  "type": "end"
}

Server → Client Messages

Transcript

Real-time transcription results:
{
  "type": "transcript",
  "id": "seg_001",
  "speaker": "customer",
  "language": "de",
  "text": "Ich habe eine Frage",
  "is_final": false,
  "start_time": 3.5,
  "confidence": 0.94
}
| Field | Type | Description |
|---|---|---|
| type | string | Always `transcript` |
| id | string | Segment identifier |
| speaker | string | Speaker identification (agent, customer, or channel ID) |
| language | string | Detected language code |
| text | string | Transcribed text |
| is_final | boolean | Whether this is the final version of this segment |
| start_time | number | Segment start time in seconds |
| confidence | number | Transcription confidence (0-1) |
Interim results (is_final: false) are sent as the user speaks. Final results replace interim results with the same id.
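One way to apply that replacement rule is to key segments by id and let finals win over interims. A minimal sketch (TranscriptStore is illustrative, not part of the API):

```javascript
// Keep the latest text per segment; a final result replaces any interim
// result with the same id, and late interims never overwrite a final.
class TranscriptStore {
  constructor() {
    this.segments = new Map();
  }

  apply(msg) {
    const existing = this.segments.get(msg.id);
    if (existing && existing.is_final && !msg.is_final) return;
    this.segments.set(msg.id, msg);
  }

  // Concatenated text of all segments, in arrival order.
  text() {
    return [...this.segments.values()].map(s => s.text).join(' ');
  }
}
```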

Translation

Translated text:
{
  "type": "translation",
  "id": "seg_001",
  "source_language": "de",
  "target_language": "en",
  "source_text": "Ich habe eine Frage",
  "text": "I have a question",
  "is_final": true
}
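Since translations carry the same segment id as their source transcript, the two streams can be joined client-side. A sketch (pairSegments is an illustrative helper, not part of the API):

```javascript
// Join final transcripts with their translations by shared segment id.
function pairSegments(transcripts, translations) {
  const byId = new Map(translations.map(t => [t.id, t]));
  return transcripts
    .filter(t => t.is_final)
    .map(t => ({
      id: t.id,
      original: t.text,
      translated: byId.get(t.id)?.text ?? null  // null until translation arrives
    }));
}
```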

Audio Response

Translated speech audio (for bidirectional translation mode):
{
  "type": "audio",
  "id": "audio_001",
  "data": "base64-encoded-audio-data",
  "encoding": "pcm_16",
  "sample_rate": 24000,
  "segment_id": "seg_001"
}
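To play these messages back in a browser, the base64 pcm_16 payload has to be decoded into float samples first. A sketch of the decode step (decodePcm16 is illustrative; the playback snippet in the comments is a hypothetical counterpart to the playAudio call in the connection example below):

```javascript
// Decode a base64 pcm_16 payload into Float32 samples in [-1, 1],
// suitable for copying into a Web Audio AudioBuffer channel.
function decodePcm16(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const pcm16 = new Int16Array(bytes.buffer);
  const floats = new Float32Array(pcm16.length);
  for (let i = 0; i < pcm16.length; i++) floats[i] = pcm16[i] / 32768;
  return floats;
}

// Hypothetical browser playback using the message's sample_rate:
//   const ctx = new AudioContext();
//   const buf = ctx.createBuffer(1, floats.length, message.sample_rate);
//   buf.copyToChannel(floats, 0);
//   const src = ctx.createBufferSource();
//   src.buffer = buf;
//   src.connect(ctx.destination);
//   src.start();
```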

Status Events

Session status updates:
{
  "type": "status",
  "status": "connected",
  "message": "Provider connection established"
}
| Status | Description |
|---|---|
| connecting | Establishing provider connection |
| connected | Session is fully active |
| provider_disconnected | Phone call ended |
| ending | Session is ending |
| ended | Session has ended |

Error Events

{
  "type": "error",
  "code": "audio_format_invalid",
  "message": "Unsupported audio encoding 'mp3'. Use pcm_16, mulaw, or alaw.",
  "recoverable": true
}

Connection Example

import WebSocket from 'ws';

const session = await createSession();

const ws = new WebSocket(session.stream.url, {
  headers: {
    'Authorization': `Bearer ${session.stream.token}`
  }
});

ws.on('open', () => {
  console.log('Connected to Kallglot stream');
});

ws.on('message', (data) => {
  const message = JSON.parse(data);

  switch (message.type) {
    case 'transcript':
      if (message.is_final) {
        console.log(`[${message.speaker}] ${message.text}`);
      }
      break;

    case 'translation':
      console.log(`Translation: ${message.text}`);
      break;

    case 'audio':
      // Play translated audio
      playAudio(Buffer.from(message.data, 'base64'));
      break;

    case 'status':
      console.log(`Status: ${message.status}`);
      break;

    case 'error':
      console.error(`Error: ${message.message}`);
      if (!message.recoverable) {
        ws.close();
      }
      break;
  }
});

ws.on('close', (code, reason) => {
  console.log(`Connection closed: ${code} - ${reason}`);
});

ws.on('error', (error) => {
  console.error('WebSocket error:', error);
});

// Send audio data
function sendAudio(audioBuffer) {
  ws.send(JSON.stringify({
    type: 'audio',
    data: audioBuffer.toString('base64'),
    encoding: 'pcm_16',
    sample_rate: 16000,
    channels: 1
  }));
}

// End stream gracefully
function endStream() {
  ws.send(JSON.stringify({ type: 'end' }));
}

Browser Example

// Create session (call your backend)
const session = await fetch('/api/sessions', {
  method: 'POST',
  body: JSON.stringify({ mode: 'bidirectional_translation' })
}).then(r => r.json());

// Connect to WebSocket
const ws = new WebSocket(`${session.stream.url}?token=${session.stream.token}`);

// Get microphone access
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);

// Note: createScriptProcessor is deprecated; prefer an AudioWorklet for
// off-main-thread processing in production. It is used here for brevity.
const processor = audioContext.createScriptProcessor(4096, 1, 1);

source.connect(processor);
processor.connect(audioContext.destination);

processor.onaudioprocess = (e) => {
  const inputData = e.inputBuffer.getChannelData(0);
  const pcm16 = new Int16Array(inputData.length);

  // Convert Float32 samples in [-1, 1] to 16-bit PCM.
  for (let i = 0; i < inputData.length; i++) {
    pcm16[i] = Math.max(-32768, Math.min(32767, inputData[i] * 32768));
  }

  if (ws.readyState === WebSocket.OPEN) {
    // Build the base64 payload byte by byte; spreading the whole buffer
    // into String.fromCharCode can overflow the call stack on large chunks.
    const bytes = new Uint8Array(pcm16.buffer);
    let binary = '';
    for (let i = 0; i < bytes.length; i++) {
      binary += String.fromCharCode(bytes[i]);
    }

    ws.send(JSON.stringify({
      type: 'audio',
      data: btoa(binary),
      encoding: 'pcm_16',
      sample_rate: 16000,
      channels: 1
    }));
  }
};

Best Practices

- Send audio in 100-200ms chunks for optimal latency and transcription quality. Smaller chunks increase network overhead; larger chunks increase perceived latency.
- Interim results (is_final: false) should be displayed but may be revised. Replace them when you receive the final result with the same id.
- WebSocket connections may drop due to network issues. Implement reconnection with exponential backoff and resume streaming.
- If the connection drops briefly, buffer audio data and send it when reconnected to avoid gaps in transcription.
- If you're sending audio faster than it can be processed, you may receive a throttle control message. Pause sending until you receive a resume message.
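The reconnection and buffering practices above can be combined in one wrapper. A sketch under stated assumptions (ResilientStream is illustrative; connect is assumed to return a fresh WebSocket-like object for the session URL):

```javascript
// Exponential-backoff reconnection with audio buffering during outages.
class ResilientStream {
  constructor(connect, maxDelayMs = 30000) {
    this.connect = connect;
    this.maxDelayMs = maxDelayMs;
    this.attempt = 0;
    this.buffer = [];   // audio messages queued while disconnected
    this.ws = null;
    this.open();
  }

  open() {
    this.ws = this.connect();
    this.ws.onopen = () => {
      this.attempt = 0;
      // Flush audio buffered during the outage, in order.
      for (const msg of this.buffer.splice(0)) this.ws.send(msg);
    };
    this.ws.onclose = () => {
      // 1s, 2s, 4s, ... capped at maxDelayMs.
      const delay = Math.min(this.maxDelayMs, 1000 * 2 ** this.attempt++);
      setTimeout(() => this.open(), delay);
    };
  }

  sendAudio(msg) {
    if (this.ws && this.ws.readyState === 1 /* OPEN */) this.ws.send(msg);
    else this.buffer.push(msg);
  }
}
```

Buffered audio is sent as soon as the socket reopens, so brief drops produce no transcript gap; long outages may still lose context on the server side.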

Error Codes

| Code | Description | Recoverable |
|---|---|---|
| audio_format_invalid | Unsupported audio format | Yes |
| audio_sample_rate_invalid | Unsupported sample rate | Yes |
| token_expired | Session token has expired | No |
| session_ended | Session has already ended | No |
| rate_limit_exceeded | Too many messages | Yes |
| internal_error | Server error | Retry |