Voice Agent Quickstart: Build in 10 Minutes with Orga SDK (Copy-Paste Code)

Feb 10, 2026

Deploying real-time voice agents requires managing complex audio streams and ensuring ultra-low latency for a natural conversational flow. The Orga AI SDK is designed to streamline this process, allowing developers to integrate multimodal capabilities (voice and vision) with just a few lines of code.

In this voice agent quickstart, we will cover everything from installation to launching an active session using our specialized streaming infrastructure.

Prerequisites

Before getting started, make sure you have:

  1. An active account in the Orga AI Dashboard.

  2. Your personal API Key.

  3. Node.js installed in your development environment.
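With those in place, it is good practice to keep your API Key out of source code. A minimal sketch using an environment variable (the name `ORGA_API_KEY` is our own convention here, not a documented requirement):

```shell
# Export the key for the current shell session so your scripts can
# read it from the environment instead of hardcoding it.
export ORGA_API_KEY="your-api-key"
```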

Step 1: Installing the SDK

The Orga SDK is the primary tool for interacting with our multimodal agents. You can add it to your JavaScript or TypeScript project using your preferred package manager:

Bash

npm install @orga-ai/sdk
# or
yarn add @orga-ai/sdk


Step 2: Agent and Client Configuration

To initiate communication, you first need to configure the client with your API Key and define the agent’s parameters, such as the model and system instructions.

JavaScript

import { OrgaClient } from '@orga-ai/sdk';

const client = new OrgaClient({
  apiKey: 'YOUR_API_KEY_HERE',
});

const startAgent = async () => {
  const agent = await client.createAgent({
    model: 'orga-multimodal-v1', // Official Orga Model
    instructions: 'You are an Orga technical assistant. Answer clearly and directly.',
    voice: 'shimmer', // Voice profile configuration
  });

  await agent.connect();
  console.log('Session started: The agent is now listening.');
};
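The snippet above hardcodes the key for brevity. In practice you would read it from the environment and fail fast when it is missing; a small helper along these lines (our own sketch, not part of the SDK) keeps the setup safe:

```javascript
// Resolve the API key from the environment; throw early if it is absent
// so the client never starts with an invalid configuration.
// The ORGA_API_KEY variable name is our convention, not a documented one.
function resolveApiKey(env = process.env) {
  const key = env.ORGA_API_KEY;
  if (!key) {
    throw new Error('Set ORGA_API_KEY before starting the agent.');
  }
  return key;
}

// Usage with the client from the example above:
// const client = new OrgaClient({ apiKey: resolveApiKey() });
```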

Step 3: Managing Audio and Video Events

The Orga SDK operates asynchronously, emitting events based on the conversation's state. For seamless integration, set up listeners for these key events:

  • connect: Confirmation that the WebSocket tunnel is successfully open.

  • speech-started: Triggered when the agent detects that the user has started speaking.

  • speech-finished: Indicates that the agent's response has ended.

JavaScript

agent.on('speech-started', () => {
  console.log('Agent is processing your voice input...');
});

agent.on('text-delta', (delta) => {
  // Useful for displaying real-time transcripts in the UI
  console.log('Receiving text stream:', delta);
});
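Under the hood these hooks follow Node's familiar EventEmitter pattern (an assumption on our part). You can exercise your handlers without a live session by standing in a plain EventEmitter for the agent:

```javascript
import { EventEmitter } from 'node:events';

// Stand-in for the agent so UI handlers can be tested offline.
const fakeAgent = new EventEmitter();

let transcript = '';
fakeAgent.on('speech-started', () => console.log('User started speaking'));
fakeAgent.on('text-delta', (delta) => { transcript += delta; });
fakeAgent.on('speech-finished', () => console.log('Transcript:', transcript));

// Simulate a short exchange.
fakeAgent.emit('speech-started');
fakeAgent.emit('text-delta', 'Hello, ');
fakeAgent.emit('text-delta', 'Orga.');
fakeAgent.emit('speech-finished');
```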

Step 4: Secure Session Termination

To optimize resource usage and token consumption, always ensure the connection is closed once the interaction ends:

JavaScript

const endConversation = async () => {
  await agent.disconnect();
  console.log('Connection closed successfully.');
};
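If your process can be killed mid-conversation (Ctrl+C, container stop), it is worth wiring the same cleanup to a signal handler. A hypothetical helper, assuming the `agent.disconnect()` call from the snippet above:

```javascript
// Close the session before exiting when the process receives SIGINT.
// `proc` is injectable so the wiring can be tested with a mock.
function registerCleanup(agent, proc = process) {
  proc.once('SIGINT', async () => {
    await agent.disconnect();
    proc.exit(0);
  });
}

// Usage in your app: registerCleanup(agent);
```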

Conclusion

By leveraging the Orga SDK abstraction layer, there is no need to configure separate audio servers or complex model orchestration systems. With these steps, you now have a functional agent capable of maintaining fluid, real-time dialogues.

Next Steps:

Try Orga now

Connect to Platform to build agents that can see, hear, and speak in real time.
