Talk Mode & TTS

Talk Mode enables voice conversations with the agent — speech-to-text input and ElevenLabs text-to-speech output. It works via the OpenClaw iOS app, the TUI, and the control UI.

Prerequisites

  • An ElevenLabs account with an API key
  • OpenClaw with TTS support (2026.2.17+)
  • For mobile: the OpenClaw iOS app built from source and paired to the gateway

Configuration

1. Add the ElevenLabs API Key

Add the key to your .env file:

echo "ELEVENLABS_API_KEY=your-key-here" >> ~/.openclaw/.env

Document it in .env.example (already done if using the Lobster repo config):

ELEVENLABS_API_KEY=your-elevenlabs-api-key
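Running the echo command above more than once appends a duplicate ELEVENLABS_API_KEY line each time. As a minimal sketch of an alternative, the hypothetical helper below (upsert_env_var is not part of OpenClaw) rewrites a dotenv-style file so that the key appears exactly once. It assumes simple KEY=value lines with no quoting or multi-line values.

```python
from pathlib import Path


def upsert_env_var(env_path: Path, key: str, value: str) -> None:
    """Set key=value in a dotenv-style file, replacing any existing entry.

    Hypothetical helper for illustration; assumes plain KEY=value lines.
    """
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    # Drop earlier assignments of the same key so only one remains.
    kept = [line for line in lines if not line.strip().startswith(f"{key}=")]
    kept.append(f"{key}={value}")
    # Create the parent directory if it does not exist yet.
    env_path.parent.mkdir(parents=True, exist_ok=True)
    env_path.write_text("\n".join(kept) + "\n")


# Example call (path taken from the command above):
# upsert_env_var(Path.home() / ".openclaw" / ".env",
#                "ELEVENLABS_API_KEY", "your-key-here")
```

The new entry is always written last, so rotating a key moves it to the end of the file while leaving other variables untouched.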

2. Configure the Talk Section

Add the talk section to openclaw.json:

{
  "talk": {
    "voiceId": "1SM7GgM6IMuvQlz2BwM3",
    "modelId": "eleven_v3",
    "outputFormat": "mp3_44100_128",
    "apiKey": "${ELEVENLABS_API_KEY}",
    "interruptOnSpeech": true
  }
}

| Field | Purpose |
| --- | --- |
| voiceId | ElevenLabs voice ID (see Choosing a Voice) |
| modelId | ElevenLabs model; eleven_v3 is the latest multilingual model |
| outputFormat | Audio format; mp3_44100_128 is high quality and widely compatible |
| apiKey | ${VAR} reference to your ElevenLabs API key |
| interruptOnSpeech | Stop TTS playback when the user starts speaking |
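
The apiKey value is a ${VAR} reference rather than the key itself. This page does not describe how the gateway resolves such references, so the following is only an illustrative sketch of the general idea: a hypothetical expand_placeholders function that walks a parsed config and substitutes ${NAME} from the environment, failing loudly when a variable is unset.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value, env=None):
    """Recursively replace ${VAR} in every string inside dicts and lists.

    Illustrative only; not OpenClaw's actual resolution logic.
    """
    env = os.environ if env is None else env
    if isinstance(value, str):
        def substitute(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(substitute, value)
    if isinstance(value, dict):
        return {k: expand_placeholders(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(v, env) for v in value]
    # Booleans, numbers, and None pass through unchanged.
    return value
```

Keeping the reference in openclaw.json and the secret in .env means the config file can be committed or shared without exposing the key.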

3. Enable the TTS Tool

The tts tool must be in the agent’s alsoAllow list:

{
  "tools": {
    "alsoAllow": ["tts", ...]
  }
}
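
Before restarting the gateway, it can help to sanity-check the pieces configured so far. The sketch below is a hypothetical check_voice_config function that takes the parsed contents of openclaw.json as a dict and reports anything missing. Treating voiceId, modelId, outputFormat, and apiKey as required is an assumption of this sketch, not something the gateway is known to enforce.

```python
# Fields this sketch assumes must be present in the "talk" section.
REQUIRED_TALK_FIELDS = ("voiceId", "modelId", "outputFormat", "apiKey")


def check_voice_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    talk = config.get("talk")
    if not isinstance(talk, dict):
        problems.append("missing 'talk' section")
    else:
        for field in REQUIRED_TALK_FIELDS:
            if not talk.get(field):
                problems.append(f"talk.{field} is missing or empty")
    # The tts tool has to appear in tools.alsoAllow.
    also_allow = config.get("tools", {}).get("alsoAllow", [])
    if "tts" not in also_allow:
        problems.append("'tts' is not listed in tools.alsoAllow")
    return problems
```

To use it, load your own openclaw.json with json.load and print whatever the function returns.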

4. Configure the SAG Skill (Optional)

The sag CLI provides advanced voice features — auditioning voices, file output, and speaker playback:

brew install sag

Add the skill config to openclaw.json:

{
  "skills": {
    "entries": {
      "sag": {
        "apiKey": "${ELEVENLABS_API_KEY}"
      }
    }
  }
}

5. Restart the Gateway

openclaw gateway restart

Choosing a Voice

Browse voices at elevenlabs.io/voice-library. Each voice has an ID you can copy.

Default voice: Mark — Casual, Relaxed and Light (1SM7GgM6IMuvQlz2BwM3).

To change the voice, update talk.voiceId in openclaw.json and restart the gateway.

You can also audition voices via the sag CLI skill — ask the agent to “try a different voice” and it can use sag to preview options.

Using Talk Mode

Via iOS App

  1. Build the OpenClaw iOS app from source (see the OpenClaw GitHub repo)
  2. Pair it to your gateway (scan the QR code from the dashboard)
  3. Tap the microphone icon to start a voice conversation
  4. Your speech is transcribed to text and sent to the agent; the reply is spoken back via ElevenLabs

Via TUI

openclaw tui

Use /tts on to enable voice replies in the terminal interface.

Via Control UI

Open the dashboard at http://127.0.0.1:18789/ and use the Talk Mode interface.

Toggle Per-Session

/tts on # Enable voice replies
/tts off # Disable voice replies

Voice Replies in iMessage

When Talk Mode is active, the agent can send voice memos as iMessage attachments. This requires:

  • BlueBubbles Private API enabled (for attachment sending)
  • The tts tool in the agent’s allowed tools
  • Audio format compatible with iMessage (MP3 or CAF)

Cost Considerations

ElevenLabs charges per character of text synthesized. Voice replies for typical agent responses (1-3 sentences) use roughly 100-300 characters each. Monitor usage at elevenlabs.io/app/usage.
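
For a rough sense of scale, the arithmetic can be sketched as below. The 100 and 300 character bounds come from the estimate above; the number of replies per day and the 30-day month are assumptions you should replace with your own figures.

```python
def estimate_monthly_characters(replies_per_day: int, days: int = 30) -> tuple[int, int]:
    """Return a (low, high) estimate of characters synthesized per month.

    Uses the 100-300 characters-per-reply range quoted in this section.
    """
    low_per_reply, high_per_reply = 100, 300
    low = replies_per_day * low_per_reply * days
    high = replies_per_day * high_per_reply * days
    return low, high


# Ten voice replies a day for a 30-day month:
print(estimate_monthly_characters(10))  # (30000, 90000)
```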

Talk Mode vs the tts Tool

These are two different things:

  • Talk Mode (gateway-native): The gateway automatically converts the agent’s text reply into audio using ElevenLabs. The agent just replies with plain text. This is what powers voice conversations via the iOS app, TUI, and control UI.
  • tts tool (agent-level): The agent explicitly calls the tts tool to generate an MP3 file. Use this for proactive voice messages (e.g., sending a voice memo via iMessage), storytelling with the sag skill, or when you want audio output outside of Talk Mode.

Important: In Talk Mode sessions, the agent should reply with normal text and should not call the tts tool. Calling tts and replying NO_REPLY bypasses the gateway’s audio pipeline, so the user hears nothing.

Pricing Tiers

| Tier | Characters/month | Cost | Notes |
| --- | --- | --- | --- |
| Free | 10,000 | $0 | Good for testing |
| Starter | 30,000 | $5/mo | Recommended for personal use |
| Creator | 100,000 | $22/mo | For heavy voice usage |
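
As a small worked example, the sketch below picks the cheapest listed tier whose quota covers an expected monthly character count. The quotas and prices are copied from the tiers above and may change, so check ElevenLabs' current pricing before relying on them; the function name is hypothetical.

```python
# (name, characters per month, USD per month), as listed in this section.
TIERS = [
    ("Free", 10_000, 0),
    ("Starter", 30_000, 5),
    ("Creator", 100_000, 22),
]


def smallest_tier_for(monthly_characters: int):
    """Return the first tier whose quota covers the usage, or None."""
    for name, quota, _cost in TIERS:
        if monthly_characters <= quota:
            return name
    # Usage exceeds every tier listed here.
    return None


print(smallest_tier_for(25_000))  # Starter
```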