Question 1

How does zero-shot voice cloning work?

Accepted Answer

Chatterbox Turbo can clone any voice with just 5 seconds of reference audio. Simply upload your audio file, and the model will match the style, tone, and characteristics without any training or fine-tuning required.
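A reference clip can be checked against the 5-second figure before it is uploaded. The sketch below is only an illustration, not part of Chatterbox Turbo: it assumes an uncompressed WAV clip, uses Python's standard `wave` module, and the helper names (`wav_duration_seconds`, `is_long_enough`, `make_silent_wav`) are hypothetical.

```python
import io
import wave

MIN_REFERENCE_SECONDS = 5.0  # figure quoted in the answer above


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an uncompressed WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def is_long_enough(wav_bytes: bytes, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """Hypothetical pre-upload check: is the clip at least `minimum` seconds long?"""
    return wav_duration_seconds(wav_bytes) >= minimum


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Build a silent mono 16-bit WAV in memory, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as clip:
        clip.setnchannels(1)
        clip.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        clip.setframerate(sample_rate)
        clip.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


if __name__ == "__main__":
    print(is_long_enough(make_silent_wav(6.0)))
    print(is_long_enough(make_silent_wav(2.0)))
```

MP3 or MPEG clips would need a decoding library to measure, which this sketch does not cover.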

Question 2

What paralinguistic tags are supported?

Accepted Answer

Chatterbox Turbo supports multiple natural vocal reaction tags. These tags generate natural reactions in the cloned voice with matching emotional tone.

Question 3

How fast is Chatterbox Turbo?

Accepted Answer

Chatterbox Turbo generates speech up to 6× faster than real-time on GPU. This makes it perfect for real-time applications, voice assistants, and interactive media where speed is critical.
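To put the speed figure in concrete terms, the arithmetic can be sketched as follows. This assumes "6× faster than real-time" means one second of compute yields up to six seconds of audio; that reading, and the function name, are assumptions rather than anything the page confirms.

```python
MAX_SPEEDUP = 6.0  # "up to 6x faster than real-time", from the answer above


def best_case_generation_seconds(audio_seconds: float, speedup: float = MAX_SPEEDUP) -> float:
    """Lower bound on compute time for a clip, assuming the quoted peak speedup."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


if __name__ == "__main__":
    # A one-minute clip at the quoted peak rate.
    print(best_case_generation_seconds(60.0))
```

Because the figure is stated as "up to", the result is a best case; real latency on a given GPU may be higher.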

Question 4

What is the exaggeration parameter?

Accepted Answer

The exaggeration parameter (0.0-1.0) controls speech expressiveness. Lower values create monotone speech, while higher values make the voice more dramatic and expressive. Default is 0.25 for natural delivery.
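A client could validate the value before sending a request. The following is a minimal sketch under stated assumptions: the range and default come from the answer above, while the `GenerationSettings` class and `clamp_exaggeration` helper are hypothetical and not part of any Chatterbox Turbo API.

```python
from dataclasses import dataclass

EXAGGERATION_MIN = 0.0
EXAGGERATION_MAX = 1.0
EXAGGERATION_DEFAULT = 0.25  # default quoted in the answer above


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical settings container; rejects out-of-range values."""

    exaggeration: float = EXAGGERATION_DEFAULT

    def __post_init__(self) -> None:
        if not EXAGGERATION_MIN <= self.exaggeration <= EXAGGERATION_MAX:
            raise ValueError(
                f"exaggeration must be between {EXAGGERATION_MIN} and "
                f"{EXAGGERATION_MAX}, got {self.exaggeration}"
            )


def clamp_exaggeration(value: float) -> float:
    """Alternative to raising: pull an out-of-range value back into range."""
    return min(EXAGGERATION_MAX, max(EXAGGERATION_MIN, value))


if __name__ == "__main__":
    print(GenerationSettings())        # uses the default
    print(clamp_exaggeration(1.7))     # pulled back to the upper bound
```

Raising an error suits a form that should reject bad input; clamping suits a slider where silently correcting the value is acceptable.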

Question 5

What audio formats are supported for input and output?

Accepted Answer

You can upload reference audio in MP3, WAV, or MPEG formats. Chatterbox Turbo generates high-quality audio output suitable for any professional use case.
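A simple pre-upload filter on the file name might look like the sketch below. The mapping from the listed formats to the extensions `.mp3`, `.wav`, and `.mpeg` is an assumption, and the function name is hypothetical; the service may well inspect file contents rather than names.

```python
from pathlib import Path

# Assumed extensions for the formats named in the answer above.
SUPPORTED_REFERENCE_EXTENSIONS = {".mp3", ".wav", ".mpeg"}


def is_supported_reference(filename: str) -> bool:
    """Check the file extension, ignoring case."""
    return Path(filename).suffix.lower() in SUPPORTED_REFERENCE_EXTENSIONS


if __name__ == "__main__":
    for name in ("voice.WAV", "sample.mp3", "clip.flac"):
        print(name, is_supported_reference(name))
```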

Question 6

How is pricing calculated?

Accepted Answer

Chatterbox Turbo charges 6 credits per 1000 characters of text. Text under 1000 characters is rounded up to 1000. This makes it one of the most cost-effective professional TTS solutions available.
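The pricing rule can be worked through in a few lines. The answer above only states that text under 1000 characters is rounded up to 1000; this sketch additionally assumes that longer text is billed per started block of 1000 characters, which the page does not confirm. The function name is hypothetical.

```python
import math

CREDITS_PER_BLOCK = 6      # from the answer above
CHARACTERS_PER_BLOCK = 1000


def estimate_credits(text: str) -> int:
    """Estimate the credit cost of a text.

    Assumption: every started block of 1000 characters is billed in full,
    with a minimum of one block.
    """
    blocks = max(1, math.ceil(len(text) / CHARACTERS_PER_BLOCK))
    return blocks * CREDITS_PER_BLOCK


if __name__ == "__main__":
    for length in (200, 1000, 1001, 2500):
        print(length, estimate_credits("a" * length))
```

Under this assumption a 200-character text costs the same as a 1000-character one, so batching short snippets into a single request would use credits more efficiently.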

Question 7

What does the built-in watermarking do?

Accepted Answer

Every audio file generated by Chatterbox Turbo includes a PerTh (Perceptual Threshold) watermark. This deep neural network watermarker embeds data imperceptibly, helping you track AI-generated content for responsible AI deployment without compromising audio quality.

Output format	High-quality audio
Reference audio	5 seconds required for cloning
Processing time	Up to 6× faster than real-time
Cost	6 credits per 1000 characters
Exaggeration range	0.0 - 1.0
Temperature range	0.05 - 5.0
License	MIT (Open Source)
Watermarking	Built-in PerTh

Chatterbox Turbo - Text to Speech

What Can Chatterbox Turbo Do?

Zero-Shot Voice Cloning

Paralinguistic Emotions

Emotion Exaggeration Control

Built-in Watermarking

Ultra-Fast Generation

Open Source & MIT Licensed

How to Use Chatterbox Turbo

Enter Your Text

Upload Reference Audio (Optional)

Adjust Settings

Generate & Download
