Question 1

What speakers are available in VibeVoice?

Accepted Answer

VibeVoice offers 6 speaker voices: Frank, Wayne, Carter, Emma, Grace, and Mike. Each voice has unique characteristics suitable for different content types, from narration to character voices.

Question 2

What is the CFG scale parameter?

Accepted Answer

CFG (Classifier-Free Guidance) scale controls how closely the generated speech adheres to the input text. Higher values (up to 3.0) increase text adherence, while lower values (as low as 0.5) allow more creative variation. The default value is 1.3 for balanced results.
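As a minimal sketch of the range and default described above, the helper below keeps a requested CFG scale inside 0.5 to 3.0 and falls back to 1.3 when no value is given. The function name `resolve_cfg_scale` is hypothetical, and clamping is an assumption: the service itself may reject out-of-range values instead.

```python
# Range and default taken from the answer above.
CFG_MIN = 0.5
CFG_MAX = 3.0
CFG_DEFAULT = 1.3


def resolve_cfg_scale(value=None):
    """Return a CFG scale inside the documented range (hypothetical helper)."""
    if value is None:
        return CFG_DEFAULT
    # Clamping is an assumption; the real service might raise an error instead.
    return max(CFG_MIN, min(CFG_MAX, float(value)))


print(resolve_cfg_scale())     # no value given: the default
print(resolve_cfg_scale(5))    # above the range: clamped to the maximum
print(resolve_cfg_scale(0.1))  # below the range: clamped to the minimum
```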

Question 3

How does the seed parameter work?

Accepted Answer

The seed parameter controls the randomization used during generation. Using the same seed value with the same text, and otherwise unchanged settings such as speaker and CFG scale, will produce identical results, which is useful for reproducible generation and testing.
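The principle can be illustrated with a toy example. This is not VibeVoice code: `toy_generate` is a hypothetical stand-in that draws one pseudo-random number per word from Python's standard `random` module, only to show why a fixed seed makes a random process repeatable.

```python
import random


def toy_generate(text, seed):
    """Toy stand-in for a speech generator: one pseudo-random value per word."""
    rng = random.Random(seed)  # all randomness flows from this seeded generator
    return [round(rng.uniform(-1.0, 1.0), 6) for _ in text.split()]


first = toy_generate("Hello from VibeVoice", seed=42)
second = toy_generate("Hello from VibeVoice", seed=42)
other = toy_generate("Hello from VibeVoice", seed=43)

print(first == second)  # same seed, same text: identical output
print(first == other)   # different seed: output differs
```

In practice this means you can note the seed of a take you like and reuse it later to regenerate the same audio.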

Question 4

What is the audio quality of VibeVoice output?

Accepted Answer

VibeVoice generates audio at a 24 kHz sample rate, providing high-quality, clear, and natural-sounding speech. The output is suitable for professional voiceover work and content creation.

Question 5

How fast is VibeVoice generation?

Accepted Answer

VibeVoice is optimized for fast generation, making it suitable for real-time applications and interactive media. Generation time depends on text length and server load, but a request typically completes in seconds.

Question 6

Can I use VibeVoice for commercial projects?

Accepted Answer

Yes, you can use VibeVoice-generated audio for commercial projects including YouTube videos, podcasts, e-learning, audiobooks, advertisements, and more. Check the specific licensing terms for your use case.

Question 7

What is the maximum text length for VibeVoice?

Accepted Answer

VibeVoice supports long-form text input. For very long texts, consider splitting them into multiple segments for optimal performance. Pricing is calculated per 1000 characters.
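One way to split a long text is at sentence boundaries, so that no segment cuts a sentence in half unless a single sentence is itself too long. The sketch below is a generic approach, not a VibeVoice feature: the function name `split_text` is hypothetical, and the 1000-character default is an assumption chosen to match the pricing unit, since no maximum length is stated here.

```python
import re


def split_text(text, max_chars=1000):
    """Split text into segments of at most max_chars, preferring sentence ends."""
    # Break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current segment
        else:
            segments.append(current)  # close the segment and start a new one
            current = sentence
    if current:
        segments.append(current)
    return segments


for segment in split_text("One. Two. Three.", max_chars=10):
    print(repr(segment))
```

Each segment can then be generated separately with the same speaker, CFG scale, and seed, and the resulting audio files joined afterwards.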

Question 8

How is pricing calculated for VibeVoice?

Accepted Answer

VibeVoice charges 6 credits per 1000 characters of text. Text under 1000 characters is rounded up to 1000, so a 400-character request costs the same 6 credits as a 1000-character one. This makes it one of the most cost-effective TTS solutions available.
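The pricing rule can be sketched as a small estimator. The function name `estimate_credits` is hypothetical. The answer above only states the rounding for text under 1000 characters, so rounding longer text up to the next full 1000-character block is an assumption here; actual billing may be proportional instead.

```python
import math

CREDITS_PER_BLOCK = 6  # from the answer above
BLOCK_SIZE = 1000      # characters per billing block


def estimate_credits(text):
    """Estimate the credit cost of a text (hypothetical helper)."""
    # Assumption: every started block of 1000 characters is billed in full,
    # with a minimum of one block.
    blocks = max(1, math.ceil(len(text) / BLOCK_SIZE))
    return blocks * CREDITS_PER_BLOCK


print(estimate_credits("a" * 400))   # under 1000 characters: one block
print(estimate_credits("a" * 2500))  # three blocks under the rounding assumption
```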

Output format	High-quality audio (MP3)
Sample rate	24kHz
Processing time	Fast generation
Cost	6 credits per 1000 characters
CFG scale range	0.5 - 3.0
Available speakers	6 voices (Frank, Wayne, Carter, Emma, Grace, Mike)
Reproducible generation	Yes (via seed parameter)

VibeVoice - Text to Speech

What Can VibeVoice Do?

Multiple Speaker Voices

Fast Generation

Adjustable CFG Scale

High-Quality Audio Output

Reproducible Generation

Open Source AI

How to Use VibeVoice

Enter Your Text

Select a Speaker

Adjust Settings (Optional)

Generate & Download

Frequently Asked Questions