Story321.com

VibeVoice - Text to Speech

Generate expressive speech from text using VibeVoice 0.5b. Fast, open-source AI voice synthesis with multiple speaker options.

What Can VibeVoice Do?

Multiple Speaker Voices

Choose from 6 speaker voices: Frank, Wayne, Carter, Emma, Grace, and Mike. Each voice has its own characteristics, suited to different content types.

Fast Generation

Generate speech quickly with optimized processing. Perfect for real-time applications, voice assistants, and interactive media.

Adjustable CFG Scale

Control how closely the speech follows the text with the CFG scale parameter. Higher values increase text adherence; lower values allow more creative variation.

High-Quality Audio Output

Produces audio at a 24 kHz sample rate for clear, natural-sounding speech. Suitable for professional voiceover work.

Reproducible Generation

Use seed values for reproducible results. Perfect for maintaining consistency across multiple generations of the same text.

Open Source AI

Built on open-source technology for transparency and community-driven improvements. High-quality voice synthesis accessible to everyone.

How to Use VibeVoice

1. Enter Your Text

Type or paste the script you want to convert to speech. VibeVoice will generate natural-sounding speech from your text.

2. Select a Speaker

Choose from 6 available speaker voices: Frank, Wayne, Carter, Emma, Grace, or Mike. Each voice has unique characteristics.

3. Adjust Settings (Optional)

Fine-tune the CFG scale to control text adherence. Use a seed value for reproducible results if needed.

4. Generate & Download

Click Generate to create your audio. Download the high-quality MP3 file for use in your projects.
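
If you prepare scripts in bulk before pasting them into the page, the settings from the steps above can be checked locally first. The following is a minimal sketch, not an official client: the `SpeechRequest` class and its field names are hypothetical, and only the speaker list, the CFG range (0.5 to 3.0), and the default of 1.3 come from this page.

```python
from dataclasses import dataclass
from typing import Optional

# Speaker names as listed on this page.
SPEAKERS = ("Frank", "Wayne", "Carter", "Emma", "Grace", "Mike")


@dataclass
class SpeechRequest:
    """Hypothetical container for one generation job (not an official API)."""
    text: str
    speaker: str
    cfg_scale: float = 1.3      # default stated in the FAQ
    seed: Optional[int] = None  # None means "no fixed seed"

    def __post_init__(self) -> None:
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.speaker not in SPEAKERS:
            raise ValueError(f"unknown speaker: {self.speaker!r}")
        if not 0.5 <= self.cfg_scale <= 3.0:
            raise ValueError("cfg_scale must be between 0.5 and 3.0")


request = SpeechRequest(text="Welcome to the show.", speaker="Emma", seed=42)
print(request)
```

An invalid speaker name or an out-of-range CFG value raises `ValueError` before anything is submitted.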

Frequently Asked Questions

What speakers are available in VibeVoice?

VibeVoice offers 6 speaker voices: Frank, Wayne, Carter, Emma, Grace, and Mike. Each voice has unique characteristics suitable for different content types, from narration to character voices.

What is the CFG scale parameter?

CFG (Classifier-Free Guidance) scale controls how closely the generated speech adheres to the input text. Higher values (up to 3.0) increase text adherence, while lower values (as low as 0.5) allow more creative variation. The default value is 1.3 for balanced results.
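
To give a feel for what the scale does, here is a sketch of the standard classifier-free guidance blend. It is the general formulation of the technique; whether VibeVoice applies it in exactly this form is not stated on this page, and the function name `cfg_blend` and the toy numbers are illustrative only.

```python
from typing import List


def cfg_blend(uncond: List[float], cond: List[float], scale: float) -> List[float]:
    """Standard CFG blend: start from the unconditional prediction and move
    toward (or past) the text-conditioned prediction by `scale`."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy prediction that ignores the text
cond = [1.0, 3.0]    # toy prediction that uses the text

print(cfg_blend(uncond, cond, 1.0))  # equals the conditioned prediction
print(cfg_blend(uncond, cond, 2.0))  # pushed further in the text's direction
```

At a scale of 1.0 the result equals the text-conditioned prediction; larger values push further in the direction the text suggests, which matches the "higher means closer adherence" behavior described above.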

How does the seed parameter work?

The seed parameter allows you to control the randomization in generation. Using the same seed value with the same text will produce identical results, which is useful for reproducible generation and testing.
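
The idea behind seeding can be shown with Python's standard random number generator. This is only an illustration of the principle, assuming the model draws its randomness from a seeded generator; `fake_generate` is a hypothetical stand-in and does not produce audio.

```python
import random
from typing import List


def fake_generate(text: str, seed: int) -> List[float]:
    """Stand-in for a generator: returns seeded pseudo-random 'samples'."""
    rng = random.Random(seed)  # private generator, isolated from global state
    return [rng.random() for _ in text.split()]


first = fake_generate("same text every time", seed=42)
second = fake_generate("same text every time", seed=42)
other = fake_generate("same text every time", seed=43)

print(first == second)  # same seed, same text: identical output
print(first == other)   # different seed: different output
```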

What is the audio quality of VibeVoice output?

VibeVoice generates audio at a 24 kHz sample rate, providing clear, natural-sounding speech. The output is suitable for professional voiceover work and content creation.

How fast is VibeVoice generation?

VibeVoice is optimized for fast generation, making it suitable for real-time applications and interactive media. Generation speed depends on text length and server load, but typically completes in seconds.

Can I use VibeVoice for commercial projects?

Yes, you can use VibeVoice-generated audio for commercial projects including YouTube videos, podcasts, e-learning, audiobooks, advertisements, and more. Check the specific licensing terms for your use case.

What is the maximum text length for VibeVoice?

VibeVoice supports long-form text input. For very long texts, consider splitting the script into multiple segments for optimal performance. Pricing is calculated per 1000 characters.
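
One way to split a long script is at sentence boundaries, so no segment cuts a sentence in half. The sketch below is one possible approach, not a VibeVoice feature: the function name `split_text` and the 1000-character default are assumptions chosen to line up with the pricing unit.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 1000) -> List[str]:
    """Group whole sentences into segments of at most max_chars characters.
    A single sentence longer than max_chars is kept whole as its own segment."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments


print(split_text("One. Two. Three.", max_chars=9))
```

Whether each segment is billed separately is not stated here, so check how the rounding rule applies before splitting a script into many short pieces.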

How is pricing calculated for VibeVoice?

VibeVoice charges 6 credits per 1000 characters of text. Text under 1000 characters is rounded up to 1000. This makes it one of the most cost-effective TTS solutions available.
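
The cost of a script can be estimated ahead of time. This sketch assumes that every started block of 1000 characters is billed in full; the page only confirms the 6-credit rate and the rounding of texts under 1000 characters, so treat figures for longer texts as an estimate. The name `estimate_credits` is illustrative.

```python
import math

CREDITS_PER_BLOCK = 6  # rate stated on this page
BLOCK_SIZE = 1000      # characters per billing block


def estimate_credits(text: str) -> int:
    """Estimate credits, ASSUMING each started 1000-character block is billed in full."""
    # At least one block is charged, matching the stated minimum of 1000 characters.
    blocks = max(1, math.ceil(len(text) / BLOCK_SIZE))
    return blocks * CREDITS_PER_BLOCK


print(estimate_credits("a" * 250))   # under 1000 characters, billed as 1000
print(estimate_credits("a" * 2500))  # three started blocks under the assumption above
```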

Pricing

Free tier available

Text to Speech: 6 credits per 1000 characters

Technical Specifications

Output format: High-quality audio (MP3)
Sample rate: 24 kHz
Processing time: Fast generation
Cost: 6 credits per 1000 characters
CFG scale range: 0.5 - 3.0
Available speakers: 6 voices (Frank, Wayne, Carter, Emma, Grace, Mike)
Reproducible generation: Yes (via seed parameter)