We are thrilled to announce the release of our FASTEST Voice LLM to date! Experience real-time speech streaming from text in 300ms or less. Dive in and test it using our Playground, our SDKs, or these Replit demos for Node.js and Python, plus a ChatGPT integration.
At PlayHT, our vision revolves around redefining human interactions with AI agents. Whether it’s for customer support or sales calls, AI tutors, or bringing Gaming NPCs to life, our goal is to revolutionize the way humans communicate with generative AI agents.
Today we announce our latest milestone on the road to fulfilling that vision: the launch of PlayHT Turbo, a new version of our conversational voice model, PlayHT 2.0, that generates speech in under 300ms over the network and in under 100ms for on-premise deployments (coming soon).
Input Text Streaming
PlayHT 2.0 Turbo supports input text streaming. This feature integrates seamlessly with LLMs such as ChatGPT. Simply feed the LLM's output stream of tokens or words to the SDK, and it will group them in a way that balances generating expressive, contextual speech against reducing the TTFB (time to first byte).
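To make the trade-off concrete, here is a minimal sketch of one way a client could group streamed LLM tokens before handing them to a text-to-speech engine. It is not the SDK's actual algorithm. The function name `chunk_tokens`, the `max_chars` limit, and the sentence-boundary rule are all assumptions made for illustration: flushing at sentence ends gives the voice model enough context to sound natural, while the length cap keeps the first chunk from waiting too long.

```python
from typing import Iterable, Iterator

# Punctuation treated as a sentence boundary (an illustrative assumption).
SENTENCE_END = (".", "!", "?")


def chunk_tokens(tokens: Iterable[str], max_chars: int = 200) -> Iterator[str]:
    """Group streamed LLM tokens into sentence-sized chunks of text."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush when a sentence ends, or when the buffer grows too long
        # to keep waiting without hurting time to first byte.
        if buffer.rstrip().endswith(SENTENCE_END) or len(buffer) >= max_chars:
            chunk = buffer.strip()
            if chunk:
                yield chunk
            buffer = ""
    # Flush whatever is left once the LLM stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    llm_tokens = ["Hello", " there", ".", " How", " can", " I", " help", "?"]
    for text_chunk in chunk_tokens(llm_tokens):
        print(text_chunk)
```

A smaller `max_chars` sends text sooner at the cost of less context per chunk, and a larger one does the opposite.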
Output Speech Streaming
Once Turbo receives text, it starts streaming audio in approximately 70ms. However, because of unavoidable network overhead, users typically receive the first audio within 200ms to 400ms.
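If you want to check these numbers from your own network, a rough sketch like the one below can help. It times how long any lazy iterable of audio byte chunks takes to deliver its first non-empty chunk. The helper name `measure_ttfb` is hypothetical, and the audio stream itself is assumed to come from whichever client you use. Nothing here calls a real PlayHT API.

```python
import time
from typing import Iterable, Optional, Tuple


def measure_ttfb(audio_chunks: Iterable[bytes]) -> Tuple[Optional[float], int]:
    """Return (seconds until the first non-empty chunk, total bytes received).

    The iterable should be lazy (for example a generator) so that the
    request is only issued once iteration starts and the timer is fair.
    """
    start = time.perf_counter()
    ttfb: Optional[float] = None
    total_bytes = 0
    for chunk in audio_chunks:
        if ttfb is None and chunk:
            ttfb = time.perf_counter() - start
        total_bytes += len(chunk)
    return ttfb, total_bytes


def fake_audio_stream(delay_s: float = 0.05):
    """Stand-in for a real audio stream: waits, then yields two chunks."""
    time.sleep(delay_s)
    yield b"\x00\x01\x02"
    yield b"\x03\x04\x05"


if __name__ == "__main__":
    ttfb, size = measure_ttfb(fake_audio_stream())
    print(f"first audio after {ttfb * 1000:.0f} ms, {size} bytes total")
```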
Check out our demo showcasing the ChatGPT integration with both input and output streaming:
Conversationalize Your Input
PlayHT 2.0 isn’t just any voice model. It was designed for conversations, and trained on over a million hours of conversational speech. This ensures almost any voice has an authentically human-like talking style.
But wait, there’s more! We’re introducing an additional feature to elevate this experience: you can now pass any text to the model, and it will try its best to rewrite the input so that it sounds more human-like. Check these examples:
Prompt: "Hello, play support speaking? Please hold on a sec, Let me just pull up your details real quick. Can you tell me your account email or your phone number? Okay, there you are. So, what are you actually looking for in the upgrade? Any specific features or stuff that you've got your eye on?"
Notice how the second generation is more human-like and conversational. We are enabling this beta feature for all users soon, and it will be configurable through the API.
A New Playground
We have built a playground where you can test the API and all its features from one interface without writing any code. Here is a quick run-through of the playground's main controls and functionality:
- Voice Cloning: Instantly clone any voice or accent from a mere 30-second speech sample.
- Model Selection: Choose between our High-Quality 2.0 model (latency < 1 second) or the Turbo model (300ms latency).
- Voice Library: Select from an array of pre-built voices suitable for diverse use-cases.
- Emotion & Style Guidance: Add an emotional layer such as Anger, Happiness, Sadness, etc. Adjust emotion intensity using the Style Guidance slider.
- Output Format: Our models support multiple formats: mp3, wav, pcm, mulaw, flac, and ogg.
- Temperature: Regulate variance. Lower temperatures yield predictable results, while higher ones introduce more variability.
- Voice Guidance: Control voice uniqueness. Lower numbers make your voice sound more generic, while higher values amplify its distinctiveness.
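For readers who prefer to see these controls as data, the sketch below collects them into a single settings object. Treat it as purely illustrative: the class name, the field names, and the model labels `"2.0"` and `"turbo"` are hypothetical and are not taken from the API or the SDKs. Only the list of output formats comes from the list above. No numeric ranges are enforced, since the playground's limits are not stated here.

```python
from dataclasses import dataclass
from typing import Optional

# Output formats listed in this post.
SUPPORTED_FORMATS = {"mp3", "wav", "pcm", "mulaw", "flac", "ogg"}
# Hypothetical labels for the High-Quality 2.0 model and the Turbo model.
MODELS = {"2.0", "turbo"}


@dataclass(frozen=True)
class PlaygroundSettings:
    """Illustrative container mirroring the playground controls."""

    voice: str                               # a library voice or a cloned voice
    model: str = "turbo"
    output_format: str = "mp3"
    emotion: Optional[str] = None            # e.g. "anger", "happiness", "sadness"
    style_guidance: Optional[float] = None   # emotion intensity
    temperature: Optional[float] = None      # lower is more predictable
    voice_guidance: Optional[float] = None   # higher is more distinctive

    def __post_init__(self) -> None:
        if self.model not in MODELS:
            raise ValueError(f"unknown model: {self.model!r}")
        if self.output_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format!r}")


if __name__ == "__main__":
    settings = PlaygroundSettings(voice="demo-voice", output_format="wav")
    print(settings)
```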
We’re introducing two new SDKs, for Node.js and Python, making the integration of PlayHT 2.0 Turbo into your products a breeze.
For those who don’t use Node.js or Python, our HTTP API remains at your disposal. However, for the lowest latency we recommend our SDKs, as they use the gRPC API.
Create Delightful Conversations
Ready to redefine Human-AI communication? Want to build the next AI Therapist, AI Tutor, Gaming NPC, or Personal Assistant that actually sounds human? We built this API for you. Get started now for free, then join our Discord and show us what you are building!