Voice is the feature that surprises new users most. Text chat is expected. Hearing a response in a voice that actually matches the character's personality is something different. Here is how it works technically and what it means in practice.
How the synthesis engine works
HLT's voice synthesis is built on a custom fine-tuned model, not a generic text-to-speech pool. Each character has a fixed vocal seed that determines pitch range, cadence, breath pattern, and the micro-variations that make a voice sound like a person. The model was trained on a curated dataset and then fine-tuned per character profile. The result is that two characters with similar personality settings can still sound distinctly different.
What a vocal seed is
A vocal seed is a set of parameters that anchors the synthesis model to a specific voice identity. It controls the baseline pitch, the speed of speech, the frequency of pauses, and the emotional coloring of certain words. Once a character's vocal seed is set, it does not drift between sessions. Your companion sounds the same on day one and day ninety. That consistency is deliberate: voice is part of the character's identity.
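One way to picture a vocal seed is as an immutable record attached to a character. The sketch below is illustrative only: the class name `VocalSeed`, its field names, units, and sample values are assumptions made for this example, not HLT's actual schema. It shows how freezing the parameters keeps the voice from drifting between sessions.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Tuple


@dataclass(frozen=True)
class VocalSeed:
    """Hypothetical vocal seed; every field name and unit is an assumption."""
    baseline_pitch_hz: float        # baseline pitch
    speech_rate_wpm: float          # speed of speech
    pauses_per_sentence: float      # frequency of pauses
    # (word, emphasis weight) pairs standing in for emotional coloring
    emotional_coloring: Tuple[Tuple[str, float], ...] = ()


def is_locked(seed: VocalSeed) -> bool:
    """Return True if the seed rejects modification after creation."""
    try:
        seed.baseline_pitch_hz = 0.0  # attempt to change the voice
    except FrozenInstanceError:
        return True
    return False


# Illustrative values only; they do not describe any real character.
day_one = VocalSeed(110.0, 150.0, 0.8, (("sorry", 0.6),))
day_ninety = VocalSeed(110.0, 150.0, 0.8, (("sorry", 0.6),))
```

Because the record is frozen, two sessions that load the same seed compare equal, and any attempt to alter a parameter in place is refused.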
The three new profiles added in April 2026
The April 2026 update added three profiles: a low, measured baritone for Professional-category characters, a soft mid-range for Friendly characters, and a sharper, faster cadence for Adventurous characters. These were the three gaps most frequently mentioned in user feedback. The Builder now lets you assign any of the eight available profiles to a custom character, and the preview plays a short sample before you commit.
Sending and receiving voice messages
On Premium and Ultimate plans, you can send a text message and receive an audio response in your character's voice, or record a voice message yourself and receive a voice reply. The response time for voice generation is slightly longer than text: typically two to four seconds depending on message length. Voice is not available on the Free plan.
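The plan rules above amount to a simple availability check. The following is a minimal sketch assuming a hypothetical `Plan` enum and `reply_modes` helper; neither name comes from HLT's product or API. It only illustrates which reply types each plan would offer.

```python
from enum import Enum
from typing import List


class Plan(Enum):
    """Hypothetical plan identifiers, used here for illustration only."""
    FREE = "free"
    PREMIUM = "premium"
    ULTIMATE = "ultimate"


# Plans that include voice, as described in the text above.
VOICE_PLANS = frozenset({Plan.PREMIUM, Plan.ULTIMATE})


def reply_modes(plan: Plan) -> List[str]:
    """Return the reply types a plan can receive: text always, voice if included."""
    modes = ["text"]
    if plan in VOICE_PLANS:
        modes.append("voice")
    return modes
```

Under these assumptions, `reply_modes(Plan.FREE)` yields text only, while Premium and Ultimate both yield text and voice.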
Why voice changes the feel of a conversation
Reading a response and hearing it are different cognitive experiences. Voice adds prosody: the rises and falls in pitch that signal emotion, the pauses that signal thought. A character that sounds hesitant when discussing something difficult feels more present than one that produces the same words in flat text. That is not a small difference. It is the reason voice was in the original product specification from 2022.
Voice messages are available on Premium ($18/month) and Ultimate ($38/month). The Free plan covers text chat with the full public roster.