Voice is the feature that surprises new users most. Text chat is expected; hearing a response in a voice that matches the character's personality is not. Here is how it works technically and what it means in practice.

How the synthesis engine works

HLT's voice synthesis runs on a custom model rather than a pool of generic text-to-speech voices. The model was trained on a curated dataset and then fine-tuned per character profile. Each character has a fixed vocal seed that determines pitch range, cadence, breath pattern, and the micro-variations that make a voice sound like a person. As a result, two characters with similar personality settings can still sound distinctly different.

What a vocal seed is

A vocal seed is a set of parameters that anchors the synthesis model to a specific voice identity. It controls the baseline pitch, the speed of speech, the frequency of pauses, and the emotional coloring of certain words. Once a character's vocal seed is set, it does not drift between sessions. Your companion sounds the same on day one and day ninety. That consistency is deliberate: voice is part of the character's identity.
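
To make the idea concrete, here is a minimal sketch of a vocal seed as an immutable record. It is an illustration only, not HLT's implementation: the class name, field names, units, and values are all assumptions chosen to mirror the four parameters listed above.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class VocalSeed:
    """Hypothetical container for the parameters a vocal seed controls."""
    baseline_pitch_hz: float    # baseline pitch (unit is an assumption)
    speech_rate_wpm: float      # speed of speech (unit is an assumption)
    pauses_per_sentence: float  # frequency of pauses (unit is an assumption)
    emotional_coloring: float   # assumed scale: 0.0 neutral, 1.0 strongly colored


# Illustrative values only; they are not taken from any real character.
seed = VocalSeed(
    baseline_pitch_hz=110.0,
    speech_rate_wpm=150.0,
    pauses_per_sentence=1.5,
    emotional_coloring=0.4,
)


def try_to_drift(s: VocalSeed) -> bool:
    """Return True if the seed could be changed after creation."""
    try:
        s.baseline_pitch_hz = 120.0  # a frozen dataclass rejects this assignment
    except FrozenInstanceError:
        return False
    return True
```

Freezing the record is one simple way to model "does not drift between sessions": once the seed exists, any attempt to change it fails, so every session reads the same values.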

The three new profiles added in April 2026

The April 2026 update added a low, measured baritone for Professional-category characters, a soft mid-range for Friendly characters, and a sharper, faster cadence for Adventurous characters. These were the three gaps most frequently mentioned in user feedback. The Builder now lets you assign any of the eight available profiles to a custom character, and the preview plays a short sample before you commit.
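
The preview-then-commit flow can be sketched roughly as follows. This is a hypothetical model, not the Builder's actual code: `CustomCharacter`, `preview`, and `commit` are invented names, and requiring a preview before a commit is an assumption about how the flow is enforced. Only the three April 2026 profiles appear, because the other five are not described here.

```python
from typing import Dict, Optional

# The three profiles added in April 2026, keyed by character category.
# The remaining five of the eight profiles are not described in the article,
# so they are deliberately left out rather than guessed.
APRIL_2026_PROFILES: Dict[str, str] = {
    "Professional": "low, measured baritone",
    "Friendly": "soft mid-range",
    "Adventurous": "sharper, faster cadence",
}


class CustomCharacter:
    """Hypothetical custom character with a preview-before-commit voice choice."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.profile: Optional[str] = None      # committed voice profile
        self._previewed: Optional[str] = None   # last profile that was sampled

    def preview(self, profile: str) -> str:
        """Record the profile as previewed and describe the sample played."""
        self._previewed = profile
        return f"Playing sample: {profile}"

    def commit(self) -> None:
        """Assign the last previewed profile (assumed rule: preview first)."""
        if self._previewed is None:
            raise ValueError("preview a profile before committing")
        self.profile = self._previewed
```

For example, calling `preview(APRIL_2026_PROFILES["Friendly"])` and then `commit()` on a new character leaves its `profile` set to the soft mid-range voice.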

Sending and receiving voice messages

On Premium and Ultimate plans, you can send a text message and receive an audio response in your character's voice, or record a voice message yourself and receive a voice reply. Voice generation takes slightly longer than text, typically two to four seconds depending on message length. Voice is not available on the Free plan.
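
The plan rules above reduce to a small decision function. The sketch below is an assumption-laden illustration, not HLT's API: `Plan`, `VOICE_PLANS`, `reply_kind`, and the `audio_requested` flag are hypothetical names, and rejecting a voice message on the Free plan with an error is one possible behavior among several.

```python
from enum import Enum


class Plan(Enum):
    FREE = "Free"
    PREMIUM = "Premium"
    ULTIMATE = "Ultimate"


# Plans on which voice messages are available, per the description above.
VOICE_PLANS = frozenset({Plan.PREMIUM, Plan.ULTIMATE})


def reply_kind(plan: Plan, input_kind: str, audio_requested: bool = False) -> str:
    """Return "text" or "audio" for a message of kind "text" or "voice"."""
    if input_kind not in ("text", "voice"):
        raise ValueError(f"unknown input kind: {input_kind!r}")

    if plan not in VOICE_PLANS:
        # Free plan: text chat only (assumed: voice input is rejected outright).
        if input_kind == "voice":
            raise PermissionError("voice messages require Premium or Ultimate")
        return "text"

    if input_kind == "voice":
        return "audio"  # a recorded voice message receives a voice reply
    return "audio" if audio_requested else "text"
```

On this model, a Premium user who sends text gets audio back only when they ask for it, while any recorded voice message is answered in the character's voice.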

Why voice changes the feel of a conversation

Reading a response and hearing it are different cognitive experiences. Voice adds prosody: the rises and falls in pitch that signal emotion, the pauses that signal thought. A character that sounds hesitant when discussing something difficult feels more present than one that produces the same words in flat text. That is not a small difference. It is the reason voice was in the original product specification from 2022.

Voice messages are available on Premium ($18/month) and Ultimate ($38/month). The Free plan covers text chat with the full public roster.