What makes a synthetic voice sound human?

A convincing synthetic voice is not just about sound quality. To truly sound human, it needs to reflect the structures and musicality of real-life speech. This is what I can help you achieve.

As part of my PhD research, I developed a linguistic model, applied to Spanish, that systematically describes the prosodic structures of different communicative acts. The properties of intensity, pitch, and duration (rhythm) were identified and validated through acoustic analysis.

However, these prosodic structures are specific to each language and to each dialect (a dialect being a variety of a language). In other words, the same prosodic components combine differently from one variety to another, resulting in unique configurations. The model is therefore not limited to Spanish: it can be adapted to other Western languages, because it is based on universal prosodic patterns that can be adjusted to different linguistic contexts.

What I Offer

I support teams working on synthetic voice (text-to-speech) systems by helping them understand how different speech acts actually sound and how they can organize their corpus to make the output more natural and communicatively accurate. My work is grounded in linguistic theory and real-world prosodic analysis.

My role is to provide strategic linguistic insight: I do not implement the system, but I guide the process through well-founded recommendations.

1. Data classification by communicative function

I offer guidance on how utterances can be grouped by their communicative function, for example thanks or apologies.
This classification serves as a foundation for more effective training and output design.

2. Prosodic feature identification

I describe the typical prosodic patterns that appear in different speech acts: pauses, elongations, reductions, rhythm, pitch, and so on.

3. Accessible representations for prosodic cues

I propose written representations of prosodic features that teams can adapt for their systems.
These are meant to bridge the gap between linguistic theory and system input design.

4. Framework for system-level adaptation

I provide a conceptual structure that your team can use to adapt or enhance your voice system.
Implementation decisions remain entirely in your hands, with my input serving as a foundation for informed choices.
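
As an illustration of points 1 and 3 above, here is a minimal Python sketch of how a team might group utterances by communicative function and turn prosodic cues into a written form a system can read. It is not part of my model and not a recommended implementation. The `Utterance` class, the function labels, the cue names, and every cue value are hypothetical placeholders. The output format assumes a system that accepts SSML, using only the standard `prosody` and `break` elements, and it produces a bare fragment without the version and namespace attributes a full SSML document would carry.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Utterance:
    text: str
    function: str  # hypothetical label, e.g. "thanks" or "apology"
    cues: dict = field(default_factory=dict)  # hypothetical prosodic cues


def group_by_function(utterances):
    """Group corpus utterances by their communicative function."""
    groups = defaultdict(list)
    for utt in utterances:
        groups[utt.function].append(utt)
    return dict(groups)


def to_ssml(utt):
    """Render one utterance as an SSML fragment built from its cues."""
    speak = ET.Element("speak")
    # Only cues that map directly onto SSML <prosody> attributes are used.
    attrs = {k: str(utt.cues[k]) for k in ("pitch", "rate", "volume") if k in utt.cues}
    target = ET.SubElement(speak, "prosody", attrs) if attrs else speak
    target.text = utt.text
    # A final pause, if annotated, becomes a <break> after the text.
    if "pause_ms" in utt.cues:
        ET.SubElement(speak, "break", {"time": f"{int(utt.cues['pause_ms'])}ms"})
    return ET.tostring(speak, encoding="unicode")


# Illustrative values only; they are not measurements from any analysis.
corpus = [
    Utterance("Muchas gracias", "thanks", {"pitch": "+10%", "rate": "slow", "pause_ms": 300}),
    Utterance("Gracias por venir", "thanks"),
    Utterance("Lo siento mucho", "apology", {"volume": "soft"}),
]

groups = group_by_function(corpus)
ssml = to_ssml(corpus[0])
print(sorted(groups))
print(ssml)
```

Keeping the classification (the `function` label) separate from the representation (the `cues`) means a team can revise either one without touching the other, or swap SSML for whatever input format their own system expects.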

Helping your voice model sound intentional, human, and context-aware.

Sample Analysis Preview

Prosody, speech structure, and naturalness in synthetic voice