The quality of AI-generated voices has improved rapidly in recent years, but there are still aspects of human speech that escape synthetic imitation. Sure, AI actors can deliver smooth corporate voiceovers for presentations and adverts, but more complex performances — a convincing rendition of Hamlet, for example — remain out of reach.
Sonantic, an AI voice startup, says it’s made a minor breakthrough in its development of audio deepfakes, creating a synthetic voice that can express subtleties like teasing and flirtation. The company says the key to its advance is the incorporation of non-speech sounds into its audio; training its AI models to recreate those small intakes of breath — tiny scoffs and half-hidden chuckles — that give real speech its stamp of biological authenticity.
“We chose love as a general theme,” Sonantic co-founder and CTO John Flynn tells The Verge. “But our research goal was to see if we could model subtle emotions. Bigger emotions are a little easier to capture.”
In the video below, you can hear the company’s attempt at a flirtatious AI — though whether or not you think it captures the nuances of human speech is a subjective question. On a first listen, I thought the voice was near-indistinguishable from that of a real person, but colleagues at The Verge say they instantly clocked it as a robot, pointing to the uncanny spaces left between certain words, and a slight synthetic crinkle in the pronunciation.