SPRINGFIELD, N.H. – Jessie Levine smiles and shakes her head when she hears the outgoing voicemail message on her iPhone.
“I sound young! And fast!” she marvels. “That person never, ever expected to talk like this.”
The message was recorded before Levine was diagnosed with Lou Gehrig’s disease, or ALS, in early 2015, and before the progressive motor neuron disease caused her speech to become slow and slurred. But as her ability to talk deteriorates, she’s exploring a new way to restore her voice via speech synthesis, or the artificial production of human speech.
The technology has been around for decades, but as devices shrink in size, efforts to customize them are expanding. Multiple companies and research groups are using speech synthesis engines to create voices from spoken samples, usually thousands of recorded sentences.
For example, CereProc, based in Edinburgh, Scotland, created a voice for the late film critic Roger Ebert several years before his death in 2013 by mining commentary tracks he’d recorded for movies.
But VocaliD, a Belmont, Massachusetts, company, is taking a different approach by creating custom voices using just a small sample from the recipient, even if they can’t speak.
Starting with just a tiny snippet of someone’s voice (a few seconds of saying “Ahhhh”), the company matches the recipient with a “donor voice,” which in Levine’s case might be a relative’s, and then blends the two together. The result is a sound file that can be plugged into any text-to-speech device.
“I have two sisters, one of whom has a lisp like I have, which I had before I had ALS. The other one, we all have this stuffiness to our speech,” said Levine, 45, the manager of Sullivan County, New Hampshire. “It never occurred to me that I could use their voices, adapt it to me, and then be able to use that.”
Company founder and CEO Rupal Patel is a speech technology professor on leave from Northeastern University. Her research found that people with severe communication disorders preserve the ability to control aspects of their voices, such as pitch and loudness. Those characteristics — what Patel calls the “melody of speech” — are also important for speaker identity, she said.
“There is a level of empowerment that comes with having the freedom to be able to communicate in your own voice, and that’s such an important thing, which I think has been overlooked,” Patel said.
No one would give a young girl a prosthetic leg meant for a grown man, she said, and voices should be no different.
The company delivered its first seven voices late last year and is working on about seven dozen more, which will cost $1,249 each. More than 14,000 people worldwide have donated their voices so far, a process that takes about six hours and involves reading 3,500 sentences aloud.
One of the first recipients was 17-year-old Delaney Supple, of Needham, Massachusetts, who was born with cerebral palsy. She had been using a generic computerized voice but didn’t like it much; she makes a gagging gesture when her mother mentions it.
Some voice devices are controlled by eye or head movement. Delaney types out her words on a tablet touch screen and then taps it to play them back.
Delaney likes her new voice. So does her mother, Erica Supple, who said it’s a much better fit.
“I love listening to it,” she said, “and it’s funny because when I first heard it … it sounded a little bit like her brother’s voice when he was younger.”