Something to shout about: Giving the voiceless their identity back
The work of CereProc has helped transform speech technology into a life-changing tool.
Imagine losing your voice. Not through a cold or a fever. Not for a day or two but for the rest of your life. Think of the silence, unable to communicate with speech. Think again.
Edinburgh-based speech technology company CereProc hit the headlines around the globe in 2010 for giving the American film critic Roger Ebert his voice back.
It sounds like science fiction, but they did it. They captured his speech from previous broadcasts and packaged it into software that lets him communicate using his own voice.
Mr Ebert, who had lost his ability to speak after thyroid cancer surgery, was so pleased with his reconstructed voice that he told the world about it on The Oprah Winfrey Show.
Two years later, CereProc aim to develop their sophisticated technology for the mass market and give voices to the voiceless on a larger scale and at affordable prices.
But it’s taken a long time for the industry to get this far, as the co-founder of CereProc, Dr Mathew Aylett, explained: “The history of speech technology is a colourful one in some respects because, like a lot of technology, it was said it was going to be brilliant and within the next two years everyone would be using it, and of course that isn’t what happened.”
In fact, it was originally thought that speech technology would be used to phone a computer and hold a conversation with it. The classic example from the 1980s was booking a flight over the phone, an idea that is redundant in the age of the internet.
In simple terms, the staff at CereProc make computers talk and give the resulting voice expression and emotion.
These voices are created from recordings of the person themselves or of someone else. They can be programmed to include a variety of expressions and emotions, giving the synthetic voice more personality.
This technology has the potential to profoundly change how we interact with computers, as mobile phones lead us to interact with the digital world in an increasingly eyes-free, hands-free society. It could also encourage more people to use technology.
Dr Aylett explained: “What we’ve found is that people in their teens up to 30 are using all this technology, but in fact a large portion of society isn’t using this technology, such as older people, people with less technical backgrounds, people less educated and so forth. To a certain extent speech technology can potentially make more of that technology more available to more people.
“My dad won’t use a keyboard; he’s 86. He loves talking to people and would be perfectly happy using a system by saying ‘could you send an email to Mathew blah, blah, blah’. He used to dictate anyway when he was working.”
When CereProc was established in 2005, the industry was geared almost entirely towards the call centre market. As a result, very plain and neutral voices were created, largely because the less variation there was in the voice, the easier it was to knit the sounds together to build a synthetic voice.
But Dr Aylett and his team wanted more. He said: “What we wanted to do was look at character and to make voices which were more engaging, allowing people to have some sense of personality behind the voice.”
They worked to make the expensive voice-building process more efficient and cost-effective. One particular method makes use of a model of different sounds. Dr Aylett explained: “The smart thing about this is that you can produce voices with smaller amounts of data because you can use models built by somebody else and then you can modify them slowly into that person.
“So it’s a bit like Invasion of the Body Snatchers, where the voice gradually starts to change into [another] voice.”
This model-based approach means that less audio needs to be recorded, which makes it more cost-effective. In the not-so-distant future CereProc hope that people will be able to open an account on their website, record their voice and, in return for payment, receive a synthetic version of their own voice by email.
Dr Aylett said: “We’re still uncertain what people need but there is a real clinical requirement for voice replacement and our aim over the next 10 months is to put voice replacement into people’s hands at a price that is affordable.”
With regard to emotion and expression, the easiest way to inject them into a voice is by copying.
Dr Aylett said: “We were very interested in copying voices because if you can copy a voice you are taking the personality from that voice.”
So by capturing Roger Ebert’s voice, CereProc were in essence giving him part of his identity back, something he regards so highly that CereProc cannot play the rebuilt voice, even for demo purposes, because Mr Ebert owns all of the rights.
Dr Aylett believes there is a large market for the development of this technology, both for clinical purposes, as in Mr Ebert’s case, and for commerce as the digital landscape changes.
He explained: “To a large extent our voices are part of our identities so when people lose their voices it’s really important to be able to replace them.
“Additionally, the more that our identities become smeared across cyberspace in social networking and whatever else there is a requirement to want to get that information produced for you as audio. That requires speech synthesis for the more you want to extend your own personality into that space.
“Companies are waking up to this now. A company used to spend a quarter of a million pounds on getting a logo changed then they would just use the same speech synthesiser for their website as their main competitors so they were completely schizophrenic in that respect.”
The development of smartphones, and in particular the launch of the first iPhone in 2007, has created an explosion of possibilities for speech technologies. Dr Aylett said: “People are moving around with devices that are connected to the internet which they are using hands-free and eyes-free. So the emphasis has moved from call centre technology to apps.
“If you connect that with social networking you have in effect a sea of data out there and speech synthesis is one way of communicating that to people.”
The commercial products include novelty items such as Nicole, the sexy French voice, and a talking dog synthesiser, which can read out messages and information from your phone.
Dr Aylett said: “Unless you use writing you have to use speech; there is no other way to communicate language. Well, I guess you could use Morse code, maybe flags and things like that, but in reality if we want computers to communicate with people they either display text or speak. The requirement of computers to communicate with people is getting higher and higher and higher.”