Speech Synthesis – the early years…

Recently I was asked to recreate the sound of the Voder for an art installation, which meant doing some background research into speech synthesis. I got a few surprises along the way…

First surprise: the first speech synthesiser was developed in the 1770s by Wolfgang von Kempelen in Vienna!

Von Kempelen’s machine was the first that could produce not only individual speech sounds but also whole words and short sentences. It consisted of a bellows that simulated the lungs and a ‘wind box’ fitted with levers operated by the fingers of the right hand. The levers actuated a ‘mouth’ made of rubber. There was also a ‘nose’ with two nostrils, which had to be covered with two fingers unless a nasal sound was to be produced. The whole speech production mechanism was enclosed in a box with holes for the hands.

Second surprise: Charles Wheatstone, of Wheatstone Bridge fame (electrical engineers will know all about this), also experimented with speech synthesis.

Third surprise: the Voder was an instrument that you played. Homer Dudley at Bell Labs had been refining the idea that vocal sounds can be grouped into a fairly small number of pitched and un-pitched sounds that could be created electronically. For example, the letter “A” is pitched and the letter “S” is un-pitched. He also realised that the vocal cords provide a “carrier” and that the lips, tongue, cheeks etc. filter that carrier to create all the different sounds required for speech. He reasoned that if these simulated sounds were strung together in the correct order, speech could be created from scratch.

His system used ten filters that could, if used in the correct combinations, create approximations of the most common vocal sounds. The (highly skilled) operator of the Voder had to manipulate ten keys, a foot pedal and wrist switches to create each of the sounds. Apparently it took about a year to become good enough to produce reasonable speech. The Voder was one of the attractions at the 1939 World’s Fair in New York, along with a robot that would smoke cigarettes!
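To make the carrier-and-filter idea a bit more concrete, here is a minimal Python sketch of it. It is not a model of the real Voder circuitry: the sample rate, the ten band centre frequencies, the filter design (a standard second-order band-pass) and all the function names are my own assumptions, chosen only to show a buzz or hiss being shaped by ten filters whose levels are set by ten keys.

```python
import math
import random

SAMPLE_RATE = 16000  # Hz; arbitrary choice for this sketch

# Ten illustrative band centres (Hz), spaced geometrically from 200 to 6000.
# These are assumptions, NOT the Voder's actual filter bands.
BAND_CENTRES = [200.0 * (6000.0 / 200.0) ** (i / 9) for i in range(10)]


def carrier(pitched, n_samples, pitch_hz=110.0, seed=0):
    """Buzz (pulse train) for pitched sounds, hiss (white noise) for un-pitched ones."""
    if pitched:
        period = int(SAMPLE_RATE / pitch_hz)
        return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def band_pass(signal, centre_hz, q=4.0):
    """Second-order band-pass filter (the usual 'cookbook' biquad, unity peak gain)."""
    w0 = 2.0 * math.pi * centre_hz / SAMPLE_RATE
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter type
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def voder_sound(pitched, key_levels, n_samples=1600):
    """Mix ten filtered copies of the carrier, weighted by how far each key is pressed."""
    source = carrier(pitched, n_samples)
    mixed = [0.0] * n_samples
    for centre, level in zip(BAND_CENTRES, key_levels):
        if level == 0.0:
            continue  # key not pressed: this band contributes nothing
        for i, sample in enumerate(band_pass(source, centre)):
            mixed[i] += level * sample
    return mixed
```

Calling `voder_sound(True, [1.0] + [0.0] * 9)` gives a tenth of a second of buzz shaped by the lowest band only; passing `False` swaps the buzz for hiss.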

So to make the Voder say “she saw me” you would have to do the following…

speech:         SH      E       S       AW      M       E

key:            7&8     1&8     9       3       1       1,8

wrist lever:    up      down    up      down    down    down

This is from Lesson 1 of the Voder instruction manual. No wonder it took about a year to get good at playing it…
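Written out as data, the lesson looks something like the Python sketch below. Two things here are my own reading rather than anything stated in the manual excerpt: I have taken the final “1,8” to mean keys 1 and 8 together, and I have assumed that wrist lever down selects the pitched carrier, since the lever is up only for SH and S. The names `SHE_SAW_ME`, `key_levels` and `is_pitched` are made up for illustration.

```python
# Lesson 1 as a sequence of (sound, keys pressed, wrist lever position).
# "1,8" in the lesson is read here as keys 1 and 8 together (assumption).
SHE_SAW_ME = [
    ("SH", (7, 8), "up"),
    ("E",  (1, 8), "down"),
    ("S",  (9,),   "up"),
    ("AW", (3,),   "down"),
    ("M",  (1,),   "down"),
    ("E",  (1, 8), "down"),
]


def key_levels(keys, n_keys=10):
    """Turn 1-based key numbers into a list of ten on/off levels."""
    return [1.0 if k in keys else 0.0 for k in range(1, n_keys + 1)]


def is_pitched(wrist):
    """Assumption: lever down = pitched buzz, lever up = un-pitched hiss."""
    return wrist == "down"


for sound, keys, wrist in SHE_SAW_ME:
    kind = "pitched" if is_pitched(wrist) else "un-pitched"
    print(f"{sound:>2}: keys {keys}, lever {wrist} ({kind})")
```

Six separate hand positions for three short words gives a fair idea of how much the operator had to memorise.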

So, as you can see, the Voder was effectively a vocoder that you ‘played’ manually. Homer Dudley also developed the Vocoder, which automates the process by analysing the input speech in the same bands as the keys on the Voder.
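The analysis half can be sketched in Python too. This is a rough illustration only: instead of a bank of band-pass filters followed by level detectors, it uses the Goertzel algorithm to measure how much signal there is at each band's centre frequency in each short frame, which is simpler to write but not the same thing. The band centres, frame length and function names are all assumptions of mine.

```python
import math

SAMPLE_RATE = 16000  # Hz; arbitrary choice for this sketch

# Ten illustrative band centres (Hz); assumptions, not the real Vocoder bands.
BAND_CENTRES = [200.0 * (6000.0 / 200.0) ** (i / 9) for i in range(10)]


def band_level(frame, centre_hz):
    """Goertzel algorithm: magnitude of the frame at one frequency, scaled by frame length."""
    coeff = 2.0 * math.cos(2.0 * math.pi * centre_hz / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    return math.sqrt(max(power, 0.0)) / len(frame)


def analyse(signal, frame_len=320):
    """Split the signal into frames and return ten band levels per frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [[band_level(frame, centre) for centre in BAND_CENTRES]
            for frame in frames]
```

Each row that `analyse` returns is, in effect, a set of ten key positions for that instant, worked out by the machine rather than by a trained operator.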

The principles of the Voder live on in some modern systems. Add a computer and it becomes possible to build text-to-speech converters that construct speech from text in real time. Because a computer is doing the manipulation, the intelligibility is much better, and this is the approach used by a number of upmarket sat nav systems to read out street names.