Vocoder
A vocoder (name derived from voice encoder, formerly also called voder) is a speech analyzer and synthesizer. It was originally developed as a speech coder for telecommunications applications in the 1930s, the idea being to code speech for transmission. Its primary use in this fashion is for secure radio communication, where voice has to be digitized, encrypted and then transmitted on a narrow, voice-bandwidth channel. The vocoder has also been used extensively as an electronic musical instrument.
The vocoder is related to, but essentially different from, the computer algorithm known as the "phase vocoder".
Vocoder theory
The
human voice consists of sounds generated by the opening and closing of the
glottis by the
vocal cords, which produces a periodic waveform with many
harmonics. This basic sound is then
filtered by the nose and throat (a complicated
resonant piping system) to produce differences in harmonic content (
formants) in a controlled way, creating the wide variety of sounds used in speech. There is another set of sounds, known as the
unvoiced and
plosive sounds, which are not modified by the mouth in the same fashion.
The vocoder examines speech by finding this basic carrier wave, which is at the fundamental frequency, and measuring how its spectral characteristics change over time as someone speaks. This results in a series of numbers representing these modified frequencies at any particular time as the user speaks. In doing so, the vocoder dramatically reduces the amount of information needed to store speech, from a complete recording to a series of numbers. To recreate speech, the vocoder simply reverses the process, creating the fundamental frequency in an oscillator, then passing it through a stage that filters the frequency content based on the originally recorded series of numbers.
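The text does not say how the fundamental frequency is found. As a minimal sketch, assuming autocorrelation is used (one common approach, chosen here only for illustration), the estimate could look like this. The function name and the `fmin`/`fmax` search limits are hypothetical.

```python
def estimate_fundamental(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame.

    Assumption: the frame is longer than sample_rate / fmin samples.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # The lag with the strongest self-similarity is taken as one pitch period.
    return sample_rate / best_lag
```

A periodic waveform repeats after one pitch period, so the shifted copy lines up best at that lag. Real speech would also need a voiced/unvoiced decision, which this sketch omits.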
Early vocoders
Most
analog vocoder systems use a number of frequency channels, all tuned to different frequencies (using
band-pass filters). The various values of these filters are stored not as the raw numbers, which are all based on the original fundamental frequency, but as a series of modifications to that fundamental needed to modify it into the signal seen in the output of that filter. During playback these settings are sent back into the filters and then added together, modified with the knowledge that speech typically varies between these frequencies in a fairly linear way. The result is recognizable speech, although somewhat "mechanical" sounding. Vocoders also often include a second system for generating unvoiced sounds, using a noise generator instead of the fundamental frequency.
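The analysis half of such a filter bank can be sketched digitally. This is an illustration under stated assumptions, not a model of any particular analog design: the second-order band-pass filter, the filter Q, the 30 Hz envelope smoothing and the 160-sample frame length are all hypothetical choices.

```python
import math

def bandpass(signal, center_hz, sample_rate, q=4.0):
    """Second-order band-pass filter (digital stand-in for one analog channel)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1, y2, y1 = x1, x, y1, y
    return out

def envelope(signal, sample_rate, cutoff_hz=30.0):
    """Rectify, then smooth with a one-pole low-pass to track the band level."""
    coeff = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    level, out = 0.0, []
    for x in signal:
        level = coeff * level + (1.0 - coeff) * abs(x)
        out.append(level)
    return out

def analyze(speech, sample_rate, centers_hz, frame_len=160):
    """Return one list of band levels per frame: the stored 'series of numbers'."""
    bands = [envelope(bandpass(speech, c, sample_rate), sample_rate)
             for c in centers_hz]
    return [[band[i] for band in bands]
            for i in range(frame_len - 1, len(speech), frame_len)]
```

Each frame holds one slowly varying level per channel, which is far less data than the waveform itself.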
The first experiments with a vocoder were conducted in 1928 by
Bell Labs engineer
Homer Dudley, who eventually patented it in 1935. Dudley's vocoder was used in the
SIGSALY system, which was built by
Bell Labs engineers (
Alan Turing was briefly involved) in 1943. The
SIGSALY system was used for encrypted high-level communications during World War II. Later work in this field has been conducted by
James Flanagan.
Linear prediction-based vocoders
Since the late 1970s, most non-musical vocoders have been implemented using linear prediction, whereby the target signal's spectral envelope (formant structure) is estimated by an all-pole IIR filter. In linear prediction coding, the all-pole filter replaces the bandpass filter bank of its predecessor: its inverse is used at the encoder to whiten the signal (i.e., flatten the spectrum), and the all-pole filter itself is used at the decoder to re-apply the spectral shape of the target speech signal. In contrast with vocoders realized using bandpass filter banks, the location of the linear predictor's spectral peaks is entirely determined by the target signal and need not be harmonic, i.e., a whole-number multiple of the basic frequency.
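A minimal sketch of linear prediction follows. It assumes the autocorrelation method with the Levinson-Durbin recursion, which the text does not specify, and all function names are hypothetical. `residual` applies the inverse filter that whitens the signal, and `synthesize` applies the all-pole filter that restores the spectral shape.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations (assumes r[0] > 0, i.e. non-silent input).

    Returns (a, err), where the prediction is sum_k a[k] * x[n-1-k].
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err

def lpc(x, order):
    """Predictor coefficients for signal x."""
    return levinson_durbin(autocorrelation(x, order), order)[0]

def residual(x, a):
    """Inverse (whitening) filter: e[n] = x[n] - sum_k a[k] * x[n-1-k]."""
    return [x[n] - sum(a[k] * x[n - 1 - k]
                       for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]

def synthesize(excitation, a):
    """All-pole filter: y[n] = e[n] + sum_k a[k] * y[n-1-k]."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e + sum(a[k] * y[n - 1 - k]
                         for k in range(len(a)) if n - 1 - k >= 0))
    return y
```

Passing a signal through `residual` and then `synthesize` with the same coefficients reconstructs it, which is the encoder/decoder pairing described above.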
Modern vocoder implementations
Even with the need to record several frequencies, and the additional unvoiced sounds, the compression of the vocoder system is impressive. Standard telephone systems carry a band from about 300 Hz to 3400 Hz, where most of the frequencies used in speech lie; digitizing that band conventionally takes 64 kbit/s (8,000 samples per second, just above the Nyquist rate for the band, at 8 bits per sample). However, a vocoder can provide a reasonably good simulation with as little as 2400 bit/s, roughly a 26× improvement.
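The 64 kbit/s figure and the roughly 26× ratio quoted above can be reproduced with a short calculation. The sampling rate and sample width below are assumptions (conventional telephone PCM parameters) that the text does not state explicitly.

```python
# Assumed PCM parameters (not stated in the text): 8,000 samples/s at 8 bits each.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
VOCODER_BIT_RATE = 2400  # bit/s, the figure given in the text

pcm_bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
compression_ratio = pcm_bit_rate / VOCODER_BIT_RATE

print(f"PCM: {pcm_bit_rate} bit/s, ratio: {compression_ratio:.1f}x")
# prints "PCM: 64000 bit/s, ratio: 26.7x"
```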
Several vocoder systems are used in NSA encryption systems:
* LPC-10, FIPS Pub 137, 2400 bit/s, which uses linear predictive coding
* Code Excited Linear Prediction (CELP), 2400 and 4800 bit/s, Federal Standard 1016, used in STU-III
* Continuously Variable Slope Delta-modulation (CVSD), 16 kbit/s, used in wide band encryptors such as the KY-57
* Mixed Excitation Linear Prediction (MELP), MIL STD 3005, 2400 bit/s, used in the Future Narrowband Digital Terminal (FNBDT), NSA's 21st century secure telephone
* Adaptive Differential Pulse Code Modulation (ADPCM), former ITU-T G.721, 32 kbit/s, used in the STE secure telephone
(ADPCM is not a proper vocoder but rather a waveform codec. ITU has gathered G.721 along with some other ADPCM codecs into G.726.)
For
musical applications, a source of musical sounds is used as the carrier, instead of extracting the fundamental frequency. For instance, one could use the sound of a
guitar as the input to the filter bank, a technique that became popular in the
1970s.
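The musical use described above, where an instrument replaces the extracted fundamental, can be sketched as a small channel vocoder. This is an illustration only: the filter design, Q, envelope smoothing and band centers are hypothetical choices, and the function names are not taken from any real product or library.

```python
import math

def bandpass(signal, center_hz, sample_rate, q=4.0):
    """Second-order band-pass filter used for every channel of the bank."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1, y2, y1 = x1, x, y1, y
    return out

def envelope(signal, sample_rate, cutoff_hz=30.0):
    """Rectify and smooth to follow the level of one voice band."""
    coeff = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    level, out = 0.0, []
    for x in signal:
        level = coeff * level + (1.0 - coeff) * abs(x)
        out.append(level)
    return out

def channel_vocode(modulator, carrier, sample_rate, centers_hz):
    """Impose the modulator's per-band levels on the carrier's matching bands."""
    n = min(len(modulator), len(carrier))
    out = [0.0] * n
    for c in centers_hz:
        level = envelope(bandpass(modulator[:n], c, sample_rate), sample_rate)
        band = bandpass(carrier[:n], c, sample_rate)
        for i in range(n):
            out[i] += level[i] * band[i]
    return out
```

Here `modulator` would be the microphone signal and `carrier` the instrument, for example a guitar or synthesizer. A harmonically rich carrier works best because every band then has some content for the voice to shape.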
Musical history
In 1970, electronic music pioneers
Wendy Carlos and
Robert Moog developed one of the first truly musical vocoders. A 10-band device inspired by the vocoder designs of Homer Dudley, it was originally called a spectrum encoder-decoder, and later referred to simply as a vocoder. The carrier signal came from a Moog
modular synthesizer, and the modulator from a
microphone input. The output of the 10-band vocoder was fairly intelligible, but relied on specially articulated
speech. Later improved vocoders use a high-pass filter to let some
sibilance through from the microphone; this ruins the device for its original speech-coding application, but it makes the "talking synthesizer" effect much more intelligible.
Carlos' and Moog's vocoder was featured in several recordings, including the soundtrack to Stanley Kubrick's A Clockwork Orange, in which the vocoder sang the vocal part of Beethoven's Ninth Symphony. The soundtrack also included a piece called "Timesteps", which used the vocoder in two sections. Originally, "Timesteps" was intended as merely an introduction to vocoders for the "timid listener", but Kubrick chose to include the piece on the soundtrack, much to the surprise of Wendy Carlos.
In the late 1970s, vocoders began to appear in pop music, for example on disco recordings. A typical example is Giorgio Moroder's 1977 album From Here to Eternity. Vocoders are often used to create the sound of a robot talking, as in the Styx song "Mr. Roboto". A vocoder was also used for the introduction to the Main Street Electrical Parade at Disneyland.
The vocoder has appeared on pop recordings from time to time ever since, but in most cases it serves only as a special effect. However, many experimental electronic artists and representatives of the "new age" genre use the vocoder in a more comprehensive manner. Jean Michel Jarre (album Zoolook, 1984) and Mike Oldfield (album Five Miles Out, 1982) are good examples. There are also some artists who have made the vocoder an essential part of their music. These include the German group Kraftwerk, jazz/fusion keyboardist Herbie Hancock during his late 1970s disco period, Patrick Cowley in his late recordings and, more recently, the avant-garde pop group Trans Am. The song "O Superman" by avant-garde musician Laurie Anderson is a popular recording released in 1981 that incorporates the vocoder.
The KLF used vocoder-distorted voices in their 1991 "Stadium House" mix Last Train to Trancentral (Live from the Lost Continent). In 1998, Marilyn Manson used the vocoder heavily on their glam- and 70s-influenced LP, Mechanical Animals, on which songs such as "User Friendly" and "Posthuman", among others, make substantial use of the technology. Since 1998, Manson has favored the use of vocoders in live concerts, notably when performing "Antichrist Superstar". In 2005, Imogen Heap's track Hide and Seek used the vocoder exclusively, with no other instrumental support. The bands The Faint, Air, Ween, and Death From Above 1979 all make extensive use of the vocoder.
Other voice effects
"Robot voices" became a recurring element in popular music during the late twentieth century. Several methods of producing variations on this effect have arisen, of which the vocoder is only one. It is still the best known, and the following other pieces of music technology are often confused with the vocoder:
;Talk box:The talk box guitar effect was invented by Doug Forbes and popularized by Peter Frampton. In the talk box effect, amplified sound is fed via a tube into the performer's mouth and is then shaped by the performer's lip, tongue, and mouth movements before being picked up by a microphone. In contrast, the vocoder effect is produced entirely electronically. The background riff from "Livin' on a Prayer" by Bon Jovi is a well-known example. "California Love" by 2Pac and Roger Troutman is a more recent recording featuring a talk box fed with a synthesizer instead of a guitar.
;Autotuner:The vocoder should also not be confused with the Antares
Auto-Tune Pitch Correcting Plug-In, which can also be used to achieve a robotic-sounding vocal effect by
quantizing (removing smooth changes in) voice pitch or by adding pitch changes. This has been employed in recent years by artists such as
Daft Punk (who also use vocoders and talk boxes),
Cher, and the Italian dance/pop group
Eiffel 65.
;Linear prediction coding:Linear prediction coding is also used as a musical effect (generally for cross-synthesis of musical timbres), but is not as popular as bandpass filter bank vocoders, and the musical use of the word
vocoder refers exclusively to the latter type of device.
;Speech synthesis:Robotic voices in music may also be produced by
speech synthesis. This does not usually create a "singing" effect (although it can). Unlike vocoding, speech synthesis uses no human speech as a basis. One example of such use is the song
Das Boot by
U96. A more tongue-in-cheek musical use of speech synthesis is
MC Hawking.
;Delay:A
delay unit, when set to a high feedback level and delay time of less than a tenth of a second, produces a sharp, resonating transformation of the voice. Of the robot voice effects listed here, this one requires the least resources, since delay units are a staple of recording studios and sound editing software. As the effect deprives a voice of much of its musical qualities (and has few options for sound customization), the robotic delay is mostly used in TV/movie applications.
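Two of the effects listed above are simple enough to sketch directly. Both functions are hypothetical illustrations rather than the method of any named product. `quantize_pitch` assumes equal-tempered semitones around a 440 Hz reference, and `feedback_delay` assumes a basic delay line with feedback and a wet/dry mix.

```python
import math

def quantize_pitch(freq_hz, reference_hz=440.0):
    """Snap a pitch to the nearest equal-tempered semitone (removes smooth glides)."""
    semitones = round(12.0 * math.log2(freq_hz / reference_hz))
    return reference_hz * 2.0 ** (semitones / 12.0)

def feedback_delay(signal, delay_samples, feedback=0.8, mix=0.5):
    """Short delay with feedback; high feedback gives a sharp resonance."""
    buf = [0.0] * delay_samples  # circular buffer holding the delayed signal
    out = []
    for n, x in enumerate(signal):
        delayed = buf[n % delay_samples]
        # Feed the delayed output back into the line along with the new input.
        buf[n % delay_samples] = x + feedback * delayed
        out.append((1.0 - mix) * x + mix * delayed)
    return out
```

At an assumed 8,000 samples per second, `delay_samples=160` corresponds to 20 ms, well under the tenth of a second mentioned for the robotic delay effect.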
Vocoders have also been used in television and film, usually for robots or talking computers:
* One of the earliest film applications of vocoding can be heard in the flashback preludes of the 1949 movie A Letter to Three Wives.
* In Transformers, the vocal effects of Soundwave were created with vocoders.
* The Cylons from Battlestar Galactica used a Roland Vocoder to create their monotone voice.
* A vocoder was used by Wendy Carlos for the soundtrack to Stanley Kubrick's A Clockwork Orange, particularly the choirs in "An die Freude".
* In the film "Sgt. Pepper's Lonely Hearts Club Band", the robotic singing of the Computerettes in the song "Mean Mr. Mustard" was achieved by using a vocoder.
* A vocoder was used in the soundtrack for the movie Donnie Darko to create tension and mystery.
* The voices of the Daleks in Doctor Who are created using a ring modulator, not a vocoder.
* Talk box
* Auto-Tune
* Speech synthesis
Cited references
* List of music releases featuring a vocoder
* Vocod'o'rama, an Analogue Vocoder Fan Site.
* Vocoders at Vintage Synth Explorer
* GPL implementation of a vocoder, as a LADSPA plugin
* Various links on compression of human speech