My Personal Views
on the History of Speech Technology

Louis C.W. Pols
Inst. of Phonetic Sciences / ACLC Univ. of Amsterdam

(Contribution to Panel on the History of Speech Technology,
Session 35D1b; Janet Baker, Chair
Interspeech 2005, Lisbon, September 6, 2005)

History (Participants vs. Historians)

• only in 'oral history' are the players themselves involved
• is this the reason for having this panel of involved senior speech scientists?
• historians generally have a much wider view
• I can only talk about my own experiences
• my personal key milestones or events
• my own enabling circumstances

Qualified for this panel?

• old enough: born in 1941 in the Netherlands, partially retired since June 2005
• different enough: degree in Physics (1964), first job in 1965 at TNO Inst. for Perception, at Dept. of Audiology (later: Speech and Hearing), headed by Reinier Plomp
• speech technology interests: gradually growing interest in speech science and technology and in interdisciplinarity. Prof. of Phonetics, Univ. of Amsterdam, since 1982

Speech Science and Technology

• my point of reference (almost 35 years ago):
• my first major international speech conference: 7th ICA 1971, Budapest, Hungary, plus Speech Symposium in Szeged
• at that time no ICASSP, Interspeech, or ASRU
• some key players:
• Velichko, dynamic programming
• Atal, initial ideas about predictive coding
• Sakoe, time normalization
• Coker, articulatory synthesis
• Mermelstein, VT transfer functions
• Fujimura, x-ray microbeam
• Liljencrants/Fant, OVE III formant synthesizer
• Denes, word concatenation
• Rabiner, hardware for synthesis ("we were away a year ago")
• Itakura, digital filters
• several ~50-word recognizers:
Velichko, Rao, Sakoe, Erman, Dreyfus-Graf, Pols, De Mori
• Carré, Gubrynowicz, speaker identification
• Mathews, music synthesis
• plenaries: among others, Chistovich (vowel discrimination), Flanagan (focal points in speech communication research)

Some observations ICA'71

• no text-to-speech synthesis-by-rule
(only manipulated, parameterized analysis/resynthesis)
• minor interest in prosody (Fujisaki, Coker, Umeda)
• only carefully-read isolated-word recognition
(only Erman used telephone speech, Neely speech in noise)
• neither LPC nor HMM yet
• hardware signal processing (rather than fast PCs)
• papers presented in En, Fr, Ge and Ru!

My important events

• research attitude at TNO Soesterberg
• NATO RSG 10 on Speech Processing
• frequent (inter)national contacts
• EU-projects SAM, SPIN, Eagles, EuroCocosda
• Invariance and variability Symp., MIT (1983)
• Eur. Speech Comm. Ass. ESCA (1987)
• ETRW 'Speech I/O Assessment and Sp. DB's'
• Dutch programs ASSP, CGN, IMIX, Stevin

My observations

• science meets technology (e.g., HSR vs. ASR)
• potential phonetic contributions to technology:
e.g., front end, data reduction, potential role of context and prosody, computational phonetics
• speech borrows and improves technologies
e.g., FFT, DTW, PCA, HMM, ANN, WER, perplexity
• importance of fast hardware, huge storage, and user-friendly software (e.g., ILS, Waves, HTK, Praat)

My observations (cont'd)

example of spectro-temporal analysis

• phonetics preaches formants, but measuring them is not doable in practice
• psycho-acoustics recommends critical bands
• practical implementation via bandfilters
• further data reduction via PCA, MFCC, etc.
• high correlation with formant frequencies (Fi) and perception
• good for, e.g., ASR, speaker normalization, studying child vocalizations, language change (Polder Dutch), pathological speech
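
The data-reduction step named above (band-filter levels followed by PCA) can be illustrated with a minimal sketch. It assumes each analysis frame is already a list of band levels in dB. The function names, the three-band layout, and the example values are hypothetical, and power iteration is used only to keep the sketch dependency-free; it is not claimed to be the method used in the work the slides refer to.

```python
import math


def center(frames):
    """Subtract the per-band mean level from every frame."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    return [[f[j] - mean[j] for j in range(dim)] for f in frames]


def covariance(centered):
    """Sample covariance matrix (bands x bands) of mean-centered frames."""
    n, dim = len(centered), len(centered[0])
    cov = [[0.0] * dim for _ in range(dim)]
    for row in centered:
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += row[i] * row[j] / (n - 1)
    return cov


def first_component(cov, iterations=100):
    """Dominant eigenvector and eigenvalue of cov via power iteration.

    Assumes the uniform start vector is not orthogonal to the
    dominant eigenvector (true for the example data below).
    """
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
    eigenvalue = sum(v[i] * cv[i] for i in range(dim))
    return v, eigenvalue


# Hypothetical band levels (dB) for five frames and three bands;
# the values are made up for illustration, not measured data.
frames = [[60.0 + 3.0 * t, 55.0 + 4.0 * t, 50.0] for t in range(5)]
centered = center(frames)
direction, variance = first_component(covariance(centered))

# Each frame is reduced from three band levels to a single score.
scores = [sum(c * d for c, d in zip(row, direction)) for row in centered]
```

Because the example frames lie on a straight line in band-level space, the first component captures all of their variance. Real band-filter data would need several components, found by deflation or a full eigendecomposition.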

My observations (cont'd)

• research is stimulated but also blocked by potential applications (e.g., single-voice concatenative synthesis)
• neither funding organizations (e.g., DARPA, European Framework) nor application demands (e.g., aids for the handicapped, minority languages, ICT, internet, GSM, dialog systems) have strongly influenced speech research and developments

My observations (cont'd)

• speech companies come and go
• importance of annotated speech databases
• much better infrastructure (LDC, SPEX, ELRA)
• from components to complete dialog systems
• challenge from simple to very complex tasks
• importance of systematic and diagnostic evaluation, also in actual applications
• importance of young enthusiastic co-workers

Why speech research is so rewarding

• it is everybody's science:
everybody communicates, speaks and listens
• field is strongly interdisciplinary
from acoustics and AI, to linguistics and ergonomics
• for each level of development there are potential applications
• ever new challenges
robust and/or adaptable to almost everything, more natural (style, emotion), multilingual, multimodal, better than human
• speech scientists are nice people

