Brighton Pavilion

10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Tutorials Day - Sunday 6 September 2009

T-4: Emerging Technologies for Silent Speech Interfaces

Presented by Tanja Schultz and Bruce Denby

Outline

In the past decade, the performance of automatic speech processing systems, including speech recognition, text and speech translation, and speech synthesis, has improved dramatically. This has resulted in an increasingly widespread use of speech and language technologies in a wide variety of applications, such as commercial information retrieval systems, call center services, voice-operated cell phones or car navigation systems, personal dictation and translation assistance, as well as applications in military and security domains. However, speech-driven interfaces based on conventional acoustic speech signals still suffer from several limitations. Firstly, the acoustic signals are transmitted through the air and are thus prone to ambient noise. Despite tremendous efforts, robust speech processing systems that perform reliably in crowded restaurants, airports, or other public places are still not in sight. Secondly, conventional interfaces rely on audibly uttered speech, which has two major drawbacks: it jeopardizes confidential communications in public and it disturbs bystanders. Services that require the access, retrieval, and transmission of private or confidential information, such as PINs, passwords, and security or safety information, are particularly vulnerable.

Recently, Silent Speech Interfaces (SSIs) have been proposed which allow their users to communicate by speaking silently, i.e. without producing any sound. This is realized by capturing the speech signal at an early stage of human articulation, namely before the signal becomes airborne, and then transferring these articulation-related signals for further processing and interpretation. Due to this novel approach, Silent Speech Interfaces have the potential to overcome the major limitations of traditional speech interfaces today, i.e. (a) limited robustness in the presence of ambient noise; (b) lack of secure transmission of private and confidential information; and (c) disturbance of bystanders created by audibly spoken speech in quiet environments; while at the same time retaining speech as the most natural human communication modality. SSIs could furthermore provide an alternative for persons with speech disabilities, such as those who have undergone a laryngectomy, as well as for the elderly or weak who may not be healthy or strong enough to speak aloud effectively.

Speaker Biography

Tanja Schultz is a Full Professor at the Computer Science Department of Karlsruhe University in Germany and an Assistant Research Professor at the Language Technologies Institute at Carnegie Mellon University. She is the director of the Cognitive Systems Lab and director of the Center for Visually Impaired Students, both at Karlsruhe University. Her research activities focus on human-human communication and human-machine interfaces, with a particular area of expertise in rapid adaptation of speech processing systems to new domains and languages. She co-edited a book on this subject and received several awards for this work. In 2001 she received the FZI prize for her outstanding Ph.D. thesis on language-independent and language-adaptive speech recognition. In 2002 she received the Allen Newell Medal for Research Excellence from Carnegie Mellon for her contribution to Speech-to-Speech Translation and the ISCA best paper award for her publication on language-independent acoustic modeling. In 2005 she was awarded the Carnegie Mellon Language Technologies Institute Junior Faculty Chair. Her recent research focuses on the development of human-centered technologies and intuitive human-machine interfaces based on biosignals, by capturing, processing, and interpreting signals such as muscle and brain activities. The development of the silent speech interface based on myoelectric signals received the Interspeech 2006 Demo award. Together with Prof. Denby, she is a guest editor of the Speech Communication Special Issue on Silent Speech Interfaces to be published in 2009. Tanja Schultz is the author of more than 150 articles published in books, journals, and proceedings. She is a member of the IEEE Computer Society, the International Speech Communication Association (ISCA), the European Language Resources Association, and the Society of Computer Science (GI) in Germany, and currently serves as an elected ISCA Board member and on several program committees and review panels.

Bruce Denby is Full Professor of Electronics and Signal Processing at the Université Pierre et Marie Curie (Paris-VI), and Research Scientist at the Laboratoire d’Electronique ESPCI-ParisTech (CNRS) in Paris, France. He holds a BS degree from the California Institute of Technology (Caltech), an MS from Rutgers University, and a PhD from the University of California at Santa Barbara, all in physics. During post-doctoral studies in Switzerland, France, and the UK in the late 1980s, he developed the Denby-Peterson contour extraction algorithm, and became well known for introducing statistical learning techniques to the experimental physics community. In 1995 he was named professor at the University of Versailles, France, where he created, and for 10 years directed, an innovative Master’s degree program in cellular telephone technology. Since transferring to Paris-VI in 2004, he has been a leader in the area of applications of statistical learning techniques to real-time systems. Professor Denby is an Associate Editor of the journal Pattern Recognition, and has authored over 180 publications in international journals and peer-reviewed international conferences. He is a member of the International Speech Communication Association (ISCA) and the Association for Computing Machinery (ACM), a Senior Member of IEEE, and a member of the IEEE Computer, Communications, Consumer Electronics, and Instrumentation and Measurement Societies. He is one of the originators of the “Silent Speech Interface” concept, having authored in 2004, with Prof. Maureen Stone, a pioneering article on speech synthesis from ultrasound imagery of the tongue, and is coordinator of the OUISPER (Oral Ultrasound SynthetIc SPEech SouRce) Project, funded by the French Department of Defense (DGA) and the French Agence Nationale de la Recherche (ANR). He is also the primary guest editor of the Speech Communication Special Issue on Silent Speech Interfaces to be published in 2009. Prof. Denby’s current research interests include speech and audio signal processing, telecommunications, and radio engineering.