10th Annual Conference of the International Speech Communication Association
Interspeech 2009 Brighton
Plenary Speakers
The Interspeech 2009 Organising Committee is pleased to announce that the following distinguished speakers have agreed to give plenary talks at the conference.
Monday: 2009 ISCA Medallist - Sadaoki Furui, Tokyo Institute of Technology
Sadaoki Furui is currently a Professor at Tokyo Institute of Technology, Department of Computer Science. He is engaged in a wide range of research on speech analysis, speech recognition, speaker recognition, speech synthesis, and multimodal human-computer interaction and has authored or coauthored over 800 published articles. He is a Fellow of the IEEE, the International Speech Communication Association (ISCA), the Institute of Electronics, Information and Communication Engineers of Japan (IEICE), and the Acoustical Society of America. He has served as President of the Acoustical Society of Japan (ASJ) and the ISCA. He has served as a member of the Board of Governors of the IEEE Signal Processing (SP) Society and Editor-in-Chief of both the Transactions of the IEICE and the Journal of Speech Communication. He has received the Yonezawa Prize, the Paper Award and the Achievement Award from the IEICE (1975, 88, 93, 2003, 2003, 2008), and the Sato Paper Award from the ASJ (1985, 87). He has received the Senior Award and Society Award from the IEEE SP Society (1989, 2006), the Achievement Award from the Minister of Science and Technology and the Minister of Education, Japan (1989, 2006), and the Purple Ribbon Medal from the Japanese Emperor (2006). In 1993 he served as an IEEE SPS Distinguished Lecturer.
|
Selected topics from 40 years of research on speech and speaker recognition
This talk summarizes my 40 years of research on speech and speaker recognition, focusing on selected topics that I have investigated at NTT Laboratories, Bell Laboratories and Tokyo Institute of Technology with my colleagues and students. These topics include: the importance of spectral dynamics in speech perception; speaker recognition methods using statistical features, cepstral features, and HMM/GMM; text-prompted speaker recognition; speech recognition by dynamic features; Japanese LVCSR; spontaneous speech corpus construction and analysis; spontaneous speech recognition; automatic speech summarization; WFST-based decoder development and its applications; and unsupervised model adaptation methods.
|
Tuesday: Tom Griffiths, UC Berkeley
Tom Griffiths is an Assistant Professor of Psychology and Cognitive Science at UC Berkeley, with courtesy appointments in Computer Science and Neuroscience. His research explores connections between human and machine learning, using ideas from statistics and artificial intelligence to try to understand how people solve the challenging computational problems they encounter in everyday life. He received his PhD in Psychology from Stanford University in 2005, and taught in the Department of Cognitive and Linguistic Sciences at Brown University before moving to Berkeley. His work and that of his students has received awards from the Neural Information Processing Systems conference and the Annual Conference of the Cognitive Science Society, and in 2006 IEEE Intelligent Systems magazine named him one of "Ten to watch in AI."
|
Connecting human and machine learning via probabilistic models of cognition
Human performance defines the standard that machine learning systems aspire to in many areas, including learning language. This suggests that studying human cognition may be a good way to develop better learning algorithms, as well as providing basic insights into how the human mind works. However, in order for ideas to flow easily from cognitive science to computer science and vice versa, we need a common framework for describing human and machine learning. I will summarize recent work exploring the hypothesis that probabilistic models of cognition, which view learning as a form of statistical inference, provide such a framework, including results that illustrate how novel ideas from statistics can inform cognitive science. Specifically, I will talk about how probabilistic models can be used to identify the assumptions of learners, learn at different levels of abstraction, and link the inductive biases of individuals to cultural universals.
|
Wednesday: Deb Roy, MIT Media Lab
Deb Roy directs the Media Lab's Cognitive Machines group, is founding director of MIT’s Center for Future Banking, and chairs the academic program in Media Arts and Sciences. A native of Canada, he received his bachelor's degree in computer engineering from the University of Waterloo in 1992 and his PhD in cognitive science from MIT in 1999. He joined the MIT faculty in 2000 and was named AT&T Associate Professor of Media Arts and Sciences in 2003.
Roy studies how children learn language, and designs machines that learn to communicate in human-like ways. To enable this work, he has developed new data-driven methods for analyzing and modeling human linguistic and social behavior. He has begun exploring applications of these methods to a range of new domains from financial behavior to autism. Roy has authored numerous scientific papers in the areas of artificial intelligence, cognitive modeling, human-machine interaction, data mining and information visualization.
|
New Horizons in the Study of Language Development
Emerging forms of ecologically valid longitudinal recordings of human behavior and social interaction promise fresh perspectives on age-old questions of child development. In a pilot effort, 240,000 hours of audio and video recordings of one child’s life at home are being analyzed with a focus on language development. To study a corpus of this scale and richness, current methods of developmental sciences are insufficient. New data analysis algorithms and methods for interpretation and computational modeling are under development. Preliminary speech analysis reveals surprising levels of linguistic “fine-tuning” by caregivers that may provide crucial support for word learning. Ongoing analysis of various other aspects of the corpus aims to model detailed aspects of the child’s language development as a function of learning mechanisms combined with everyday experience. Plans to collect similar corpora from more children based on a streamlined recording system are underway.
|
Thursday: Mari Ostendorf, University of Washington
Mari Ostendorf received the Ph.D. in electrical engineering from Stanford University. After working at BBN Laboratories and Boston University, she joined the University of Washington (UW) in 1999. She has also been a visiting researcher at the ATR Interpreting Telecommunications Laboratory and at the University of Karlsruhe. At UW, she is currently an Endowed Professor of System Design Methodologies in Electrical Engineering and an Adjunct Professor in Computer Science and Engineering and in Linguistics. She also serves as the Associate Dean for Research and Graduate Studies in the UW College of Engineering. She teaches undergraduate and graduate courses in signal processing and statistical learning, including a design-oriented freshman course that introduces students to signal processing and communications.
Prof. Ostendorf's research interests are in dynamic and linguistically motivated statistical models for speech and language processing. Her work has resulted in over 200 publications and 2 paper awards. Prof. Ostendorf has served as co-Editor of Computer Speech and Language and as the Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing, and she is currently on the IEEE Signal Processing Society Board of Governors and the ISCA Advisory Council. She is a Fellow of IEEE and ISCA.
|
Transcribing Speech for Spoken Language Processing
As storage costs drop and bandwidth increases, there has been a rapid growth of spoken information available via the web or in online archives -- including radio and TV broadcasts, oral histories, legislative proceedings, call center recordings, etc. -- raising problems of document retrieval, information extraction, summarization and translation for spoken language. While there is a long tradition of research in these technologies for text, new challenges arise when moving from written to spoken language. In this talk, we look at differences between speech and text, and how we can leverage the information in the speech signal beyond the words to provide structural information in a rich, automatically generated transcript that better serves language processing applications. In particular, we look at three interrelated types of structure (segmentation, prominence and syntax), methods for automatic detection, the benefit of optimizing rich transcription for the target language processing task, and the impact of this structural information in tasks such as parsing, topic detection, information extraction and translation.