
10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Special Sessions

The INTERSPEECH 2009 Organisation Committee is pleased to announce acceptance of the following Special Sessions at Interspeech 2009 - the 10th Annual Conference of the International Speech Communication Association, to be held on September 6-10, 2009, in Brighton, United Kingdom.

Measuring the rhythm of speech

There has been considerable interest in the last decade in the modelling of rhythm, both from a typological perspective (e.g. establishing objective criteria for classifying languages or dialects as stress-timed, syllable-timed or mora-timed) and from the perspective of establishing evaluation metrics for non-standard or deviant varieties of speech, such as that obtained from non-native speakers, from speakers with pathological disabilities or from automatic speech synthesis. The aim of this special session will be to bring together a number of researchers who have contributed to this debate and to assess and discuss the current status of our understanding of the relative value of different metrics for different tasks.

Organiser: Daniel Hirst (daniel.hirst@lpl-aix.fr), Laboratoire Parole et Langage, Université de Provence

Machine Learning for Adaptivity in Spoken Dialogue Systems

In the past decade, research in the field of Spoken Dialogue Systems (SDS) has experienced increasing growth, and new applications include interactive mobile search, tutoring, and troubleshooting systems. The design and optimization of robust SDS for such tasks requires the development of dialogue strategies which can automatically adapt to different types of users and noise conditions. New statistical learning techniques are emerging for training and optimizing adaptive speech recognition, spoken language understanding, dialogue management, natural language generation, and speech synthesis in spoken dialogue systems. Among machine learning techniques for spoken dialogue strategy optimization, reinforcement learning using Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs) has become a particular focus. The purpose of this special session is to provide an opportunity for the international research community to share ideas on these topics and to have constructive discussions in a single, focussed, special conference session.

Organisers: Oliver Lemon (olemon@inf.ed.ac.uk), Edinburgh University, and Olivier Pietquin (olivier.pietquin@supelec.fr), Supélec - IMS Research Group

New Approaches to Modeling Variability for Automatic Speech Recognition

Despite great strides in the development of automatic speech recognition (ASR) technology, our community is still far from achieving its holy grail: an ASR system with performance comparable to humans in automatically transcribing unrestricted conversational speech, spoken by many speakers and in adverse acoustic environments. Many of the difficulties faced by ASR models are due to the high degree of variation in the acoustic waveforms associated with a given phonetic unit measured across different segmental and prosodic contexts. Such variation has both deterministic origins (intersegmental coarticulation; prosodic juncture and accent) and stochastic origins (token-to-token variability for utterances with the same segmental and prosodic structure). Current ASR systems successfully model acoustic variation that is due to adjacent phone context, but variation due to other sources, including prosodic context, speech rate, and speaker, is not adequately treated. The goal of this special session is to bring together researchers who are exploring alternative approaches to state-of-the-art ASR methodologies. Of special interest are new approaches that model variation in the speech signal at multiple levels, from both linguistic and extra-linguistic sources. In particular, we encourage the participation of those who are attempting to incorporate the insights that the field has gained over the past several decades from acoustic phonetics, speech production, speech perception, prosody, lexical access, natural language processing and pattern recognition to the problem of developing models of speech recognition that are robust to the full variability of speech.

Organisers: Carol Espy-Wilson (espy@umd.edu), Jennifer Cole (jscole@illinois.edu), Abeer Alwan, Louis Goldstein, Mary Harper, Elliot Saltzman, & Mark Hasegawa-Johnson.

Silent Speech Interfaces

A Silent Speech Interface (SSI) is an electronic system enabling speech communication to take place without the necessity of emitting an audible acoustic signal. By acquiring sensor data from elements of the human speech production process – from the articulators, their neural pathways, or the brain itself – an SSI produces a digital representation of speech which can be synthesized directly, interpreted as data, or routed into a communications network. Due to this novel approach, Silent Speech Interfaces have the potential to overcome the major limitations of traditional speech interfaces today, i.e. (a) limited robustness in the presence of ambient noise; (b) lack of secure transmission of private and confidential information; and (c) disturbance of bystanders created by audibly spoken speech in quiet environments; while at the same time retaining speech as the most natural human communication modality. The special session intends to bring together researchers in the areas of human articulation, speech and language technologies, data acquisition and signal processing, as well as in human interface design, software engineering and systems integration. Its goal is to promote the exchange of ideas on current SSI challenges and to discuss solutions, by highlighting, for each of the technological approaches presented, its range of applications, key advantages, potential drawbacks, and current state of development.

Further Details: Silent Speech Interfaces Web Site.

Organisers: Bruce Denby (denby@ieee.org), Université Pierre et Marie Curie, Paris, Tanja Schultz (tanja@ira.uka.de), Cognitive Systems Lab, University of Karlsruhe.

INTERSPEECH 2009 Emotion Challenge

The INTERSPEECH 2009 Emotion Challenge aims to help bridge the gap between the excellent research on human emotion recognition from speech and the low compatibility of results. The FAU Aibo Emotion Corpus of spontaneous, emotionally coloured speech, and benchmark results of the two most popular approaches will be provided by the organisers. The corpus consists of nine hours of speech from 51 children, recorded at two different schools. It allows for distinct definition of test and training partitions incorporating speaker independence as needed in most real-life settings. The corpus further provides a uniquely detailed transcription of spoken content with word boundaries, non-linguistic vocalisations, emotion labels, units of analysis, etc. The results of the Challenge will be presented at the Special Session and prizes will be awarded to the sub-challenge winners and a best paper.

Further Details: Emotion Challenge Web Site.

Organisers: Bjoern Schuller (schuller@tum.de), Technische Universitaet Muenchen, Germany, Stefan Steidl (steidl@informatik.uni-erlangen.de), FAU Erlangen-Nuremberg, Germany, Anton Batliner (batliner@informatik.uni-erlangen.de), FAU Erlangen-Nuremberg, Germany.

Advanced Voice Function Assessment

In order to advance the field of voice function assessment in a clinical setting, cooperation between clinicians and technologists is essential. The aim of this special session is to showcase work that crosses the borders between basic, applied and clinical research and highlights the development of partnership between technologists and healthcare professionals in advancing the protocols and technologies for the assessment of voice function.

Organisers: Anna Barney (ab3@soton.ac.uk), Institute of Sound and Vibration Research, UK, Mette Pedersen (m.f.pedersen@dadlnet.dk), Medical Centre, Voice Unit, Denmark.

Active Listening & Synchrony

Traditional approaches to Multimodal Interface design have tended to assume a "ping-pong" or "push-to-talk" approach to speech interaction wherein either the system or the interlocuting human is active at any one time. This is contrary to many recent findings in conversation and discourse analysis, where the definition of a "turn", or even an "utterance", is found to be very complex; people don’t "take turns" to talk in a typical conversational interaction, but they each contribute actively and interactively to the joint emergence of a "common understanding". The aim of this special session, marking the 70th anniversary of synchrony research, is to bring together researchers from the various fields who have special interest in novel techniques that are aimed at overcoming weaknesses of the "push-to-talk" approach in interface technology, or who have knowledge of the history of this field from which the research community could benefit.

Further Details: Active Listening & Synchrony Web Site.

Organisers: Nick Campbell (nick@tcd.ie), Anton Nijholt (anijholt@cs.utwente.nl), Joakim Gustafson (jocke@speech.kth.se), & Carl Vogel (vogel@tcd.ie)

Lessons and Challenges Deploying Voice Search

In the past year, a number of companies have deployed multimodal search applications for mobile phones. These applications enable spoken input for search, as an alternative to typing. There are many technical challenges associated with deploying such applications, including: High perplexity: A language model for general search must accommodate a very large vocabulary and a tremendous range of possible inputs; Challenging acoustic environments: Mobile phones are often used when "on the go", frequently in noisy environments; Challenging usage scenarios: Mobile search may be used in demanding situations such as information access while driving a car. This session will focus on early lessons learned from usage data, challenges posed, and technical and design solutions to these challenges, as well as a look towards the future.

Organisers: Mike Cohen (mcohen@google.com), Google, Johan Schalkwyk (johans@google.com), Google, Mike Phillips (phillips@vlingo.com), Vlingo.