Brighton Pavilion

10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.

Tue-Ses1-O2: Language acquisition

Time: Tuesday 10:00 Place: East Wing 1 Type: Oral
Chair: Maria Uther

10:00 Connecting Rhythm and Prominence in Automatic ESL Pronunciation Scoring

Emily Nava (University of Southern California)
Joseph Tepperman (University of Southern California)
Louis Goldstein (University of Southern California)
Maria Luisa Zubizarreta (University of Southern California)
Shrikanth Narayanan (University of Southern California)

Past studies have shown that a native Spanish speaker's use of phrasal prominence is a good indicator of her level of English prosody acquisition. Because of the cross-linguistic differences in the organization of phrasal prominence and durational contrasts, we hypothesize that speakers with English-like prominence in their L2 speech have also acquired English-like rhythm. Statistics from a corpus of native and nonnative English confirm that speakers with English-like phrasal prominence are also the ones who use English-like rhythm. Additionally, two methods of automatic score generation based on vowel durations yield a correlation of at least 0.6 between the automatic scores and subjective scores for phrasal prominence. These findings suggest that simple vowel duration measures obtained from standard automatic speech recognition methods can be salient cues for estimating subjective scores of prosodic acquisition, and of pronunciation in general.
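As a rough illustration of the kind of computation this abstract describes, the sketch below derives a rhythm score from vowel durations and correlates it with subjective scores. The abstract does not say which duration measures the authors used; the normalized Pairwise Variability Index (nPVI) here is a stand-in chosen for illustration, and the function names and all input values are hypothetical.

```python
from math import sqrt


def npvi(durations):
    """Normalized Pairwise Variability Index over successive vowel durations.

    Assumption: nPVI stands in for the (unspecified) duration-based
    measures mentioned in the abstract.
    """
    if len(durations) < 2:
        raise ValueError("need at least two vowel durations")
    # Each term is the absolute difference of two neighbouring durations,
    # normalized by their mean.
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * sum(terms) / len(terms)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder data, not taken from the paper:
# vowel durations in seconds for three speakers, and made-up rater scores.
speaker_vowel_durations = [
    [0.08, 0.15, 0.07, 0.18],
    [0.10, 0.12, 0.11, 0.13],
    [0.09, 0.14, 0.08, 0.12],
]
subjective_scores = [4.0, 2.0, 3.0]

automatic_scores = [npvi(d) for d in speaker_vowel_durations]
correlation = pearson(automatic_scores, subjective_scores)
```

With real data, the vowel durations would come from the segment boundaries of an automatic speech recognizer's alignment, and `correlation` would be compared against the 0.6 figure reported above.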

10:20 Evaluating parameters for mapping adult vowels to imitative babbling

Ilana Heintz (The Ohio State University)
Mary Beckman (The Ohio State University)
Eric Fosler-Lussier (The Ohio State University)
Lucie Ménard (Université du Québec à Montréal)

We design a neural network model of first language acquisition to explore the relationship between child and adult speech sounds. The model learns simple vowel categories using a produce-and-perceive babbling algorithm in addition to listening to ambient speech. The model is similar to that of Westermann & Miranda (2004), but adds a dynamic aspect in that it adapts in both the articulatory and acoustic domains to changes in the child's speech patterns. The training data is designed to replicate infant speech sounds and articulatory configurations. By exploring a range of articulatory and acoustic dimensions, we see how the child might learn to draw correspondences between his or her own speech and that of a caretaker, whose productions are quite different from the child's. We also design an imitation evaluation paradigm that gives insight into the strengths and weaknesses of the model.

10:40 Intonation of Japanese sentences spoken by English speakers

Chiharu Tsurutani (Griffith University, Australia)

This study investigated the intonation of Japanese sentences spoken by Australian English speakers and the influence of their first language (L1) prosody on that intonation. Second language (L2) intonation is a complicated product of L1 transfer at two levels of the prosodic hierarchy: the word level and the phrase level. L2 speech is hypothesized to retain the characteristics of L1, and to gain marked features of the target language only during the late stage of acquisition. Investigation of this hypothesis involved acoustic measurement of L2 speakers’ intonation contours, and comparison of these contours with those of native speakers.

11:00 KLAIR: a Virtual Infant for Spoken Language Acquisition Research

Mark Huckvale (University College London)
Ian Howard (University of Cambridge)
Sascha Fagel (Berlin Institute of Technology)

Recent research into the acquisition of spoken language has stressed the importance of learning through embodied linguistic interaction with caregivers rather than through passive observation. However, the necessity of interaction makes experimental work on the simulation of infant speech acquisition difficult because of the technical complexity of building real-time embodied systems. In this paper we present KLAIR: a software toolkit for building simulations of spoken language acquisition through interactions with a virtual infant. The main part of KLAIR is a sensori-motor server that supplies a client machine learning application with a virtual infant on screen that can see, hear and speak. By encapsulating the real-time complexities of audio and video processing within a server that will run on a modern PC, we hope that KLAIR will encourage and facilitate more experimental research into spoken language acquisition through interaction.

11:20 An Articulatory Analysis of Phonological Transfer Using Real-Time MRI

Joseph Tepperman (University of Southern California)
Erik Bresch (University of Southern California)
Yoon-Chul Kim (University of Southern California)
Sungbok Lee (University of Southern California)
Louis Goldstein (University of Southern California)
Shrikanth Narayanan (University of Southern California)

Phonological transfer is the influence of a first language on phonological variations made when speaking a second language. With automatic pronunciation assessment applications in mind, this study intends to uncover evidence of phonological transfer in terms of articulation. Real-time MRI videos from three German speakers of English and three native English speakers are compared to uncover the influence of German consonants on close English consonants not found in German. Results show that nonnative speakers demonstrate the effects of L1 transfer through the absence of articulatory contrasts seen in native speakers, while still maintaining minimal articulatory contrasts that are necessary for automatic detection of pronunciation errors, encouraging the further use of articulatory models for speech error characterization and detection.

11:40 Do Multiple Caregivers Speed up Language Acquisition?

Louis ten Bosch (Radboud University Nijmegen)
Okko Rasanen (Helsinki University of Technology)
Joris Driesen (Catholic University of Leuven)
Guillaume Aimetti (University of Sheffield)
Toomas Altosaar (Helsinki University of Technology)
Lou Boves (Radboud University Nijmegen)
Athena Corns (Radboud University Nijmegen)

In this paper we compare three different implementations of language learning to investigate the issue of speaker-dependent initial representations and subsequent generalization. These implementations are used in a comprehensive model of language acquisition under development in the FP6 FET project ACORNS. All algorithms are embedded in a cognitively and ecologically plausible framework, and perform the task of detecting word-like units without any lexical, phonetic, or phonological information. The results show that the computational approaches differ in the extent to which they deal with unseen speakers, and in how generalization depends on the variation observed during training.