Joint Audio-Visual Speech Processing

  1. The use of visual features in audio-visual speech recognition (AVSR) is justified by both the speech generation mechanism, which is essentially bimodal in audio and visual representation, and by the need for f...

    Authors: Ara V. Nefian, Luhong Liang, Xiaobo Pi, Xiaoxing Liu and Kevin Murphy
    Citation: EURASIP Journal on Advances in Signal Processing 2002 2002:783042
  2. It has been shown that integration of acoustic and visual information, especially in noisy conditions, yields improved speech recognition results. This raises the question of how to weight the two modalities in ... (a minimal stream-weighting sketch follows this list).

    Authors: Martin Heckmann, Frédéric Berthommier and Kristian Kroschel
    Citation: EURASIP Journal on Advances in Signal Processing 2002 2002:720764
  3. We aim at modeling the appearance of the lower face region to assist visual feature extraction for audio-visual speech processing applications. In this paper, we present a neural network based statistical appe...

    Authors: Philippe Daubias and Paul Deléglise
    Citation: EURASIP Journal on Advances in Signal Processing 2002 2002:720534
  4. This study examines relationships between external face movements, tongue movements, and speech acoustics for consonant-vowel (CV) syllables and sentences spoken by two male and two female talkers with differe...

    Authors: Jintao Jiang, Abeer Alwan, Patricia A. Keating, Edward T. Auer Jr. and Lynne E. Bernstein
    Citation: EURASIP Journal on Advances in Signal Processing 2002 2002:506945
  5. Authors: Chalapathy Neti, Gerasimos Potamianos, Juergen Luettin and Eric Vatikiotis-Bateson
    Citation: EURASIP Journal on Advances in Signal Processing 2002 2002:475826
  6. Visual speech recognition is an emerging research field. In this paper, we examine the suitability of support vector machines for visual speech recognition. Each word is modeled as a temporal sequence of visem...

    Authors: Mihaela Gordan, Constantine Kotropoulos and Ioannis Pitas
    Citation: EURASIP Journal on Advances in Signal Processing 2002 2002:427615
  7. We present a new approach to the source separation problem in the case of multiple speech signals. The method is based on the use of automatic lipreading; the objective is to extract an acoustic speech signal ...

    Authors: David Sodoyer, Jean-Luc Schwartz, Laurent Girin, Jacob Klinkisch and Christian Jutten
    Citation: EURASIP Journal on Advances in Signal Processing 2002 2002:382823
  8. There has been growing interest in introducing speech as a new modality into the human-computer interface (HCI). Motivated by the multimodal nature of speech, the visual component is considered to yield inform...

    Authors: Xiaozheng Zhang, Charles C. Broun, Russell M. Mersereau and Mark A. Clements
    Citation: EURASIP Journal on Advances in Signal Processing 2002 2002:240192
  9. Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has bec...

    Authors: Eric K. Patterson, Sabri Gurbuz, Zekeriya Tufekci and John N. Gowdy
    Citation: EURASIP Journal on Advances in Signal Processing 2002 2002:208541
  10. It is often advantageous to track objects in a scene using multimodal information when such information is available. We use audio as a complementary modality to video data, which, in comparison to vision, can...

    Authors: Dmitry N. Zotkin, Ramani Duraiswami and Larry S. Davis
    Citation: EURASIP Journal on Advances in Signal Processing 2002 2002:162620
  11. We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio ...

    Authors: Petar S. Aleksic, Jay J. Williams, Zhilin Wu and Aggelos K. Katsaggelos
    Citation: EURASIP Journal on Advances in Signal Processing 2002 2002:150948
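Several of the listed papers, most directly item 2, address how the acoustic and visual streams should be weighted when they are fused for recognition. The sketch below only illustrates that general idea and is not a reconstruction of any listed paper's method: the per-class log-likelihood inputs, the linear SNR-to-weight mapping, and all function names are assumptions made for the example.

    import numpy as np

    def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
        # Log-linear stream weighting: lambda on the audio stream,
        # 1 - lambda on the visual stream.
        return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll

    def snr_to_audio_weight(snr_db, lo_db=-5.0, hi_db=25.0):
        # Illustrative linear ramp from an estimated SNR (dB) to a weight
        # in [0, 1]; the ramp endpoints are assumptions, not values taken
        # from the papers above.
        return float(np.clip((snr_db - lo_db) / (hi_db - lo_db), 0.0, 1.0))

    if __name__ == "__main__":
        # Toy per-class log-likelihoods for three candidate words,
        # one set from an audio classifier and one from a visual classifier.
        audio_ll = np.log([0.2, 0.5, 0.3])
        visual_ll = np.log([0.6, 0.3, 0.1])
        for snr_db in (0.0, 10.0, 20.0):
            lam = snr_to_audio_weight(snr_db)
            fused = fuse_log_likelihoods(audio_ll, visual_ll, lam)
            print(f"SNR {snr_db:4.1f} dB -> audio weight {lam:.2f}, "
                  f"decision: class {int(np.argmax(fused))}")

At high SNR the audio stream dominates the decision; as noise increases, the weight shifts toward the visual stream, which is the behaviour motivated by the noisy-condition results described in the abstracts above.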