Deep Learning for Speech and Language Processing Applications

Deep learning techniques have enjoyed enormous success in the speech and language processing community over the past few years, beating previous state-of-the-art approaches to acoustic modeling, language modeling, and natural language processing. A common theme across different tasks is that the depth of the network allows useful representations to be learned. For example, in acoustic modeling, the ability of deep architectures to disentangle multiple factors of variation in the input, such as various speaker-dependent effects on speech acoustics, has led to excellent improvements in speech recognition performance on a wide variety of tasks. We as a community should continue working to understand what makes deep learning successful for speech and language, and how further improvements can be achieved.

Edited by: Michiel Bacchiani, Hui Jiang, B. Kingsbury, Tara Sainath, Frank Seide and Andrew Senior


  1. Automatic speech recognition is becoming more ubiquitous as recognition performance improves, capable devices increase in number, and areas of new application open up. Neural network acoustic models that can u...

    Authors: Ryan Price, Ken-ichi Iso and Koichi Shinoda
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:10
  2. Time-frequency (T-F) masking is an effective method for stereo speech source separation. However, reliable estimation of the T-F mask from sound mixtures is a challenging task, especially when room reverberati...

    Authors: Yang Yu, Wenwu Wang and Peng Han
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:7
  3. In recent years, deep learning has not only permeated the computer vision and speech recognition research fields but also fields such as acoustic event detection (AED). One of the aims of AED is to detect and ...

    Authors: Miquel Espi, Masakiyo Fujimoto, Keisuke Kinoshita and Tomohiro Nakatani
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:26
  4. Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specifi...

    Authors: Petr Motlicek, David Imseng, Blaise Potard, Philip N. Garner and Ivan Himawan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:17