Deep Learning for Speech and Language Processing Applications

Deep learning techniques have enjoyed enormous success in the speech and language processing community over the past few years, beating previous state-of-the-art approaches to acoustic modeling, language modeling, and natural language processing. A common theme across different tasks is that the depth of the network allows useful representations to be learned. For example, in acoustic modeling, the ability of deep architectures to disentangle multiple factors of variation in the input, such as various speaker-dependent effects on speech acoustics, has led to substantial improvements in speech recognition performance on a wide variety of tasks. We as a community should keep working to understand what makes deep learning successful for speech and language, and how further improvements can be achieved.

Edited by: Michiel Bacchiani, Hui Jiang, B. Kingsbury, Tara Sainath, Frank Seide and Andrew Senior

  1. Research

    Wise teachers train better DNN acoustic models

    Automatic speech recognition is becoming more ubiquitous as recognition performance improves, capable devices increase in number, and areas of new application open up. Neural network acoustic models that can u...

    Ryan Price, Ken-ichi Iso and Koichi Shinoda

    EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:10

    Published on: 12 April 2016

  2. Research

    Exploiting spectro-temporal locality in deep learning based acoustic event detection

    In recent years, deep learning has not only permeated the computer vision and speech recognition research fields but also fields such as acoustic event detection (AED). One of the aims of AED is to detect and ...

    Miquel Espi, Masakiyo Fujimoto, Keisuke Kinoshita and Tomohiro Nakatani

    EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:26

    Published on: 14 September 2015

  3. Research

    Phone recognition with hierarchical convolutional deep maxout networks

    Deep convolutional neural networks (CNNs) have recently been shown to outperform fully connected deep neural networks (DNNs) both on low-resource and on large-scale speech tasks. Experiments indicate that conv...

    László Tóth

    EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:25

    Published on: 4 September 2015

  4. Research

    Exploiting foreign resources for DNN-based ASR

    Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specifi...

    Petr Motlicek, David Imseng, Blaise Potard, Philip N. Garner and Ivan Himawan

    EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:17

    Published on: 26 June 2015