Deep learning techniques have enjoyed enormous success in the speech and language processing community over the past few years, surpassing previous state-of-the-art approaches to acoustic modeling, language modeling, and natural language processing. A common theme across different tasks is that the depth of the network allows useful representations to be learned. For example, in acoustic modeling, the ability of deep architectures to disentangle multiple factors of variation in the input, such as various speaker-dependent effects on speech acoustics, has led to substantial improvements in speech recognition performance on a wide variety of tasks. As a community, we should continue to investigate what makes deep learning successful for speech and language, and how further improvements can be achieved.
Edited by: Michiel Bacchiani, Hui Jiang, B. Kingsbury, Tara Sainath, Frank Seide and Andrew Senior