Data-driven Approaches in Acoustic Signal Processing: Methods and Applications

  1. Spoken language recognition has made significant progress in recent years, with automatic speech recognition used as a parallel branch to extract phonetic features. However, there is still a lack...

    Authors: Zimu Li, Yanyan Xu, Dengfeng Ke and Kaile Su
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:14
  2. The speech feature model is the basis of speech and noise separation, speech expression, and conversion between different speech styles. With the development of signal processing methods, the feature types and dimensio...

    Authors: Xiaoping Xie, Yongzhen Chen, Rufeng Shen and Dan Tian
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:10
  3. The acoustic echo cannot be entirely removed by linear adaptive filters due to the nonlinear relationship between the echo and the far-end signal. Usually, a post-processing module is required to further suppr...

    Authors: Hongsheng Chen, Guoliang Chen, Kai Chen and Jing Lu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:35
  4. The performance of speech recognition systems trained with neutral utterances degrades significantly when these systems are tested with emotional speech. Since everybody can speak emotionally in the real-world...

    Authors: Masoud Geravanchizadeh, Elnaz Forouhandeh and Meysam Bashirpour
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:31
  5. Many end-to-end approaches have been proposed to detect predefined keywords. For multi-keyword scenarios, two bottlenecks still need to be resolved: (1) the distribution of important data th...

    Authors: Gui-Xin Shi, Wei-Qiang Zhang, Guan-Bo Wang, Jing Zhao, Shu-Zhou Chai and Ze-Yu Zhao
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:27
  6. Due to the ad hoc nature of wireless acoustic sensor networks, the position of the sensor nodes is typically unknown. This contribution proposes a technique to estimate the position and orientation of the sens...

    Authors: Tobias Gburrek, Joerg Schmalenstroeer and Reinhold Haeb-Umbach
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:25
  7. Estimating time-frequency domain masks for single-channel speech enhancement using deep learning methods has recently become a popular research field with promising results. In this paper, we propose a novel comp...

    Authors: Ziyi Xu, Samy Elshamy, Ziyue Zhao and Tim Fingscheidt
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:24
  8. Multiple sound source localization has been a topic of intense interest in recent years. Single Source Zone (SSZ)-based localization methods achieve good performance due to the detection and utilization of the Time-F...

    Authors: Maoshen Jia, Shang Gao and Changchun Bao
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:23
  9. Recently, non-intrusive speech quality assessment methods have attracted considerable attention since they do not require the original reference signals. At the same time, neural networks have begun to be applied to ...

    Authors: Miao Liu, Jing Wang, Weiming Yi and Fang Liu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:20
  10. Sound event detection (SED), which is typically treated as a supervised problem, aims at detecting the types of sound events and their corresponding temporal information. It requires estimating onset and offset annotat...

    Authors: Sichen Liu, Feiran Yang, Yin Cao and Jun Yang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:19
  11. Amongst the various characteristics of a speech signal, the expression of emotion is one of the characteristics that exhibits the slowest temporal dynamics. Hence, a performant speech emotion recognition (SER)...

    Authors: Duowei Tang, Peter Kuppens, Luc Geurts and Toon van Waterschoot
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:18
  12. In this study, we present a deep neural network-based online multi-speaker localization algorithm based on a multi-microphone array. Following the W-disjoint orthogonality principle in the spectral domain, tim...

    Authors: Hodaya Hammer, Shlomo E. Chazan, Jacob Goldberger and Sharon Gannot
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:16
  13. The presence of degradations in speech signals, which causes acoustic mismatch between training and operating conditions, deteriorates the performance of many speech-based systems. A variety of enhancement tec...

    Authors: Yuki Saishu, Amir Hossein Poorjam and Mads Græsbøll Christensen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:9
  14. In recent years, machine learning techniques have been employed to produce state-of-the-art results in several audio-related tasks. The success of these approaches has been largely due to access to large...

    Authors: Rajat Hebbar, Pavlos Papadopoulos, Ramon Reyes, Alexander F. Danvers, Angelina J. Polsinelli, Suzanne A. Moseley, David A. Sbarra, Matthias R. Mehl and Shrikanth Narayanan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:7
  15. Two novel methods for speaker separation of multi-microphone recordings that can also detect speakers with infrequent activity are presented. The proposed methods are based on a statistical model of the probab...

    Authors: Bracha Laufer-Goldshtein, Ronen Talmon and Sharon Gannot
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:5