Computational auditory scene analysis (CASA) has been widely used for speech enhancement in recent years. We propose a new system that combines CASA with spectral subtraction to obtain better enhanced speech. The CASA stage is built on deep neural networks (DNNs). Conventionally, the denoised signal is reconstructed by applying the estimated masks with a direct overlap-add method, which ignores the noise information within the frames. In our system, we use a Gaussian mixture model (GMM) to estimate a self-adaptive threshold for each channel from the estimated ratio masks (ERMs), and use that threshold to separate noise from speech in the channel. In this way, we make full use of the information within frames. The results show improvements in both objective and subjective evaluations.
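The per-channel thresholding step can be illustrated with a minimal sketch. It assumes each channel's ERM values fall into a low-valued (noise-dominated) group and a high-valued (speech-dominated) group, fits a two-component one-dimensional GMM to them with EM, and takes the threshold as the point between the two component means where the speech component becomes at least as likely as the noise component. The function names (`gauss`, `fit_gmm2`, `channel_threshold`), the two-component choice, the initialization, and the crossing-point rule are all assumptions made for illustration; the abstract does not specify them, nor how the thresholds are then used in the spectral subtraction stage.

```python
import math


def gauss(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm2(values, iters=50):
    """Fit a two-component 1-D GMM with EM; returns (weights, means, variances)."""
    lo, hi = min(values), max(values)
    # Assumed initialization: means at the quarter points of the value range.
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    var0 = max((hi - lo) ** 2 / 16.0, 1e-6)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            p = [weights[k] * gauss(x, means[k], variances[k]) for k in range(2)]
            s = (p[0] + p[1]) or 1e-300  # guard against underflow to zero
            resp.append((p[0] / s, p[1] / s))
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue  # component has collapsed; keep its old parameters
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            sq = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values))
            variances[k] = max(sq / nk, 1e-6)  # variance floor for stability
            weights[k] = nk / len(values)
    return weights, means, variances


def channel_threshold(erm_values, steps=200):
    """Threshold for one channel: first point between the two component means
    where the weighted speech density reaches the weighted noise density."""
    w, m, v = fit_gmm2(erm_values)
    noise, speech = (0, 1) if m[0] <= m[1] else (1, 0)
    for i in range(steps + 1):
        x = m[noise] + (m[speech] - m[noise]) * i / steps
        if w[speech] * gauss(x, m[speech], v[speech]) >= w[noise] * gauss(x, m[noise], v[noise]):
            return x
    return 0.5 * (m[noise] + m[speech])  # fallback: midpoint of the means


# Toy ERM with one channel (rows are channels, columns are frames); made-up values.
erm = [[0.05, 0.1, 0.15, 0.08, 0.12, 0.85, 0.9, 0.95, 0.88, 0.92]]
thresholds = [channel_threshold(channel) for channel in erm]
# True marks time-frequency units treated as speech, False as noise.
speech_units = [[value >= t for value in channel] for channel, t in zip(erm, thresholds)]
```

Because the threshold is re-estimated from each channel's own mask values, it adapts to channels with different noise levels, which is the property the abstract describes as self-adaptive.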