Surround sound systems that play back multi-channel audio signals through multiple loudspeakers can improve augmented reality experiences and have been widely used in many multimedia communication systems. A hands-free speech communication system commonly suffers from the acoustic echo problem, and the echo needs to be canceled or suppressed completely. This paper proposes a deep learning-based acoustic echo cancellation (AEC) method to recover the desired near-end speech from the microphone signals in surround sound systems. The ambisonics technique was adopted to record the surround sound for reproduction. To achieve a better generalization capability against different loudspeaker layouts, the compressed complex spectra of the first-order ambisonic signals (B-format) were sent to the neural network as the input features directly, instead of using the ambisonic decoded signals (D-format). Experimental results on both simulated and real acoustic environments showed that the proposed algorithm is effective in surround AEC and outperforms other competing methods in terms of speech quality and the amount of echo reduction.

Surround sound systems offer the potential for immersive sound field reproduction, enhancing realism in virtual reality and multimedia communication systems, such as immersive teleconference and acoustic human–machine interfaces. In a closed-loop teleconference system, the echo signal caused by the acoustic coupling between microphones and loudspeakers has a significant negative impact on hands-free speech communication. Conventional AEC methods often use adaptive filters to identify the acoustic echo paths, and the echo signal in each microphone is then estimated and subtracted from each microphone signal. However, when there are two or more reproduction channels, adaptive filtering-based algorithms may suffer from the well-known non-unique solution problem caused by the high cross-correlation between the loudspeaker signals. Although many algorithms have been proposed to decorrelate the loudspeaker signals and so solve the non-unique solution problem, they may degrade the reproduction quality and the sense of immersion of the far-end sound. Besides, as the number of channels increases, the computational complexity and convergence time increase, and controlling the step size becomes much more difficult.

In recent years, deep learning-based methods have been employed in AEC and have achieved significant success. Compared with conventional AEC algorithms, deep learning-based AEC methods can recover the near-end signals from the microphone signals directly. They do not need to identify the acoustic echo paths explicitly, and they do not suffer from the non-unique solution problem.
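For readers unfamiliar with the conventional approach mentioned above (an adaptive filter identifies the echo path, and the estimated echo is subtracted from the microphone signal), the following is a minimal single-channel sketch using the normalized least-mean-squares (NLMS) update. It is an illustration only and not the method proposed in the paper: the function name `nlms_echo_canceller` and the parameters `filter_len`, `step`, and `eps` are assumptions chosen for this example, and the sketch omits double-talk control and the multi-channel case in which the non-unique solution problem arises.

```python
import numpy as np


def nlms_echo_canceller(far_end, mic, filter_len=64, step=0.5, eps=1e-8):
    """Hypothetical single-channel NLMS echo canceller (illustrative sketch).

    far_end : loudspeaker (reference) signal, 1-D array
    mic     : microphone signal = echo (+ near-end speech), same length
    Returns the echo-reduced signal and the estimated echo-path taps.
    """
    far_end = np.asarray(far_end, dtype=float)
    mic = np.asarray(mic, dtype=float)

    w = np.zeros(filter_len)    # current estimate of the echo path (FIR taps)
    buf = np.zeros(filter_len)  # most recent far-end samples, newest first
    out = np.zeros(len(mic))

    for n in range(len(mic)):
        # Shift the newest far-end sample into the delay line.
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]

        echo_hat = w @ buf      # estimated echo at the microphone
        e = mic[n] - echo_hat   # subtract the estimate from the mic signal

        # NLMS update: step size normalized by the reference signal energy.
        w += step * e * buf / (buf @ buf + eps)
        out[n] = e

    return out, w
```

When near-end speech is present in `mic`, the returned signal approximates the near-end speech plus residual echo. In practice the adaptation would usually be slowed or frozen during double-talk, which this sketch does not attempt.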