Robust Voice Activity Detection with Deep Maxout Neural Networks

  •  Valentin Sergeyevich Mendelev    
  •  Tatiana Nikolaevna Prisyach    
  •  Alexey Alexandrovich Prudnikov    


Voice activity detection (VAD) under non-stationary noise is an important task for real-life automatic speech recognition systems, especially when a remote microphone is used. Many existing methods perform poorly when the noise changes over time or when the signal-to-noise ratio (SNR) is very low. This paper proposes a method based on deep maxout neural networks with dropout regularization. The method remains effective at very low SNR (down to -5 dB). Its robustness is demonstrated by low FR/FA error rates on a test dataset recorded under conditions different from those of the training dataset.
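
For readers unfamiliar with the building blocks named above, the following is a minimal NumPy sketch of a single maxout layer followed by dropout. It is not the authors' architecture: the layer sizes, the number of linear pieces per unit, the dropout rate, and the function names `maxout_layer` and `dropout` are all assumptions chosen for illustration. A maxout unit outputs the maximum over several affine projections of its input; dropout (shown here in its common "inverted" form) zeroes activations at random during training and rescales the rest.

```python
import numpy as np


def maxout_layer(x, W, b):
    """Maxout activation: element-wise max over k affine projections.

    x: (batch, d_in) inputs
    W: (k, d_in, d_out) weights, one matrix per linear piece
    b: (k, d_out) biases, one vector per linear piece
    returns: (batch, d_out)
    """
    # z[n, j, o] = sum_i x[n, i] * W[j, i, o] + b[j, o]
    z = np.einsum("bi,kio->bko", x, W) + b
    # Each output unit keeps the largest of its k pieces.
    return z.max(axis=1)


def dropout(h, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate`
    and rescale the survivors so the expected value is unchanged."""
    if not training or rate == 0.0:
        return h
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)


# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10))            # 4 frames, 10 features each
W = rng.standard_normal((3, 10, 5)) * 0.1   # 3 pieces, 5 maxout units
b = np.zeros((3, 5))

h = maxout_layer(x, W, b)                        # shape (4, 5)
h_train = dropout(h, 0.5, rng)                   # training-time activations
h_eval = dropout(h, 0.5, rng, training=False)    # unchanged at test time
```

In a deep network, several such layers would be stacked and followed by an output layer that produces a per-frame speech/non-speech score; those details are outside what the abstract specifies.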

This work is licensed under a Creative Commons Attribution 4.0 License.