Deep Neural Network acoustic models for ASR

Date of Presentation: 
Wednesday, November 4, 2015
Fall 2015
Abstract - In the past few years, Deep Neural Networks (DNNs) have achieved state-of-the-art performance in acoustic modelling on many standard benchmarks, breaking records long held by Gaussian mixture models (GMMs). In this talk, I will present our work on DNN acoustic models and on understanding why DNNs are a more sensible choice for acoustic modelling than GMMs. We found that the depth and the distributed representations of DNNs are important for implicitly assimilating different spectro-temporal realizations of the same phone under various conditions, using higher-level representations that marginalize out undesirable information. This marginalization can be done explicitly by applying Convolutional Neural Network (CNN) concepts, i.e., local weights and pooling, along the frequency axis. We found that deep CNN acoustic models achieve about 5% to 12% relative improvement over the best fully connected DNN models on standard large-vocabulary tasks.
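To make the CNN concepts mentioned in the abstract concrete, the following is a minimal NumPy sketch of local weights (shared filters) and max-pooling applied along the frequency axis of a spectral feature matrix. The function names, shapes, and toy dimensions are illustrative assumptions, not the speaker's implementation.

```python
import numpy as np

def conv1d_freq(features, kernels, stride=1):
    # features: (frames, freq_bins); kernels: (n_kernels, width).
    # Each kernel's weights are shared across all frequency positions
    # (the "local weights" idea), sliding along the frequency axis.
    frames, bins = features.shape
    n_kernels, width = kernels.shape
    out_bins = (bins - width) // stride + 1
    out = np.zeros((frames, n_kernels, out_bins))
    for k in range(n_kernels):
        for b in range(out_bins):
            patch = features[:, b * stride : b * stride + width]
            out[:, k, b] = patch @ kernels[k]  # local dot product per frame
    return out

def max_pool_freq(conv_out, pool=2):
    # Max-pool along the frequency axis: taking the max over adjacent
    # bins gives tolerance to small spectral shifts of the same phone.
    frames, n_kernels, bins = conv_out.shape
    out_bins = bins // pool
    trimmed = conv_out[:, :, : out_bins * pool]
    return trimmed.reshape(frames, n_kernels, out_bins, pool).max(axis=3)
```

Pooling over frequency is what performs the explicit marginalization the abstract describes: nearby spectral realizations of the same phone map to the same pooled activation.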

Bio - Abdel-rahman Mohamed is a researcher in the Speech & Dialogue group at Microsoft Research (MSR). He received his PhD from the University of Toronto. Before studying in Toronto, he received his B.Sc. and M.Sc. from the Electronics and Communication Engineering Department at Cairo University in 2004 and 2007, respectively. From 2004 he worked in the speech research group at the RDI Company, Egypt, and later joined the ESAT-PSI speech group at Katholieke Universiteit Leuven, Belgium. His research focuses on developing machine learning techniques for automatic speech recognition and understanding.

Abdel-rahman Mohamed
Speaker affiliation: 
Speech & Dialogue group at Microsoft Research