Speech Lab
Research
Research in the SpeechLab at TU Delft aims to build inclusive speech technology that can be used by everyone, irrespective of how they speak or the language they speak, with a focus on automatic speech recognition. To do so, several major challenges need to be overcome:
- The high diversity in languages in the world
- The high diversity in types of speech
- Dealing with low-resource scenarios: for many languages and speech types, only (very) little data is available
In doing so, we take into account not only technological aspects but also ethical and societal aspects, and we use the best speech recognisers in existence, i.e., human listeners, as a source of inspiration. Our research thus combines speech technology with linguistics, psycholinguistics, and ethics.
In our research, we combine different research techniques, ranging from human listening experiments, eye-tracking, and EEG to computational modelling and deep neural networks.
The research in the SpeechLab focusses on, but is not limited to:
- Building computational models of human speech processing using techniques from automatic speech recognition and machine learning
- Speech technology for pathological speech
- Bias in automatic speech recognition
- Multi-modal speech technology
- Building speech technology for under-resourced languages and languages without a common written language
- Visualisations of the speech representations in deep neural networks
- Systematic comparisons between human and machine speech processing architectures and performance
- Using knowledge about human speech processing to help improve speech processing algorithms
A short video explaining our research can be found here.
Current Projects
- NSF 19-10319 (RI Small): Automatic Creation of New Phone Inventories; PIs Mark Hasegawa-Johnson (UIUC) and Najim Dehak (JHU), Unfunded collaborator (July 2019 – June 2022).
- NSF & Amazon 2147350 (FAIN, IIS): A New Paradigm for the Evaluation and Training of Inclusive Automatic Speech Recognition; PIs Mark Hasegawa-Johnson & Zsuzsanna Fagyal (UIUC), Najim Dehak, Piotr Zelasko, & Laureano Moro-Velazquez (JHU), Unfunded collaborator (Feb 2022 – Jan 2025).
The Team
- Dr. Odette Scharenborg: Associate professor – Inclusive speech technology
- Dr. Jorge Martinez Castaneda: Assistant professor – Speech processing in adverse conditions; Multimodal processing; Audio processing
- Dr. Zhengyun Yue: Assistant professor – Speech technology for health care
- Dr. Tanvina Patel: Post-doc – Inclusive automatic speech recognition; Child speech recognition
- Yuanyuan Zhang: PhD student – Speech technology for atypical speech
- Dimme de Groot: PhD student – Multimodal speech enhancement