End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus

Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022

Recommended citation: M. Wang, J. Chen, X.-L. Zhang and S. Rahardja, "End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 513-524, 2023, doi: 10.1109/TASLP.2022.3224305. https://ieeexplore.ieee.org/document/9961873

In this paper, we first develop a multi-modal Mandarin corpus of synchronized air- and bone-conducted speech (ABCS). We then propose a multi-modal conformer ASR system based on a novel multi-modal transducer.

Database