Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Portfolio

Publications

End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus

Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022

In this paper, we first develop a multi-modal Mandarin corpus, which contains air- and bone-conducted synchronized speech (ABCS). Then, we propose a multi-modal conformer ASR system based on a novel multi-modal transducer.

Recommended citation: M. Wang, J. Chen, X.-L. Zhang and S. Rahardja, "End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 513-524, 2023, doi: 10.1109/TASLP.2022.3224305. https://ieeexplore.ieee.org/document/9961873

Smoothed Frame-Level SINR and Its Estimation for Sensor Selection in Distributed Acoustic Sensor Networks

Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024

In this work, we propose a speaker-dependent smoothed frame-level SINR estimation method for sensor selection in multi-speaker scenarios, specifically addressing source movement within a distributed acoustic sensor network (DASN). Additionally, we devise a similarity-measurement approach that generates dynamic speaker embeddings resilient to variations in reference speech levels. Furthermore, we introduce a novel loss function that integrates classification and ordinal regression within a unified framework.

Recommended citation: S. Guan, M. Wang, Z. Bai, J. Wang, J. Chen and J. Benesty, "Smoothed Frame-Level SINR and Its Estimation for Sensor Selection in Distributed Acoustic Sensor Networks," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 4554-4568, 2024, doi: 10.1109/TASLP.2024.3477277. https://ieeexplore.ieee.org/document/10711254

Talks

Teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.