Smoothed Frame-Level SINR and Its Estimation for Sensor Selection in Distributed Acoustic Sensor Networks
Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
Recommended citation: S. Guan, M. Wang, Z. Bai, J. Wang, J. Chen and J. Benesty, "Smoothed Frame-Level SINR and Its Estimation for Sensor Selection in Distributed Acoustic Sensor Networks," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 4554-4568, 2024, 10.1109/TASLP.2024.3477277. https://ieeexplore.ieee.org/document/10711254
Distributed acoustic sensor network (DASN) refers to a sound acquisition system that consists of a collection of microphones randomly distributed across a wide acoustic area. Theory and methods for DASN are gaining increasing attention as the associated technologies can be used in a broad range of applications to solve challenging problems. However, unlike traditional microphone arrays or centralized systems, properly exploiting the redundancy among different channels in DASN is facing many challenges including but not limited to variations in pre-amplification gains, clocks, sensors’ response, and signal-to-interference-plus-noise ratios (SINRs). Selecting appropriate sensors relevant to the task at hand is therefore crucial in DASN. In this work, we propose a speaker-dependent smoothed frame-level SINR estimation method for sensor selection in multi-speaker scenarios, specifically addressing source movement within DASN. Additionally, we devise an approach for similarity measurement to generate dynamic speaker embeddings resilient to variations in reference speech levels. Furthermore, we introduce a novel loss function that integrates classification and ordinal regression within a unified framework. Extensive simulations are performed and the results demonstrate the efficacy of the proposed method in accurately estimating smoothed frame-level SINR dynamically, yielding state-of-the-art performance.