Supervised Deep Hashing for Efficient Audio Retrieval
Talk at Microsoft Research, Redmond, WA, USA (Paper)
Efficient retrieval of audio events can facilitate real-time implementation of numerous query and search-based systems. This work investigates the potency of different hashing techniques for efficient
audio event retrieval. Multiple state-of-the-art weak audio embeddings are employed for this purpose. The performance of four classical unsupervised hashing algorithms is explored as part of off-the-shelf analysis. Then, we propose a partially supervised deep hashing framework that transforms the weak embeddings into a low-dimensional space while optimizing for efficient hash codes. The
model uses only a fraction of the available labels and is shown here
to significantly improve the retrieval accuracy on two widely employed audio event datasets. The extensive analysis and comparison
between supervised and unsupervised hashing methods presented
here give insights into the quantizability of audio embeddings. This
work provides a first look at efficient audio event retrieval systems
and hopes to set baselines for future research.
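
To make the idea of hashing-based retrieval concrete, the following is a minimal sketch of how real-valued audio embeddings could be turned into binary codes and searched by Hamming distance. It uses sign random projections, a generic unsupervised hashing scheme chosen here purely for illustration; it is not claimed to be one of the four algorithms evaluated in the paper, nor the proposed deep hashing model. All function names, the toy embeddings, and the code length are assumptions.

```python
import random


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes of dimension dim (illustrative)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_embedding(embedding, projections):
    """Pack the sign of each projection into one integer hash code."""
    code = 0
    for plane in projections:
        dot = sum(w * x for w, x in zip(plane, embedding))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


def retrieve(query_code, database_codes, k=3):
    """Return indices of the k database codes closest to the query in Hamming distance."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
    return ranked[:k]


# Toy stand-ins for audio embeddings (hypothetical values, 4 dimensions).
database = [
    [1.0, 2.0, -0.5, 0.3],
    [-1.0, -2.0, 0.5, -0.3],
    [0.9, 2.1, -0.4, 0.2],
]
N_BITS = 16
projections = make_projections(dim=4, n_bits=N_BITS)
codes = [hash_embedding(e, projections) for e in database]

# Query with a copy of the first item; it should rank itself first.
query_code = hash_embedding(database[0], projections)
top = retrieve(query_code, codes, k=2)
```

Because the comparison reduces to an XOR and a bit count on short integer codes, search cost stays low even for large collections, which is the efficiency argument the abstract makes. A supervised approach such as the one proposed would instead learn the mapping to codes using label information, rather than relying on random hyperplanes.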