Interactive machine learning has significant potential to personalize media recognition models, such as those for images and sounds, to individual users. However, GUIs (graphical user interfaces) for interactive machine learning have been studied mainly for images and text, and use cases targeting non-visual data such as sound remain underexplored. In this study, we consider a scenario in which users browse a large collection of sound data while labeling training samples for their target sound recognition classes, and we examine visualization techniques that help users grasp the overall structure of the samples. We experimentally compared a range of visualization techniques, from sound spectrograms to deep-learning-based sound-to-image retrieval. Based on these comparisons, we discuss design guidelines for GUIs for interactive sound recognition that handle large volumes of sound data.
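The sketch below (not the authors' implementation) illustrates two of the visualization ingredients mentioned above: per-clip log-mel spectrograms and a 2D "overview" map of a sound collection obtained by projecting clip-level features. It assumes librosa, scikit-learn, and matplotlib, uses a hypothetical file list, and mean-pools log-mel features as a simple stand-in for the learned audio embeddings used in the paper.

```python
# Minimal sketch: spectrogram view of one clip + 2D overview of a collection.
# Assumptions: librosa / scikit-learn / matplotlib are available; file paths
# are placeholders; mean-pooled log-mel features stand in for deep embeddings.
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE


def log_mel_spectrogram(path, sr=16000, n_mels=64):
    """Load an audio clip and return its log-mel spectrogram in dB."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max), sr


def clip_feature(spec_db):
    """Mean-pool over time to get a fixed-size feature per clip."""
    return spec_db.mean(axis=1)


def plot_overview(paths):
    """Project clip features to 2D with t-SNE and scatter-plot the collection."""
    feats = np.stack([clip_feature(log_mel_spectrogram(p)[0]) for p in paths])
    perplexity = min(30, len(paths) - 1)  # t-SNE needs perplexity < n_samples
    xy = TSNE(n_components=2, perplexity=perplexity, random_state=0).fit_transform(feats)
    plt.scatter(xy[:, 0], xy[:, 1], s=10)
    plt.title("2D overview of the sound collection")
    plt.show()


if __name__ == "__main__":
    # Hypothetical file list; replace with paths to your own sound clips.
    paths = ["clip_000.wav", "clip_001.wav", "clip_002.wav"]

    # Spectrogram view of a single clip.
    spec_db, sr = log_mel_spectrogram(paths[0])
    librosa.display.specshow(spec_db, sr=sr, x_axis="time", y_axis="mel")
    plt.title("Log-mel spectrogram")
    plt.show()

    # Overview map of the whole collection.
    plot_overview(paths)
```

In a labeling GUI, the scatter plot would typically be made interactive (hover to play a clip, click to label), but the projection step shown here is the part that conveys the overall structure of the data.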
Reference
- Tatsuya Ishibashi, Yuri Nakao, and Yusuke Sugano, “Investigating audio data visualization for interactive sound recognition,” in Proceedings of the 25th International Conference on Intelligent User Interfaces (IUI 2020).