Identifying a speaker in very noisy conditions is a central problem in real applications, especially in robotic systems. Existing datasets focus on generalizing across backgrounds and acquisition systems and are not representative of this kind of application. Indeed, they are composed of audio samples collected from open sources and were therefore recorded with different audio setups and background noise, without any filtering based on the target application.
SpReW is an Italian speaker identification dataset acquired in both controlled and very crowded environments. It was created to make a novel, challenging dataset publicly available for cognitive robotics applications.
In total, SpReW is composed of 493 audio samples, amounting to just under 1 hour of recording (3305.52 seconds, roughly 55 minutes) in four different environments. It contains the voices of 20 speakers between 20 and 50 years old, 30% of whom are female. Furthermore, SpReW contains audio samples of each speaker recorded in all the proposed environments.
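The figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the numbers quoted in this paragraph. The function name `summarize` is illustrative, and the count of female speakers is derived here from the stated 30% rather than given by the authors.

```python
# Figures quoted in the dataset description.
TOTAL_SECONDS = 3305.52
NUM_SAMPLES = 493
NUM_SPEAKERS = 20
FEMALE_FRACTION = 0.30


def summarize(total_seconds, num_samples, num_speakers, female_fraction):
    """Derive simple per-sample and per-speaker averages from the totals."""
    return {
        "total_minutes": total_seconds / 60,
        "avg_seconds_per_sample": total_seconds / num_samples,
        "avg_seconds_per_speaker": total_seconds / num_speakers,
        # Derived from the stated percentage, not reported directly.
        "female_speakers": round(num_speakers * female_fraction),
    }


summary = summarize(TOTAL_SECONDS, NUM_SAMPLES, NUM_SPEAKERS, FEMALE_FRACTION)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Dividing the total duration by the 20 speakers gives about 165.276 seconds per speaker, which agrees with the total reported in the "Avg sec per POI" row of the composition table.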
The audio samples were recorded with a Samson UB1 omnidirectional microphone at a sampling rate of 16 kHz in four different sites. The composition of the dataset is reported in the following table, where C00, C01, W01, and W02 denote the four recording environments and POI stands for Person Of Interest.
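| | C00 | C01 | W01 | W02 | Total |
|---|---|---|---|---|---|
| # of POI | 20 | 20 | 20 | | |
| # of recordings | 200 | 104 | 104 | 85 | 493 |
| Avg # of recordings per POI | | | | | |
| Avg sec per POI | 66.995 | 33.968 | 36.218 | 33.228 | 165.276 |
| Avg noise energy (dB) | -35 | -30 | -20 | -17 | – |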
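The exact definition behind the "Avg noise energy (dB)" row is not given here. As an illustration only, the sketch below computes one common measure: the mean squared amplitude of a recording expressed in dB relative to full scale. It assumes 16-bit PCM WAV files, which is an assumption about the file format rather than something stated in this description, and the function name `energy_db` and the example path are hypothetical. It also checks that a file has the 16 kHz sampling rate mentioned above.

```python
import array
import math
import sys
import wave

EXPECTED_RATE_HZ = 16_000  # sampling rate stated in the dataset description


def energy_db(path, expected_rate=EXPECTED_RATE_HZ):
    """Mean energy of a 16-bit PCM WAV file, in dB relative to full scale.

    This is one common definition; the dataset authors may use another.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"unexpected sampling rate: {wav.getframerate()} Hz")
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        raise ValueError("empty recording")

    # Normalize samples to [-1, 1) before averaging the squared amplitude.
    mean_square = sum((s / 32768.0) ** 2 for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


# Hypothetical usage; the actual file layout of the dataset is not described here:
# print(energy_db("path/to/recording.wav"))
```

Applying such a measure to segments without speech would give a noise level that can be compared across environments, although the values need not match those in the table if the authors used a different reference or definition.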
For a detailed description of the dataset, and if you intend to use it, please refer to the following paper:
- Antonio Roberto, Alessia Saggese, Mario Vento, A challenging voice dataset for robotic applications in noisy environments, International Conference on Computer Analysis of Images and Patterns (CAIP), 2019.
To download the dataset, click here.