November 14, 2016

Mivia Audio Events Dataset

The dataset

The MIVIA audio events data set contains a total of 6000 events of three classes relevant to surveillance applications: glass breaking, gun shots and screams. The 6000 events are divided into a training set of 4200 events and a test set of 1800 events.

In audio surveillance applications, the events of interest (for instance, a scream) can occur at different distances from the microphone, which correspond to different signal-to-noise ratios. Moreover, in these applications the events are generally mixed with a complex background, usually composed of several different types of sounds that depend on the specific environment, indoor or outdoor (household appliances, cheering crowds, people talking, traffic jams, passing cars or motorbikes, etc.).

The data set is designed to provide each audio event at 6 different signal-to-noise ratios (5 dB, 10 dB, 15 dB, 20 dB, 25 dB and 30 dB), superimposed on different combinations of environmental sounds in order to simulate its occurrence in different environments.
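This page does not describe how each event was scaled before being mixed with the background. A common approach, assumed here and not confirmed by the source, is to choose a gain g so that the mean power of the scaled event over the mean power of the background equals the target ratio, i.e. 10 log10(P_event / P_background) = SNR in dB. A minimal sketch with NumPy; the helper names `mix_at_snr` and `measured_snr_db`, and the tone/noise stand-ins, are hypothetical:

```python
import numpy as np

SNR_LEVELS_DB = [5, 10, 15, 20, 25, 30]  # the six levels listed above


def mix_at_snr(event: np.ndarray, background: np.ndarray, snr_db: float) -> np.ndarray:
    """Return background + g * event, with g chosen so that
    10 * log10(power(g * event) / power(background)) == snr_db.

    Assumption: SNR is measured over the whole clip, both inputs are float
    arrays of the same shape.
    """
    if event.shape != background.shape:
        raise ValueError("event and background must have the same shape")
    p_event = np.mean(event ** 2)            # mean power of the event
    p_background = np.mean(background ** 2)  # mean power of the background
    if p_event == 0:
        raise ValueError("event is silent")
    gain = np.sqrt(p_background * 10 ** (snr_db / 10) / p_event)
    return background + gain * event


def measured_snr_db(event_part: np.ndarray, background: np.ndarray) -> float:
    """SNR in dB of an (already scaled) event component against the background."""
    return float(10 * np.log10(np.mean(event_part ** 2) / np.mean(background ** 2)))


# Usage with synthetic stand-ins: a 1 kHz tone as the "event", white noise as background.
rng = np.random.default_rng(0)
fs = 32000
t = np.arange(fs) / fs
event = 0.3 * np.sin(2 * np.pi * 1000 * t)
background = 0.05 * rng.standard_normal(fs)
for snr in SNR_LEVELS_DB:
    mixture = mix_at_snr(event, background, snr)
    print(snr, round(measured_snr_db(mixture - background, background), 2))
```

The mixture can leave the [-1, 1) range at high SNR, so it would need clipping or rescaling before being written back as 16-bit PCM; that step is omitted here.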


The sounds were recorded with an Axis P8221 Audio Module and an Axis T83 omnidirectional microphone for audio surveillance applications; they are sampled at 32000 Hz and quantized at 16 bits per PCM sample. The audio clips are distributed as WAV files. The training set has a duration of about 20 hours and the test set of about 9 hours.
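
Since the clips are 16-bit PCM WAV files at 32000 Hz, they can be read with Python's standard `wave` module and converted to floating-point samples with NumPy. The sketch below is only an illustration: `load_clip` is a hypothetical helper, and it builds a synthetic clip in memory instead of assuming any file names from the actual distribution.

```python
import io
import wave

import numpy as np

EXPECTED_RATE_HZ = 32000  # sampling rate stated for the MIVIA recordings
EXPECTED_SAMPWIDTH = 2    # 2 bytes per sample = 16-bit PCM


def load_clip(source):
    """Read a 16-bit PCM WAV file (path or binary file object).

    Returns (samples, rate): samples as float32 scaled to [-1, 1), shaped
    (n_frames,) for mono or (n_frames, n_channels) otherwise.
    """
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != EXPECTED_SAMPWIDTH:
            raise ValueError(f"expected 16-bit PCM, got {8 * wf.getsampwidth()}-bit samples")
        rate = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # WAV stores PCM samples as little-endian signed integers.
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if n_channels > 1:
        samples = samples.reshape(-1, n_channels)
    return samples, rate


# Usage: write a one-second 16-bit mono tone into memory, then read it back.
t = np.arange(EXPECTED_RATE_HZ) / EXPECTED_RATE_HZ
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype("<i2")
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(EXPECTED_SAMPWIDTH)
    out.setframerate(EXPECTED_RATE_HZ)
    out.writeframes(tone.tobytes())
buf.seek(0)

samples, rate = load_clip(buf)
if rate != EXPECTED_RATE_HZ:
    print(f"warning: unexpected sampling rate {rate} Hz")
print(samples.shape, rate)
```
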
The events of interest are organized into three classes (glass breaking, gun shots and screams); the number of events and the total duration of each class in the training and test sets are reported in the following table:

                   Training set            Test set
Class              #Events  Duration (s)   #Events  Duration (s)
Background               -       58371.6         -       25036.8
Glass breaking        4200        6024.8      1800        2561.7
Gun shots             4200        1883.6      1800         743.5
Screams               4200        5488.8      1800        2445.4
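
Summing the per-class durations in the table is a quick check that they agree with the approximate set durations given above, and it also shows how much of each set is background. A minimal Python sketch; the dictionary layout and the `summarize` helper are just one convenient, hypothetical choice:

```python
# Per-class durations in seconds, copied from the table above.
DURATIONS_S = {
    "training": {"Background": 58371.6, "Glass breaking": 6024.8,
                 "Gun shots": 1883.6, "Screams": 5488.8},
    "test": {"Background": 25036.8, "Glass breaking": 2561.7,
             "Gun shots": 743.5, "Screams": 2445.4},
}


def summarize(durations):
    """Return (total duration in hours, share of the total per class)."""
    total_s = sum(durations.values())
    shares = {name: seconds / total_s for name, seconds in durations.items()}
    return total_s / 3600, shares


for split, durations in DURATIONS_S.items():
    hours, shares = summarize(durations)
    print(f"{split}: {hours:.2f} h")
    for name, share in shares.items():
        print(f"  {name:<15} {100 * share:5.1f} %")
```

The totals come out at about 19.94 h for the training set and 8.55 h for the test set, consistent with the approximate durations stated above; in both sets the background accounts for roughly 81% of the audio.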


A version of the data set ready for cross-validation is also available. It is organized into 5 folds, each containing 1200 events of interest superimposed on typical background sounds. As in the standard version, the events are available at 6 different signal-to-noise ratios (5 dB, 10 dB, 15 dB, 20 dB, 25 dB and 30 dB).
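
This page does not prescribe how the folds should be used. Assuming the usual k-fold protocol, each fold is held out once for testing while the other four are used for training. A minimal sketch of that rotation; `leave_one_fold_out` is a hypothetical helper, and the 0-based fold numbering is an assumption that need not match the names of the distributed files:

```python
from typing import Iterator, List, Tuple

N_FOLDS = 5             # folds in the cross-validation version
EVENTS_PER_FOLD = 1200  # events of interest per fold


def leave_one_fold_out(n_folds: int = N_FOLDS) -> Iterator[Tuple[List[int], int]]:
    """Yield (training fold indices, test fold index), holding out each fold once.

    Folds are numbered 0..n_folds-1 here (hypothetical numbering).
    """
    for test_fold in range(n_folds):
        train_folds = [fold for fold in range(n_folds) if fold != test_fold]
        yield train_folds, test_fold


for train_folds, test_fold in leave_one_fold_out():
    n_train = len(train_folds) * EVENTS_PER_FOLD
    print(f"test on fold {test_fold}, train on folds {train_folds}: "
          f"{n_train} training events, {EVENTS_PER_FOLD} test events")
```

Under this scheme each round trains on 4 x 1200 = 4800 events and tests on 1200, and the 5 folds together cover 5 x 1200 = 6000 events; results are typically averaged over the 5 rounds.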


For a detailed description of the data set, and if you intend to use it, please refer to the following paper:

  • Pasquale Foggia, Nicolai Petkov, Alessia Saggese, Nicola Strisciuglio, Mario Vento, Reliable Detection of Audio Events in Highly Noisy Environments, Pattern Recognition Letters, available online 9 July 2015, ISSN 0167-8655.


The data set is available for download.