We provide a visual demonstration of our algorithm using a ~30-hour-long audio trace that we recorded. The dataset is a live-stream recording of Southern California Public Radio
(89.3 KPCC) made on August 1, 2013, starting at 2:39 PM PST. The recording is available here.
We converted the entire audio stream to MFCC at a sampling frequency of 62.4 Hz and ran our algorithm on the second coefficient. The resulting time series in MFCC space has ~6.6 million points.
The MFCC-converted time series is available here.
We used a cache that could buffer at most 5% of the data (1/4000 of the possible subsequences). We searched for 4-second-long motifs in this dataset.
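As a quick sanity check on the sizes quoted above, the following sketch relates the 62.4 Hz MFCC rate to the length of the series and of a 4-second motif. The function names are illustrative and not taken from our released code, and the cache size assumes that "5% of the data" is counted in data points.

```python
FRAME_RATE_HZ = 62.4        # MFCC points per second, as stated above
SERIES_POINTS = 6_600_000   # approximate length of the MFCC time series


def frames_for(seconds, frame_rate_hz=FRAME_RATE_HZ):
    """Number of MFCC points covering `seconds` of audio."""
    return round(seconds * frame_rate_hz)


def hours_for(points, frame_rate_hz=FRAME_RATE_HZ):
    """Audio duration in hours represented by `points` MFCC points."""
    return points / frame_rate_hz / 3600.0


motif_points = frames_for(4)               # length of a 4-second motif
recording_hours = hours_for(SERIES_POINTS)
# Assumption: the 5% cache budget is measured in data points.
cache_points = SERIES_POINTS * 5 // 100

print(motif_points)                # 250
print(round(recording_hours, 1))   # 29.4
print(cache_points)                # 330000
```

A 4-second motif therefore spans about 250 points, and 6.6 million points correspond to roughly 29.4 hours of audio, consistent with the ~30-hour recording.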
Below we show four examples of the rare motifs we found, and the corresponding timestamps of their occurrences in the data.
We converted a ten-hour-long recording of spooky night sounds from a forest to MFCC space.
In the MFCC space, we planted ~3-second-long calls of the White-crowned Sparrow (Zonotrichia leucophrys pugetensis
and Zonotrichia leucophrys) and ran our algorithm with a cache
that can buffer only 1/4000 of the subsequences (5% of the data). Out of 100 nights, we detected the calls
of the White-crowned Sparrow on 98 nights.
The false positives we discovered were also bird sounds that were already present in the data. Below we show three examples of the false positives:
The MFCC space night sound is available here.
The planted MFCC space bird calls are available here.
The spreadsheet of this experiment is here.
In this case study, we used one year of electricity consumption data from a house in British Columbia (Canada) and searched for ~1.5-hour-long dishwasher cycles in a noisy time series of refrigerator and dishwasher electricity usage.
In this experiment, we used class 1 patterns from the MALLAT dataset as target motifs
planted in a long random-walk time series.
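The setup of this experiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure we used: `random_walk` and `plant` are hypothetical helper names, the short `pattern` list is a stand-in for a real MALLAT class 1 instance, and shifting the pattern to the local level of the walk is one plausible way to avoid an artificial jump at the start of the planted window.

```python
import random


def random_walk(n, rng):
    """Random walk of length n: cumulative sum of standard normal steps."""
    walk, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk


def plant(series, pattern, position):
    """Return a copy of `series` with `pattern` written over the window
    starting at `position`, shifted so it starts at the walk's local level."""
    if position + len(pattern) > len(series):
        raise ValueError("pattern does not fit at this position")
    out = list(series)
    offset = out[position] - pattern[0]
    for i, value in enumerate(pattern):
        out[position + i] = value + offset
    return out


rng = random.Random(0)                  # fixed seed for reproducibility
walk = random_walk(1000, rng)
pattern = [0.0, 1.0, 2.0, 1.0, 0.0]     # stand-in for a MALLAT pattern
planted = plant(walk, pattern, 100)
```

Because motif search typically compares z-normalized subsequences, the constant offset added here should not change how well a planted occurrence matches another copy of the same pattern.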
Results for more datasets where cardinality reduction performs better than downsampling and dimensionality reduction are available here.
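To make the distinction between these reductions concrete, here is a small sketch contrasting downsampling with cardinality reduction. It is a generic illustration and an assumption on our part about the simplest form of each: `downsample` keeps every k-th point, while `reduce_cardinality` keeps every point but maps each value to one of a few equal-width bins. Neither function is taken from our released code.

```python
def downsample(series, factor):
    """Keep every `factor`-th point: fewer points, full numeric precision."""
    return series[::factor]


def reduce_cardinality(series, levels):
    """Map each value to one of `levels` equal-width bins over the series
    range: same number of points, far fewer distinct values."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / levels
    # The maximum value lands exactly on the upper edge, so clamp it
    # into the last bin.
    return [min(int((v - lo) / width), levels - 1) for v in series]


series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(downsample(series, 2))          # [0.0, 2.0, 4.0, 6.0]
print(reduce_cardinality(series, 4))  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Downsampling discards time resolution, whereas cardinality reduction discards value resolution and keeps every time step.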
An example Bloom filter implementation is here.
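For readers unfamiliar with the data structure, a minimal generic Bloom filter might look like the sketch below. It is not the linked implementation: the class name, the use of salted SHA-256 as the hash family, and the string keys are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with `num_hashes` hash functions.
    Membership tests may give false positives but never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive each hash function by salting SHA-256 with its index.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(str(seed).encode("ascii") + b":" + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
bf.add("abba")           # e.g. a discretized subsequence used as a key
print("abba" in bf)      # True
```

The appeal in a streaming setting is that the filter's memory footprint is fixed in advance, at the cost of a tunable false-positive rate.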
MFCC conversion code is here.
Example code, with all parameters set, for a bird dataset of 10,000 points (0.01 million) is available
here.