This is the supporting webpage for the paper

Rare Time Series Motif Discovery from Unbounded Streams

by

Nurjahan Begum and Eamonn Keogh

The paper is here

Real Life Case Study 1: National Public Radio (NPR) dataset

We give a visual demonstration of our algorithm on a ~30-hour-long audio trace that we recorded. The dataset is a live-stream recording of Southern California Public Radio (89.3 KPCC), starting at 2:39 PM PST on August 1, 2013. The recording is available here.

We converted the entire audio stream to MFCC at a sampling frequency of 62.4 Hz and ran our algorithm on the 2nd coefficient. The time series in MFCC space has ~6.6 million points. The MFCC-converted time series is available here. We used a cache that could buffer at most 5% of the data (1/4000 of the possible subsequences). We searched for 4-second-long motifs in this dataset. Below we show four examples of the rare motifs we found, together with the timestamps of their occurrences in the data.
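To make the sizes above concrete, the short sketch below works out how many points a 4-second motif spans at 62.4 Hz and roughly how many sliding-window subsequences a 30-hour series contains. It uses only the figures quoted in the text; the recording is described as approximately 30 hours long, so the totals are approximate and differ slightly from the ~6.6 million points reported. The helper name `points_in` is ours, chosen for illustration.

```python
# Back-of-the-envelope sizes for the NPR case study (figures taken from the text).
SAMPLING_RATE_HZ = 62.4   # MFCC sampling frequency stated above
RECORDING_HOURS = 30      # the trace is described as "~30 hour long"
MOTIF_SECONDS = 4         # length of the motifs searched for


def points_in(seconds: float, rate_hz: float = SAMPLING_RATE_HZ) -> int:
    """Number of MFCC points covering `seconds` of audio (hypothetical helper)."""
    return round(seconds * rate_hz)


series_length = points_in(RECORDING_HOURS * 3600)    # points in exactly 30 hours
motif_length = points_in(MOTIF_SECONDS)              # points in one 4-second motif
num_subsequences = series_length - motif_length + 1  # sliding windows of motif_length

print(f"series length:  {series_length:,} points")
print(f"motif length:   {motif_length} points")
print(f"subsequences:   {num_subsequences:,}")
```

A 4-second motif therefore corresponds to about 250 points, and nearly every point in the series starts its own candidate subsequence, which is why the cache can hold only a small fraction of them.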









Real Life Case Study 2: Wildlife Monitoring

We converted a ten-hour-long recording of spooky night sounds from a forest to MFCC space. In MFCC space, we planted ~3-second-long calls of the White-crowned Sparrow (Zonotrichia leucophrys pugetensis and Zonotrichia leucophrys) and ran our algorithm with a cache that can buffer only 1/4000 of the subsequences (5% of the data). We detected the White-crowned Sparrow calls on 98 out of 100 nights.

The false positives we discovered were also bird sounds, ones that were already present in the data. Below we show three examples of the false positives:





The MFCC space night sound is available here.

The planted MFCC space bird calls are available here.

Scalability



The spreadsheet of this experiment is here.



Real Life Case Study 3: Energy Disaggregation

In this case study, we used one year of electricity consumption data from a house in British Columbia (Canada) and searched for ~1.5-hour-long dishwasher cycles in a noisy time series of refrigerator and dishwasher electricity usage.

Rate of Detection

In this experiment, we used class 1 patterns from the MALLAT dataset as target motifs, planted in a long random-walk time series.
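A minimal sketch of how such a test series might be assembled is given below. It is not the authors' experimental code: the sine-wave `pattern` is a stand-in for a MALLAT class 1 instance, and the function names, the seed, and the choice to shift each planted copy to the local level of the walk are all assumptions made for illustration.

```python
import math
import random


def random_walk(n, rng):
    """Cumulative sum of standard-normal steps."""
    walk, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk


def choose_positions(n, m, count, rng):
    """Pick `count` non-overlapping start indices for windows of length m."""
    slots = rng.sample(range(n // m), count)  # one window per distinct slot
    return sorted(s * m for s in slots)


def plant(series, pattern, positions):
    """Copy `series` and overwrite a window at each position with `pattern`,
    shifted so the pattern starts at the local level of the series."""
    out = list(series)
    m = len(pattern)
    for start in positions:
        offset = out[start] - pattern[0]
        out[start:start + m] = [p + offset for p in pattern]
    return out


rng = random.Random(0)  # fixed seed so the sketch is repeatable
pattern = [math.sin(2 * math.pi * i / 64) for i in range(64)]  # stand-in motif
walk = random_walk(10_000, rng)
positions = choose_positions(len(walk), len(pattern), 5, rng)
planted = plant(walk, pattern, positions)
print("planted motif occurrences at:", positions)
```

Keeping the list of planted positions makes it possible to check afterwards which occurrences a motif-discovery run actually recovered.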

Data Reduction Experiment

Results on additional datasets, in which cardinality reduction performs better than both downsampling and dimensionality reduction, are available here.
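For readers unfamiliar with the three reduction strategies being compared, the sketch below shows one simple form of each. These are illustrative assumptions rather than the exact procedures used in the experiment: piecewise aggregate approximation (PAA) stands in for dimensionality reduction, and uniform quantization stands in for cardinality reduction (SAX, by contrast, places its breakpoints using a Gaussian assumption on z-normalized data).

```python
def downsample(series, factor):
    """Keep every `factor`-th point: fewer points, full precision per point."""
    return series[::factor]


def paa(series, segments):
    """Piecewise aggregate approximation: mean of each equal-length segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("series length must be divisible by segments")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(segments)]


def reduce_cardinality(series, levels):
    """Uniform quantization: same number of points, only `levels` distinct values."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    step = (hi - lo) / levels
    # The maximum value would fall into bin `levels`, so clamp it to the top bin.
    return [min(int((x - lo) / step), levels - 1) for x in series]


series = [0, 1, 2, 3, 4, 5, 6, 7]
print(downsample(series, 2))          # [0, 2, 4, 6]
print(paa(series, 4))                 # [0.5, 2.5, 4.5, 6.5]
print(reduce_cardinality(series, 4))  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Downsampling and PAA shrink the number of values stored, whereas cardinality reduction keeps every time step but spends fewer bits on each one.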

Adaptive Threshold as a Function of the Rareness



Setting Appropriate SAX Parameters



Code

An example Bloom filter implementation is here.
MFCC conversion code is here.
Example code, with all parameters set, for a bird dataset of 10,000 points is here.
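For readers who only want the idea behind a Bloom filter, a minimal Python sketch follows. It is not the implementation linked above: the class name, the use of SHA-256 with double hashing, and the assumption that items are short strings (for example, SAX words) are all choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with k hash probes per item.

    Membership tests can return false positives but never false negatives.
    """

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, item: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indexes(item))


seen = BloomFilter(num_bits=1024, num_hashes=3)
for word in ["abba", "abca", "ddcb"]:  # hypothetical SAX words
    seen.add(word)
print("abba" in seen)  # True: an added item is always reported as present
```

The filter's memory footprint is fixed by `num_bits` regardless of how many items are added, at the cost of a false-positive rate that grows as the bit array fills up.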