This webpage was built in support of the UCR Suite, software
that enables ultrafast subsequence search under both Dynamic Time Warping (DTW)
and Euclidean Distance (ED). The work first appeared in a SIGKDD 2012 paper.
Authors Rakthanmanon,
Campana, Mueen and Batista contributed equally, and
should be considered joint first authors.
How fast is the UCR-Suite? It
depends on the data, query length, query shape, hardware, warping
constraint, etc. However, to a first approximation:
We can search a million datapoints in a second...
We can search billions of datapoints in minutes...
We can search trillions of datapoints in hours.
What are the advantages of the UCR-Suite?
It is exact, not approximate.
It does not require parameters to be set.
It requires zero preprocessing time.
It correctly z-normalizes the data.
It has no minimum or maximum query length (we have searched with queries as short as 16 and as long as 72,500 datapoints; see the DNA video).
The same idea works for both streaming data and batch offline search.
Finally, we are simply much faster than any known technique.
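To make "exact" and "z-normalizes the data" concrete, here is a minimal brute-force sketch of subsequence search under z-normalized Euclidean distance. It is not the UCR Suite code and has none of its speed; the function names `znorm` and `best_match_ed` are ours, the zero-variance convention is an assumption, and the early-abandon check is included only as an illustration of a common optimization.

```python
import math

def znorm(x):
    """Rescale x to zero mean and unit standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0.0:
        # Constant subsequence: mapping it to all zeros is an assumed convention.
        return [0.0] * n
    return [(v - mean) / std for v in x]

def best_match_ed(series, query):
    """Return (position, distance) of the exact nearest subsequence
    under z-normalized Euclidean distance, by checking every window."""
    m = len(query)
    q = znorm(query)
    best_sq, best_pos = math.inf, -1
    for i in range(len(series) - m + 1):
        # Each candidate window is normalized on its own, not the whole series.
        c = znorm(series[i:i + m])
        d = 0.0
        for a, b in zip(q, c):
            d += (a - b) ** 2
            if d >= best_sq:
                break  # early abandon: this window cannot beat the best so far
        if d < best_sq:
            best_sq, best_pos = d, i
    return best_pos, math.sqrt(best_sq)
```

For example, `best_match_ed([5, 1, 4, 2, 10, 20, 30, 3, 3, 7], [1, 2, 3])` finds the window `[10, 20, 30]` at position 4: after z-normalization it has the same shape as the query, even though its offset and scale are very different.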
Here we show we
can search a day-long ECG tracing in 35 seconds under DTW, using a single core.
Using the same
query, we can search a year of ECG (8,518,554,188 datapoints) in 18 minutes
using a multi-core machine.
Thus we can
search 256Hz signals about thirty thousand times faster than real time.
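The "thirty thousand times faster than real time" figure can be checked with a little arithmetic, assuming the year-long ECG is sampled at 256 Hz as the sentence above implies. The constant names below are ours.

```python
SAMPLES = 8_518_554_188      # datapoints in the year-long ECG
RATE_HZ = 256                # assumed sampling rate of that recording
SEARCH_SECONDS = 18 * 60     # reported search time: 18 minutes

# How long the recording took to acquire, in seconds of real time.
recording_seconds = SAMPLES / RATE_HZ

# Ratio of recording time to search time.
speedup = recording_seconds / SEARCH_SECONDS
print(f"recording: {recording_seconds / 86_400:.1f} days, speed-up: {speedup:,.0f}x")
```

The ratio comes out a little above 30,000, which matches the rounded claim.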
Here we show we can support very long queries. We search for a query of length 72,500 in 21,435,268 datapoints in 18 seconds. The reference dendrogram we compared to at the end of this video is from: D. P. Locke, et al. 2011. Comparative and demographic analysis of orangutan genomes. Nature 469, 529-533.
How does changing the width of the warping window affect the speed-up? See here for the numbers;
in brief, it makes very little difference. Over the range of 0
to 15, which would include the best accuracy setting for the vast
majority of the UCR archive problems, the difference is barely
perceptible.
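For readers unfamiliar with the warping width, the sketch below shows a plain DTW computation constrained to a band around the diagonal (often called a Sakoe-Chiba band). It is a textbook illustration, not the UCR Suite implementation; the function name `dtw_band` is ours, and the width `r` is given here as an absolute number of positions, which is an assumption of this sketch rather than the units used in the numbers above.

```python
import math

def dtw_band(a, b, r):
    """DTW distance between two equal-length sequences, where position i
    of a may only be aligned with positions i-r .. i+r of b."""
    n = len(a)
    if len(b) != n:
        raise ValueError("this sketch assumes equal-length sequences")
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for i in range(1, n + 1):
        cur = [math.inf] * (n + 1)
        # Only cells inside the band are ever filled in.
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return math.sqrt(prev[n])
```

With `r = 0` the band collapses to the diagonal and the result is the ordinary Euclidean distance; a larger `r` lets the alignment absorb small shifts in time, at the cost of filling in more cells per comparison.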
The code for random walk is here, including the exact seeds we used. See also.
The 20 million random walk dataset is here, including all the queries used.
The 22 hours and 23 minutes of ECG data (20,140,000 datapoints) shown in the video above is here, together with the exact query.
The 1,000 star light curves are the entire training set from StarLightCurves, archived here.
The 1.08 years of ECG data came from Physionet.org.
Here we list the exact set of data we trawled. This is too large for
our servers to host. If you want the exact data, just send us a 16 Gig thumb
drive with your return address; we will pay return shipping.