Thanks to the support of TD Ameritrade, I recently open-sourced (BSD-3-Clause) a new, powerful, and scalable Python library called STUMPY that can be used for a variety of time series data mining tasks. At its core, the library takes any time series or other sequential data and efficiently computes its matrix profile, which, with only a few extra lines of code, enables you to perform:

  • pattern/motif discovery (finding approximately repeated subsequences within a longer time series)
  • anomaly/novelty (discord) discovery
  • shapelet discovery
  • semantic segmentation
  • density estimation
  • time series chains (temporally ordered sets of subsequence patterns)
  • and more…
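
To make the idea of a matrix profile a little more concrete, here is a rough brute-force sketch in plain Python. It is only an illustration and not how STUMPY computes things internally: for every subsequence of length m, it records the z-normalized Euclidean distance to the nearest other subsequence. The function names, the toy series, and the exclusion zone of ceil(m / 4) are my own assumptions for this example.

```python
import math


def znorm(values):
    """Rescale a subsequence to zero mean and unit standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)  # constant subsequence: no shape to compare
    return [(v - mean) / std for v in values]


def naive_matrix_profile(series, m):
    """Brute-force O(n^2 * m) matrix profile, for illustration only."""
    n_subs = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n_subs)]
    exclusion = math.ceil(m / 4)  # assumed zone that skips trivial self-matches
    profile = [math.inf] * n_subs  # distance to the nearest neighbor
    indices = [-1] * n_subs        # start position of that nearest neighbor
    for i in range(n_subs):
        for j in range(n_subs):
            if abs(i - j) <= exclusion:
                continue
            dist = math.dist(subs[i], subs[j])
            if dist < profile[i]:
                profile[i] = dist
                indices[i] = j
    return profile, indices


# Toy series (made up): the shape [0, 1, 3, 1, 0] appears at positions 0 and 8
series = [0, 1, 3, 1, 0, 5, 2, 7, 0, 1, 3, 1, 0, 4, 6, 2]
profile, indices = naive_matrix_profile(series, m=5)

# The smallest profile value marks a motif, the largest marks a discord
motif_start = min(range(len(profile)), key=profile.__getitem__)
discord_start = max(range(len(profile)), key=profile.__getitem__)
print("motif at", motif_start, "matches", indices[motif_start])
print("discord at", discord_start)
```

Low values in the profile point to repeated patterns and high values point to subsequences that look like nothing else in the data, which is why so many of the tasks above can be built on top of it.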

First, let’s install stumpy with conda (preferred):

conda install -c conda-forge stumpy

Alternatively, you can install stumpy with pip:

pip install stumpy

Once stumpy is installed, typical usage would be to take your time series and compute the matrix profile:

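import stumpy
import numpy as np

# Example data: 10,000 random values stand in for your own time series
your_time_series = np.random.rand(10000)
window_size = 50  # Approximately, how many data points might be found in a pattern

# Column 0 of the result holds the matrix profile values and
# column 1 holds the index of each subsequence's nearest neighbor
matrix_profile = stumpy.stump(your_time_series, m=window_size)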

For more detailed examples, check out our tutorials and documentation, or feel free to file a GitHub issue. We welcome contributions in any form!

I’d love to hear from you, so let me know what you think!


May 13, 2019