{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Data Shift Detection & Filtering\n\nIdentifying data shifts/capacity changes in time series data\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This example covers identifying data shifts/capacity changes in a time series\nand extracting the longest time series segment free of these shifts, using\n:py:func:`pvanalytics.quality.data_shifts.detect_data_shifts` and\n:py:func:`pvanalytics.quality.data_shifts.get_longest_shift_segment_dates`.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import pvanalytics\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom pvanalytics.quality import data_shifts as ds\nimport pathlib"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As an example, we load a simulated pvlib AC power time series with a\nsingle changepoint, which occurs on October 28, 2015.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
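        "# Locate the example data file shipped with the pvanalytics package.\npvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent\ndata_shift_file = pvanalytics_dir / 'data' / 'pvlib_data_shift.csv'\ndf = pd.read_csv(data_shift_file)\n# Index the data by timestamp so it can be sliced by date.\ndf.index = pd.to_datetime(df['timestamp'])\ndf['value'].plot()\n# The 'label' column flags the changepoint row with a 1.\nprint(\"Changepoint at: \" + str(df[df['label'] == 1].index[0]))"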
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we run the data shift algorithm (with default parameters)\non the data stream, using\n:py:func:`pvanalytics.quality.data_shifts.detect_data_shifts`, which returns\na Boolean mask marking the detected shifts. We then plot the time series\nsegments that the algorithm predicts.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Boolean mask over the series; True marks a detected shift.\nshift_mask = ds.detect_data_shifts(df['value'])\nshift_list = list(df[shift_mask].index)\n# Segment boundaries: start of data, each detected shift, end of data.\nedges = [df.index[0]] + shift_list + [df.index[-1]]\nfig, ax = plt.subplots()\n# Plot each segment separately so that it gets its own color.\nfor (st, ed) in zip(edges[:-1], edges[1:]):\n    ax.plot(df.loc[st:ed, \"value\"])\nplt.show()\n\n# We zoom in around the changepoint to more closely show the data shift. Time\n# series segments pre- and post-shift are color-coded.\n\nedges = ([pd.to_datetime(\"2015-10-15\")] + shift_list +\n         [pd.to_datetime(\"2015-11-15\")])\nfig, ax = plt.subplots()\nfor (st, ed) in zip(edges[:-1], edges[1:]):\n    ax.plot(df.loc[st:ed, \"value\"])\nplt.xticks(rotation=45)\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We then trim the time series to its longest continuous segment free of\ndata shifts, using\n:py:func:`pvanalytics.quality.data_shifts.get_longest_shift_segment_dates`,\nwhich returns the start and end dates of that segment.\nThe trimmed time series is then plotted.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "start_date, end_date = ds.get_longest_shift_segment_dates(df['value'])\n# Select the shift-free segment with a single label-based slice.\ndf.loc[start_date:end_date, 'value'].plot()\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}