{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Interpolated Data Periods\n\nIdentifying periods in a time series where the data has been\nlinearly interpolated.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Identifying periods where time series data has been linearly interpolated\nand removing these periods may help to reduce noise in subsequent\ndata analysis. This example shows how to use\n:py:func:`pvanalytics.quality.gaps.interpolation_diff`, which returns a\nboolean mask flagging linearly interpolated periods.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import pvanalytics\nfrom pvanalytics.quality import gaps\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport pathlib"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "First, we load the AC power data stream that we are going to check for\ninterpolated periods. It is a normalized AC power time series from the\nPV Fleets Initiative, available via the DuraMAT DataHub:\nhttps://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data.\nThe data set has a pandas DatetimeIndex, with the min-max normalized\nAC power time series in the \"value_normalized\" column. There is\nalso an \"interpolated_data_mask\" column, where interpolated periods are\nlabeled True and all other data is labeled False. The data is sampled at\n15-minute intervals.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Locate the example CSV that ships with the pvanalytics package\npvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent\nfile = pvanalytics_dir / 'data' / 'ac_power_inv_2173_interpolated_data.csv'\ndata = pd.read_csv(file, index_col=0, parse_dates=True)\n# Enforce a regular 15-minute frequency on the index\ndata = data.asfreq(\"15min\")\n# Plot the full series, then overlay the points labeled as interpolated\ndata['value_normalized'].plot()\ndata.loc[data[\"interpolated_data_mask\"], \"value_normalized\"].plot(ls='',\n                                                                  marker='.')\nplt.legend(labels=[\"AC Power\", \"Interpolated Data\"])\nplt.xlabel(\"Date\")\nplt.ylabel(\"Normalized AC Power\")\nplt.tight_layout()\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now, we use :py:func:`pvanalytics.quality.gaps.interpolation_diff` to\nidentify linearly interpolated periods in the time series, and re-plot\nthe data with the resulting mask. Note that nighttime periods generally\nconsist of repeated 0 values. A run of constant values is a straight line\nwith zero slope, so it cannot be distinguished from linear interpolation.\nConsequently, these periods are also flagged by\n:py:func:`pvanalytics.quality.gaps.interpolation_diff`.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "detected_interpolated_data_mask = gaps.interpolation_diff(\n    data['value_normalized'])\ndata['value_normalized'].plot()\ndata.loc[detected_interpolated_data_mask,\n         \"value_normalized\"].plot(ls='', marker='.')\nplt.legend(labels=[\"AC Power\", \"Detected Interpolated Data\"])\nplt.xlabel(\"Date\")\nplt.ylabel(\"Normalized AC Power\")\nplt.tight_layout()\nplt.show()"
      ]
    }
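    ,
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To see why constant nighttime values are flagged alongside genuinely\n",
        "interpolated gaps, here is a minimal sketch of the underlying idea in plain\n",
        "Python: a sample is flagged when the first differences across the window\n",
        "ending at that sample are all equal. The helper `flag_linear_runs`, its\n",
        "`window` and `tol` parameters, and the `power` values are hypothetical and\n",
        "chosen for illustration only; this is not how pvanalytics implements\n",
        ":py:func:`pvanalytics.quality.gaps.interpolation_diff`.\n",
        "\n",
        "```python\n",
        "def flag_linear_runs(values, window=3, tol=1e-8):\n",
        "    # Hypothetical helper, for illustration only.\n",
        "    # First differences between consecutive samples.\n",
        "    diffs = [b - a for a, b in zip(values, values[1:])]\n",
        "    flags = [False] * len(values)\n",
        "    for end in range(window - 1, len(values)):\n",
        "        # A window of `window` samples spans `window - 1` differences.\n",
        "        span = diffs[end - window + 1:end]\n",
        "        # Equal differences mean the samples lie on one straight line.\n",
        "        flags[end] = all(abs(d - span[0]) <= tol for d in span)\n",
        "    return flags\n",
        "\n",
        "\n",
        "# Made-up series: night (zeros), an uneven morning ramp, then a\n",
        "# perfectly straight segment from 0.4 to 0.7, and a final drop.\n",
        "power = [0.0, 0.0, 0.0, 0.0, 0.1, 0.35, 0.4, 0.5, 0.6, 0.7, 0.65]\n",
        "flags = flag_linear_runs(power, window=4)\n",
        "```\n",
        "\n",
        "In this sketch, `flags` is True only at index 3 (the fourth consecutive\n",
        "zero) and index 9 (the end of the straight 0.4 to 0.7 segment). The zero run\n",
        "and the straight segment are treated identically, because both have constant\n",
        "first differences.\n"
      ]
    }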
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}