{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Stale Data Periods\n\nIdentifying stale data periods in a time series.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Identifying and removing stale (consecutive repeating) values in time\nseries data reduces noise when performing data analysis. This example shows\nhow to use two PVAnalytics functions,\n:py:func:`pvanalytics.quality.gaps.stale_values_diff`\nand :py:func:`pvanalytics.quality.gaps.stale_values_round`, to identify\nand mask stale data periods in time series data.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import pvanalytics\nfrom pvanalytics.quality import gaps\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport pathlib"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "First, we read in the AC power data stream that we are going to check for\nstale data periods. The time series is a normalized AC power time\nseries from the PV Fleets Initiative, and is available via the DuraMAT\nDataHub:\nhttps://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data\nThis data set has a pandas DatetimeIndex, with the min-max normalized\nAC power time series in the 'value_normalized' column.\nAdditionally, there is a \"stale_data_mask\" column, where stale periods are\nlabeled as True and all other data is labeled as False. The data\nis sampled at 15-minute intervals.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Locate the example data file that ships with the pvanalytics package\npvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent\nfile = pvanalytics_dir / 'data' / 'ac_power_inv_2173_stale_data.csv'\ndata = pd.read_csv(file, index_col=0, parse_dates=True)\n# Put the data on a regular 15-minute grid ('15min' avoids the deprecated '15T' alias)\ndata = data.asfreq(\"15min\")\n# Plot the full series, then overlay the points labeled as stale\ndata['value_normalized'].plot()\ndata.loc[data[\"stale_data_mask\"], \"value_normalized\"].plot(ls='', marker='.')\nplt.legend(labels=[\"AC Power\", \"Inserted Stale Data\"])\nplt.xlabel(\"Date\")\nplt.ylabel(\"Normalized AC Power\")\nplt.tight_layout()\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now, we use :py:func:`pvanalytics.quality.gaps.stale_values_diff` to\nidentify stale values in the data, and plot the detected stale periods.\nNote that nighttime periods generally contain consecutive\nrepeating 0 values, which are flagged by\n:py:func:`pvanalytics.quality.gaps.stale_values_diff`.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "stale_data_mask = gaps.stale_values_diff(data['value_normalized'])\ndata['value_normalized'].plot()\ndata.loc[stale_data_mask, \"value_normalized\"].plot(ls='', marker='.')\nplt.legend(labels=[\"AC Power\", \"Detected Stale Data\"])\nplt.xlabel(\"Date\")\nplt.ylabel(\"Normalized AC Power\")\nplt.tight_layout()\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Next, we use :py:func:`pvanalytics.quality.gaps.stale_values_round` to\nidentify stale values in the data, using rounded data. This function yields\nresults similar to :py:func:`pvanalytics.quality.gaps.stale_values_diff`,\nexcept that it looks for consecutive repeating values after the data has been\nrounded to a configurable number of decimal places.\nNote that nighttime periods generally\ncontain consecutive repeating 0 values, which are flagged by\n:py:func:`pvanalytics.quality.gaps.stale_values_round`.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "stale_data_round_mask = gaps.stale_values_round(data['value_normalized'])\ndata['value_normalized'].plot()\ndata.loc[stale_data_round_mask, \"value_normalized\"].plot(ls='', marker='.')\nplt.legend(labels=[\"AC Power\", \"Detected Stale Data\"])\nplt.xlabel(\"Date\")\nplt.ylabel(\"Normalized AC Power\")\nplt.tight_layout()\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}