
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "generated/gallery/stale-data.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_generated_gallery_stale-data.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_generated_gallery_stale-data.py:


Stale Data Periods
==================

Identifying stale data periods in a time series.

.. GENERATED FROM PYTHON SOURCE LINES 9-15

Identifying and removing stale values (consecutive repeated values) in
time series data reduces noise in subsequent data analysis.
This example shows how to use two PVAnalytics functions,
:py:func:`pvanalytics.quality.gaps.stale_values_diff`
and :py:func:`pvanalytics.quality.gaps.stale_values_round`, to identify
and mask stale data periods in time series data.

.. GENERATED FROM PYTHON SOURCE LINES 15-22

.. code-block:: default


    import pvanalytics
    from pvanalytics.quality import gaps
    import matplotlib.pyplot as plt
    import pandas as pd
    import pathlib


.. GENERATED FROM PYTHON SOURCE LINES 23-33

First, we load the AC power data stream that we are going to check for
stale data periods. The data is a normalized AC power time series from
the PV Fleets Initiative, and is available via the DuraMAT DataHub:
https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data

This data set has a pandas datetime index, with the min-max normalized
AC power time series stored in the 'value_normalized' column.
Additionally, there is a "stale_data_mask" column, where stale periods are
labeled as True and all other data is labeled as False. The data
is sampled at 15-minute intervals.

.. GENERATED FROM PYTHON SOURCE LINES 33-46

.. code-block:: default


    # Read the example file that ships with the pvanalytics package
    pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
    file = pvanalytics_dir / 'data' / 'ac_power_inv_2173_stale_data.csv'
    data = pd.read_csv(file, index_col=0, parse_dates=True)
    # Put the data on a regular 15-minute grid
    data = data.asfreq("15min")
    data['value_normalized'].plot()
    # Overlay the points labeled as stale in the data set
    data.loc[data["stale_data_mask"], "value_normalized"].plot(ls='', marker='.')
    plt.legend(labels=["AC Power", "Inserted Stale Data"])
    plt.xlabel("Date")
    plt.ylabel("Normalized AC Power")
    plt.tight_layout()
    plt.show()




.. image-sg:: /generated/gallery/images/sphx_glr_stale-data_001.png
   :alt: stale data
   :srcset: /generated/gallery/images/sphx_glr_stale-data_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 47-52

Now, we use :py:func:`pvanalytics.quality.gaps.stale_values_diff` to
identify stale values in the data and plot the detected stale periods.
Note that nighttime periods generally contain consecutive
repeating 0 values, which are also flagged by
:py:func:`pvanalytics.quality.gaps.stale_values_diff`.

.. GENERATED FROM PYTHON SOURCE LINES 52-62

.. code-block:: default


    stale_data_mask = gaps.stale_values_diff(data['value_normalized'])
    data['value_normalized'].plot()
    # Overlay the points flagged as stale by stale_values_diff
    data.loc[stale_data_mask, "value_normalized"].plot(ls='', marker='.')
    plt.legend(labels=["AC Power", "Detected Stale Data"])
    plt.xlabel("Date")
    plt.ylabel("Normalized AC Power")
    plt.tight_layout()
    plt.show()




.. image-sg:: /generated/gallery/images/sphx_glr_stale-data_002.png
   :alt: stale data
   :srcset: /generated/gallery/images/sphx_glr_stale-data_002.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 63-71

Now, we use :py:func:`pvanalytics.quality.gaps.stale_values_round` to
identify stale values in the data, using rounded data. This function yields
results similar to :py:func:`pvanalytics.quality.gaps.stale_values_diff`,
except that it looks for consecutive repeating values after the data has
been rounded to a configurable number of decimal places.
Note that nighttime periods generally contain consecutive
repeating 0 values, which are also flagged by
:py:func:`pvanalytics.quality.gaps.stale_values_round`.

.. GENERATED FROM PYTHON SOURCE LINES 71-80

.. code-block:: default


    stale_data_round_mask = gaps.stale_values_round(data['value_normalized'])
    data['value_normalized'].plot()
    # Overlay the points flagged as stale by stale_values_round
    data.loc[stale_data_round_mask, "value_normalized"].plot(ls='', marker='.')
    plt.legend(labels=["AC Power", "Detected Stale Data"])
    plt.xlabel("Date")
    plt.ylabel("Normalized AC Power")
    plt.tight_layout()
    plt.show()



.. image-sg:: /generated/gallery/images/sphx_glr_stale-data_003.png
   :alt: stale data
   :srcset: /generated/gallery/images/sphx_glr_stale-data_003.png
   :class: sphx-glr-single-img






.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 1.612 seconds)


.. _sphx_glr_download_generated_gallery_stale-data.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: stale-data.py <stale-data.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: stale-data.ipynb <stale-data.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
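
As a supplement to the example above, the following is a minimal sketch of
what "consecutive repeating values" means, written with plain pandas. It is
not the PVAnalytics implementation and may not mark the same points as
:py:func:`pvanalytics.quality.gaps.stale_values_diff`. The synthetic series,
the ``window`` length, the tolerance, and the ``series > 0`` nighttime filter
are all assumptions chosen for illustration.

.. code-block:: python

    import pandas as pd

    # Synthetic 15-minute series, made up for illustration only
    index = pd.date_range("2020-01-01 06:00", periods=12, freq="15min")
    values = [0.0, 0.0, 0.0, 0.1, 0.3, 0.5, 0.5, 0.5, 0.5, 0.4, 0.2, 0.0]
    series = pd.Series(values, index=index)

    # Hypothetical run length: a point is flagged when it ends a run of
    # at least `window` equal values
    window = 3

    # True where a value equals the one before it (within a small tolerance)
    unchanged = series.diff().abs() <= 1e-8

    # A point ends a run of `window` equal values when the last
    # `window - 1` differences are all (near) zero
    run_end = unchanged.astype(int).rolling(window - 1).sum() == (window - 1)

    # Assumed nighttime filter: ignore repeated zeros so that only
    # repeated nonzero (daytime) values remain flagged
    daytime_stale = run_end & (series > 0)

    print(series[daytime_stale])

Filtering on ``series > 0`` is only one possible way to separate nighttime
zeros from stale daytime readings; a daytime mask derived from solar position
would be another option.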