.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "generated/gallery/outliers/zscore-outlier-detection.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_generated_gallery_outliers_zscore-outlier-detection.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_generated_gallery_outliers_zscore-outlier-detection.py:


Z-Score Outlier Detection
=========================

Identifying outliers in time series using z-score outlier detection.

.. GENERATED FROM PYTHON SOURCE LINES 10-15

Identifying and removing outliers from PV sensor time series
data allows for more accurate data analysis.
In this example, we demonstrate how to use
:py:func:`pvanalytics.quality.outliers.zscore` to identify and filter
out outliers in a time series.

.. GENERATED FROM PYTHON SOURCE LINES 15-22

.. code-block:: default


    import pvanalytics
    from pvanalytics.quality.outliers import zscore
    import matplotlib.pyplot as plt
    import pandas as pd
    import pathlib


.. GENERATED FROM PYTHON SOURCE LINES 23-32

First, we read in the ac_power_inv_7539_outliers example. Min-max normalized
AC power is represented by the "value_normalized" column. There is a boolean
column "outlier" where inserted outliers are labeled as True, and all other
values are labeled as False. These outlier values were inserted manually into
the data set to illustrate outlier detection by each of the functions.
We use a normalized time series example provided by the PV Fleets Initiative.
This example is adapted from the DuraMAT DataHub
clipping data set:
https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data

.. GENERATED FROM PYTHON SOURCE LINES 32-37

.. code-block:: default


    pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
    ac_power_file = pvanalytics_dir / 'data' / 'ac_power_inv_7539_outliers.csv'
    data = pd.read_csv(ac_power_file, index_col=0, parse_dates=True)
    print(data.head(10))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                               value_normalized  outlier
    timestamp
    2017-04-10 19:15:00+00:00          0.000002    False
    2017-04-10 19:30:00+00:00          0.000000    False
    2017-04-11 06:15:00+00:00          0.000000    False
    2017-04-11 06:45:00+00:00          0.033103    False
    2017-04-11 07:00:00+00:00          0.043992    False
    2017-04-11 07:15:00+00:00          0.055615    False
    2017-04-11 07:30:00+00:00          0.110986    False
    2017-04-11 07:45:00+00:00          0.184948    False
    2017-04-11 08:00:00+00:00          0.276810    False
    2017-04-11 08:15:00+00:00          0.358061    False


.. GENERATED FROM PYTHON SOURCE LINES 38-40

We then use :py:func:`pvanalytics.quality.outliers.zscore` to identify
outliers in the time series, and plot the data with the z-score outlier mask.

.. GENERATED FROM PYTHON SOURCE LINES 40-48

.. code-block:: default


    zscore_outlier_mask = zscore(data=data['value_normalized'])
    data['value_normalized'].plot()
    data.loc[zscore_outlier_mask, 'value_normalized'].plot(ls='', marker='o')
    plt.legend(labels=["AC Power", "Detected Outlier"])
    plt.xlabel("Date")
    plt.ylabel("Normalized AC Power")
    plt.tight_layout()
    plt.show()


.. image-sg:: /generated/gallery/outliers/images/sphx_glr_zscore-outlier-detection_001.png
   :alt: zscore outlier detection
   :srcset: /generated/gallery/outliers/images/sphx_glr_zscore-outlier-detection_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.213 seconds)


.. _sphx_glr_download_generated_gallery_outliers_zscore-outlier-detection.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: zscore-outlier-detection.py <zscore-outlier-detection.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: zscore-outlier-detection.ipynb <zscore-outlier-detection.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_