PVAnalytics¶
PVAnalytics is a Python library that supports analytics for PV systems. It provides functions for quality control, filtering, and feature labeling, along with other tools that support the analysis of PV system-level data. It can be used as a standalone analysis package and as a data-cleaning “front end” for other PV analysis packages.
PVAnalytics is free and open source under a permissive license. The source code for PVAnalytics is hosted on GitHub.
Library Overview¶
The functions provided by PVAnalytics are organized in submodules based on their anticipated use. The list below provides a general overview; however, not all modules have functions at this time. See the API reference for the current status of the library.
quality
    contains submodules for different kinds of data quality checks.
quality.data_shifts
    contains quality checks for detecting and isolating data shifts in PV time series data.
quality.irradiance
    contains quality checks for irradiance measurements.
quality.weather
    contains quality checks for weather data (e.g. tests for physically plausible values of temperature, wind speed, humidity).
quality.outliers
    contains functions for identifying outliers.
quality.gaps
    contains functions for identifying gaps in the data (i.e. missing values, stuck values, and interpolation).
quality.time
    quality checks related to time (e.g. timestamp spacing, time shifts).
quality.util
    general purpose quality functions (e.g. simple range checks).
features
    contains submodules with different methods for identifying and labeling salient features.
features.clipping
    functions for labeling inverter clipping.
features.clearsky
    functions for identifying periods of clear sky conditions.
features.daytime
    functions for identifying periods of day and night.
features.orientation
    functions for identifying orientation-related features in the data (e.g. days where the data looks like there is a functioning tracker). These functions are distinct from the functions in the system module in that we are identifying features of data rather than properties of the system that produced the data.
features.shading
    functions for identifying shadows.
system
    identification of PV system characteristics from data (e.g. nameplate power, tilt, azimuth).
metrics
    contains functions for computing PV system-level metrics (e.g. performance ratio).
Dependencies¶
This project follows the guidelines laid out in NEP-29. It supports:
All minor versions of Python released in the 42 months prior to the project, and at minimum the two latest minor versions.
All minor versions of numpy released in the 24 months prior to the project, and at minimum the last three minor versions.
The latest release of pvlib.
Additionally, PVAnalytics relies on several other packages in the open source scientific python ecosystem. For details on dependencies and versions, see our setup.py.
Contents¶
API Reference¶
Quality¶
Data Shifts¶
Functions for identifying shifts in data values in a time series and for identifying periods with data shifts. For functions that identify shifts in time, see quality.time.
Detect data shifts in a time series of daily values.
Return the start and end dates of the longest serially complete time series segment.
Irradiance¶
The check_*_limits_qcrad functions use the QCRad algorithm [1] to identify irradiance measurements that are beyond physical limits.
Test for physical limits on GHI using the QCRad criteria.
Test for physical limits on DHI using the QCRad criteria.
Test for physical limits on DNI using the QCRad criteria.
All three checks can be combined into a single function call.
Test for physical limits on GHI, DHI or DNI using the QCRad criteria.
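The QCRad “physically possible” bounds have simple closed forms in Long and Shi (2008); for GHI the upper bound is Sa · 1.5 · cos(zenith)^1.2 + 100 W/m² and the lower bound is −4 W/m², where Sa is extraterrestrial DNI. The sketch below illustrates that test with plain pandas/numpy; the helper name is ours and it is not the library's check_ghi_limits_qcrad implementation.

```python
import numpy as np
import pandas as pd

def ghi_physical_limits(ghi, solar_zenith, dni_extra):
    """Flag GHI within the QCRad 'physically possible' bounds.

    Illustrative sketch of the Long & Shi (2008) GHI test, not the
    library's check_ghi_limits_qcrad: upper bound
    Sa * 1.5 * cos(zenith)**1.2 + 100, lower bound -4 W/m^2.
    """
    cos_zenith = np.cos(np.radians(solar_zenith)).clip(lower=0)
    upper = dni_extra * 1.5 * cos_zenith ** 1.2 + 100
    return (ghi > -4) & (ghi < upper)

ghi = pd.Series([0.0, 450.0, 2000.0])
zenith = pd.Series([95.0, 45.0, 45.0])       # degrees
dni_extra = pd.Series([1367.0] * 3)          # extraterrestrial DNI, W/m^2
# 2000 W/m^2 at 45 degrees zenith is beyond the physical limit
print(ghi_physical_limits(ghi, zenith, dni_extra).tolist())
```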
Irradiance measurements can also be checked for consistency.
Check consistency of GHI, DHI and DNI using QCRad criteria.
GHI and POA irradiance can be validated against clearsky values to eliminate data that is unrealistically high.
Identify irradiance values which do not exceed clearsky values.
You may want to identify entire days that have unrealistically high or low insolation. The following function examines daily insolation, validating that it is within a reasonable range of the expected clearsky insolation for the same day.
Check that daily insolation lies between minimum and maximum values.
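As a rough sketch of the approach (not the library implementation), daily insolation can be computed with the trapezoid rule and compared against bounds scaled from the clear-sky value. The 40%/125% defaults are those documented for the library function; the helper below and its hourly-data assumption are ours.

```python
import numpy as np
import pandas as pd

def daily_insolation_ok(irrad, clearsky, lower=0.4, upper=1.25):
    """Per-day mask: measured insolation within [lower, upper] x clear-sky.

    Sketch of the idea only (helper name and hourly-data assumption are
    ours); 0.4 and 1.25 are the documented defaults of the library function.
    """
    def insolation(day):
        v = day.to_numpy()
        return 0.5 * (v[:-1] + v[1:]).sum()  # trapezoid rule, dx = 1 hour
    ratio = (irrad.resample('D').apply(insolation)
             / clearsky.resample('D').apply(insolation))
    return (ratio >= lower) & (ratio <= upper)

times = pd.date_range('2022-06-01', periods=48, freq='h')
profile = np.maximum(0.0, 800 * np.sin(np.pi * (np.arange(24) - 6) / 12))
clear = pd.Series(np.tile(profile, 2), index=times)
meas = clear.copy()
meas.iloc[24:] *= 0.2  # day two reads far below the clear-sky expectation
print(daily_insolation_ok(meas, clear).tolist())
```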
Gaps¶
Identify gaps in the data.
Identify sequences which appear to be linear.
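A minimal way to see why differences reveal interpolation: a linearly interpolated span has a constant slope, so its second difference is (near) zero. The pandas sketch below is illustrative only and far cruder than quality.gaps.interpolation_diff; note that it also flags the real point that ends a straight-line run.

```python
import pandas as pd

# Fill a gap by linear interpolation, then detect it from the second
# difference of the series.
s = pd.Series([1.0, 2.0, None, None, 5.0, 2.0]).interpolate()
# s is now [1, 2, 3, 4, 5, 2]; positions 2 and 3 were filled linearly.
second_diff = s.diff().diff()
# True wherever three consecutive points lie on one straight line
# (this naive check also flags index 4, the real endpoint of the run).
mask = second_diff.abs() < 1e-12
print(mask.tolist())
```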
Data sometimes contains sequences of values that are “stale” or “stuck.” These are contiguous spans of data where the value does not change within the precision given. The functions below can be used to detect stale values.
Note
If the data has been altered in some way (e.g. temperature that has been rounded to an integer value) before being passed to these functions, you may see unexpectedly large amounts of stale data.
Identify stale values in the data.
Identify stale values by rounding.
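The core of a diff-based stale-value check fits in a few lines of pandas. The helper below is an illustrative sketch (ours, not the library's API); note that diff() cannot mark the first value of a stale run.

```python
import pandas as pd

def stale_mask(series, window=3):
    """Flag values belonging to runs of `window` or more identical readings.

    Illustrative diff-based sketch, not quality.gaps.stale_values_diff.
    diff() marks repeats from the second occurrence onward, so the first
    value of each stale run is not flagged.
    """
    unchanged = series.diff() == 0       # True from the 2nd repeat onward
    groups = (~unchanged).cumsum()       # label each run of repeats
    run_length = unchanged.groupby(groups).transform('sum')
    return unchanged & (run_length >= window - 1)

s = pd.Series([1.0, 5.0, 5.0, 5.0, 5.0, 2.0, 3.0, 3.0, 4.0])
# the four 5.0s form a stale run; the pair of 3.0s is too short to flag
print(stale_mask(s).tolist())
```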
The following functions identify days with incomplete data.
Calculate a data completeness score for each day.
Select data points that are part of days with complete data.
Many data sets may have leading and trailing periods of days with sporadic or no data. The following functions can be used to remove those periods.
Get the start and end of data excluding leading and trailing gaps.
Mask the beginning and end of the data if not all True.
Trim the series based on the completeness score.
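The underlying idea of a completeness score is simply the fraction of expected samples that are actually present on each day. A minimal sketch (the helper name and signature here are ours, not the library's):

```python
import pandas as pd

def completeness_score(series, freq='15min'):
    """Fraction of expected samples that are present on each day.

    Illustrative sketch of the idea behind quality.gaps.completeness_score;
    helper name and signature are ours.
    """
    expected_per_day = pd.Timedelta('1D') / pd.Timedelta(freq)
    observed = series.dropna().resample('D').count()
    return observed / expected_per_day

idx = pd.date_range('2022-01-01', '2022-01-02 23:45', freq='15min')
s = pd.Series(1.0, index=idx)
s.iloc[:48] = None  # first 12 hours of day one are missing
print(completeness_score(s).tolist())
```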
Outliers¶
Functions for detecting outliers.
Identify outliers based on the interquartile range.
Identify outliers using the z-score.
Identify outliers by the Hampel identifier.
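To illustrate the family of methods, here is a sketch of a Hampel identifier: flag points that deviate from the rolling median by more than a multiple of the scaled median absolute deviation. This is our own illustrative helper, not the library's quality.outliers.hampel, which differs in details such as window handling.

```python
import pandas as pd

def hampel_mask(series, window=5, max_deviation=3.0):
    """Flag points more than max_deviation scaled-MADs from the rolling
    median. Illustrative Hampel-identifier sketch, not the library API."""
    med = series.rolling(window, center=True, min_periods=1).median()
    deviation = (series - med).abs()
    mad = deviation.rolling(window, center=True, min_periods=1).median()
    # 1.4826 scales the MAD to be comparable to a standard deviation
    return deviation / (1.4826 * mad) > max_deviation

s = pd.Series([1.0, 1.1, 0.9, 1.0, 9.0, 1.0, 1.1, 0.9, 1.0])
print(hampel_mask(s).tolist())  # only the 9.0 spike is flagged
```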
Time¶
Quality control related to time. This includes things like timestamp spacing, time shifts, and time zone validation.
Check that the spacing between times conforms to freq.
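A spacing check amounts to comparing consecutive timestamp differences with the expected frequency. A minimal sketch (the helper name is ours; the library's quality.time.spacing may differ in details such as how the first point is treated):

```python
import pandas as pd

def spacing_ok(times, freq):
    """True where the gap from the previous timestamp equals `freq`.

    Illustrative sketch; the first timestamp has no predecessor and is
    taken as conforming here.
    """
    gaps_to_previous = pd.Series(times).diff()
    mask = gaps_to_previous == pd.Timedelta(freq)
    mask.iloc[0] = True
    return mask

times = pd.DatetimeIndex(['2022-01-01 00:00', '2022-01-01 00:15',
                          '2022-01-01 00:30', '2022-01-01 01:15'])
print(spacing_ok(times, '15min').tolist())  # the 45-minute gap fails
```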
Timestamp shifts, such as daylight saving time transitions, can be identified with the following functions.
Identify time shifts using the ruptures library.
Return True if events appears to have daylight savings shifts at the dates on which tz transitions to or from daylight savings time.
Utilities¶
The quality.util module contains general-purpose/utility functions for building your own quality checks.
Check whether a value falls within the given limits.
Return True for data on days when the day's minimum exceeds minimum.
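The kind of simple range check this module provides can be sketched in a few lines (the helper below is ours; the exact library signature may differ):

```python
import pandas as pd

def check_limits(values, lower=None, upper=None):
    """Mask of values strictly inside (lower, upper); either bound optional.

    Sketch of the kind of general-purpose range check quality.util
    provides; not the library's exact signature.
    """
    ok = pd.Series(True, index=values.index)
    if lower is not None:
        ok &= values > lower
    if upper is not None:
        ok &= values < upper
    return ok

wind_speed = pd.Series([-1.0, 3.0, 75.0])  # m/s; -1 and 75 are implausible
print(check_limits(wind_speed, lower=0, upper=60).tolist())
```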
Weather¶
Quality checks for weather data.
Identify relative humidity values that are within limits.
Identify temperature values that are within limits.
Identify wind speed values that are within limits.
In addition to validating temperature by comparing with limits, module temperature should be positively correlated with irradiance. Poor correlation could indicate that the sensor has become detached from the module, for example. Unlike other functions in the quality module which return Boolean masks over the input series, this function returns a single Boolean value indicating whether the entire series has passed (True) or failed (False) the quality check.
Test whether the module temperature is correlated with irradiance.
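The essence of such a whole-series check is a correlation coefficient compared against a threshold. A sketch (our helper; the library check differs in details such as daytime filtering, and the 0.5 threshold here is an assumption for illustration):

```python
import pandas as pd

def temperature_tracks_irradiance(module_temp, irradiance, min_corr=0.5):
    """Single pass/fail check: module temperature should rise and fall
    with irradiance. Illustrative sketch only; min_corr is an assumed
    threshold, not the library default."""
    return module_temp.corr(irradiance) > min_corr

irrad = pd.Series([0.0, 200.0, 600.0, 900.0, 400.0, 0.0])
attached = pd.Series([10.0, 15.0, 30.0, 40.0, 25.0, 12.0])   # tracks irradiance
detached = pd.Series([20.0, 20.1, 19.9, 20.0, 20.2, 20.0])   # near-constant
print(temperature_tracks_irradiance(attached, irrad),
      temperature_tracks_irradiance(detached, irrad))
```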
References
[1] C. N. Long and Y. Shi, An Automated Quality Assessment and Control Algorithm for Surface Radiation Measurements, The Open Atmospheric Science Journal 2, pp. 23-37, 2008.
Features¶
Functions for detecting features in the data.
Clipping¶
Functions for identifying inverter clipping.
Label clipping in AC power data based on levels in the data.
Detect clipping based on a maximum power threshold.
Identify clipping based on the shape of the ac_power curve on each day.
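The threshold approach is the simplest of the three: power samples pinned at (or very near) the maximum observed value are likely clipped. A minimal sketch, not the library's features.clipping.threshold implementation (the 0.99 fraction here is an assumption for illustration):

```python
import pandas as pd

def clipped_by_threshold(ac_power, fraction=0.99):
    """Flag samples at or above `fraction` of the observed maximum power.

    Illustrative sketch of threshold-style clipping detection; the
    library version is more robust and its parameters differ.
    """
    return ac_power >= fraction * ac_power.max()

ac_power = pd.Series([0.0, 300.0, 800.0, 1000.0, 1000.0, 1000.0, 600.0, 0.0])
print(clipped_by_threshold(ac_power).tolist())  # the 1000 W plateau is flagged
```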
Clearsky¶
Identify times when GHI is consistent with clearsky conditions.
Orientation¶
System orientation refers to mounting type (fixed or tracker) and the azimuth and tilt of the mounting. A system’s orientation can be determined by examining power or POA irradiance on days that are relatively sunny.
This module provides functions that operate on power or POA irradiance to identify system orientation on a daily basis. These functions can tell you whether a day’s profile matches that of a fixed system or system with a single-axis tracker.
Care should be taken when interpreting function output since other factors such as malfunctioning trackers can interfere with identification.
Flag days that match the profile of a fixed PV system on a sunny day.
Flag days that match the profile of a single-axis tracking PV system on a sunny day.
Daytime¶
Functions that return a Boolean mask indicating day and night.
Return True for values that are during the day.
Shading¶
Functions for labeling shadows.
Detects shadows from fixed structures such as wires and poles.
System¶
This module contains functions and classes relating to PV system parameters such as nameplate power, tilt, azimuth, or whether the system is equipped with a tracker.
Tracking¶
Enum describing the orientation of a PV System.
Infer whether the system is equipped with a tracker.
Orientation¶
The following function can be used to infer system orientation from power or plane of array irradiance measurements.
Determine system azimuth and tilt from power or POA using solar azimuth at the daily peak.
Get the tilt and azimuth that give PVWatts output that most closely fits the data in power_ac.
Metrics¶
Performance Ratio¶
The following functions can be used to calculate system performance metrics.
Calculate NREL Performance Ratio.
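For orientation, the plain (non-weather-corrected) performance ratio is measured energy divided by the energy expected from irradiance alone; the library's performance_ratio_nrel additionally applies a cell-temperature correction. A sketch of the plain ratio (helper name and example values are ours):

```python
import pandas as pd

def performance_ratio(ac_power, poa_global, p_stc, g_stc=1000.0):
    """Plain (non-weather-corrected) performance ratio.

    Measured energy over the energy expected from irradiance alone;
    performance_ratio_nrel in this library also corrects for cell
    temperature, which this sketch omits.
    """
    expected = p_stc * poa_global.sum() / g_stc
    return ac_power.sum() / expected

poa = pd.Series([200.0, 600.0, 1000.0, 800.0])     # W/m^2
power = pd.Series([180.0, 540.0, 900.0, 720.0])    # W; 90% of expected
print(performance_ratio(power, poa, p_stc=1000.0))
```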
Variability¶
Functions to calculate variability statistics.
Calculate the variability index.
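The variability index of Stein et al. (2012) compares the "length" of the measured GHI trace to that of the clear-sky trace; a cloudless day gives a value near 1, while rapidly fluctuating irradiance gives a larger value. An illustrative sketch (the helper and example values are ours, not the library implementation):

```python
import numpy as np
import pandas as pd

def variability_index(ghi, clearsky_ghi, delta_t=1.0):
    """Variability index: length of the measured GHI trace divided by the
    length of the clear-sky trace. Illustrative sketch; delta_t is the
    sample interval in the same arbitrary units as the irradiance steps."""
    def trace_length(series):
        return np.sqrt(series.diff().dropna() ** 2 + delta_t ** 2).sum()
    return trace_length(ghi) / trace_length(clearsky_ghi)

clear = pd.Series([0.0, 250.0, 500.0, 250.0, 0.0])
cloudy = pd.Series([0.0, 400.0, 100.0, 450.0, 0.0])
vi = variability_index(cloudy, clear)
print(vi > 1)  # the fluctuating trace is longer, so VI exceeds 1
```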
Example Gallery¶
This gallery shows examples of pvanalytics functionality. Community contributions are welcome!
Z-Score Outlier Detection¶
Identifying outliers in time series using z-score outlier detection.
Identifying and removing outliers from PV sensor time series
data allows for more accurate data analysis.
In this example, we demonstrate how to use
pvanalytics.quality.outliers.zscore()
to identify and filter
out outliers in a time series.
import pvanalytics
from pvanalytics.quality.outliers import zscore
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we read in the ac_power_inv_7539_outliers example. Min-max normalized AC power is represented by the “value_normalized” column. There is a boolean column “outlier” where inserted outliers are labeled as True, and all other values are labeled as False. These outlier values were inserted manually into the data set to illustrate outlier detection by each of the functions. We use a normalized time series example provided by the PV Fleets Initiative. This example is adapted from the DuraMAT DataHub clipping data set: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file = pvanalytics_dir / 'data' / 'ac_power_inv_7539_outliers.csv'
data = pd.read_csv(ac_power_file, index_col=0, parse_dates=True)
print(data.head(10))
value_normalized outlier
timestamp
2017-04-10 19:15:00+00:00 0.000002 False
2017-04-10 19:30:00+00:00 0.000000 False
2017-04-11 06:15:00+00:00 0.000000 False
2017-04-11 06:45:00+00:00 0.033103 False
2017-04-11 07:00:00+00:00 0.043992 False
2017-04-11 07:15:00+00:00 0.055615 False
2017-04-11 07:30:00+00:00 0.110986 False
2017-04-11 07:45:00+00:00 0.184948 False
2017-04-11 08:00:00+00:00 0.276810 False
2017-04-11 08:15:00+00:00 0.358061 False
We then use pvanalytics.quality.outliers.zscore() to identify outliers in the time series, and plot the data with the z-score outlier mask.
zscore_outlier_mask = zscore(data=data['value_normalized'])
data['value_normalized'].plot()
data.loc[zscore_outlier_mask, 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Total running time of the script: ( 0 minutes 0.254 seconds)
Tukey Outlier Detection¶
Identifying outliers in time series using Tukey outlier detection.
Identifying and removing outliers from PV sensor time series
data allows for more accurate data analysis.
In this example, we demonstrate how to use
pvanalytics.quality.outliers.tukey()
to identify and filter
out outliers in a time series.
import pvanalytics
from pvanalytics.quality.outliers import tukey
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we read in the ac_power_inv_7539_outliers example. Min-max normalized AC power is represented by the “value_normalized” column. There is a boolean column “outlier” where inserted outliers are labeled as True, and all other values are labeled as False. These outlier values were inserted manually into the data set to illustrate outlier detection by each of the functions. We use a normalized time series example provided by the PV Fleets Initiative. This example is adapted from the DuraMAT DataHub clipping data set: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file_1 = pvanalytics_dir / 'data' / 'ac_power_inv_7539_outliers.csv'
data = pd.read_csv(ac_power_file_1, index_col=0, parse_dates=True)
print(data.head(10))
value_normalized outlier
timestamp
2017-04-10 19:15:00+00:00 0.000002 False
2017-04-10 19:30:00+00:00 0.000000 False
2017-04-11 06:15:00+00:00 0.000000 False
2017-04-11 06:45:00+00:00 0.033103 False
2017-04-11 07:00:00+00:00 0.043992 False
2017-04-11 07:15:00+00:00 0.055615 False
2017-04-11 07:30:00+00:00 0.110986 False
2017-04-11 07:45:00+00:00 0.184948 False
2017-04-11 08:00:00+00:00 0.276810 False
2017-04-11 08:15:00+00:00 0.358061 False
We then use pvanalytics.quality.outliers.tukey() to identify outliers in the time series, and plot the data with the Tukey outlier mask.
tukey_outlier_mask = tukey(data=data['value_normalized'], k=0.5)
data['value_normalized'].plot()
data.loc[tukey_outlier_mask, 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Total running time of the script: ( 0 minutes 0.211 seconds)
Hampel Outlier Detection¶
Identifying outliers in time series using Hampel outlier detection.
Identifying and removing outliers from PV sensor time series
data allows for more accurate data analysis.
In this example, we demonstrate how to use
pvanalytics.quality.outliers.hampel()
to identify and filter
out outliers in a time series.
import pvanalytics
from pvanalytics.quality.outliers import hampel
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we read in the ac_power_inv_7539_outliers example. Min-max normalized AC power is represented by the “value_normalized” column. There is a boolean column “outlier” where inserted outliers are labeled as True, and all other values are labeled as False. These outlier values were inserted manually into the data set to illustrate outlier detection by each of the functions. We use a normalized time series example provided by the PV Fleets Initiative. This example is adapted from the DuraMAT DataHub clipping data set: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file_1 = pvanalytics_dir / 'data' / 'ac_power_inv_7539_outliers.csv'
data = pd.read_csv(ac_power_file_1, index_col=0, parse_dates=True)
print(data.head(10))
value_normalized outlier
timestamp
2017-04-10 19:15:00+00:00 0.000002 False
2017-04-10 19:30:00+00:00 0.000000 False
2017-04-11 06:15:00+00:00 0.000000 False
2017-04-11 06:45:00+00:00 0.033103 False
2017-04-11 07:00:00+00:00 0.043992 False
2017-04-11 07:15:00+00:00 0.055615 False
2017-04-11 07:30:00+00:00 0.110986 False
2017-04-11 07:45:00+00:00 0.184948 False
2017-04-11 08:00:00+00:00 0.276810 False
2017-04-11 08:15:00+00:00 0.358061 False
We then use pvanalytics.quality.outliers.hampel() to identify outliers in the time series, and plot the data with the Hampel outlier mask.
hampel_outlier_mask = hampel(data=data['value_normalized'], window=10)
data['value_normalized'].plot()
data.loc[hampel_outlier_mask, 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Total running time of the script: ( 0 minutes 0.329 seconds)
Flag Sunny Days for a Fixed-Tilt System¶
Flag sunny days for a fixed-tilt PV system.
Identifying and masking sunny days for a fixed-tilt system is important when performing future analyses that require filtered sunny day data. For this example we will use data from the fixed-tilt NREL SERF East system located on the NREL campus in Colorado, USA, and generate a sunny day mask. This data set is publicly available via the PVDAQ database in the DOE Open Energy Data Initiative (OEDI) (https://data.openei.org/submissions/4568), as system ID 50. This data is timezone-localized.
import pvanalytics
from pvanalytics.features import daytime as day
from pvanalytics.features.orientation import fixed_nrel
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the NREL SERF East fixed-tilt system. This data set contains 15-minute interval AC power data.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'serf_east_15min_ac_power.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
Mask day-night periods using the pvanalytics.features.daytime.power_or_irradiance() function. Then apply pvanalytics.features.orientation.fixed_nrel() to the AC power stream and mask the sunny days in the time series.
daytime_mask = day.power_or_irradiance(data['ac_power'])
fixed_sunny_days = fixed_nrel(data['ac_power'], daytime_mask)
Plot the AC power stream with the sunny day mask applied to it.
data['ac_power'].plot()
data.loc[fixed_sunny_days, 'ac_power'].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Sunny Day"], loc="upper left")
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()
Total running time of the script: ( 0 minutes 0.272 seconds)
Flag Sunny Days for a Tracking System¶
Flag sunny days for a single-axis tracking PV system.
Identifying and masking sunny days for a single-axis tracking system is important when performing future analyses that require filtered sunny day data. For this example we will use data from the single-axis tracking NREL Mesa system located on the NREL campus in Colorado, USA, and generate a sunny day mask. This data set is publicly available via the PVDAQ database in the DOE Open Energy Data Initiative (OEDI) (https://data.openei.org/submissions/4568), as system ID 50. This data is timezone-localized.
import pvanalytics
from pvanalytics.features import daytime as day
from pvanalytics.features.orientation import tracking_nrel
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the NREL Mesa 1-axis tracking system. This data set contains 15-minute interval AC power data.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'nrel_1axis_tracker_mesa_ac_power.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
Mask day-night periods using the pvanalytics.features.daytime.power_or_irradiance() function. Then apply pvanalytics.features.orientation.tracking_nrel() to the AC power stream and mask the sunny days in the time series.
daytime_mask = day.power_or_irradiance(data['ac_power'])
tracking_sunny_days = tracking_nrel(data['ac_power'], daytime_mask)
Plot the AC power stream with the sunny day mask applied to it.
data['ac_power'].plot()
data.loc[tracking_sunny_days, 'ac_power'].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Sunny Day"], loc="upper left")
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()
Total running time of the script: ( 0 minutes 0.926 seconds)
Clear-Sky Detection¶
Identifying periods of clear-sky conditions using measured irradiance.
Identifying and filtering for clear-sky conditions is a useful way to reduce noise when analyzing measured data. This example shows how to use pvanalytics.features.clearsky.reno() to identify clear-sky conditions using measured GHI data. For this example we’ll use GHI measurements from NREL in Golden, CO.
import pvanalytics
from pvanalytics.features.clearsky import reno
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in the GHI measurements. For this example we’ll use an example file included in pvanalytics covering a single day, but the same process applies to data of any length.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ghi_file = pvanalytics_dir / 'data' / 'midc_bms_ghi_20220120.csv'
data = pd.read_csv(ghi_file, index_col=0, parse_dates=True)
# or you can fetch the data straight from the source using pvlib:
# date = pd.to_datetime('2022-01-20')
# data = pvlib.iotools.read_midc_raw_data_from_nrel('BMS', date, date)
measured_ghi = data['Global CMP22 (vent/cor) [W/m^2]']
Now model clear-sky irradiance for the location and times of the measured data:
location = pvlib.location.Location(39.742, -105.18)
clearsky = location.get_clearsky(data.index)
clearsky_ghi = clearsky['ghi']
Finally, use pvanalytics.features.clearsky.reno() to identify measurements during clear-sky conditions:
is_clearsky = reno(measured_ghi, clearsky_ghi)
# clear-sky times indicated in black
measured_ghi.plot()
measured_ghi[is_clearsky].plot(ls='', marker='o', ms=2, c='k')
plt.ylabel('Global Horizontal Irradiance [W/m2]')
plt.show()
Total running time of the script: ( 0 minutes 0.242 seconds)
Interpolated Data Periods¶
Identifying periods in a time series where the data has been linearly interpolated.
Identifying periods where time series data has been linearly interpolated and removing these periods may help to reduce noise when performing future data analysis. This example shows how to use pvanalytics.quality.gaps.interpolation_diff(), which identifies and masks linearly interpolated periods.
import pvanalytics
from pvanalytics.quality import gaps
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we import the AC power data stream that we are going to check for interpolated periods. The time series we download is a normalized AC power time series from the PV Fleets Initiative, and is available via the DuraMAT DataHub: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data. This data set has a Pandas DateTime index, with the min-max normalized AC power time series represented in the ‘value_normalized’ column. There is also an “interpolated_data_mask” column, where interpolated periods are labeled as True, and all other data is labeled as False. The data is sampled at 15-minute intervals.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'ac_power_inv_2173_interpolated_data.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
data = data.asfreq("15T")
data['value_normalized'].plot()
data.loc[data["interpolated_data_mask"], "value_normalized"].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Interpolated Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Now, we use pvanalytics.quality.gaps.interpolation_diff() to identify linearly interpolated periods in the time series. We re-plot the data with this mask. Please note that nighttime periods generally consist of repeating 0 values; this means that these periods can be linearly interpolated. Consequently, these periods are flagged by pvanalytics.quality.gaps.interpolation_diff().
detected_interpolated_data_mask = gaps.interpolation_diff(data['value_normalized'])
data['value_normalized'].plot()
data.loc[detected_interpolated_data_mask, "value_normalized"].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Detected Interpolated Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Total running time of the script: ( 0 minutes 0.820 seconds)
Clearsky Limits for Daily Insolation¶
Checking the clearsky limits for daily insolation data.
Identifying and filtering out invalid irradiance data is a
useful way to reduce noise during analysis. In this example,
we use pvanalytics.quality.irradiance.daily_insolation_limits()
to determine when the daily insolation lies between a minimum
and a maximum value. Irradiance measurements and clear-sky
irradiance on each day are integrated with the trapezoid rule
to calculate daily insolation. For this example we will use data
from the RMIS weather system located on the NREL campus
in Colorado, USA.
import pvanalytics
from pvanalytics.quality.irradiance import daily_insolation_limits
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned data. It includes POA, GHI, DNI, DHI, and GNI measurements.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
# Make the datetime index tz-aware.
data.index = data.index.tz_localize("Etc/GMT+7")
Now model clear-sky irradiance for the location and times of the measured data:
location = pvlib.location.Location(39.7407, -105.1686)
clearsky = location.get_clearsky(data.index)
Use pvanalytics.quality.irradiance.daily_insolation_limits() to identify whether the daily insolation lies between a minimum and a maximum value. Here, we check the GHI irradiance field ‘irradiance_ghi__7981’. pvanalytics.quality.irradiance.daily_insolation_limits() returns a mask that identifies data that falls between lower and upper limits. The defaults (used here) are an upper bound of 125% of clear-sky daily insolation and a lower bound of 40% of clear-sky daily insolation.
daily_insolation_mask = daily_insolation_limits(data['irradiance_ghi__7981'], clearsky['ghi'])
Plot the ‘irradiance_ghi__7981’ data stream and its associated clearsky GHI data stream. Mask the GHI time series by its daily_insolation_mask.
data['irradiance_ghi__7981'].plot()
clearsky['ghi'].plot()
data.loc[daily_insolation_mask, 'irradiance_ghi__7981'].plot(ls='', marker='.')
plt.legend(labels=["RMIS GHI", "Clearsky GHI", "Within Daily Insolation Limit"], loc="upper left")
plt.xlabel("Date")
plt.ylabel("GHI (W/m^2)")
plt.tight_layout()
plt.show()
Total running time of the script: ( 0 minutes 0.258 seconds)
Data Shift Detection & Filtering¶
Identifying data shifts/capacity changes in time series data.
This example covers identifying data shifts/capacity changes in a time series and extracting the longest time series segment free of these shifts, using pvanalytics.quality.data_shifts.detect_data_shifts() and pvanalytics.quality.data_shifts.get_longest_shift_segment_dates().
import pvanalytics
import pandas as pd
import matplotlib.pyplot as plt
from pvanalytics.quality import data_shifts as ds
import pathlib
As an example, we load in a simulated pvlib AC power time series with a single changepoint, occurring on October 28, 2015.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
data_shift_file = pvanalytics_dir / 'data' / 'pvlib_data_shift.csv'
df = pd.read_csv(data_shift_file)
df.index = pd.to_datetime(df['timestamp'])
df['value'].plot()
print("Changepoint at: " + str(df[df['label'] == 1].index[0]))
Changepoint at: 2015-10-28 00:00:00
Now we run the data shift algorithm (with default parameters) on the data stream, using pvanalytics.quality.data_shifts.detect_data_shifts(). We plot the predicted time series segments, based on algorithm results.
shift_mask = ds.detect_data_shifts(df['value'])
shift_list = list(df[shift_mask].index)
edges = [df.index[0]] + shift_list + [df.index[-1]]
fig, ax = plt.subplots()
for (st, ed) in zip(edges[:-1], edges[1:]):
    ax.plot(df.loc[st:ed, "value"])
plt.show()
# We zoom in around the changepoint to more closely show the data shift. Time
# series segments pre- and post-shift are color-coded.
edges = [pd.to_datetime("10-15-2015")] + shift_list + \
    [pd.to_datetime("11-15-2015")]
fig, ax = plt.subplots()
for (st, ed) in zip(edges[:-1], edges[1:]):
    ax.plot(df.loc[st:ed, "value"])
plt.xticks(rotation=45)
plt.show()
We filter the time series by the detected changepoints, taking the longest continuous segment free of data shifts, using pvanalytics.quality.data_shifts.get_longest_shift_segment_dates(). The trimmed time series is then plotted.
start_date, end_date = ds.get_longest_shift_segment_dates(df['value'])
df['value'][start_date:end_date].plot()
plt.show()
Total running time of the script: ( 0 minutes 2.322 seconds)
Clearsky Limits for Irradiance Data¶
Checking the clearsky limits of irradiance data.
Identifying and filtering out invalid irradiance data is a
useful way to reduce noise during analysis. In this example,
we use pvanalytics.quality.irradiance.clearsky_limits()
to identify irradiance values that do not exceed
a limit based on a clear-sky model. For this example we will
use GHI data from the RMIS weather system located on the NREL campus in CO.
import pvanalytics
from pvanalytics.quality.irradiance import clearsky_limits
from pvanalytics.features.daytime import power_or_irradiance
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned POA, GHI, DNI, DHI, and GNI measurements, but only the GHI is relevant here.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
freq = '5T'
# Make the datetime index tz-aware.
data.index = data.index.tz_localize("Etc/GMT+7")
Now model clear-sky irradiance for the location and times of the measured data. You can do this using pvlib.location.Location.get_clearsky(), using the latitude-longitude coordinates associated with the RMIS NREL system.
location = pvlib.location.Location(39.7407, -105.1686)
clearsky = location.get_clearsky(data.index)
Use pvanalytics.quality.irradiance.clearsky_limits(). Here, we check GHI data in field ‘irradiance_ghi__7981’. pvanalytics.quality.irradiance.clearsky_limits() returns a mask that identifies data that falls between lower and upper limits. The defaults (used here) are an upper bound of 110% of clear-sky GHI and no lower bound.
clearsky_limit_mask = clearsky_limits(data['irradiance_ghi__7981'], clearsky['ghi'])
Mask nighttime values in the GHI time series using the pvanalytics.features.daytime.power_or_irradiance() function. We will then remove nighttime values from the GHI time series.
day_night_mask = power_or_irradiance(series=data['irradiance_ghi__7981'], freq=freq)
Plot the ‘irradiance_ghi__7981’ data stream and its associated clearsky GHI data stream. Mask the GHI time series by its clearsky_limit_mask for daytime periods. Please note that a simple Ineichen model with static monthly turbidities isn’t always accurate, as in this case. Other models that may provide better clear-sky estimates include McClear or PSM3.
data['irradiance_ghi__7981'].plot()
clearsky['ghi'].plot()
data.loc[clearsky_limit_mask & day_night_mask]['irradiance_ghi__7981'].plot(ls='', marker='.')
plt.legend(labels=["RMIS GHI", "Clearsky GHI", "Under Clearsky Limit"], loc="upper left")
plt.xlabel("Date")
plt.ylabel("GHI (W/m^2)")
plt.tight_layout()
plt.show()
Stale Data Periods¶
Identifying stale data periods in a time series.
Identifying and removing stale, or consecutive repeating, values in time
series data reduces noise when performing data analysis. This example shows
how to use two PVAnalytics functions,
pvanalytics.quality.gaps.stale_values_diff() and
pvanalytics.quality.gaps.stale_values_round(), to identify
and mask stale data periods in time series data.
import pvanalytics
from pvanalytics.quality import gaps
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we import the AC power data stream that we are going to check for stale data periods. The time series we use is a normalized AC power time series from the PV Fleets Initiative, and is available via the DuraMAT DataHub: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data. This data set has a pandas datetime index, with the min-max normalized AC power time series in the ‘value_normalized’ column. Additionally, there is a “stale_data_mask” column, where stale periods are labeled as True and all other data is labeled as False. The data is sampled at 15-minute intervals.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'ac_power_inv_2173_stale_data.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
data = data.asfreq("15T")
data['value_normalized'].plot()
data.loc[data["stale_data_mask"], "value_normalized"].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Inserted Stale Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
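A side note on the data.asfreq("15T") call above: conforming the index to a fixed frequency inserts NaN at any missing timestamps, which is what makes downstream gap and completeness checks meaningful. A small, self-contained illustration:

```python
import pandas as pd

# Two samples 30 minutes apart; the 00:15 stamp is missing between them.
s = pd.Series([1.0, 2.0],
              index=pd.DatetimeIndex(['2022-01-01 00:00',
                                      '2022-01-01 00:30']))
# Conform to 15-minute frequency; the missing stamp is inserted as NaN.
s = s.asfreq('15T')
```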

Now, we use pvanalytics.quality.gaps.stale_values_diff() to identify
stale values in the data, and visualize the detected stale periods
graphically. Note that nighttime periods generally contain consecutive
repeating 0 values, which are flagged by stale_values_diff().
stale_data_mask = gaps.stale_values_diff(data['value_normalized'])
data['value_normalized'].plot()
data.loc[stale_data_mask, "value_normalized"].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Detected Stale Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
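For intuition, the core idea behind a diff-based stale check can be sketched in plain pandas. This is an illustrative simplification, not the library implementation, which also supports tolerance and marking options:

```python
import pandas as pd

def stale_sketch(series, window=6):
    """Flag the tail of any run of `window`+ consecutive equal values."""
    unchanged = series.diff() == 0            # True where value repeats
    # A point is flagged when the preceding `window` points all repeat.
    return unchanged.rolling(window).sum() >= window

# Synthetic series with a run of seven identical values.
power = pd.Series([0.2, 0.5] + [0.7] * 7 + [0.6, 0.4])
mask = stale_sketch(power)
```

Only the end of the repeated run is flagged in this sketch; the library's functions offer finer control over which points in a run are marked.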

Now, we use pvanalytics.quality.gaps.stale_values_round() to identify
stale values using rounded data. This function yields results similar to
stale_values_diff(), except that it looks for consecutive repeating data
that has been rounded to a settable number of decimal places.
Note that nighttime periods generally contain consecutive repeating 0
values, which are flagged by stale_values_round().
stale_data_round_mask = gaps.stale_values_round(data['value_normalized'])
data['value_normalized'].plot()
data.loc[stale_data_round_mask, "value_normalized"].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Detected Stale Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Clipping Detection¶
Identifying clipping periods using the PVAnalytics clipping module.
Identifying and removing clipping periods from AC power time series
data aids in generating more accurate degradation analysis results,
as using clipped data can lead to under-predicting degradation. In this
example, we show how to use pvanalytics.features.clipping.geometric()
to mask clipping periods in an AC power time series. We use a
normalized time series example provided by the PV Fleets Initiative,
where clipping periods are labeled as True and non-clipping periods are
labeled as False. This example is adapted from the DuraMAT DataHub
clipping data set:
https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
import pvanalytics
from pvanalytics.features.clipping import geometric
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
import numpy as np
First, read in the ac_power_inv_7539 example, and visualize a subset of the clipping periods via the “label” mask column.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file_1 = pvanalytics_dir / 'data' / 'ac_power_inv_7539.csv'
data = pd.read_csv(ac_power_file_1, index_col=0, parse_dates=True)
data['label'] = data['label'].astype(bool)
# This is the known frequency of the time series. You may need to infer
# the frequency or set the frequency with your AC power time series.
freq = "15T"
data['value_normalized'].plot()
data.loc[data['label'], 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Labeled Clipping"],
title="Clipped")
plt.xticks(rotation=20)
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Now, use pvanalytics.features.clipping.geometric() to identify
clipping periods in the time series, and re-plot the data subset with this mask.
predicted_clipping_mask = geometric(ac_power=data['value_normalized'],
freq=freq)
data['value_normalized'].plot()
data.loc[predicted_clipping_mask, 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Detected Clipping"],
title="Clipped")
plt.xticks(rotation=20)
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
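For intuition only, a naive plateau-based check might flag points sitting near the series' upper plateau. This sketch is not the geometric method, which is considerably more robust to noise and varying irradiance; the data below is synthetic:

```python
import pandas as pd

# Synthetic normalized AC power with a flat plateau around 0.8.
power = pd.Series([0.1, 0.4, 0.79, 0.8, 0.8, 0.8, 0.79, 0.5, 0.1])

# Naive sketch: flag values within 1% of the 99th-percentile plateau.
plateau = power.quantile(0.99)
naive_clip_mask = power >= 0.99 * plateau
```

Such a threshold rule breaks down when the plateau level drifts or the data is noisy, which is why a geometry-aware method is preferable in practice.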

Compare the filter results to the ground-truth labeled data side-by-side, and generate an accuracy metric.
acc = 100 * np.sum(np.equal(data.label,
predicted_clipping_mask))/len(data.label)
print("Overall model prediction accuracy: " + str(round(acc, 2)) + "%")
Overall model prediction accuracy: 99.2%
QCrad Limits for Irradiance Data¶
Test for physical limits on GHI, DHI or DNI using the QCRad criteria.
Identifying and filtering out invalid irradiance data is a
useful way to reduce noise during analysis. In this example,
we use pvanalytics.quality.irradiance.check_irradiance_limits_qcrad()
to test for physical limits on GHI, DHI, or DNI using the QCRad criteria.
For this example we use data from the RMIS weather system located
on the NREL campus in Colorado, USA.
import pvanalytics
from pvanalytics.quality.irradiance import check_irradiance_limits_qcrad
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned data. It includes POA, GHI, DNI, DHI, and GNI measurements.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
Now generate solar zenith estimates for the location,
based on the data’s time zone and site latitude-longitude
coordinates, using the pvlib.solarposition.get_solarposition()
function.
latitude = 39.742
longitude = -105.18
time_zone = "Etc/GMT+7"
data = data.tz_localize(time_zone)
solar_position = pvlib.solarposition.get_solarposition(data.index,
latitude,
longitude)
Generate the estimated extraterrestrial radiation for the time series,
referred to as dni_extra, using the
pvlib.irradiance.get_extra_radiation() function.
dni_extra = pvlib.irradiance.get_extra_radiation(data.index)
Use pvanalytics.quality.irradiance.check_irradiance_limits_qcrad()
to generate the QCRad irradiance limit mask.
qcrad_limit_mask = check_irradiance_limits_qcrad(
solar_zenith=solar_position['zenith'],
dni_extra=dni_extra,
ghi=data['irradiance_ghi__7981'],
dhi=data['irradiance_dhi__7983'],
dni=data['irradiance_dni__7982'])
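As background, the QCRad "physically possible" GHI upper bound (from Long and Shi) is commonly stated as dni_extra * 1.5 * cos(zenith)^1.2 + 100 W/m^2. The sketch below illustrates that formula directly; it is a simplified stand-in, not the library's implementation:

```python
import numpy as np

def ghi_upper_limit(solar_zenith, dni_extra):
    """Sketch of the QCRad 'physically possible' GHI upper bound:
    dni_extra * 1.5 * cos(zenith)^1.2 + 100, with cos floored at 0
    so nighttime zeniths do not produce complex values."""
    cos_z = np.maximum(np.cos(np.radians(solar_zenith)), 0.0)
    return dni_extra * 1.5 * cos_z ** 1.2 + 100.0

# With the sun directly overhead (zenith = 0) and dni_extra = 1367:
limit = ghi_upper_limit(0.0, 1367.0)   # 1367 * 1.5 + 100 = 2150.5 W/m^2
```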
Plot the ‘irradiance_ghi__7981’ data stream with its associated QCRad limit mask.
data['irradiance_ghi__7981'].plot()
data.loc[qcrad_limit_mask[0], 'irradiance_ghi__7981'].plot(ls='', marker='.')
plt.legend(labels=["RMIS GHI", "Within QCRAD Limits"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("GHI (W/m^2)")
plt.tight_layout()
plt.show()

Plot the ‘irradiance_dhi__7983’ data stream with its associated QCRad limit mask.
data['irradiance_dhi__7983'].plot()
data.loc[qcrad_limit_mask[1], 'irradiance_dhi__7983'].plot(ls='', marker='.')
plt.legend(labels=["RMIS DHI", "Within QCRAD Limits"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("DHI (W/m^2)")
plt.tight_layout()
plt.show()

Plot the ‘irradiance_dni__7982’ data stream with its associated QCRad limit mask.
data['irradiance_dni__7982'].plot()
data.loc[qcrad_limit_mask[2], 'irradiance_dni__7982'].plot(ls='', marker='.')
plt.legend(labels=["RMIS DNI", "Within QCRAD Limits"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("DNI (W/m^2)")
plt.tight_layout()
plt.show()

Missing Data Periods¶
Identifying days with missing data using a “completeness” score metric.
Identifying days with missing data and filtering these days out reduces noise
when performing data analysis. This example shows how to use a
daily data “completeness” score to identify and filter out days with missing
data, using pvanalytics.quality.gaps.completeness_score(),
pvanalytics.quality.gaps.complete(), and
pvanalytics.quality.gaps.trim_incomplete().
import pvanalytics
from pvanalytics.quality import gaps
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we import the AC power data stream that we are going to check for completeness. The time series we download is a normalized AC power time series from the PV Fleets Initiative, and is available via the DuraMAT DataHub: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data. This data set has a Pandas DateTime index, with the min-max normalized AC power time series represented in the ‘value_normalized’ column. The data is sampled at 15-minute intervals. This data set does contain NaN values.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'ac_power_inv_2173.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
data = data.asfreq("15T")
Now, we use pvanalytics.quality.gaps.completeness_score() to get the
fraction of each day's data that is not NaN. The score is calculated
over the full 24-hour period, so nighttime values are expected to be
present and count toward the score.
data_completeness_score = gaps.completeness_score(data['value_normalized'])
# Visualize data completeness score as a time series.
data_completeness_score.plot()
plt.xlabel("Date")
plt.ylabel("Daily Completeness Score (Fractional)")
plt.tight_layout()
plt.show()
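Conceptually, a daily completeness score is the number of non-NaN samples divided by the number of samples expected per day at the sampling frequency. The following is an illustrative sketch on synthetic data, not the library's exact implementation:

```python
import numpy as np
import pandas as pd

# Hypothetical 15-minute series covering one day, with the first
# 6 hours (24 samples) missing.
idx = pd.date_range('2022-01-01', periods=96, freq='15T')
values = np.ones(96)
values[:24] = np.nan
series = pd.Series(values, index=idx)

# Expected samples per 24-hour day at 15-minute sampling: 96.
expected_per_day = pd.Timedelta('1D') / pd.Timedelta('15T')
score = series.notna().resample('D').sum() / expected_per_day
```

With 72 of 96 samples present, the day scores 0.75.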

We mask complete days, based on the daily completeness score, using
pvanalytics.quality.gaps.complete().
min_completeness = 0.333
daily_completeness_mask = gaps.complete(data['value_normalized'],
minimum_completeness=min_completeness)
# Mask complete days, based on daily completeness score
data_completeness_score.plot()
data_completeness_score.loc[daily_completeness_mask].plot(ls='', marker='.')
data_completeness_score.loc[~daily_completeness_mask].plot(ls='', marker='.')
plt.axhline(y=min_completeness, color='r', linestyle='--')
plt.legend(labels=["Completeness Score", "Threshold met",
"Threshold not met", "Completeness Threshold (.33)"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("Daily Completeness Score (Fractional)")
plt.tight_layout()
plt.show()

We trim the time series based on the completeness score, requiring at
least 10 consecutive days of data that meet the completeness threshold.
This is done using pvanalytics.quality.gaps.trim_incomplete().
number_consecutive_days = 10
completeness_trim_mask = gaps.trim_incomplete(data['value_normalized'],
days=number_consecutive_days)
# Re-visualize the time series with the data masked by the trim mask
data.loc[completeness_trim_mask, 'value_normalized'].plot()
data.loc[~completeness_trim_mask, 'value_normalized'].plot()
plt.legend(labels=[True, False],
title="Daily Data Passing")
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
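The trimming step above hinges on finding runs of consecutive days that pass the completeness test. The run-finding idea can be sketched in plain pandas (an illustration, not the library implementation, which trims to the first and last qualifying days):

```python
import pandas as pd

def longest_true_run(mask):
    """Length of the longest run of consecutive True values."""
    # A new group starts wherever the boolean value changes.
    groups = (mask != mask.shift()).cumsum()
    runs = mask.groupby(groups).agg(['all', 'size'])
    true_runs = runs.loc[runs['all'], 'size']
    return int(true_runs.max()) if len(true_runs) else 0

# Hypothetical per-day pass/fail flags.
daily_ok = pd.Series([True, True, False, True, True, True])
longest_true_run(daily_ok)
```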

QCrad Consistency for Irradiance Data¶
Check consistency of GHI, DHI and DNI using QCRad criteria.
Identifying and filtering out invalid irradiance data is a
useful way to reduce noise during analysis. In this example, we use
pvanalytics.quality.irradiance.check_irradiance_consistency_qcrad()
to check the consistency of GHI, DHI, and DNI data using the QCRad criteria.
For this example we use data from the RMIS weather system located
on the NREL campus in Colorado, USA.
import pvanalytics
from pvanalytics.quality.irradiance import check_irradiance_consistency_qcrad
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned data. It includes POA, GHI, DNI, DHI, and GNI measurements.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
Now generate solar zenith estimates for the location, based on the data’s time zone and site latitude-longitude coordinates.
latitude = 39.742
longitude = -105.18
time_zone = "Etc/GMT+7"
data = data.tz_localize(time_zone)
solar_position = pvlib.solarposition.get_solarposition(data.index,
latitude,
longitude)
Use pvanalytics.quality.irradiance.check_irradiance_consistency_qcrad()
to generate the QCRad consistency mask.
qcrad_consistency_mask = check_irradiance_consistency_qcrad(
solar_zenith=solar_position['zenith'],
ghi=data['irradiance_ghi__7981'],
dhi=data['irradiance_dhi__7983'],
dni=data['irradiance_dni__7982'])
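One component of the QCRad consistency criteria is the diffuse-ratio test: DHI/GHI should stay below roughly 1.05 for zeniths under 75 degrees (roughly 1.10 otherwise), and the test is typically only applied when GHI exceeds about 50 W/m^2. The thresholds below are approximate and the function is an illustrative sketch, not the library implementation:

```python
import numpy as np

def diffuse_ratio_ok(dhi, ghi, solar_zenith):
    """Sketch of the QCRad diffuse-ratio test (approximate thresholds)."""
    limit = np.where(solar_zenith < 75, 1.05, 1.10)
    # Low-GHI points pass by default; the ratio is unreliable there.
    return (dhi / ghi < limit) | (ghi <= 50)

# A plausible mid-day point: DHI well below GHI passes the test.
ok = diffuse_ratio_ok(dhi=100.0, ghi=500.0, solar_zenith=30.0)
```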
Plot the GHI, DHI, and DNI data streams with the QCRad consistency mask overlay. This mask applies to all three data streams.
fig = data[['irradiance_ghi__7981', 'irradiance_dhi__7983',
'irradiance_dni__7982']].plot()
# Highlight periods where the QCRAD consistency mask is True
fig.fill_between(data.index, fig.get_ylim()[0], fig.get_ylim()[1],
where=qcrad_consistency_mask[0], alpha=0.4)
fig.legend(labels=["RMIS GHI", "RMIS DHI", "RMIS DNI", "QCRAD Consistent"],
loc="upper center")
plt.xlabel("Date")
plt.ylabel("Irradiance (W/m^2)")
plt.tight_layout()
plt.show()

Plot the GHI, DHI, and DNI data streams with the diffuse ratio limit mask overlay. This mask is True when the DHI/GHI ratio passes the limit test.
fig = data[['irradiance_ghi__7981', 'irradiance_dhi__7983',
'irradiance_dni__7982']].plot()
# Highlight periods where the GHI ratio passes the limit test
fig.fill_between(data.index, fig.get_ylim()[0], fig.get_ylim()[1],
where=qcrad_consistency_mask[1], alpha=0.4)
fig.legend(labels=["RMIS GHI", "RMIS DHI", "RMIS DNI",
"Within Diffuse Ratio Limit"],
loc="upper center")
plt.xlabel("Date")
plt.ylabel("Irradiance (W/m^2)")
plt.tight_layout()
plt.show()

Day-Night Masking¶
Masking day-night periods using the PVAnalytics daytime module.
Identifying and masking day-night periods in an AC power or irradiance
time series can aid in later data analysis, such as detecting whether a
time series is affected by daylight savings time or time shifts. Here, we use
pvanalytics.features.daytime.power_or_irradiance() to mask day/night
periods, as well as to estimate sunrise and sunset times in the data set.
This function is particularly useful when the time zone of a data
stream is unknown or incorrect, as its outputs can be used to determine
the time zone.
import pvanalytics
from pvanalytics.features.daytime import power_or_irradiance
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
import pvlib
import numpy as np
First, read in the 1-minute sampled AC power time series data, taken from the SERF East installation on the NREL campus. This sample is provided from the NREL PVDAQ database, and contains a column representing an AC power data stream.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file = pvanalytics_dir / 'data' / 'serf_east_1min_ac_power.csv'
data = pd.read_csv(ac_power_file, index_col=0, parse_dates=True)
data = data.sort_index()
# This is the known frequency of the time series. You may need to infer
# the frequency or set the frequency with your AC power time series.
freq = "1T"
# These are the latitude-longitude coordinates associated with the
# SERF East system.
latitude = 39.742
longitude = -105.173
# Plot the time series.
data['ac_power__752'].plot()
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()

It is critical to set all negative values in the AC power time series to 0
for pvanalytics.features.daytime.power_or_irradiance() to work
properly. Negative erroneous data may affect the daytime mask assignments.
data.loc[data['ac_power__752'] < 0, 'ac_power__752'] = 0
Now, use pvanalytics.features.daytime.power_or_irradiance()
to mask day periods in the time series.
predicted_day_night_mask = power_or_irradiance(series=data['ac_power__752'],
freq=freq)
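For intuition only, here is a naive alternative on synthetic data: call a sample "day" when power exceeds a small fraction of that day's maximum. This is not the PVAnalytics method, which handles noise, curvature, and low-production days far more carefully:

```python
import pandas as pd

# Synthetic one-day power profile at 3-hour resolution.
idx = pd.date_range('2022-01-01', periods=8, freq='3H')
power = pd.Series([0.0, 0.0, 1.2, 4.5, 5.0, 3.1, 0.0, 0.0], index=idx)

# Naive sketch: day = power above 1% of that day's maximum.
daily_max = power.groupby(power.index.date).transform('max')
naive_day_mask = power > 0.01 * daily_max
```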
The pvlib.solarposition.sun_rise_set_transit_spa() function is
used to get ground-truth sunrise and sunset times for each day at the site
location, and an SPA daytime mask is calculated based on these times. Data
associated with SPA daytime periods is labeled as True, and data associated
with SPA nighttime periods is labeled as False.
The SPA sunrise and sunset times are used here only as a point of
comparison for the power_or_irradiance() outputs; SPA-based values are
not needed to run the function.
sunrise_sunset_df = pvlib.solarposition.sun_rise_set_transit_spa(data.index,
latitude,
longitude)
data['sunrise_time'] = sunrise_sunset_df['sunrise']
data['sunset_time'] = sunrise_sunset_df['sunset']
data['daytime_mask'] = True
data.loc[(data.index < data.sunrise_time) |
(data.index > data.sunset_time), "daytime_mask"] = False
Plot the AC power data stream with the mask output from
pvanalytics.features.daytime.power_or_irradiance(),
as well as the SPA-calculated sunrise and sunset times.
data['ac_power__752'].plot()
data.loc[predicted_day_night_mask, 'ac_power__752'].plot(ls='', marker='o')
data.loc[~predicted_day_night_mask, 'ac_power__752'].plot(ls='', marker='o')
sunrise_sunset_times = sunrise_sunset_df[['sunrise',
'sunset']].drop_duplicates()
for sunrise, sunset in sunrise_sunset_times.itertuples(index=False):
plt.axvline(x=sunrise, c="blue")
plt.axvline(x=sunset, c="red")
plt.legend(labels=["AC Power", "Daytime", "Nighttime",
"SPA Sunrise", "SPA Sunset"])
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()

Compare the predicted mask to the ground-truth SPA mask, to get the model accuracy. Also, compare sunrise and sunset times for the predicted mask compared to the ground truth sunrise and sunset times.
acc = 100 * np.sum(np.equal(data.daytime_mask,
predicted_day_night_mask))/len(data.daytime_mask)
print("Overall model prediction accuracy: " + str(round(acc, 2)) + "%")
# Generate predicted + SPA sunrise times for each day
print("Sunrise Comparison:")
print(pd.DataFrame({'predicted_sunrise': predicted_day_night_mask
.index[predicted_day_night_mask]
.to_series().resample("d").first(),
'pvlib_spa_sunrise': sunrise_sunset_df["sunrise"]
.resample("d").first()}))
# Generate predicted + SPA sunset times for each day
print("Sunset Comparison:")
print(pd.DataFrame({'predicted_sunset': predicted_day_night_mask
.index[predicted_day_night_mask]
.to_series().resample("d").last(),
'pvlib_spa_sunset': sunrise_sunset_df["sunset"]
.resample("d").last()}))
Overall model prediction accuracy: 98.39%
Sunrise Comparison:
predicted_sunrise pvlib_spa_sunrise
measured_on
2022-03-18 00:00:00-07:00 2022-03-18 06:11:00-07:00 2022-03-18 06:07:09.226592-07:00
2022-03-19 00:00:00-07:00 2022-03-19 06:14:00-07:00 2022-03-19 06:05:32.867153920-07:00
Sunset Comparison:
predicted_sunset pvlib_spa_sunset
measured_on
2022-03-18 00:00:00-07:00 2022-03-18 17:56:00-07:00 2022-03-18 06:07:09.226592-07:00
2022-03-19 00:00:00-07:00 2022-03-19 17:52:00-07:00 2022-03-19 06:05:32.867153920-07:00
Release Notes¶
These are the bug fixes, new features, and improvements for each release.
0.1.2 (August 18, 2022)¶
Enhancements¶
Detect data shifts in daily summed time series with
pvanalytics.quality.data_shifts.detect_data_shifts() and
pvanalytics.quality.data_shifts.get_longest_shift_segment_dates(). (GH142)
Bug Fixes¶
Fix pvanalytics.quality.outliers.zscore() so that the NaN mask is assigned the time series index. (GH138)
Documentation¶
Added fifteen new gallery example pages: pvanalytics.quality.data_shifts (GH131)
Other¶
Removed empty modules pvanalytics.filtering and pvanalytics.fitting until the relevant functionality is added to the package. (GH145)
Contributors¶
Kirsten Perry (@kperrynrel)
Cliff Hansen (@cwhanse)
Kevin Anderson (@kanderso-nrel)
Will Vining (@wfvining)
0.1.1 (February 18, 2022)¶
Enhancements¶
Quantification of irradiance variability with pvanalytics.metrics.variability_index(). (GH60, GH106)
Internal refactor of pvanalytics.metrics.performance_ratio_nrel() to support other performance ratio formulas. (GH109)
Detect shadows from fixed objects in GHI data using pvanalytics.features.shading.fixed(). (GH24, GH101)
Bug Fixes¶
Added nan_policy parameter to the zscore calculation in pvanalytics.quality.outliers.zscore(). (GH102, GH108)
Prohibit pandas versions in the 1.1.x series to avoid an issue in .groupby().rolling(). Newer versions starting in 1.2.0 and older versions back to 0.24.0 are still allowed. (GH82, GH118)
Fixed an issue with pvanalytics.features.clearsky.reno() in recent pandas versions. (GH125, GH128)
Improved convergence in pvanalytics.features.orientation.fixed_nrel(). (GH119, GH120)
Requirements¶
Drop support for python 3.6, which reached end of life Dec 2021 (GH129)
Documentation¶
Started an example gallery and added an example for pvanalytics.features.clearsky.reno(). (GH125, GH127)
Contributors¶
Kevin Anderson (@kanderso-nrel)
Cliff Hansen (@cwhanse)
Will Vining (@wfvining)
Kirsten Perry (@kperrynrel)
Michael Hopwood (@MichaelHopwood)
Carlos Silva (@camsilva)
Ben Taylor (@bt-)
0.1.0 (November 20, 2020)¶
This is the first release of PVAnalytics. As such, the list of “changes” below is not specific. Future releases will describe specific changes here along with references to the relevant github issue and pull requests.
API Changes¶
Enhancements¶
Quality control functions for irradiance, weather, and time series data. See pvanalytics.quality for content.
Feature labeling functions for clipping, clearsky, daytime, and orientation. See pvanalytics.features for content.
System parameter inference for tilt, azimuth, and whether the system is tracking or fixed. See pvanalytics.system for content.
NREL performance ratio metric (pvanalytics.metrics.performance_ratio_nrel()).
Bug Fixes¶
Contributors¶
Special thanks to Matt Muller and Kirsten Perry of NREL for their assistance in adapting components from the PVFleets QA project to PVAnalytics.