PVAnalytics¶
PVAnalytics is a Python library that supports analytics for PV systems. It provides functions for quality control, filtering, feature labeling, and other tools supporting the analysis of PV system-level data. It can be used as a standalone analysis package and as a data-cleaning “front end” for other PV analysis packages.
PVAnalytics is free and open source under a permissive license. The source code for PVAnalytics is hosted on GitHub.
Library Overview¶
The functions provided by PVAnalytics are organized in submodules based on their anticipated use. The list below provides a general overview; however, not all modules have functions at this time. See the API reference for the current library status.
quality
  Contains submodules for different kinds of data quality checks.

  quality.data_shifts
    Quality checks for detecting and isolating data shifts in PV time series data.
  quality.irradiance
    Quality checks for irradiance measurements.
  quality.weather
    Quality checks for weather data (e.g. tests for physically plausible values of temperature, wind speed, humidity).
  quality.outliers
    Functions for identifying outliers.
  quality.gaps
    Functions for identifying gaps in the data (i.e. missing values, stuck values, and interpolation).
  quality.time
    Quality checks related to time (e.g. timestamp spacing, time shifts).
  quality.util
    General-purpose quality functions (e.g. simple range checks).

features
  Contains submodules with different methods for identifying and labeling salient features.

  features.clipping
    Functions for labeling inverter clipping.
  features.clearsky
    Functions for identifying periods of clear sky conditions.
  features.daytime
    Functions for identifying periods of day and night.
  features.orientation
    Functions for identifying orientation-related features in the data (e.g. days where the data looks like there is a functioning tracker). These functions are distinct from the functions in the system module in that they identify features of the data rather than properties of the system that produced the data.
  features.shading
    Functions for identifying shadows.

system
  Identification of PV system characteristics from data (e.g. nameplate power, tilt, azimuth).

metrics
  Functions for computing PV system-level metrics (e.g. performance ratio).
Dependencies¶
This project follows the guidelines laid out in NEP-29. It supports:
All minor versions of Python released in the 42 months prior to the project, and at minimum the two latest minor versions.
All minor versions of numpy released in the 24 months prior to the project, and at minimum the last three minor versions.
The latest release of pvlib.
Additionally, PVAnalytics relies on several other packages in the open source scientific python ecosystem. For details on dependencies and versions, see our setup.py.
Contents¶
API Reference¶
Quality¶
Data Shifts¶
Functions for identifying shifts in data values in time series and for identifying periods with data shifts. For functions that identify shifts in time, see quality.time.

detect_data_shifts
  Detect data shifts in a time series of daily values.
get_longest_shift_segment_dates
  Return the start and end dates of the longest serially complete time series segment.
Irradiance¶
The check_*_limits_qcrad functions use the QCRad algorithm [1] to identify irradiance measurements that are beyond physical limits.
check_ghi_limits_qcrad
  Test for physical limits on GHI using the QCRad criteria.
check_dhi_limits_qcrad
  Test for physical limits on DHI using the QCRad criteria.
check_dni_limits_qcrad
  Test for physical limits on DNI using the QCRad criteria.
All three checks can be combined into a single function call.
check_irradiance_limits_qcrad
  Test for physical limits on GHI, DHI, or DNI using the QCRad criteria.
Irradiance measurements can also be checked for consistency.
check_irradiance_consistency_qcrad
  Check consistency of GHI, DHI, and DNI using QCRad criteria.
GHI and POA irradiance can be validated against clearsky values to eliminate data that is unrealistically high.

clearsky_limits
  Identify irradiance values which do not exceed clearsky values.
You may want to identify entire days that have unrealistically high or low insolation. The following function examines daily insolation, validating that it is within a reasonable range of the expected clearsky insolation for the same day.
daily_insolation_limits
  Check that daily insolation lies between minimum and maximum values.
There is also a function for calculating the component sum for GHI, DHI, and DNI, with correction for nighttime periods. Using this function, we can estimate one irradiance field from the two other irradiance fields, which is useful for comparison as well as for calculating missing data fields.

calculate_component_sum_series
  Use the component sum equations to calculate the missing series, using the other available time series.
Gaps¶
Identify gaps in the data.

interpolation_diff
  Identify sequences which appear to be linear.
Data sometimes contains sequences of values that are “stale” or “stuck.” These are contiguous spans of data where the value does not change within the precision given. The functions below can be used to detect stale values.
Note
If the data has been altered in some way (e.g. temperature that has been rounded to an integer value) before being passed to these functions, you may see unexpectedly large amounts of stale data.
stale_values_diff
  Identify stale values in the data.
stale_values_round
  Identify stale values by rounding.
The following functions identify days with incomplete data.
completeness_score
  Calculate a data completeness score for each day.
complete
  Select data points that are part of days with complete data.
Many data sets may have leading and trailing periods of days with sporadic or no data. The following functions can be used to remove those periods.
start_stop_dates
  Get the start and end of data excluding leading and trailing gaps.
trim
  Mask the beginning and end of the data if not all True.
trim_incomplete
  Trim the series based on the completeness score.
Outliers¶
Functions for detecting outliers.
tukey
  Identify outliers based on the interquartile range.
zscore
  Identify outliers using the z-score.
hampel
  Identify outliers by the Hampel identifier.
Time¶
Quality control related to time. This includes things like timestamp spacing, time shifts, and time zone validation.
spacing
  Check that the spacing between times conforms to freq.
Timestamp shifts, such as daylight savings, can be identified with the following functions.
shifts_ruptures
  Identify time shifts using the ruptures library.
has_dst
  Return True if events appears to have daylight savings shifts at the dates on which tz transitions to or from daylight savings time.
Utilities¶
The quality.util module contains general-purpose utility functions for building your own quality checks.
check_limits
  Check whether a value falls within the given limits.
daily_min
  Return True for data on days when the day's minimum exceeds minimum.
Weather¶
Quality checks for weather data.
relative_humidity_limits
  Identify relative humidity values that are within limits.
temperature_limits
  Identify temperature values that are within limits.
wind_limits
  Identify wind speed values that are within limits.
In addition to validating temperature by comparing with limits, module temperature should be positively correlated with irradiance. Poor correlation could indicate that the sensor has become detached from the module, for example. Unlike other functions in the quality module, which return Boolean masks over the input series, this function returns a single Boolean value indicating whether the entire series has passed (True) or failed (False) the quality check.

module_temperature_check
  Test whether the module temperature is correlated with irradiance.
References

[1] C. N. Long and Y. Shi, "An Automated Quality Assessment and Control Algorithm for Surface Radiation Measurements," The Open Atmospheric Science Journal, vol. 2, pp. 23-37, 2008.
Features¶
Functions for detecting features in the data.
Clipping¶
Functions for identifying inverter clipping.
levels
  Label clipping in AC power data based on levels in the data.
threshold
  Detect clipping based on a maximum power threshold.
geometric
  Identify clipping based on the shape of the ac_power curve on each day.
Clearsky¶
reno
  Identify times when GHI is consistent with clearsky conditions.
Orientation¶
System orientation refers to mounting type (fixed or tracker) and the azimuth and tilt of the mounting. A system’s orientation can be determined by examining power or POA irradiance on days that are relatively sunny.
This module provides functions that operate on power or POA irradiance to identify system orientation on a daily basis. These functions can tell you whether a day’s profile matches that of a fixed system or system with a single-axis tracker.
Care should be taken when interpreting function output since other factors such as malfunctioning trackers can interfere with identification.
fixed_nrel
  Flag days that match the profile of a fixed PV system on a sunny day.
tracking_nrel
  Flag days that match the profile of a single-axis tracking PV system on a sunny day.
Daytime¶
Functions that return a Boolean mask indicating day and night.
power_or_irradiance
  Return True for values that are during the day.
Shading¶
Functions for labeling shadows.
fixed
  Detects shadows from fixed structures such as wires and poles.
System¶
This module contains functions and classes relating to PV system parameters such as nameplate power, tilt, azimuth, or whether the system is equipped with a tracker.
Tracking¶
Tracker
  Enum describing the orientation of a PV system.
is_tracking_envelope
  Infer whether the system is equipped with a tracker.
Orientation¶
The following functions can be used to infer system orientation from power or plane-of-array irradiance measurements.

infer_orientation_daily_peak
  Determine system azimuth and tilt from power or POA using solar azimuth at the daily peak.
infer_orientation_fit_pvwatts
  Get the tilt and azimuth that give PVWatts output that most closely fits the data in power_ac.
Metrics¶
Performance Ratio¶
The following functions can be used to calculate system performance metrics.
performance_ratio_nrel
  Calculate NREL Performance Ratio.
Variability¶
Functions to calculate variability statistics.
variability_index
  Calculate the variability index.
Example Gallery¶
This gallery shows examples of pvanalytics functionality. Community contributions are welcome!
Clearsky Detection¶
This includes examples for identifying clearsky periods in time series data.
Clear-Sky Detection¶
Identifying periods of clear-sky conditions using measured irradiance.
Identifying and filtering for clear-sky conditions is a useful way to
reduce noise when analyzing measured data. This example shows how to
use pvanalytics.features.clearsky.reno()
to identify clear-sky
conditions using measured GHI data. For this example we’ll use
GHI measurements from NREL in Golden, CO.
import pvanalytics
from pvanalytics.features.clearsky import reno
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in the GHI measurements. For this example we’ll use an example file included in pvanalytics covering a single day, but the same process applies to data of any length.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ghi_file = pvanalytics_dir / 'data' / 'midc_bms_ghi_20220120.csv'
data = pd.read_csv(ghi_file, index_col=0, parse_dates=True)
# or you can fetch the data straight from the source using pvlib:
# date = pd.to_datetime('2022-01-20')
# data = pvlib.iotools.read_midc_raw_data_from_nrel('BMS', date, date)
measured_ghi = data['Global CMP22 (vent/cor) [W/m^2]']
Now model clear-sky irradiance for the location and times of the measured data:
location = pvlib.location.Location(39.742, -105.18)
clearsky = location.get_clearsky(data.index)
clearsky_ghi = clearsky['ghi']
Finally, use pvanalytics.features.clearsky.reno()
to identify
measurements during clear-sky conditions:
is_clearsky = reno(measured_ghi, clearsky_ghi)
# clear-sky times indicated in black
measured_ghi.plot()
measured_ghi[is_clearsky].plot(ls='', marker='o', ms=2, c='k')
plt.ylabel('Global Horizontal Irradiance [W/m2]')
plt.show()

Total running time of the script: ( 0 minutes 0.279 seconds)
Clipping¶
This includes examples for identifying clipping in AC power time series.
Clipping Detection¶
Identifying clipping periods using the PVAnalytics clipping module.
Identifying and removing clipping periods from AC power time series
data aids in generating more accurate degradation analysis results,
as using clipped data can lead to under-predicting degradation. In this
example, we show how to use
pvanalytics.features.clipping.geometric()
to mask clipping periods in an AC power time series. We use a
normalized time series example provided by the PV Fleets Initiative,
where clipping periods are labeled as True, and non-clipping periods are
labeled as False. This example is adapted from the DuraMAT DataHub
clipping data set:
https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
import pvanalytics
from pvanalytics.features.clipping import geometric
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
import numpy as np
First, read in the ac_power_inv_7539 example, and visualize a subset of the clipping periods via the “label” mask column.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file_1 = pvanalytics_dir / 'data' / 'ac_power_inv_7539.csv'
data = pd.read_csv(ac_power_file_1, index_col=0, parse_dates=True)
data['label'] = data['label'].astype(bool)
# This is the known frequency of the time series. You may need to infer
# the frequency or set the frequency with your AC power time series.
freq = "15T"
data['value_normalized'].plot()
data.loc[data['label'], 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Labeled Clipping"],
           title="Clipped")
plt.xticks(rotation=20)
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Now, use pvanalytics.features.clipping.geometric()
to identify
clipping periods in the time series. Re-plot the data subset with this mask.
predicted_clipping_mask = geometric(ac_power=data['value_normalized'],
                                    freq=freq)
data['value_normalized'].plot()
data.loc[predicted_clipping_mask, 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Detected Clipping"],
           title="Clipped")
plt.xticks(rotation=20)
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Compare the filter results to the ground-truth labeled data side-by-side, and generate an accuracy metric.
acc = 100 * np.sum(np.equal(data.label,
                            predicted_clipping_mask))/len(data.label)
print("Overall model prediction accuracy: " + str(round(acc, 2)) + "%")
Overall model prediction accuracy: 99.2%
Total running time of the script: ( 0 minutes 0.436 seconds)
Data Shifts¶
This includes examples for identifying data/capacity shifts in time series data.
Data Shift Detection & Filtering¶
Identifying data shifts/capacity changes in time series data.
This example covers identifying data shifts/capacity changes in a time series
and extracting the longest time series segment free of these shifts, using
pvanalytics.quality.data_shifts.detect_data_shifts()
and
pvanalytics.quality.data_shifts.get_longest_shift_segment_dates()
.
import pvanalytics
import pandas as pd
import matplotlib.pyplot as plt
from pvanalytics.quality import data_shifts as ds
import pathlib
As an example, we load in a simulated pvlib AC power time series with a single changepoint, occurring on October 28, 2015.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
data_shift_file = pvanalytics_dir / 'data' / 'pvlib_data_shift.csv'
df = pd.read_csv(data_shift_file)
df.index = pd.to_datetime(df['timestamp'])
df['value'].plot()
print("Changepoint at: " + str(df[df['label'] == 1].index[0]))

Changepoint at: 2015-10-28 00:00:00
Now we run the data shift algorithm (with default parameters)
on the data stream, using
pvanalytics.quality.data_shifts.detect_data_shifts()
. We plot the
predicted time series segments, based on algorithm results.
shift_mask = ds.detect_data_shifts(df['value'])
shift_list = list(df[shift_mask].index)
edges = [df.index[0]] + shift_list + [df.index[-1]]
fig, ax = plt.subplots()
for (st, ed) in zip(edges[:-1], edges[1:]):
    ax.plot(df.loc[st:ed, "value"])
plt.show()
# We zoom in around the changepoint to more closely show the data shift. Time
# series segments pre- and post-shift are color-coded.
edges = [pd.to_datetime("10-15-2015")] + shift_list + \
    [pd.to_datetime("11-15-2015")]
fig, ax = plt.subplots()
for (st, ed) in zip(edges[:-1], edges[1:]):
    ax.plot(df.loc[st:ed, "value"])
plt.xticks(rotation=45)
plt.show()
We filter the time series by the detected changepoints, taking the longest
continuous segment free of data shifts, using
pvanalytics.quality.data_shifts.get_longest_shift_segment_dates()
.
The trimmed time series is then plotted.
start_date, end_date = ds.get_longest_shift_segment_dates(df['value'])
df['value'][start_date:end_date].plot()
plt.show()

Total running time of the script: ( 0 minutes 1.071 seconds)
Day-Night Masking¶
This includes examples for identifying day-night periods in time series data.
Day-Night Masking¶
Masking day-night periods using the PVAnalytics daytime module.
Identifying and masking day-night periods in an AC power time series or
irradiance time series can aid in future data analysis, such as detecting
if a time series has daylight savings time or time shifts. Here, we use
pvanalytics.features.daytime.power_or_irradiance()
to mask day/night
periods, as well as to estimate sunrise and sunset times in the data set.
This function is particularly useful for cases where the time zone of a data
stream is unknown or incorrect, as its outputs can be used to determine time
zone.
import pvanalytics
from pvanalytics.features.daytime import power_or_irradiance
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
import pvlib
import numpy as np
First, read in the 1-minute sampled AC power time series data, taken from the SERF East installation on the NREL campus. This sample is provided from the NREL PVDAQ database, and contains a column representing an AC power data stream.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file = pvanalytics_dir / 'data' / 'serf_east_1min_ac_power.csv'
data = pd.read_csv(ac_power_file, index_col=0, parse_dates=True)
data = data.sort_index()
# This is the known frequency of the time series. You may need to infer
# the frequency or set the frequency with your AC power time series.
freq = "1T"
# These are the latitude-longitude coordinates associated with the
# SERF East system.
latitude = 39.742
longitude = -105.173
# Plot the time series.
data['ac_power__752'].plot()
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()

It is critical to set all negative values in the AC power time series to 0
for pvanalytics.features.daytime.power_or_irradiance()
to work
properly. Negative erroneous data may affect daytime mask assignments.
data.loc[data['ac_power__752'] < 0, 'ac_power__752'] = 0
Now, use pvanalytics.features.daytime.power_or_irradiance()
to mask day periods in the time series.
predicted_day_night_mask = power_or_irradiance(series=data['ac_power__752'],
                                               freq=freq)
Function pvlib.solarposition.sun_rise_set_transit_spa()
is
used to get ground-truth sunrise and sunset times for each day at the site
location, and a SPA-daytime mask is calculated based on these times. Data
associated with SPA daytime periods is labeled as True, and data associated
with SPA nighttime periods is labeled as False.
SPA sunrise and sunset times are used here as a point of comparison to the
pvanalytics.features.daytime.power_or_irradiance()
outputs.
SPA-based sunrise and sunset values are not
needed to run pvanalytics.features.daytime.power_or_irradiance()
.
sunrise_sunset_df = pvlib.solarposition.sun_rise_set_transit_spa(data.index,
                                                                 latitude,
                                                                 longitude)
data['sunrise_time'] = sunrise_sunset_df['sunrise']
data['sunset_time'] = sunrise_sunset_df['sunset']
data['daytime_mask'] = True
data.loc[(data.index < data.sunrise_time) |
         (data.index > data.sunset_time), "daytime_mask"] = False
Plot the AC power data stream with the mask output from
pvanalytics.features.daytime.power_or_irradiance()
,
as well as the SPA-calculated sunrise and sunset
data['ac_power__752'].plot()
data.loc[predicted_day_night_mask, 'ac_power__752'].plot(ls='', marker='o')
data.loc[~predicted_day_night_mask, 'ac_power__752'].plot(ls='', marker='o')
sunrise_sunset_times = sunrise_sunset_df[['sunrise',
                                          'sunset']].drop_duplicates()
for sunrise, sunset in sunrise_sunset_times.itertuples(index=False):
    plt.axvline(x=sunrise, c="blue")
    plt.axvline(x=sunset, c="red")
plt.legend(labels=["AC Power", "Daytime", "Nighttime",
                   "SPA Sunrise", "SPA Sunset"])
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()

Compare the predicted mask to the ground-truth SPA mask, to get the model accuracy. Also, compare sunrise and sunset times for the predicted mask compared to the ground truth sunrise and sunset times.
acc = 100 * np.sum(np.equal(data.daytime_mask,
                            predicted_day_night_mask))/len(data.daytime_mask)
print("Overall model prediction accuracy: " + str(round(acc, 2)) + "%")
# Generate predicted + SPA sunrise times for each day
print("Sunrise Comparison:")
print(pd.DataFrame({'predicted_sunrise': predicted_day_night_mask
                    .index[predicted_day_night_mask]
                    .to_series().resample("d").first(),
                    'pvlib_spa_sunrise': sunrise_sunset_df["sunrise"]
                    .resample("d").first()}))
# Generate predicted + SPA sunset times for each day
print("Sunset Comparison:")
print(pd.DataFrame({'predicted_sunset': predicted_day_night_mask
                    .index[predicted_day_night_mask]
                    .to_series().resample("d").last(),
                    'pvlib_spa_sunset': sunrise_sunset_df["sunset"]
                    .resample("d").last()}))
Overall model prediction accuracy: 98.39%
Sunrise Comparison:
predicted_sunrise pvlib_spa_sunrise
measured_on
2022-03-18 00:00:00-07:00 2022-03-18 06:11:00-07:00 2022-03-18 06:07:09.226592-07:00
2022-03-19 00:00:00-07:00 2022-03-19 06:14:00-07:00 2022-03-19 06:05:32.867153920-07:00
Sunset Comparison:
predicted_sunset pvlib_spa_sunset
measured_on
2022-03-18 00:00:00-07:00 2022-03-18 17:56:00-07:00 2022-03-18 06:07:09.226592-07:00
2022-03-19 00:00:00-07:00 2022-03-19 17:52:00-07:00 2022-03-19 06:05:32.867153920-07:00
Total running time of the script: ( 0 minutes 1.195 seconds)
Gaps¶
This includes examples for identifying gaps and other related issues in time series data, including interpolated periods and stale data periods.
Interpolated Data Periods¶
Identifying periods in a time series where the data has been linearly interpolated.
Identifying periods where time series data has been linearly interpolated
and removing these periods may help to reduce noise when performing future
data analysis. This example shows how to use
pvanalytics.quality.gaps.interpolation_diff()
, which identifies and
masks linearly interpolated periods.
import pvanalytics
from pvanalytics.quality import gaps
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we import the AC power data stream that we are going to check for interpolated periods. The time series we download is a normalized AC power time series from the PV Fleets Initiative, and is available via the DuraMAT DataHub: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data. This data set has a Pandas DateTime index, with the min-max normalized AC power time series represented in the ‘value_normalized’ column. There is also an “interpolated_data_mask” column, where interpolated periods are labeled as True, and all other data is labeled as False. The data is sampled at 15-minute intervals.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'ac_power_inv_2173_interpolated_data.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
data = data.asfreq("15T")
data['value_normalized'].plot()
data.loc[data["interpolated_data_mask"], "value_normalized"].plot(ls='',
                                                                  marker='.')
plt.legend(labels=["AC Power", "Interpolated Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Now, we use pvanalytics.quality.gaps.interpolation_diff()
to
identify linearly interpolated periods in the time series. We re-plot
the data with this mask. Please note that nighttime periods generally consist
of repeating 0 values; this means that these periods can be linearly
interpolated. Consequently, these periods are flagged by
pvanalytics.quality.gaps.interpolation_diff()
.
detected_interpolated_data_mask = gaps.interpolation_diff(
    data['value_normalized'])
data['value_normalized'].plot()
data.loc[detected_interpolated_data_mask,
         "value_normalized"].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Detected Interpolated Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 0.797 seconds)
Stale Data Periods¶
Identifying stale data periods in a time series.
Identifying and removing stale, or consecutive repeating, values in time
series data reduces noise when performing data analysis. This example shows
how to use two PVAnalytics functions,
pvanalytics.quality.gaps.stale_values_diff()
and pvanalytics.quality.gaps.stale_values_round()
, to identify
and mask stale data periods in time series data.
import pvanalytics
from pvanalytics.quality import gaps
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we import the AC power data stream that we are going to check for stale data periods. The time series we download is a normalized AC power time series from the PV Fleets Initiative, and is available via the DuraMAT DataHub: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data. This data set has a Pandas DateTime index, with the min-max normalized AC power time series represented in the ‘value_normalized’ column. Additionally, there is a “stale_data_mask” column, where stale periods are labeled as True, and all other data is labeled as False. The data is sampled at 15-minute intervals.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'ac_power_inv_2173_stale_data.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
data = data.asfreq("15T")
data['value_normalized'].plot()
data.loc[data["stale_data_mask"], "value_normalized"].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Inserted Stale Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Now, we use pvanalytics.quality.gaps.stale_values_diff()
to
identify stale values in data. We visualize the detected stale periods
graphically. Please note that nighttime periods generally contain consecutive
repeating 0 values, which are flagged by
pvanalytics.quality.gaps.stale_values_diff()
.
stale_data_mask = gaps.stale_values_diff(data['value_normalized'])
data['value_normalized'].plot()
data.loc[stale_data_mask, "value_normalized"].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Detected Stale Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Now, we use pvanalytics.quality.gaps.stale_values_round()
to identify stale values in the data, using rounded data. This function yields
results similar to pvanalytics.quality.gaps.stale_values_diff(),
except that it looks for consecutive repeating data that has been rounded to
a settable number of decimal places. Please note that nighttime periods
generally contain consecutive repeating 0 values, which are flagged by
pvanalytics.quality.gaps.stale_values_round().
stale_data_round_mask = gaps.stale_values_round(data['value_normalized'])
data['value_normalized'].plot()
data.loc[stale_data_round_mask, "value_normalized"].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Detected Stale Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 1.077 seconds)
Missing Data Periods¶
Identifying days with missing data using a “completeness” score metric.
Identifying days with missing data and filtering these days out reduces noise
when performing data analysis. This example shows how to use a
daily data “completeness” score to identify and filter out days with missing
data. This includes using
pvanalytics.quality.gaps.completeness_score()
,
pvanalytics.quality.gaps.complete()
, and
pvanalytics.quality.gaps.trim_incomplete()
.
import pvanalytics
from pvanalytics.quality import gaps
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we import the AC power data stream that we are going to check for completeness. The time series we download is a normalized AC power time series from the PV Fleets Initiative, and is available via the DuraMAT DataHub: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data. This data set has a Pandas DateTime index, with the min-max normalized AC power time series represented in the ‘value_normalized’ column. The data is sampled at 15-minute intervals. This data set does contain NaN values.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'ac_power_inv_2173.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
data = data.asfreq("15T")
Now, we use pvanalytics.quality.gaps.completeness_score()
to get the
percentage of daily data that isn’t NaN. This percentage score is calculated
as the total number of non-NA values over a 24-hour period, meaning that
nighttime values are expected.
data_completeness_score = gaps.completeness_score(data['value_normalized'])
# Visualize data completeness score as a time series.
data_completeness_score.plot()
plt.xlabel("Date")
plt.ylabel("Daily Completeness Score (Fractional)")
plt.tight_layout()
plt.show()

We mask complete days, based on daily completeness score, using
pvanalytics.quality.gaps.complete()
.
min_completeness = 0.333
daily_completeness_mask = gaps.complete(data['value_normalized'],
                                        minimum_completeness=min_completeness)
# Mask complete days, based on daily completeness score
data_completeness_score.plot()
data_completeness_score.loc[daily_completeness_mask].plot(ls='', marker='.')
data_completeness_score.loc[~daily_completeness_mask].plot(ls='', marker='.')
plt.axhline(y=min_completeness, color='r', linestyle='--')
plt.legend(labels=["Completeness Score", "Threshold met",
                   "Threshold not met", "Completeness Threshold (.33)"],
           loc="upper left")
plt.xlabel("Date")
plt.ylabel("Daily Completeness Score (Fractional)")
plt.tight_layout()
plt.show()

We trim the time series based on the completeness score, where the time
series must have at least 10 consecutive days of data that meet the
completeness threshold. This is done using
pvanalytics.quality.gaps.trim_incomplete()
.
number_consecutive_days = 10
completeness_trim_mask = gaps.trim_incomplete(data['value_normalized'],
                                              days=number_consecutive_days)
# Re-visualize the time series with the data masked by the trim mask
data[completeness_trim_mask]['value_normalized'].plot()
data[~completeness_trim_mask]['value_normalized'].plot()
plt.legend(labels=[True, False],
           title="Daily Data Passing")
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 1.153 seconds)
Irradiance-Quality¶
This includes examples for running irradiance quality checks on irradiance time series data.
Clearsky Limits for Daily Insolation¶
Checking the clearsky limits for daily insolation data.
Identifying and filtering out invalid irradiance data is a
useful way to reduce noise during analysis. In this example,
we use pvanalytics.quality.irradiance.daily_insolation_limits()
to determine when the daily insolation lies between a minimum
and a maximum value. Irradiance measurements and clear-sky
irradiance on each day are integrated with the trapezoid rule
to calculate daily insolation. For this example we will use data
from the RMIS weather system located on the NREL campus
in Colorado, USA.
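The underlying idea can be sketched independently of the library: integrate irradiance over each day with the trapezoid rule and compare the measured daily total to the clear-sky daily total. The sketch below uses synthetic data (one clear day, one overcast day) and mirrors the documented default bounds of 40% and 125%; it is an illustration of the approach, not the library's implementation.

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute GHI for two days: one clear, one heavily overcast.
times = pd.date_range("2024-06-01", periods=2 * 288, freq="5min")
minute_of_day = times.hour * 60 + times.minute
elevation = np.clip(np.sin(2 * np.pi * minute_of_day / 1440 - np.pi / 2), 0, None)
clearsky_ghi = pd.Series(1000.0 * elevation, index=times)
measured_ghi = clearsky_ghi.copy()
measured_ghi.iloc[288:] *= 0.2  # second day is overcast


def daily_insolation(ghi):
    # Trapezoid-rule integral per day; 5-minute steps = 1/12 hour -> Wh/m^2.
    return ghi.resample("D").apply(lambda day: np.trapz(day, dx=1 / 12))


insolation_fraction = daily_insolation(measured_ghi) / daily_insolation(clearsky_ghi)
within_limits = (insolation_fraction >= 0.4) & (insolation_fraction <= 1.25)
print(within_limits.tolist())  # first day passes, second does not
```

The clear day has an insolation fraction near 1 and passes; the overcast day falls below the 40% lower bound and is flagged.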
import pvanalytics
from pvanalytics.quality.irradiance import daily_insolation_limits
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned data. It includes POA, GHI, DNI, DHI, and GNI measurements.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
# Make the datetime index tz-aware.
data.index = data.index.tz_localize("Etc/GMT+7")
Now model clear-sky irradiance for the location and times of the measured data:
location = pvlib.location.Location(39.7407, -105.1686)
clearsky = location.get_clearsky(data.index)
Use pvanalytics.quality.irradiance.daily_insolation_limits()
to identify if the daily insolation lies between a minimum
and a maximum value. Here, we check GHI irradiance field
‘irradiance_ghi__7981’.
pvanalytics.quality.irradiance.daily_insolation_limits()
returns a mask that identifies data that falls between
lower and upper limits. The defaults (used here)
are an upper bound of 125% of clear-sky daily insolation
and a lower bound of 40% of clear-sky daily insolation.
daily_insolation_mask = daily_insolation_limits(data['irradiance_ghi__7981'],
                                                clearsky['ghi'])
Plot the ‘irradiance_ghi__7981’ data stream and its associated clearsky GHI data stream. Mask the GHI time series by its daily_insolation_mask.
data['irradiance_ghi__7981'].plot()
clearsky['ghi'].plot()
data.loc[daily_insolation_mask, 'irradiance_ghi__7981'].plot(ls='', marker='.')
plt.legend(labels=["RMIS GHI", "Clearsky GHI",
                   "Within Daily Insolation Limit"],
           loc="upper left")
plt.xlabel("Date")
plt.ylabel("GHI (W/m^2)")
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 0.385 seconds)
Clearsky Limits for Irradiance Data¶
Checking the clearsky limits of irradiance data.
Identifying and filtering out invalid irradiance data is a
useful way to reduce noise during analysis. In this example,
we use pvanalytics.quality.irradiance.clearsky_limits()
to identify irradiance values that do not exceed
a limit based on a clear-sky model. For this example we will
use GHI data from the RMIS weather system located on the NREL campus in CO.
import pvanalytics
from pvanalytics.quality.irradiance import clearsky_limits
from pvanalytics.features.daytime import power_or_irradiance
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned POA, GHI, DNI, DHI, and GNI measurements, but only the GHI is relevant here.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
freq = '5T'
# Make the datetime index tz-aware.
data.index = data.index.tz_localize("Etc/GMT+7")
Now model clear-sky irradiance for the location and times of the
measured data. You can do this using
pvlib.location.Location.get_clearsky(), with the latitude-longitude
coordinates associated with the RMIS NREL system.
location = pvlib.location.Location(39.7407, -105.1686)
clearsky = location.get_clearsky(data.index)
Use pvanalytics.quality.irradiance.clearsky_limits().
Here, we check the GHI data in field ‘irradiance_ghi__7981’.
pvanalytics.quality.irradiance.clearsky_limits()
returns a mask that identifies data that falls between
lower and upper limits. The defaults (used here)
are an upper bound of 110% of clear-sky GHI and
no lower bound.
clearsky_limit_mask = clearsky_limits(data['irradiance_ghi__7981'],
                                      clearsky['ghi'])
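With the default settings, this check is conceptually a simple elementwise comparison against 110% of modeled clear-sky GHI. A minimal standalone sketch of that comparison (made-up values, not the RMIS data, and not the library's exact implementation):

```python
import pandas as pd

clearsky_ghi = pd.Series([0.0, 400.0, 800.0, 1000.0])
measured_ghi = pd.Series([0.0, 380.0, 950.0, 1010.0])

# Default behavior: flag values at or below 110% of clear-sky GHI,
# with no lower bound applied.
within_limit = measured_ghi <= 1.10 * clearsky_ghi
print(within_limit.tolist())
```

Only the third value (950 W/m^2 against a clear-sky value of 800 W/m^2) exceeds the 110% limit and fails the check.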
Mask nighttime values in the GHI time series using the
pvanalytics.features.daytime.power_or_irradiance()
function. This mask is used below to remove nighttime values
from the GHI time series.
day_night_mask = power_or_irradiance(series=data['irradiance_ghi__7981'],
                                     freq=freq)
Plot the ‘irradiance_ghi__7981’ data stream and its associated clearsky GHI data stream. Mask the GHI time series by its clearsky_limit_mask for daytime periods. Please note that a simple Ineichen model with static monthly turbidities isn’t always accurate, as in this case. Other models that may provide better clear-sky estimates include McClear or PSM3.
data['irradiance_ghi__7981'].plot()
clearsky['ghi'].plot()
data.loc[clearsky_limit_mask & day_night_mask][
    'irradiance_ghi__7981'].plot(ls='', marker='.')
plt.legend(labels=["RMIS GHI", "Clearsky GHI",
                   "Under Clearsky Limit"],
           loc="upper left")
plt.xlabel("Date")
plt.ylabel("GHI (W/m^2)")
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 0.548 seconds)
QCrad Limits for Irradiance Data¶
Test for physical limits on GHI, DHI or DNI using the QCRad criteria.
Identifying and filtering out invalid irradiance data is a
useful way to reduce noise during analysis. In this example,
we use
pvanalytics.quality.irradiance.check_irradiance_limits_qcrad()
to test for physical limits on GHI, DHI or DNI using the QCRad criteria.
For this example we will use data from the RMIS weather system located
on the NREL campus in Colorado, USA.
import pvanalytics
from pvanalytics.quality.irradiance import check_irradiance_limits_qcrad
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned data. It includes POA, GHI, DNI, DHI, and GNI measurements.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
Now generate solar zenith estimates for the location,
based on the data’s time zone and site latitude-longitude
coordinates. This is done using the
pvlib.solarposition.get_solarposition()
function.
latitude = 39.742
longitude = -105.18
time_zone = "Etc/GMT+7"
data = data.tz_localize(time_zone)
solar_position = pvlib.solarposition.get_solarposition(data.index,
                                                       latitude,
                                                       longitude)
Generate the estimated extraterrestrial radiation for the time series,
referred to as dni_extra. This is done using the
pvlib.irradiance.get_extra_radiation()
function.
dni_extra = pvlib.irradiance.get_extra_radiation(data.index)
Use pvanalytics.quality.irradiance.check_irradiance_limits_qcrad()
to generate the QCRAD irradiance limit mask.
qcrad_limit_mask = check_irradiance_limits_qcrad(
    solar_zenith=solar_position['zenith'],
    dni_extra=dni_extra,
    ghi=data['irradiance_ghi__7981'],
    dhi=data['irradiance_dhi__7983'],
    dni=data['irradiance_dni__7982'])
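For intuition, the QCRad "physically possible" upper bound for GHI has the form Sa · 1.5 · cos(zenith)^1.2 + 100, where Sa is the extraterrestrial DNI. The sketch below illustrates just that one bound with made-up values; treat the coefficients as illustrative (the library checks several such bounds for each component):

```python
import numpy as np


def qcrad_ghi_upper_sketch(solar_zenith_deg, dni_extra):
    # Physically-possible GHI upper limit from the QCRad criteria:
    # Sa * 1.5 * cos(zenith)^1.2 + 100, with zenith capped at 90 degrees.
    cos_z = np.cos(np.radians(np.minimum(solar_zenith_deg, 90.0)))
    return dni_extra * 1.5 * np.maximum(cos_z, 0.0) ** 1.2 + 100.0


zenith = np.array([20.0, 60.0, 85.0])      # degrees
sa = 1367.0                                # W/m^2, nominal extraterrestrial DNI
limit = qcrad_ghi_upper_sketch(zenith, sa)
ghi = np.array([900.0, 700.0, 400.0])      # hypothetical measurements
print(ghi <= limit)
```

The 400 W/m^2 reading at 85 degrees zenith exceeds what is physically possible near sunset and fails the check, while the midday readings pass.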
Plot the ‘irradiance_ghi__7981’ data stream with its associated QCRAD limit mask.
data['irradiance_ghi__7981'].plot()
data.loc[qcrad_limit_mask[0], 'irradiance_ghi__7981'].plot(ls='', marker='.')
plt.legend(labels=["RMIS GHI", "Within QCRAD Limits"],
           loc="upper left")
plt.xlabel("Date")
plt.ylabel("GHI (W/m^2)")
plt.tight_layout()
plt.show()

Plot the ‘irradiance_dhi__7983’ data stream with its associated QCRAD limit mask.
data['irradiance_dhi__7983'].plot()
data.loc[qcrad_limit_mask[1], 'irradiance_dhi__7983'].plot(ls='', marker='.')
plt.legend(labels=["RMIS DHI", "Within QCRAD Limits"],
           loc="upper left")
plt.xlabel("Date")
plt.ylabel("DHI (W/m^2)")
plt.tight_layout()
plt.show()

Plot the ‘irradiance_dni__7982’ data stream with its associated QCRAD limit mask.
data['irradiance_dni__7982'].plot()
data.loc[qcrad_limit_mask[2], 'irradiance_dni__7982'].plot(ls='', marker='.')
plt.legend(labels=["RMIS DNI", "Within QCRAD Limits"],
           loc="upper left")
plt.xlabel("Date")
plt.ylabel("DNI (W/m^2)")
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 0.715 seconds)
QCrad Consistency for Irradiance Data¶
Check consistency of GHI, DHI and DNI using QCRad criteria.
Identifying and filtering out invalid irradiance data is a
useful way to reduce noise during analysis. In this example,
we use
pvanalytics.quality.irradiance.check_irradiance_consistency_qcrad()
to check the consistency of GHI, DHI and DNI data using QCRad criteria.
For this example we will use data from the RMIS weather system located
on the NREL campus in Colorado, USA.
import pvanalytics
from pvanalytics.quality.irradiance import check_irradiance_consistency_qcrad
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned data. It includes POA, GHI, DNI, DHI, and GNI measurements.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
Now generate solar zenith estimates for the location, based on the data’s time zone and site latitude-longitude coordinates.
latitude = 39.742
longitude = -105.18
time_zone = "Etc/GMT+7"
data = data.tz_localize(time_zone)
solar_position = pvlib.solarposition.get_solarposition(data.index,
                                                       latitude,
                                                       longitude)
Use
pvanalytics.quality.irradiance.check_irradiance_consistency_qcrad()
to generate the QCRAD consistency mask.
qcrad_consistency_mask = check_irradiance_consistency_qcrad(
    solar_zenith=solar_position['zenith'],
    ghi=data['irradiance_ghi__7981'],
    dhi=data['irradiance_dhi__7983'],
    dni=data['irradiance_dni__7982'])
Plot the GHI, DHI, and DNI data streams with the QCRAD consistency mask overlay. This mask applies to all 3 data streams.
fig = data[['irradiance_ghi__7981', 'irradiance_dhi__7983',
            'irradiance_dni__7982']].plot()
# Highlight periods where the QCRAD consistency mask is True
fig.fill_between(data.index, fig.get_ylim()[0], fig.get_ylim()[1],
                 where=qcrad_consistency_mask[0], alpha=0.4)
fig.legend(labels=["RMIS GHI", "RMIS DHI", "RMIS DNI", "QCRAD Consistent"],
           loc="upper center")
plt.xlabel("Date")
plt.ylabel("Irradiance (W/m^2)")
plt.tight_layout()
plt.show()

Plot the GHI, DHI, and DNI data streams with the diffuse ratio limit mask overlay. This mask is true when the DHI / GHI ratio passes the limit test.
fig = data[['irradiance_ghi__7981', 'irradiance_dhi__7983',
            'irradiance_dni__7982']].plot()
# Highlight periods where the DHI / GHI ratio passes the limit test
fig.fill_between(data.index, fig.get_ylim()[0], fig.get_ylim()[1],
                 where=qcrad_consistency_mask[1], alpha=0.4)
fig.legend(labels=["RMIS GHI", "RMIS DHI", "RMIS DNI",
                   "Within Diffuse Ratio Limit"],
           loc="upper center")
plt.xlabel("Date")
plt.ylabel("Irradiance (W/m^2)")
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 0.751 seconds)
Component Sum Equations for Irradiance Data¶
Estimate GHI, DHI, and DNI using the component sum equations, with nighttime corrections.
Estimating GHI, DHI, and DNI using the component sum equations is useful if the associated field is missing, or as a comparison to an existing physical data stream.
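The component sum relationship underlying these estimates is GHI = DHI + DNI · cos(zenith). A minimal illustration with made-up values (not the RMIS data):

```python
import numpy as np

zenith = np.array([30.0, 60.0, 85.0])   # solar zenith angle, degrees
dni = np.array([800.0, 600.0, 100.0])   # W/m^2
dhi = np.array([100.0, 120.0, 60.0])    # W/m^2

# Component sum equation: GHI = DHI + DNI * cos(zenith)
ghi = dhi + dni * np.cos(np.radians(zenith))
print(np.round(ghi, 1))
```

Rearranging the same equation gives the corresponding DHI and DNI estimates when the other two components are measured.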
import pvanalytics
from pvanalytics.quality.irradiance import calculate_component_sum_series
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned data. It includes POA, GHI, DNI, DHI, and GNI measurements.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
Now generate solar zenith estimates for the location,
based on the data’s time zone and site latitude-longitude
coordinates. This is done using the
pvlib.solarposition.get_solarposition()
function.
latitude = 39.742
longitude = -105.18
time_zone = "Etc/GMT+7"
data = data.tz_localize(time_zone)
solar_position = pvlib.solarposition.get_solarposition(data.index,
                                                       latitude,
                                                       longitude)
Get the clearsky DNI values associated with the current location, using
the pvlib.location.Location.get_clearsky()
method. These clearsky
values are used to calculate DNI data.
site = pvlib.location.Location(latitude, longitude, tz=time_zone)
clearsky = site.get_clearsky(data.index)
Use pvanalytics.quality.irradiance.calculate_component_sum_series()
to estimate GHI measurements using DHI and DNI measurements.
component_sum_ghi = calculate_component_sum_series(
    solar_zenith=solar_position['zenith'],
    dhi=data['irradiance_dhi__7983'],
    dni=data['irradiance_dni__7982'],
    zenith_limit=90,
    fill_night_value='equation')
Plot the ‘irradiance_ghi__7981’ data stream against the estimated component sum GHI, for comparison.
data['irradiance_ghi__7981'].plot()
component_sum_ghi.plot()
plt.legend(labels=["RMIS GHI", "Component Sum GHI"],
           loc="upper left")
plt.xlabel("Date")
plt.ylabel("GHI (W/m^2)")
plt.tight_layout()
plt.show()

Use pvanalytics.quality.irradiance.calculate_component_sum_series()
to estimate DHI measurements using GHI and DNI measurements.
component_sum_dhi = calculate_component_sum_series(
    solar_zenith=solar_position['zenith'],
    dni=data['irradiance_dni__7982'],
    ghi=data['irradiance_ghi__7981'],
    zenith_limit=90,
    fill_night_value='equation')
Plot the ‘irradiance_dhi__7983’ data stream against the estimated component sum DHI, for comparison.
data['irradiance_dhi__7983'].plot()
component_sum_dhi.plot()
plt.legend(labels=["RMIS DHI", "Component Sum DHI"],
           loc="upper left")
plt.xlabel("Date")
plt.ylabel("DHI (W/m^2)")
plt.tight_layout()
plt.show()

Use pvanalytics.quality.irradiance.calculate_component_sum_series()
to estimate DNI measurements using GHI and DHI measurements.
component_sum_dni = calculate_component_sum_series(
    solar_zenith=solar_position['zenith'],
    dhi=data['irradiance_dhi__7983'],
    ghi=data['irradiance_ghi__7981'],
    dni_clear=clearsky['dni'],
    zenith_limit=90,
    fill_night_value='equation')
Plot the ‘irradiance_dni__7982’ data stream against the estimated component sum DNI, for comparison.
data['irradiance_dni__7982'].plot()
component_sum_dni.plot()
plt.legend(labels=["RMIS DNI", "Component Sum DNI"],
           loc="upper left")
plt.xlabel("Date")
plt.ylabel("DNI (W/m^2)")
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 0.762 seconds)
Metrics¶
This includes examples for quantifying system time series metrics, including variability index (VI) and NREL performance ratio (PR).
Calculate Variability Index¶
Calculate the Variability Index for a GHI time series.
Highly variable irradiance can cause mismatch between irradiance and power measurements and result in noisy performance metrics. As such, identifying and removing highly variable conditions is useful in certain analyses. Identification and quantification of highly variable conditions are also of interest in grid integration and hourly modeling error contexts. The variability index (VI) is one way of quantifying the variability or jaggedness of an irradiance signal relative to a corresponding reference clear-sky irradiance profile. Note that quantifying variability is related to but distinct from clear-sky detection. For example, both clear and overcast skies have low VI. This example uses GHI data collected from the NREL RMIS system to calculate the variability index as a time series.
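The VI, as formulated by Stein et al., compares the "path length" of the measured irradiance curve to that of the clear-sky curve over a window. A minimal sketch of that idea on synthetic data (with the time step in minutes, per the original formulation; this is an illustration, not the library's implementation):

```python
import numpy as np


def variability_index_sketch(measured, clearsky, dt_minutes):
    # Ratio of piecewise line lengths of the measured vs. clear-sky curves.
    len_measured = np.sum(np.sqrt(np.diff(measured) ** 2 + dt_minutes ** 2))
    len_clearsky = np.sum(np.sqrt(np.diff(clearsky) ** 2 + dt_minutes ** 2))
    return len_measured / len_clearsky


clearsky = np.array([0.0, 250.0, 500.0, 750.0, 1000.0])
smooth = clearsky * 0.9                                # clear but attenuated
jagged = np.array([0.0, 600.0, 100.0, 900.0, 200.0])  # highly variable

print(variability_index_sketch(smooth, clearsky, 5))
print(variability_index_sketch(jagged, clearsky, 5))
```

The smooth (clear or uniformly overcast) profile yields a VI near or below 1, while the jagged profile yields a VI well above 1.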
import pvanalytics
from pvanalytics.metrics import variability_index
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
import pvlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned POA, GHI, DNI, DHI, and GNI measurements, but only the GHI is relevant here.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
# Make the datetime index tz-aware.
data.index = data.index.tz_localize("Etc/GMT+7")
Now model clear-sky irradiance for the location and times of the
measured data. You can do this using
pvlib.location.Location.get_clearsky(), with the latitude-longitude
coordinates associated with the RMIS NREL system.
location = pvlib.location.Location(39.7407, -105.1686)
clearsky = location.get_clearsky(data.index)
Calculate the variability index for the system GHI data stream using
the pvanalytics.metrics.variability_index()
function, using
an hourly frequency.
variability_index_series = variability_index(data['irradiance_ghi__7981'],
                                             clearsky['ghi'],
                                             freq='1H')
Plot the calculated VI against the underlying GHI measurements, for the purpose of comparison.
fig, axes = plt.subplots(2, 1, sharex=True)
data['irradiance_ghi__7981'].plot(ax=axes[0], label='measured')
clearsky['ghi'].plot(ax=axes[0], label='clear-sky')
variability_index_series.plot(ax=axes[1], drawstyle='steps-post')
axes[0].legend()
axes[0].set_ylabel("GHI [W/m2]")
axes[1].set_ylabel("Variability Index")
fig.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 0.423 seconds)
Calculate Performance Ratio (NREL)¶
Calculate the NREL Performance Ratio for a system.
When evaluating PV system performance it is often desirable to distinguish
uncontrollable effects like weather variation from controllable effects
like soiling and hardware issues. The NREL Performance Ratio
(or “Weather-Corrected Performance Ratio”) is a unitless metric that
normalizes system output for variation in irradiance and temperature,
making it insensitive to uncontrollable weather variation and more
reflective of system health. In this example, we
show how to calculate the NREL PR at two different frequencies: for a
complete time series, and at daily intervals. We use the
pvanalytics.metrics.performance_ratio_nrel()
function.
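In essence, the weather-corrected PR is measured energy divided by expected energy, where the expected output is scaled by irradiance and corrected for cell temperature. The sketch below illustrates that structure with a hypothetical temperature coefficient (`gamma`) and a fixed reference temperature of 25 C; the library's actual method uses a cell-temperature model driven by wind speed and an annual-average cell temperature as the reference, so this is a simplified illustration only:

```python
import numpy as np


def pr_sketch(poa, t_cell, p_ac_kw, p_dc0_kw, gamma=-0.0047, t_ref=25.0):
    # Measured energy over temperature-corrected expected energy.
    # Expected output scales with POA irradiance (relative to 1000 W/m^2)
    # and is derated for cell temperature above the reference.
    expected_kw = p_dc0_kw * (poa / 1000.0) * (1 + gamma * (t_cell - t_ref))
    return np.sum(p_ac_kw) / np.sum(expected_kw)


poa = np.array([200.0, 600.0, 1000.0])   # W/m^2 (hypothetical values)
t_cell = np.array([20.0, 35.0, 50.0])    # deg C
p_ac = np.array([32.0, 95.0, 150.0])     # kW
print(round(pr_sketch(poa, t_cell, p_ac, p_dc0_kw=204.12), 3))
```

Because both numerator and denominator are energies, the result is unitless and comparable across weather conditions.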
import pvanalytics
from pvanalytics.metrics import performance_ratio_nrel
import pandas as pd
import pathlib
import matplotlib.pyplot as plt
First, we read in data from the NREL RSF II system. This data set contains 15-minute interval data for AC power, POA irradiance, ambient temperature, and wind speed, among others. The complete data set for the NREL RSF II installation is available in the PVDAQ database, under system ID 1283.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'nrel_RSF_II.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
Now we calculate the PR for the entire time series, using the
POA, ambient temperature, wind speed, and AC power fields. We use this
data as parameters in the
pvanalytics.metrics.performance_ratio_nrel()
function.
In this example we are calculating PR for a single inverter connected
to a 204.12 kW PV array.
pr_whole_series = performance_ratio_nrel(data['poa_irradiance__1055'],
                                         data['ambient_temp__1053'],
                                         data['wind_speed__1051'],
                                         data['inv2_ac_power_w__1047']/1000,
                                         204.12)
print("RSF II, PR for the whole time series:")
print(pr_whole_series)
RSF II, PR for the whole time series:
0.5851958594021633
Next, we recalculate the PR on a daily basis. We separate the time series into daily intervals, and calculate the PR for each day. Note that this inverter was offline for the last day in this dataset, resulting in a PR value of zero for that day.
dates = list(pd.Series(data.index.date).drop_duplicates())
daily_pr_list = list()
for date in dates:
    data_subset = data[data.index.date == date]
    # Run the PR calculation for the specific day.
    pr = performance_ratio_nrel(data_subset['poa_irradiance__1055'],
                                data_subset['ambient_temp__1053'],
                                data_subset['wind_speed__1051'],
                                data_subset['inv2_ac_power_w__1047']/1000,
                                204.12)
    daily_pr_list.append({"date": date,
                          "PR": pr})
daily_pr_df = pd.DataFrame(daily_pr_list)
# Plot the PR time series to visualize it
daily_pr_df.set_index('date').plot()
plt.axhline(pr_whole_series, color='r', ls='--', label='PR, Entire Series')
plt.xticks(rotation=25)
plt.legend()
plt.ylabel('NREL PR')
plt.xlabel('Date')
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 0.257 seconds)
Orientation¶
This includes examples related to the orientation of a system (fixed-tilt, tracking).
Flag Sunny Days for a Fixed-Tilt System¶
Flag sunny days for a fixed-tilt PV system.
Identifying and masking sunny days for a fixed-tilt system is important when performing future analyses that require filtered sunny day data. For this example we will use data from the fixed-tilt NREL SERF East system located on the NREL campus in Colorado, USA, and generate a sunny day mask. This data set is publicly available via the PVDAQ database in the DOE Open Energy Data Initiative (OEDI) (https://data.openei.org/submissions/4568), as system ID 50. This data is timezone-localized.
import pvanalytics
from pvanalytics.features import daytime as day
from pvanalytics.features.orientation import fixed_nrel
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the NREL SERF East fixed-tilt system. This data set contains 15-minute interval AC power data.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'serf_east_15min_ac_power.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
Mask day-night periods using the
pvanalytics.features.daytime.power_or_irradiance()
function.
Then apply pvanalytics.features.orientation.fixed_nrel()
to the AC power stream and mask the sunny days in the time series.
daytime_mask = day.power_or_irradiance(data['ac_power'])
fixed_sunny_days = fixed_nrel(data['ac_power'],
                              daytime_mask)
Plot the AC power stream with the sunny day mask applied to it.
data['ac_power'].plot()
data.loc[fixed_sunny_days, 'ac_power'].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Sunny Day"],
           loc="upper left")
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 1.369 seconds)
Flag Sunny Days for a Tracking System¶
Flag sunny days for a single-axis tracking PV system.
Identifying and masking sunny days for a single-axis tracking system is important when performing future analyses that require filtered sunny day data. For this example we will use data from the single-axis tracking NREL Mesa system located on the NREL campus in Colorado, USA, and generate a sunny day mask. This data set is publicly available via the PVDAQ database in the DOE Open Energy Data Initiative (OEDI) (https://data.openei.org/submissions/4568), as system ID 50. This data is timezone-localized.
import pvanalytics
from pvanalytics.features import daytime as day
from pvanalytics.features.orientation import tracking_nrel
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the NREL Mesa 1-axis tracking system. This data set contains 15-minute interval AC power data.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'nrel_1axis_tracker_mesa_ac_power.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
Mask day-night periods using the
pvanalytics.features.daytime.power_or_irradiance()
function.
Then apply pvanalytics.features.orientation.tracking_nrel()
to the AC power stream and mask the sunny days in the time series.
daytime_mask = day.power_or_irradiance(data['ac_power'])
tracking_sunny_days = tracking_nrel(data['ac_power'],
                                    daytime_mask)
Plot the AC power stream with the sunny day mask applied to it.
data['ac_power'].plot()
data.loc[tracking_sunny_days, 'ac_power'].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Sunny Day"],
           loc="upper left")
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 0.900 seconds)
Outliers¶
This includes examples for identifying outliers in time series data.
Z-Score Outlier Detection¶
Identifying outliers in time series using z-score outlier detection.
Identifying and removing outliers from PV sensor time series
data allows for more accurate data analysis.
In this example, we demonstrate how to use
pvanalytics.quality.outliers.zscore()
to identify and filter
out outliers in a time series.
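At its core, a z-score filter standardizes each point by the series mean and standard deviation and flags points whose absolute z-score exceeds a threshold. A minimal standalone sketch on synthetic data (with an illustrative threshold of 2; the library's function and its defaults may differ):

```python
import numpy as np


def zscore_sketch(values, threshold=3.0):
    # Standardize by the series mean and standard deviation,
    # then flag points whose |z| exceeds the threshold.
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold


data = np.array([1.0, 1.1, 0.9, 1.0, 1.05, 8.0, 0.95, 1.0])
print(zscore_sketch(data, threshold=2.0))
```

Only the injected spike at 8.0 is flagged; the remaining points cluster near the mean.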
import pvanalytics
from pvanalytics.quality.outliers import zscore
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we read in the ac_power_inv_7539_outliers example. Min-max normalized AC power is represented by the “value_normalized” column. There is a boolean column “outlier” where inserted outliers are labeled as True, and all other values are labeled as False. These outlier values were inserted manually into the data set to illustrate outlier detection by each of the functions. We use a normalized time series example provided by the PV Fleets Initiative. This example is adapted from the DuraMAT DataHub clipping data set: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file = pvanalytics_dir / 'data' / 'ac_power_inv_7539_outliers.csv'
data = pd.read_csv(ac_power_file, index_col=0, parse_dates=True)
print(data.head(10))
                           value_normalized  outlier
timestamp
2017-04-10 19:15:00+00:00          0.000002    False
2017-04-10 19:30:00+00:00          0.000000    False
2017-04-11 06:15:00+00:00          0.000000    False
2017-04-11 06:45:00+00:00          0.033103    False
2017-04-11 07:00:00+00:00          0.043992    False
2017-04-11 07:15:00+00:00          0.055615    False
2017-04-11 07:30:00+00:00          0.110986    False
2017-04-11 07:45:00+00:00          0.184948    False
2017-04-11 08:00:00+00:00          0.276810    False
2017-04-11 08:15:00+00:00          0.358061    False
We then use pvanalytics.quality.outliers.zscore()
to identify
outliers in the time series, and plot the data with the z-score outlier mask.
zscore_outlier_mask = zscore(data=data['value_normalized'])
data['value_normalized'].plot()
data.loc[zscore_outlier_mask, 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 0.213 seconds)
Tukey Outlier Detection¶
Identifying outliers in time series using Tukey outlier detection.
Identifying and removing outliers from PV sensor time series
data allows for more accurate data analysis.
In this example, we demonstrate how to use
pvanalytics.quality.outliers.tukey()
to identify and filter
out outliers in a time series.
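The Tukey rule flags points outside the fences [Q1 − k·IQR, Q3 + k·IQR], where Q1 and Q3 are the first and third quartiles and IQR is their difference. A minimal standalone sketch on synthetic data (using the classic k=1.5; the example below passes k=0.5 to the library function, which tightens the fences):

```python
import numpy as np


def tukey_sketch(values, k=1.5):
    # Flag points outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


data = np.array([1.0, 1.2, 0.8, 1.1, 0.9, 6.0, 1.0, -3.0])
print(tukey_sketch(data))
```

The high spike at 6.0 and the low spike at -3.0 both fall outside the fences and are flagged.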
import pvanalytics
from pvanalytics.quality.outliers import tukey
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we read in the ac_power_inv_7539_outliers example. Min-max normalized AC power is represented by the “value_normalized” column. There is a boolean column “outlier” where inserted outliers are labeled as True, and all other values are labeled as False. These outlier values were inserted manually into the data set to illustrate outlier detection by each of the functions. We use a normalized time series example provided by the PV Fleets Initiative. This example is adapted from the DuraMAT DataHub clipping data set: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file_1 = pvanalytics_dir / 'data' / 'ac_power_inv_7539_outliers.csv'
data = pd.read_csv(ac_power_file_1, index_col=0, parse_dates=True)
print(data.head(10))
                           value_normalized  outlier
timestamp
2017-04-10 19:15:00+00:00          0.000002    False
2017-04-10 19:30:00+00:00          0.000000    False
2017-04-11 06:15:00+00:00          0.000000    False
2017-04-11 06:45:00+00:00          0.033103    False
2017-04-11 07:00:00+00:00          0.043992    False
2017-04-11 07:15:00+00:00          0.055615    False
2017-04-11 07:30:00+00:00          0.110986    False
2017-04-11 07:45:00+00:00          0.184948    False
2017-04-11 08:00:00+00:00          0.276810    False
2017-04-11 08:15:00+00:00          0.358061    False
We then use pvanalytics.quality.outliers.tukey()
to identify
outliers in the time series, and plot the data with the tukey outlier mask.
tukey_outlier_mask = tukey(data=data['value_normalized'],
                           k=0.5)
data['value_normalized'].plot()
data.loc[tukey_outlier_mask, 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 0.205 seconds)
Hampel Outlier Detection¶
Identifying outliers in time series using Hampel outlier detection.
Identifying and removing outliers from PV sensor time series
data allows for more accurate data analysis.
In this example, we demonstrate how to use
pvanalytics.quality.outliers.hampel()
to identify and filter
out outliers in a time series.
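A Hampel filter compares each point to a rolling median and flags points more than a few scaled median absolute deviations (MADs) away. The sketch below is a rough approximation of that idea on synthetic data (it estimates the MAD as a rolling median of residuals, which is not exactly how the library computes it):

```python
import numpy as np
import pandas as pd


def hampel_sketch(series, window=5, n_sigma=3.0):
    # Rolling median, and a rolling estimate of the scaled MAD of the
    # residuals; 1.4826 makes the MAD comparable to a standard deviation.
    med = series.rolling(window, center=True, min_periods=1).median()
    residual = (series - med).abs()
    mad = residual.rolling(window, center=True, min_periods=1).median()
    return residual > n_sigma * 1.4826 * mad


data = pd.Series([1.0, 1.1, 0.9, 1.0, 9.0, 1.05, 0.95, 1.0, 1.1])
print(hampel_sketch(data).tolist())
```

Because the median is robust, the spike at 9.0 barely moves the local median and stands out clearly against the local MAD.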
import pvanalytics
from pvanalytics.quality.outliers import hampel
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we read in the ac_power_inv_7539_outliers example. Min-max normalized AC power is represented by the “value_normalized” column. There is a boolean column “outlier” where inserted outliers are labeled as True, and all other values are labeled as False. These outlier values were inserted manually into the data set to illustrate outlier detection by each of the functions. We use a normalized time series example provided by the PV Fleets Initiative. This example is adapted from the DuraMAT DataHub clipping data set: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file_1 = pvanalytics_dir / 'data' / 'ac_power_inv_7539_outliers.csv'
data = pd.read_csv(ac_power_file_1, index_col=0, parse_dates=True)
print(data.head(10))
                           value_normalized  outlier
timestamp
2017-04-10 19:15:00+00:00          0.000002    False
2017-04-10 19:30:00+00:00          0.000000    False
2017-04-11 06:15:00+00:00          0.000000    False
2017-04-11 06:45:00+00:00          0.033103    False
2017-04-11 07:00:00+00:00          0.043992    False
2017-04-11 07:15:00+00:00          0.055615    False
2017-04-11 07:30:00+00:00          0.110986    False
2017-04-11 07:45:00+00:00          0.184948    False
2017-04-11 08:00:00+00:00          0.276810    False
2017-04-11 08:15:00+00:00          0.358061    False
We then use pvanalytics.quality.outliers.hampel()
to identify
outliers in the time series, and plot the data with the hampel outlier mask.
hampel_outlier_mask = hampel(data=data['value_normalized'],
                             window=10)
data['value_normalized'].plot()
data.loc[hampel_outlier_mask, 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 0.320 seconds)
System¶
This includes examples for system parameter estimation, including azimuth and tilt estimation, and determination if the system is fixed tilt or tracking.
Detect if a System is Tracking¶
Identifying if a system is tracking or fixed tilt.
It is valuable to identify if a system is fixed tilt or tracking for
future analysis. This example shows how to use
pvanalytics.system.is_tracking_envelope()
to determine if a
system is tracking or not by fitting data to a maximum power or
irradiance envelope, and fitting this envelope to quadratic and
quartic curves. The r^2 output from these fits is used to determine
if the system fits a tracking or fixed-tilt profile.
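The intuition behind the quadratic-vs-quartic comparison: a fixed-tilt system's clear-day power envelope is bell-shaped and close to a parabola, while a tracker's envelope is flat-topped and needs a higher-order curve. A rough standalone illustration on synthetic daily profiles (not the library's actual fitting procedure):

```python
import numpy as np


def r_squared(y, y_fit):
    # Coefficient of determination for a fitted curve.
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot


hours = np.linspace(6, 18, 49)
bell = np.sin(np.pi * (hours - 6) / 12)
fixed_profile = bell                              # bell-shaped daily envelope
tracking_profile = np.minimum(1.6 * bell, 1.0)    # flat-topped daily envelope

results = {}
for name, profile in [("fixed", fixed_profile), ("tracking", tracking_profile)]:
    results[name] = {
        deg: r_squared(profile, np.polyval(np.polyfit(hours, profile, deg), hours))
        for deg in (2, 4)
    }
print(results)
```

The bell-shaped profile is already well explained by a quadratic, while the flat-topped profile fits a quadratic noticeably worse and gains much more from the quartic terms; comparing these fit qualities is the basis for the tracking/fixed classification.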
import pvanalytics
from pvanalytics.system import is_tracking_envelope
from pvanalytics.features.clipping import geometric
from pvanalytics.features.daytime import power_or_irradiance
import pandas as pd
import pathlib
import matplotlib.pyplot as plt
First, we import an AC power data stream from the SERF East site located at NREL. This data set is publicly available via the PVDAQ database in the DOE Open Energy Data Initiative (OEDI) (https://data.openei.org/submissions/4568), under system ID 50. This data is timezone-localized. This particular data stream is associated with a fixed-tilt system.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file = pvanalytics_dir / 'data' / \
    'serf_east_15min_ac_power.csv'
data = pd.read_csv(ac_power_file, index_col=0, parse_dates=True)
data = data.sort_index()
time_series = data['ac_power']
time_series = time_series.asfreq('15T')
# Plot the first few days of the time series to visualize it
time_series[:pd.to_datetime("2016-07-06 00:00:00-07:00")].plot()
plt.show()

Run the clipping and the daytime filters on the time series. Both of these masks will be used as inputs to the pvanalytics.system.is_tracking_envelope() function.
# Generate the daylight mask for the AC power time series
daytime_mask = power_or_irradiance(time_series)
# Generate the clipping mask for the time series
clipping_mask = geometric(time_series)
Now, we use pvanalytics.system.is_tracking_envelope() to identify if the data stream is associated with a tracking or fixed-tilt system.
predicted_mounting_config = is_tracking_envelope(time_series,
                                                 daytime_mask,
                                                 clipping_mask)
print("Estimated mounting configuration: " + predicted_mounting_config.name)
Estimated mounting configuration: FIXED
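The quadratic-versus-quartic idea can be illustrated with a toy sketch (a conceptual simplification, not the actual logic of is_tracking_envelope()): a fixed-tilt daily profile is roughly bell-shaped and well described by a quadratic, while a tracker's flat-topped profile needs the extra flexibility of a quartic. The 0.05 decision threshold below is an illustrative assumption.

```python
import numpy as np

# Synthetic daily envelopes (arbitrary units): a bell-shaped
# fixed-tilt profile and a flat-topped tracking profile.
hours = np.linspace(6, 18, 49)
fixed_profile = -(hours - 12) ** 2 + 36
tracking_profile = 30 - 0.01 * (hours - 12) ** 4

def r_squared(y, degree):
    # Coefficient of determination for a polynomial fit of `degree`
    fit = np.polyval(np.polyfit(hours, y, degree), hours)
    return 1 - np.sum((y - fit) ** 2) / np.sum((y - y.mean()) ** 2)

for name, y in [('fixed', fixed_profile), ('tracking', tracking_profile)]:
    # A large r^2 gain from quadratic to quartic suggests a flat top
    gain = r_squared(y, 4) - r_squared(y, 2)
    label = 'tracking' if gain > 0.05 else 'fixed tilt'
    print(f"{name} profile classified as {label}")
```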
Total running time of the script: ( 0 minutes 1.197 seconds)
Infer Array Tilt/Azimuth - PVWatts Method¶
Infer the azimuth and tilt of a system using PVWatts-based methods
Identifying and/or validating the azimuth and tilt information for a system is important, as these values must be correct for degradation and system yield analysis. This example shows how to use pvanalytics.system.infer_orientation_fit_pvwatts() to estimate a fixed-tilt system's azimuth and tilt, using the system's known latitude-longitude coordinates and an associated AC power time series.
import pvanalytics
import matplotlib.pyplot as plt
from pvanalytics import system
import pandas as pd
import pathlib
import pvlib
First, we import an AC power data stream from the SERF East site located at NREL. This data set is publicly available via the PVDAQ database in the DOE Open Energy Data Initiative (OEDI) (https://data.openei.org/submissions/4568), under system ID 50. This data is timezone-localized.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file = pvanalytics_dir / 'data' / 'serf_east_15min_ac_power.csv'
data = pd.read_csv(ac_power_file, index_col=0, parse_dates=True)
data = data.sort_index()
time_series = data['ac_power']
time_series = time_series.asfreq('15T')
# Plot the first few days of the time series to visualize it
time_series[:pd.to_datetime("2016-07-06 00:00:00-07:00")].plot()
plt.show()
# Outline the ground truth metadata associated with the system
latitude = 39.742
longitude = -105.1727
actual_azimuth = 158
actual_tilt = 45

Next, we import the PSM3 data generated via the pvlib.iotools.get_psm3() function, using the site's latitude-longitude coordinates. To generate the PSM3 data, you must first register for NREL's NSRDB API at the following link: https://developer.nrel.gov/signup/. PSM3 data can then be retrieved using pvlib.iotools.get_psm3(). The PSM3 data has been resampled to 15-minute intervals to match the AC power data.
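The resampling step mentioned above can be sketched with plain pandas. PSM3 files are natively 30- or 60-minute, so upsampling with linear interpolation is one reasonable approach (the shipped example file may have been prepared differently).

```python
import pandas as pd

# Hypothetical 30-minute PSM3-like series, upsampled to the AC power
# data's 15-minute grid by linear interpolation.
ghi_30min = pd.Series(
    [0.0, 100.0, 400.0],
    index=pd.date_range('2021-06-01 06:00', periods=3, freq='30T'))
ghi_15min = ghi_30min.resample('15T').interpolate()
print(ghi_15min.tolist())  # -> [0.0, 50.0, 100.0, 250.0, 400.0]
```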
psm3_file = pvanalytics_dir / 'data' / 'serf_east_psm3_data.csv'
psm3 = pd.read_csv(psm3_file, index_col=0, parse_dates=True)
Filter the PSM3 data to only include clear-sky periods.
is_clear = (psm3.ghi_clear == psm3.ghi)
is_daytime = (psm3.ghi > 0)
time_series_clearsky = time_series[is_clear & is_daytime]
time_series_clearsky = time_series_clearsky.dropna()
psm3_clearsky = psm3.loc[time_series_clearsky.index]
# Get solar azimuth and zenith from pvlib, based on
# lat-long coords
solpos_clearsky = pvlib.solarposition.get_solarposition(
    time_series_clearsky.index, latitude, longitude)
Run the pvlib data and the sensor-based time series data through the pvanalytics.system.infer_orientation_fit_pvwatts() function.
best_tilt, best_azimuth, r2 = system.infer_orientation_fit_pvwatts(
    time_series_clearsky,
    psm3_clearsky.ghi_clear,
    psm3_clearsky.dhi_clear,
    psm3_clearsky.dni_clear,
    solpos_clearsky.zenith,
    solpos_clearsky.azimuth,
    temperature=psm3_clearsky.temp_air,
)
# Compare actual system azimuth and tilt to predicted azimuth and tilt
print("Actual Azimuth: " + str(actual_azimuth))
print("Predicted Azimuth: " + str(best_azimuth))
print("Actual Tilt: " + str(actual_tilt))
print("Predicted Tilt: " + str(best_tilt))
Actual Azimuth: 158
Predicted Azimuth: 161.07038200206492
Actual Tilt: 45
Predicted Tilt: 45.37814701121294
Total running time of the script: ( 0 minutes 0.878 seconds)
Weather¶
This includes examples for weather quality checks.
Weather Limits¶
Identifying weather values that are within limits.
Identifying weather values that are within logical, expected limits
and filtering data outside of these limits allows for more accurate
future data analysis.
In this example, we demonstrate how to use pvanalytics.quality.weather.wind_limits(), pvanalytics.quality.weather.temperature_limits(), and pvanalytics.quality.weather.relative_humidity_limits() to identify and filter out values that are not within expected limits, for wind speed, ambient temperature, and relative humidity, respectively.
import pvanalytics
from pvanalytics.quality.weather import wind_limits, \
    temperature_limits, relative_humidity_limits
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we read in the NREL RMIS weather station example, which contains wind speed, temperature, and relative humidity data in m/s, deg C, and % respectively. This data set contains 5-minute right-aligned measurements.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'rmis_weather_data.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
print(data.head(10))
Ambient Temperature ... Wind Speed
2022-01-01 00:05:00 -10.59725 ... 1.930175
2022-01-01 00:10:00 -10.63128 ... 2.167881
2022-01-01 00:15:00 -10.66532 ... 0.827218
2022-01-01 00:20:00 -10.71636 ... 0.608528
2022-01-01 00:25:00 -10.66532 ... 1.036399
2022-01-01 00:30:00 -10.81846 ... 1.150498
2022-01-01 00:35:00 -10.78443 ... 1.958699
2022-01-01 00:40:00 -10.86950 ... 0.941317
2022-01-01 00:45:00 -10.88652 ... 1.511812
2022-01-01 00:50:00 -10.85249 ... 0.047541
[10 rows x 12 columns]
Next, we use pvanalytics.quality.weather.wind_limits() to identify any wind speed values that are not within an acceptable range. We can then filter any of these values out of the time series.
wind_limit_mask = wind_limits(data['Wind Speed'])
data['Wind Speed'].plot()
data.loc[~wind_limit_mask, 'Wind Speed'].plot(ls='', marker='o')
plt.legend(labels=["Wind Speed", "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel("Wind Speed (m/s)")
plt.xticks(rotation=25)
plt.tight_layout()
plt.show()
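Under the hood this kind of check is a simple range test. A minimal pandas sketch looks like the following; the (0, 50) m/s bounds are an illustrative assumption, not necessarily the function's actual defaults.

```python
import pandas as pd

# Range test sketch: True where the wind speed is physically plausible.
wind_speed = pd.Series([1.2, 3.4, -0.5, 120.0, 7.8])
mask = wind_speed.between(0, 50)
print(mask.tolist())  # -> [True, True, False, False, True]
filtered = wind_speed[mask]  # keep only plausible values
```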

Next, we use pvanalytics.quality.weather.temperature_limits() to identify any air temperature values that are not within an acceptable range. We can then filter any of these values out of the time series. Here, we set the temperature limits to (-10, 10), illustrating how to use the limits parameter.
temperature_limit_mask = temperature_limits(data['Ambient Temperature'],
                                            limits=(-10, 10))
data['Ambient Temperature'].plot()
data.loc[~temperature_limit_mask, 'Ambient Temperature'].plot(ls='',
                                                              marker='o')
plt.legend(labels=["Ambient Temperature", "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel("Ambient Temperature (deg C)")
plt.xticks(rotation=25)
plt.tight_layout()
plt.show()

Finally, we use pvanalytics.quality.weather.relative_humidity_limits() to identify any relative humidity values that are not within an acceptable range. We can then filter any of these values out of the time series.
rh_limit_mask = relative_humidity_limits(data['Relative Humidity'])
data['Relative Humidity'].plot()
data.loc[~rh_limit_mask, 'Relative Humidity'].plot(ls='', marker='o')
plt.legend(labels=['Relative Humidity', "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel('Relative Humidity (%)')
plt.xticks(rotation=25)
plt.tight_layout()
plt.show()

Total running time of the script: ( 0 minutes 0.704 seconds)
Module Temperature Check¶
Test whether the module temperature is correlated with irradiance.
Testing the correlation between module temperature and irradiance
measurements can help identify if there are issues with the module
temperature sensor.
In this example, we demonstrate how to use pvanalytics.quality.weather.module_temperature_check(), which runs a linear regression model of module temperature vs. irradiance. Model performance is then assessed by the Pearson correlation coefficient. If it meets a minimum threshold, the function returns True; otherwise, it returns False.
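The check described above can be sketched on synthetic data: module temperature rises roughly linearly with irradiance, so the Pearson r of a linear regression should be high for a healthy sensor. The 0.5 threshold below is an assumption for illustration; see module_temperature_check() for its actual default.

```python
import numpy as np
from scipy.stats import linregress

# Synthetic, roughly linear temperature-irradiance relationship.
irradiance = np.array([0.0, 200.0, 400.0, 600.0, 800.0, 1000.0])
noise = np.array([0.5, -0.3, 0.2, -0.4, 0.1, 0.0])
module_temp = 25.0 + 0.03 * irradiance + noise
# Regress module temperature on irradiance and threshold Pearson r
reg = linregress(irradiance, module_temp)
passes_check = reg.rvalue > 0.5
print(passes_check)  # -> True
```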
import pvanalytics
from pvanalytics.quality.weather import module_temperature_check
from pvanalytics.features.daytime import power_or_irradiance
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import linregress
import pathlib
First, we read in example data from the NREL SERF West system, which contains data for module temperature and irradiance under the ‘module_temp_1__781’ and ‘poa_irradiance__771’ columns, respectively. This data set contains 15-minute averaged measurements, and is available via the NREL PVDAQ database as system 51.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
serf_west_file = pvanalytics_dir / 'data' / 'serf_west_15min.csv'
data = pd.read_csv(serf_west_file, index_col=0, parse_dates=True)
print(data[['module_temp_1__781', 'poa_irradiance__771']].head(10))
module_temp_1__781 poa_irradiance__771
2022-01-02 00:01:00 -6.4187 -1.9775
2022-01-02 00:16:00 -6.2204 -2.0451
2022-01-02 00:31:00 -6.1505 -2.1464
2022-01-02 00:46:00 -5.9059 -2.1463
2022-01-02 01:01:00 -5.7022 -1.8083
2022-01-02 01:16:00 -5.4755 -1.9941
2022-01-02 01:31:00 -5.3714 -2.0109
2022-01-02 01:46:00 -4.4072 -1.7236
2022-01-02 02:01:00 -4.1609 -1.6221
2022-01-02 02:16:00 -3.9174 -1.6390
Plot the module temperature to visualize it.
data['module_temp_1__781'].plot()
plt.xlabel("Date")
plt.ylabel("Module Temperature (deg C)")
plt.xticks(rotation=25)
plt.tight_layout()
plt.show()

Plot the POA irradiance to visualize it.
data['poa_irradiance__771'].plot()
plt.xlabel("Date")
plt.ylabel("POA irradiance (W/m^2)")
plt.xticks(rotation=25)
plt.tight_layout()
plt.show()

We mask the irradiance time series into day-night periods, and remove any nighttime data to clean up the future regression.
predicted_day_night_mask = power_or_irradiance(
    series=data['poa_irradiance__771'], freq='15T')
# Filter out nighttime periods
data = data[predicted_day_night_mask]
We then use pvanalytics.quality.weather.module_temperature_check() to regress module temperature against POA irradiance, and check if the relationship meets the minimum correlation coefficient criterion.
corr_coeff_bool = module_temperature_check(data['module_temp_1__781'],
                                           data['poa_irradiance__771'])
print("Passes correlation coeff threshold? " + str(corr_coeff_bool))
Passes correlation coeff threshold? True
We then plot module temperature against irradiance to illustrate the relationship.
data.plot(x='module_temp_1__781',
          y='poa_irradiance__771',
          style='o', legend=None)
data_reg = data[['module_temp_1__781', 'poa_irradiance__771']].dropna()
# Add the linear regression line
reg = linregress(data_reg['module_temp_1__781'].values,
                 data_reg['poa_irradiance__771'].values)
plt.axline(xy1=(0, reg.intercept), slope=reg.slope, linestyle="--", color="k")
plt.xlabel("Module Temperature (deg C)")
plt.ylabel("POA irradiance (W/m^2)")
plt.xticks(rotation=25)
plt.tight_layout()
plt.show()
# Print the Pearson correlation coefficient associated with the regression.
print("Pearson Correlation Coefficient: ")
print(reg.rvalue)

Pearson Correlation Coefficient:
0.6836377995026489
Total running time of the script: ( 0 minutes 0.537 seconds)
Release Notes¶
These are the bug-fixes, new features, and improvements for each release.
0.1.3 (December 16, 2022)¶
Enhancements¶
Added function calculate_component_sum_series() for calculating the component sum values of GHI, DHI, and DNI, and performing nighttime corrections (GH157, GH163)
Updated the stale_values_round() function with pandas functionality, leading to the same results with a 300X speedup. (GH156, GH158)
Documentation¶
Added new gallery example pages.
Clarified parameter descriptions for pdc0 and pac in performance_ratio_nrel() (GH152, GH162).
Restructured the example gallery by separating the examples into categories and adding READMEs (GH154, GH155).
Contributors¶
Kirsten Perry (@kperrynrel)
Cliff Hansen (@cwhanse)
Josh Peterson (@PetersonUOregon)
Adam R. Jensen (@adamrjensen)
Will Holmgren (@wholmgren)
Kevin Anderson (@kanderso-nrel)
0.1.2 (August 18, 2022)¶
Enhancements¶
Detect data shifts in daily summed time series with pvanalytics.quality.data_shifts.detect_data_shifts() and pvanalytics.quality.data_shifts.get_longest_shift_segment_dates(). (GH142)
Bug Fixes¶
Fix pvanalytics.quality.outliers.zscore() so that the NaN mask is assigned the time series index (GH138)
Documentation¶
Added fifteen new gallery example pages, including pages for pvanalytics.quality.data_shifts (GH131).
Other¶
Removed empty modules pvanalytics.filtering and pvanalytics.fitting until the relevant functionality is added to the package. (GH145)
Contributors¶
Kirsten Perry (@kperrynrel)
Cliff Hansen (@cwhanse)
Kevin Anderson (@kanderso-nrel)
Will Vining (@wfvining)
0.1.1 (February 18, 2022)¶
Enhancements¶
Quantification of irradiance variability with pvanalytics.metrics.variability_index(). (GH60, GH106)
Internal refactor of pvanalytics.metrics.performance_ratio_nrel() to support other performance ratio formulas. (GH109)
Detect shadows from fixed objects in GHI data using pvanalytics.features.shading.fixed(). (GH24, GH101)
Bug Fixes¶
Added nan_policy parameter to zscore calculation in pvanalytics.quality.outliers.zscore(). (GH102, GH108)
Prohibit pandas versions in the 1.1.x series to avoid an issue in .groupby().rolling(). Newer versions starting in 1.2.0 and older versions going back to 0.24.0 are still allowed. (GH82, GH118)
Fixed an issue with pvanalytics.features.clearsky.reno() in recent pandas versions (GH125, GH128)
Improved convergence in pvanalytics.features.orientation.fixed_nrel() (GH119, GH120)
Requirements¶
Drop support for python 3.6, which reached end of life Dec 2021 (GH129)
Documentation¶
Started an example gallery and added an example for pvanalytics.features.clearsky.reno() (GH125, GH127)
Contributors¶
Kevin Anderson (@kanderso-nrel)
Cliff Hansen (@cwhanse)
Will Vining (@wfvining)
Kirsten Perry (@kperrynrel)
Michael Hopwood (@MichaelHopwood)
Carlos Silva (@camsilva)
Ben Taylor (@bt-)
0.1.0 (November 20, 2020)¶
This is the first release of PVAnalytics. As such, the list of “changes” below is not specific. Future releases will describe specific changes here along with references to the relevant github issue and pull requests.
API Changes¶
Enhancements¶
Quality control functions for irradiance, weather, and time series data. See pvanalytics.quality for content.
Feature labeling functions for clipping, clearsky, daytime, and orientation. See pvanalytics.features for content.
System parameter inference for tilt, azimuth, and whether the system is tracking or fixed. See pvanalytics.system for content.
NREL performance ratio metric (pvanalytics.metrics.performance_ratio_nrel()).
Bug Fixes¶
Contributors¶
Special thanks to Matt Muller and Kirsten Perry of NREL for their assistance in adapting components from the PVFleets QA project to PVAnalytics.