PVAnalytics#
PVAnalytics is a Python library that supports analytics for PV systems. It provides functions for quality control, filtering, and feature labeling, as well as other tools supporting the analysis of PV system-level data. It can be used as a standalone analysis package and as a data-cleaning “front end” for other PV analysis packages.
PVAnalytics is free and open source under a permissive license. The source code for PVAnalytics is hosted on GitHub.
Library Overview#
The functions provided by PVAnalytics are organized in submodules based on their anticipated use. The list below provides a general overview; however, not all modules have functions at this time. See the API reference for the current library status.
quality
    Contains submodules for different kinds of data quality checks.
quality.data_shifts
    Contains quality checks for detecting and isolating data shifts in PV time series data.
quality.irradiance
    Contains quality checks for irradiance measurements.
quality.weather
    Contains quality checks for weather data (e.g. tests for physically plausible values of temperature, wind speed, humidity).
quality.outliers
    Contains functions for identifying outliers.
quality.gaps
    Contains functions for identifying gaps in the data (i.e. missing values, stuck values, and interpolation).
quality.time
    Quality checks related to time (e.g. timestamp spacing, time shifts).
quality.util
    General-purpose quality functions (e.g. simple range checks).
features
    Contains submodules with different methods for identifying and labeling salient features.
features.clipping
    Functions for labeling inverter clipping.
features.clearsky
    Functions for identifying periods of clear sky conditions.
features.daytime
    Functions for identifying periods of day and night.
features.orientation
    Functions for identifying orientation-related features in the data (e.g. days where the data looks like there is a functioning tracker). These functions are distinct from the functions in the system module in that we are identifying features of the data rather than properties of the system that produced the data.
features.shading
    Functions for identifying shadows.
system
    Identification of PV system characteristics from data (e.g. nameplate power, tilt, azimuth).
metrics
    Functions for computing PV system-level metrics (e.g. performance ratio).
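Most functions in these submodules operate on a pandas Series and return a boolean mask with the same index, so checks can be combined with ordinary boolean operators. A minimal sketch of that pattern, assuming a CSV of measurements with a datetime index (the file and column names here are illustrative):

import pandas as pd
from pvanalytics.quality import util

# Hypothetical measured data with a datetime index.
data = pd.read_csv("measurements.csv", index_col=0, parse_dates=True)

# Each check returns a boolean Series aligned with the input.
in_range = util.check_limits(data["ghi"], lower_bound=0, upper_bound=1400)
clean_ghi = data.loc[in_range, "ghi"]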
Dependencies#
This project follows the guidelines laid out in NEP-29. It supports:
All minor versions of Python released in the 42 months prior to the project, and at minimum the two latest minor versions.
All minor versions of NumPy released in the 24 months prior to the project, and at minimum the last three minor versions.
The latest release of pvlib.
Additionally, PVAnalytics relies on several other packages in the open source scientific Python ecosystem. For details on dependencies and versions, see our setup.py.
API Reference#
Quality#
Data Shifts#
Functions for identifying shifts in data values in time series and for identifying periods with data shifts. For functions that identify shifts in time, see quality.time.
- Detect data shifts in a time series of daily values.
- Return the start and end dates of the longest serially complete time series segment.
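A brief sketch of how these checks might be applied, assuming daily_energy is a pandas Series of daily summed energy values:

from pvanalytics.quality import data_shifts as ds

# Flag points where the daily series shifts to a new level.
shift_mask = ds.detect_data_shifts(daily_energy)

# Keep only the longest shift-free segment for downstream analysis.
start_date, end_date = ds.get_longest_shift_segment_dates(daily_energy)
stable = daily_energy[start_date:end_date]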
Irradiance#
The check_*_limits_qcrad functions use the QCRad algorithm [1] to identify irradiance measurements that are beyond physical limits.
- Test for physical limits on GHI using the QCRad criteria.
- Test for physical limits on DHI using the QCRad criteria.
- Test for physical limits on DNI using the QCRad criteria.
All three checks can be combined into a single function call.
- Test for physical limits on GHI, DHI or DNI using the QCRad criteria.
Irradiance measurements can also be checked for consistency.
- Check consistency of GHI, DHI and DNI using QCRad criteria.
GHI and POA irradiance can be validated against clearsky values to eliminate data that is unrealistically high.
- Identify irradiance values which do not exceed clearsky values.
You may want to identify entire days that have unrealistically high or low insolation. The following function examines daily insolation, validating that it is within a reasonable range of the expected clearsky insolation for the same day.
- Check that daily insolation lies between minimum and maximum values.
There is a function for calculating the component sum for GHI, DHI, and DNI, correcting for nighttime periods. Using this function, we can estimate one irradiance field using the two other irradiance fields. This can be useful for comparison, as well as to calculate missing data fields.
- Use the component sum equations to calculate the missing series, using the other available time series.
Gaps#
Identify gaps in the data.
- Identify sequences which appear to be linear.
Data sometimes contains sequences of values that are “stale” or “stuck.” These are contiguous spans of data where the value does not change within the precision given. The functions below can be used to detect stale values.
Note
If the data has been altered in some way (e.g. temperature that has been rounded to an integer value) before being passed to these functions, you may see unexpectedly large amounts of stale data.
- Identify stale values in the data.
- Identify stale values by rounding.
The following functions identify days with incomplete data.
- Calculate a data completeness score for each day.
- Select data points that are part of days with complete data.
Many data sets may have leading and trailing periods of days with sporadic or no data. The following functions can be used to remove those periods.
- Get the start and end of data excluding leading and trailing gaps.
- Mask the beginning and end of the data if not all True.
- Trim the series based on the completeness score.
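A minimal sketch of trimming a series with sporadic leading and trailing days, assuming series is a pandas Series with a datetime index (default thresholds apply):

from pvanalytics.quality import gaps

# True only within the longest run of days meeting the completeness
# threshold; sparse leading and trailing days are False.
trim_mask = gaps.trim_incomplete(series, days=10)
trimmed = series[trim_mask]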
Outliers#
Functions for detecting outliers.
- Identify outliers based on the interquartile range.
- Identify outliers using the z-score.
- Identify outliers by the Hampel identifier.
Time#
Quality control related to time. This includes timestamp spacing, time shifts, and time zone validation.
- Check that the spacing between times conforms to freq.
Timestamp shifts, such as daylight savings, can be identified with the following functions.
- Identify time shifts using the ruptures library.
- Return True if events appears to have daylight savings shifts at the dates on which tz transitions to or from daylight savings time.
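For example, a sketch of checking timestamp spacing against an expected frequency, assuming series is a pandas Series with a datetime index:

from pvanalytics.quality import time as qtime

# True where the gap between consecutive timestamps matches freq.
uniform_spacing = qtime.spacing(series.index, freq="15min")
irregular_times = series.index[~uniform_spacing]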
Utilities#
The quality.util module contains general-purpose/utility functions for building your own quality checks.
- Check whether a value falls within the given limits.
- Return True for data on days when the day's minimum exceeds minimum.
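A short sketch of the daily-minimum check, assuming temperature is a pandas Series that should never drop below a known floor:

from pvanalytics.quality import util

# True on days whose minimum value exceeds 5 (in the units of the series).
warm_enough_days = util.daily_min(temperature, minimum=5)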
Weather#
Quality checks for weather data.
- Identify relative humidity values that are within limits.
- Identify temperature values that are within limits.
- Identify wind speed values that are within limits.
In addition to validating temperature by comparing with limits, module temperature should be positively correlated with irradiance. Poor correlation could indicate that the sensor has become detached from the module, for example. Unlike other functions in the quality module, which return Boolean masks over the input series, this function returns a single Boolean value indicating whether the entire series has passed (True) or failed (False) the quality check.
- Test whether the module temperature is correlated with irradiance.
References
[1] C. N. Long and Y. Shi, "An Automated Quality Assessment and Control Algorithm for Surface Radiation Measurements," The Open Atmospheric Science Journal 2, pp. 23-37, 2008.
Features#
Functions for detecting features in the data.
Clipping#
Functions for identifying inverter clipping.
- Label clipping in AC power data based on levels in the data.
- Detect clipping based on a maximum power threshold.
- Identify clipping based on the shape of the ac_power curve on each day.
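A hedged sketch of the levels-based detector, assuming ac_power is a pandas Series of AC power values (default parameters are used):

from pvanalytics.features import clipping

# Boolean mask that is True where AC power appears clipped.
clipped = clipping.levels(ac_power)
unclipped_power = ac_power[~clipped]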
Clearsky#
- Identify times when GHI is consistent with clearsky conditions.
Orientation#
System orientation refers to mounting type (fixed or tracker) and the azimuth and tilt of the mounting. A system’s orientation can be determined by examining power or POA irradiance on days that are relatively sunny.
This module provides functions that operate on power or POA irradiance to identify system orientation on a daily basis. These functions can tell you whether a day’s profile matches that of a fixed system or system with a single-axis tracker.
Care should be taken when interpreting function output since other factors such as malfunctioning trackers can interfere with identification.
- Flag days that match the profile of a fixed PV system on a sunny day.
- Flag days that match the profile of a single-axis tracking PV system on a sunny day.
Daytime#
Functions for determining day/night periods in a time series and for estimating sunrise and sunset times based on the day-night mask outputs.
- Return True for values that are during the day.
- Using the outputs of the day-night mask, get sunrise times for each day in the series.
- Using the outputs of the day-night mask, get sunset times for each day in the series.
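A sketch combining these steps; the freq value is illustrative, and the exact signatures of the sunrise/sunset helpers should be checked against the API reference:

from pvanalytics.features import daytime

# Boolean mask: True during daytime periods.
day_mask = daytime.power_or_irradiance(series, freq="15min")

# Estimate sunrise and sunset times from the day-night mask.
sunrise_times = daytime.get_sunrise(day_mask)
sunset_times = daytime.get_sunset(day_mask)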
Shading#
Functions for labeling shadows.
- Detect shadows from fixed structures such as wires and poles.
System#
This module contains functions and classes relating to PV system parameters such as nameplate power, tilt, azimuth, or whether the system is equipped with a tracker.
Tracking#
- Enum describing the orientation of a PV system.
- Infer whether the system is equipped with a tracker.
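A hedged sketch of inferring the mount type from an AC power stream; day and clipping masks are computed first because the function examines the power envelope on sunny, unclipped days:

from pvanalytics import system
from pvanalytics.features import clipping, daytime

day_mask = daytime.power_or_irradiance(power_ac)
clip_mask = clipping.levels(power_ac)

# Returns a system.Tracker enum value (e.g. FIXED or TRACKING).
mount = system.is_tracking_envelope(power_ac, day_mask, clip_mask)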
Orientation#
The following function can be used to infer system orientation from power or plane of array irradiance measurements.
- Determine system azimuth and tilt from power or POA using solar azimuth at the daily peak.
- Get the tilt and azimuth that give PVWatts v5 output that most closely fits the data in power_ac.
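A hedged sketch of the PVWatts-based fit; the irradiance components and solar position series are assumed to come from measurements and pvlib, and the full parameter list and return values should be confirmed against the API reference:

from pvanalytics import system

# Search over tilt and azimuth for the PVWatts v5 parameters that
# best reproduce power_ac; returns the best-fit angles and a
# goodness-of-fit score.
azimuth, tilt, r_squared = system.infer_orientation_fit_pvwatts(
    power_ac, ghi, dhi, dni, solar_zenith, solar_azimuth)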
Metrics#
Performance Ratio#
The following functions can be used to calculate system performance metrics.
- Calculate NREL Performance Ratio.
Variability#
Functions to calculate variability statistics.
- Calculate the variability index.
Example Gallery#
This gallery shows examples of pvanalytics functionality. Community contributions are welcome!
Clearsky Detection#
This includes examples for identifying clearsky periods in time series data.
Clipping#
This includes examples for identifying clipping in AC power time series.
Day-Night Masking#
This includes examples for identifying day-night periods in time series data.
Gaps#
This includes examples for identifying gaps and other related issues in time series data, including interpolated periods and stale data periods.
Irradiance-Quality#
This includes examples for running irradiance quality checks on irradiance time series data.
Metrics#
This includes examples for quantifying system time series metrics, including variability index (VI) and NREL performance ratio (PR).
Orientation#
This includes examples related to the orientation of a system (fixed-tilt, tracking).
Outliers#
This includes examples for identifying outliers in time series data.
PVFleets QA Examples#
These examples highlight the QA processes for temperature, power and irradiance data streams that are used in the NREL PV Fleet Performance Data Initiative (https://www.nrel.gov/pv/fleet-performance-data-initiative.html).
Data/Time Shifts#
This includes examples for identifying data/capacity/time shifts in time series data.
System#
This includes examples for system parameter estimation, including azimuth and tilt estimation, and determination if the system is fixed tilt or tracking.
Weather#
This includes examples for weather quality checks.
Clearsky Detection#
This includes examples for identifying clearsky periods in time series data.
Clear-Sky Detection#
Identifying periods of clear-sky conditions using measured irradiance.
Identifying and filtering for clear-sky conditions is a useful way to
reduce noise when analyzing measured data. This example shows how to
use pvanalytics.features.clearsky.reno()
to identify clear-sky
conditions using measured GHI data. For this example we’ll use
GHI measurements from NREL in Golden, CO.
import pvanalytics
from pvanalytics.features.clearsky import reno
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in the GHI measurements. For this example we’ll use an example file included in pvanalytics covering a single day, but the same process applies to data of any length.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ghi_file = pvanalytics_dir / 'data' / 'midc_bms_ghi_20220120.csv'
data = pd.read_csv(ghi_file, index_col=0, parse_dates=True)
# or you can fetch the data straight from the source using pvlib:
# date = pd.to_datetime('2022-01-20')
# data = pvlib.iotools.read_midc_raw_data_from_nrel('BMS', date, date)
measured_ghi = data['Global CMP22 (vent/cor) [W/m^2]']
Now model clear-sky irradiance for the location and times of the measured data:
location = pvlib.location.Location(39.742, -105.18)
clearsky = location.get_clearsky(data.index)
clearsky_ghi = clearsky['ghi']
Finally, use pvanalytics.features.clearsky.reno() to identify measurements during clear-sky conditions:
is_clearsky = reno(measured_ghi, clearsky_ghi)
# clear-sky times indicated in black
measured_ghi.plot()
measured_ghi[is_clearsky].plot(ls='', marker='o', ms=2, c='k')
plt.ylabel('Global Horizontal Irradiance [W/m2]')
plt.show()
Clipping#
This includes examples for identifying clipping in AC power time series.
Clipping Detection#
Identifying clipping periods using the PVAnalytics clipping module.
Identifying and removing clipping periods from AC power time series
data aids in generating more accurate degradation analysis results,
as using clipped data can lead to under-predicting degradation. In this
example, we show how to use
pvanalytics.features.clipping.geometric()
to mask clipping periods in an AC power time series. We use a
normalized time series example provided by the PV Fleets Initiative,
where clipping periods are labeled as True, and non-clipping periods are
labeled as False. This example is adapted from the DuraMAT DataHub
clipping data set:
https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
import pvanalytics
from pvanalytics.features.clipping import geometric
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
import numpy as np
First, read in the ac_power_inv_7539 example, and visualize a subset of the clipping periods via the “label” mask column.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file_1 = pvanalytics_dir / 'data' / 'ac_power_inv_7539.csv'
data = pd.read_csv(ac_power_file_1, index_col=0, parse_dates=True)
data['label'] = data['label'].astype(bool)
# This is the known frequency of the time series. You may need to infer
# the frequency or set the frequency with your AC power time series.
freq = "15min"
data['value_normalized'].plot()
data.loc[data['label'], 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Labeled Clipping"],
title="Clipped")
plt.xticks(rotation=20)
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Now, use pvanalytics.features.clipping.geometric()
to identify
clipping periods in the time series. Re-plot the data subset with this mask.
predicted_clipping_mask = geometric(ac_power=data['value_normalized'],
freq=freq)
data['value_normalized'].plot()
data.loc[predicted_clipping_mask, 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Detected Clipping"],
title="Clipped")
plt.xticks(rotation=20)
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Compare the filter results to the ground-truth labeled data side-by-side, and generate an accuracy metric.
acc = 100 * np.sum(np.equal(data.label,
predicted_clipping_mask))/len(data.label)
print("Overall model prediction accuracy: " + str(round(acc, 2)) + "%")
Overall model prediction accuracy: 99.2%
Day-Night Masking#
This includes examples for identifying day-night periods in time series data.
Day-Night Masking#
Masking day-night periods using the PVAnalytics daytime module.
Identifying and masking day-night periods in an AC power time series or
irradiance time series can aid in future data analysis, such as detecting
if a time series has daylight savings time or time shifts. Here, we use
pvanalytics.features.daytime.power_or_irradiance()
to mask day/night
periods, as well as to estimate sunrise and sunset times in the data set.
This function is particularly useful for cases where the time zone of a data
stream is unknown or incorrect, as its outputs can be used to determine time
zone.
import pvanalytics
from pvanalytics.features.daytime import power_or_irradiance
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
import pvlib
import numpy as np
First, read in the 1-minute sampled AC power time series data, taken from the SERF East installation on the NREL campus. This sample is provided from the NREL PVDAQ database, and contains a column representing an AC power data stream.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file = pvanalytics_dir / 'data' / 'serf_east_1min_ac_power.csv'
data = pd.read_csv(ac_power_file, index_col=0, parse_dates=True)
data = data.sort_index()
# This is the known frequency of the time series. You may need to infer
# the frequency or set the frequency with your AC power time series.
freq = "1min"
# These are the latitude-longitude coordinates associated with the
# SERF East system.
latitude = 39.742
longitude = -105.173
# Plot the time series.
data['ac_power__752'].plot()
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()
It is critical to set all negative values in the AC power time series to 0
for pvanalytics.features.daytime.power_or_irradiance()
to work
properly. Negative erroneous data may affect daytime mask assignments.
data.loc[data['ac_power__752'] < 0, 'ac_power__752'] = 0
Now, use pvanalytics.features.daytime.power_or_irradiance()
to mask day periods in the time series.
predicted_day_night_mask = power_or_irradiance(series=data['ac_power__752'],
freq=freq)
The pvlib.solarposition.sun_rise_set_transit_spa() function is used to get ground-truth sunrise and sunset times for each day at the site location, and a SPA-daytime mask is calculated based on these times. Data associated with SPA daytime periods is labeled as True, and data associated with SPA nighttime periods is labeled as False. SPA sunrise and sunset times are used here as a point of comparison to the pvanalytics.features.daytime.power_or_irradiance() outputs. SPA-based sunrise and sunset values are not needed to run pvanalytics.features.daytime.power_or_irradiance().
sunrise_sunset_df = pvlib.solarposition.sun_rise_set_transit_spa(data.index,
latitude,
longitude)
data['sunrise_time'] = sunrise_sunset_df['sunrise']
data['sunset_time'] = sunrise_sunset_df['sunset']
data['daytime_mask'] = True
data.loc[(data.index < data.sunrise_time) |
(data.index > data.sunset_time), "daytime_mask"] = False
Plot the AC power data stream with the mask output from pvanalytics.features.daytime.power_or_irradiance(), as well as the SPA-calculated sunrise and sunset times.
data['ac_power__752'].plot()
data.loc[predicted_day_night_mask, 'ac_power__752'].plot(ls='', marker='o')
data.loc[~predicted_day_night_mask, 'ac_power__752'].plot(ls='', marker='o')
sunrise_sunset_times = sunrise_sunset_df[['sunrise',
'sunset']].drop_duplicates()
for sunrise, sunset in sunrise_sunset_times.itertuples(index=False):
    plt.axvline(x=sunrise, c="blue")
    plt.axvline(x=sunset, c="red")
plt.legend(labels=["AC Power", "Daytime", "Nighttime",
"SPA Sunrise", "SPA Sunset"])
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()
Compare the predicted mask to the ground-truth SPA mask, to get the model accuracy. Also, compare sunrise and sunset times for the predicted mask compared to the ground truth sunrise and sunset times.
acc = 100 * np.sum(np.equal(data.daytime_mask,
predicted_day_night_mask))/len(data.daytime_mask)
print("Overall model prediction accuracy: " + str(round(acc, 2)) + "%")
# Generate predicted + SPA sunrise times for each day
print("Sunrise Comparison:")
print(pd.DataFrame({'predicted_sunrise': predicted_day_night_mask
.index[predicted_day_night_mask]
.to_series().resample("d").first(),
'pvlib_spa_sunrise': sunrise_sunset_df["sunrise"]
.resample("d").first()}))
# Generate predicted + SPA sunset times for each day
print("Sunset Comparison:")
print(pd.DataFrame({'predicted_sunset': predicted_day_night_mask
.index[predicted_day_night_mask]
.to_series().resample("d").last(),
'pvlib_spa_sunset': sunrise_sunset_df["sunset"]
.resample("d").last()}))
Overall model prediction accuracy: 98.39%
Sunrise Comparison:
predicted_sunrise pvlib_spa_sunrise
measured_on
2022-03-18 00:00:00-07:00 2022-03-18 06:11:00-07:00 2022-03-18 06:07:09.226592-07:00
2022-03-19 00:00:00-07:00 2022-03-19 06:14:00-07:00 2022-03-19 06:05:32.867153920-07:00
Sunset Comparison:
predicted_sunset pvlib_spa_sunset
measured_on
2022-03-18 00:00:00-07:00 2022-03-18 17:56:00-07:00 2022-03-18 06:07:09.226592-07:00
2022-03-19 00:00:00-07:00 2022-03-19 17:52:00-07:00 2022-03-19 06:05:32.867153920-07:00
Gaps#
This includes examples for identifying gaps and other related issues in time series data, including interpolated periods and stale data periods.
Interpolated Data Periods#
Identifying periods in a time series where the data has been linearly interpolated.
Identifying periods where time series data has been linearly interpolated and removing these periods may help to reduce noise when performing future data analysis. This example shows how to use pvanalytics.quality.gaps.interpolation_diff(), which identifies and masks linearly interpolated periods.
import pvanalytics
from pvanalytics.quality import gaps
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we import the AC power data stream that we are going to check for interpolated periods. The time series we download is a normalized AC power time series from the PV Fleets Initiative, and is available via the DuraMAT DataHub: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data. This data set has a Pandas DateTime index, with the min-max normalized AC power time series represented in the ‘value_normalized’ column. There is also an “interpolated_data_mask” column, where interpolated periods are labeled as True, and all other data is labeled as False. The data is sampled at 15-minute intervals.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'ac_power_inv_2173_interpolated_data.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
data = data.asfreq("15min")
data['value_normalized'].plot()
data.loc[data["interpolated_data_mask"], "value_normalized"].plot(ls='',
marker='.')
plt.legend(labels=["AC Power", "Interpolated Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Now, we use pvanalytics.quality.gaps.interpolation_diff() to identify linearly interpolated periods in the time series. We re-plot the data with this mask. Please note that nighttime periods generally consist of repeating 0 values; this means that these periods can be linearly interpolated. Consequently, these periods are flagged by pvanalytics.quality.gaps.interpolation_diff().
detected_interpolated_data_mask = gaps.interpolation_diff(
data['value_normalized'])
data['value_normalized'].plot()
data.loc[detected_interpolated_data_mask,
"value_normalized"].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Detected Interpolated Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Stale Data Periods#
Identifying stale data periods in a time series.
Identifying and removing stale, or consecutive repeating, values in time series data reduces noise when performing data analysis. This example shows how to use two PVAnalytics functions, pvanalytics.quality.gaps.stale_values_diff() and pvanalytics.quality.gaps.stale_values_round(), to identify and mask stale data periods in time series data.
import pvanalytics
from pvanalytics.quality import gaps
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we import the AC power data stream that we are going to check for stale data periods. The time series we download is a normalized AC power time series from the PV Fleets Initiative, and is available via the DuraMAT DataHub: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data. This data set has a Pandas DateTime index, with the min-max normalized AC power time series represented in the ‘value_normalized’ column. Additionally, there is a “stale_data_mask” column, where stale periods are labeled as True, and all other data is labeled as False. The data is sampled at 15-minute intervals.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'ac_power_inv_2173_stale_data.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
data = data.asfreq("15min")
data['value_normalized'].plot()
data.loc[data["stale_data_mask"], "value_normalized"].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Inserted Stale Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Now, we use pvanalytics.quality.gaps.stale_values_diff() to identify stale values in the data. We visualize the detected stale periods graphically. Please note that nighttime periods generally contain consecutive repeating 0 values, which are flagged by pvanalytics.quality.gaps.stale_values_diff().
stale_data_mask = gaps.stale_values_diff(data['value_normalized'])
data['value_normalized'].plot()
data.loc[stale_data_mask, "value_normalized"].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Detected Stale Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Now, we use pvanalytics.quality.gaps.stale_values_round() to identify stale values in the data, using rounded data. This function yields similar results to pvanalytics.quality.gaps.stale_values_diff(), except it looks for consecutive repeating data that has been rounded to a settable number of decimal places. Please note that nighttime periods generally contain consecutive repeating 0 values, which are flagged by pvanalytics.quality.gaps.stale_values_round().
stale_data_round_mask = gaps.stale_values_round(data['value_normalized'])
data['value_normalized'].plot()
data.loc[stale_data_round_mask, "value_normalized"].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Detected Stale Data"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Missing Data Periods#
Identifying days with missing data using a “completeness” score metric.
Identifying days with missing data and filtering these days out reduces noise
when performing data analysis. This example shows how to use a
daily data “completeness” score to identify and filter out days with missing
data. This includes using pvanalytics.quality.gaps.completeness_score(), pvanalytics.quality.gaps.complete(), and pvanalytics.quality.gaps.trim_incomplete().
import pvanalytics
from pvanalytics.quality import gaps
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we import the AC power data stream that we are going to check for completeness. The time series we download is a normalized AC power time series from the PV Fleets Initiative, and is available via the DuraMAT DataHub: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data. This data set has a Pandas DateTime index, with the min-max normalized AC power time series represented in the ‘value_normalized’ column. The data is sampled at 15-minute intervals. This data set does contain NaN values.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'ac_power_inv_2173.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
data = data.asfreq("15min")
Now, we use pvanalytics.quality.gaps.completeness_score() to get the percentage of daily data that isn’t NaN. This percentage score is calculated as the total number of non-NA values over a 24-hour period, meaning that nighttime values are expected.
data_completeness_score = gaps.completeness_score(data['value_normalized'])
# Visualize data completeness score as a time series.
data_completeness_score.plot()
plt.xlabel("Date")
plt.ylabel("Daily Completeness Score (Fractional)")
plt.tight_layout()
plt.show()
We mask complete days, based on daily completeness score, using pvanalytics.quality.gaps.complete().
min_completeness = 0.333
daily_completeness_mask = gaps.complete(data['value_normalized'],
minimum_completeness=min_completeness)
# Mask complete days, based on daily completeness score
data_completeness_score.plot()
data_completeness_score.loc[daily_completeness_mask].plot(ls='', marker='.')
data_completeness_score.loc[~daily_completeness_mask].plot(ls='', marker='.')
plt.axhline(y=min_completeness, color='r', linestyle='--')
plt.legend(labels=["Completeness Score", "Threshold met",
"Threshold not met", "Completeness Threshold (.33)"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("Daily Completeness Score (Fractional)")
plt.tight_layout()
plt.show()
We trim the time series based on the completeness score, where the time series must have at least 10 consecutive days of data that meet the completeness threshold. This is done using pvanalytics.quality.gaps.trim_incomplete().
number_consecutive_days = 10
completeness_trim_mask = gaps.trim_incomplete(data['value_normalized'],
days=number_consecutive_days)
# Re-visualize the time series with the data masked by the trim mask
data[completeness_trim_mask]['value_normalized'].plot()
data[~completeness_trim_mask]['value_normalized'].plot()
plt.legend(labels=[True, False],
title="Daily Data Passing")
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Irradiance-Quality#
This includes examples for running irradiance quality checks on irradiance time series data.
Clearsky Limits for Daily Insolation#
Checking the clearsky limits for daily insolation data.
Identifying and filtering out invalid irradiance data is a
useful way to reduce noise during analysis. In this example,
we use pvanalytics.quality.irradiance.daily_insolation_limits()
to determine when the daily insolation lies between a minimum
and a maximum value. Irradiance measurements and clear-sky
irradiance on each day are integrated with the trapezoid rule
to calculate daily insolation. For this example we will use data
from the RMIS weather system located on the NREL campus
in Colorado, USA.
import pvanalytics
from pvanalytics.quality.irradiance import daily_insolation_limits
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned data. It includes POA, GHI, DNI, DHI, and GNI measurements.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
# Make the datetime index tz-aware.
data.index = data.index.tz_localize("Etc/GMT+7")
Now model clear-sky irradiance for the location and times of the measured data:
location = pvlib.location.Location(39.7407, -105.1686)
clearsky = location.get_clearsky(data.index)
Use pvanalytics.quality.irradiance.daily_insolation_limits() to identify if the daily insolation lies between a minimum and a maximum value. Here, we check the GHI irradiance field ‘irradiance_ghi__7981’. pvanalytics.quality.irradiance.daily_insolation_limits() returns a mask that identifies data that falls between lower and upper limits. The defaults (used here) are an upper bound of 125% of clear-sky daily insolation, and a lower bound of 40% of clear-sky daily insolation.
daily_insolation_mask = daily_insolation_limits(data['irradiance_ghi__7981'],
clearsky['ghi'])
Plot the ‘irradiance_ghi__7981’ data stream and its associated clearsky GHI data stream. Mask the GHI time series by its daily_insolation_mask.
data['irradiance_ghi__7981'].plot()
clearsky['ghi'].plot()
data.loc[daily_insolation_mask, 'irradiance_ghi__7981'].plot(ls='', marker='.')
plt.legend(labels=["RMIS GHI", "Clearsky GHI",
"Within Daily Insolation Limit"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("GHI (W/m^2)")
plt.tight_layout()
plt.show()
Clearsky Limits for Irradiance Data#
Checking the clearsky limits of irradiance data.
Identifying and filtering out invalid irradiance data is a
useful way to reduce noise during analysis. In this example,
we use pvanalytics.quality.irradiance.clearsky_limits()
to identify irradiance values that do not exceed
a limit based on a clear-sky model. For this example we will
use GHI data from the RMIS weather system located on the NREL campus in CO.
import pvanalytics
from pvanalytics.quality.irradiance import clearsky_limits
from pvanalytics.features.daytime import power_or_irradiance
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned POA, GHI, DNI, DHI, and GNI measurements, but only the GHI is relevant here.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
freq = '5min'
# Make the datetime index tz-aware.
data.index = data.index.tz_localize("Etc/GMT+7")
Now model clear-sky irradiance for the location and times of the measured data. You can do this using pvlib.location.Location.get_clearsky(), using the lat-long coordinates associated with the RMIS NREL system.
location = pvlib.location.Location(39.7407, -105.1686)
clearsky = location.get_clearsky(data.index)
Use pvanalytics.quality.irradiance.clearsky_limits(). Here, we check GHI data in field ‘irradiance_ghi__7981’. pvanalytics.quality.irradiance.clearsky_limits() returns a mask that identifies data that falls between lower and upper limits. The defaults (used here) are an upper bound of 110% of clear-sky GHI, and no lower bound.
clearsky_limit_mask = clearsky_limits(data['irradiance_ghi__7981'],
clearsky['ghi'])
Mask nighttime values in the GHI time series using the pvanalytics.features.daytime.power_or_irradiance() function. We will then remove nighttime values from the GHI time series.
day_night_mask = power_or_irradiance(series=data['irradiance_ghi__7981'],
freq=freq)
Plot the ‘irradiance_ghi__7981’ data stream and its associated clearsky GHI data stream. Mask the GHI time series by its clearsky_limit_mask for daytime periods. Please note that a simple Ineichen model with static monthly turbidities isn’t always accurate, as in this case. Other models that may provide better clear-sky estimates include McClear or PSM3.
data['irradiance_ghi__7981'].plot()
clearsky['ghi'].plot()
data.loc[clearsky_limit_mask & day_night_mask][
'irradiance_ghi__7981'].plot(ls='', marker='.')
plt.legend(labels=["RMIS GHI", "Clearsky GHI",
"Under Clearsky Limit"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("GHI (W/m^2)")
plt.tight_layout()
plt.show()
QCRad Limits for Irradiance Data#
Test for physical limits on GHI, DHI or DNI using the QCRad criteria.
Identifying and filtering out invalid irradiance data is a
useful way to reduce noise during analysis. In this example,
we use
pvanalytics.quality.irradiance.check_irradiance_limits_qcrad()
to test for physical limits on GHI, DHI or DNI using the QCRad criteria.
For this example we will use data from the RMIS weather system located
on the NREL campus in Colorado, USA.
import pvanalytics
from pvanalytics.quality.irradiance import check_irradiance_limits_qcrad
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned data. It includes POA, GHI, DNI, DHI, and GNI measurements.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
Now generate solar zenith estimates for the location, based on the data’s time zone and site latitude-longitude coordinates. This is done using the pvlib.solarposition.get_solarposition() function.
latitude = 39.742
longitude = -105.18
time_zone = "Etc/GMT+7"
data = data.tz_localize(time_zone)
solar_position = pvlib.solarposition.get_solarposition(data.index,
latitude,
longitude)
Generate the estimated extraterrestrial radiation for the time series, referred to as dni_extra. This is done using the pvlib.irradiance.get_extra_radiation() function.
dni_extra = pvlib.irradiance.get_extra_radiation(data.index)
Use pvanalytics.quality.irradiance.check_irradiance_limits_qcrad() to generate the QCRAD irradiance limit mask.
qcrad_limit_mask = check_irradiance_limits_qcrad(
solar_zenith=solar_position['zenith'],
dni_extra=dni_extra,
ghi=data['irradiance_ghi__7981'],
dhi=data['irradiance_dhi__7983'],
dni=data['irradiance_dni__7982'])
Plot the ‘irradiance_ghi__7981’ data stream with its associated QCRAD limit mask.
data['irradiance_ghi__7981'].plot()
data.loc[qcrad_limit_mask[0], 'irradiance_ghi__7981'].plot(ls='', marker='.')
plt.legend(labels=["RMIS GHI", "Within QCRAD Limits"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("GHI (W/m^2)")
plt.tight_layout()
plt.show()
Plot the ‘irradiance_dhi__7983’ data stream with its associated QCRAD limit mask.
data['irradiance_dhi__7983'].plot()
data.loc[qcrad_limit_mask[1], 'irradiance_dhi__7983'].plot(ls='', marker='.')
plt.legend(labels=["RMIS DHI", "Within QCRAD Limits"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("DHI (W/m^2)")
plt.tight_layout()
plt.show()
Plot the ‘irradiance_dni__7982’ data stream with its associated QCRAD limit mask.
data['irradiance_dni__7982'].plot()
data.loc[qcrad_limit_mask[2], 'irradiance_dni__7982'].plot(ls='', marker='.')
plt.legend(labels=["RMIS DNI", "Within QCRAD Limits"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("DNI (W/m^2)")
plt.tight_layout()
plt.show()
QCRad Consistency for Irradiance Data#
Check consistency of GHI, DHI and DNI using QCRad criteria.
Identifying and filtering out invalid irradiance data is a
useful way to reduce noise during analysis. In this example,
we use
pvanalytics.quality.irradiance.check_irradiance_consistency_qcrad()
to check the consistency of GHI, DHI and DNI data using QCRad criteria.
For this example we will use data from the RMIS weather system located
on the NREL campus in Colorado, USA.
import pvanalytics
from pvanalytics.quality.irradiance import check_irradiance_consistency_qcrad
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned data. It includes POA, GHI, DNI, DHI, and GNI measurements.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
Now generate solar zenith estimates for the location, based on the data’s time zone and site latitude-longitude coordinates.
latitude = 39.742
longitude = -105.18
time_zone = "Etc/GMT+7"
data = data.tz_localize(time_zone)
solar_position = pvlib.solarposition.get_solarposition(data.index,
latitude,
longitude)
Use pvanalytics.quality.irradiance.check_irradiance_consistency_qcrad() to generate the QCRAD consistency mask.
qcrad_consistency_mask = check_irradiance_consistency_qcrad(
solar_zenith=solar_position['zenith'],
ghi=data['irradiance_ghi__7981'],
dhi=data['irradiance_dhi__7983'],
dni=data['irradiance_dni__7982'])
Plot the GHI, DHI, and DNI data streams with the QCRAD consistency mask overlay. This mask applies to all 3 data streams.
fig = data[['irradiance_ghi__7981', 'irradiance_dhi__7983',
'irradiance_dni__7982']].plot()
# Highlight periods where the QCRAD consistency mask is True
fig.fill_between(data.index, fig.get_ylim()[0], fig.get_ylim()[1],
where=qcrad_consistency_mask[0], alpha=0.4)
fig.legend(labels=["RMIS GHI", "RMIS DHI", "RMIS DNI", "QCRAD Consistent"],
loc="upper center")
plt.xlabel("Date")
plt.ylabel("Irradiance (W/m^2)")
plt.tight_layout()
plt.show()
Plot the GHI, DHI, and DNI data streams with the diffuse ratio limit mask overlay. This mask is true when the DHI / GHI ratio passes the limit test.
fig = data[['irradiance_ghi__7981', 'irradiance_dhi__7983',
'irradiance_dni__7982']].plot()
# Highlight periods where the GHI ratio passes the limit test
fig.fill_between(data.index, fig.get_ylim()[0], fig.get_ylim()[1],
where=qcrad_consistency_mask[1], alpha=0.4)
fig.legend(labels=["RMIS GHI", "RMIS DHI", "RMIS DNI",
"Within Diffuse Ratio Limit"],
loc="upper center")
plt.xlabel("Date")
plt.ylabel("Irradiance (W/m^2)")
plt.tight_layout()
plt.show()
Component Sum Equations for Irradiance Data#
Estimate GHI, DHI, and DNI using the component sum equations, with nighttime corrections.
Estimating GHI, DHI, and DNI using the component sum equations is useful if the associated field is missing, or as a comparison to an existing physical data stream.
import pvanalytics
from pvanalytics.quality.irradiance import calculate_component_sum_series
import pvlib
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned data. It includes POA, GHI, DNI, DHI, and GNI measurements.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
Now generate solar zenith estimates for the location, based on the data’s time zone and site latitude-longitude coordinates. This is done using the pvlib.solarposition.get_solarposition() function.
latitude = 39.742
longitude = -105.18
time_zone = "Etc/GMT+7"
data = data.tz_localize(time_zone)
solar_position = pvlib.solarposition.get_solarposition(data.index,
latitude,
longitude)
Get the clearsky DNI values associated with the current location, using the pvlib.location.Location.get_clearsky() method. These clearsky values are used to calculate DNI data.
site = pvlib.location.Location(latitude, longitude, tz=time_zone)
clearsky = site.get_clearsky(data.index)
Use pvanalytics.quality.irradiance.calculate_component_sum_series() to estimate GHI measurements using DHI and DNI measurements.
component_sum_ghi = calculate_component_sum_series(
solar_zenith=solar_position['zenith'],
dhi=data['irradiance_dhi__7983'],
dni=data['irradiance_dni__7982'],
zenith_limit=90,
fill_night_value='equation')
Plot the ‘irradiance_ghi__7981’ data stream against the estimated component sum GHI, for comparison
data['irradiance_ghi__7981'].plot()
component_sum_ghi.plot()
plt.legend(labels=["RMIS GHI", "Component Sum GHI"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("GHI (W/m^2)")
plt.tight_layout()
plt.show()
Use pvanalytics.quality.irradiance.calculate_component_sum_series() to estimate DHI measurements using GHI and DNI measurements.
component_sum_dhi = calculate_component_sum_series(
solar_zenith=solar_position['zenith'],
dni=data['irradiance_dni__7982'],
ghi=data['irradiance_ghi__7981'],
zenith_limit=90,
fill_night_value='equation')
Plot the ‘irradiance_dhi__7983’ data stream against the estimated component sum DHI, for comparison.
data['irradiance_dhi__7983'].plot()
component_sum_dhi.plot()
plt.legend(labels=["RMIS DHI", "Component Sum DHI"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("GHI (W/m^2)")
plt.tight_layout()
plt.show()
Use pvanalytics.quality.irradiance.calculate_component_sum_series() to estimate DNI measurements using GHI and DHI measurements.
component_sum_dni = calculate_component_sum_series(
solar_zenith=solar_position['zenith'],
dhi=data['irradiance_dhi__7983'],
ghi=data['irradiance_ghi__7981'],
dni_clear=clearsky['dni'],
zenith_limit=90,
fill_night_value='equation')
Plot the ‘irradiance_dni__7982’ data stream against the estimated component sum DNI, for comparison.
data['irradiance_dni__7982'].plot()
component_sum_dni.plot()
plt.legend(labels=["RMIS DNI", "Component Sum DNI"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("DNI (W/m^2)")
plt.tight_layout()
plt.show()
Metrics#
This includes examples for quantifying system time series metrics, including variability index (VI) and NREL performance ratio (PR).
Calculate Variability Index#
Calculate the Variability Index for a GHI time series.
Highly variable irradiance can cause mismatch between irradiance and power measurements and result in noisy performance metrics. As such, identifying and removing highly variable conditions is useful in certain analyses. Identification and quantification of highly variable conditions are also of interest in grid integration and hourly modeling error contexts. The variability index (VI) is one way of quantifying the variability or jaggedness of an irradiance signal relative to a corresponding reference clear-sky irradiance profile. Note that quantifying variability is related to but distinct from clear-sky detection. For example, both clear and overcast skies have low VI. This example uses GHI data collected from the NREL RMIS system to calculate the variability index as a time series.
import pvanalytics
from pvanalytics.metrics import variability_index
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
import pvlib
First, read in data from the RMIS NREL system. This data set contains 5-minute right-aligned POA, GHI, DNI, DHI, and GNI measurements, but only the GHI is relevant here.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
# Make the datetime index tz-aware.
data.index = data.index.tz_localize("Etc/GMT+7")
Now model clear-sky irradiance for the location and times of the measured data. You can do this using pvlib.location.Location.get_clearsky(), using the lat-long coordinates associated with the RMIS NREL system.
location = pvlib.location.Location(39.7407, -105.1686)
clearsky = location.get_clearsky(data.index)
Calculate the variability index for the system GHI data stream using the pvanalytics.metrics.variability_index() function, using an hourly frequency.
variability_index_series = variability_index(data['irradiance_ghi__7981'],
clearsky['ghi'],
freq='1h')
Plot the calculated VI against the underlying GHI measurements, for the purpose of comparison.
fig, axes = plt.subplots(2, 1, sharex=True)
data['irradiance_ghi__7981'].plot(ax=axes[0], label='measured')
clearsky['ghi'].plot(ax=axes[0], label='clear-sky')
variability_index_series.plot(ax=axes[1], drawstyle='steps-post')
axes[0].legend()
axes[0].set_ylabel("GHI [W/m2]")
axes[1].set_ylabel("Variability Index")
fig.tight_layout()
plt.show()
Calculate Performance Ratio (NREL)#
Calculate the NREL Performance Ratio for a system.
When evaluating PV system performance it is often desirable to distinguish
uncontrollable effects like weather variation from controllable effects
like soiling and hardware issues. The NREL Performance Ratio
(or “Weather-Corrected Performance Ratio”) is a unitless metric that
normalizes system output for variation in irradiance and temperature,
making it insensitive to uncontrollable weather variation and more
reflective of system health. In this example, we
show how to calculate the NREL PR at two different frequencies: for a
complete time series, and at daily intervals. We use the pvanalytics.metrics.performance_ratio_nrel() function.
import pvanalytics
from pvanalytics.metrics import performance_ratio_nrel
import pandas as pd
import pathlib
import matplotlib.pyplot as plt
First, we read in data from the NREL RSF II system. This data set contains 15-minute interval data for AC power, POA irradiance, ambient temperature, and wind speed, among others. The complete data set for the NREL RSF II installation is available in the PVDAQ database, under system ID 1283.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'nrel_RSF_II.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
Now we calculate the PR for the entire time series, using the POA, ambient temperature, wind speed, and AC power fields. We use this data as parameters in the pvanalytics.metrics.performance_ratio_nrel() function. In this example we are calculating PR for a single inverter connected to a 204.12 kW PV array.
pr_whole_series = performance_ratio_nrel(data['poa_irradiance__1055'],
data['ambient_temp__1053'],
data['wind_speed__1051'],
data['inv2_ac_power_w__1047']/1000,
204.12)
print("RSF II, PR for the whole time series:")
print(pr_whole_series)
RSF II, PR for the whole time series:
0.5851958594021633
Next, we recalculate the PR on a daily basis. We separate the time series into daily intervals, and calculate the PR for each day. Note that this inverter was offline for the last day in this dataset, resulting in a PR value of zero for that day.
dates = list(pd.Series(data.index.date).drop_duplicates())
daily_pr_list = list()
for date in dates:
    data_subset = data[data.index.date == date]
    # Run the PR calculation for the specific day.
    pr = performance_ratio_nrel(data_subset['poa_irradiance__1055'],
                                data_subset['ambient_temp__1053'],
                                data_subset['wind_speed__1051'],
                                data_subset['inv2_ac_power_w__1047']/1000,
                                204.12)
    daily_pr_list.append({"date": date,
                          "PR": pr})
daily_pr_df = pd.DataFrame(daily_pr_list)
# Plot the PR time series to visualize it
daily_pr_df.set_index('date').plot()
plt.axhline(pr_whole_series, color='r', ls='--', label='PR, Entire Series')
plt.xticks(rotation=25)
plt.legend()
plt.ylabel('NREL PR')
plt.xlabel('Date')
plt.tight_layout()
plt.show()
Orientation#
This includes examples related to the orientation of a system (fixed-tilt, tracking).
Flag Sunny Days for a Fixed-Tilt System#
Flag sunny days for a fixed-tilt PV system.
Identifying and masking sunny days for a fixed-tilt system is important when performing future analyses that require filtered sunny day data. For this example we will use data from the fixed-tilt NREL SERF East system located on the NREL campus in Colorado, USA, and generate a sunny day mask. This data set is publicly available via the PVDAQ database in the DOE Open Energy Data Initiative (OEDI) (https://data.openei.org/submissions/4568), as system ID 50. This data is timezone-localized.
import pvanalytics
from pvanalytics.features import daytime as day
from pvanalytics.features.orientation import fixed_nrel
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the NREL SERF East fixed-tilt system. This data set contains 15-minute interval AC power data.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'serf_east_15min_ac_power.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
Mask day-night periods using the pvanalytics.features.daytime.power_or_irradiance() function. Then apply pvanalytics.features.orientation.fixed_nrel() to the AC power stream and mask the sunny days in the time series.
daytime_mask = day.power_or_irradiance(data['ac_power'])
fixed_sunny_days = fixed_nrel(data['ac_power'],
daytime_mask)
Plot the AC power stream with the sunny day mask applied to it.
data['ac_power'].plot()
data.loc[fixed_sunny_days, 'ac_power'].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Sunny Day"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()
Flag Sunny Days for a Tracking System#
Flag sunny days for a single-axis tracking PV system.
Identifying and masking sunny days for a single-axis tracking system is important when performing future analyses that require filtered sunny day data. For this example we will use data from the single-axis tracking NREL Mesa system located on the NREL campus in Colorado, USA, and generate a sunny day mask. This data set is publicly available via the PVDAQ database in the DOE Open Energy Data Initiative (OEDI) (https://data.openei.org/submissions/4568), as system ID 50. This data is timezone-localized.
import pvanalytics
from pvanalytics.features import daytime as day
from pvanalytics.features.orientation import tracking_nrel
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, read in data from the NREL Mesa 1-axis tracking system. This data set contains 15-minute interval AC power data.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'nrel_1axis_tracker_mesa_ac_power.csv'
data = pd.read_csv(file, index_col=0, parse_dates=True)
Mask day-night periods using the pvanalytics.features.daytime.power_or_irradiance() function. Then apply pvanalytics.features.orientation.tracking_nrel() to the AC power stream and mask the sunny days in the time series.
daytime_mask = day.power_or_irradiance(data['ac_power'])
tracking_sunny_days = tracking_nrel(data['ac_power'],
daytime_mask)
Plot the AC power stream with the sunny day mask applied to it.
data['ac_power'].plot()
data.loc[tracking_sunny_days, 'ac_power'].plot(ls='', marker='.')
plt.legend(labels=["AC Power", "Sunny Day"],
loc="upper left")
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()
Outliers#
This includes examples for identifying outliers in time series data.
Z-Score Outlier Detection#
Identifying outliers in time series using z-score outlier detection.
Identifying and removing outliers from PV sensor time series
data allows for more accurate data analysis.
In this example, we demonstrate how to use pvanalytics.quality.outliers.zscore() to identify and filter out outliers in a time series.
import pvanalytics
from pvanalytics.quality.outliers import zscore
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we read in the ac_power_inv_7539_outliers example. Min-max normalized AC power is represented by the “value_normalized” column. There is a boolean column “outlier” where inserted outliers are labeled as True, and all other values are labeled as False. These outlier values were inserted manually into the data set to illustrate outlier detection by each of the functions. We use a normalized time series example provided by the PV Fleets Initiative. This example is adapted from the DuraMAT DataHub clipping data set: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file = pvanalytics_dir / 'data' / 'ac_power_inv_7539_outliers.csv'
data = pd.read_csv(ac_power_file, index_col=0, parse_dates=True)
print(data.head(10))
value_normalized outlier
timestamp
2017-04-10 19:15:00+00:00 0.000002 False
2017-04-10 19:30:00+00:00 0.000000 False
2017-04-11 06:15:00+00:00 0.000000 False
2017-04-11 06:45:00+00:00 0.033103 False
2017-04-11 07:00:00+00:00 0.043992 False
2017-04-11 07:15:00+00:00 0.055615 False
2017-04-11 07:30:00+00:00 0.110986 False
2017-04-11 07:45:00+00:00 0.184948 False
2017-04-11 08:00:00+00:00 0.276810 False
2017-04-11 08:15:00+00:00 0.358061 False
We then use pvanalytics.quality.outliers.zscore()
to identify
outliers in the time series, and plot the data with the z-score outlier mask.
zscore_outlier_mask = zscore(data=data['value_normalized'])
data['value_normalized'].plot()
data.loc[zscore_outlier_mask, 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Total running time of the script: (0 minutes 0.222 seconds)
Tukey Outlier Detection#
Identifying outliers in time series using Tukey outlier detection.
Identifying and removing outliers from PV sensor time series
data allows for more accurate data analysis.
In this example, we demonstrate how to use
pvanalytics.quality.outliers.tukey()
to identify and filter
out outliers in a time series.
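For context, here is a minimal sketch of the Tukey fences rule this method is based on, assuming simple quartile estimates (the library's implementation may differ):
import pandas as pd

def tukey_sketch(series, k=1.5):
    # Flag points outside [Q1 - k*IQR, Q3 + k*IQR], where IQR = Q3 - Q1.
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

example = pd.Series([1.0, 1.1, 0.9, 25.0, 1.05])
print(tukey_sketch(example))  # flags only the 25.0 reading
Smaller k values tighten the fences and flag more points; the example below uses k=0.5 for this reason.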
import pvanalytics
from pvanalytics.quality.outliers import tukey
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we read in the ac_power_inv_7539_outliers example. Min-max normalized AC power is represented by the “value_normalized” column. There is a boolean column “outlier” where inserted outliers are labeled as True, and all other values are labeled as False. These outlier values were inserted manually into the data set to illustrate outlier detection by each of the functions. We use a normalized time series example provided by the PV Fleets Initiative. This example is adapted from the DuraMAT DataHub clipping data set: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file_1 = pvanalytics_dir / 'data' / 'ac_power_inv_7539_outliers.csv'
data = pd.read_csv(ac_power_file_1, index_col=0, parse_dates=True)
print(data.head(10))
value_normalized outlier
timestamp
2017-04-10 19:15:00+00:00 0.000002 False
2017-04-10 19:30:00+00:00 0.000000 False
2017-04-11 06:15:00+00:00 0.000000 False
2017-04-11 06:45:00+00:00 0.033103 False
2017-04-11 07:00:00+00:00 0.043992 False
2017-04-11 07:15:00+00:00 0.055615 False
2017-04-11 07:30:00+00:00 0.110986 False
2017-04-11 07:45:00+00:00 0.184948 False
2017-04-11 08:00:00+00:00 0.276810 False
2017-04-11 08:15:00+00:00 0.358061 False
We then use pvanalytics.quality.outliers.tukey()
to identify
outliers in the time series, and plot the data with the Tukey outlier mask.
tukey_outlier_mask = tukey(data=data['value_normalized'],
k=0.5)
data['value_normalized'].plot()
data.loc[tukey_outlier_mask, 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Total running time of the script: (0 minutes 0.220 seconds)
Hampel Outlier Detection#
Identifying outliers in time series using Hampel outlier detection.
Identifying and removing outliers from PV sensor time series
data allows for more accurate data analysis.
In this example, we demonstrate how to use
pvanalytics.quality.outliers.hampel()
to identify and filter
out outliers in a time series.
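For context, here is a minimal sketch of the Hampel filter idea, assuming a centered rolling window and a rolling-median approximation of the MAD (the library's implementation may differ):
import pandas as pd

def hampel_sketch(series, window=10, max_deviation=3.0):
    # Compare each point to its rolling median and flag points more than
    # max_deviation scaled MADs away; the factor 1.4826 makes the MAD a
    # consistent estimator of the standard deviation for normal data.
    rolling_median = series.rolling(window, center=True).median()
    abs_dev = (series - rolling_median).abs()
    mad = abs_dev.rolling(window, center=True).median()
    return abs_dev > max_deviation * 1.4826 * mad
Because the comparison is local to each window, the Hampel filter tolerates slow trends in the data better than a global z-score.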
import pvanalytics
from pvanalytics.quality.outliers import hampel
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we read in the ac_power_inv_7539_outliers example. Min-max normalized AC power is represented by the “value_normalized” column. There is a boolean column “outlier” where inserted outliers are labeled as True, and all other values are labeled as False. These outlier values were inserted manually into the data set to illustrate outlier detection by each of the functions. We use a normalized time series example provided by the PV Fleets Initiative. This example is adapted from the DuraMAT DataHub clipping data set: https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file_1 = pvanalytics_dir / 'data' / 'ac_power_inv_7539_outliers.csv'
data = pd.read_csv(ac_power_file_1, index_col=0, parse_dates=True)
print(data.head(10))
value_normalized outlier
timestamp
2017-04-10 19:15:00+00:00 0.000002 False
2017-04-10 19:30:00+00:00 0.000000 False
2017-04-11 06:15:00+00:00 0.000000 False
2017-04-11 06:45:00+00:00 0.033103 False
2017-04-11 07:00:00+00:00 0.043992 False
2017-04-11 07:15:00+00:00 0.055615 False
2017-04-11 07:30:00+00:00 0.110986 False
2017-04-11 07:45:00+00:00 0.184948 False
2017-04-11 08:00:00+00:00 0.276810 False
2017-04-11 08:15:00+00:00 0.358061 False
We then use pvanalytics.quality.outliers.hampel()
to identify
outliers in the time series, and plot the data with the Hampel outlier mask.
hampel_outlier_mask = hampel(data=data['value_normalized'],
window=10)
data['value_normalized'].plot()
data.loc[hampel_outlier_mask, 'value_normalized'].plot(ls='', marker='o')
plt.legend(labels=["AC Power", "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()

Total running time of the script: (0 minutes 0.341 seconds)
PVFleets QA Examples#
These examples highlight the QA processes for temperature, power and irradiance data streams that are used in the NREL PV Fleet Performance Data Initiative (https://www.nrel.gov/pv/fleet-performance-data-initiative.html).
PV Fleets QA Process: Temperature#
PV Fleets Temperature QA Pipeline
The NREL PV Fleets Data Initiative uses PVAnalytics routines to assess the quality of systems’ PV data. In this example, the PV Fleets process for assessing the data quality of a temperature data stream is shown. This example pipeline illustrates how several PVAnalytics functions can be used in sequence to assess the quality of a temperature data stream.
import pandas as pd
import pathlib
from matplotlib import pyplot as plt
import pvanalytics
from pvanalytics.quality import data_shifts as ds
from pvanalytics.quality import gaps
from pvanalytics.quality.outliers import zscore
First, we import a module temperature data stream from a PV installation at NREL. This data set is publicly available via the PVDAQ database in the DOE Open Energy Data Initiative (OEDI) (https://data.openei.org/submissions/4568), under system ID 4. This data is timezone-localized.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'system_4_module_temperature.parquet'
time_series = pd.read_parquet(file)
time_series.set_index('index', inplace=True)
time_series.index = pd.to_datetime(time_series.index)
time_series = time_series['module_temp_1']
latitude = 39.7406
longitude = -105.1774
# Identify the temperature data stream type (this affects the type of
# checks we do)
data_stream_type = "module"
data_freq = '15min'
time_series = time_series.asfreq(data_freq)
Now, let's visualize the original time series as a reference.
time_series.plot(title="Original Time Series")
plt.xlabel("Date")
plt.ylabel("Temperature")
plt.tight_layout()
plt.show()

Now, let’s run basic data checks to identify stale and abnormal/outlier data in the time series. Basic data checks include the following steps:
Flatlined/stale data periods (pvanalytics.quality.gaps.stale_values_round())
"Abnormal" data periods, which fall outside the temperature limits of -40 to 185 deg C. Additional threshold-based checks are applied depending on the type of temperature sensor (ambient or module) (pvanalytics.quality.weather.temperature_limits())
Outliers, which are defined as more than 4 standard deviations away from the mean (pvanalytics.quality.outliers.zscore())
Additionally, we identify the units of the temperature stream as either Celsius or Fahrenheit.
# REMOVE STALE DATA
stale_data_mask = gaps.stale_values_round(time_series,
window=3,
decimals=2)
# FIND ABNORMAL PERIODS
temperature_limit_mask = pvanalytics.quality.weather.temperature_limits(
time_series, limits=(-40, 185))
temperature_limit_mask = temperature_limit_mask.reindex(
index=time_series.index,
method='ffill',
fill_value=False)
# FIND OUTLIERS (Z-SCORE FILTER)
zscore_outlier_mask = zscore(time_series,
zmax=4,
nan_policy='omit')
# PERFORM ADDITIONAL CHECKS, INCLUDING CHECKING UNITS (CELSIUS OR FAHRENHEIT)
temperature_mean = time_series.mean()
if temperature_mean > 35:
    temp_units = 'F'
else:
    temp_units = 'C'
print("Estimated Temperature units: " + str(temp_units))
# Run additional checks based on temperature sensor type.
if data_stream_type == 'module':
    if temp_units == 'C':
        module_limit_mask = (time_series <= 85)
        temperature_limit_mask = (temperature_limit_mask & module_limit_mask)
if data_stream_type == 'ambient':
    ambient_limit_mask = pvanalytics.quality.weather.temperature_limits(
        time_series, limits=(-40, 120))
    temperature_limit_mask = (temperature_limit_mask & ambient_limit_mask)
    if temp_units == 'C':
        ambient_limit_mask_2 = (time_series <= 50)
        temperature_limit_mask = (temperature_limit_mask &
                                  ambient_limit_mask_2)
# Get the percentage of data flagged for each issue, so it can later be logged
pct_stale = round((len(time_series[
stale_data_mask].dropna())/len(time_series.dropna())*100), 1)
pct_erroneous = round((len(time_series[
~temperature_limit_mask].dropna())/len(time_series.dropna())*100), 1)
pct_outlier = round((len(time_series[
zscore_outlier_mask].dropna())/len(time_series.dropna())*100), 1)
# Visualize all of the time series issues (stale, abnormal, outlier)
time_series.plot()
labels = ["Temperature"]
if any(stale_data_mask):
    time_series.loc[stale_data_mask].plot(ls='',
                                          marker='o',
                                          color="green")
    labels.append("Stale")
if any(~temperature_limit_mask):
    time_series.loc[~temperature_limit_mask].plot(ls='',
                                                  marker='o',
                                                  color="yellow")
    labels.append("Abnormal")
if any(zscore_outlier_mask):
    time_series.loc[zscore_outlier_mask].plot(ls='',
                                              marker='o',
                                              color="purple")
    labels.append("Outlier")
plt.legend(labels=labels)
plt.title("Time Series Labeled for Basic Issues")
plt.xlabel("Date")
plt.ylabel("Temperature")
plt.tight_layout()
plt.show()

Estimated Temperature units: C
Now, let’s filter out any of the flagged data from the basic temperature checks (stale or abnormal data). Then we can re-visualize the data post-filtering.
# Filter the time series, taking out all of the issues
issue_mask = ((~stale_data_mask) & (temperature_limit_mask) &
(~zscore_outlier_mask))
time_series = time_series[issue_mask]
time_series = time_series.asfreq(data_freq)
# Visualize the time series post-filtering
time_series.plot(title="Time Series Post-Basic Data Filtering")
plt.xlabel("Date")
plt.ylabel("Temperature")
plt.tight_layout()
plt.show()

We filter the time series based on its daily completeness score. This filtering scheme requires at least 25% of the data to be present on each day; we further require at least 10 consecutive days that meet this 25% threshold.
# Visualize daily data completeness
data_completeness_score = gaps.completeness_score(time_series)
# Visualize data completeness score as a time series.
data_completeness_score.plot()
plt.xlabel("Date")
plt.ylabel("Daily Completeness Score (Fractional)")
plt.axhline(y=0.25, color='r', linestyle='-',
label='Daily Completeness Cutoff')
plt.legend()
plt.tight_layout()
plt.show()
# Trim the series based on daily completeness score
trim_series = pvanalytics.quality.gaps.trim_incomplete(
time_series,
minimum_completeness=.25,
freq=data_freq)
first_valid_date, last_valid_date = \
pvanalytics.quality.gaps.start_stop_dates(trim_series)
time_series = time_series[first_valid_date.tz_convert(time_series.index.tz):
last_valid_date.tz_convert(time_series.index.tz)]
time_series = time_series.asfreq(data_freq)

Next, we check the time series for any abrupt data shifts. We take the
longest continuous part of the time series that is free of data shifts.
We use pvanalytics.quality.data_shifts.detect_data_shifts() to detect data shifts in the time series.
# Resample the time series to daily mean
time_series_daily = time_series.resample('D').mean()
data_shift_start_date, data_shift_end_date = \
ds.get_longest_shift_segment_dates(time_series_daily)
data_shift_period_length = (data_shift_end_date -
data_shift_start_date).days
# Get the number of shift dates
data_shift_mask = ds.detect_data_shifts(time_series_daily)
# Get the shift dates
shift_dates = list(time_series_daily[data_shift_mask].index)
if len(shift_dates) > 0:
    shift_found = True
else:
    shift_found = False
# Visualize the time shifts for the daily time series
print("Shift Found: ", shift_found)
edges = ([time_series_daily.index[0]] + shift_dates +
[time_series_daily.index[-1]])
fig, ax = plt.subplots()
for (st, ed) in zip(edges[:-1], edges[1:]):
    ax.plot(time_series_daily.loc[st:ed])
plt.title("Daily Time Series Labeled for Data Shifts")
plt.xlabel("Date")
plt.ylabel("Mean Daily Temperature")
plt.tight_layout()
plt.show()

Shift Found: False
Finally, we filter the time series to only include the longest shift-free period. We then visualize the final time series post-QA filtering.
time_series = time_series[
(time_series.index >=
data_shift_start_date.tz_convert(time_series.index.tz)) &
(time_series.index <=
data_shift_end_date.tz_convert(time_series.index.tz))]
time_series = time_series.asfreq(data_freq)
# Plot the final filtered time series.
time_series.plot(title="Final Filtered Time Series")
plt.xlabel("Date")
plt.ylabel("Temperature")
plt.tight_layout()
plt.show()

Generate a dictionary output for the QA assessment of this data stream, including the percent stale and erroneous data detected, any shift dates, and the detected temperature units for the data stream.
qa_check_dict = {"temperature_units": temp_units,
"pct_stale": pct_stale,
"pct_erroneous": pct_erroneous,
"pct_outlier": pct_outlier,
"data_shifts": shift_found,
"shift_dates": shift_dates}
print("QA Results:")
print(qa_check_dict)
QA Results:
{'temperature_units': 'C', 'pct_stale': 36.6, 'pct_erroneous': 0.0, 'pct_outlier': 0.0, 'data_shifts': False, 'shift_dates': []}
Total running time of the script: (0 minutes 13.173 seconds)
PV Fleets QA Process: Irradiance#
PV Fleets Irradiance QA Pipeline
The NREL PV Fleets Data Initiative uses PVAnalytics routines to assess the quality of systems’ PV data. In this example, the PV Fleets process for assessing the data quality of an irradiance data stream is shown. This example pipeline illustrates how several PVAnalytics functions can be used in sequence to assess the quality of an irradiance data stream.
import pandas as pd
import pathlib
from matplotlib import pyplot as plt
import pvanalytics
import pvlib
from pvanalytics.quality import data_shifts as ds
from pvanalytics.quality import gaps
from pvanalytics.quality.outliers import zscore
from pvanalytics.features.daytime import power_or_irradiance
from pvanalytics.quality.time import shifts_ruptures
from pvanalytics.features import daytime
First, we import a POA irradiance data stream from a PV installation at NREL. This data set is publicly available via the PVDAQ database in the DOE Open Energy Data Initiative (OEDI) (https://data.openei.org/submissions/4568), under system ID 15. This data is timezone-localized.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'system_15_poa_irradiance.parquet'
time_series = pd.read_parquet(file)
time_series.set_index('measured_on', inplace=True)
time_series.index = pd.to_datetime(time_series.index)
time_series = time_series['poa_irradiance__484']
latitude = 39.7406
longitude = -105.1775
data_freq = '15min'
time_series = time_series.asfreq(data_freq)
Now, let's visualize the original time series as a reference.
time_series.plot(title="Original Time Series")
plt.xlabel("Date")
plt.ylabel("Irradiance, W/m^2")
plt.tight_layout()
plt.show()

Now, let’s run basic data checks to identify stale and abnormal/outlier data in the time series. Basic data checks include the following steps:
Flatlined/stale data periods (pvanalytics.quality.gaps.stale_values_round())
Negative irradiance data
"Abnormal" data periods, which are defined as days with a daily minimum greater than 50 W/m^2, or any value at or above 1300 W/m^2
Outliers, which are defined as more than 4 standard deviations away from the mean (pvanalytics.quality.outliers.zscore())
# REMOVE STALE DATA (that isn't during nighttime periods)
# Day/night mask
daytime_mask = power_or_irradiance(time_series)
# Stale data mask
stale_data_mask = gaps.stale_values_round(time_series,
window=3,
decimals=2)
stale_data_mask = stale_data_mask & daytime_mask
# REMOVE NEGATIVE DATA
negative_mask = (time_series < 0)
# FIND ABNORMAL PERIODS
daily_min = time_series.resample('D').min()
erroneous_mask = (daily_min > 50)
erroneous_mask = erroneous_mask.reindex(index=time_series.index,
method='ffill',
fill_value=False)
# Remove values greater than or equal to 1300
out_of_bounds_mask = (time_series >= 1300)
# FIND OUTLIERS (Z-SCORE FILTER)
zscore_outlier_mask = zscore(time_series,
zmax=4,
nan_policy='omit')
# Get the percentage of data flagged for each issue, so it can later be logged
pct_stale = round((len(time_series[
stale_data_mask].dropna())/len(time_series.dropna())*100), 1)
pct_negative = round((len(time_series[
negative_mask].dropna())/len(time_series.dropna())*100), 1)
pct_erroneous = round((len(time_series[
erroneous_mask].dropna())/len(time_series.dropna())*100), 1)
pct_outlier = round((len(time_series[
zscore_outlier_mask].dropna())/len(time_series.dropna())*100), 1)
# Visualize all of the time series issues (stale, abnormal, outlier, etc)
time_series.plot()
labels = ["Irradiance"]
if any(stale_data_mask):
    time_series.loc[stale_data_mask].plot(ls='', marker='o', color="green")
    labels.append("Stale")
if any(negative_mask):
    time_series.loc[negative_mask].plot(ls='', marker='o', color="orange")
    labels.append("Negative")
if any(erroneous_mask):
    time_series.loc[erroneous_mask].plot(ls='', marker='o', color="yellow")
    labels.append("Abnormal")
if any(out_of_bounds_mask):
    time_series.loc[out_of_bounds_mask].plot(ls='', marker='o', color="yellow")
    labels.append("Too High")
if any(zscore_outlier_mask):
    time_series.loc[zscore_outlier_mask].plot(
        ls='', marker='o', color="purple")
    labels.append("Outlier")
plt.legend(labels=labels)
plt.title("Time Series Labeled for Basic Issues")
plt.xlabel("Date")
plt.ylabel("Irradiance, W/m^2")
plt.tight_layout()
plt.show()

Now, let’s filter out any of the flagged data from the basic irradiance checks (stale or abnormal data). Then we can re-visualize the data post-filtering.
# Filter the time series, taking out all of the issues
issue_mask = ((~stale_data_mask) & (~negative_mask) & (~erroneous_mask) &
(~out_of_bounds_mask) & (~zscore_outlier_mask))
time_series = time_series[issue_mask]
time_series = time_series.asfreq(data_freq)
# Visualize the time series post-filtering
time_series.plot(title="Time Series Post-Basic Data Filtering")
plt.xlabel("Date")
plt.ylabel("Irradiance, W/m^2")
plt.tight_layout()
plt.show()

We filter the time series based on its daily completeness score. This filtering scheme requires at least 25% of the data to be present on each day; we further require at least 10 consecutive days that meet this 25% threshold.
# Visualize daily data completeness
data_completeness_score = gaps.completeness_score(time_series)
# Visualize data completeness score as a time series.
data_completeness_score.plot()
plt.xlabel("Date")
plt.ylabel("Daily Completeness Score (Fractional)")
plt.axhline(y=0.25, color='r', linestyle='-',
label='Daily Completeness Cutoff')
plt.legend()
plt.tight_layout()
plt.show()
# Trim the series based on daily completeness score
trim_series = pvanalytics.quality.gaps.trim_incomplete(
time_series,
minimum_completeness=.25,
freq=data_freq)
first_valid_date, last_valid_date = \
pvanalytics.quality.gaps.start_stop_dates(trim_series)
time_series = time_series[first_valid_date.tz_convert(time_series.index.tz):
last_valid_date.tz_convert(time_series.index.tz)]
time_series = time_series.asfreq(data_freq)

Next, we check the time series for any time shifts, which may be caused by
time drift or by incorrect time zone assignment. To do this, we compare
the modeled midday time for the particular system location to its
measured midday time. We use pvanalytics.quality.time.shifts_ruptures()
to determine the presence of time shifts in the series.
# Get the modeled sunrise and sunset time series based on the system's
# latitude-longitude coordinates
modeled_sunrise_sunset_df = pvlib.solarposition.sun_rise_set_transit_spa(
time_series.index, latitude, longitude)
# Calculate the midday point between sunrise and sunset for each day
# in the modeled irradiance series
modeled_midday_series = modeled_sunrise_sunset_df['sunrise'] + \
(modeled_sunrise_sunset_df['sunset'] -
modeled_sunrise_sunset_df['sunrise']) / 2
# Run day-night mask on the irradiance time series
daytime_mask = power_or_irradiance(time_series,
freq=data_freq,
low_value_threshold=.005)
# Generate the sunrise, sunset, and halfway points for the data stream
sunrise_series = daytime.get_sunrise(daytime_mask)
sunset_series = daytime.get_sunset(daytime_mask)
midday_series = sunrise_series + ((sunset_series - sunrise_series)/2)
# Convert the midday and modeled midday series to daily values
midday_series_daily, modeled_midday_series_daily = (
midday_series.resample('D').mean(),
modeled_midday_series.resample('D').mean())
# Set midday value series as minutes since midnight, from midday datetime
# values
midday_series_daily = (midday_series_daily.dt.hour * 60 +
midday_series_daily.dt.minute +
midday_series_daily.dt.second / 60)
modeled_midday_series_daily = \
(modeled_midday_series_daily.dt.hour * 60 +
modeled_midday_series_daily.dt.minute +
modeled_midday_series_daily.dt.second / 60)
# Estimate the time shifts by comparing the modelled midday point to the
# measured midday point.
is_shifted, time_shift_series = shifts_ruptures(modeled_midday_series_daily,
midday_series_daily,
period_min=15,
shift_min=15,
zscore_cutoff=1.5)
# Create a midday difference series between modeled and measured midday, to
# visualize time shifts. First, resample each time series to daily frequency,
# and compare the data stream's daily halfway point to the modeled halfway
# point
midday_diff_series = (modeled_midday_series.resample('D').mean() -
midday_series.resample('D').mean()
).dt.total_seconds() / 60
# Generate boolean for detected time shifts
if any(time_shift_series != 0):
    time_shifts_detected = True
else:
    time_shifts_detected = False
# Build a list of time shifts for re-indexing. We choose to use dicts.
time_shift_series.index = pd.to_datetime(
time_shift_series.index)
changepoints = (time_shift_series != time_shift_series.shift(1))
changepoints = changepoints[changepoints].index
changepoint_amts = pd.Series(time_shift_series.loc[changepoints])
time_shift_list = list()
for idx in range(len(changepoint_amts)):
    if idx < (len(changepoint_amts) - 1):
        time_shift_list.append({"datetime_start":
                                str(changepoint_amts.index[idx]),
                                "datetime_end":
                                str(changepoint_amts.index[idx + 1]),
                                "time_shift": changepoint_amts[idx]})
    else:
        time_shift_list.append({"datetime_start":
                                str(changepoint_amts.index[idx]),
                                "datetime_end":
                                str(time_shift_series.index.max()),
                                "time_shift": changepoint_amts[idx]})
# Correct any time shifts in the time series
new_index = pd.Series(time_series.index, index=time_series.index)
for i in time_shift_list:
    new_index[(time_series.index >= pd.to_datetime(i['datetime_start'])) &
              (time_series.index < pd.to_datetime(i['datetime_end']))] = \
        time_series.index + pd.Timedelta(minutes=i['time_shift'])
time_series.index = new_index
# Remove duplicated indices and sort the time series (just in case)
time_series = time_series[~time_series.index.duplicated(
keep='first')].sort_index()
# Plot the difference between measured and modeled midday, as well as the
# CPD-estimated time shift series.
midday_diff_series.plot()
time_shift_series.plot()
plt.title("Midday Difference Time Shift Series")
plt.xlabel("Date")
plt.ylabel("Midday Difference (Modeled-Measured), Minutes")
plt.tight_layout()
plt.show()
# Plot the heatmap of the irradiance time series
plt.figure()
# Get time of day from the associated datetime column
time_of_day = pd.Series(time_series.index.hour +
time_series.index.minute/60,
index=time_series.index)
# Pivot the dataframe
dataframe = pd.DataFrame(pd.concat([time_series, time_of_day], axis=1))
dataframe.columns = ["values", 'time_of_day']
dataframe = dataframe.dropna()
dataframe_pivoted = dataframe.pivot_table(index='time_of_day',
columns=dataframe.index.date,
values="values")
plt.pcolormesh(dataframe_pivoted.columns,
dataframe_pivoted.index,
dataframe_pivoted,
shading='auto')
plt.ylabel('Time of day [0-24]')
plt.xlabel('Date')
plt.xticks(rotation=60)
plt.title('Post-Correction Heatmap, Time of Day')
plt.colorbar()
plt.tight_layout()
plt.show()
Next, we check the time series for any abrupt data shifts. We take the
longest continuous part of the time series that is free of data shifts.
We use pvanalytics.quality.data_shifts.detect_data_shifts() to detect data shifts in the time series.
# Resample the time series to daily mean
time_series_daily = time_series.resample('D').mean()
data_shift_start_date, data_shift_end_date = \
ds.get_longest_shift_segment_dates(time_series_daily)
data_shift_period_length = (data_shift_end_date - data_shift_start_date).days
# Get the number of shift dates
data_shift_mask = ds.detect_data_shifts(time_series_daily)
# Get the shift dates
shift_dates = list(time_series_daily[data_shift_mask].index)
if len(shift_dates) > 0:
    shift_found = True
else:
    shift_found = False
# Visualize the time shifts for the daily time series
print("Shift Found:", shift_found)
edges = [time_series_daily.index[0]] + \
shift_dates + [time_series_daily.index[-1]]
fig, ax = plt.subplots()
for (st, ed) in zip(edges[:-1], edges[1:]):
    ax.plot(time_series_daily.loc[st:ed])
plt.title("Daily Time Series Labeled for Data Shifts")
plt.xlabel("Date")
plt.ylabel("Mean Daily Irradiance (W/m^2)")
plt.tight_layout()
plt.show()

Shift Found: False
We filter the time series to only include the longest shift-free period.
# Filter the time series to only include the longest shift-free period
time_series = time_series[
(time_series.index >= data_shift_start_date.tz_convert(
time_series.index.tz)) &
(time_series.index <= data_shift_end_date.tz_convert(
time_series.index.tz))]
time_series = time_series.asfreq(data_freq)
Display the final irradiance time series, post-QA filtering.
time_series.plot(title="Final Filtered Time Series")
plt.xlabel("Date")
plt.ylabel("Irradiance (W/m^2)")
plt.tight_layout()
plt.show()

Generate a dictionary output for the QA assessment of this data stream, including the percent stale and erroneous data detected, any shift dates, and any detected time shifts.
qa_check_dict = {"original_time_zone_offset": time_series.index.tz,
"pct_stale": pct_stale,
"pct_negative": pct_negative,
"pct_erroneous": pct_erroneous,
"pct_outlier": pct_outlier,
"time_shifts_detected": time_shifts_detected,
"time_shift_list": time_shift_list,
"data_shifts": shift_found,
"shift_dates": shift_dates}
print("QA Results:")
print(qa_check_dict)
QA Results:
{'original_time_zone_offset': pytz.FixedOffset(-420), 'pct_stale': 0.1, 'pct_negative': 0.0, 'pct_erroneous': 1.3, 'pct_outlier': 1.2, 'time_shifts_detected': False, 'time_shift_list': [{'datetime_start': '2019-03-20 00:00:00-07:00', 'datetime_end': '2023-10-21 00:00:00-07:00', 'time_shift': 0.0}], 'data_shifts': False, 'shift_dates': []}
Total running time of the script: (0 minutes 33.078 seconds)
PV Fleets QA Process: Power#
PV Fleets Power QA Pipeline
The NREL PV Fleets Data Initiative uses PVAnalytics routines to assess the quality of systems’ PV data. In this example, the PV Fleets process for assessing the data quality of an AC power data stream is shown. This example pipeline illustrates how several PVAnalytics functions can be used in sequence to assess the quality of a power or energy data stream.
import pandas as pd
import pathlib
from matplotlib import pyplot as plt
import pvanalytics
from pvanalytics.quality import data_shifts as ds
from pvanalytics.quality import gaps
from pvanalytics.quality.outliers import zscore
from pvanalytics.system import (is_tracking_envelope,
infer_orientation_fit_pvwatts)
from pvanalytics.features.daytime import power_or_irradiance
from pvanalytics.quality.time import shifts_ruptures
from pvanalytics.features import daytime
import pvlib
from pvanalytics.features.clipping import geometric
First, we import an AC power data stream from a PV installation at NREL. This data set is publicly available via the PVDAQ database in the DOE Open Energy Data Initiative (OEDI) (https://data.openei.org/submissions/4568), under system ID 50. This data is timezone-localized.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
file = pvanalytics_dir / 'data' / 'system_50_ac_power_2_full_DST.parquet'
time_series = pd.read_parquet(file)
time_series.set_index('measured_on', inplace=True)
time_series.index = pd.to_datetime(time_series.index)
time_series = time_series['ac_power_2']
latitude = 39.7406
longitude = -105.1775
data_freq = '15min'
time_series = time_series.asfreq(data_freq)
Now, let's visualize the original time series as a reference.
time_series.plot(title="Original Time Series")
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()

Now, let’s run basic data checks to identify stale and abnormal/outlier data in the time series. Basic data checks include the following steps:
Flatlined/stale data periods (pvanalytics.quality.gaps.stale_values_round())
Negative data
"Abnormal" data periods, which are defined as days where the daily minimum stays at or above 10% of the overall time series mean (i.e., the output never drops near zero overnight)
Outliers, which are defined as more than 4 standard deviations away from the mean (pvanalytics.quality.outliers.zscore())
# REMOVE STALE DATA (that isn't during nighttime periods)
# Day/night mask
daytime_mask = power_or_irradiance(time_series)
# Stale data mask
stale_data_mask = gaps.stale_values_round(time_series,
window=3,
decimals=2)
stale_data_mask = stale_data_mask & daytime_mask
# REMOVE NEGATIVE DATA
negative_mask = (time_series < 0)
# FIND ABNORMAL PERIODS
daily_min = time_series.resample('D').min()
series_min = 0.1 * time_series.mean()
erroneous_mask = (daily_min >= series_min)
erroneous_mask = erroneous_mask.reindex(index=time_series.index,
method='ffill',
fill_value=False)
# FIND OUTLIERS (Z-SCORE FILTER)
zscore_outlier_mask = zscore(time_series, zmax=4,
nan_policy='omit')
# Get the percentage of data flagged for each issue, so it can later be logged
pct_stale = round((len(time_series[
stale_data_mask].dropna())/len(time_series.dropna())*100), 1)
pct_negative = round((len(time_series[
negative_mask].dropna())/len(time_series.dropna())*100), 1)
pct_erroneous = round((len(time_series[
erroneous_mask].dropna())/len(time_series.dropna())*100), 1)
pct_outlier = round((len(time_series[
zscore_outlier_mask].dropna())/len(time_series.dropna())*100), 1)
# Visualize all of the time series issues (stale, abnormal, outlier, etc)
time_series.plot()
labels = ["AC Power"]
if any(stale_data_mask):
    time_series.loc[stale_data_mask].plot(ls='', marker='o', color="green")
    labels.append("Stale")
if any(negative_mask):
    time_series.loc[negative_mask].plot(ls='', marker='o', color="orange")
    labels.append("Negative")
if any(erroneous_mask):
    time_series.loc[erroneous_mask].plot(ls='', marker='o', color="yellow")
    labels.append("Abnormal")
if any(zscore_outlier_mask):
    time_series.loc[zscore_outlier_mask].plot(
        ls='', marker='o', color="purple")
    labels.append("Outlier")
plt.legend(labels=labels)
plt.title("Time Series Labeled for Basic Issues")
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()

Now, let’s filter out any of the flagged data from the basic power checks (stale or abnormal data). Then we can re-visualize the data post-filtering.
# Filter the time series, taking out all of the issues
issue_mask = ((~stale_data_mask) & (~negative_mask) &
(~erroneous_mask) & (~zscore_outlier_mask))
time_series = time_series[issue_mask]
time_series = time_series.asfreq(data_freq)
# Visualize the time series post-filtering
time_series.plot(title="Time Series Post-Basic Data Filtering")
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()

We filter the time series based on its daily completeness score. This filtering scheme requires at least 25% of the data to be present on each day; we further require at least 10 consecutive days that meet this 25% threshold.
# Visualize daily data completeness
data_completeness_score = gaps.completeness_score(time_series)
# Visualize data completeness score as a time series.
data_completeness_score.plot()
plt.xlabel("Date")
plt.ylabel("Daily Completeness Score (Fractional)")
plt.axhline(y=0.25, color='r', linestyle='-',
label='Daily Completeness Cutoff')
plt.legend()
plt.tight_layout()
plt.show()
# Trim the series based on daily completeness score
trim_series = pvanalytics.quality.gaps.trim_incomplete(
time_series, minimum_completeness=.25, freq=data_freq)
first_valid_date, last_valid_date = \
pvanalytics.quality.gaps.start_stop_dates(trim_series)
time_series = time_series[first_valid_date.tz_convert(time_series.index.tz):
last_valid_date.tz_convert(time_series.index.tz)]
time_series = time_series.asfreq(data_freq)

Next, we check the time series for any time shifts, which may be caused by
time drift or by incorrect time zone assignment. To do this, we compare
the modeled midday time for the particular system location to its
measured midday time. We use pvanalytics.quality.time.shifts_ruptures()
to determine the presence of time shifts in the series.
# Plot the heatmap of the AC power time series before time shift correction.
plt.figure()
# Get time of day from the associated datetime column
time_of_day = pd.Series(time_series.index.hour +
time_series.index.minute/60,
index=time_series.index)
# Pivot the dataframe
dataframe = pd.DataFrame(pd.concat([time_series, time_of_day], axis=1))
dataframe.columns = ["values", 'time_of_day']
dataframe = dataframe.dropna()
dataframe_pivoted = dataframe.pivot_table(index='time_of_day',
columns=dataframe.index.date,
values="values")
plt.pcolormesh(dataframe_pivoted.columns,
dataframe_pivoted.index,
dataframe_pivoted,
shading='auto')
plt.ylabel('Time of day [0-24]')
plt.xlabel('Date')
plt.xticks(rotation=60)
plt.title('Pre-Correction Heatmap, Time of Day')
plt.colorbar()
plt.tight_layout()
plt.show()
# Get the modeled sunrise and sunset time series based on the system's
# latitude-longitude coordinates
modeled_sunrise_sunset_df = pvlib.solarposition.sun_rise_set_transit_spa(
time_series.index, latitude, longitude)
# Calculate the midday point between sunrise and sunset for each day
# in the modeled irradiance series
modeled_midday_series = modeled_sunrise_sunset_df['sunrise'] + \
(modeled_sunrise_sunset_df['sunset'] -
modeled_sunrise_sunset_df['sunrise']) / 2
# Run day-night mask on the irradiance time series
daytime_mask = power_or_irradiance(time_series,
freq=data_freq,
low_value_threshold=.005)
# Generate the sunrise, sunset, and halfway points for the data stream
sunrise_series = daytime.get_sunrise(daytime_mask)
sunset_series = daytime.get_sunset(daytime_mask)
midday_series = sunrise_series + ((sunset_series - sunrise_series)/2)
# Convert the midday and modeled midday series to daily values
midday_series_daily, modeled_midday_series_daily = (
midday_series.resample('D').mean(),
modeled_midday_series.resample('D').mean())
# Set midday value series as minutes since midnight, from midday datetime
# values
midday_series_daily = (midday_series_daily.dt.hour * 60 +
midday_series_daily.dt.minute +
midday_series_daily.dt.second / 60)
modeled_midday_series_daily = \
(modeled_midday_series_daily.dt.hour * 60 +
modeled_midday_series_daily.dt.minute +
modeled_midday_series_daily.dt.second / 60)
# Estimate the time shifts by comparing the modelled midday point to the
# measured midday point.
is_shifted, time_shift_series = shifts_ruptures(modeled_midday_series_daily,
midday_series_daily,
period_min=15,
shift_min=15,
zscore_cutoff=1.5)
# Create a midday difference series between modeled and measured midday, to
# visualize time shifts. First, resample each time series to daily frequency,
# and compare the data stream's daily halfway point to the modeled halfway
# point
midday_diff_series = (modeled_midday_series.resample('D').mean() -
midday_series.resample('D').mean()
).dt.total_seconds() / 60
# Generate boolean for detected time shifts
if any(time_shift_series != 0):
    time_shifts_detected = True
else:
    time_shifts_detected = False
# Build a list of time shifts for re-indexing. We choose to use dicts.
time_shift_series.index = pd.to_datetime(
time_shift_series.index)
changepoints = (time_shift_series != time_shift_series.shift(1))
changepoints = changepoints[changepoints].index
changepoint_amts = pd.Series(time_shift_series.loc[changepoints])
time_shift_list = list()
for idx in range(len(changepoint_amts)):
    if idx < (len(changepoint_amts) - 1):
        time_shift_list.append({"datetime_start":
                                str(changepoint_amts.index[idx]),
                                "datetime_end":
                                str(changepoint_amts.index[idx + 1]),
                                "time_shift": changepoint_amts[idx]})
    else:
        time_shift_list.append({"datetime_start":
                                str(changepoint_amts.index[idx]),
                                "datetime_end":
                                str(time_shift_series.index.max()),
                                "time_shift": changepoint_amts[idx]})
# Correct any time shifts in the time series
new_index = pd.Series(time_series.index, index=time_series.index)
for i in time_shift_list:
    new_index[(time_series.index >= pd.to_datetime(i['datetime_start'])) &
              (time_series.index < pd.to_datetime(i['datetime_end']))] = \
        time_series.index + pd.Timedelta(minutes=i['time_shift'])
time_series.index = new_index
# Remove duplicated indices and sort the time series (just in case)
time_series = time_series[~time_series.index.duplicated(
keep='first')].sort_index()
# Plot the difference between measured and modeled midday, as well as the
# CPD-estimated time shift series.
plt.figure()
midday_diff_series.plot()
time_shift_series.plot()
plt.title("Midday Difference Time Shift Series")
plt.xlabel("Date")
plt.ylabel("Midday Difference (Modeled-Measured), Minutes")
plt.tight_layout()
plt.show()
# Plot the heatmap of the irradiance time series
plt.figure()
# Get time of day from the associated datetime column
time_of_day = pd.Series(time_series.index.hour +
time_series.index.minute/60,
index=time_series.index)
# Pivot the dataframe
dataframe = pd.DataFrame(pd.concat([time_series, time_of_day], axis=1))
dataframe.columns = ["values", 'time_of_day']
dataframe = dataframe.dropna()
dataframe_pivoted = dataframe.pivot_table(index='time_of_day',
columns=dataframe.index.date,
values="values")
plt.pcolormesh(dataframe_pivoted.columns,
dataframe_pivoted.index,
dataframe_pivoted,
shading='auto')
plt.ylabel('Time of day [0-24]')
plt.xlabel('Date')
plt.xticks(rotation=60)
plt.title('Post-Correction Heatmap, Time of Day')
plt.colorbar()
plt.tight_layout()
plt.show()
Next, we check the time series for any abrupt data shifts. We take the
longest continuous part of the time series that is free of data shifts.
We use pvanalytics.quality.data_shifts.detect_data_shifts() to detect data shifts in the time series.
# Resample the time series to daily mean
time_series_daily = time_series.resample('D').mean()
data_shift_start_date, data_shift_end_date = \
ds.get_longest_shift_segment_dates(time_series_daily)
data_shift_period_length = (data_shift_end_date -
data_shift_start_date).days
# Get the number of shift dates
data_shift_mask = ds.detect_data_shifts(time_series_daily)
# Get the shift dates
shift_dates = list(time_series_daily[data_shift_mask].index)
if len(shift_dates) > 0:
    shift_found = True
else:
    shift_found = False
# Visualize the time shifts for the daily time series
print("Shift Found: ", shift_found)
edges = ([time_series_daily.index[0]] + shift_dates +
[time_series_daily.index[-1]])
fig, ax = plt.subplots()
for (st, ed) in zip(edges[:-1], edges[1:]):
    ax.plot(time_series_daily.loc[st:ed])
plt.title("Daily Time Series Labeled for Data Shifts")
plt.xlabel("Date")
plt.ylabel("Mean Daily AC Power (kW)")
plt.tight_layout()
plt.show()

Shift Found: False
Use the logic-based pvanalytics.features.clipping.geometric() function to identify clipped periods in the time series data, and plot the time series with any clipped periods labeled.
# REMOVE CLIPPING PERIODS
clipping_mask = geometric(ac_power=time_series,
freq=data_freq)
# Get the pct clipping
clipping_mask.dropna(inplace=True)
pct_clipping = round(100*(len(clipping_mask[
clipping_mask])/len(clipping_mask)), 4)
if pct_clipping >= 0.5:
    clipping = True
    clip_pwr = time_series[clipping_mask].median()
else:
    clipping = False
    clip_pwr = None
if clipping:
    # Plot the time series with clipping labeled
    time_series.plot()
    time_series.loc[clipping_mask].plot(ls='', marker='o')
    plt.legend(labels=["AC Power", "Clipping"],
               title="Clipped")
    plt.title("Time Series Labeled for Clipping")
    plt.xticks(rotation=20)
    plt.xlabel("Date")
    plt.ylabel("AC Power (kW)")
    plt.tight_layout()
    plt.show()
    plt.close()
else:
    print("No clipping detected!!!")
No clipping detected!!!
We filter the time series to only include the longest shift-free period. We then visualize the final time series post-QA filtering.
time_series = time_series[
(time_series.index >=
data_shift_start_date.tz_convert(time_series.index.tz)) &
(time_series.index <=
data_shift_end_date.tz_convert(time_series.index.tz))]
time_series = time_series.asfreq(data_freq)
# Plot the final filtered time series.
time_series.plot(title="Final Filtered Time Series")
plt.xlabel("Date")
plt.ylabel("AC Power (kW)")
plt.tight_layout()
plt.show()

Estimate the azimuth and tilt of the system, based on the power series data. The ground truth azimuth and tilt for this system are 158 and 45 degrees, respectively.
# Import the PSM3 data. This data is pulled via the following function in
# pvlib: pvlib.iotools.get_psm3
file = pvanalytics_dir / 'data' / 'system_50_ac_power_2_full_DST_psm3.parquet'
psm3 = pd.read_parquet(file)
psm3.set_index('index', inplace=True)
psm3.index = pd.to_datetime(psm3.index)
psm3 = psm3.reindex(pd.date_range(psm3.index[0],
psm3.index[-1],
freq=data_freq)).interpolate()
psm3.index = psm3.index.tz_convert(time_series.index.tz)
psm3 = psm3.reindex(time_series.index)
is_clear = (psm3.ghi_clear == psm3.ghi)
is_daytime = (psm3.ghi > 0)
# Trim based on clearsky and daytime values
time_series_clearsky = time_series.reindex(is_daytime.index)[
(is_clear) & (is_daytime)].dropna()
# Get final PSM3 data
psm3_clearsky = psm3.loc[time_series_clearsky.index]
solpos_clearsky = pvlib.solarposition.get_solarposition(
time_series_clearsky.index, latitude, longitude)
# Estimate the azimuth and tilt using PVWatts-based method
predicted_tilt, predicted_azimuth, r2 = infer_orientation_fit_pvwatts(
time_series_clearsky,
psm3_clearsky.ghi_clear,
psm3_clearsky.dhi_clear,
psm3_clearsky.dni_clear,
solpos_clearsky.zenith,
solpos_clearsky.azimuth,
temperature=psm3_clearsky.temp_air,
azimuth_min=90,
azimuth_max=275)
print("Predicted azimuth: " + str(predicted_azimuth))
print("Predicted tilt: " + str(predicted_tilt))
Predicted azimuth: 161.27396222501486
Predicted tilt: 42.20903060025887
Look at the daily power profile for summer and winter months, and identify if the data stream is associated with a fixed-tilt or single-axis tracking system.
# CHECK MOUNTING CONFIGURATION
daytime_mask = power_or_irradiance(time_series)
predicted_mounting_config = is_tracking_envelope(
time_series,
daytime_mask,
clipping_mask.reindex(index=time_series.index))
print("Predicted Mounting configuration:")
print(predicted_mounting_config.name)
Predicted Mounting configuration:
FIXED
Generate a dictionary output for the QA assessment of this data stream, including the percent stale and erroneous data detected, any shift dates, time shift dates, clipping information, and estimated mounting configuration.
qa_check_dict = {"original_time_zone_offset": time_series.index.tz,
"pct_stale": pct_stale,
"pct_negative": pct_negative,
"pct_erroneous": pct_erroneous,
"pct_outlier": pct_outlier,
"time_shifts_detected": time_shifts_detected,
"time_shift_list": time_shift_list,
"data_shifts": shift_found,
"shift_dates": shift_dates,
"clipping": clipping,
"clipping_threshold": clip_pwr,
"pct_clipping": pct_clipping,
"mounting_config": predicted_mounting_config.name,
"predicted_azimuth": predicted_azimuth,
"predicted_tilt": predicted_tilt}
print("QA Results:")
print(qa_check_dict)
QA Results:
{'original_time_zone_offset': pytz.FixedOffset(-420), 'pct_stale': 0.7, 'pct_negative': 0.0, 'pct_erroneous': 0.0, 'pct_outlier': 0.0, 'time_shifts_detected': True, 'time_shift_list': [{'datetime_start': '2011-04-15 00:00:00-07:00', 'datetime_end': '2011-11-06 00:00:00-07:00', 'time_shift': -60.0}, {'datetime_start': '2011-11-06 00:00:00-07:00', 'datetime_end': '2012-03-11 00:00:00-07:00', 'time_shift': 0.0}, {'datetime_start': '2012-03-11 00:00:00-07:00', 'datetime_end': '2012-11-04 00:00:00-07:00', 'time_shift': -60.0}, {'datetime_start': '2012-11-04 00:00:00-07:00', 'datetime_end': '2013-03-10 00:00:00-07:00', 'time_shift': 0.0}, {'datetime_start': '2013-03-10 00:00:00-07:00', 'datetime_end': '2013-11-03 00:00:00-07:00', 'time_shift': -60.0}, {'datetime_start': '2013-11-03 00:00:00-07:00', 'datetime_end': '2013-12-17 00:00:00-07:00', 'time_shift': 0.0}], 'data_shifts': False, 'shift_dates': [], 'clipping': False, 'clipping_threshold': None, 'pct_clipping': 0.0842, 'mounting_config': 'FIXED', 'predicted_azimuth': 161.27396222501486, 'predicted_tilt': 42.20903060025887}
Total running time of the script: (0 minutes 23.546 seconds)
Data/Time Shifts#
This includes examples for identifying data/capacity/time shifts in time series data.
Data Shift Detection & Filtering#
Identifying data shifts/capacity changes in time series data
This example covers identifying data shifts/capacity changes in a time series
and extracting the longest time series segment free of these shifts, using
pvanalytics.quality.data_shifts.detect_data_shifts() and
pvanalytics.quality.data_shifts.get_longest_shift_segment_dates().
import pvanalytics
import pandas as pd
import matplotlib.pyplot as plt
from pvanalytics.quality import data_shifts as ds
import pathlib
As an example, we load in a simulated pvlib AC power time series with a single changepoint, occurring on October 28, 2015.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
data_shift_file = pvanalytics_dir / 'data' / 'pvlib_data_shift.csv'
df = pd.read_csv(data_shift_file)
df.index = pd.to_datetime(df['timestamp'])
df['value'].plot()
print("Changepoint at: " + str(df[df['label'] == 1].index[0]))

Changepoint at: 2015-10-28 00:00:00
Now we run the data shift algorithm (with default parameters)
on the data stream, using
pvanalytics.quality.data_shifts.detect_data_shifts()
. We plot the
predicted time series segments, based on algorithm results.
shift_mask = ds.detect_data_shifts(df['value'])
shift_list = list(df[shift_mask].index)
edges = [df.index[0]] + shift_list + [df.index[-1]]
fig, ax = plt.subplots()
for (st, ed) in zip(edges[:-1], edges[1:]):
    ax.plot(df.loc[st:ed, "value"])
plt.show()
# We zoom in around the changepoint to more closely show the data shift. Time
# series segments pre- and post-shift are color-coded.
edges = [pd.to_datetime("10-15-2015")] + shift_list + \
[pd.to_datetime("11-15-2015")]
fig, ax = plt.subplots()
for (st, ed) in zip(edges[:-1], edges[1:]):
    ax.plot(df.loc[st:ed, "value"])
plt.xticks(rotation=45)
plt.show()
We filter the time series by the detected changepoints, taking the longest
continuous segment free of data shifts, using
pvanalytics.quality.data_shifts.get_longest_shift_segment_dates().
The trimmed time series is then plotted.
start_date, end_date = ds.get_longest_shift_segment_dates(df['value'])
df['value'][start_date:end_date].plot()
plt.show()

Total running time of the script: (0 minutes 1.062 seconds)
Identifying and estimating time shifts#
Identifying time shifts from clock errors or uncorrected Daylight Saving Time.
Time shifts can occur in measured data due to clock errors and time zone issues (for example, assuming a dataset is in local standard time when in fact it contains Daylight Saving Time).
This example uses shifts_ruptures()
to identify abrupt time shifts in a time series, and estimate the
corresponding time shift amount.
import pvlib
import pandas as pd
from pvanalytics.quality.time import shifts_ruptures
from pvanalytics.features.daytime import (power_or_irradiance,
get_sunrise, get_sunset)
import matplotlib.pyplot as plt
Typically this process would be applied to measured data with possibly untrustworthy timestamps. However, for instructional purposes here, we’ll create an artificial example dataset that contains a time shift due to DST.
# use a time zone (US/Eastern) that is affected by DST.
# Etc/GMT+5 is the corresponding local standard time zone.
times = pd.date_range('2019-01-01', '2019-12-31', freq='5T', tz='US/Eastern')
location = pvlib.location.Location(40, -80)
cs = location.get_clearsky(times)
measured_signal = cs['ghi']
The shifts_ruptures() function is centered around comparing the timing of
events observed in the measured data with expected timings for those same
events.
In this case, we’ll use the timing of solar noon as the event.
First, we’ll extract the timing of solar noon from the measured data.
This could be done in several ways; here we will just take the midpoint
between sunrise and sunset using times estimated with
power_or_irradiance().
is_daytime = power_or_irradiance(measured_signal)
sunrise_timestamps = get_sunrise(is_daytime)
sunrise_timestamps = sunrise_timestamps.resample('d').first().dropna()
sunset_timestamps = get_sunset(is_daytime)
sunset_timestamps = sunset_timestamps.resample('d').first().dropna()
def ts_to_minutes(ts):
    # convert timestamps to minutes since midnight
    return ts.dt.hour * 60 + ts.dt.minute + ts.dt.second / 60
midday_minutes = (
ts_to_minutes(sunrise_timestamps) + ts_to_minutes(sunset_timestamps)
) / 2
Now, calculate the expected timing of solar noon at this location for each day. Note that we use a time zone without DST for calculating the expected timings; this means that if the “measured” data does include DST in its timestamps, it will be flagged as a time shift.
dates = midday_minutes.index.tz_localize(None).tz_localize('Etc/GMT+5')
sp = location.get_sun_rise_set_transit(dates, method='spa')
transit_minutes = ts_to_minutes(sp['transit'])
Finally, ask ruptures if it sees any change points in the difference between these two daily event timings, and visualize the result:
is_shifted, shift_amount = shifts_ruptures(midday_minutes, transit_minutes)
fig, axes = plt.subplots(2, 1, sharex=True)
midday_minutes.plot(ax=axes[0], label='"measured" midday')
transit_minutes.plot(ax=axes[0], label='expected midday')
axes[0].set_ylabel('Minutes since midnight')
axes[0].legend()
shift_amount.plot(ax=axes[1])
axes[1].set_ylabel('Estimated shift [minutes]')

Total running time of the script: (0 minutes 2.127 seconds)
System#
This includes examples for system parameter estimation, including azimuth and tilt estimation, and determination if the system is fixed tilt or tracking.
Detect if a System is Tracking#
Identifying if a system is tracking or fixed tilt
It is valuable to identify if a system is fixed tilt or tracking for
future analysis. This example shows how to use
pvanalytics.system.is_tracking_envelope()
to determine if a
system is tracking or not by fitting data to a maximum power or
irradiance envelope, and fitting this envelope to quadratic and
quartic curves. The r^2 output from these fits is used to determine
if the system fits a tracking or fixed-tilt profile.
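Conceptually, the test resembles the sketch below: fit quadratic and quartic polynomials to the daily envelope of power versus time of day and compare the goodness of fit. This is an illustration under simplified assumptions, with hypothetical inputs minutes_of_day and envelope_power; it is not the library's implementation.
import numpy as np

def envelope_fit_r2(minutes_of_day, envelope_power, degree):
    # Fit a polynomial of the given degree and return the r^2 of the fit.
    coeffs = np.polyfit(minutes_of_day, envelope_power, degree)
    fitted = np.polyval(coeffs, minutes_of_day)
    ss_res = np.sum((envelope_power - fitted) ** 2)
    ss_tot = np.sum((envelope_power - np.mean(envelope_power)) ** 2)
    return 1 - ss_res / ss_tot

# A fixed-tilt profile is roughly bell-shaped (a quadratic fits well),
# while a tracking profile has a flatter top (a quartic fits notably
# better). Comparing r^2 for degree=2 vs degree=4 hints at the mounting.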
import pvanalytics
from pvanalytics.system import is_tracking_envelope
from pvanalytics.features.clipping import geometric
from pvanalytics.features.daytime import power_or_irradiance
import pandas as pd
import pathlib
import matplotlib.pyplot as plt
First, we import an AC power data stream from the SERF East site located at NREL. This data set is publicly available via the PVDAQ database in the DOE Open Energy Data Initiative (OEDI) (https://data.openei.org/submissions/4568), under system ID 50. This data is timezone-localized. This particular data stream is associated with a fixed-tilt system.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file = pvanalytics_dir / 'data' / \
'serf_east_15min_ac_power.csv'
data = pd.read_csv(ac_power_file, index_col=0, parse_dates=True)
data = data.sort_index()
time_series = data['ac_power']
time_series = time_series.asfreq('15min')
# Plot the first few days of the time series to visualize it
time_series[:pd.to_datetime("2016-07-06 00:00:00-07:00")].plot()
plt.show()

Run the clipping and the daytime filters on the time series.
Both of these masks will be used as inputs to the
pvanalytics.system.is_tracking_envelope()
function.
# Generate the daylight mask for the AC power time series
daytime_mask = power_or_irradiance(time_series)
# Generate the clipping mask for the time series
clipping_mask = geometric(time_series)
Now, we use pvanalytics.system.is_tracking_envelope()
to
identify if the data stream is associated with a tracking or fixed-tilt
system.
predicted_mounting_config = is_tracking_envelope(time_series,
daytime_mask,
clipping_mask)
print("Estimated mounting configuration: " + predicted_mounting_config.name)
Estimated mounting configuration: FIXED
Total running time of the script: (0 minutes 0.546 seconds)
Infer Array Tilt/Azimuth - PVWatts Method#
Infer the azimuth and tilt of a system using PVWatts-based methods
Identifying and/or validating the azimuth and tilt information for a
system is important, as these values must be correct for degradation
and system yield analysis. This example shows how to use
pvanalytics.system.infer_orientation_fit_pvwatts()
to estimate
a fixed-tilt system’s azimuth and tilt, using the system’s known
latitude-longitude coordinates and an associated AC power time series.
import pvanalytics
import matplotlib.pyplot as plt
from pvanalytics import system
import pandas as pd
import pathlib
import pvlib
First, we import an AC power data stream from the SERF East site located at NREL. This data set is publicly available via the PVDAQ database in the DOE Open Energy Data Initiative (OEDI) (https://data.openei.org/submissions/4568), under system ID 50. This data is timezone-localized.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
ac_power_file = pvanalytics_dir / 'data' / 'serf_east_15min_ac_power.csv'
data = pd.read_csv(ac_power_file, index_col=0, parse_dates=True)
data = data.sort_index()
time_series = data['ac_power']
time_series = time_series.asfreq('15min')
# Plot the first few days of the time series to visualize it
time_series[:pd.to_datetime("2016-07-06 00:00:00-07:00")].plot()
plt.show()
# Outline the ground truth metadata associated with the system
latitude = 39.742
longitude = -105.1727
actual_azimuth = 158
actual_tilt = 45

Next, we import the PSM3 data generated via the pvlib.iotools.get_psm3()
function, using the site's latitude-longitude coordinates. To generate the
PSM3 data, you must first register for NREL's NSRDB API at the
following link: https://developer.nrel.gov/signup/.
PSM3 data can then be retrieved using pvlib.iotools.get_psm3().
The PSM3 data has been resampled to 15-minute intervals to match the AC
power data.
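As a sketch of what direct retrieval might look like (assuming you have registered for an NSRDB API key; the key and email below are placeholders, and parameter availability can vary with pvlib version):
import pvlib

psm3_raw, metadata = pvlib.iotools.get_psm3(
    latitude=39.742, longitude=-105.1727,
    api_key='YOUR_API_KEY', email='you@example.com',
    names='2016', interval=30, map_variables=True)
# Upsample to 15-minute intervals to match the AC power data.
psm3_15min = psm3_raw.resample('15min').interpolate()
In this example, we instead load a cached copy of the retrieved data.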
psm3_file = pvanalytics_dir / 'data' / 'serf_east_psm3_data.csv'
psm3 = pd.read_csv(psm3_file, index_col=0, parse_dates=True)
Filter the PSM3 data to only include clearsky periods
is_clear = (psm3.ghi_clear == psm3.ghi)
is_daytime = (psm3.ghi > 0)
time_series_clearsky = time_series[is_clear & is_daytime]
time_series_clearsky = time_series_clearsky.dropna()
psm3_clearsky = psm3.loc[time_series_clearsky.index]
# Get solar azimuth and zenith from pvlib, based on
# lat-long coords
solpos_clearsky = pvlib.solarposition.get_solarposition(
time_series_clearsky.index, latitude, longitude)
Run the pvlib data and the sensor-based time series data through
the pvanalytics.system.infer_orientation_fit_pvwatts()
function.
best_tilt, best_azimuth, r2 = system.infer_orientation_fit_pvwatts(
    time_series_clearsky,
    psm3_clearsky.ghi_clear,
    psm3_clearsky.dhi_clear,
    psm3_clearsky.dni_clear,
    solpos_clearsky.zenith,
    solpos_clearsky.azimuth,
    temperature=psm3_clearsky.temp_air,
)
# Compare actual system azimuth and tilt to predicted azimuth and tilt
print("Actual Azimuth: " + str(actual_azimuth))
print("Predicted Azimuth: " + str(best_azimuth))
print("Actual Tilt: " + str(actual_tilt))
print("Predicted Tilt: " + str(best_tilt))
Actual Azimuth: 158
Predicted Azimuth: 162.01767819075172
Actual Tilt: 45
Predicted Tilt: 42.14660650908418
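The third return value, r2, is the coefficient of determination of the fit and serves as a sanity check on the estimate; a low value suggests the clearsky filtering or the input data should be revisited.
# Inspect the goodness of fit for the orientation estimate
print("R^2 of the PVWatts fit: " + str(r2))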
Weather#
This includes examples for weather quality checks.
Weather Limits#
Identifying weather values that are within limits.
Identifying weather values that fall within logical, expected limits and filtering out data beyond those limits allows for more accurate downstream analysis.
In this example, we demonstrate how to use pvanalytics.quality.weather.wind_limits(), pvanalytics.quality.weather.temperature_limits(), and pvanalytics.quality.weather.relative_humidity_limits() to identify and filter out values that are not within expected limits, for wind speed, ambient temperature, and relative humidity, respectively.
import pvanalytics
from pvanalytics.quality.weather import wind_limits, \
    temperature_limits, relative_humidity_limits
import matplotlib.pyplot as plt
import pandas as pd
import pathlib
First, we read in the NREL RMIS weather station example, which contains wind speed, temperature, and relative humidity data in m/s, deg C, and %, respectively. This data set contains 5-minute right-aligned measurements.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
rmis_file = pvanalytics_dir / 'data' / 'rmis_weather_data.csv'
data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
print(data.head(10))
Ambient Temperature ... Wind Speed
2022-01-01 00:05:00 -10.59725 ... 1.930175
2022-01-01 00:10:00 -10.63128 ... 2.167881
2022-01-01 00:15:00 -10.66532 ... 0.827218
2022-01-01 00:20:00 -10.71636 ... 0.608528
2022-01-01 00:25:00 -10.66532 ... 1.036399
2022-01-01 00:30:00 -10.81846 ... 1.150498
2022-01-01 00:35:00 -10.78443 ... 1.958699
2022-01-01 00:40:00 -10.86950 ... 0.941317
2022-01-01 00:45:00 -10.88652 ... 1.511812
2022-01-01 00:50:00 -10.85249 ... 0.047541
[10 rows x 12 columns]
Next, we use pvanalytics.quality.weather.wind_limits() to identify any wind speed values that are not within an acceptable range. We can then filter any of these values out of the time series.
wind_limit_mask = wind_limits(data['Wind Speed'])
data['Wind Speed'].plot()
data.loc[~wind_limit_mask, 'Wind Speed'].plot(ls='', marker='o')
plt.legend(labels=["Wind Speed", "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel("Wind Speed (m/s)")
plt.xticks(rotation=25)
plt.tight_layout()
plt.show()

Next, we use pvanalytics.quality.weather.temperature_limits() to identify any air temperature values that are not within an acceptable range. We can then filter any of these values out of the time series. Here, we set the temperature limits to (-10, 10), illustrating how to use the limits parameter.
temperature_limit_mask = temperature_limits(data['Ambient Temperature'],
                                            limits=(-10, 10))
data['Ambient Temperature'].plot()
data.loc[~temperature_limit_mask, 'Ambient Temperature'].plot(ls='',
                                                              marker='o')
plt.legend(labels=["Ambient Temperature", "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel("Ambient Temperature (deg C)")
plt.xticks(rotation=25)
plt.tight_layout()
plt.show()

Finally, we use pvanalytics.quality.weather.relative_humidity_limits() to identify any RH values that are not within an acceptable range. We can then filter any of these values out of the time series.
rh_limit_mask = relative_humidity_limits(data['Relative Humidity'])
data['Relative Humidity'].plot()
data.loc[~rh_limit_mask, 'Relative Humidity'].plot(ls='', marker='o')
plt.legend(labels=['Relative Humidity', "Detected Outlier"])
plt.xlabel("Date")
plt.ylabel('Relative Humidity (%)')
plt.xticks(rotation=25)
plt.tight_layout()
plt.show()
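As a last step, the three masks can be combined to filter the weather data in a single pass:
# Keep only rows where wind speed, temperature, and relative humidity
# all fall within their respective limits
combined_mask = wind_limit_mask & temperature_limit_mask & rh_limit_mask
data_filtered = data[combined_mask]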

Module Temperature Check#
Test whether the module temperature is correlated with irradiance.
Testing the correlation between module temperature and irradiance
measurements can help identify if there are issues with the module
temperature sensor.
In this example, we demonstrate how to use pvanalytics.quality.weather.module_temperature_check(), which runs a linear regression model of module temperature vs. irradiance. Model performance is then assessed by the Pearson correlation coefficient. If it meets a minimum threshold, the function returns True; otherwise, it returns False.
import pvanalytics
from pvanalytics.quality.weather import module_temperature_check
from pvanalytics.features.daytime import power_or_irradiance
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import linregress
import pathlib
First, we read in example data from the NREL SERF West system, which contains data for module temperature and irradiance under the ‘module_temp_1__781’ and ‘poa_irradiance__771’ columns, respectively. This data set contains 15-minute averaged measurements, and is available via the NREL PVDAQ database as system 51.
pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
serf_west_file = pvanalytics_dir / 'data' / 'serf_west_15min.csv'
data = pd.read_csv(serf_west_file, index_col=0, parse_dates=True)
print(data[['module_temp_1__781', 'poa_irradiance__771']].head(10))
module_temp_1__781 poa_irradiance__771
2022-01-02 00:01:00 -6.4187 -1.9775
2022-01-02 00:16:00 -6.2204 -2.0451
2022-01-02 00:31:00 -6.1505 -2.1464
2022-01-02 00:46:00 -5.9059 -2.1463
2022-01-02 01:01:00 -5.7022 -1.8083
2022-01-02 01:16:00 -5.4755 -1.9941
2022-01-02 01:31:00 -5.3714 -2.0109
2022-01-02 01:46:00 -4.4072 -1.7236
2022-01-02 02:01:00 -4.1609 -1.6221
2022-01-02 02:16:00 -3.9174 -1.6390
Plot the module temperature to visualize it.
data['module_temp_1__781'].plot()
plt.xlabel("Date")
plt.ylabel("Module Temperature (deg C)")
plt.xticks(rotation=25)
plt.tight_layout()
plt.show()

Plot the POA irradiance to visualize it.
data['poa_irradiance__771'].plot()
plt.xlabel("Date")
plt.ylabel("POA irradiance (W/m^2)")
plt.xticks(rotation=25)
plt.tight_layout()
plt.show()

We mask the irradiance time series into day-night periods, and remove the nighttime data so that it does not bias the regression.
predicted_day_night_mask = power_or_irradiance(
    series=data['poa_irradiance__771'], freq='15min')
# Filter out nighttime periods
data = data[predicted_day_night_mask]
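As a quick check on the mask, we can report how many nighttime points were removed:
# Count the points flagged as nighttime by the day/night mask
n_removed = (~predicted_day_night_mask).sum()
print("Removed " + str(n_removed) + " nighttime points")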
We then use pvanalytics.quality.weather.module_temperature_check() to regress module temperature against POA irradiance, and check whether the relationship meets the minimum correlation coefficient criterion.
corr_coeff_bool = module_temperature_check(data['module_temp_1__781'],
                                           data['poa_irradiance__771'])
print("Passes correlation coeff threshold? " + str(corr_coeff_bool))
Passes correlation coeff threshold? True
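The minimum required correlation can also be tightened. The sketch below assumes the function accepts a correlation_min keyword (check the API reference for the exact name and default) to require a stronger relationship:
# Sketch: require a Pearson correlation of at least 0.8 to pass
strict_check = module_temperature_check(data['module_temp_1__781'],
                                        data['poa_irradiance__771'],
                                        correlation_min=0.8)
print("Passes stricter threshold? " + str(strict_check))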
We then plot module temperature against irradiance to illustrate the relationship.
data.plot(x='module_temp_1__781',
          y='poa_irradiance__771',
          style='o', legend=None)
data_reg = data[['module_temp_1__781', 'poa_irradiance__771']].dropna()
# Add the linear regression line
reg = linregress(data_reg['module_temp_1__781'].values,
                 data_reg['poa_irradiance__771'].values)
plt.axline(xy1=(0, reg.intercept), slope=reg.slope, linestyle="--", color="k")
plt.xlabel("Module Temperature (deg C)")
plt.ylabel("POA irradiance (W/m^2)")
plt.xticks(rotation=25)
plt.tight_layout()
plt.show()
# Print the Pearson correlation coefficient associated with the regression.
print("Pearson Correlation Coefficient: ")
print(reg.rvalue)

Pearson Correlation Coefficient:
0.6836377995026489
Release Notes#
These are the bug-fixes, new features, and improvements for each release.
0.2.0 (February 14, 2024)#
Breaking Changes#
Updated function infer_orientation_fit_pvwatts() to more closely align with the PVWatts v5 methodology. This includes incorporating relative airmass and extraterrestrial irradiance into the Perez total irradiance model, accounting for array incidence loss (IAM), and including losses in the PVWatts inverter model. Additionally, added optional arguments for bounding the azimuth range during least squares optimization. (GH147, GH180)
Updated function shifts_ruptures() to align with the methodology tested and reported on at PVRW 2023 (“Survey of Time Shift Detection Algorithms for Measured PV Data”). This includes converting the changepoint detection algorithm from Pelt to Binary Segmentation (which runs much faster), and performing additional processing on each detected segment to remove outliers and filter by a quantile cutoff instead of the original rounding technique. (GH197)
Enhancements#
Added function get_sunrise() for calculating the daily sunrise datetimes for a time series, based on the power_or_irradiance() day/night mask output. (GH187)
Added function get_sunset() for calculating the daily sunset datetimes for a time series, based on the power_or_irradiance() day/night mask output. (GH187)
Updated function power_or_irradiance() to be more performant via vectorization; the original logic used a lambda call that slowed the function down considerably. This update resulted in a ~50X speedup. (GH186)
Bug Fixes#
pvanalytics.__version__ now correctly reports the version string instead of raising AttributeError. (GH181)
Compatibility with pandas 2.0.0 (GH185) and future versions of pandas (GH203)
Compatibility with scipy 1.11 (GH196)
Updated function trim() to handle the pandas 2.0.0 update for tz-aware timeseries (GH206)
Requirements#
Documentation#
Testing#
Contributors#
Kirsten Perry (@kperrynrel)
Kevin Anderson (@kanderso-nrel)
Cliff Hansen (@cwhanse)
Abhishek Parikh (@abhisheksparikh)
Quyen Nguyen (@qnguyen345)
Adam R. Jensen (@adamrjensen)
Chris Deline (@cdeline)
0.1.3 (December 16, 2022)#
Enhancements#
Added function calculate_component_sum_series() for calculating the component sum values of GHI, DHI, and DNI, and performing nighttime corrections (GH157, GH163)
Updated the stale_values_round() function with pandas functionality, leading to the same results with a 300X speedup. (GH156, GH158)
Documentation#
Added new gallery example pages.
Clarified parameter descriptions for pdc0 and pac in performance_ratio_nrel() (GH152, GH162).
Restructured the example gallery by separating the examples into categories and adding READMEs (GH154, GH155).
Contributors#
Kirsten Perry (@kperrynrel)
Cliff Hansen (@cwhanse)
Josh Peterson (@PetersonUOregon)
Adam R. Jensen (@adamrjensen)
Will Holmgren (@wholmgren)
Kevin Anderson (@kanderso-nrel)
0.1.2 (August 18, 2022)#
Enhancements#
Detect data shifts in daily summed time series with pvanalytics.quality.data_shifts.detect_data_shifts() and pvanalytics.quality.data_shifts.get_longest_shift_segment_dates(). (GH142)
Bug Fixes#
Fix pvanalytics.quality.outliers.zscore() so that the NaN mask is assigned the time series index (GH138)
Documentation#
Added fifteen new gallery example pages, including examples for pvanalytics.quality.data_shifts (GH131).
Other#
Removed empty modules pvanalytics.filtering and pvanalytics.fitting until the relevant functionality is added to the package. (GH145)
Contributors#
Kirsten Perry (@kperrynrel)
Cliff Hansen (@cwhanse)
Kevin Anderson (@kanderso-nrel)
Will Vining (@wfvining)
0.1.1 (February 18, 2022)#
Enhancements#
Quantification of irradiance variability with pvanalytics.metrics.variability_index(). (GH60, GH106)
Internal refactor of pvanalytics.metrics.performance_ratio_nrel() to support other performance ratio formulas. (GH109)
Detect shadows from fixed objects in GHI data using pvanalytics.features.shading.fixed(). (GH24, GH101)
Bug Fixes#
Added nan_policy parameter to the zscore calculation in pvanalytics.quality.outliers.zscore(). (GH102, GH108)
Prohibit pandas versions in the 1.1.x series to avoid an issue in .groupby().rolling(). Newer versions starting in 1.2.0 and older versions going back to 0.24.0 are still allowed. (GH82, GH118)
Fixed an issue with pvanalytics.features.clearsky.reno() in recent pandas versions (GH125, GH128)
Improved convergence in pvanalytics.features.orientation.fixed_nrel() (GH119, GH120)
Requirements#
Drop support for python 3.6, which reached end of life Dec 2021 (GH129)
Documentation#
Started an example gallery and added an example for pvanalytics.features.clearsky.reno() (GH125, GH127)
Contributors#
Kevin Anderson (@kanderso-nrel)
Cliff Hansen (@cwhanse)
Will Vining (@wfvining)
Kirsten Perry (@kperrynrel)
Michael Hopwood (@MichaelHopwood)
Carlos Silva (@camsilva)
Ben Taylor (@bt-)
0.1.0 (November 20, 2020)#
This is the first release of PVAnalytics. As such, the list of “changes” below is not specific. Future releases will describe specific changes here along with references to the relevant github issue and pull requests.
API Changes#
Enhancements#
Quality control functions for irradiance, weather, and time series data. See pvanalytics.quality for content.
Feature labeling functions for clipping, clearsky, daytime, and orientation. See pvanalytics.features for content.
System parameter inference for tilt, azimuth, and whether the system is tracking or fixed. See pvanalytics.system for content.
NREL performance ratio metric (pvanalytics.metrics.performance_ratio_nrel()).
Bug Fixes#
Contributors#
Special thanks to Matt Muller and Kirsten Perry of NREL for their assistance in adapting components from the PVFleets QA project to PVAnalytics.