PVAnalytics#

PVAnalytics is a Python library that supports analytics for PV systems. It provides functions for quality control, filtering, and feature labeling, along with other tools supporting the analysis of PV system-level data. It can be used as a standalone analysis package and as a data cleaning “front end” for other PV analysis packages.

PVAnalytics is free and open source under a permissive license. The source code for PVAnalytics is hosted on GitHub.

Library Overview#

The functions provided by PVAnalytics are organized in submodules based on their anticipated use. The list below provides a general overview; however, not all modules have functions at this time. See the API reference for current library status.

  • quality contains submodules for different kinds of data quality checks.

    • quality.data_shifts contains quality checks for detecting and isolating data shifts in PV time series data.

    • quality.irradiance contains quality checks for irradiance measurements.

    • quality.weather contains quality checks for weather data (e.g. tests for physically plausible values of temperature, wind speed, humidity).

    • quality.outliers contains functions for identifying outliers.

    • quality.gaps contains functions for identifying gaps in the data (i.e. missing values, stuck values, and interpolation).

    • quality.time contains quality checks related to time (e.g. timestamp spacing, time shifts).

    • quality.util contains general purpose quality functions (e.g. simple range checks).

  • features contains submodules with different methods for identifying and labeling salient features.

    • features.clipping contains functions for labeling inverter clipping.

    • features.clearsky contains functions for identifying periods of clear sky conditions.

    • features.daytime contains functions for identifying periods of day and night.

    • features.orientation contains functions for identifying orientation-related features in the data (e.g. days where the data looks like there is a functioning tracker). These functions are distinct from the functions in the system module in that they identify features of the data rather than properties of the system that produced the data.

    • features.shading contains functions for identifying shadows.

  • system contains functions for identifying PV system characteristics from data (e.g. nameplate power, tilt, azimuth).

  • metrics contains functions for computing PV system-level metrics (e.g. performance ratio).

Dependencies#

This project follows the guidelines laid out in NEP-29. It supports:

  • All minor versions of Python released in the 42 months prior to the project, and at minimum the two latest minor versions.

  • All minor versions of numpy released in the 24 months prior to the project, and at minimum the last three minor versions.

  • The latest release of pvlib.

Additionally, PVAnalytics relies on several other packages in the open source scientific Python ecosystem. For details on dependencies and versions, see our setup.py.

API Reference#

Quality#

Data Shifts#

Functions for identifying shifts in data values in time series and for identifying periods with data shifts. For functions that identify shifts in time, see quality.time.

quality.data_shifts.detect_data_shifts(series)

Detect data shifts in a time series of daily values.

quality.data_shifts.get_longest_shift_segment_dates(series)

Return the start and end dates of the longest serially complete time series segment.

Irradiance#

The check_*_limits_qcrad functions use the QCRad algorithm [1] to identify irradiance measurements that are beyond physical limits.

quality.irradiance.check_ghi_limits_qcrad(...)

Test for physical limits on GHI using the QCRad criteria.

quality.irradiance.check_dhi_limits_qcrad(...)

Test for physical limits on DHI using the QCRad criteria.

quality.irradiance.check_dni_limits_qcrad(...)

Test for physical limits on DNI using the QCRad criteria.

All three checks can be combined into a single function call.

quality.irradiance.check_irradiance_limits_qcrad(...)

Test for physical limits on GHI, DHI or DNI using the QCRad criteria.
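As a rough illustration of what a physical-limit test looks like, the sketch below checks GHI against an upper bound that scales with the cosine of the solar zenith angle. It is not the PVAnalytics implementation: the function name, the `dni_extra` default, and the coefficients (recalled from the "physically possible" limits in [1]) are assumptions and should be checked against the reference before use.

```python
import math

# Assumed coefficients for the GHI upper bound: (multiplier, exponent, offset in W/m^2).
GHI_UPPER = (1.5, 1.2, 100.0)
# Assumed lower bound in W/m^2.
GHI_LOWER = -4.0


def ghi_within_limits(ghi, solar_zenith_deg, dni_extra=1367.0):
    """Return True if ghi lies strictly between the lower and upper bound."""
    mult, exp, offset = GHI_UPPER
    # Clip the cosine at zero so the bound stays defined when the sun is below the horizon.
    cos_zenith = max(math.cos(math.radians(solar_zenith_deg)), 0.0)
    upper = mult * dni_extra * cos_zenith ** exp + offset
    return GHI_LOWER < ghi < upper


# (GHI in W/m^2, solar zenith in degrees)
readings = [(850.0, 30.0), (1200.0, 60.0), (-10.0, 95.0)]
flags = [ghi_within_limits(ghi, zenith) for ghi, zenith in readings]
print(flags)
```

The second reading is too high for a sun 60 degrees from the zenith, and the third falls below the lower bound, so only the first passes.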

Irradiance measurements can also be checked for consistency.

quality.irradiance.check_irradiance_consistency_qcrad(...)

Check consistency of GHI, DHI and DNI using QCRad criteria.

GHI and POA irradiance can be validated against clearsky values to eliminate data that is unrealistically high.

quality.irradiance.clearsky_limits(measured, ...)

Identify irradiance values which do not exceed clearsky values.

You may want to identify entire days that have unrealistically high or low insolation. The following function examines daily insolation, validating that it is within a reasonable range of the expected clearsky insolation for the same day.

quality.irradiance.daily_insolation_limits(...)

Check that daily insolation lies between minimum and maximum values.

There is a function for calculating the component sum for GHI, DHI, and DNI, and correcting for nighttime periods. Using this function, we can estimate one irradiance field using the two other irradiance fields. This can be useful for comparison, as well as to calculate missing data fields.

quality.irradiance.calculate_component_sum_series(...)

Use the component sum equations to calculate the missing series, using the other available time series.
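The component sum relation is GHI = DNI · cos(zenith) + DHI, and any one term can be recovered from the other two. The sketch below shows that arithmetic for single values; the helper names are hypothetical, and returning zero DNI when the sun is at or below the horizon is one simple assumption about nighttime handling, not necessarily what the library does.

```python
import math


def ghi_from_components(dni, dhi, solar_zenith_deg):
    """GHI = DNI * cos(zenith) + DHI."""
    return dni * math.cos(math.radians(solar_zenith_deg)) + dhi


def dhi_from_components(ghi, dni, solar_zenith_deg):
    """DHI = GHI - DNI * cos(zenith)."""
    return ghi - dni * math.cos(math.radians(solar_zenith_deg))


def dni_from_components(ghi, dhi, solar_zenith_deg):
    """DNI = (GHI - DHI) / cos(zenith), set to zero at night (assumed handling)."""
    cos_zenith = math.cos(math.radians(solar_zenith_deg))
    if cos_zenith <= 0:
        return 0.0
    return (ghi - dhi) / cos_zenith


ghi = ghi_from_components(dni=800.0, dhi=100.0, solar_zenith_deg=60.0)
dni = dni_from_components(ghi=ghi, dhi=100.0, solar_zenith_deg=60.0)
print(round(ghi, 6), round(dni, 6))
```

Comparing a computed field against its measured counterpart is one way to spot a sensor that disagrees with the other two.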

Gaps#

Identify gaps in the data.

quality.gaps.interpolation_diff(x[, window, ...])

Identify sequences which appear to be linear.

Data sometimes contains sequences of values that are “stale” or “stuck.” These are contiguous spans of data where the value does not change within the precision given. The functions below can be used to detect stale values.

Note

If the data has been altered in some way (e.g. temperature that has been rounded to an integer value) before being passed to these functions, you may see unexpectedly large amounts of stale data.

quality.gaps.stale_values_diff(x[, window, ...])

Identify stale values in the data.

quality.gaps.stale_values_round(x[, window, ...])

Identify stale values by rounding.
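To make the idea of "stuck" data concrete, here is a minimal sketch that flags every point belonging to a run of near-identical consecutive values. The function name, the `window` and `atol` defaults, and the choice to flag the whole run are assumptions made for illustration; the library functions may mark runs differently.

```python
def stale_mask(values, window=3, atol=1e-8):
    """Flag every point in a run of at least `window` near-identical values."""
    mask = [False] * len(values)
    run_start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the data or when the value changes by more than atol.
        if i == len(values) or abs(values[i] - values[i - 1]) > atol:
            if i - run_start >= window:
                for j in range(run_start, i):
                    mask[j] = True
            run_start = i
    return mask


temps = [20.1, 20.4, 21.0, 21.0, 21.0, 21.0, 21.3, 21.3, 22.0]
flags = stale_mask(temps)

# Rounding before the check lengthens the runs, so more points look stale.
rounded_flags = stale_mask([round(t) for t in temps])
print(sum(flags), sum(rounded_flags))
```

The second call shows the effect described in the note above: rounding the same series to integers raises the number of flagged points from four to six.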

The following functions identify days with incomplete data.

quality.gaps.completeness_score(series[, ...])

Calculate a data completeness score for each day.

quality.gaps.complete(series[, ...])

Select data points that are part of days with complete data.
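One way to think about a daily completeness score is the fraction of each day that is covered by non-missing samples at a known sampling interval. The sketch below follows that definition using plain `(timestamp, value)` pairs; the function name, the use of `None` for missing data, and the 0.9 threshold are illustrative assumptions rather than library behavior.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_completeness(samples, freq):
    """Return {date: fraction of the day covered by non-missing samples}."""
    expected_per_day = timedelta(days=1) / freq
    valid_counts = defaultdict(int)
    for timestamp, value in samples:
        # Touch the key even for missing values so empty days score 0.0.
        valid_counts[timestamp.date()] += 0 if value is None else 1
    return {day: count / expected_per_day for day, count in valid_counts.items()}


start = datetime(2024, 1, 1)
# Two days of hourly data; six readings on the second day are missing.
samples = [
    (start + timedelta(hours=h), None if 30 <= h < 36 else 1.0)
    for h in range(48)
]
scores = daily_completeness(samples, freq=timedelta(hours=1))

# Keep only samples from days whose score meets an illustrative threshold.
complete_days = {day for day, score in scores.items() if score >= 0.9}
kept = [(t, v) for t, v in samples if t.date() in complete_days]
print(scores, len(kept))
```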

Many data sets may have leading and trailing periods of days with sporadic or no data. The following functions can be used to remove those periods.

quality.gaps.start_stop_dates(series[, days])

Get the start and end of data excluding leading and trailing gaps.

quality.gaps.trim(series[, days])

Mask the beginning and end of the data if not all True.

quality.gaps.trim_incomplete(series[, ...])

Trim the series based on the completeness score.

Outliers#

Functions for detecting outliers.

quality.outliers.tukey(data[, k])

Identify outliers based on the interquartile range.

quality.outliers.zscore(data[, zmax, nan_policy])

Identify outliers using the z-score.

quality.outliers.hampel(data[, window, ...])

Identify outliers by the Hampel identifier.
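The first two identifiers above rest on simple statistics, sketched here with the standard library only. The helper names and the default thresholds (`k=1.5`, `zmax=1.5`) are assumptions for this example, and the quartile method used by `statistics.quantiles` may differ from the one the library uses, so borderline points can be classified differently.

```python
import statistics


def tukey_outliers(data, k=1.5):
    """Flag points more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return [x < q1 - k * iqr or x > q3 + k * iqr for x in data]


def zscore_outliers(data, zmax=1.5):
    """Flag points whose absolute z-score exceeds zmax."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)
    if std == 0:
        # A constant series has no outliers under this definition.
        return [False] * len(data)
    return [abs(x - mean) / std > zmax for x in data]


power = [10, 11, 12, 11, 10, 12, 11, 50]
tukey_flags = tukey_outliers(power)
zscore_flags = zscore_outliers(power)
print(tukey_flags)
print(zscore_flags)
```

Both methods flag only the last value here. The z-score is more easily distorted by the outlier itself, because the outlier inflates the mean and standard deviation that it is compared against.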

Time#

Quality control related to time. This includes things like timestamp spacing, time shifts, and time zone validation.

quality.time.spacing(times, freq)

Check that the spacing between times conforms to freq.
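A spacing check of this kind amounts to comparing each timestamp with its predecessor. The sketch below does that with `datetime` objects; the function name is hypothetical, and treating the first timestamp as valid (it has no predecessor) is a choice made for this example.

```python
from datetime import datetime, timedelta


def spacing_ok(times, freq):
    """Return a flag per timestamp: True if it follows its predecessor by exactly freq."""
    if not times:
        return []
    flags = [True]  # the first timestamp has nothing to be compared against
    for previous, current in zip(times, times[1:]):
        flags.append(current - previous == freq)
    return flags


start = datetime(2024, 6, 1, 12, 0)
# Nominal 15-minute data with one reading that arrives 5 minutes late.
times = [start + timedelta(minutes=m) for m in (0, 15, 30, 50, 65)]
flags = spacing_ok(times, freq=timedelta(minutes=15))
print(flags)
```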

Timestamp shifts, such as those caused by daylight saving time, can be identified with the following functions.

quality.time.shifts_ruptures(event_times, ...)

Identify time shifts using the ruptures library.

quality.time.has_dst(events, tz[, window, ...])

Return True if events appears to have daylight savings shifts at the dates on which tz transitions to or from daylight savings time.

Utilities#

The quality.util module contains general-purpose/utility functions for building your own quality checks.

quality.util.check_limits(val[, ...])

Check whether a value falls within the given limits.

quality.util.daily_min(series, minimum[, ...])

Return True for data on days when the day's minimum exceeds minimum.
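A custom quality check is often just a range test with a decision about whether the bounds themselves are acceptable. The sketch below shows such a test in plain Python; `within_limits` and its parameters are hypothetical names for this example, not the signature of quality.util.check_limits.

```python
def within_limits(value, lower=None, upper=None, inclusive=False):
    """Return True if value lies between lower and upper (either bound may be omitted)."""
    above = lower is None or (value >= lower if inclusive else value > lower)
    below = upper is None or (value <= upper if inclusive else value < upper)
    return above and below


# An illustrative wind speed check with bounds of 0 and 50 m/s.
wind_speed = [3.2, -1.0, 50.0, 75.0]
exclusive_flags = [within_limits(v, lower=0, upper=50) for v in wind_speed]
inclusive_flags = [within_limits(v, lower=0, upper=50, inclusive=True) for v in wind_speed]
print(exclusive_flags)
print(inclusive_flags)
```

The only difference between the two results is the reading that sits exactly on the upper bound.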

Weather#

Quality checks for weather data.

quality.weather.relative_humidity_limits(...)

Identify relative humidity values that are within limits.

quality.weather.temperature_limits(...[, limits])

Identify temperature values that are within limits.

quality.weather.wind_limits(wind_speed[, limits])

Identify wind speed values that are within limits.

In addition to validating temperature by comparing with limits, module temperature should be positively correlated with irradiance. Poor correlation could indicate that the sensor has become detached from the module, for example. Unlike other functions in the quality module which return Boolean masks over the input series, this function returns a single Boolean value indicating whether the entire series has passed (True) or failed (False) the quality check.

quality.weather.module_temperature_check(...)

Test whether the module temperature is correlated with irradiance.
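The correlation test described above can be illustrated with a Pearson correlation coefficient compared against a minimum value. This is a sketch of the idea only: the function names and the 0.5 threshold are assumptions, and the library may compute the correlation differently.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant series; treat it as zero here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def temperature_tracks_irradiance(module_temp, irradiance, correlation_min=0.5):
    """Return one Boolean for the whole series, not a mask."""
    return pearson(module_temp, irradiance) > correlation_min


irradiance = [0, 200, 500, 800, 950, 700, 300, 0]
attached = [15, 22, 33, 45, 50, 42, 27, 16]   # warms and cools with the sun
detached = [21, 20, 19, 18, 18, 19, 20, 21]   # does not follow irradiance

attached_ok = temperature_tracks_irradiance(attached, irradiance)
detached_ok = temperature_tracks_irradiance(detached, irradiance)
print(attached_ok, detached_ok)
```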

References

[1] C. N. Long and Y. Shi, An Automated Quality Assessment and Control Algorithm for Surface Radiation Measurements, The Open Atmospheric Science Journal 2, pp. 23-37, 2008.

Features#

Functions for detecting features in the data.

Clipping#

Functions for identifying inverter clipping.

features.clipping.levels(ac_power[, window, ...])

Label clipping in AC power data based on levels in the data.

features.clipping.threshold(ac_power[, ...])

Detect clipping based on a maximum power threshold.

features.clipping.geometric(ac_power[, ...])

Identify clipping based on the shape of the ac_power curve on each day.

Clearsky#

features.clearsky.reno(ghi, ghi_clearsky)

Identify times when GHI is consistent with clearsky conditions.

Orientation#

System orientation refers to mounting type (fixed or tracker) and the azimuth and tilt of the mounting. A system’s orientation can be determined by examining power or POA irradiance on days that are relatively sunny.

This module provides functions that operate on power or POA irradiance to identify system orientation on a daily basis. These functions can tell you whether a day’s profile matches that of a fixed system or system with a single-axis tracker.

Care should be taken when interpreting function output since other factors such as malfunctioning trackers can interfere with identification.

features.orientation.fixed_nrel(...[, ...])

Flag days that match the profile of a fixed PV system on a sunny day.

features.orientation.tracking_nrel(...[, ...])

Flag days that match the profile of a single-axis tracking PV system on a sunny day.

Daytime#

Functions that relate to determining day/night periods in a time series, and getting sunrise and sunset times based on the day-night mask outputs.

features.daytime.power_or_irradiance(series)

Return True for values that are during the day.

features.daytime.get_sunrise(daytime_mask[, ...])

Using the outputs of power_or_irradiance(), derive sunrise values for each day in the associated time series.

features.daytime.get_sunset(daytime_mask[, ...])

Using the outputs of power_or_irradiance(), derive sunset values for each day in the associated time series.

Shading#

Functions for labeling shadows.

features.shading.fixed(ghi, daytime, clearsky)

Detects shadows from fixed structures such as wires and poles.

System#

This module contains functions and classes relating to PV system parameters such as nameplate power, tilt, azimuth, or whether the system is equipped with a tracker.

Tracking#

system.Tracker(value)

Enum describing the orientation of a PV System.

system.is_tracking_envelope(series, daytime, ...)

Infer whether the system is equipped with a tracker.

Orientation#

The following functions can be used to infer system orientation from power or plane of array irradiance measurements.

system.infer_orientation_daily_peak(...)

Determine system azimuth and tilt from power or POA using solar azimuth at the daily peak.

system.infer_orientation_fit_pvwatts(...[, ...])

Get the tilt and azimuth that give PVWatts v5 output that most closely fits the data in power_ac.

Metrics#

Performance Ratio#

The following functions can be used to calculate system performance metrics.

metrics.performance_ratio_nrel(poa_global, ...)

Calculate NREL Performance Ratio.

Variability#

Functions to calculate variability statistics.

metrics.variability_index(measured, clearsky)

Calculate the variability index.
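The variability index is commonly defined as the ratio of the "length" of the measured irradiance curve to the length of the clearsky curve over the same period. The sketch below assumes that definition and evenly spaced samples; the `step_minutes` parameter and the inner helper are hypothetical, and the library's resampling and time-step handling are not reproduced.

```python
import math


def variability_index(measured, clearsky, step_minutes=1.0):
    """Ratio of measured to clearsky curve length for evenly spaced samples."""

    def curve_length(values):
        # Sum the straight-line distance between consecutive points.
        return sum(
            math.sqrt((b - a) ** 2 + step_minutes ** 2)
            for a, b in zip(values, values[1:])
        )

    return curve_length(measured) / curve_length(clearsky)


clearsky = [500, 510, 520, 530, 540]
cloudy = [500, 200, 520, 150, 540]

vi_clear = variability_index(clearsky, clearsky)
vi_cloudy = variability_index(cloudy, clearsky)
print(vi_clear, vi_cloudy)
```

A value near 1 indicates that the measurements follow the smooth clearsky profile, while passing clouds lengthen the measured curve and push the index well above 1. Each series needs at least two samples, otherwise the clearsky length is zero.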

Release Notes#

These are the bug-fixes, new features, and improvements for each release.

0.2.0 (February 14, 2024)#

Breaking Changes#
  • Updated function infer_orientation_fit_pvwatts() to more closely align with the PVWatts v5 methodology. This includes incorporating relative airmass and extraterrestrial irradiance into the Perez total irradiance model, accounting for array incidence loss (IAM), and including losses in the PVWatts inverter model. Additionally, added optional arguments for bounding the azimuth range during least squares optimization. (GH147, GH180)

  • Updated function shifts_ruptures() to align with the methodology tested and reported on at PVRW 2023 (“Survey of Time Shift Detection Algorithms for Measured PV Data”). This includes converting the changepoint detection algorithm from Pelt to Binary Segmentation (which runs much faster), and performing additional processing to each detected segment to remove outliers and filter by a quantile cutoff instead of the original rounding technique. (GH197)

Enhancements#
  • Added function get_sunrise() for calculating the daily sunrise datetimes for a time series, based on the power_or_irradiance() day/night mask output. (GH187)

  • Added function get_sunset() for calculating the daily sunset datetimes for a time series, based on the power_or_irradiance() day/night mask output. (GH187)

  • Updated function power_or_irradiance() to be more performant through vectorization; the original logic used a lambda call that slowed the function considerably. This update resulted in a ~50X speedup. (GH186)

Bug Fixes#
  • pvanalytics.__version__ now correctly reports the version string instead of raising AttributeError. (GH181)

  • Compatibility with pandas 2.0.0 (GH185) and future versions of pandas. (GH203)

  • Compatibility with scipy 1.11. (GH196)

  • Updated function trim() to handle pandas 2.0.0 update for tz-aware timeseries. (GH206)

Requirements#
  • Advance minimum pvlib to 0.9.4, numpy to 1.16.0, pandas to 1.0.0, and scipy to 1.6.0. (GH179, GH185)

Documentation#
  • Online docs now use pydata-sphinx-theme instead of the built-in alabaster theme. (GH176, GH178)

  • Added PVFleets QA pipeline examples for checking temperature, irradiance, and power streams. (GH201, GH202)

  • Added a gallery page for shifts_ruptures(). (GH192)

Testing#
  • Added testing for Python 3.11 and 3.12. (GH189, GH204)

Contributors#

0.1.3 (December 16, 2022)#

Enhancements#
Documentation#

Added new gallery example pages:

Contributors#

0.1.2 (August 18, 2022)#

Enhancements#
Bug Fixes#
Documentation#

Added fifteen new gallery example pages:

Other#
  • Removed empty modules pvanalytics.filtering and pvanalytics.fitting until the relevant functionality is added to the package. (GH145)

Contributors#

0.1.1 (February 18, 2022)#

Enhancements#
Bug Fixes#
Requirements#
  • Drop support for Python 3.6, which reached end of life in December 2021. (GH129)

Documentation#
Contributors#

0.1.0 (November 20, 2020)#

This is the first release of PVAnalytics. As such, the list of “changes” below is not specific. Future releases will describe specific changes here along with references to the relevant GitHub issues and pull requests.

API Changes#
Enhancements#
  • Quality control functions for irradiance, weather and time series data. See pvanalytics.quality for content.

  • Feature labeling functions for clipping, clearsky, daytime, and orientation. See pvanalytics.features for content.

  • System parameter inference for tilt, azimuth, and whether the system is tracking or fixed. See pvanalytics.system for content.

  • NREL performance ratio metric (pvanalytics.metrics.performance_ratio_nrel()).

Bug Fixes#
Contributors#

Special thanks to Matt Muller and Kirsten Perry of NREL for their assistance in adapting components from the PVFleets QA project to PVAnalytics.