pvanalytics.quality.data_shifts.get_longest_shift_segment_dates

pvanalytics.quality.data_shifts.get_longest_shift_segment_dates(series, filtering=True, use_default_models=True, method=None, cost=None, penalty=40, buffer_day_length=7)

Return the start and end dates of the longest serially complete time series segment.

Data shift detection is run on the series, and the longest segment between detected changepoints is identified. The start and end dates of that segment are returned, with a configurable buffer period added to the start date and subtracted from the end date. The buffer gives the segment time to stabilize, which helps when a changepoint is detected a few days earlier or later than the actual shift date.
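
The selection and buffering step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name `longest_buffered_segment` and the representation of changepoints as integer positions into a list of daily dates are both hypothetical.

```python
from datetime import date, timedelta


def longest_buffered_segment(dates, changepoints, buffer_day_length=7):
    """Return (start, end) of the longest segment between changepoints.

    Hypothetical helper for illustration only. `dates` is a sorted list of
    daily dates; `changepoints` holds positions in `dates` where a new
    segment begins.
    """
    # Segment boundaries: start of series, each changepoint, end of series.
    bounds = [0] + sorted(changepoints) + [len(dates)]
    # Pair consecutive boundaries into (first index, one-past-last index).
    segments = list(zip(bounds[:-1], bounds[1:]))
    # Pick the segment that covers the most days.
    first, stop = max(segments, key=lambda seg: seg[1] - seg[0])
    buffer = timedelta(days=buffer_day_length)
    # Move the start forward and the end backward by the buffer.
    return dates[first] + buffer, dates[stop - 1] - buffer


# One year of daily dates with assumed changepoints at positions 100 and 130.
days = [date(2020, 1, 1) + timedelta(days=i) for i in range(365)]
start, end = longest_buffered_segment(days, changepoints=[100, 130])
print(start, end)
```

Here the longest segment runs from position 130 to the end of the series, and the 7-day buffer trims both ends of it.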

Parameters
  • series (Pandas series with datetime index.) – Daily time series of a PV data stream, which can include irradiance and power data streams. This series represents the summed daily values of the particular data stream.

  • filtering (Boolean, default True.) – Whether to filter outliers and stale data out of the time series before running the data shift detection sequence. If False, no filtering is applied.

  • use_default_models (Boolean, default True) – If True, then default change point detection search parameters are used. For time series shorter than 2 years in length, the search function is rpt.Window with model='rbf', width=50 and penalty=30. For time series 2 years or longer in length, the search function is rpt.BottomUp with model='rbf' and penalty=40.

  • method (ruptures search method instance or None, default None.) – Ruptures search method instance. See https://centre-borelli.github.io/ruptures-docs/user-guide/.

  • cost (str or None, default None) – Cost function passed to the ruptures changepoint search instance. See https://centre-borelli.github.io/ruptures-docs/user-guide/.

  • penalty (int, default 40) – Penalty value passed to the ruptures changepoint detection method.

  • buffer_day_length (int, default 7) – Number of days to add to the start date and subtract from the end date of the longest detected data shift-free period. The buffer helps exclude data that does not belong to the segment, which can happen when a changepoint is detected a few days earlier or later than the actual data shift date.
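
The default search settings described for use_default_models can be summarized as a small lookup. The sketch below is only an illustration of the documented rule: the helper name `default_search_settings` is hypothetical, the search functions are given as strings rather than ruptures objects, and treating "2 years" as 730 days is an assumption.

```python
from datetime import date


def default_search_settings(first_day, last_day):
    """Hypothetical helper mirroring the documented default settings."""
    # Assumption: "2 years" is taken to mean 730 days.
    span_days = (last_day - first_day).days
    if span_days < 730:
        # Shorter series: window-based search with a lower penalty.
        return {"search": "rpt.Window", "model": "rbf", "width": 50, "penalty": 30}
    # Series of 2 years or longer: bottom-up search.
    return {"search": "rpt.BottomUp", "model": "rbf", "penalty": 40}


short_cfg = default_search_settings(date(2021, 1, 1), date(2021, 12, 31))
long_cfg = default_search_settings(date(2019, 1, 1), date(2021, 12, 31))
print(short_cfg["search"], long_cfg["search"])
```

When use_default_models is False, the method, cost and penalty arguments are presumably what configure the search instead.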

Returns

  • start_date (Pandas datetime) – Start date of the longest continuous time series segment that is free of data shifts.

  • end_date (Pandas datetime) – End date of the longest continuous time series segment that is free of data shifts.

References

[1] Perry, K., and Muller, M. “Automated shift detection in sensor-based PV power and irradiance time series”, 2022 IEEE 48th Photovoltaic Specialists Conference (PVSC). Submitted.

Examples using pvanalytics.quality.data_shifts.get_longest_shift_segment_dates

PV Fleets QA Process: Temperature

PV Fleets QA Process: Irradiance

PV Fleets QA Process: Power

Data Shift Detection & Filtering
