Text2Time Guide
Module Overview
Aligning production data with O&M tickets is not trivial, because finding where the dates intersect and identifying anomalies both depend on the nuances of the two datasets. This set of functions facilitates this data fusion. Key features include:
conducting quality checks and controls on data.
identification of overlapping periods between O&M and production data.
generation of baseline values for production loss estimations.
calculation of losses from production anomalies for specific time periods.
An example of usage can be found in tutorial_text2time_module.ipynb.
The text2time package can be broken down into three main components: data pre-processing, utils, and visualizations.
Data pre-processing
These functions pre-process user O&M and production data to prepare them for further analyses and visualizations.
om_date_convert() and prod_date_convert() convert dates in string format to date-time objects in the O&M and production data, respectively.
data_site_na() handles missing site IDs in the user data. This function can be used for both O&M and production data.
om_datelogic_check() detects and handles issues with the logic of the O&M dates, specifically when the conclusion of an event occurs before it begins.
om_nadate_process() and prod_nadate_process() detect and handle any missing time-stamps in the O&M and production data, respectively.
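The checks described above can be illustrated with plain pandas. This is a minimal sketch of the ideas only, not the text2time implementation: the sample tickets are invented, and the variable names (bad_logic, missing_site, missing_date) are hypothetical. The column names follow the example dictionary used later in this guide.

```python
import pandas as pd

# Hypothetical O&M tickets; the values are made up for illustration.
om = pd.DataFrame({
    "randid": ["S1", "S1", None],
    "date_start": ["2020-01-05 08:00", "2020-02-10 09:00", "2020-03-01 12:00"],
    "date_end": ["2020-01-06 08:00", "2020-02-09 09:00", None],
})

# Convert string dates to date-time objects; missing or unparseable
# values become NaT instead of raising an error.
for col in ("date_start", "date_end"):
    om[col] = pd.to_datetime(om[col], errors="coerce")

# Date-logic check: flag tickets whose end precedes their start.
# Comparisons involving NaT evaluate to False, so incomplete rows
# are not flagged here.
bad_logic = om["date_end"] < om["date_start"]

# Missing-value checks for site IDs and time-stamps.
missing_site = om["randid"].isna()
missing_date = om[["date_start", "date_end"]].isna().any(axis=1)

print(om[bad_logic | missing_site | missing_date])
```

How flagged rows are handled (dropped, corrected, or filled) is a separate decision; the sketch only shows detection.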
Utils
These functions perform secondary calculations on the O&M and production data to aid in data analyses and visualizations.
iec_calc() calculates a comparison dataset for the production data from irradiance, using the IEC calculation.
summarize_overlaps() summarizes the overlapping production and O&M data.
om_summary_stats() summarizes statistics (e.g., event duration and month of occurrence) of the O&M data.
overlapping_data() trims the production and O&M data frames and retains only the data where both datasets overlap in time.
prod_anomalies() detects and handles issues that arise when the production data is input in cumulative format and unexpected dips show up in the data.
prod_quant() calculates a comparison between the actual production data and a baseline (e.g., from a time-series model).
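Two of the ideas above, detecting dips in cumulative production data and trimming both datasets to their shared time window, can be sketched with plain pandas. This is an illustration under stated assumptions, not the text2time implementation: the data, the column names (timestamp, energy_cum), and the variable names are all hypothetical.

```python
import pandas as pd

# Hypothetical cumulative energy readings (kWh) with one unexpected dip.
prod = pd.DataFrame({
    "timestamp": pd.date_range("2020-01-01", periods=6, freq="D"),
    "energy_cum": [10.0, 20.0, 5.0, 40.0, 50.0, 60.0],
})

# Hypothetical O&M tickets.
om = pd.DataFrame({
    "date_start": pd.to_datetime(["2019-12-20", "2020-01-03"]),
    "date_end": pd.to_datetime(["2019-12-22", "2020-01-04"]),
})

# A cumulative series should never decrease, so a negative
# step-to-step difference marks an anomalous dip.
dips = prod["energy_cum"].diff() < 0

# Shared window: latest of the two start dates, earliest of the two end dates.
start = max(prod["timestamp"].min(), om["date_start"].min())
end = min(prod["timestamp"].max(), om["date_end"].max())

# Keep production rows inside the window and tickets that intersect it.
prod_overlap = prod[prod["timestamp"].between(start, end)]
om_overlap = om[(om["date_end"] >= start) & (om["date_start"] <= end)]

print(prod_overlap)
print(om_overlap)
```

Here tickets are kept if they touch the window at all; clipping partially overlapping tickets to the window boundaries is an alternative design choice.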
Visualizations
These functions visualize the processed O&M and production data:
visualize_categorical_scatter() generates categorical scatter plots of a chosen variable based on a specified category (e.g., site ID) for the O&M data.
visualize_counts() generates a count plot of categories based on a chosen categorical variable column for the O&M data. If that variable is the user’s site ID for every ticket, a plot of the total count of events can be generated.
visualize_om_prod_overlap() creates a visualization that overlays the O&M data on top of the coinciding production data.
Example Code
Load the O&M data and convert dates to Python date-time objects
>>> import pandas as pd
>>> import os
>>> from pvops.text2time import preprocess
>>> example_OMpath = os.path.join('example_data', 'example_om_data2.csv')
>>> om_data = pd.read_csv(example_OMpath, on_bad_lines='skip', engine='python')
>>> om_col_dict = {
... 'siteid': 'randid',
... 'datestart': 'date_start',
... 'dateend': 'date_end',
... 'workID': 'WONumber',
... 'worktype': 'WOType',
... 'asset': 'Asset',
... 'eventdur': 'EventDur',  # user's name choice for new column (Repair Duration)
... 'modatestart': 'MonthStart',  # user's name choice for new column (Month when an event begins)
... 'agedatestart': 'AgeStart'}  # user's name choice for new column (Age of system when event begins)
>>> om_data_converted = preprocess.om_date_convert(om_data, om_col_dict)