Text2Time Module Tutorial

This notebook demonstrates how to use pvops to analyze operations and maintenance (O&M) data together with production data. The data are processed and cleaned in preparation for an intersection analysis and subsequent visualizations.

Import modules

[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import shutil
import sys
import os
[2]:
from pvops.text2time import visualize, utils, preprocess
from pvops.timeseries.models import linear, iec

1. Load and explore data

Define csv paths to the O&M data, production data, and metadata.

[3]:
example_OMpath = os.path.join('example_data', 'example_om_data2.csv')
example_prodpath = os.path.join('example_data', 'example_prod_data_cumE2.csv')
example_metapath = os.path.join('example_data', 'example_metadata2.csv')

Load in csv files as pandas DataFrames. prod_data contains energy production and irradiance data over time for potentially multiple sites. om_data contains operations and maintenance tickets. metadata contains information about the sites, such as location and DC size.

[4]:
prod_data = pd.read_csv(example_prodpath, on_bad_lines='skip', engine='python')
om_data = pd.read_csv(example_OMpath, on_bad_lines='skip', engine='python')
metadata = pd.read_csv(example_metapath, on_bad_lines='skip', engine='python')
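
The `on_bad_lines='skip'` option discards rows that the parser cannot split into the expected number of fields, without raising an error. The sketch below illustrates this on a small in-memory CSV. The sample text is invented for illustration and is not part of the example data.

```python
import io
import pandas as pd

# Hypothetical CSV text: the third data row carries an extra field.
csv_text = (
    "randid,Date,Energy\n"
    "R23,7/19/2018 1:00,1000.0\n"
    "R23,7/19/2018 2:00,0.0\n"
    "R23,7/19/2018 3:00,0.0,unexpected\n"
)

df = pd.read_csv(io.StringIO(csv_text), on_bad_lines='skip', engine='python')

# The malformed row is dropped, so only two data rows remain.
print(len(df))
```

Because the rows are skipped silently, it can be worth comparing the loaded row count against the raw file when data quality is in doubt.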

Explore production data

[5]:
prod_data
[5]:
randid Date Energy Irradiance
0 R23 NaN 1000.0 NaN
1 R23 7/19/2018 1:00 1000.0 NaN
2 R23 7/19/2018 2:00 0.0 NaN
3 R23 7/19/2018 3:00 0.0 NaN
4 R23 7/19/2018 4:00 1000.0 NaN
... ... ... ... ...
1049 R27 10/13/2018 23:45 5289528.0 NaN
1050 R27 10/14/2018 0:00 5289528.0 NaN
1051 R27 10/14/2018 0:15 5289528.0 NaN
1052 R27 10/14/2018 0:30 5289528.0 NaN
1053 R27 10/14/2018 0:45 5289528.0 NaN

1054 rows × 4 columns

Explore OM data

[6]:
om_data
[6]:
randid Asset date_start date_end WONumber WOType GeneralDesc
0 NaN Inverter 5/2/2018 12:00 5/17/2018 16:00 100 Corrective Inverter 1.1 Contactor 7, Inverter 1.2 Contact...
1 R23 Facility 5/19/2018 15:44 5/19/2018 13:04 101 Preventive Site offline due to grid disturbance
2 R23 Facility 6/15/2018 6:46 6/15/2018 10:30 102 Corrective Plant trip due to grid disturbance
3 R23 Facility 6/18/2018 11:20 6/18/2018 14:03 103 Corrective Site trip due to cause grid disturbance
4 R23 Facility 7/21/2018 4:45 7/21/2018 13:15 104 Vegetation Site tripped due to grid disturbance
5 R23 Inverter 7/21/2018 13:16 7/21/2018 14:25 105 Corrective Inverter failed to start following plant trip
6 R23 Inverter 7/25/2018 14:20 7/25/2018 16:40 106 Corrective inverter offline due to high ambient temp fault
7 R23 Inverter 8/1/2018 11:45 NaN 107 Corrective Inverter major underperformance
8 R23 Facility 8/2/2018 1:05 8/2/2018 9:28 108 Corrective Site trip due to grid disturbance
9 R27 Facility 9/14/2018 10:00 9/16/2018 16:00 1 corrective hurricane florence outages/response. complete ...
10 R27 Facility 9/24/2018 10:00 9/16/2018 17:00 2 vegetation Vegetation maintenance activities were performed
11 R27 Other 9/19/2018 7:00 10/11/2018 20:00 3 corrective hurricane response. perform site inspection to...
12 R27 Facility 10/13/2018 12:00 10/13/2018 17:00 4 preventive Monthly visual inspection
13 R27 other 10/14/2018 11:00 NaN 5 preventive Monthly visual inspection

Explore metadata

[7]:
metadata
[7]:
randid DC_Size_kW COD latitude longitude
0 R23 2500 10/20/2013 -80 -35
1 R27 475 10/21/2017 -81 -36

2. Prepare data for analysis

Assign dictionaries that map pvOps variable names to the user’s column names.

[8]:
#Format for dictionaries is {pvops variable: user-specific column names}
prod_col_dict = {'siteid': 'randid',
                 'timestamp': 'Date',
                 'energyprod': 'Energy',
                 'irradiance':'Irradiance',
                 'baseline': 'IEC_pstep', #user's name choice for new column (baseline expected energy defined by user or calculated based on IEC)
                 'dcsize': 'dcsize', #user's name choice for new column (System DC-size, extracted from meta-data)
                 'compared': 'Compared',#user's name choice for new column
                 'energy_pstep': 'Energy_pstep'} #user's name choice for new column

om_col_dict = {'siteid': 'randid',
               'datestart': 'date_start',
               'dateend': 'date_end',
               'workID': 'WONumber',
               'worktype': 'WOType',
               'asset': 'Asset',
               'eventdur': 'EventDur', #user's name choice for new column (Repair Duration)
               'modatestart': 'MonthStart', #user's name choice for new column (Month when an event begins)
               'agedatestart': 'AgeStart'} #user's name choice for new column (Age of system when event begins)

metad_col_dict = {'siteid': 'randid',
                  'dcsize': 'DC_Size_kW',
                  'COD': 'COD'}
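
The purpose of these dictionaries is that downstream code can look up a column by its fixed pvOps key rather than by a hard-coded column name. The following plain-Python sketch shows that lookup pattern. `energy_by_site` is a hypothetical helper written for this illustration and is not a pvops function.

```python
# Hypothetical helper (not part of pvops): shows the key-to-column lookup only.
def energy_by_site(rows, col_dict):
    """Sum the energy column per site, locating columns through col_dict."""
    site_col = col_dict['siteid']        # user's name for the site-ID column
    energy_col = col_dict['energyprod']  # user's name for the energy column
    totals = {}
    for row in rows:
        totals[row[site_col]] = totals.get(row[site_col], 0.0) + row[energy_col]
    return totals

col_dict = {'siteid': 'randid', 'energyprod': 'Energy'}
rows = [
    {'randid': 'R23', 'Energy': 10.0},
    {'randid': 'R23', 'Energy': 5.0},
    {'randid': 'R27', 'Energy': 2.0},
]
totals = energy_by_site(rows, col_dict)
print(totals)
```

If the source data used different column names, only the dictionary values would need to change.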

2.1 Convert date strings to date-time objects

O&M Data

[9]:
#Note:  NaNs are converted to NaTs
om_data_converted = preprocess.om_date_convert(om_data, om_col_dict)

print('---Original data types---')
print(om_data.dtypes)
print('\n---Post-processed data types---')
print(om_data_converted.dtypes)
print('\n---Converted data frame---')
om_data_converted.head()
---Original data types---
randid         object
Asset          object
date_start     object
date_end       object
WONumber        int64
WOType         object
GeneralDesc    object
dtype: object

---Post-processed data types---
randid                 object
Asset                  object
date_start     datetime64[ns]
date_end       datetime64[ns]
WONumber                int64
WOType                 object
GeneralDesc            object
dtype: object

---Converted data frame---
[9]:
randid Asset date_start date_end WONumber WOType GeneralDesc
0 NaN Inverter 2018-05-02 12:00:00 2018-05-17 16:00:00 100 Corrective Inverter 1.1 Contactor 7, Inverter 1.2 Contact...
1 R23 Facility 2018-05-19 15:44:00 2018-05-19 13:04:00 101 Preventive Site offline due to grid disturbance
2 R23 Facility 2018-06-15 06:46:00 2018-06-15 10:30:00 102 Corrective Plant trip due to grid disturbance
3 R23 Facility 2018-06-18 11:20:00 2018-06-18 14:03:00 103 Corrective Site trip due to cause grid disturbance
4 R23 Facility 2018-07-21 04:45:00 2018-07-21 13:15:00 104 Vegetation Site tripped due to grid disturbance
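
For readers who want to see the conversion in isolation, the sketch below reproduces the NaN-to-NaT behaviour with `pandas.to_datetime`. It is an assumption that this mirrors what `om_date_convert` does internally; the explicit `format` string is chosen here to match the date strings in the example data.

```python
import pandas as pd

# Date strings in the same style as the example O&M data, with one missing entry.
raw = pd.Series(['5/2/2018 12:00', None, '7/21/2018 4:45'])

# errors='coerce' turns missing or unparseable entries into NaT instead of raising.
converted = pd.to_datetime(raw, format='%m/%d/%Y %H:%M', errors='coerce')

print(converted)
```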

Production data

[10]:
prod_data_converted = preprocess.prod_date_convert(prod_data, prod_col_dict)

print('---Original data types---')
print(prod_data.dtypes)
print('\n---Post-processed data types---')
print(prod_data_converted.dtypes)
print('\n---Converted data frame---')
prod_data_converted.head()
---Original data types---
randid         object
Date           object
Energy        float64
Irradiance    float64
dtype: object

---Post-processed data types---
randid                object
Date          datetime64[ns]
Energy               float64
Irradiance           float64
dtype: object

---Converted data frame---
[10]:
randid Date Energy Irradiance
0 R23 NaT 1000.0 NaN
1 R23 2018-07-19 01:00:00 1000.0 NaN
2 R23 2018-07-19 02:00:00 0.0 NaN
3 R23 2018-07-19 03:00:00 0.0 NaN
4 R23 2018-07-19 04:00:00 1000.0 NaN

3. Handling data quality issues

3.1 Missing site-IDs

Drop rows where the site-ID is NaN in the O&M data (helpful when multiple sites are in the O&M data frame and a NaN doesn’t identify a specific site)

[11]:
om_data_sitena, addressed = preprocess.data_site_na(om_data_converted, om_col_dict)
addressed #printing row that was addressed
[11]:
randid Asset date_start date_end WONumber WOType GeneralDesc
0 NaN Inverter 2018-05-02 12:00:00 2018-05-17 16:00:00 100 Corrective Inverter 1.1 Contactor 7, Inverter 1.2 Contact...

Print post-processed data frame

[12]:
om_data_sitena.head()
[12]:
randid Asset date_start date_end WONumber WOType GeneralDesc
1 R23 Facility 2018-05-19 15:44:00 2018-05-19 13:04:00 101 Preventive Site offline due to grid disturbance
2 R23 Facility 2018-06-15 06:46:00 2018-06-15 10:30:00 102 Corrective Plant trip due to grid disturbance
3 R23 Facility 2018-06-18 11:20:00 2018-06-18 14:03:00 103 Corrective Site trip due to cause grid disturbance
4 R23 Facility 2018-07-21 04:45:00 2018-07-21 13:15:00 104 Vegetation Site tripped due to grid disturbance
5 R23 Inverter 2018-07-21 13:16:00 2018-07-21 14:25:00 105 Corrective Inverter failed to start following plant trip
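
The same split into "addressed" and retained rows can be sketched directly in pandas. This is an illustration of the idea on a small made-up frame, not the pvops implementation.

```python
import numpy as np
import pandas as pd

# Small illustrative frame: the first ticket has no site ID.
om = pd.DataFrame({'randid': [np.nan, 'R23', 'R27'],
                   'WONumber': [100, 101, 1]})

missing = om['randid'].isna()
addressed = om.loc[missing]    # rows removed, kept aside for review
om_clean = om.loc[~missing]    # rows with a usable site ID

print(om_clean)
```

Note that the original index labels are preserved, which matches the output above where the retained rows start at index 1.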

3.2 O&M Start-dates that occur after concluding date (inverted dates)

Addressing issue by swapping dates

[13]:
om_data_checked_s, addressed = preprocess.om_datelogic_check(om_data_sitena, om_col_dict, 'swap')
addressed
[13]:
randid Asset date_start date_end WONumber WOType GeneralDesc
1 R23 Facility 2018-05-19 15:44:00 2018-05-19 13:04:00 101 Preventive Site offline due to grid disturbance
10 R27 Facility 2018-09-24 10:00:00 2018-09-16 17:00:00 2 vegetation Vegetation maintenance activities were performed

Print post-processed data frame

[14]:
om_data_checked_s.head()
[14]:
randid Asset date_start date_end WONumber WOType GeneralDesc
1 R23 Facility 2018-05-19 13:04:00 2018-05-19 15:44:00 101 Preventive Site offline due to grid disturbance
2 R23 Facility 2018-06-15 06:46:00 2018-06-15 10:30:00 102 Corrective Plant trip due to grid disturbance
3 R23 Facility 2018-06-18 11:20:00 2018-06-18 14:03:00 103 Corrective Site trip due to cause grid disturbance
4 R23 Facility 2018-07-21 04:45:00 2018-07-21 13:15:00 104 Vegetation Site tripped due to grid disturbance
5 R23 Inverter 2018-07-21 13:16:00 2018-07-21 14:25:00 105 Corrective Inverter failed to start following plant trip

Addressing issue by dropping rows instead of swapping dates

[15]:
om_data_checked_d, date_error = preprocess.om_datelogic_check(om_data_sitena, om_col_dict, 'drop')
om_data_checked_d.head()
[15]:
randid Asset date_start date_end WONumber WOType GeneralDesc
2 R23 Facility 2018-06-15 06:46:00 2018-06-15 10:30:00 102 Corrective Plant trip due to grid disturbance
3 R23 Facility 2018-06-18 11:20:00 2018-06-18 14:03:00 103 Corrective Site trip due to cause grid disturbance
4 R23 Facility 2018-07-21 04:45:00 2018-07-21 13:15:00 104 Vegetation Site tripped due to grid disturbance
5 R23 Inverter 2018-07-21 13:16:00 2018-07-21 14:25:00 105 Corrective Inverter failed to start following plant trip
6 R23 Inverter 2018-07-25 14:20:00 2018-07-25 16:40:00 106 Corrective inverter offline due to high ambient temp fault
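
The two strategies differ only in what happens to an event whose start falls after its end. A plain-Python sketch of both, using two timestamps pairs taken from the example data, could look like this. It illustrates the logic and is not the pvops code.

```python
from datetime import datetime

events = [
    (datetime(2018, 5, 19, 15, 44), datetime(2018, 5, 19, 13, 4)),   # inverted
    (datetime(2018, 6, 15, 6, 46), datetime(2018, 6, 15, 10, 30)),   # consistent
]

# 'swap': order each pair so that the earlier timestamp comes first.
swapped = [(min(start, end), max(start, end)) for start, end in events]

# 'drop': keep only events whose start does not come after their end.
dropped = [(start, end) for start, end in events if start <= end]

print(swapped[0])
print(len(dropped))
```

Swapping keeps every ticket but assumes the two fields were simply entered in the wrong order; dropping is the more conservative choice when that assumption is doubtful.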

3.3 Unexpected drops in energy delivered (when collected on cumulative basis)

Visualize pre-processed data

[16]:
plotvar = 'Energy'
for sid in prod_data_converted.loc[:, 'randid'].unique():
    mask = prod_data_converted.loc[:, 'randid']== sid
    prod_data_converted.loc[mask].plot(x='Date',y=plotvar, title=sid)
    plt.ylabel('Joules')
[Figures: cumulative Energy versus Date before cleaning, one plot per site]

Addressing issue by forward-filling, which propagates last valid observation forward.

[17]:
prod_data_anom, addressed = utils.prod_anomalies(prod_data_converted, prod_col_dict, 1.0, np.nan, ffill=True)
addressed
[17]:
randid Date Energy Irradiance
2 R23 2018-07-19 02:00:00 0.0 NaN
3 R23 2018-07-19 03:00:00 0.0 NaN
194 R23 2018-07-27 02:00:00 0.0 NaN
395 R27 2018-09-14 04:00:00 0.0 0.616897
397 R27 2018-09-14 04:30:00 0.0 0.306548
860 R27 2018-09-19 00:15:00 0.0 0.000000

Print post-processed data frame

[18]:
prod_data_anom
[18]:
randid Date Energy Irradiance
0 R23 NaT 1000.0 NaN
1 R23 2018-07-19 01:00:00 1000.0 NaN
2 R23 2018-07-19 02:00:00 1000.0 NaN
3 R23 2018-07-19 03:00:00 1000.0 NaN
4 R23 2018-07-19 04:00:00 1000.0 NaN
... ... ... ... ...
1049 R27 2018-10-13 23:45:00 5289528.0 NaN
1050 R27 2018-10-14 00:00:00 5289528.0 NaN
1051 R27 2018-10-14 00:15:00 5289528.0 NaN
1052 R27 2018-10-14 00:30:00 5289528.0 NaN
1053 R27 2018-10-14 00:45:00 5289528.0 NaN

1054 rows × 4 columns

Quick visualization of post-processed data

[19]:
plotvar = 'Energy'
for sid in prod_data_anom.loc[:, 'randid'].unique():
    mask = prod_data_anom.loc[:, 'randid']== sid
    prod_data_anom.loc[mask].plot(x='Date',y=plotvar, title=sid)
    plt.ylabel('Joules')
[Figures: cumulative Energy versus Date after forward-filling, one plot per site]

3.4 Missing time-stamps in data

Production Data

Dropping rows with missing time-stamps

[20]:
prod_data_datena_d, addressed = preprocess.prod_nadate_process(prod_data_anom, prod_col_dict, pnadrop=True)
addressed
[20]:
randid Date Energy Irradiance
0 R23 NaT 1000.0 NaN
388 R27 NaT 5220831.0 0.095835

Print post-processed data frame

[21]:
prod_data_datena_d
[21]:
randid Date Energy Irradiance
1 R23 2018-07-19 01:00:00 1000.0 NaN
2 R23 2018-07-19 02:00:00 1000.0 NaN
3 R23 2018-07-19 03:00:00 1000.0 NaN
4 R23 2018-07-19 04:00:00 1000.0 NaN
5 R23 2018-07-19 05:00:00 1000.0 NaN
... ... ... ... ...
1049 R27 2018-10-13 23:45:00 5289528.0 NaN
1050 R27 2018-10-14 00:00:00 5289528.0 NaN
1051 R27 2018-10-14 00:15:00 5289528.0 NaN
1052 R27 2018-10-14 00:30:00 5289528.0 NaN
1053 R27 2018-10-14 00:45:00 5289528.0 NaN

1052 rows × 4 columns

Identifying rows with missing time-stamps but not dropping them

[22]:
prod_data_datena_id, addressed = preprocess.prod_nadate_process(prod_data_anom, prod_col_dict, pnadrop=False)
prod_data_datena_id
[22]:
randid Date Energy Irradiance
0 R23 NaT 1000.0 NaN
1 R23 2018-07-19 01:00:00 1000.0 NaN
2 R23 2018-07-19 02:00:00 1000.0 NaN
3 R23 2018-07-19 03:00:00 1000.0 NaN
4 R23 2018-07-19 04:00:00 1000.0 NaN
... ... ... ... ...
1049 R27 2018-10-13 23:45:00 5289528.0 NaN
1050 R27 2018-10-14 00:00:00 5289528.0 NaN
1051 R27 2018-10-14 00:15:00 5289528.0 NaN
1052 R27 2018-10-14 00:30:00 5289528.0 NaN
1053 R27 2018-10-14 00:45:00 5289528.0 NaN

1054 rows × 4 columns

O&M Data

Dropping rows when end-date of an event is missing

[23]:
om_data_datena_d, addressed = preprocess.om_nadate_process(om_data_checked_s, om_col_dict, om_dendflag='drop')
addressed
[23]:
randid Asset date_start date_end WONumber WOType GeneralDesc
7 R23 Inverter 2018-08-01 11:45:00 NaT 107 Corrective Inverter major underperformance
13 R27 other 2018-10-14 11:00:00 NaT 5 preventive Monthly visual inspection

Print post-processed data frame

[24]:
om_data_datena_d
[24]:
randid Asset date_start date_end WONumber WOType GeneralDesc
1 R23 Facility 2018-05-19 13:04:00 2018-05-19 15:44:00 101 Preventive Site offline due to grid disturbance
2 R23 Facility 2018-06-15 06:46:00 2018-06-15 10:30:00 102 Corrective Plant trip due to grid disturbance
3 R23 Facility 2018-06-18 11:20:00 2018-06-18 14:03:00 103 Corrective Site trip due to cause grid disturbance
4 R23 Facility 2018-07-21 04:45:00 2018-07-21 13:15:00 104 Vegetation Site tripped due to grid disturbance
5 R23 Inverter 2018-07-21 13:16:00 2018-07-21 14:25:00 105 Corrective Inverter failed to start following plant trip
6 R23 Inverter 2018-07-25 14:20:00 2018-07-25 16:40:00 106 Corrective inverter offline due to high ambient temp fault
8 R23 Facility 2018-08-02 01:05:00 2018-08-02 09:28:00 108 Corrective Site trip due to grid disturbance
9 R27 Facility 2018-09-14 10:00:00 2018-09-16 16:00:00 1 corrective hurricane florence outages/response. complete ...
10 R27 Facility 2018-05-19 13:04:00 2018-05-19 15:44:00 2 vegetation Vegetation maintenance activities were performed
11 R27 Other 2018-09-19 07:00:00 2018-10-11 20:00:00 3 corrective hurricane response. perform site inspection to...
12 R27 Facility 2018-10-13 12:00:00 2018-10-13 17:00:00 4 preventive Monthly visual inspection

Rather than dropping rows, assign “today’s” time-stamp to missing end-dates so that the event is treated as an open ticket

[25]:
om_data_datena_t, addressed = preprocess.om_nadate_process(om_data_checked_s, om_col_dict, om_dendflag='today')
om_data_datena_t
[25]:
randid Asset date_start date_end WONumber WOType GeneralDesc
1 R23 Facility 2018-05-19 13:04:00 2018-05-19 15:44:00 101 Preventive Site offline due to grid disturbance
2 R23 Facility 2018-06-15 06:46:00 2018-06-15 10:30:00 102 Corrective Plant trip due to grid disturbance
3 R23 Facility 2018-06-18 11:20:00 2018-06-18 14:03:00 103 Corrective Site trip due to cause grid disturbance
4 R23 Facility 2018-07-21 04:45:00 2018-07-21 13:15:00 104 Vegetation Site tripped due to grid disturbance
5 R23 Inverter 2018-07-21 13:16:00 2018-07-21 14:25:00 105 Corrective Inverter failed to start following plant trip
6 R23 Inverter 2018-07-25 14:20:00 2018-07-25 16:40:00 106 Corrective inverter offline due to high ambient temp fault
7 R23 Inverter 2018-08-01 11:45:00 2023-01-12 12:26:26 107 Corrective Inverter major underperformance
8 R23 Facility 2018-08-02 01:05:00 2018-08-02 09:28:00 108 Corrective Site trip due to grid disturbance
9 R27 Facility 2018-09-14 10:00:00 2018-09-16 16:00:00 1 corrective hurricane florence outages/response. complete ...
10 R27 Facility 2018-05-19 13:04:00 2018-05-19 15:44:00 2 vegetation Vegetation maintenance activities were performed
11 R27 Other 2018-09-19 07:00:00 2018-10-11 20:00:00 3 corrective hurricane response. perform site inspection to...
12 R27 Facility 2018-10-13 12:00:00 2018-10-13 17:00:00 4 preventive Monthly visual inspection
13 R27 other 2018-10-14 11:00:00 2023-01-12 12:26:26 5 preventive Monthly visual inspection
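
The filled-in end dates above (2023-01-12 12:26:26) are simply the moment the notebook was executed, so they change on every run. The sketch below shows the same idea in plain Python with the "now" value passed in explicitly, which keeps results reproducible. `close_open_tickets` is a hypothetical helper, not a pvops function.

```python
from datetime import datetime

# Hypothetical helper (not part of pvops): fill missing end dates with a given timestamp.
def close_open_tickets(events, now):
    """Return (start, end) pairs where a missing end (None) is replaced by `now`."""
    return [(start, end if end is not None else now) for start, end in events]

events = [
    (datetime(2018, 8, 1, 11, 45), None),                        # open ticket
    (datetime(2018, 8, 2, 1, 5), datetime(2018, 8, 2, 9, 28)),   # closed ticket
]
run_time = datetime(2023, 1, 12, 12, 26, 26)  # fixed value, for reproducibility
filled = close_open_tickets(events, run_time)

print(filled[0])
```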

4. Pre-visualizing preparation

4.1 Print out overview of the overlap of OM and production data

prod_summary indicates how many time stamps overlapped with O&M data versus the total number of time stamps, broken down by site. om_summary indicates the lower and upper time bounds on O&M data and the number of events, broken down by site.

[26]:
prod_summary, om_summary = utils.summarize_overlaps(prod_data_datena_d, om_data_datena_t, prod_col_dict, om_col_dict)

Production

[27]:
prod_summary
[27]:
Actual # Time Stamps Max # Time Stamps
randid
R23 387 387
R27 665 665

O&M

[28]:
om_summary
[28]:
Earliest Event Start Latest Event End Total Events
randid
R23 2018-07-21 04:45:00 2023-01-12 12:26:26 5
R27 2018-09-14 10:00:00 2018-10-13 17:00:00 3
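
A per-site summary of this shape (earliest start, latest end, event count) can be approximated with a pandas `groupby`. The sketch below uses three events from the example data and is only an illustration; the column names in `summary` are made up here, and `summarize_overlaps` additionally restricts the events to those that overlap the production data.

```python
import pandas as pd

om = pd.DataFrame({
    'randid': ['R23', 'R23', 'R27'],
    'date_start': pd.to_datetime(['2018-07-21 04:45', '2018-08-02 01:05', '2018-09-14 10:00']),
    'date_end': pd.to_datetime(['2018-07-21 13:15', '2018-08-02 09:28', '2018-09-16 16:00']),
})

# One row per site: time bounds of the events and how many there are.
summary = om.groupby('randid').agg(
    earliest_start=('date_start', 'min'),
    latest_end=('date_end', 'max'),
    total_events=('date_start', 'count'),
)

print(summary)
```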

4.2 Extract overlapping data

[29]:
prod_data_clean, om_data_clean = utils.overlapping_data(prod_data_datena_d, om_data_datena_d, prod_col_dict, om_col_dict)

Print post-processed production data frame

[30]:
prod_data_clean
[30]:
randid Date Energy Irradiance
0 R23 2018-07-19 01:00:00 1000.0 NaN
1 R23 2018-07-19 02:00:00 1000.0 NaN
2 R23 2018-07-19 03:00:00 1000.0 NaN
3 R23 2018-07-19 04:00:00 1000.0 NaN
4 R23 2018-07-19 05:00:00 1000.0 NaN
... ... ... ... ...
1015 R27 2018-10-13 22:45:00 5289528.0 0.0
1016 R27 2018-10-13 23:00:00 5289528.0 0.0
1017 R27 2018-10-13 23:15:00 5289528.0 NaN
1018 R27 2018-10-13 23:30:00 5289528.0 NaN
1019 R27 2018-10-13 23:45:00 5289528.0 NaN

1020 rows × 4 columns

Print post-processed O&M data frame

[31]:
om_data_clean
[31]:
randid Asset date_start date_end WONumber WOType GeneralDesc
0 R23 Facility 2018-07-21 04:45:00 2018-07-21 13:15:00 104 Vegetation Site tripped due to grid disturbance
1 R23 Inverter 2018-07-21 13:16:00 2018-07-21 14:25:00 105 Corrective Inverter failed to start following plant trip
2 R23 Inverter 2018-07-25 14:20:00 2018-07-25 16:40:00 106 Corrective inverter offline due to high ambient temp fault
3 R23 Facility 2018-08-02 01:05:00 2018-08-02 09:28:00 108 Corrective Site trip due to grid disturbance
4 R27 Facility 2018-09-14 10:00:00 2018-09-16 16:00:00 1 corrective hurricane florence outages/response. complete ...
5 R27 Other 2018-09-19 07:00:00 2018-10-11 20:00:00 3 corrective hurricane response. perform site inspection to...
6 R27 Facility 2018-10-13 12:00:00 2018-10-13 17:00:00 4 preventive Monthly visual inspection
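
In the output above, the R23 tickets from May and June are gone because they predate the first production time stamp. One plausible way to express such a filter is an interval-intersection test, sketched below in plain Python. `events_in_window` is a hypothetical helper and the window end is an invented value; the exact rule used by `overlapping_data` (for example how partial overlaps or day boundaries are treated) may differ.

```python
from datetime import datetime

# Hypothetical helper (not part of pvops).
def events_in_window(events, window_start, window_end):
    """Keep (start, end) events that intersect the closed window."""
    return [(s, e) for s, e in events if s <= window_end and e >= window_start]

prod_start = datetime(2018, 7, 19, 1, 0)  # first R23 production time stamp
prod_end = datetime(2018, 8, 4, 0, 0)     # invented end of the production record

events = [
    (datetime(2018, 6, 15, 6, 46), datetime(2018, 6, 15, 10, 30)),  # before production starts
    (datetime(2018, 7, 21, 4, 45), datetime(2018, 7, 21, 13, 15)),  # inside the window
]
kept = events_in_window(events, prod_start, prod_end)

print(len(kept))
```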

4.3 Calculate reference production data using IEC standards

[32]:
prod_data_clean_iec = iec.iec_calc(prod_data_clean, prod_col_dict, metadata, metad_col_dict, gi_ref=1000.)

Expected energy is calculated based on irradiance information and shows up as a new column in the production data frame.

[33]:
prod_data_clean_iec.head(n=15)
[33]:
randid Date Energy Irradiance IEC_pstep
0 R23 2018-07-19 01:00:00 1000.000 NaN NaN
1 R23 2018-07-19 02:00:00 1000.000 NaN NaN
2 R23 2018-07-19 03:00:00 1000.000 NaN NaN
3 R23 2018-07-19 04:00:00 1000.000 NaN NaN
4 R23 2018-07-19 05:00:00 1000.000 NaN NaN
5 R23 2018-07-19 06:00:00 1000.000 NaN NaN
6 R23 2018-07-19 07:00:00 1032.712 28.6245 71.56125
7 R23 2018-07-19 08:00:00 1217.521 136.8305 342.07625
8 R23 2018-07-19 09:00:00 1889.859 347.5645 868.91125
9 R23 2018-07-19 10:00:00 3073.485 565.9015 1414.75375
10 R23 2018-07-19 11:00:00 4662.416 754.6965 1886.74125
11 R23 2018-07-19 12:00:00 6518.864 896.4945 2241.23625
12 R23 2018-07-19 13:00:00 8469.309 984.3710 2460.92750
13 R23 2018-07-19 14:00:00 10059.862 1018.6565 2546.64125
14 R23 2018-07-19 15:00:00 11122.754 339.1815 847.95375
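
The IEC_pstep values in this table are consistent with scaling the DC size by the ratio of measured to reference irradiance. For the hourly R23 rows (DC size 2500 kW, `gi_ref=1000.`), the sketch below reproduces two of them. `expected_energy` is a hypothetical helper; any additional scaling pvops applies for time steps other than one hour is not shown here.

```python
# Hypothetical helper (not part of pvops): irradiance-ratio scaling for an hourly step.
def expected_energy(irradiance, dc_size_kw, gi_ref=1000.0):
    """Expected energy for one hourly step: DC size times (irradiance / reference)."""
    return dc_size_kw * irradiance / gi_ref

# Rows 6 and 7 of the table above.
baseline_row6 = expected_energy(28.6245, 2500)
baseline_row7 = expected_energy(136.8305, 2500)

print(baseline_row6, baseline_row7)
```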

4.4 Calculating a comparison of production data relative to baseline

Calculate as a ratio (actual/baseline)

[34]:
prod_data_quant = utils.prod_quant(prod_data_clean_iec, prod_col_dict, comp_type='norm', ecumu=True)
prod_data_quant.head(10)
[34]:
randid Date Energy Irradiance IEC_pstep Energy_pstep Compared
0 R23 2018-07-19 01:00:00 1000.000 NaN NaN NaN NaN
1 R23 2018-07-19 02:00:00 1000.000 NaN NaN 0.000 NaN
2 R23 2018-07-19 03:00:00 1000.000 NaN NaN 0.000 NaN
3 R23 2018-07-19 04:00:00 1000.000 NaN NaN 0.000 NaN
4 R23 2018-07-19 05:00:00 1000.000 NaN NaN 0.000 NaN
5 R23 2018-07-19 06:00:00 1000.000 NaN NaN 0.000 NaN
6 R23 2018-07-19 07:00:00 1032.712 28.6245 71.56125 32.712 0.457119
7 R23 2018-07-19 08:00:00 1217.521 136.8305 342.07625 184.809 0.540257
8 R23 2018-07-19 09:00:00 1889.859 347.5645 868.91125 672.338 0.773771
9 R23 2018-07-19 10:00:00 3073.485 565.9015 1414.75375 1183.626 0.836630

Calculate as a difference (baseline-actual)

[35]:
prod_data_quant = utils.prod_quant(prod_data_clean_iec, prod_col_dict, comp_type='diff', ecumu=True)
prod_data_quant.head(10)
[35]:
randid Date Energy Irradiance IEC_pstep Energy_pstep Compared
0 R23 2018-07-19 01:00:00 1000.000 NaN NaN NaN NaN
1 R23 2018-07-19 02:00:00 1000.000 NaN NaN 0.000 NaN
2 R23 2018-07-19 03:00:00 1000.000 NaN NaN 0.000 NaN
3 R23 2018-07-19 04:00:00 1000.000 NaN NaN 0.000 NaN
4 R23 2018-07-19 05:00:00 1000.000 NaN NaN 0.000 NaN
5 R23 2018-07-19 06:00:00 1000.000 NaN NaN 0.000 NaN
6 R23 2018-07-19 07:00:00 1032.712 28.6245 71.56125 32.712 38.84925
7 R23 2018-07-19 08:00:00 1217.521 136.8305 342.07625 184.809 157.26725
8 R23 2018-07-19 09:00:00 1889.859 347.5645 868.91125 672.338 196.57325
9 R23 2018-07-19 10:00:00 3073.485 565.9015 1414.75375 1183.626 231.12775
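
Both comparison types start from the same intermediate quantity: because the energy is cumulative (`ecumu=True`), it is first differenced into per-step energy (Energy_pstep), which is then compared with the baseline. The plain-Python sketch below reproduces rows 6 and 7 of the tables above. It illustrates the arithmetic only and is not the pvops implementation.

```python
# Cumulative energy for rows 5 to 7 and the matching baseline values.
cumulative = [1000.0, 1032.712, 1217.521]
baseline = [None, 71.56125, 342.07625]   # first entry unavailable (no irradiance)

# Per-step energy: difference between consecutive cumulative readings.
steps = [None] + [b - a for a, b in zip(cumulative, cumulative[1:])]

def compare(step, base, comp_type):
    """'norm' gives actual/baseline, 'diff' gives baseline-actual; None if either is missing."""
    if step is None or base is None:
        return None
    return step / base if comp_type == 'norm' else base - step

ratio = [compare(s, b, 'norm') for s, b in zip(steps, baseline)]
diff = [compare(s, b, 'diff') for s, b in zip(steps, baseline)]

print(ratio[1], diff[1])
```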

5. Visualizations

5.1 Visualizing overlapping production and O&M data

Making directories to store generated visualizations.

[36]:
#User should modify paths as needed
main_fldr = 'analysis'
if os.path.isdir(main_fldr):
    shutil.rmtree(main_fldr)
prod_fldr = os.path.join(main_fldr, 'perf_plots')
site_fldr = os.path.join(main_fldr, 'site_plots')
os.makedirs(prod_fldr)
os.makedirs(site_fldr)
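
Note that the cell above deletes any existing `analysis` folder before recreating it. If earlier outputs should be kept, a possible alternative is `os.makedirs(..., exist_ok=True)`, which creates missing folders and leaves existing ones untouched. The sketch below runs inside a temporary directory so that it has no side effects.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    main_fldr = os.path.join(base, 'analysis')
    prod_fldr = os.path.join(main_fldr, 'perf_plots')

    os.makedirs(prod_fldr, exist_ok=True)   # creates analysis/ and analysis/perf_plots/
    os.makedirs(prod_fldr, exist_ok=True)   # second call does nothing and raises no error

    created = os.path.isdir(prod_fldr)

print(created)
```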

Making visualizations

[37]:
figs = visualize.visualize_om_prod_overlap(prod_data_quant, om_data_clean, prod_col_dict, om_col_dict, prod_fldr=prod_fldr, e_cumu=True, be_cumu=False, samp_freq='H', pshift=0.0, baselineflag=True)

Display figure handles of overlapping data (“figs” contains one figure per site in overlapping data frames)

[38]:
for fig in figs:
    fig.show()


5.2 Calculate additional O&M metrics and generate relevant plots

Calculating individual event duration and age of system at time of event occurrence

[39]:
om_data_update = utils.om_summary_stats(om_data_clean, metadata, om_col_dict, metad_col_dict)
om_data_update.head()
[39]:
randid Asset date_start date_end WONumber WOType GeneralDesc EventDur MonthStart COD AgeStart
0 R23 Facility 2018-07-21 04:45:00 2018-07-21 13:15:00 104 Vegetation Site tripped due to grid disturbance 8.500000 7 2013-10-20 1735
1 R23 Inverter 2018-07-21 13:16:00 2018-07-21 14:25:00 105 Corrective Inverter failed to start following plant trip 1.150000 7 2013-10-20 1736
2 R23 Inverter 2018-07-25 14:20:00 2018-07-25 16:40:00 106 Corrective inverter offline due to high ambient temp fault 2.333333 7 2013-10-20 1740
3 R23 Facility 2018-08-02 01:05:00 2018-08-02 09:28:00 108 Corrective Site trip due to grid disturbance 8.383333 8 2013-10-20 1747
4 R27 Facility 2018-09-14 10:00:00 2018-09-16 16:00:00 1 corrective hurricane florence outages/response. complete ... 6.000000 9 2017-10-21 328
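
The three new columns can be checked by hand for the R23 rows: EventDur matches the elapsed hours between start and end, MonthStart is the calendar month of the start, and AgeStart matches the number of days since COD. The sketch below reproduces rows 0 and 1 in plain Python. `event_stats` is a hypothetical helper, and rounding the age to the nearest day is inferred from the displayed values rather than taken from the pvops source.

```python
from datetime import datetime

# Hypothetical helper (not part of pvops).
def event_stats(start, end, cod):
    """Return (duration in hours, start month, system age in days at event start)."""
    duration_h = (end - start).total_seconds() / 3600.0
    age_days = round((start - cod).total_seconds() / 86400.0)  # assumed rounding rule
    return duration_h, start.month, age_days

cod = datetime(2013, 10, 20)  # COD of site R23 from the metadata

row0 = event_stats(datetime(2018, 7, 21, 4, 45), datetime(2018, 7, 21, 13, 15), cod)
row1 = event_stats(datetime(2018, 7, 21, 13, 16), datetime(2018, 7, 21, 14, 25), cod)

print(row0, row1)
```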

Setting Seaborn figure and font settings (inputs to visualize_counts and visualize_categorical_scatter below)

[40]:
my_figsize = (12,10)
my_fontsize = 20
my_savedpi = 300
fig_sets = {'figsize': my_figsize,
            'fontsize': my_fontsize
            }

Creating scatter-plot of system age at beginning of each event, per site

[41]:
cat_varx = om_col_dict['siteid']
cat_vary= om_col_dict['agedatestart']
sv_nm = 'system_age.png'
myfig = visualize.visualize_categorical_scatter(om_data_update, om_col_dict, cat_varx, cat_vary, fig_sets)
[Figure: scatter plot of system age at the start of each event, per site]

Creating scatter-plot of each event-duration, per site

[42]:
cat_varx = om_col_dict['siteid']
cat_vary= om_col_dict['eventdur']
sv_nm = 'event_dur.png'
myfig = visualize.visualize_categorical_scatter(om_data_update, om_col_dict, cat_varx, cat_vary, fig_sets)
[Figure: scatter plot of event duration, per site]

Count-plot of # of events, per site

[43]:
count_var = om_col_dict['siteid']
sv_nm = 'event_count_per_site.png'
myfig = visualize.visualize_counts(om_data_update, om_col_dict, count_var, fig_sets)
[Figure: count plot of number of events, per site]

Count-plot of # of events, per month

[44]:
count_var = om_col_dict['modatestart']
sv_nm = 'event_count_per_month.png'
myfig = visualize.visualize_counts(om_data_update, om_col_dict, count_var, fig_sets)
[Figure: count plot of number of events, per month]