timeseries module

timeseries.preprocess module

pvops.timeseries.preprocess.establish_solar_loc(prod_df, prod_col_dict, meta_df, meta_col_dict)[source]

Adds solar position columns using pvlib.

Parameters:

prod_df (DataFrame) – A data frame corresponding to production data containing a datetime index.
prod_col_dict (dict of {str : str}) – A dictionary that contains the column names associated with the production data, which consist of at least:
- siteid (string), should be assigned to site-ID column name in prod_df
meta_df (DataFrame) – A data frame corresponding to site metadata. At the least, the columns in meta_col_dict must be present. The index must contain the site IDs used in prod_df.
meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data
- longitude (string), should be assigned to site’s longitude
- latitude (string), should be assigned to site’s latitude

Returns:

A copy of the original DataFrame with new time-series solar position columns, named using the same column definitions as pvlib.

pvops.timeseries.preprocess.normalize_production_by_capacity(prod_df, prod_col_dict, meta_df, meta_col_dict)[source]

Normalize power by capacity. This preprocessing step is intended to precede modeling in which a single model is trained on multiple sites simultaneously.

Parameters:

prod_df (DataFrame) – A data frame corresponding to production data.
prod_col_dict (dict of {str : str}) – A dictionary that contains the column names associated with the production data, which consist of at least:
- energyprod (string), should be assigned to production data in prod_df
- siteid (string), should be assigned to site-ID column name in prod_df
- capacity_normalized_power (string), should be assigned to a column name where the normalized output signal will be stored
meta_df (DataFrame) – A data frame corresponding to site metadata. At the least, the columns in meta_col_dict must be present.
meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data
- siteid (string), should be assigned to site-ID column name
- dcsize (string), should be assigned to column name corresponding to site’s DC size

Returns:

prod_df (DataFrame) – normalized production data
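
The normalization itself can be sketched in plain Python. This is a minimal illustration, assuming the step divides each production value by its site's DC size; the helper `normalize_by_capacity`, the column names, and the sample values are hypothetical, and lists of dictionaries stand in for the DataFrames so the arithmetic is visible without pandas.

```python
# Hypothetical column names; only the dictionary keys come from the
# parameter descriptions above.
prod_col_dict = {
    "energyprod": "energy_kWh",
    "siteid": "site",
    "capacity_normalized_power": "energy_norm",
}
meta_col_dict = {"siteid": "site", "dcsize": "dc_kW"}

# Lists of dicts stand in for prod_df and meta_df (illustrative values).
prod_rows = [
    {"site": "A", "energy_kWh": 50.0},
    {"site": "A", "energy_kWh": 75.0},
    {"site": "B", "energy_kWh": 200.0},
]
meta_rows = [{"site": "A", "dc_kW": 100.0}, {"site": "B", "dc_kW": 400.0}]


def normalize_by_capacity(prod_rows, prod_col_dict, meta_rows, meta_col_dict):
    """Return copies of prod_rows with production divided by the site's DC size.

    Assumption: normalization is a plain division by capacity.
    """
    # Look up each site's capacity once.
    capacity = {
        m[meta_col_dict["siteid"]]: m[meta_col_dict["dcsize"]] for m in meta_rows
    }
    out = []
    for row in prod_rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        site = row[prod_col_dict["siteid"]]
        new_row[prod_col_dict["capacity_normalized_power"]] = (
            row[prod_col_dict["energyprod"]] / capacity[site]
        )
        out.append(new_row)
    return out


normalized = normalize_by_capacity(prod_rows, prod_col_dict, meta_rows, meta_col_dict)
```

After normalization, sites of different sizes share a common scale, which is what allows one model to be trained across them.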

pvops.timeseries.preprocess.prod_inverter_clipping_filter(prod_df, prod_col_dict, meta_df, meta_col_dict, model, **kwargs)[source]

Filter rows of production data frame according to performance and data quality.

Parameters:

prod_df (DataFrame) – A data frame corresponding to production data.
prod_col_dict (dict of {str : str}) – A dictionary that contains the column names associated with the production data, which consist of at least:
- timestamp (string), should be assigned to associated time-stamp column name in prod_df
- siteid (string), should be assigned to site-ID column name in prod_df
- powerprod (string), should be assigned to associated power production column name in prod_df
meta_df (DataFrame) – A data frame corresponding to site metadata. At the least, the columns in meta_col_dict must be present.
meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data
- siteid (string), should be assigned to site-ID column name
- latitude (string), should be assigned to column name corresponding to site’s latitude
- longitude (string), should be assigned to column name corresponding to site’s longitude
model (str) – A string distinguishing the inverter clipping detection model programmed in pvanalytics. Available options: [‘geometric’, ‘threshold’, ‘levels’]
kwargs – Extra parameters passed to the relevant pvanalytics model. If none passed, defaults are used.

Returns:

prod_df (DataFrame) – If drop=True, a filtered dataframe with clipping periods removed is returned.

pvops.timeseries.preprocess.prod_irradiance_filter(prod_df, prod_col_dict, meta_df, meta_col_dict, drop=True, irradiance_type='ghi', csi_max=1.1)[source]

Filter rows of production data frame according to performance and data quality.

THIS METHOD IS CURRENTLY IN DEVELOPMENT.

Parameters:

prod_df (DataFrame) – A data frame corresponding to production data.
prod_col_dict (dict of {str : str}) – A dictionary that contains the column names associated with the production data, which consist of at least:
- timestamp (string), should be assigned to associated time-stamp column name in prod_df
- siteid (string), should be assigned to site-ID column name in prod_df
- irradiance (string), should be assigned to associated irradiance column name in prod_df
- clearsky_irr (string), should be assigned to clearsky irradiance column name in prod_df
meta_df (DataFrame) – A data frame corresponding to site metadata. At the least, the columns in meta_col_dict must be present.
meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data
- siteid (string), should be assigned to site-ID column name
- latitude (string), should be assigned to column name corresponding to site’s latitude
- longitude (string), should be assigned to column name corresponding to site’s longitude
irradiance_type (str) – The type of irradiance passed in prod_df. Options: ghi, dni, dhi. Support for poa may be added in the future.
csi_max (float) – A pvanalytics parameter giving the maximum allowed ratio of measured to clearsky irradiance (the clearsky index).

Returns:

prod_df (DataFrame) – A dataframe with new clearsky_irr column. If drop=True, a filtered prod_df according to clearsky.
clearsky_mask (series) – Returns True for each value where the clearsky index is less than or equal to csi_max
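
The clearsky-index test described above can be sketched as follows. This is a simplified plain-Python illustration, not the pvanalytics implementation; the helper `clearsky_mask`, the sample irradiance values, and the choice to return False where the clearsky irradiance is zero are all assumptions.

```python
def clearsky_mask(measured, clearsky, csi_max=1.1):
    """Return True where measured / clearsky is at most csi_max.

    Assumption: entries with non-positive clearsky irradiance (e.g. at
    night) have no valid index and are marked False.
    """
    mask = []
    for m, c in zip(measured, clearsky):
        if c <= 0:
            mask.append(False)
        else:
            mask.append(m / c <= csi_max)
    return mask


# Illustrative irradiance values in W/m^2.
measured = [0.0, 450.0, 900.0, 1000.0]
clearsky = [0.0, 500.0, 800.0, 900.0]

mask = clearsky_mask(measured, clearsky)
# Equivalent of drop=True: keep only rows that pass the check.
kept = [m for m, keep in zip(measured, mask) if keep]
```

Readings well above the clearsky estimate usually point to sensor or timestamp problems, which is why they are filtered before modeling.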

timeseries models

timeseries.models.linear module

class pvops.timeseries.models.linear.DefaultModel(time_weighted=None, estimators=None, verbose=0, X_parameters=[])[source]

Bases: Model, TimeWeightedProcess

Generate a simple model using the input data, without any data transposition.

construct(X, y, data_split='train')[source]

class pvops.timeseries.models.linear.Model(estimators=None)[source]

Bases: object

Linear model kernel

predict()[source]: Predict using the model.

train()[source]: Train the model.

class pvops.timeseries.models.linear.PolynomialModel(degree=2, estimators=None, time_weighted=None, verbose=0, X_parameters=[], exclude_params=[])[source]

Bases: Model, TimeWeightedProcess

Add all interaction terms between covariates up to the given degree.

construct(X, y, data_split='train')[source]

class pvops.timeseries.models.linear.TimeWeightedProcess(verbose=0)[source]

Bases: object

Generate time-oriented dummy variables for linear regression. Available timeframes include “month”, “season”, and “hour”.

time_weight(X, time_weighted='season', data_split='train')[source]
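
As a rough picture of what time-oriented dummy variables look like, the sketch below one-hot encodes timestamps by month or hour. It is a plain-Python illustration rather than the class's own code; the helper `time_dummies` and the `LEVELS` table are hypothetical, and "season" is left out because the season boundaries are not stated here.

```python
from datetime import datetime

# Levels for each supported timeframe in this sketch.
LEVELS = {"month": range(1, 13), "hour": range(24)}


def time_dummies(timestamps, time_weighted="month"):
    """One-hot encode each timestamp by month (1-12) or hour (0-23)."""
    if time_weighted not in LEVELS:
        raise ValueError(f"unsupported time_weighted: {time_weighted!r}")
    levels = LEVELS[time_weighted]
    # datetime exposes .month and .hour, so the timeframe name doubles
    # as the attribute name.
    return [
        [1 if getattr(ts, time_weighted) == level else 0 for level in levels]
        for ts in timestamps
    ]


stamps = [datetime(2021, 1, 15, 9), datetime(2021, 7, 4, 13)]
month_rows = time_dummies(stamps, "month")  # 12 columns per row
hour_rows = time_dummies(stamps, "hour")    # 24 columns per row
```

Each row contains a single 1, so a linear regression can fit a separate component for every month or hour.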

pvops.timeseries.models.linear.modeller(prod_col_dict, kernel_type='default', time_weighted='month', X_parameters=[], Y_parameter=None, estimators=None, prod_df=None, test_split=0.2, train_df=None, test_df=None, degree=3, exclude_params=[], verbose=0)[source]

Wrapper method to conduct the modelling of the timeseries data.

To input the data, there are two options.

Option 1: pass the full production data in prod_df along with test_split, and the function performs the train-test split.
Option 2: perform the train-test split before calling the function and pass the data as train_df and test_df.

Parameters:

prod_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the production data
- siteid (string), should be assigned to site-ID column name in prod_df
- timestamp (string), should be assigned to time-stamp column name in prod_df
- irradiance (string), should be assigned to irradiance column name in prod_df, where data should be in [W/m^2]
- baseline (string), should be assigned to preferred column name to capture model calculations in prod_df
- dcsize (string), should be assigned to preferred column name for site capacity in prod_df
- powerprod (string), should be assigned to the column name holding the power or energy production. This will be used as the output column if Y_parameter is not passed.
kernel_type (str) – Kernel type for the statistical model
- ‘default’, establishes a kernel where one component is instantiated in the model for each feature.
- ‘polynomial’, a paraboloidal polynomial with a dynamic number of covariates (Xs) and degrees (n). For example, with 2 covariates and a degree of 2, the formula would be: Y(α , X) = α_0 + α_1 X_1 + α_2 X_2 + α_3 X_1 X_2 + α_4 X_1^2 + α_5 X_2^2
time_weighted (str or None) – Interval for time-based feature generation. For each interval in this time-weight, a dummy variable is established in the model prior to training. Options include:
- if ‘hour’, establish discrete model components for each hour of day
- if ‘month’, establish discrete model components for each month
- if ‘season’, establish discrete model components for each season
- if None, no time-weighted dummy-variable generation is conducted.
X_parameters (list of str) – List of prod_df column names used in the model
Y_parameter (str) – Optional, name of the y column. Defaults to prod_col_dict[‘powerprod’].
estimators (dict) – Optional. A dictionary keyed by regressor identifier (str), where each value is a dictionary with the key “estimator” mapped to a regressor instance that follows scikit-learn’s estimator convention. For example:
```
from sklearn.linear_model import LinearRegression, RANSACRegressor

estimators = {'OLS': {'estimator': LinearRegression()},
              'RANSAC': {'estimator': RANSACRegressor()}
              }
```
prod_df (DataFrame) – A data frame corresponding to the production data used for model development and evaluation. This data frame needs at least the columns specified in prod_col_dict.
test_split (float) – A value between 0 and 1 indicating the proportion of data used for testing. Only utilized if prod_df is specified. If you want to specify your own test-train splits, pass values to test_df and train_df.
test_df (DataFrame) – A data frame corresponding to the test-split of the production data. Only needed if prod_df and test_split are not specified.
train_df (DataFrame) – A data frame corresponding to the train-split of the production data. Only needed if prod_df and test_split are not specified.
degree (int) – Utilized for ‘polynomial’ and ‘polynomial_log’ kernel_type options, this parameter defines the highest degree utilized in the polynomial kernel.
exclude_params (list) – A list of parameter definitions (defined as lists) to be excluded from the model. For example, to exclude a term in a 4-covariate model that uses degree 1 on the first covariate, degree 2 on the second covariate, and degree 0 on the third and fourth covariates, specify exclude_params as [ [1,2,0,0] ]. Multiple definitions can be added to the list, depending on how many terms need to be excluded.

If a time_weighted parameter is selected, a time weighted definition will need to be appended to each exclusion definition. Continuing the example above, if one wants to exclude “hour 0” for the same term, then the exclude_params must be [ [1,2,0,0,0] ], where the last 0 represents the time-weighted partition setting.
verbose (int) – Controls the verbosity of the print statements during this function’s execution.

Returns:

model – a pvops.timeseries.models.linear.Model object. Its estimators attribute gives access to model performance and data-splitting information.
train_df – which is the training split of prod_df
test_df – which is the testing split of prod_df
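
To make the ‘polynomial’ kernel and exclude_params more concrete, the sketch below enumerates the exponent combinations behind the formula given for kernel_type. It assumes the kernel includes every combination whose total degree is between 1 and degree, which matches the two-covariate, degree-2 formula above; the helpers `polynomial_terms` and `evaluate` are hypothetical and not part of pvOps, and the extra time-weighted element of an exclusion definition is not modeled.

```python
from itertools import product
from math import prod


def polynomial_terms(n_covariates, degree, exclude_params=()):
    """List exponent tuples with total degree in 1..degree, minus exclusions."""
    excluded = {tuple(e) for e in exclude_params}
    terms = []
    for powers in product(range(degree + 1), repeat=n_covariates):
        if 1 <= sum(powers) <= degree and powers not in excluded:
            terms.append(powers)
    return terms


def evaluate(alpha_0, coefficients, terms, x):
    """Compute alpha_0 + sum_k alpha_k * prod_i x_i ** powers_k[i]."""
    return alpha_0 + sum(
        a * prod(xi ** p for xi, p in zip(x, powers))
        for a, powers in zip(coefficients, terms)
    )


# Two covariates, degree 2: X_2, X_2^2, X_1, X_1 X_2, X_1^2 (plus intercept).
terms = polynomial_terms(2, 2)

# Four covariates, degree 3, dropping the X_1 * X_2^2 term.
all_terms = polynomial_terms(4, 3)
reduced_terms = polynomial_terms(4, 3, exclude_params=[[1, 2, 0, 0]])

# With every coefficient set to 1 and X = (2, 3): 1 + 3 + 9 + 2 + 6 + 4.
y = evaluate(1.0, [1.0] * len(terms), terms, (2.0, 3.0))
```

Each exclusion definition is simply the exponent list of the term to drop, so its length equals the number of covariates.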

pvops.timeseries.models.linear.predicter(model, df, Y_parameter, X_parameters, prod_col_dict, verbose=0)[source]

timeseries.models.AIT module

class pvops.timeseries.models.AIT.AIT[source]

Bases: Processer, Predictor

predict(prod_df, prod_col_dict)[source]

predict_subset(prod_df, scaler, model_terms, prod_col_dict)[source]

pvops.timeseries.models.AIT.AIT_calc(prod_df, prod_col_dict)[source]

Calculates expected energy using measured irradiance based on trained regression model from field data. Plane-of-array irradiance is recommended when using the pre-trained AIT model.

Parameters:

prod_df (DataFrame) – A data frame corresponding to the production data
prod_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the production data
- irradiance (string), should be assigned to irradiance column name in prod_df, where data should be in [W/m^2]
- dcsize (string), should be assigned to preferred column name for site capacity in prod_df
- energyprod (string), should be assigned to the column name holding the power or energy production. If this is passed, an evaluation will be provided.
- baseline (string), should be assigned to preferred column name to capture the calculations in prod_df

Example

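```
from pvops.timeseries.models.AIT import AIT_calc

# `data` is the production DataFrame (prod_df) described above.
production_col_dict = {'irradiance': 'irrad_poa_Wm2',
                       'ambient_temperature': 'temp_amb_C',
                       'dcsize': 'capacity_DC_kW',
                       'energyprod': 'energy_generated_kWh',
                       'baseline': 'predicted'
                       }
data = AIT_calc(data, production_col_dict)
```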

Returns:

DataFrame – A data frame for production data with a new column containing the predicted energy

class pvops.timeseries.models.AIT.Predictor[source]

Bases: object

Predictor class

apply_additive_polynomial_model(model_terms, Xs)[source]

Predict energy using a model derived by pvOps.

Parameters:

df (dataframe) – Data containing columns with the values in the prod_col_dict

model_terms (list of tuples) – Contains model coefficients and powers. For example,

[(0.29359785963294494, [1, 0]),
(0.754806343190528, [0, 1]),
(0.396833207207238, [1, 1]),
(-0.0588375219110795, [0, 0])]

prod_col_dict (dict) – Dictionary mapping nicknamed parameters to the named parameters in the dataframe df.

Returns:

Array of predicted energy values
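
One way to read the model_terms structure is as a sum of coefficient-weighted products of powers. The sketch below evaluates the example terms on that reading. It is a plain-Python illustration, not the method's own code: the helper `apply_additive_polynomial` is hypothetical, and the layout of Xs as one row of covariate values per prediction is an assumption.

```python
from math import prod

# Example terms from the parameter description: (coefficient, powers).
model_terms = [
    (0.29359785963294494, [1, 0]),
    (0.754806343190528, [0, 1]),
    (0.396833207207238, [1, 1]),
    (-0.0588375219110795, [0, 0]),
]


def apply_additive_polynomial(model_terms, Xs):
    """For each row, sum coefficient * prod(x_i ** power_i) over all terms.

    Assumption: Xs holds one row of covariate values per prediction.
    """
    return [
        sum(
            coef * prod(x ** p for x, p in zip(row, powers))
            for coef, powers in model_terms
        )
        for row in Xs
    ]


# Illustrative inputs: all covariates at 0, then all at 1.
Xs = [[0.0, 0.0], [1.0, 1.0]]
predictions = apply_additive_polynomial(model_terms, Xs)
```

With all covariates at zero only the [0, 0] term survives, so that coefficient acts as the intercept; with all covariates at one the prediction is the sum of the coefficients.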

evaluate(real, pred)[source]

class pvops.timeseries.models.AIT.Processer[source]

Bases: object

check_data(data, prod_col_dict)[source]

timeseries.models.iec module

pvops.timeseries.models.iec.iec_calc(prod_df, prod_col_dict, meta_df, meta_col_dict, gi_ref=1000.0)[source]

Calculates expected energy using measured irradiance based on IEC calculations.

Parameters:

prod_df (DataFrame) – A data frame corresponding to the production data after having been processed by the perf_om_NA_qc and overlappingDFs functions. This data frame needs at least the columns specified in prod_col_dict.
prod_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the production data
- siteid (string), should be assigned to site-ID column name in prod_df
- timestamp (string), should be assigned to time-stamp column name in prod_df
- irradiance (string), plane-of-array. Should be assigned to irradiance column name in prod_df, where data should be in [W/m^2].
- baseline (string), should be assigned to preferred column name to capture IEC calculations in prod_df
- dcsize (string), should be assigned to preferred column name for site capacity in prod_df
meta_df (DataFrame) – A data frame corresponding to site metadata. At the least, the columns in meta_col_dict must be present.
meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data
- siteid (string), should be assigned to site-ID column name
- dcsize (string), should be assigned to column name corresponding to site capacity, where data is in [kW]
gi_ref (float) – reference plane of array irradiance in W/m^2 at which a site capacity is determined (default value is 1000 [W/m^2])

Returns:

DataFrame – A data frame for production data with a new column, iecE, which is the predicted energy calculated based on the IEC standard using measured irradiance data
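
The role of gi_ref can be illustrated with a small calculation. The relation used here is an assumption for illustration, not taken from the pvOps source: expected output is taken to scale linearly with plane-of-array irradiance relative to gi_ref, and each reading is assumed to cover a fixed interval. The helper `expected_energy` and its `interval_hours` argument are hypothetical.

```python
def expected_energy(irradiance_wm2, dcsize_kw, gi_ref=1000.0, interval_hours=1.0):
    """Expected energy in kWh under a linear irradiance-scaling assumption.

    Assumed relation: E = dcsize * (irradiance / gi_ref) * interval_hours.
    """
    return dcsize_kw * (irradiance_wm2 / gi_ref) * interval_hours


# Illustrative hourly plane-of-array readings (W/m^2) for a 100 kW site.
irradiance = [0.0, 250.0, 500.0, 1000.0]
baseline = [expected_energy(g, dcsize_kw=100.0) for g in irradiance]
```

Under this assumption, a site produces its rated capacity when the measured irradiance equals gi_ref.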