timeseries module

timeseries.preprocess module

pvops.timeseries.preprocess.establish_solar_loc(prod_df, prod_col_dict, meta_df, meta_col_dict)

Adds solar position columns computed with pvlib.

Parameters:
  • prod_df (DataFrame) – A data frame corresponding to production data containing a datetime index.

  • prod_col_dict (dict of {str : str}) – A dictionary that contains the column names associated with the production data, which consist of at least:

    • siteid (string), should be assigned to site-ID column name in prod_df

  • meta_df (DataFrame) – A data frame corresponding to site metadata. At a minimum, the columns in meta_col_dict must be present. The index must contain the site IDs used in prod_df.

  • meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data

    • longitude (string), should be assigned to the column name corresponding to the site’s longitude

    • latitude (string), should be assigned to the column name corresponding to the site’s latitude

Returns:

  • Original dataframe (copied) with new time-series solar position data, using the same column names as defined in pvlib.
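establish_solar_loc relies on pvlib for the actual solar position calculation. To give a feel for one of the quantities involved, here is a rough sketch of solar elevation computed from latitude, day of year, and local solar time, using Cooper's declination formula and the hour-angle relation. It is a textbook approximation (no equation-of-time, refraction, or longitude correction) and much less accurate than the solar position algorithms available in pvlib; approx_solar_elevation is a hypothetical helper.

```python
import math


def approx_solar_elevation(latitude_deg, day_of_year, solar_hour):
    """Approximate solar elevation angle in degrees (hypothetical helper).

    Uses Cooper's declination formula and the hour angle derived from
    local *solar* time; ignores the equation of time, atmospheric
    refraction, and longitude corrections.
    """
    # Solar declination in degrees (Cooper's approximation)
    declination = 23.45 * math.sin(math.radians(360.0 / 365.0 * (284 + day_of_year)))
    # Hour angle: 15 degrees per hour away from solar noon
    hour_angle = 15.0 * (solar_hour - 12.0)
    lat, dec, ha = (math.radians(v) for v in (latitude_deg, declination, hour_angle))
    sin_elev = (math.sin(lat) * math.sin(dec)
                + math.cos(lat) * math.cos(dec) * math.cos(ha))
    # Clamp to [-1, 1] to guard against floating-point round-off
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))


# Near the March equinox (day 81), solar noon at 40 degrees latitude
print(round(approx_solar_elevation(40.0, 81, 12.0), 2))  # prints 50.0
```

At the equinox the declination is close to zero, so the noon elevation is simply 90 degrees minus the latitude.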

pvops.timeseries.preprocess.normalize_production_by_capacity(prod_df, prod_col_dict, meta_df, meta_col_dict)

Normalize power by capacity. This preprocessing step is meant as a step prior to a modeling attempt where a model is trained on multiple sites simultaneously.

Parameters:
  • prod_df (DataFrame) – A data frame corresponding to production data.

  • prod_col_dict (dict of {str : str}) – A dictionary that contains the column names associated with the production data, which consist of at least:

    • energyprod (string), should be assigned to the production data column name in prod_df

    • siteid (string), should be assigned to site-ID column name in prod_df

    • capacity_normalized_power (string), should be assigned to a column name where the normalized output signal will be stored

  • meta_df (DataFrame) – A data frame corresponding to site metadata. At a minimum, the columns in meta_col_dict must be present.

  • meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data

    • siteid (string), should be assigned to site-ID column name

    • dcsize (string), should be assigned to column name corresponding to site’s DC size

Returns:

prod_df (DataFrame) – normalized production data
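A minimal sketch of the normalization step, assuming it simply divides each row's production by the DC size of that row's site (looked up from meta_df). Plain Python dicts stand in for the data frames, and normalize_by_capacity and its default keys are hypothetical.

```python
def normalize_by_capacity(rows, site_dcsize, prod_key="energyprod",
                          site_key="siteid", out_key="capacity_normalized_power"):
    """Return copies of `rows` with production divided by the site's DC size.

    rows: list of dicts, one per timestamp/site record (stand-in for prod_df).
    site_dcsize: dict mapping site ID -> DC size (stand-in for meta_df).
    """
    normalized = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is left untouched
        new_row[out_key] = row[prod_key] / site_dcsize[row[site_key]]
        normalized.append(new_row)
    return normalized


rows = [{"siteid": "A", "energyprod": 50.0},
        {"siteid": "B", "energyprod": 150.0}]
print(normalize_by_capacity(rows, {"A": 100.0, "B": 300.0}))
```

Both sites are producing half of their capacity, so they land on the same normalized scale, which is what makes a single multi-site model meaningful.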

pvops.timeseries.preprocess.prod_inverter_clipping_filter(prod_df, prod_col_dict, meta_df, meta_col_dict, model, **kwargs)

Filter rows of the production data frame according to performance and data quality, using an inverter clipping detection model from pvanalytics.

Parameters:
  • prod_df (DataFrame) – A data frame corresponding to production data.

  • prod_col_dict (dict of {str : str}) – A dictionary that contains the column names associated with the production data, which consist of at least:

    • timestamp (string), should be assigned to associated time-stamp column name in prod_df

    • siteid (string), should be assigned to site-ID column name in prod_df

    • powerprod (string), should be assigned to associated power production column name in prod_df

  • meta_df (DataFrame) – A data frame corresponding to site metadata. At a minimum, the columns in meta_col_dict must be present.

  • meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data

    • siteid (string), should be assigned to site-ID column name

    • latitude (string), should be assigned to column name corresponding to site’s latitude

    • longitude (string), should be assigned to column name corresponding to site’s longitude

  • model (str) – A string distinguishing the inverter clipping detection model programmed in pvanalytics. Available options: [‘geometric’, ‘threshold’, ‘levels’]

  • kwargs – Extra parameters passed to the relevant pvanalytics model. If none passed, defaults are used.

Returns:

prod_df (DataFrame) – If drop=True, a filtered dataframe with clipping periods removed is returned.
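The actual detection is done by the pvanalytics model named in model. Purely to illustrate what removing clipping periods means, the sketch below uses a much cruder rule than the ‘geometric’, ‘threshold’, or ‘levels’ models: it flags samples sitting within a small tolerance of the series maximum, where a clipped inverter output plateaus. flag_clipping, drop_clipped, and tolerance are hypothetical.

```python
def flag_clipping(power, tolerance=0.01):
    """Flag samples within `tolerance` (fractional) of the series maximum.

    A crude plateau detector for illustration only; it is not one of the
    pvanalytics clipping models.
    """
    peak = max(power)
    return [p >= (1.0 - tolerance) * peak for p in power]


def drop_clipped(power, mask):
    """Keep only the samples that were not flagged as clipped."""
    return [p for p, clipped in zip(power, mask) if not clipped]


power = [0.0, 40.0, 80.0, 100.0, 100.0, 99.5, 60.0]
mask = flag_clipping(power)
print(mask)
print(drop_clipped(power, mask))
```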

pvops.timeseries.preprocess.prod_irradiance_filter(prod_df, prod_col_dict, meta_df, meta_col_dict, drop=True, irradiance_type='ghi', csi_max=1.1)

Filter rows of the production data frame according to performance and data quality.

THIS METHOD IS CURRENTLY IN DEVELOPMENT.

Parameters:
  • prod_df (DataFrame) – A data frame corresponding to production data.

  • prod_col_dict (dict of {str : str}) – A dictionary that contains the column names associated with the production data, which consist of at least:

    • timestamp (string), should be assigned to associated time-stamp column name in prod_df

    • siteid (string), should be assigned to site-ID column name in prod_df

    • irradiance (string), should be assigned to associated irradiance column name in prod_df

    • clearsky_irr (string), should be assigned to clearsky irradiance column name in prod_df

  • meta_df (DataFrame) – A data frame corresponding to site metadata. At a minimum, the columns in meta_col_dict must be present.

  • meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data

    • siteid (string), should be assigned to site-ID column name

    • latitude (string), should be assigned to column name corresponding to site’s latitude

    • longitude (string), should be assigned to column name corresponding to site’s longitude

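  • drop (bool) – If True, rows that do not pass the clear-sky filter are removed from the returned prod_df.

  • irradiance_type (str) – The type of irradiance stored in prod_df. Options: ‘ghi’, ‘dni’, ‘dhi’. Support for ‘poa’ may be added in the future.

  • csi_max (float) – pvanalytics parameter setting the maximum allowed ratio of measured to clear-sky irradiance (the clear-sky index).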

Returns:

  • prod_df (DataFrame) – A data frame with a new clearsky_irr column. If drop=True, rows are filtered according to the clear-sky index.

  • clearsky_mask (Series) – True for each value where the clear-sky index is less than or equal to csi_max.
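The clearsky_mask definition translates directly into code: divide measured irradiance by clear-sky irradiance and compare the ratio to csi_max. A minimal pure-Python sketch; clearsky_mask here is a hypothetical stand-in, and how pvops or pvanalytics treats samples with zero clear-sky irradiance is an assumption.

```python
def clearsky_mask(irradiance, clearsky_irr, csi_max=1.1):
    """True where the clear-sky index (measured / clear-sky) is <= csi_max.

    Assumption: samples with zero clear-sky irradiance (e.g. night) are
    marked False, since the index is undefined there.
    """
    mask = []
    for measured, clear in zip(irradiance, clearsky_irr):
        if clear <= 0:
            mask.append(False)
        else:
            mask.append(measured / clear <= csi_max)
    return mask


measured = [0.0, 500.0, 850.0, 950.0]
clear = [0.0, 600.0, 800.0, 800.0]
print(clearsky_mask(measured, clear))
```

Here 950 W/m^2 against a clear-sky value of 800 W/m^2 gives an index of about 1.19, above the default csi_max of 1.1, so that sample is rejected.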

timeseries models

timeseries.models.linear module

class pvops.timeseries.models.linear.DefaultModel(time_weighted=None, estimators=None, verbose=0, X_parameters=[])

Bases: Model, TimeWeightedProcess

Generate a simple model using the input data, without any data transposition.

construct(X, y, data_split='train')

class pvops.timeseries.models.linear.Model(estimators=None)

Bases: object

Linear model kernel

predict()

Predict using the model.

train()

Train the model.

class pvops.timeseries.models.linear.PolynomialModel(degree=2, estimators=None, time_weighted=None, verbose=0, X_parameters=[], exclude_params=[])

Bases: Model, TimeWeightedProcess

Add all interaction terms between the covariates, up to the given degree.
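One way to enumerate the terms such a kernel contains is to list every vector of powers [p_1, ..., p_n] whose total degree is at most degree, where the vector stands for X_1^p_1 * ... * X_n^p_n. This assumes degree bounds the total degree of each term, which matches the two-covariate example given for the ‘polynomial’ kernel_type of modeller; polynomial_terms is a hypothetical helper, not the pvops implementation.

```python
from itertools import product


def polynomial_terms(n_covariates, degree):
    """Power vectors [p_1, ..., p_n] with p_1 + ... + p_n <= degree.

    Each vector stands for the term X_1^p_1 * ... * X_n^p_n; the all-zero
    vector is the intercept.
    """
    terms = [list(powers)
             for powers in product(range(degree + 1), repeat=n_covariates)
             if sum(powers) <= degree]
    # Stable sort by total degree, so the intercept comes first
    return sorted(terms, key=sum)


# Two covariates, degree 2: six terms, one per coefficient alpha_0 ... alpha_5
print(polynomial_terms(2, 2))
```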

construct(X, y, data_split='train')

class pvops.timeseries.models.linear.TimeWeightedProcess(verbose=0)

Bases: object

Generate time-oriented dummy variables for linear regression. Available timeframes include “month”, “season”, and “hour”.
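A rough sketch of time-oriented dummy variables: each timestamp is mapped to an hour, month, or season label and one-hot encoded, giving one 0/1 column per interval that a linear regression can weight separately. The season boundaries (Northern Hemisphere meteorological seasons) and the time_dummies helper are assumptions, not taken from pvops.

```python
from datetime import datetime

# Assumption: Northern Hemisphere meteorological seasons
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "fall", 10: "fall", 11: "fall"}


def time_dummies(timestamps, timeframe="season"):
    """One-hot encode timestamps by 'hour', 'month', or 'season'.

    Returns (categories, rows): one 0/1 column per category present.
    """
    if timeframe == "hour":
        labels = [ts.hour for ts in timestamps]
    elif timeframe == "month":
        labels = [ts.month for ts in timestamps]
    elif timeframe == "season":
        labels = [SEASON_OF_MONTH[ts.month] for ts in timestamps]
    else:
        raise ValueError(f"unknown timeframe: {timeframe!r}")
    categories = sorted(set(labels))
    rows = [[1 if label == cat else 0 for cat in categories] for label in labels]
    return categories, rows


stamps = [datetime(2021, 1, 15, 10), datetime(2021, 7, 1, 13),
          datetime(2021, 12, 31, 8)]
print(time_dummies(stamps, "season"))
```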

time_weight(X, time_weighted='season', data_split='train')

pvops.timeseries.models.linear.modeller(prod_col_dict, kernel_type='default', time_weighted='month', X_parameters=[], Y_parameter=None, estimators=None, prod_df=None, test_split=0.2, train_df=None, test_df=None, degree=3, exclude_params=[], verbose=0)

Wrapper method to conduct the modelling of the timeseries data.

To input the data, there are two options.

  • Option 1: pass the full production data as prod_df, together with test_split, so that the test-train split is conducted inside the function

  • Option 2: conduct the test-train split prior to calling the function and pass in data under test_df and train_df
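For Option 2, the split is done before calling modeller. A minimal sketch of a random hold-out split in which a test_split fraction of rows goes to the test set; split_train_test is a hypothetical helper, the random (rather than chronological) split is an assumption, and modeller's own behaviour under Option 1 may differ.

```python
import random


def split_train_test(rows, test_split=0.2, seed=0):
    """Randomly hold out a `test_split` fraction of rows; returns (train, test)."""
    if not 0.0 <= test_split <= 1.0:
        raise ValueError("test_split must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_split)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(list(range(10)), test_split=0.2)
print(len(train), len(test))
```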

Parameters:
  • prod_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the production data

    • siteid (string), should be assigned to site-ID column name in prod_df

    • timestamp (string), should be assigned to time-stamp column name in prod_df

    • irradiance (string), should be assigned to irradiance column name in prod_df, where data should be in [W/m^2]

    • baseline (string), should be assigned to preferred column name to capture model calculations in prod_df

    • dcsize (string), should be assigned to preferred column name for site capacity in prod_df

    • powerprod (string), should be assigned to the column name holding the power or energy production. This will be used as the output column if Y_parameter is not passed.

  • kernel_type (str) – Type of kernel for the statistical model

    • ‘default’, establishes a kernel where one component is instantiated in the model for each feature.

    • ‘polynomial’, a paraboloidal polynomial with a dynamic number of covariates (Xs) and degrees (n). For example, with 2 covariates and a degree of 2, the formula would be: Y(α, X) = α_0 + α_1 X_1 + α_2 X_2 + α_3 X_1 X_2 + α_4 X_1^2 + α_5 X_2^2

  • time_weighted (str or None) – Interval for time-based feature generation. For each interval in this time-weight, a dummy variable is established in the model prior to training. Options include:

    • if ‘hour’, establish discrete model components for each hour of day

    • if ‘month’, establish discrete model components for each month

    • if ‘season’, establish discrete model components for each season

    • if None, no time-weighted dummy-variable generation is conducted.

  • X_parameters (list of str) – List of prod_df column names used in the model

  • Y_parameter (str) – Optional, name of the y column. Defaults to prod_col_dict[‘powerprod’].

  • estimators (dict) – Optional, dictionary with key as regressor identifier (str) and value as a dictionary with key “estimator” and value the regressor instance following scikit-learn’s base estimator convention (see the scikit-learn documentation). For example:

    from sklearn.linear_model import LinearRegression, RANSACRegressor

    estimators = {'OLS': {'estimator': LinearRegression()},
                  'RANSAC': {'estimator': RANSACRegressor()}}
    
  • prod_df (DataFrame) – A data frame corresponding to the production data used for model development and evaluation. This data frame needs at least the columns specified in prod_col_dict.

  • test_split (float) – A value between 0 and 1 indicating the proportion of data used for testing. Only utilized if prod_df is specified. If you want to specify your own test-train splits, pass values to test_df and train_df.

  • test_df (DataFrame) – A data frame corresponding to the test-split of the production data. Only needed if prod_df and test_split are not specified.

  • train_df (DataFrame) – A data frame corresponding to the train-split of the production data. Only needed if prod_df and test_split are not specified.

  • degree (int) – Utilized for ‘polynomial’ and ‘polynomial_log’ kernel_type options, this parameter defines the highest degree utilized in the polynomial kernel.

  • exclude_params (list) – A list of parameter definitions (defined as lists) to be excluded from the model. For example, to exclude the term in a 4-covariate model that uses degree 1 on the first covariate, degree 2 on the second covariate, and degree 0 on the third and fourth covariates, specify exclude_params as [ [1,2,0,0] ]. Multiple definitions can be added to the list, depending on how many terms need to be excluded.

    If a time_weighted parameter is selected, a time weighted definition will need to be appended to each exclusion definition. Continuing the example above, if one wants to exclude “hour 0” for the same term, then the exclude_params must be [ [1,2,0,0,0] ], where the last 0 represents the time-weighted partition setting.

  • verbose (int) – Controls the verbosity of print statements during this function’s execution.
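The exclusion rule can be pictured as removing power vectors from the list of candidate terms. The sketch below assumes candidate terms are all power vectors up to a total degree, and that with time weighting the partition index is appended as the final element (so [1, 2, 0, 0, 0] removes that term for partition 0 only); candidate_terms and apply_exclusions are hypothetical helpers.

```python
from itertools import product


def candidate_terms(n_covariates, degree):
    """All power vectors with total degree <= degree (hypothetical helper)."""
    return [list(p) for p in product(range(degree + 1), repeat=n_covariates)
            if sum(p) <= degree]


def apply_exclusions(terms, exclude_params):
    """Drop every term whose definition appears in exclude_params."""
    excluded = {tuple(definition) for definition in exclude_params}
    return [term for term in terms if tuple(term) not in excluded]


# 4 covariates up to degree 3; exclude the term X_1^1 * X_2^2
terms = candidate_terms(4, 3)
kept = apply_exclusions(terms, [[1, 2, 0, 0]])

# With hourly time weighting, each term is repeated for partitions 0..23
hourly = [term + [hour] for term in terms for hour in range(24)]
kept_hourly = apply_exclusions(hourly, [[1, 2, 0, 0, 0]])
print(len(terms), len(kept), len(hourly), len(kept_hourly))
```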

Returns:

  • model – a pvops.timeseries.models.linear.Model object; its estimators attribute gives access to model performance and data-splitting information.

  • train_df – which is the training split of prod_df

  • test_df – which is the testing split of prod_df

pvops.timeseries.models.linear.predicter(model, df, Y_parameter, X_parameters, prod_col_dict, verbose=0)

timeseries.models.AIT module

class pvops.timeseries.models.AIT.AIT

Bases: Processer, Predictor

predict(prod_df, prod_col_dict)
predict_subset(prod_df, scaler, model_terms, prod_col_dict)

pvops.timeseries.models.AIT.AIT_calc(prod_df, prod_col_dict)

Calculates expected energy using measured irradiance based on trained regression model from field data. Plane-of-array irradiance is recommended when using the pre-trained AIT model.

Parameters:
  • prod_df (DataFrame) – A data frame corresponding to the production data

  • prod_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the production data

    • irradiance (string), should be assigned to irradiance column name in prod_df, where data should be in [W/m^2]

    • dcsize (string), should be assigned to preferred column name for site capacity in prod_df

    • energyprod (string), should be assigned to the column name holding the power or energy production. If this is passed, an evaluation will be provided.

    • baseline (string), should be assigned to preferred column name to capture the calculations in prod_df

Example

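from pvops.timeseries.models.AIT import AIT_calc

# data: a DataFrame that already contains the columns named below
production_col_dict = {'irradiance': 'irrad_poa_Wm2',
                       'ambient_temperature': 'temp_amb_C',
                       'dcsize': 'capacity_DC_kW',
                       'energyprod': 'energy_generated_kWh',
                       'baseline': 'predicted'}
data = AIT_calc(data, production_col_dict)
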
Returns:

DataFrame – A data frame for production data with a new column (named by prod_col_dict[‘baseline’]) containing the predicted energy

class pvops.timeseries.models.AIT.Predictor

Bases: object

Predictor class

apply_additive_polynomial_model(model_terms, Xs)

Predict energy using a model derived by pvOps.

Parameters:
  • df (dataframe) – Data containing columns with the values in the prod_col_dict

  • model_terms (list of tuples) – Contain model coefficients and powers. For example,

    [(0.29359785963294494, [1, 0]),
    (0.754806343190528, [0, 1]),
    (0.396833207207238, [1, 1]),
    (-0.0588375219110795, [0, 0])]
    
  • prod_col_dict (dict) – Dictionary mapping nicknamed parameters to the named parameters in the dataframe df.

Returns:

Array of predicted energy values
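Assuming each (coefficient, powers) tuple denotes the term coefficient * X_1^p_1 * X_2^p_2 * ..., the prediction is the sum of the terms; for the model_terms example above that is 0.2936 X_1 + 0.7548 X_2 + 0.3968 X_1 X_2 - 0.0588. A minimal sketch for a single sample, assuming the covariates are already scaled the way the trained model expects; additive_polynomial is a hypothetical name, not the pvops method.

```python
import math


def additive_polynomial(model_terms, xs):
    """Evaluate sum(coef * x_1**p_1 * ... * x_n**p_n) for one sample.

    model_terms: list of (coefficient, [power per covariate]) tuples.
    xs: covariate values, in the same order as the power lists.
    """
    return sum(coef * math.prod(x ** p for x, p in zip(xs, powers))
               for coef, powers in model_terms)


model_terms = [(0.29359785963294494, [1, 0]),
               (0.754806343190528, [0, 1]),
               (0.396833207207238, [1, 1]),
               (-0.0588375219110795, [0, 0])]
print(additive_polynomial(model_terms, [0.5, 0.8]))
```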

evaluate(real, pred)

class pvops.timeseries.models.AIT.Processer

Bases: object

check_data(data, prod_col_dict)

timeseries.models.iec module

pvops.timeseries.models.iec.iec_calc(prod_df, prod_col_dict, meta_df, meta_col_dict, gi_ref=1000.0)

Calculates expected energy using measured irradiance based on IEC calculations.

Parameters:
  • prod_df (DataFrame) – A data frame corresponding to the production data after having been processed by the perf_om_NA_qc and overlappingDFs functions. This data frame needs at least the columns specified in prod_col_dict.

  • prod_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the production data

    • siteid (string), should be assigned to site-ID column name in prod_df

    • timestamp (string), should be assigned to time-stamp column name in prod_df

    • irradiance (string), plane-of-array. Should be assigned to irradiance column name in prod_df, where data should be in [W/m^2].

    • baseline (string), should be assigned to preferred column name to capture IEC calculations in prod_df

    • dcsize (string), should be assigned to preferred column name for site capacity in prod_df

  • meta_df (DataFrame) – A data frame corresponding to site metadata. At a minimum, the columns in meta_col_dict must be present.

  • meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data

    • siteid (string), should be assigned to site-ID column name

    • dcsize (string), should be assigned to column name corresponding to site capacity, where data is in [kW]

  • gi_ref (float) – Reference plane-of-array irradiance in W/m^2 at which the site capacity is determined (default value is 1000 W/m^2)

Returns:

DataFrame – A data frame for production data with a new column, iecE, which is the predicted energy calculated based on the IEC standard using measured irradiance data
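The basic idea behind an IEC-style expected-energy estimate is to scale the site's DC capacity by the ratio of measured plane-of-array irradiance to gi_ref. A minimal sketch, assuming fixed-length intervals (hourly by default) and no temperature or loss corrections; iec_expected_energy and interval_h are hypothetical, and iec_calc may include additional steps.

```python
def iec_expected_energy(poa_irradiance, dcsize_kw, gi_ref=1000.0, interval_h=1.0):
    """Expected energy [kWh] per interval: DC size * (G_poa / gi_ref) * hours.

    Hypothetical helper: no temperature or loss corrections are applied.
    """
    return [dcsize_kw * (g / gi_ref) * interval_h for g in poa_irradiance]


# 100 kW site, hourly POA irradiance in W/m^2
print(iec_expected_energy([0.0, 500.0, 1000.0], dcsize_kw=100.0))
```

At 500 W/m^2 the site is expected to deliver half its rated capacity over the hour, i.e. 50 kWh.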