timeseries module
timeseries.preprocess module
- pvops.timeseries.preprocess.establish_solar_loc(prod_df, prod_col_dict, meta_df, meta_col_dict)[source]
Adds solar position column using pvLib.
- Parameters:
prod_df (DataFrame) – A data frame corresponding to production data containing a datetime index.
prod_col_dict (dict of {str : str}) – A dictionary that contains the column names associated with the production data, which consist of at least:
siteid (string), should be assigned to site-ID column name in prod_df
meta_df (DataFrame) – A data frame corresponding to site metadata. At the least, the columns in meta_col_dict must be present. The index must contain the site IDs used in prod_df.
meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data
longitude (string), should be assigned to site’s longitude
latitude (string), should be assigned to site’s latitude
- Returns:
Original dataframe (copied) with new timeseries solar position data using
the same column name definitions provided in pvLib.
- pvops.timeseries.preprocess.normalize_production_by_capacity(prod_df, prod_col_dict, meta_df, meta_col_dict)[source]
Normalize power by capacity. This preprocessing step is meant as a step prior to a modeling attempt where a model is trained on multiple sites simultaneously.
- Parameters:
prod_df (DataFrame) – A data frame corresponding to production data.
prod_col_dict (dict of {str : str}) – A dictionary that contains the column names associated with the production data, which consist of at least:
energyprod (string), should be assigned to production data in prod_df
siteid (string), should be assigned to site-ID column name in prod_df
capacity_normalized_power (string), should be assigned to a column name where the normalized output signal will be stored
meta_df (DataFrame) – A data frame corresponding to site metadata. At the least, the columns in meta_col_dict must be present.
meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data
siteid (string), should be assigned to site-ID column name
dcsize (string), should be assigned to column name corresponding to site’s DC size
- Returns:
prod_df (DataFrame) – normalized production data
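The normalization described above can be sketched in plain Python. This is a minimal illustration, not the pvOps implementation: it assumes each production value is divided by the DC size of its site, and the function name, keys, and sample values are all hypothetical.

```python
def normalize_by_capacity(rows, capacities, energy_key="energy",
                          site_key="site", out_key="energy_norm"):
    """Return copies of the rows with production divided by the site's DC size.

    Assumption: `capacities` maps site ID -> DC size, standing in for meta_df.
    """
    normalized = []
    for row in rows:
        dc_size = capacities[row[site_key]]
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row[out_key] = row[energy_key] / dc_size
        normalized.append(new_row)
    return normalized


# Hypothetical data: two sites of very different size, same relative output.
rows = [
    {"site": "A", "energy": 50.0},
    {"site": "B", "energy": 500.0},
]
capacities = {"A": 100.0, "B": 1000.0}
result = normalize_by_capacity(rows, capacities)
```

After normalization both sites sit on the same scale, which is what makes training one model across multiple sites feasible.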
- pvops.timeseries.preprocess.prod_inverter_clipping_filter(prod_df, prod_col_dict, meta_df, meta_col_dict, model, **kwargs)[source]
Filter rows of production data frame according to performance and data quality
- Parameters:
prod_df (DataFrame) – A data frame corresponding to production data.
prod_col_dict (dict of {str : str}) – A dictionary that contains the column names associated with the production data, which consist of at least:
timestamp (string), should be assigned to associated time-stamp column name in prod_df
siteid (string), should be assigned to site-ID column name in prod_df
powerprod (string), should be assigned to associated power production column name in prod_df
meta_df (DataFrame) – A data frame corresponding to site metadata. At the least, the columns in meta_col_dict must be present.
meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data
siteid (string), should be assigned to site-ID column name
latitude (string), should be assigned to column name corresponding to site’s latitude
longitude (string), should be assigned to column name corresponding to site’s longitude
model (str) – A string distinguishing the inverter clipping detection model programmed in pvanalytics. Available options: [‘geometric’, ‘threshold’, ‘levels’]
kwargs – Extra parameters passed to the relevant pvanalytics model. If none passed, defaults are used.
- Returns:
prod_df (DataFrame) – If drop=True, a filtered dataframe with clipping periods removed is returned.
- pvops.timeseries.preprocess.prod_irradiance_filter(prod_df, prod_col_dict, meta_df, meta_col_dict, drop=True, irradiance_type='ghi', csi_max=1.1)[source]
Filter rows of production data frame according to performance and data quality.
THIS METHOD IS CURRENTLY IN DEVELOPMENT.
- Parameters:
prod_df (DataFrame) – A data frame corresponding to production data.
prod_col_dict (dict of {str : str}) – A dictionary that contains the column names associated with the production data, which consist of at least:
timestamp (string), should be assigned to associated time-stamp column name in prod_df
siteid (string), should be assigned to site-ID column name in prod_df
irradiance (string), should be assigned to associated irradiance column name in prod_df
clearsky_irr (string), should be assigned to clearsky irradiance column name in prod_df
meta_df (DataFrame) – A data frame corresponding to site metadata. At the least, the columns in meta_col_dict must be present.
meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data
siteid (string), should be assigned to site-ID column name
latitude (string), should be assigned to column name corresponding to site’s latitude
longitude (string), should be assigned to column name corresponding to site’s longitude
irradiance_type (str) – A string description of the irradiance_type which was passed in prod_df. Options: ghi, dni, dhi. In future, poa may be a feature.
csi_max (float) – A pvanalytics parameter of maximum ratio of measured to clearsky (clearsky index).
- Returns:
prod_df (DataFrame) – A dataframe with new clearsky_irr column. If drop=True, a filtered prod_df according to clearsky.
clearsky_mask (series) – Returns True for each value where the clearsky index is less than or equal to csi_max
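The clearsky index check can be illustrated with a small plain-Python sketch. It only shows the ratio test described above (measured divided by clearsky, compared against csi_max) and is not the pvanalytics routine; the function name, the sample values, and the handling of zero clearsky irradiance are assumptions.

```python
def clearsky_mask(measured, clearsky, csi_max=1.1):
    """Return True where measured / clearsky <= csi_max."""
    mask = []
    for m, c in zip(measured, clearsky):
        if c <= 0:
            # Assumption: the index is undefined without clearsky irradiance
            # (e.g. at night), so such points are marked False here.
            mask.append(False)
        else:
            mask.append(m / c <= csi_max)
    return mask


# Hypothetical irradiance values in W/m^2.
measured = [0.0, 450.0, 900.0, 1200.0]
clearsky = [0.0, 500.0, 850.0, 1000.0]
mask = clearsky_mask(measured, clearsky)
```

The last point has a clearsky index of 1.2, which exceeds the default csi_max of 1.1, so it is the kind of row that filtering with drop=True would remove.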
timeseries models
timeseries.models.linear module
- class pvops.timeseries.models.linear.DefaultModel(time_weighted=None, estimators=None, verbose=0, X_parameters=[])[source]
Bases: Model, TimeWeightedProcess
Generate a simple model using the input data, without any data transposition.
- class pvops.timeseries.models.linear.Model(estimators=None)[source]
Bases: object
Linear model kernel
- class pvops.timeseries.models.linear.PolynomialModel(degree=2, estimators=None, time_weighted=None, verbose=0, X_parameters=[], exclude_params=[])[source]
Bases: Model, TimeWeightedProcess
Add all interactions between terms with a degree.
- class pvops.timeseries.models.linear.TimeWeightedProcess(verbose=0)[source]
Bases: object
Generate time-oriented dummy variables for linear regression. Available timeframes include “month”, “season”, and “hour”.
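As a rough illustration of what a time-oriented dummy variable is, the sketch below one-hot encodes the month of each timestamp. It is a hypothetical stand-alone helper written for this example, not the TimeWeightedProcess implementation, and it assumes one indicator column per month.

```python
from datetime import datetime


def month_dummies(timestamps):
    """One-hot encode the month of each timestamp into 12 indicator columns."""
    rows = []
    for ts in timestamps:
        row = [0] * 12
        row[ts.month - 1] = 1  # column 0 is January, column 11 is December
        rows.append(row)
    return rows


# Hypothetical timestamps: one in January, one in July.
stamps = [datetime(2021, 1, 15, 12), datetime(2021, 7, 4, 9)]
dummies = month_dummies(stamps)
```

Each row contains exactly one 1, so a linear model fitted on these columns can learn a separate component for each month. The "hour" and "season" timeframes would follow the same pattern with a different partition of the timestamps.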
- pvops.timeseries.models.linear.modeller(prod_col_dict, kernel_type='default', time_weighted='month', X_parameters=[], Y_parameter=None, estimators=None, prod_df=None, test_split=0.2, train_df=None, test_df=None, degree=3, exclude_params=[], verbose=0)[source]
Wrapper method to conduct the modelling of the timeseries data.
To input the data, there are two options.
Option 1: include full production data in prod_df parameter and test_split so that the test split is conducted
Option 2: conduct the test-train split prior to calling the function and pass in data under test_df and train_df
- Parameters:
prod_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the production data
siteid (string), should be assigned to site-ID column name in prod_df
timestamp (string), should be assigned to time-stamp column name in prod_df
irradiance (string), should be assigned to irradiance column name in prod_df, where data should be in [W/m^2]
baseline (string), should be assigned to preferred column name to capture model calculations in prod_df
dcsize (string), should be assigned to preferred column name for site capacity in prod_df
powerprod (string), should be assigned to the column name holding the power or energy production. This will be used as the output column if Y_parameter is not passed.
kernel_type (str) – Kernel type for the statistical model
‘default’, establishes a kernel where one component is instantiated in the model for each feature.
‘polynomial’, a paraboloidal polynomial with a dynamic number of covariates (Xs) and degrees (n). For example, with 2 covariates and a degree of 2, the formula would be: Y(α , X) = α_0 + α_1 X_1 + α_2 X_2 + α_3 X_1 X_2 + α_4 X_1^2 + α_5 X_2^2
time_weighted (str or None) – Interval for time-based feature generation. For each interval in this time-weight, a dummy variable is established in the model prior to training. Options include:
if ‘hour’, establish discrete model components for each hour of day
if ‘month’, establish discrete model components for each month
if ‘season’, establish discrete model components for each season
if None, no time-weighted dummy-variable generation is conducted.
X_parameters (list of str) – List of prod_df column names used in the model
Y_parameter (str) – Optional, name of the y column. Defaults to prod_col_dict[‘powerprod’].
estimators (dict) – Optional, dictionary with key as regressor identifier (str) and value as a dictionary with key “estimator” and value the regressor instance following sklearn’s base model convention. For example:
estimators = {'OLS': {'estimator': LinearRegression()}, 'RANSAC': {'estimator': RANSACRegressor()} }
prod_df (DataFrame) – A data frame corresponding to the production data used for model development and evaluation. This data frame needs at least the columns specified in prod_col_dict.
test_split (float) – A value between 0 and 1 indicating the proportion of data used for testing. Only utilized if prod_df is specified. If you want to specify your own test-train splits, pass values to test_df and train_df.
test_df (DataFrame) – A data frame corresponding to the test-split of the production data. Only needed if prod_df and test_split are not specified.
train_df (DataFrame) – A data frame corresponding to the training split of the production data. Only needed if prod_df and test_split are not specified.
degree (int) – Utilized for ‘polynomial’ and ‘polynomial_log’ kernel_type options, this parameter defines the highest degree utilized in the polynomial kernel.
exclude_params (list) – A list of parameter definitions (each defined as a list) to be excluded from the model. For example, to exclude a term in a 4-covariate model that uses degree 1 on the first covariate, degree 2 on the second covariate, and degree 0 on the third and fourth covariates, specify exclude_params as
[ [1,2,0,0] ]
Multiple definitions can be added to the list, depending on how many terms need to be excluded. If a time_weighted parameter is selected, a time-weighted definition must be appended to each exclusion definition. Continuing the example above, to exclude “hour 0” for the same term, exclude_params must be
[ [1,2,0,0,0] ]
where the last 0 represents the time-weighted partition setting.
verbose (int) – Defines the verbosity of the print statements during this function’s execution.
- Returns:
model – a pvops.timeseries.models.linear.Model object. Its estimators attribute allows access to model performance and data splitting information.
train_df – which is the training split of prod_df
test_df – which is the testing split of prod_df
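To make the ‘polynomial’ kernel and exclude_params more concrete, the following sketch enumerates polynomial terms as lists of per-covariate powers. It is not the pvOps code: the function name is hypothetical, and it assumes terms are limited to a total degree of at most degree, which matches the six-term formula given for 2 covariates and a degree of 2.

```python
from itertools import product


def polynomial_terms(n_covariates, degree, exclude_params=()):
    """List power tuples (one power per covariate) with total degree <= degree.

    Assumption: an entry of exclude_params removes the term whose powers
    match it exactly, as in the [1,2,0,0] example.
    """
    excluded = {tuple(p) for p in exclude_params}
    terms = []
    for powers in product(range(degree + 1), repeat=n_covariates):
        if sum(powers) > degree:
            continue  # skip terms above the requested total degree
        if powers in excluded:
            continue  # skip terms the caller asked to leave out
        terms.append(powers)
    return terms


# 2 covariates, degree 2: intercept, X_1, X_2, X_1*X_2, X_1^2, X_2^2.
terms = polynomial_terms(2, 2)

# Excluding the interaction term X_1*X_2 leaves five terms.
without_interaction = polynomial_terms(2, 2, exclude_params=[[1, 1]])
```

Representing each term by its powers is what lets an exclusion be written as a plain list such as [1,2,0,0], with one entry per covariate.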
timeseries.models.AIT module
- pvops.timeseries.models.AIT.AIT_calc(prod_df, prod_col_dict)[source]
Calculates expected energy using measured irradiance based on trained regression model from field data. Plane-of-array irradiance is recommended when using the pre-trained AIT model.
- Parameters:
prod_df (DataFrame) – A data frame corresponding to the production data
prod_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the production data
irradiance (string), should be assigned to irradiance column name in prod_df, where data should be in [W/m^2]
dcsize (string), should be assigned to preferred column name for site capacity in prod_df
energyprod (string), should be assigned to the column name holding the power or energy production. If this is passed, an evaluation will be provided.
baseline (string), should be assigned to preferred column name to capture the calculations in prod_df
Example
production_col_dict = {'irradiance': 'irrad_poa_Wm2',
                       'ambient_temperature': 'temp_amb_C',
                       'dcsize': 'capacity_DC_kW',
                       'energyprod': 'energy_generated_kWh',
                       'baseline': 'predicted'}
data = AIT_calc(data, production_col_dict)
- Returns:
DataFrame – A data frame for production data with a new column, the predicted energy
- class pvops.timeseries.models.AIT.Predictor[source]
Bases: object
Predictor class
- apply_additive_polynomial_model(model_terms, Xs)[source]
Predict energy using a model derived by pvOps.
- Parameters:
df (dataframe) – Data containing columns with the values in the prod_col_dict
model_terms (list of tuples) – Contain model coefficients and powers. For example,
[(0.29359785963294494, [1, 0]), (0.754806343190528, [0, 1]), (0.396833207207238, [1, 1]), (-0.0588375219110795, [0, 0])]
prod_col_dict (dict) – Dictionary mapping nicknamed parameters to the named parameters in the dataframe df.
- Returns:
Array of predicted energy values
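The model_terms format, pairs of a coefficient and a list of powers, suggests an additive polynomial evaluated term by term. The sketch below shows one way such terms could be applied. It is an assumption about how the terms combine, not the Predictor method itself, and the layout of Xs (one tuple of covariate values per sample) and the round coefficients are hypothetical.

```python
import math


def apply_additive_polynomial(model_terms, Xs):
    """Sum coefficient * prod(x_i ** power_i) over all terms, per sample."""
    predictions = []
    for x in Xs:
        total = 0.0
        for coeff, powers in model_terms:
            total += coeff * math.prod(xi ** p for xi, p in zip(x, powers))
        predictions.append(total)
    return predictions


# Hypothetical coefficients chosen for easy arithmetic:
# 2*X_1 + 3*X_2 + 0.5*X_1*X_2 - 1
model_terms = [(2.0, [1, 0]), (3.0, [0, 1]), (0.5, [1, 1]), (-1.0, [0, 0])]
Xs = [(1.0, 2.0), (0.0, 0.0)]
predictions = apply_additive_polynomial(model_terms, Xs)
```

A term with powers [0, 0] multiplies to 1 for every sample, so its coefficient acts as the intercept.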
timeseries.models.iec module
- pvops.timeseries.models.iec.iec_calc(prod_df, prod_col_dict, meta_df, meta_col_dict, gi_ref=1000.0)[source]
Calculates expected energy using measured irradiance based on IEC calculations.
- Parameters:
prod_df (DataFrame) – A data frame corresponding to the production data after having been processed by the perf_om_NA_qc and overlappingDFs functions. This data frame needs at least the columns specified in prod_col_dict.
prod_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the production data
siteid (string), should be assigned to site-ID column name in prod_df
timestamp (string), should be assigned to time-stamp column name in prod_df
irradiance (string), plane-of-array. Should be assigned to irradiance column name in prod_df, where data should be in [W/m^2].
baseline (string), should be assigned to preferred column name to capture IEC calculations in prod_df
dcsize (string), should be assigned to preferred column name for site capacity in prod_df
meta_df (DataFrame) – A data frame corresponding to site metadata. At the least, the columns in meta_col_dict must be present.
meta_col_dict (dict of {str : str}) – A dictionary that contains the column names relevant for the meta-data
siteid (string), should be assigned to site-ID column name
dcsize (string), should be assigned to column name corresponding to site capacity, where data is in [kW]
gi_ref (float) – reference plane of array irradiance in W/m^2 at which a site capacity is determined (default value is 1000 [W/m^2])
- Returns:
DataFrame – A data frame for production data with a new column, iecE, which is the predicted energy calculated based on the IEC standard using measured irradiance data
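The role of gi_ref can be shown with a one-line calculation. This sketch assumes the expected output scales linearly with measured plane-of-array irradiance relative to gi_ref, so a site produces its rated capacity at the reference irradiance. It ignores time-step and unit handling, and the function name and sample values are hypothetical rather than taken from pvOps.

```python
def iec_expected_energy(irradiance, dc_size_kw, gi_ref=1000.0):
    """Scale the site capacity by the ratio of measured to reference irradiance."""
    return dc_size_kw * irradiance / gi_ref


# Hypothetical 250 kW site at 800 W/m^2 plane-of-array irradiance.
expected = iec_expected_energy(800.0, 250.0)

# At the reference irradiance the expected value equals the site capacity.
at_reference = iec_expected_energy(1000.0, 250.0)
```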