eemeter.modeling.formatters

The formatter classes are designed to provide a standard interface to model fit and predict methods. The formatters add weather data to daily or monthly energy data. The interface assumes that the model class will be responsible for applying data sufficiency rules and additional formatting necessary for performing model fits or predictions.

class eemeter.modeling.formatters.ModelDataFormatter(freq_str)[source]

Formatter for model data of known or predictable frequency. Basic usage:

>>> formatter = ModelDataFormatter("D")
>>> formatter.create_input(energy_trace, weather_source)
                           energy tempF
2013-06-01 00:00:00+00:00    3.10  74.3
2013-06-02 00:00:00+00:00    2.42  71.0
2013-06-03 00:00:00+00:00    1.38  73.1
                                   ...
2016-05-27 00:00:00+00:00    0.11  71.1
2016-05-28 00:00:00+00:00    0.04  78.1
2016-05-29 00:00:00+00:00    0.21  69.6
>>> index = pd.date_range('2013-01-01', periods=365, freq='D')
>>> formatter.create_demand_fixture(index, weather_source)
                           tempF
2013-01-01 00:00:00+00:00   28.3
2013-01-02 00:00:00+00:00   31.0
2013-01-03 00:00:00+00:00   34.1
                            ...
2013-12-29 00:00:00+00:00   12.3
2013-12-30 00:00:00+00:00   26.0
2013-12-31 00:00:00+00:00   24.1

create_demand_fixture(index, weather_source)[source]

Creates a DatetimeIndex-ed DataFrame containing formatted demand fixture data.

Parameters:
  • index (pandas.DatetimeIndex) – The desired index for demand fixture data.
  • weather_source (eemeter.weather.WeatherSourceBase) – The source of weather fixture data.
Returns:

input_df – Predictably formatted input data. This data should be directly usable as input to applicable model.predict() methods.

Return type:

pandas.DataFrame

create_input(trace, weather_source)[source]

Creates a DatetimeIndex-ed DataFrame containing formatted model input data.

Parameters:
  • trace (eemeter.structures.EnergyTrace) – The source of energy data for inclusion in model input.
  • weather_source (eemeter.weather.WeatherSourceBase) – The source of weather data.
Returns:

input_df – Predictably formatted input data. This data should be directly usable as input to applicable model.fit() methods.

Return type:

pandas.DataFrame

daily_trace_data(trace)[source]

Transforms a trace for this formatter to a daily series

get_input_data_mask(input_data)[source]

Returns a boolean list marking missing values: True means missing, False means not missing.
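The mask convention can be illustrated with plain pandas. This is only a sketch, not the library's implementation: the frame `input_data` is made up, and the choice to inspect the `energy` column is an assumption, since the documentation does not say which columns the method checks.

```python
import numpy as np
import pandas as pd

# Hypothetical input frame shaped like the create_input() output shown earlier.
index = pd.date_range("2013-06-01", periods=4, freq="D", tz="UTC")
input_data = pd.DataFrame(
    {"energy": [3.10, np.nan, 1.38, 0.11], "tempF": [74.3, 71.0, np.nan, 71.1]},
    index=index,
)

# True => missing, False => not missing (assumption: only the energy column is checked).
mask = input_data["energy"].isnull()
print(mask.tolist())
```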

hourly_trace_data(trace)[source]

Transforms a trace for this formatter to an hourly series

serialize_demand_fixture(demand_fixture_data)[source]

Serialize demand fixture data

serialize_input(input_data)[source]

Serialize input data

class eemeter.modeling.formatters.ModelDataBillingFormatter[source]

Formatter for model data of unknown or unpredictable frequency. Basic usage:

>>> formatter = ModelDataBillingFormatter()
>>> energy_trace = EnergyTrace(
        "ELECTRICITY_CONSUMPTION_SUPPLIED",
        pd.DataFrame(
            {
                "value": [1, 1, 1, 1, np.nan],
                "estimated": [False, False, True, False, False]
            },
            index=[
                datetime(2011, 1, 1, tzinfo=pytz.UTC),
                datetime(2011, 2, 1, tzinfo=pytz.UTC),
                datetime(2011, 3, 2, tzinfo=pytz.UTC),
                datetime(2011, 4, 3, tzinfo=pytz.UTC),
                datetime(2011, 4, 29, tzinfo=pytz.UTC),
            ],
            columns=["value", "estimated"]
        ),
        unit="KWH")
>>> trace_data, temp_data = formatter.create_input(energy_trace, weather_source)
>>> trace_data
2011-01-01 00:00:00+00:00    1.0
2011-02-01 00:00:00+00:00    1.0
2011-03-02 00:00:00+00:00    2.0
2011-04-29 00:00:00+00:00    NaN
dtype: float64
>>> temp_data
period                    hourly
2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00  32.0
                          2011-01-01 01:00:00+00:00  32.0
                          2011-01-01 02:00:00+00:00  32.0
...                                                   ...
2011-03-02 00:00:00+00:00 2011-04-28 21:00:00+00:00  32.0
                          2011-04-28 22:00:00+00:00  32.0
                          2011-04-28 23:00:00+00:00  32.0
>>> index = pd.date_range('2013-01-01', periods=365, freq='D')
>>> formatter.create_demand_fixture(index, weather_source)
                           tempF
2013-01-01 00:00:00+00:00   28.3
2013-01-02 00:00:00+00:00   31.0
2013-01-03 00:00:00+00:00   34.1
                            ...
2013-12-29 00:00:00+00:00   12.3
2013-12-30 00:00:00+00:00   26.0
2013-12-31 00:00:00+00:00   24.1

create_demand_fixture(index, weather_source)[source]

Creates a DatetimeIndex-ed DataFrame containing formatted demand fixture data.

Parameters:
  • index (pandas.DatetimeIndex) – The desired index for demand fixture data.
  • weather_source (eemeter.weather.WeatherSourceBase) – The source of weather fixture data.
Returns:

input_df – Predictably formatted input data. This data should be directly usable as input to applicable model.predict() methods.

Return type:

pandas.DataFrame

create_input(trace, weather_source)[source]

Creates two DatetimeIndex-ed DataFrames containing formatted model input data.

Parameters:
  • trace (eemeter.structures.EnergyTrace) – The source of energy data for inclusion in model input.
  • weather_source (eemeter.weather.WeatherSourceBase) – The source of weather data.
Returns:

  • trace_data (pandas.DataFrame) – Predictably formatted trace data with estimated data removed. This data should be directly usable as input to applicable model.fit() methods.

  • temperature_data (pandas.DataFrame) – Predictably formatted temperature data with a pandas MultiIndex. The MultiIndex contains two levels - ‘period’, which corresponds directly to the trace_data index, and ‘hourly’ or ‘daily’, which contains, respectively, hourly or daily temperature data. This is intended for use like the following:

    >>> temperature_data.groupby(level='period')
    

    This data should be directly usable as input to applicable model.fit() methods.
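To make the `groupby(level='period')` usage concrete, here is a small sketch with hand-built data. The values and the `tempF` column name are assumptions for illustration; only the level names `period` and `daily` come from the description above.

```python
import pandas as pd

# Hypothetical temperature data: two billing periods with two daily readings each.
periods = pd.to_datetime(
    ["2011-01-01", "2011-01-01", "2011-02-01", "2011-02-01"], utc=True
)
days = pd.to_datetime(
    ["2011-01-01", "2011-01-02", "2011-02-01", "2011-02-02"], utc=True
)
temperature_data = pd.DataFrame(
    {"tempF": [30.0, 34.0, 40.0, 50.0]},
    index=pd.MultiIndex.from_arrays([periods, days], names=["period", "daily"]),
)

# One aggregate per billing period: the mean temperature within that period.
period_means = temperature_data.groupby(level="period")["tempF"].mean()
print(period_means.tolist())
```

Because the `period` level matches the trace_data index, a per-period aggregate like this one lines up row for row with the energy values.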

daily_trace_data(trace)[source]

Transforms a trace for this formatter to a daily series

get_input_data_mask(input_data)[source]

Returns a boolean list marking missing values: True means missing, False means not missing.

hourly_trace_data(trace)[source]

Transforms a trace for this formatter to an hourly series

eemeter.modeling.models

class eemeter.modeling.models.seasonal.SeasonalElasticNetCVModel(cooling_base_temp=65, heating_base_temp=65, n_bootstrap=100, modeling_period_interpretation='baseline')[source]

Linear regression using daily frequency data to build a model of formatted energy trace data that takes into account HDD, CDD, day of week, month, and holiday effects, with elastic net regularization.

Parameters:
  • cooling_base_temp (float) – Base temperature (degrees F) used in calculating cooling degree days.
  • heating_base_temp (float) – Base temperature (degrees F) used in calculating heating degree days.
  • n_bootstrap (int) – Number of points to exclude during bootstrap error estimation.
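For readers unfamiliar with degree days, the sketch below shows the conventional definition applied to daily average temperatures. The helper `degree_days` is hypothetical and may differ from how the library derives HDD and CDD internally (for example, from hourly temperatures).

```python
def degree_days(daily_temps_f, heating_base_temp=65.0, cooling_base_temp=65.0):
    """Return (hdd, cdd) lists for daily average temperatures in degrees F."""
    # Heating degree days: how far the day fell below the heating base temperature.
    hdd = [max(heating_base_temp - t, 0.0) for t in daily_temps_f]
    # Cooling degree days: how far the day rose above the cooling base temperature.
    cdd = [max(t - cooling_base_temp, 0.0) for t in daily_temps_f]
    return hdd, cdd


hdd, cdd = degree_days([55.0, 65.0, 80.0])
print(hdd)  # [10.0, 0.0, 0.0]
print(cdd)  # [0.0, 0.0, 15.0]
```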
class eemeter.modeling.models.billing.BillingElasticNetCVModel(cooling_base_temp=65, heating_base_temp=65, n_bootstrap=100, modeling_period_interpretation='baseline')[source]

Linear regression of energy values against CDD/HDD with elastic net regularization.

Parameters:
  • cooling_base_temp (float) – Base temperature (degrees F) used in calculating cooling degree days.
  • heating_base_temp (float) – Base temperature (degrees F) used in calculating heating degree days.
  • n_bootstrap (int) – Number of points to exclude during bootstrap error estimation.

class eemeter.modeling.models.caltrack.CaltrackMonthlyModel(fit_cdd=True, grid_search=False, min_contiguous_baseline_months=12, min_contiguous_reporting_months=12, modeling_period_interpretation='baseline', weighted=False, **kwargs)[source]

This class implements the two-stage modeling routine agreed upon as part of the Caltrack beta test.

If fit_cdd is True, then all four candidate models (HDD+CDD, CDD-only, HDD-only, and Intercept-only) are used in stage 1 estimation. If it is False, then only the HDD-only and Intercept-only models are used.

If grid_search is set to True, the balance point temperatures are determined by maximizing R^2 across the range 50-85 degF. Otherwise, 70 and 60 degF are used for cooling and heating, respectively.
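The grid search idea can be sketched as follows for the HDD-only case. This is an illustrative simplification, not the class's code: the helper names are hypothetical, the 1 degF step is an assumption (only the 50-85 degF range is stated above), and a plain one-variable least-squares fit stands in for whatever estimator the model actually uses.

```python
def r_squared(x, y):
    """R^2 of a one-variable least-squares fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series explains no variance
    return (sxy ** 2) / (sxx * syy)


def best_heating_balance_point(temps_f, usage, candidates=range(50, 86)):
    """Pick the candidate base temperature whose HDD series gives the highest R^2."""
    best_bp, best_r2 = None, -1.0
    for bp in candidates:
        hdd = [max(bp - t, 0.0) for t in temps_f]
        r2 = r_squared(hdd, usage)
        if r2 > best_r2:
            best_bp, best_r2 = bp, r2
    return best_bp, best_r2


# Synthetic data generated with a true heating balance point of 60 degF.
temps = [30.0, 40.0, 50.0, 55.0, 62.0, 70.0, 80.0]
usage = [10.0 + 0.5 * max(60.0 - t, 0.0) for t in temps]
bp, r2 = best_heating_balance_point(temps, usage)
print(bp, round(r2, 6))
```

On this synthetic series the search recovers the balance point used to generate the data, since only that base temperature makes usage exactly linear in HDD.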

min_contiguous_baseline_months and min_contiguous_reporting_months set the number of contiguous months of data required at the end of the baseline period and at the beginning of the reporting period, respectively, in order for the weather normalization to be valid.

billing_to_monthly_avg(trace_and_temp)[source]

Helper function to handle monthly billing or other irregular data.

daily_to_monthly_avg(df)[source]

Convert from daily usage and temperature to monthly usage per day and average HDD/CDD.
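A rough sketch of this kind of conversion, using plain pandas, is shown below. It is not the method's implementation: the column names `energy`, `tempF`, `hdd`, and `cdd` are assumptions, and the base temperatures are the 60 degF (heating) and 70 degF (cooling) defaults mentioned above for the non-grid-search case.

```python
import pandas as pd

# Hypothetical daily frame of usage and temperature spanning two months.
index = pd.date_range("2013-01-30", periods=4, freq="D")
df = pd.DataFrame(
    {"energy": [10.0, 12.0, 6.0, 8.0], "tempF": [40.0, 50.0, 70.0, 75.0]},
    index=index,
)

# Daily degree days with base temperatures of 60 (heating) and 70 (cooling) degF.
df["hdd"] = (60.0 - df["tempF"]).clip(lower=0.0)
df["cdd"] = (df["tempF"] - 70.0).clip(lower=0.0)

# Monthly mean of each column: usage per day and average daily HDD/CDD.
monthly = df.groupby([df.index.year, df.index.month])[["energy", "hdd", "cdd"]].mean()
print(monthly)
```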

predict(demand_fixture_data, params=None, summed=True)[source]

Predicts across index using fitted model params

Parameters:
  • demand_fixture_data (pandas.DataFrame) – Formatted input data as returned by CaltrackFormatter.create_demand_fixture()
  • params (dict, default None) –

    Parameters found during model fit. If None, .fit() must be called before this method can be used.

    • X_design_matrix: patsy design matrix used in formatting design matrix.
    • formula: patsy formula used in creating design matrix.
    • coefficients: ElasticNetCV coefficients.
    • intercept: ElasticNetCV intercept.
Returns:

output – Dataframe of energy values as given by the fitted model across the index given in demand_fixture_data.

Return type:

pandas.DataFrame