Basic Usage: eemeter package

This tutorial is also available as a Jupyter notebook.

Note:

Most users of the EEmeter stack do not use the eemeter package directly to load their data. Instead, they use the datastore application, which uses the eemeter package internally. To learn to use the datastore, head over to the datastore basic usage tutorial.

Running a meter

Please download a preformatted input file.

We can load this input file into memory with the following:

In [1]:
import json

with open('meter_input_example.json', 'r') as f:  # modify to point to your downloaded input file.
    meter_input = json.load(f)

The file has a single trace of daily electricity consumption data and some associated project data. Its contents look like this:

In [2]:
!head -15 meter_input_example.json
{
  "type": "SINGLE_TRACE_SIMPLE_PROJECT",
  "trace": {
    "type": "ARBITRARY_START",
    "interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
    "unit": "KWH",
    "trace_id": "TRACE_ID_123",
    "interval": "daily",
    "records": [
      {
        "start": "2011-01-01T00:00:00+00:00",
        "value": 57.8,
        "estimated": false
      },
      {
In [3]:
!tail -25 meter_input_example.json
        "estimated": false
      },
      {
        "start": "2015-01-01T00:00:00+00:00",
        "value": null,
        "estimated": false
      }
    ]
  },
  "project": {
    "type": "PROJECT_WITH_SINGLE_MODELING_PERIOD_GROUP",
    "zipcode": "50321",
    "project_id": "PROJECT_ID_ABC",
    "modeling_period_group": {
      "baseline_period": {
        "start": null,
        "end": "2013-06-01T00:00:00+00:00"
      },
      "reporting_period": {
        "start": "2013-07-01T00:00:00+00:00",
        "end": null
      }
    }
  }
}

Next, we can create a meter, a model, and a formatter. These work in tandem to build a model of energy usage.

The meter coordinates loading the input data, matching it with appropriate weather data, and passing it to the formatter and model. It then uses these to calculate a set of outputs, including energy savings estimates such as annualized weather normalized usage.

The formatter formats the trace and project data for use within the model.

The model fits an energy usage model to this formatted data; given covariate weather data, the fitted model can then predict or model energy usage over an arbitrary period of time.

In [4]:
from eemeter.ee.meter import EnergyEfficiencyMeter
from eemeter.modeling.models import CaltrackMonthlyModel
from eemeter.modeling.formatters import ModelDataFormatter

meter = EnergyEfficiencyMeter()
model = (CaltrackMonthlyModel, {"fit_cdd": False, "grid_search": True})
formatter = (ModelDataFormatter, {"freq_str": "D"})

The meter we created is an instance of the EnergyEfficiencyMeter class, which operates on single energy traces.

The model we created is a tuple of (model class, model keyword arguments), not a model instance. We do it this way to allow easy creation of multiple instances of the model class.

The formatter is, like the model, a tuple of (formatter class, formatter keyword arguments), for the same reason: we want to be able to create multiple instances of the formatter class.
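
Because the model and formatter are passed as (class, keyword arguments) tuples rather than instances, fresh instances can be created on demand. A minimal sketch of how such a tuple can be turned into an instance (illustrative only, not the meter’s actual internals):

model_class, model_kwargs = model
model_instance = model_class(**model_kwargs)               # e.g. CaltrackMonthlyModel(fit_cdd=False, grid_search=True)

formatter_class, formatter_kwargs = formatter
formatter_instance = formatter_class(**formatter_kwargs)   # e.g. ModelDataFormatter(freq_str="D")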

These can be used directly to “evaluate” the meter on the meter input. We’ll store the output in meter_output.

In [5]:
meter_output = meter.evaluate(meter_input, model=model, formatter=formatter)

This meter_output is quite verbose, so we’ll export it to a JSON file, which is a bit easier to read.

In [6]:
with open('meter_output_example.json', 'w') as f:  # change this path if desired.
    json.dump(meter_output, f, indent=2)

The content of this file will look something like this:

In [7]:
!head -40 meter_output_example.json
{
  "status": "SUCCESS",
  "failure_message": null,
  "logs": [
    "Using weather_source ISDWeatherSource(\"725460\")",
    "Using weather_normal_source TMY3WeatherSource(\"725460\")"
  ],
  "eemeter_version": "0.5.3",
  "model_class": "CaltrackMonthlyModel",
  "model_kwargs": {
    "fit_cdd": false,
    "grid_search": true
  },
  "formatter_class": "ModelDataFormatter",
  "formatter_kwargs": {
    "freq_str": "D"
  },
  "weather_source_station": "725460",
  "weather_normal_source_station": "725460",
  "derivatives": [
    {
      "modeling_period_group": [
        "baseline",
        "reporting"
      ],
      "series": "Cumulative baseline model minus reporting model, normal year",
      "description": "Total predicted usage according to the baseline model over the normal weather year, minus the total predicted usage according to the reporting model over the normal weather year. Days for which normal year weather data does not exist are removed.",
      "orderable": [
        null
      ],
      "value": [
        2479.015638036155
      ],
      "variance": [
        7354.084609086982
      ]
    },
    {
      "modeling_period_group": [
        "baseline",

Note how this file is organized: it contains a summary of the operations performed during meter execution, including everything necessary to recreate the meter run, such as the model class and the keyword arguments used to initialize it, along with the weather data (in degrees F, called “demand_fixture”) that was used in model building.
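
Because meter_output is an ordinary Python dictionary, the same fields can also be inspected directly in memory, without writing the file out (field names as shown in the output above):

meter_output["status"]                  # 'SUCCESS'
meter_output["model_class"]             # 'CaltrackMonthlyModel'
meter_output["model_kwargs"]            # {'fit_cdd': False, 'grid_search': True}
meter_output["weather_source_station"]  # '725460'
for message in meter_output["logs"]:
    print(message)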

Not everyone has data ready to go in this format; if that describes you, the next section covers how to get started with data of your own.

Data preparation

All we’ll be doing in this section is creating a data structure with the same format as the meter_input_example.json file above, using the eemeter EnergyTrace helper structure.

Of course, this is not the only way to get data into the necessary format; use this for inspiration, and make changes as necessary to accommodate the particulars of your dataset.

In [8]:
# library imports
from eemeter.structures import EnergyTrace
from eemeter.io.serializers import ArbitraryStartSerializer
from eemeter.ee.meter import EnergyEfficiencyMeter
import pandas as pd
import pytz

First, we load the energy data from the sample CSV and transform it into records:

In [9]:
energy_data = pd.read_csv('sample-energy-data_project-ABC_zipcode-50321.csv',
                          parse_dates=['date'], dtype={'zipcode': str})
records = [{
    "start": pytz.UTC.localize(row.date.to_datetime()),
    "value": row.value,
    "estimated": row.estimated,
} for _, row in energy_data.iterrows()]

The records we created look like this:

In [10]:
records[:3]  # the first three records
Out[10]:
[{'estimated': False,
  'start': datetime.datetime(2011, 1, 1, 0, 0, tzinfo=<UTC>),
  'value': 57.8},
 {'estimated': False,
  'start': datetime.datetime(2011, 1, 2, 0, 0, tzinfo=<UTC>),
  'value': 64.8},
 {'estimated': False,
  'start': datetime.datetime(2011, 1, 3, 0, 0, tzinfo=<UTC>),
  'value': 49.5}]

Next, we load our records into an EnergyTrace. We give it units "KWH" and interpretation "ELECTRICITY_CONSUMPTION_SUPPLIED", which means that this is electricity consumed by the building and supplied by a utility (rather than by solar panels or other on-site generation). We also pass in an instance of the record serializer ArbitraryStartSerializer to show it how to interpret the records.

In [11]:
energy_trace = EnergyTrace(
    records=records,
    unit="KWH",
    interpretation="ELECTRICITY_CONSUMPTION_SUPPLIED",
    serializer=ArbitraryStartSerializer(),
    trace_id='TRACE_ID_123',
    interval='daily'
)

The energy trace data we created looks like this:

In [12]:
energy_trace.data[:3]  # first three records
Out[12]:
                           value  estimated
2011-01-01 00:00:00+00:00   57.8      False
2011-01-02 00:00:00+00:00   64.8      False
2011-01-03 00:00:00+00:00   49.5      False

Now we load the rest of the project data from the sample project data CSV. This CSV includes the project_id (we don’t use it in this tutorial, but this is how you might identify saved meter results), the ZIP code of the building, and the dates on which retrofit work for this project started and completed.

In [13]:
project_data = pd.read_csv('sample-project-data.csv',
                           parse_dates=['retrofit_start_date', 'retrofit_end_date']).iloc[0]

Here’s what our project data looks like.

In [14]:
project_data
Out[14]:
project_id                             ABC
zipcode                              50321
retrofit_start_date    2013-06-01 00:00:00
retrofit_end_date      2013-07-01 00:00:00
Name: 0, dtype: object
In [15]:
zipcode = "{:05d}".format(project_data.zipcode)  # zero-pad the ZIP code, which pandas parsed as an integer
retrofit_start_date = pytz.UTC.localize(project_data.retrofit_start_date)  # make the dates timezone-aware (UTC)
retrofit_end_date = pytz.UTC.localize(project_data.retrofit_end_date)

Here’s an example of how to get this data into the format the meter expects (exactly the format of the meter_input_example.json from above).

In [16]:
from collections import OrderedDict

def serialize_meter_input(trace, zipcode, retrofit_start_date, retrofit_end_date):

    data = OrderedDict([
        ("type", "SINGLE_TRACE_SIMPLE_PROJECT"),
        ("trace", trace_serializer(trace)),
        ("project", project_serializer(zipcode, retrofit_start_date, retrofit_end_date)),
    ])

    return data


def trace_serializer(trace):
    data = OrderedDict([
        ("type", "ARBITRARY_START"),
        ("interpretation", trace.interpretation),
        ("unit", trace.unit),
        ("trace_id", trace.trace_id),
        ("interval", trace.interval),
        ("records", [
            OrderedDict([
                ("start", start.isoformat()),
                ("value", record.value if pd.notnull(record.value) else None),
                ("estimated", bool(record.estimated)),
            ])
            for start, record in trace.data.iterrows()
        ]),
    ])
    return data


def project_serializer(zipcode, retrofit_start_date, retrofit_end_date):
    data = OrderedDict([
        ("type", "PROJECT_WITH_SINGLE_MODELING_PERIOD_GROUP"),
        ("zipcode", zipcode),
        ("project_id", 'PROJECT_ID_ABC'),
        ("modeling_period_group", OrderedDict([
            ("baseline_period", OrderedDict([
                ("start", None),
                ("end", retrofit_start_date.isoformat()),
            ])),
            ("reporting_period", OrderedDict([
                ("start", retrofit_end_date.isoformat()),
                ("end", None),
            ]))
        ]))
    ])
    return data
In [17]:
my_meter_input = serialize_meter_input(
    energy_trace, zipcode, retrofit_start_date, retrofit_end_date)
In [18]:
with open('my_meter_input.json', 'w') as f:
    json.dump(my_meter_input, f, indent=2)
In [19]:
!head -15 my_meter_input.json
{
  "type": "SINGLE_TRACE_SIMPLE_PROJECT",
  "trace": {
    "type": "ARBITRARY_START",
    "interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
    "unit": "KWH",
    "trace_id": "TRACE_ID_123",
    "interval": "daily",
    "records": [
      {
        "start": "2011-01-01T00:00:00+00:00",
        "value": 57.8,
        "estimated": false
      },
      {
In [20]:
!tail -25 my_meter_input.json
        "estimated": false
      },
      {
        "start": "2015-01-01T00:00:00+00:00",
        "value": null,
        "estimated": false
      }
    ]
  },
  "project": {
    "type": "PROJECT_WITH_SINGLE_MODELING_PERIOD_GROUP",
    "zipcode": "50321",
    "project_id": "PROJECT_ID_ABC",
    "modeling_period_group": {
      "baseline_period": {
        "start": null,
        "end": "2013-06-01T00:00:00+00:00"
      },
      "reporting_period": {
        "start": "2013-07-01T00:00:00+00:00",
        "end": null
      }
    }
  }
}

Now we can run this through the meter exactly the same way we did before:

In [21]:
my_meter_output = meter.evaluate(my_meter_input, model=model, formatter=formatter)
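
As before, the result is a dictionary; an optional sanity check using the status and failure_message fields shown earlier confirms that the run succeeded before digging into the results:

if my_meter_output["status"] != "SUCCESS":
    print(my_meter_output["failure_message"])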

Inspecting results

Now that we have some results at our fingertips, let’s inspect them. We’ll be using the meter output from the first example trace.

The output is mostly made up of a set of “derivatives”. These aren’t derivatives in the calculus sense; they are simply values derived from the model output.

Let’s take a look at the first one.

In [22]:
derivative = meter_output["derivatives"][0]

We can take a peek at the contents by looking at the keys of the dict.

In [23]:
[k for k in derivative.keys()]
Out[23]:
['modeling_period_group',
 'series',
 'description',
 'orderable',
 'value',
 'variance']

Each derivative is a series with a name and a description:

In [24]:
derivative['series'], derivative['description']
Out[24]:
('Cumulative baseline model minus reporting model, normal year',
 'Total predicted usage according to the baseline model over the normal weather year, minus the total predicted usage according to the reporting model over the normal weather year. Days for which normal year weather data does not exist are removed.')

The values associated with the derivative are stored in value, their variances are stored in variance, and the orderables act as keys. A single orderable of None indicates (as in this case) that the value and variance are singleton values.

In [25]:
derivative['orderable'], derivative['value'], derivative['variance']
Out[25]:
([None], [2479.015638036155], [7354.0846090869818])

Other derivatives are computed as well:

In [26]:
print(json.dumps([(d['series'], d['description']) for d in sorted(meter_output["derivatives"], key=lambda o: o['series'])], indent=2))
[
  [
    "Baseline model minus observed, reporting period",
    "Predicted usage according to the baseline model minus observed usage over the reporting period."
  ],
  [
    "Baseline model minus reporting model, normal year",
    "Predicted usage according to the baseline model over the normal weather year, minus the predicted usage according to the reporting model over the normal weather year."
  ],
  [
    "Baseline model, baseline period",
    "Predicted usage according to the baseline model over the baseline period."
  ],
  [
    "Baseline model, normal year",
    "Predicted usage according to the baseline model over the normal weather year."
  ],
  [
    "Baseline model, reporting period",
    "Predicted usage according to the baseline model over the reporting period."
  ],
  [
    "Cumulative baseline model minus observed, reporting period",
    "Total predicted usage according to the baseline model minus observed usage over the reporting period. Days for which reporting period weather data or usage do not exist are removed."
  ],
  [
    "Cumulative baseline model minus reporting model, normal year",
    "Total predicted usage according to the baseline model over the normal weather year, minus the total predicted usage according to the reporting model over the normal weather year. Days for which normal year weather data does not exist are removed."
  ],
  [
    "Cumulative baseline model, normal year",
    "Total predicted usage according to the baseline model over the normal weather year. Days for which normal year weather data does not exist are removed."
  ],
  [
    "Cumulative baseline model, reporting period",
    "Total predicted usage according to the baseline model over the reporting period. Days for which reporting period weather data does not exist are removed."
  ],
  [
    "Cumulative observed, baseline period",
    "Total observed usage over the baseline period. Days for which weather data does not exist are NOT removed."
  ],
  [
    "Cumulative observed, reporting period",
    "Total observed usage over the reporting period. Days for which weather data does not exist are NOT removed."
  ],
  [
    "Cumulative reporting model, normal year",
    "Total predicted usage according to the reporting model over the reporting period. Days for which normal year weather data does not exist are removed."
  ],
  [
    "Inclusion mask, baseline period",
    "Mask for baseline period data which is included in model and savings cumulatives."
  ],
  [
    "Inclusion mask, reporting period",
    "Mask for reporting period data which is included in model and savings cumulatives."
  ],
  [
    "Observed, baseline period",
    "Observed usage over the baseline period."
  ],
  [
    "Observed, project period",
    "Observed usage over the project period."
  ],
  [
    "Observed, reporting period",
    "Observed usage over the reporting period."
  ],
  [
    "Reporting model, normal year",
    "Predicted usage according to the reporting model over the reporting period."
  ],
  [
    "Reporting model, reporting period",
    "Predicted usage according to the reporting model over the reporting period."
  ],
  [
    "Temperature, baseline period",
    "Observed temperature (degF) over the baseline period."
  ],
  [
    "Temperature, normal year",
    "Observed temperature (degF) over the normal year."
  ],
  [
    "Temperature, reporting period",
    "Observed temperature (degF) over the reporting period."
  ]
]
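
Since each derivative is a plain dictionary and the series names in this output are unique, one convenient (entirely optional) pattern is to index the list by series name for direct lookup:

derivatives_by_series = {d["series"]: d for d in meter_output["derivatives"]}

savings = derivatives_by_series["Cumulative baseline model minus reporting model, normal year"]
savings["value"][0], savings["variance"][0]  # (2479.015638036155, 7354.0846090869818), as shown above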