Basic Usage
===========

Loading sample data
-------------------

EEMeter comes packages with some simulated sample data.

.. note::

    This data is not to be used for methods testing! It is designed to have
    obvious (but completely unrealistic) behavior to showcase building
    temperature response.

See a list of available sample data files, use :any:`eemeter.samples`::

    >>> eemeter.samples()
    ['il-electricity-cdd-hdd-hourly',
     'il-electricity-cdd-hdd-daily',
     'il-electricity-cdd-hdd-billing_monthly',
     'il-electricity-cdd-hdd-billing_bimonthly',
     'il-electricity-cdd-only-hourly',
     'il-electricity-cdd-only-daily',
     'il-electricity-cdd-only-billing_monthly',
     'il-electricity-cdd-only-billing_bimonthly',
     'il-gas-hdd-only-hourly',
     'il-gas-hdd-only-daily',
     'il-gas-hdd-only-billing_monthly',
     'il-gas-hdd-only-billing_bimonthly',
     'il-gas-intercept-only-hourly',
     'il-gas-intercept-only-daily',
     'il-gas-intercept-only-billing_monthly',
     'il-gas-intercept-only-billing_bimonthly']

Load meter data, temperature data, and metadata, use :any:`eemeter.load_sample`::

    >>> meter_data, temperature_data, metadata = \
    ...     eemeter.load_sample('il-electricity-cdd-hdd-daily')
    >>> meter_data.head()
                               value
    start
    2015-11-22 00:00:00+00:00  32.34
    2015-11-23 00:00:00+00:00  23.80
    2015-11-24 00:00:00+00:00  26.26
    2015-11-25 00:00:00+00:00  21.32
    2015-11-26 00:00:00+00:00   6.70
    >>> temperature_data.head()
    dt
    2015-11-22 06:00:00+00:00    21.01
    2015-11-22 07:00:00+00:00    20.35
    2015-11-22 08:00:00+00:00    19.38
    2015-11-22 09:00:00+00:00    19.02
    2015-11-22 10:00:00+00:00    17.82
    Name: tempF, dtype: float64

The metadata :any:`dict` contains simulated project ground truth, such as roughly
expected disaggregated annual usage, savings, and project dates.

Loading data from CSV
---------------------

Default meter data CSV format::

    start,value
    2015-11-22T00:00:00+00:00,32.34
    2015-11-23T00:00:00+00:00,23.80
    2015-11-24T00:00:00+00:00,26.26
    2015-11-25T00:00:00+00:00,21.32
    2015-11-26T00:00:00+00:00,6.70
    ...

To load meter data from a CSV, use :any:`eemeter.meter_data_from_csv`::

    >>> meter_data = eemeter.meter_data_from_csv(f)  # file handle

The :any:`eemeter.meter_data_from_csv` has lots of configurable options for
data that is formatted differently! Check out the API docs for more info.

Default temperature data CSV format::

    dt,tempF
    2015-11-22T00:00:00+06:00,21.01
    2015-11-22T01:00:00+06:00,20.35
    2015-11-22T02:00:00+06:00,19.38
    2015-11-22T03:00:00+06:00,19.02
    2015-11-22T04:00:00+06:00,17.82
    ...

To load temperature data from a CSV, use :any:`eemeter.temperature_data_from_csv`.
(See also :any:`EEweather <eeweather:index>`)::

    >>> temperature_data = eemeter.temperature_data_from_csv(f)  # file handle

The :any:`eemeter.temperature_data_from_csv` also has lots of configurable
options for data that is formatted differently! Check out the API docs for
more info.

These methods also work with gzipped files (e.g., the sample data)::

    >>> meter_data = eemeter.meter_data_from_csv(f, gzipped=True)

If frequency is known (``'hourly'``, ``'daily'``), this will load that data
with an index of the appropriate frequency. This helps the data formatting
methods do the right thing.

::

    >>> daily_meter_data = eemeter.meter_data_from_csv(f, freq='daily')

Creating design matrix datasets
-------------------------------

To create a design matrix, use one of the following functions:

- :any:`eemeter.create_caltrack_hourly_preliminary_design_matrix`
- :any:`eemeter.create_caltrack_hourly_segmented_design_matrices`
- :any:`eemeter.create_caltrack_daily_design_matrix`
- :any:`eemeter.create_caltrack_billing_design_matrix`

For example::

    >>> meter_data, temperature_data, metadata = \
    ...     eemeter.load_sample('il-electricity-cdd-hdd-daily')
    >>> data = eemeter.create_caltrack_daily_design_matrix(meter_data, temperature_data)


Running Daily and Billing CalTRACK methods
------------------------------------------

.. note::

    For complete compliance with CalTRACK methods, please ensure that input
    data meets requirements in section 2.1 of the CalTRACK methods
    specification and uses settings defined in :doc:`caltrack_compliance`.

To run the CalTRACK daily or billing methods, you need a :any:`pandas.DataFrame` with
the following columns:

- ``meter_value``: Daily average metered usage values for each point.
- ``cdd_<cooling_balance_point>``: Average period daily cooling degree days for
  a particular cooling balance point.
- ``hdd_<heating_balance_point>``: Average period daily heating degree days for
  a particular heating balance point.

For each balance point you want to include in the grid search, you must
provide a separate ``cdd_<>`` or ``hdd_<>`` column.

Armed with a design matrix of the form created above, you can use
:any:`eemeter.fit_caltrack_usage_per_day_model` to fit a model.

You may also wish to filter your data to a baseline period or a reporting
period. To do so, use :any:`eemeter.get_baseline_data` or
:any:`eemeter.get_reporting_data`. For example::

    >>> import datetime
    >>> import pytz
    >>> datetime.datetime(2016, 12, 26, 0, 0, tzinfo=pytz.UTC)
    >>> baseline_data, warnings = eemeter.get_baseline_data(
    ...     data, end=baseline_end_date, max_days=365)
    >>> print(baseline_data.head())
                               meter_value  cdd_70     hdd_60     hdd_61  \
    2015-12-27 00:00:00+00:00        25.55     0.0  18.093333  19.093333   
    2015-12-28 00:00:00+00:00        26.46     0.0  22.478333  23.478333   
    2015-12-29 00:00:00+00:00        30.38     0.0  25.003333  26.003333   
    2015-12-30 00:00:00+00:00        49.82     0.0  29.161667  30.161667   
    2015-12-31 00:00:00+00:00        34.47     0.0  29.572917  30.572917   

                               n_days_dropped  n_days_kept  
    2015-12-27 00:00:00+00:00             0.0          1.0  
    2015-12-28 00:00:00+00:00             0.0          1.0  
    2015-12-29 00:00:00+00:00             0.0          1.0  
    2015-12-30 00:00:00+00:00             0.0          1.0  
    2015-12-31 00:00:00+00:00             0.0          1.0  

CalTRACK Daily Methods
----------------------

Running caltrack daily methods is easy once you have the data in the right
format. This method returns a :any:`eemeter.CalTRACKUsagePerDayModelResults` object::

    >>> model_results = eemeter.fit_caltrack_usage_per_day_model(baseline_data)

This object can be dumped into a JSON string::

    >>> print(json.dumps(model_results.json(), indent=2))

It can be inspected for more detailed information::

    >>> model_results.totals_metrics.r_squared_adj
    0.7294645737524558

Or plotted (use with :any:`eemeter.plot_energy_signature` for an overlay on the
fitted data)::

    >>> model_results.plot()


CalTRACK Billing Methods
------------------------

Running caltrack billing methods::

    >>> model_results = eemeter.fit_caltrack_usage_per_day_model(
    ...     baseline_data, use_billing_presets=True)

It is essential that the data used in the CalTRACK billing methods is
*average daily* period usage (UPDm) and degree day values.

Data with this property is created by default by the
:any:`eemeter.create_caltrack_billing_design_matrix` method.


Using the CLI
-------------

The CLI can be used to run the caltrack methods directly against CSV data. To
allow users without immediate access to data to get started quickly with the
eemeter package, the CLI also allow using sample data that comes with eemeter.

Use CalTRACK methods on sample data::

    $ eemeter caltrack --sample=il-electricity-cdd-hdd-daily
    Loading sample: il-electricity-cdd-hdd-daily
    {
      "status": "SUCCESS",
      "method_name": "caltrack_daily_method",
      "model": {
        "model_type": "cdd_hdd",
        "formula": "meter_value ~ cdd_65 + hdd_55",
        "status": "QUALIFIED",
        "model_params": {
          "intercept": 10.733478866990144,
          "beta_cdd": 2.039525988684711,
          "beta_hdd": 1.0665644257451434,
          "cooling_balance_point": 65,
          "heating_balance_point": 55
        },
        "r_squared_adj": 0.7810065909435654,
        "warnings": []
      },
      "r_squared_adj": 0.7810065909435654,
      "warnings": [],
      "metadata": {},
      "settings": {
        "fit_cdd": true,
        "minimum_non_zero_cdd": 10,
        "minimum_non_zero_hdd": 10,
        "minimum_total_cdd": 20,
        "minimum_total_hdd": 20,
        "beta_cdd_maximum_p_value": 0.1,
        "beta_hdd_maximum_p_value": 0.1
      }
    }

Save output::

    $ eemeter caltrack --sample=il-electricity-cdd-only-billing_monthly \
    --output-file=/path/to/output.json
    Loading sample: il-electricity-cdd-only-billing_monthly
    Output written: /path/to/output.json

Load custom data (see sample files for example format)::

    $ eemeter caltrack --meter-file=/path/to/meter/data.csv \
    --temperature-file=/path/to/temperature/data.csv

Do not fit CDD models (intended for gas data)::

    $ eemeter caltrack --sample=il-gas-hdd-only-billing_monthly --no-fit-cdd

To include all candidate models in output::

    $ eemeter caltrack --sample=il-electricity-cdd-hdd-daily --show-candidates


Understanding eemeter warnings
------------------------------

The eemeter package tries to give warnings whenever a result is less than
perfect. Warnings appear throughout the eemeter and are given in the structure
:any:`eemeter.EEMeterWarning`.

Each warning has the following structure:

1. A 'dotted' hierarchical name (:any:`eemeter.EEMeterWarning.qualified_name`)
   summarizing its origin and nature. For example:
   ``'eemeter.caltrack_method.no_candidate_models'``
2. A full description of the error in prose
   (:any:`eemeter.EEMeterWarning.description`).
3. A set of relevant data about the error, such as limits that were passed (
   :any:`eemeter.EEMeterWarning.data`).


Visualization
-------------

Plotting results and models.

Plot an energy signature (:any:`eemeter.plot_energy_signature`)::

    >>> eemeter.plot_energy_signature(meter_data, temperature_data)

.. image:: _static/plot_energy_signature.png

Plot a time series of meter data and temperature data
(:any:`eemeter.plot_time_series`)::

    >>> eemeter.plot_time_series(meter_data, temperature_data)

.. image:: _static/plot_time_series.png

Plot the selected model and all candidate models (:any:`eemeter.CalTRACKUsagePerDayModelResults.plot`)
on top of an energy signature::

    >>> ax = eemeter.plot_energy_signature(meter_data, temperature_data)
    >>> model_results.plot(ax=ax, with_candidates=True)

.. image:: _static/plot_model_results.png

Plot a single candidate model (:any:`eemeter.CalTRACKUsagePerDayCandidateModel.plot`)::

    >>> model_results.model.plot()

.. image:: _static/plot_candidate.png

The plot functions are flexible and can take quite a few parameters, including
custom titles, labels, Matplotlib Axes, and color options.

Obtaining weather data
----------------------

Weather data can be obtained using the :any:`EEweather <eeweather:index>` package.

Definitely check out the full docs, but here's a taste of what that's like.

Installation::

    $ pip install eeweather

Usage::

    >>> import eeweather
    >>> result = eeweather.match_lat_long(35, -95)
    >>> result
    ISDStationMapping('722178')
    >>> result.distance_meters
    34672
    >>> station = result.isd_station
    >>> station
    ISDStation('722178')
    >>> import datetime
    >>> import pytz
    >>> start_date = datetime.datetime(2016, 6, 1, tzinfo=pytz.UTC)
    >>> end_date = datetime.datetime(2017, 9, 15, tzinfo=pytz.UTC)
    >>> tempC = station.load_isd_hourly_temp_data(start_date, end_date)
    >>> tempC.head()
    2016-06-01 00:00:00+00:00    28.291500
    2016-06-01 01:00:00+00:00    27.438500
    2016-06-01 02:00:00+00:00    27.197083
    2016-06-01 03:00:00+00:00    26.898750
    2016-06-01 04:00:00+00:00    26.701810
    Freq: H, dtype: float64
    >>> tempF = tempC * 1.8 + 32
    >>> tempF.head()
    2016-06-01 00:00:00+00:00    82.924700
    2016-06-01 01:00:00+00:00    81.389300
    2016-06-01 02:00:00+00:00    80.954750
    2016-06-01 03:00:00+00:00    80.417750
    2016-06-01 04:00:00+00:00    80.063259

.. _anaconda:

Using with Anaconda
-------------------

Some users find that the easiest way to get a working Python distributions
is to use Anaconda (hello Windows users!). Anaconda is a free distribution of
Python that comes with all of the dependencies of eemeter and a host of other
useful scientific packages.

If that sounds appealing to you, please follow the
`installation instructions <https://www.anaconda.com/download/>`_
that Anaconda provides, then come back here armed with a shiny new python
distribution and install eemeter at an anaconda shell with
`$ pip install eemeter`.