Skip to content

Utilities

opendsm.eemeter.utilities.io

A module for assiting with input/output operations.

Copyright 2014-2025 OpenDSM contributors

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

meter_data_from_csv(filepath_or_buffer, tz=None, start_col='start', value_col='value', gzipped=False, freq=None, **kwargs)

Load meter data from a CSV file and convert to a dataframe.

This is an example of the default csv structure assumed.
start,value
2017-01-01T00:00:00+00:00,0.31
2017-01-02T00:00:00+00:00,0.4
2017-01-03T00:00:00+00:00,0.58

Parameters:

Name Type Description Default
filepath_or_buffer str | FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str]

File path or object.

required
tz str | tzinfo | None

Timezone represented in the meter data. Ex: UTC or US/Pacific

None
start_col str

Date period start column.

'start'
value_col str

Value column, can be in any unit.

'value'
gzipped bool

Whether file is gzipped.

False
freq str | None

If given, apply frequency to data using pandas.DataFrame.resample. One of ['hourly', 'daily'].

None
**kwargs

Extra keyword arguments to pass to pandas.read_csv, such as sep='|'.

{}
Source code in opendsm/eemeter/utilities/io.py
def meter_data_from_csv(
    filepath_or_buffer: str | FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str],
    tz: str | datetime.tzinfo | None = None,
    start_col: str = "start",
    value_col: str = "value",
    gzipped: bool = False,
    freq: str | None = None,
    **kwargs,
) -> pd.DataFrame:
    """Load meter data from a CSV file and convert to a dataframe.

    Note: This is an example of the default csv structure assumed.
        ```python
        start,value
        2017-01-01T00:00:00+00:00,0.31
        2017-01-02T00:00:00+00:00,0.4
        2017-01-03T00:00:00+00:00,0.58
        ```

    Args:
        filepath_or_buffer: File path or object.
        tz: Timezone represented in the meter data. Ex: `UTC` or `US/Pacific`
        start_col: Date period start column.
        value_col: Value column, can be in any unit.
        gzipped: Whether file is gzipped.
        freq: If given, apply frequency to data using `pandas.DataFrame.resample`. One of `['hourly', 'daily']`.
        **kwargs: Extra keyword arguments to pass to `pandas.read_csv`, such as `sep='|'`.
    """

    read_csv_kwargs = {
        "usecols": [start_col, value_col],
        "dtype": {value_col: np.float64},
        "parse_dates": [start_col],
        "index_col": start_col,
    }

    if gzipped:
        read_csv_kwargs.update({"compression": "gzip"})

    # allow passing extra kwargs
    read_csv_kwargs.update(kwargs)

    df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs)
    df.index = pd.to_datetime(df.index, utc=True)

    # for pandas<0.24, which doesn't localize even with utc=True
    if df.index.tz is None:
        df.index = df.index.tz_localize("UTC")  # pragma: no cover

    if tz is not None:
        df = df.tz_convert(tz)

    if freq == "hourly":
        df = df.resample("h").sum(min_count=1)
    elif freq == "daily":
        df = df.resample("D").sum(min_count=1)

    return df

temperature_data_from_csv(filepath_or_buffer, tz=None, date_col='dt', temp_col='tempF', gzipped=False, freq=None, **kwargs)

Load meter data from a CSV file and convert to a dataframe. Farenheit is assumed for building models.

This is an example of the default csv structure assumed.
dt,tempF
2017-01-01T00:00:00+00:00,21
2017-01-01T01:00:00+00:00,22.5
2017-01-01T02:00:00+00:00,23.5

Parameters:

Name Type Description Default
filepath_or_buffer str | FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str]

File path or object.

required
tz str | tzinfo | None

Timezone represented in the meter data. Ex: UTC or US/Pacific

None
date_col str

Date period start column.

'dt'
temp_col str

Temperature column.

'tempF'
gzipped bool

Whether file is gzipped.

False
freq str | None

If given, apply frequency to data using pandas.DataFrame.resample. One of ['hourly', 'daily'].

None
**kwargs

Extra keyword arguments to pass to pandas.read_csv, such as sep='|'.

{}
Source code in opendsm/eemeter/utilities/io.py
def temperature_data_from_csv(
    filepath_or_buffer: str | FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str],
    tz: str | datetime.tzinfo | None = None,
    date_col: str = "dt",
    temp_col: str = "tempF",
    gzipped: bool = False,
    freq: str | None = None,
    **kwargs,
):
    """Load meter data from a CSV file and convert to a dataframe. Farenheit is assumed for building models.

    Note: This is an example of the default csv structure assumed.
        ```python
        dt,tempF
        2017-01-01T00:00:00+00:00,21
        2017-01-01T01:00:00+00:00,22.5
        2017-01-01T02:00:00+00:00,23.5
        ```

    Args:
        filepath_or_buffer: File path or object.
        tz: Timezone represented in the meter data. Ex: `UTC` or `US/Pacific`
        date_col: Date period start column.
        temp_col: Temperature column.
        gzipped: Whether file is gzipped.
        freq: If given, apply frequency to data using `pandas.DataFrame.resample`. One of `['hourly', 'daily']`.
        **kwargs: Extra keyword arguments to pass to `pandas.read_csv`, such as `sep='|'`.
    """
    read_csv_kwargs = {
        "usecols": [date_col, temp_col],
        "dtype": {temp_col: np.float64},
        "parse_dates": [date_col],
        "index_col": date_col,
    }

    if gzipped:
        read_csv_kwargs.update({"compression": "gzip"})

    # allow passing extra kwargs
    read_csv_kwargs.update(kwargs)

    df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs)
    df.index = pd.to_datetime(df.index, utc=True)

    # for pandas<0.24, which doesn't localize even with utc=True
    if df.index.tz is None:
        df.index = df.index.tz_localize("UTC")  # pragma: no cover

    if tz is not None:
        df = df.tz_convert(tz)

    if freq == "hourly":
        df = df.resample("h").sum(min_count=1)

    return df[temp_col]

meter_data_from_json(data, orient='list')

Load meter data from a list of dictionary objects or a list of lists.

Parameters:

Name Type Description Default
data list

A list of meter data, with each row representing a single record.

required
orient str

Format of data parameter. Must be one of ['list', 'records']. 'list' is a list of lists, with the first element as start date and the second element as meter usage. 'records' is a list of dicts.

'list'
This is an example of the default list structure.
[
    ['2017-01-01T00:00:00+00:00', 3.5],
    ['2017-02-01T00:00:00+00:00', 0.4],
    ['2017-03-01T00:00:00+00:00', 0.46],
]
This is an example of the records structure.
[
    {'start': '2017-01-01T00:00:00+00:00', 'value': 3.5},
    {'start': '2017-02-01T00:00:00+00:00', 'value': 0.4},
    {'start': '2017-03-01T00:00:00+00:00', 'value': 0.46},
]

Returns:

Type Description
DataFrame

DataFrame with a single column ('value') and a pandas.DatetimeIndex. A second column ('estimated') may also be included if the input data contained an estimated boolean flag.

Source code in opendsm/eemeter/utilities/io.py
def meter_data_from_json(data: list, orient: str = "list") -> pd.DataFrame:
    """Load meter data from a list of dictionary objects or a list of lists.

    Args:
        data: A list of meter data, with each row representing a single record.
        orient: Format of `data` parameter. Must be one of `['list', 'records']`.
            `'list'` is a list of lists, with the first element as start date and the second element as meter usage. `'records'` is a list of dicts.

    Note: This is an example of the default `list` structure.
        ```python
        [
            ['2017-01-01T00:00:00+00:00', 3.5],
            ['2017-02-01T00:00:00+00:00', 0.4],
            ['2017-03-01T00:00:00+00:00', 0.46],
        ]
        ```

    Note: This is an example of the `records` structure.
        ```python
        [
            {'start': '2017-01-01T00:00:00+00:00', 'value': 3.5},
            {'start': '2017-02-01T00:00:00+00:00', 'value': 0.4},
            {'start': '2017-03-01T00:00:00+00:00', 'value': 0.46},
        ]
        ```

    Returns:
        DataFrame with a single column (``'value'``) and a `pandas.DatetimeIndex`. A second column (``'estimated'``) may also be included if the input data contained an estimated boolean flag.
    """

    def _empty_meter_data_dataframe():
        return pd.DataFrame(
            {"value": []}, index=pd.DatetimeIndex([], tz="UTC", name="start")
        )

    if data is None:
        return _empty_meter_data_dataframe()

    if orient == "list":
        df = pd.DataFrame(data, columns=["start", "value"])
        df["start"] = pd.to_datetime(df.start, utc=True)
        df = df.set_index("start")
        return df
    elif orient == "records":

        def _noneify_meter_data_row(row):
            value = row["value"]
            if value is not None:
                try:
                    value = float(value)
                except ValueError:
                    value = None
            out_row = {"start": row["start"], "value": value}
            if "estimated" in row:
                estimated = row.get("estimated")
                out_row["estimated"] = estimated in [True, "true", "True", 1, "1"]
            return out_row

        noneified_data = [_noneify_meter_data_row(row) for row in data]
        df = pd.DataFrame(noneified_data)
        if df.empty:
            return _empty_meter_data_dataframe()
        df["start"] = pd.to_datetime(df.start, utc=True)
        df = df.set_index("start")
        df["value"] = df["value"].astype(float)
        if "estimated" in df.columns:
            df["estimated"] = (
                df["estimated"].where(df["estimated"].notna(), False).astype(bool)
            )
        return df
    else:
        raise ValueError("orientation not recognized.")

temperature_data_from_json(data, orient='list')

Load temperature data from json to a Series. Farenheit is assumed for building models.

Parameters:

Name Type Description Default
data list

A list of temperature data, with each row representing a single record.

required
orient str

Format of data parameter. Must be 'list'. 'list' is a list of lists, with the first element as start date and the second element as temperature.

'list'
This is an example of the default list structure.
[
    ['2017-01-01T00:00:00+00:00', 3.5],
    ['2017-01-01T01:00:00+00:00', 5.4],
    ['2017-01-01T02:00:00+00:00', 7.4],
]

Returns:

Type Description
Series

DataFrame with a single column ('tempF') and a pandas.DatetimeIndex.

Raises:

Type Description
ValueError

If orient is not 'list'.

Source code in opendsm/eemeter/utilities/io.py
def temperature_data_from_json(data: list, orient: str = "list") -> pd.Series:
    """Load temperature data from json to a Series. Farenheit is assumed for building models.

    Args:
        data: A list of temperature data, with each row representing a single record.
        orient: Format of `data` parameter. Must be `'list'`.
            `'list'` is a list of lists, with the first element as start date and the second element as temperature.

    Note: This is an example of the default `list` structure.
        ```python
        [
            ['2017-01-01T00:00:00+00:00', 3.5],
            ['2017-01-01T01:00:00+00:00', 5.4],
            ['2017-01-01T02:00:00+00:00', 7.4],
        ]
        ```

    Returns:
        DataFrame with a single column (``'tempF'``) and a `pandas.DatetimeIndex`.

    Raises:
        ValueError: If `orient` is not `'list'`.
    """
    if orient == "list":
        df = pd.DataFrame(data, columns=["dt", "tempF"])
        series = df.tempF
        series.index = pd.to_datetime(df.dt, utc=True)
        return series
    else:
        raise ValueError("orientation not recognized.")

meter_data_to_csv(meter_data, path_or_buf)

Write meter data from a DataFrame or Series to a CSV. See also pandas.DataFrame.to_csv.

Parameters:

Name Type Description Default
meter_data DataFrame | Series

DataFrame or Series with a 'value' column and a pandas.DatetimeIndex.

required
path_or_buf str | FilePath | WriteBuffer[bytes] | WriteBuffer[str]

Path or file handle.

required
Source code in opendsm/eemeter/utilities/io.py
def meter_data_to_csv(
    meter_data: pd.DataFrame | pd.Series,
    path_or_buf: str | FilePath | WriteBuffer[bytes] | WriteBuffer[str],
) -> None:
    """Write meter data from a DataFrame or Series to a CSV. See also `pandas.DataFrame.to_csv`.

    Args:
        meter_data: DataFrame or Series with a ``'value'`` column and a `pandas.DatetimeIndex`.
        path_or_buf: Path or file handle.
    """
    if meter_data.index.name is None:
        meter_data.index.name = "start"

    return meter_data.to_csv(path_or_buf, index=True)

temperature_data_to_csv(temperature_data, path_or_buf)

Write temperature data to CSV. See also :any:pandas.DataFrame.to_csv.

Parameters:

Name Type Description Default
temperature_data Series

Temperature data series with :any:pandas.DatetimeIndex.

required
path_or_buf str | FilePath | WriteBuffer[bytes] | WriteBuffer[str]

Path or file handle.

required
Source code in opendsm/eemeter/utilities/io.py
def temperature_data_to_csv(
    temperature_data: pd.Series,
    path_or_buf: str | FilePath | WriteBuffer[bytes] | WriteBuffer[str],
) -> None:
    """Write temperature data to CSV. See also :any:`pandas.DataFrame.to_csv`.

    Args:
        temperature_data: Temperature data series with :any:`pandas.DatetimeIndex`.
        path_or_buf: Path or file handle.
    """
    if temperature_data.index.name is None:
        temperature_data.index.name = "dt"
    if temperature_data.name is None:
        temperature_data.name = "temperature"

    return temperature_data.to_frame().to_csv(path_or_buf, index=True)