{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Unit Validation and Conversion with TimeDB\n", "\n", "This notebook demonstrates TimeDB's unit handling capabilities using pint-pandas Series.\n", "\n", "#### What you'll learn:\n", "1. **Uploading data with units** - Using pint-pandas Series with `dtype=\"pint[unit]\"` in DataFrames\n", "2. **Reading data and getting unit information** - How to retrieve series metadata including units\n", "3. **Unit validation** - What happens when you try to upload incompatible units to an existing series\n", "\n", "**Key Features:**\n", "- Each DataFrame column (except time columns) automatically becomes a separate series\n", "- Series name defaults to the column name\n", "- Units are extracted from pint-pandas Series dtype (e.g., `dtype=\"pint[MW]\"`)\n", "- Values are automatically converted to canonical units before storage\n", "- Incompatible units raise `IncompatibleUnitError`\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✓ Imports successful\n", "✓ Using pint-pandas for unit handling\n" ] } ], "source": [ "import timedb as td\n", "import pandas as pd\n", "import pint_pandas\n", "import matplotlib.pyplot as plt\n", "from datetime import datetime, timezone, timedelta\n", "\n", "# Load environment variables (for database connection)\n", "from dotenv import load_dotenv\n", "load_dotenv()\n", "\n", "print(\"✓ Imports successful\")\n", "print(f\"✓ Using pint-pandas for unit handling\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: Uploading Data with Units\n", "\n", "Let's start by creating the database schema and then uploading time series data with units embedded as pint-pandas Series.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Creating database schema...\n", "✓ Schema created successfully\n" 
] } ], "source": [ "# Create database schema\n", "td.delete()\n", "td.create()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/vnd.microsoft.datawrangler.viewer.v0+json": { "columns": [ { "name": "index", "rawType": "int64", "type": "integer" }, { "name": "valid_time", "rawType": "datetime64[ns, UTC]", "type": "unknown" }, { "name": "power", "rawType": "pint[megawatt][Float64]", "type": "unknown" }, { "name": "wind_speed", "rawType": "pint[meter / second][Float64]", "type": "unknown" }, { "name": "temperature", "rawType": "pint[degree_Celsius][Float64]", "type": "unknown" } ], "ref": "1887c9dd-6d1e-40a9-a887-e2a153a09d79", "rows": [ [ "0", "2025-01-01 00:00:00+00:00", "1.0 megawatt", "5.0 meter / second", "20.0 degree_Celsius" ], [ "1", "2025-01-01 01:00:00+00:00", "1.05 megawatt", "5.2 meter / second", "20.5 degree_Celsius" ], [ "2", "2025-01-01 02:00:00+00:00", "1.1 megawatt", "5.4 meter / second", "21.0 degree_Celsius" ], [ "3", "2025-01-01 03:00:00+00:00", "1.15 megawatt", "5.6 meter / second", "21.5 degree_Celsius" ], [ "4", "2025-01-01 04:00:00+00:00", "1.2 megawatt", "5.8 meter / second", "22.0 degree_Celsius" ] ], "shape": { "columns": 4, "rows": 5 } }, "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>valid_time</th><th>power</th><th>wind_speed</th><th>temperature</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>2025-01-01 00:00:00+00:00</td><td>1.0</td><td>5.0</td><td>20.0</td></tr>\n",
"    <tr><th>1</th><td>2025-01-01 01:00:00+00:00</td><td>1.05</td><td>5.2</td><td>20.5</td></tr>\n",
"    <tr><th>2</th><td>2025-01-01 02:00:00+00:00</td><td>1.1</td><td>5.4</td><td>21.0</td></tr>\n",
"    <tr><th>3</th><td>2025-01-01 03:00:00+00:00</td><td>1.15</td><td>5.6</td><td>21.5</td></tr>\n",
"    <tr><th>4</th><td>2025-01-01 04:00:00+00:00</td><td>1.2</td><td>5.8</td><td>22.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " valid_time power wind_speed temperature\n", "0 2025-01-01 00:00:00+00:00 1.0 5.0 20.0\n", "1 2025-01-01 01:00:00+00:00 1.05 5.2 20.5\n", "2 2025-01-01 02:00:00+00:00 1.1 5.4 21.0\n", "3 2025-01-01 03:00:00+00:00 1.15 5.6 21.5\n", "4 2025-01-01 04:00:00+00:00 1.2 5.8 22.0" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create sample time series data with units\n", "# We'll create three different series: power (MW), wind speed (m/s), and temperature (°C)\n", "\n", "base_time = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)\n", "times = [base_time + timedelta(hours=i) for i in range(24)]\n", "\n", "# Power values in megawatts (will be stored as canonical unit)\n", "# Round to avoid floating point precision artifacts\n", "power_vals_MW = [round(1.0 + i * 0.05, 2) for i in range(24)]\n", "\n", "# Wind speed values in meters per second\n", "wind_vals_m_s = [round(5.0 + i * 0.2, 2) for i in range(24)]\n", "\n", "# Temperature values in Celsius\n", "temp_vals_C = [round(20.0 + i * 0.5, 2) for i in range(24)]\n", "\n", "# Create DataFrame with pint-pandas Series\n", "# Each column becomes a separate series with its unit specified in the dtype\n", "df = pd.DataFrame({\n", " \"valid_time\": times,\n", " \"power\": pd.Series(power_vals_MW, dtype=\"pint[MW]\"), # Series with MW unit\n", " \"wind_speed\": pd.Series(wind_vals_m_s, dtype=\"pint[m/s]\"), # Series with m/s unit\n", " \"temperature\": pd.Series(temp_vals_C, dtype=\"pint[degree_Celsius]\") # Series with °C unit\n", "})\n", "\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "application/vnd.microsoft.datawrangler.viewer.v0+json": { "columns": [ { "name": "index", "rawType": "object", "type": "string" }, { "name": "0", "rawType": "object", "type": "unknown" } ], "ref": "55ef0186-f187-4b16-bd3c-ffb86f65e019", "rows": [ [ "valid_time", "datetime64[ns, UTC]" ], [ "power", 
"pint[megawatt][Float64]" ], [ "wind_speed", "pint[meter / second][Float64]" ], [ "temperature", "pint[degree_Celsius][Float64]" ] ], "shape": { "columns": 1, "rows": 4 } }, "text/plain": [ "valid_time datetime64[ns, UTC]\n", "power pint[megawatt][Float64]\n", "wind_speed pint[meter / second][Float64]\n", "temperature pint[degree_Celsius][Float64]\n", "dtype: object" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data values inserted successfully.\n", "✓ Data inserted successfully!\n", "\n", "Batch ID: ef24a6bd-402a-4f63-8d1c-c454401e5c9a\n", "Workflow ID: sdk-workflow\n", "Tenant ID: 00000000-0000-0000-0000-000000000000\n", "\n", "Series created (name -> series_id):\n", " power: a3cb75f7-3328-4026-8960-3c79075e468a\n", " wind_speed: 46ad278c-d32c-471e-aaf8-d1c3efb3d46e\n", " temperature: 6feb7937-5c30-4062-b63e-412b569fff2f\n" ] } ], "source": [ "# Insert the data - TimeDB automatically:\n", "# 1. Detects each column as a separate series\n", "# 2. Extracts units from pint-pandas Series dtype\n", "# 3. Creates series with name = column name\n", "# 4. 
Converts values to canonical units and stores them\n", "\n", "result = td.insert_batch(df=df)\n", "\n", "print(\"✓ Data inserted successfully!\")\n", "print(f\"\\nBatch ID: {result.batch_id}\")\n", "print(f\"Workflow ID: {result.workflow_id}\")\n", "print(f\"Tenant ID: {result.tenant_id}\")\n", "print(f\"\\nSeries created (name -> series_id):\")\n", "for name, series_id in result.series_ids.items():\n", " print(f\" {name}: {series_id}\")\n", "\n", "# Store the series_ids for later use\n", "series_ids_map = result.series_ids\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: Reading Data and Getting Unit Information\n", "\n", "Now let's read the data back and see how TimeDB returns unit information along with the values.\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✓ Data read successfully\n", "\n", "DataFrame shape: (24, 3)\n", "\n", "Columns: ['wind_speed', 'temperature', 'power']\n", "\n", "Index: valid_time\n", "\n", "First few rows:\n" ] }, { "data": { "application/vnd.microsoft.datawrangler.viewer.v0+json": { "columns": [ { "name": "valid_time", "rawType": "datetime64[ns, UTC]", "type": "unknown" }, { "name": "wind_speed", "rawType": "pint[meter / second][Float64]", "type": "unknown" }, { "name": "temperature", "rawType": "pint[degree_Celsius][Float64]", "type": "unknown" }, { "name": "power", "rawType": "pint[megawatt][Float64]", "type": "unknown" } ], "ref": "e17df92f-b953-42e1-a2c1-32c3ffb5e3ea", "rows": [ [ "2025-01-01 00:00:00+00:00", "5.0 meter / second", "20.0 degree_Celsius", "1.0 megawatt" ], [ "2025-01-01 01:00:00+00:00", "5.2 meter / second", "20.5 degree_Celsius", "1.05 megawatt" ], [ "2025-01-01 02:00:00+00:00", "5.4 meter / second", "21.0 degree_Celsius", "1.1 megawatt" ], [ "2025-01-01 03:00:00+00:00", "5.6 meter / second", "21.5 degree_Celsius", "1.15 megawatt" ], [ "2025-01-01 04:00:00+00:00", "5.8 meter / second", "22.0 
degree_Celsius", "1.2 megawatt" ] ], "shape": { "columns": 3, "rows": 5 } }, "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>name</th><th>wind_speed</th><th>temperature</th><th>power</th></tr>\n",
"    <tr><th>valid_time</th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>2025-01-01 00:00:00+00:00</th><td>5.0</td><td>20.0</td><td>1.0</td></tr>\n",
"    <tr><th>2025-01-01 01:00:00+00:00</th><td>5.2</td><td>20.5</td><td>1.05</td></tr>\n",
"    <tr><th>2025-01-01 02:00:00+00:00</th><td>5.4</td><td>21.0</td><td>1.1</td></tr>\n",
"    <tr><th>2025-01-01 03:00:00+00:00</th><td>5.6</td><td>21.5</td><td>1.15</td></tr>\n",
"    <tr><th>2025-01-01 04:00:00+00:00</th><td>5.8</td><td>22.0</td><td>1.2</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ "name wind_speed temperature power\n", "valid_time \n", "2025-01-01 00:00:00+00:00 5.0 20.0 1.0\n", "2025-01-01 01:00:00+00:00 5.2 20.5 1.05\n", "2025-01-01 02:00:00+00:00 5.4 21.0 1.1\n", "2025-01-01 03:00:00+00:00 5.6 21.5 1.15\n", "2025-01-01 04:00:00+00:00 5.8 22.0 1.2" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Read all data back\n", "# Columns are the series names (human-readable)\n", "df_read = td.read()\n", "\n", "print(\"✓ Data read successfully\")\n", "print(f\"\\nDataFrame shape: {df_read.shape}\")\n", "print(f\"\\nColumns: {list(df_read.columns)}\")\n", "print(f\"\\nIndex: {df_read.index.name}\")\n", "print(f\"\\nFirst few rows:\")\n", "df_read.head()\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Column dtypes (with units):\n", " wind_speed: pint[meter / second][Float64]\n", " temperature: pint[degree_Celsius][Float64]\n", " power: pint[megawatt][Float64]\n" ] } ], "source": [ "# Check the dtypes - each column has a pint-pandas dtype with the unit\n", "print(\"Column dtypes (with units):\")\n", "for col in df_read.columns:\n", " print(f\" {col}: {df_read[col].dtype}\")\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✓ Reset index to see all columns\n", "\n", "Columns: ['valid_time', 'wind_speed', 'temperature', 'power']\n", "\n", "Sample data:\n", "name valid_time wind_speed temperature power\n", "0 2025-01-01 00:00:00+00:00 5.0 20.0 1.0\n", "1 2025-01-01 01:00:00+00:00 5.2 20.5 1.05\n", "2 2025-01-01 02:00:00+00:00 5.4 21.0 1.1\n", "3 2025-01-01 03:00:00+00:00 5.6 21.5 1.15\n", "4 2025-01-01 04:00:00+00:00 5.8 22.0 1.2\n", "5 2025-01-01 05:00:00+00:00 6.0 22.5 1.25\n", "6 2025-01-01 06:00:00+00:00 6.2 23.0 1.3\n", "7 2025-01-01 07:00:00+00:00 6.4 23.5 1.35\n", "8 2025-01-01 08:00:00+00:00 6.6 24.0 
1.4\n", "9 2025-01-01 09:00:00+00:00 6.8 24.5 1.45\n", "\n", "Series and their units:\n", " wind_speed: meter / second\n", " temperature: degree_Celsius\n", " power: megawatt\n" ] } ], "source": [
"# Reset index to see valid_time as a column\n",
"df_flat = df_read.reset_index()\n",
"\n",
"print(\"✓ Reset index to see all columns\")\n",
"print(f\"\\nColumns: {list(df_flat.columns)}\")\n",
"print(f\"\\nSample data:\")\n",
"print(df_flat.head(10))\n",
"\n",
"# Read each unit from the pint-pandas dtype. Slicing str(dtype) would keep the\n",
"# \"][Float64\" subdtype suffix (e.g. \"meter / second][Float64\").\n",
"print(f\"\\nSeries and their units:\")\n",
"for col in df_read.columns:\n",
"    unit = getattr(df_read[col].dtype, \"units\", None)\n",
"    if unit is not None:\n",
"        print(f\" {col}: {unit}\")\n"
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [
"pint_pandas.PintType.ureg.setup_matplotlib()\n",
"df_flat[[\"wind_speed\"]].plot()\n"
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "✓ Plotted all series with their units\n" ] } ], "source": [
"# Visualize the data with unit information in the plot\n",
"# Note: setup_matplotlib() was already called in the previous cell\n",
"fig, axes = plt.subplots(3, 1, figsize=(12, 10))\n",
"\n",
"for idx, series_name in enumerate(df_read.columns):\n",
"    # Read the unit from the pint dtype; fall back for plain numeric columns\n",
"    unit = getattr(df_read[series_name].dtype, \"units\", \"dimensionless\")\n",
"\n",
"    # pint-pandas Series contain Pint Quantity objects, so extract the magnitudes\n",
"    # for per-series subplots. pint-pandas matplotlib support works best with\n",
"    # DataFrame.plot() for multiple columns.\n",
"    values = df_read[series_name].apply(lambda x: float(x.magnitude) if hasattr(x, 'magnitude') else float(x))\n",
"\n",
"    # Plot the series\n",
"    axes[idx].plot(df_read.index, values, marker='o', linewidth=2, markersize=4)\n",
"    axes[idx].set_title(f'{series_name} ({unit})', fontsize=12, fontweight='bold')\n",
"    axes[idx].set_ylabel(f'Value [{unit}]', fontsize=10)\n",
"    axes[idx].grid(True, alpha=0.3)\n",
"    axes[idx].tick_params(axis='x', rotation=45)\n",
"\n",
"axes[-1].set_xlabel('Time', fontsize=10)\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"print(\"✓ Plotted all series with their units\")\n"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 3: Unit Validation - Incompatible Units Error\n", "\n", "Now let's try to upload new data to an existing series, but with incompatible units. 
This should raise an `IncompatibleUnitError`.\n", "\n", "**Important:** Once a series is created with a canonical unit (here \"MW\" for power), all subsequent data must use a compatible unit (\"W\", \"kW\", and \"MW\" are all power units and are converted automatically). Uploading an incompatible unit (e.g., \"MWh\", an energy unit, to a power series) raises an error.\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Power series ID: a3cb75f7-3328-4026-8960-3c79075e468a\n", "Power series canonical unit: megawatt\n", "\n", "This means all power values are stored in megawatt\n", "Note: The original data was in MW, which is now the canonical unit for this series.\n" ] } ], "source": [
"# Get the series_id for the \"power\" series\n",
"power_series_id = result.series_ids['power']\n",
"print(f\"Power series ID: {power_series_id}\")\n",
"\n",
"# Check what unit the power series uses (read it from the pint dtype,\n",
"# not from the full dtype string, which also carries the subdtype)\n",
"canonical_unit = str(getattr(df_read['power'].dtype, \"units\", \"dimensionless\"))\n",
"\n",
"print(f\"Power series canonical unit: {canonical_unit}\")\n",
"print(f\"\\nThis means all power values are stored in {canonical_unit}\")\n",
"print(f\"Note: The original data was in MW, which is now the canonical unit for this series.\")\n"
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Example 1: Uploading compatible units (kW to MW series)\n", "============================================================\n", "Data values inserted successfully.\n", "✓ SUCCESS: kW values were automatically converted to MW and stored\n", " Inserted 24 new data points\n", " Note: 100 kW = 0.1 MW (automatic conversion)\n" ] } ], "source": [
"# Example 1: Uploading compatible units (kW -> MW conversion)\n",
"# This should work fine - kW and MW are compatible (both are power units)\n",
"\n",
"print(\"Example 1: Uploading compatible units (kW to MW series)\")\n",
"print(\"=\" * 60)\n",
"\n",
"# Create new data in kilowatts (compatible with MW - will be converted)\n",
"new_times = [base_time + timedelta(hours=i) for i in range(24, 48)]\n",
"power_vals_kW = [100.0 + i * 5.0 for i in range(24)] # Values in kW\n",
"\n",
"df_compatible = pd.DataFrame({\n",
"    \"valid_time\": new_times,\n",
"    \"power\": pd.Series(power_vals_kW, dtype=\"pint[kW]\") # kW is compatible with MW\n",
"})\n",
"\n",
"try:\n",
"    result_compatible = td.insert_batch(df=df_compatible, series_ids=series_ids_map)\n",
"    print(\"✓ SUCCESS: kW values were automatically converted to MW and stored\")\n",
"    print(f\" Inserted {len(new_times)} new data points\")\n",
"    print(f\" Note: 100 kW = 0.1 MW (automatic conversion)\")\n",
"except Exception as e:\n",
"    print(f\"✗ ERROR: {type(e).__name__}: {e}\")\n"
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Example 2: Uploading incompatible units (MWh to MW series)\n", "============================================================\n", "✓ EXPECTED ERROR: IncompatibleUnitError\n", " Message: Cannot convert megawatt_hour to megawatt: incompatible dimensionality (Cannot convert from 'megawatt_hour' ([mass] * [length] ** 2 / [time] ** 2) to 'megawatt' ([mass] * [length] ** 2 / [time] ** 3))\n", "\n", " This error occurred because:\n", " - The 'power' series has canonical unit: megawatt (power)\n", " - You tried to upload values with unit: MWh (energy)\n", " - Power and energy have incompatible dimensionality\n" ] } ], "source": [
"# Example 2: Uploading incompatible units (MWh -> MW series)\n",
"# This should FAIL - MWh (energy) is incompatible with MW (power)\n",
"\n",
"print(\"\\nExample 2: Uploading incompatible units (MWh to MW series)\")\n",
"print(\"=\" * 60)\n",
"\n",
"# Create new data in megawatt-hours (INCOMPATIBLE with MW - energy vs power!)\n",
"new_times2 = [base_time + timedelta(hours=i) for i in range(48, 72)]\n",
"energy_vals_MWh = [10.0 + i * 0.5 for i in range(24)] # Values in MWh\n",
"\n",
"df_incompatible = pd.DataFrame({\n",
"    \"valid_time\": new_times2,\n",
"    \"power\": pd.Series(energy_vals_MWh, dtype=\"pint[MWh]\") # MWh is INCOMPATIBLE with MW!\n",
"})\n",
"\n",
"try:\n",
"    result_incompatible = td.insert_batch(df=df_incompatible, series_ids=series_ids_map)\n",
"    print(\"✗ UNEXPECTED: This should have failed but didn't!\")\n",
"except td.IncompatibleUnitError as e:\n",
"    print(f\"✓ EXPECTED ERROR: {type(e).__name__}\")\n",
"    print(f\" Message: {e}\")\n",
"    print(f\"\\n This error occurred because:\")\n",
"    print(f\" - The 'power' series has canonical unit: {canonical_unit} (power)\")\n",
"    print(f\" - You tried to upload values with unit: MWh (energy)\")\n",
"    print(f\" - Power and energy have incompatible dimensionality\")\n",
"except Exception as e:\n",
"    print(f\"✗ UNEXPECTED ERROR: {type(e).__name__}: {e}\")\n"
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Example 3: Uploading compatible but different units (W to MW series)\n", "============================================================\n", "Data values inserted successfully.\n", "✓ SUCCESS: W values were automatically converted to MW and stored\n", " Inserted 24 new data points\n", " Note: 100000 W = 0.1 MW (automatic conversion)\n" ] } ], "source": [
"# Example 3: Uploading compatible but different units (W -> MW series)\n",
"# This should work fine - W (watts) is compatible with MW (megawatts)\n",
"\n",
"print(\"\\nExample 3: Uploading compatible but different units (W to MW series)\")\n",
"print(\"=\" * 60)\n",
"\n",
"# Create new data in watts (compatible with MW - just a different scale)\n",
"new_times3 = [base_time + timedelta(hours=i) for 
i in range(72, 96)]\n",
"power_vals_W = [100000.0 + i * 5000.0 for i in range(24)] # Values in W\n",
"\n",
"df_watts = pd.DataFrame({\n",
"    \"valid_time\": new_times3,\n",
"    \"power\": pd.Series(power_vals_W, dtype=\"pint[W]\") # W is compatible with MW\n",
"})\n",
"\n",
"try:\n",
"    result_watts = td.insert_batch(df=df_watts, series_ids=series_ids_map)\n",
"    print(\"✓ SUCCESS: W values were automatically converted to MW and stored\")\n",
"    print(f\" Inserted {len(new_times3)} new data points\")\n",
"    print(f\" Note: 100000 W = 0.1 MW (automatic conversion)\")\n",
"except Exception as e:\n",
"    print(f\"✗ ERROR: {type(e).__name__}: {e}\")\n"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "This notebook demonstrated:\n", "\n", "1. **Uploading data with units**: Passing pint-pandas Series with `dtype=\"pint[unit]\"` as DataFrame columns creates one series per column, with the unit taken from the dtype.\n", "\n", "2. **Reading data with unit information**: The `read()` function returns a pivoted DataFrame with series names as column names and pint-pandas dtypes (e.g., `dtype=\"pint[MW]\"`), so you always know which unit the stored values are in.\n", "\n", "3. 
**Unit validation**: TimeDB prevents storing incompatible units in the same series:\n", "   - ✅ Compatible units (kW, MW, W) → Automatic conversion\n", "   - ❌ Incompatible units (MW vs MWh) → `IncompatibleUnitError`\n", "\n", "**Key Takeaways:**\n", "- Each DataFrame column becomes a separate series\n", "- Series name defaults to column name\n", "- Units are extracted from pint-pandas Series dtype (e.g., `dtype=\"pint[MW]\"`)\n", "- Values are converted to canonical units before storage\n", "- Incompatible units are rejected with clear error messages\n", "- The `read()` function returns a DataFrame with series names as columns and units in the dtype\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✓ Final data read\n", "\n", "Total rows: 72\n", "\n", "Series in database:\n", " wind_speed (meter / second): 24 data points\n", " temperature (degree_Celsius): 24 data points\n", " power (megawatt): 72 data points\n" ] } ], "source": [
"# Final verification: Read all data to see the complete time series\n",
"df_final = td.read()\n",
"\n",
"print(\"✓ Final data read\")\n",
"print(f\"\\nTotal rows: {len(df_final)}\")\n",
"print(f\"\\nSeries in database:\")\n",
"for series_name in df_final.columns:\n",
"    # Read the unit from the pint dtype; fall back for plain numeric columns\n",
"    unit = getattr(df_final[series_name].dtype, \"units\", \"dimensionless\")\n",
"    # Count the non-missing values in this series\n",
"    count = int(df_final[series_name].notna().sum())\n",
"    print(f\" {series_name} ({unit}): {count} data points\")\n"
] } ], "metadata": { "kernelspec": { "display_name": "timedb", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", 
"version": "3.14.2" } }, "nbformat": 4, "nbformat_minor": 2 }