SDK Usage
=========

The timedb SDK provides a high-level Python interface for working with time
series data. It handles unit conversion, series management, and DataFrame
operations automatically.

Getting Started
---------------

Import the SDK:

.. code-block:: python

    import timedb as td
    import pandas as pd
    import pint

Database Connection
-------------------

The SDK uses environment variables for the database connection:

- ``TIMEDB_DSN`` (preferred)
- ``DATABASE_URL`` (alternative)

You can also use a ``.env`` file in your project root.

Schema Management
-----------------

Creating the Schema
~~~~~~~~~~~~~~~~~~~

Before using the SDK, create the database schema:

.. code-block:: python

    td.create()

This creates all necessary tables. It is safe to run multiple times.

Deleting the Schema
~~~~~~~~~~~~~~~~~~~

To delete all tables and data (use with caution):

.. code-block:: python

    td.delete()

**WARNING**: This will delete all data!

Inserting Data
--------------

The main function for inserting data is ``insert_batch()``. It automatically:

- Detects series from DataFrame columns
- Extracts units from Pint Quantity objects or pint-pandas Series
- Creates or gets series with appropriate units
- Converts all values to canonical units before storage

Basic Example
~~~~~~~~~~~~~

.. code-block:: python

    import timedb as td
    import pandas as pd
    import pint
    from datetime import datetime, timezone, timedelta

    # Create unit registry
    ureg = pint.UnitRegistry()

    # Create sample data with Pint Quantity objects
    times = pd.date_range(
        start=datetime.now(timezone.utc),
        periods=24,
        freq='H',
        tz='UTC'
    )

    # Create DataFrame with Pint Quantity columns
    df = pd.DataFrame({
        "valid_time": times,
        "power": [100.0, 105.0, 110.0] * 8 * ureg.kW,
        # degC is an offset unit, so it must be wrapped in a Quantity;
        # plain multiplication raises pint's OffsetUnitCalculusError
        "temperature": ureg.Quantity([20.0, 21.0, 22.0] * 8, "degC"),
    })

    # Insert the data
    result = td.insert_batch(df=df)

    # result.series_ids contains the mapping of series_key to series_id
    print(result.series_ids)
    # {'power': UUID('...'), 'temperature': UUID('...')}

Using pint-pandas Series
~~~~~~~~~~~~~~~~~~~~~~~~

You can also use pint-pandas Series with dtype annotations:

.. code-block:: python

    df = pd.DataFrame({
        "valid_time": times,
        "power": pd.Series([100.0, 105.0, 110.0] * 8, dtype="pint[MW]"),
        "wind_speed": pd.Series([5.0, 6.0, 7.0] * 8, dtype="pint[m/s]"),
    })

    result = td.insert_batch(df=df)

Custom Series Keys
~~~~~~~~~~~~~~~~~~

Override the default series keys (column names):

.. code-block:: python

    result = td.insert_batch(
        df=df,
        series_key_overrides={
            'power': 'wind_power_forecast',
            'temperature': 'ambient_temperature'
        }
    )

Interval Values
~~~~~~~~~~~~~~~

For interval-based time series (e.g., energy over a time period):

.. code-block:: python

    start_times = pd.date_range(
        start=datetime.now(timezone.utc),
        periods=24,
        freq='H',
        tz='UTC'
    )
    end_times = start_times + timedelta(hours=1)

    df_intervals = pd.DataFrame({
        "valid_time": start_times,
        "valid_time_end": end_times,
        "energy": [100.0, 105.0, 110.0] * 8 * ureg.MWh
    })

    result = td.insert_batch(
        df=df_intervals,
        valid_time_end_col='valid_time_end'
    )

Advanced Options
~~~~~~~~~~~~~~~~

Full function signature:

.. code-block:: python

    result = td.insert_batch(
        df=pd.DataFrame(...),
        tenant_id=None,               # Optional, defaults to zeros UUID
        run_id=None,                  # Optional, auto-generated if not provided
        workflow_id=None,             # Optional, defaults to "sdk-workflow"
        run_start_time=None,          # Optional, defaults to now()
        run_finish_time=None,         # Optional
        valid_time_col='valid_time',  # Column name for valid_time
        valid_time_end_col=None,      # Column name for valid_time_end
        known_time=None,              # Time of knowledge (for backfills)
        run_params=None,              # Optional dict of run parameters
        series_key_overrides=None,    # Optional dict mapping column names to series_key
    )
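Putting several of these options together, the following is a minimal sketch of
a backfill that records when the data was actually known. The workflow ID,
series key, and run parameters shown here are illustrative, not defaults:

.. code-block:: python

    import timedb as td
    import pandas as pd
    from datetime import datetime, timezone

    # Historical observations that were known at the start of February
    known = datetime(2024, 2, 1, tzinfo=timezone.utc)
    times = pd.date_range(
        start=datetime(2024, 1, 1, tzinfo=timezone.utc),
        periods=24,
        freq='H',
        tz='UTC'
    )

    df_backfill = pd.DataFrame({
        "valid_time": times,
        "power": pd.Series([50.0] * 24, dtype="pint[kW]"),
    })

    result = td.insert_batch(
        df=df_backfill,
        workflow_id="manual-backfill",          # illustrative workflow ID
        known_time=known,                       # when the data was known
        run_params={"source": "scada-export"},  # illustrative run metadata
        series_key_overrides={"power": "observed_power"},
    )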
Reading Data
------------

The SDK provides several functions for reading data.

Basic Read
~~~~~~~~~~

Read time series values:

.. code-block:: python

    df = td.read(
        series_id=None,              # Optional, filter by series ID
        tenant_id=None,              # Optional, defaults to zeros UUID
        start_valid=None,            # Optional, start of valid time range
        end_valid=None,              # Optional, end of valid time range
        return_mapping=False,        # If True, return (DataFrame, mapping_dict)
        all_versions=False,          # If True, include all versions
        return_value_id=False,       # If True, include value_id column
        tags_and_annotations=False,  # If True, include tags and annotation columns
    )

Returns a DataFrame with:

- Index: ``valid_time``
- Columns: ``series_key`` (one column per series)
- Each column has a pint-pandas dtype based on ``series_unit``

Example:

.. code-block:: python

    # Read all series
    df = td.read()

    # Read a specific series
    series_id = result.series_ids['power']
    df = td.read(series_id=series_id)

    # Read with a time range
    df = td.read(
        start_valid=datetime(2024, 1, 1, tzinfo=timezone.utc),
        end_valid=datetime(2024, 1, 2, tzinfo=timezone.utc),
    )

    # Get the mapping of series_id to series_key
    df, mapping = td.read(return_mapping=True)

Reading with Tags and Annotations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Include tags and annotations as DataFrame columns:

.. code-block:: python

    df = td.read(
        series_id=series_id,
        tags_and_annotations=True
    )

The returned DataFrame includes additional columns:

- ``tags``: List of tags for each value
- ``annotation``: Annotation text for each value
- ``changed_by``: Email of the user who changed the value
- ``change_time``: When the value was last changed

Reading All Versions
~~~~~~~~~~~~~~~~~~~~

By default, ``read()`` returns only the latest version of each value. To see
all historical versions:

.. code-block:: python

    df = td.read(
        series_id=series_id,
        all_versions=True,
        return_value_id=True  # Include value_id to distinguish versions
    )

This is useful for:

- Auditing changes to time series values
- Viewing complete version history
- Analyzing how values evolved over time

Flat Mode Read
~~~~~~~~~~~~~~

Read values in flat mode (latest known_time per valid_time):

.. code-block:: python

    df = td.read_values_flat(
        series_id=None,
        tenant_id=None,
        start_valid=None,
        end_valid=None,
        start_known=None,    # Start of known_time range
        end_known=None,      # End of known_time range
        all_versions=False,
        return_mapping=False,
        units=True,          # If True, return pint-pandas DataFrame
        return_value_id=False,
    )

Returns the latest version of each (valid_time, series_id) combination based
on known_time, as in the as-of sketch below.
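For example, the ``known_time`` bounds can be used to reconstruct what the
database knew as of a given cutoff. This is a minimal sketch; the cutoff value
is illustrative and ``series_id`` is assumed to come from an earlier step:

.. code-block:: python

    from datetime import datetime, timezone

    cutoff = datetime(2025, 1, 2, tzinfo=timezone.utc)  # illustrative cutoff

    # Latest value per valid_time, ignoring anything learned after the cutoff
    df_asof = td.read_values_flat(
        series_id=series_id,
        end_known=cutoff,
    )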
Overlapping Mode Read
~~~~~~~~~~~~~~~~~~~~~

Read values in overlapping mode (all forecast revisions):

.. code-block:: python

    df = td.read_values_overlapping(
        series_id=None,
        tenant_id=None,
        start_valid=None,
        end_valid=None,
        start_known=None,
        end_known=None,
        all_versions=False,
        return_mapping=False,
        units=True,
    )

Returns all versions of forecasts, showing how predictions evolve over time.
Useful for analyzing forecast revisions and backtesting.

Example: Analyzing forecast revisions:

.. code-block:: python

    # Create multiple forecasts at different known_times
    base_time = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)

    for i in range(4):
        known_time = base_time + timedelta(days=i)
        valid_times = [known_time + timedelta(hours=j) for j in range(72)]

        # generate_forecast() is a placeholder for your own model
        df = pd.DataFrame({
            "valid_time": valid_times,
            "power": generate_forecast(valid_times) * ureg.MW
        })

        td.insert_batch(
            df=df,
            known_time=known_time,  # When the forecast was made
            workflow_id=f"forecast-run-{i}"
        )

    # Read all forecast revisions (overlapping mode)
    df_overlapping = td.read_values_overlapping(
        series_id=series_id,
        start_valid=base_time,
        end_valid=base_time + timedelta(days=5)
    )

Updating Records
----------------

Update existing records with new values, annotations, or tags:

.. code-block:: python

    updates = [
        {
            'value_id': 123,  # Simplest: update by value_id
            'value': 150.0,
            'annotation': 'Manually corrected',
            'tags': ['reviewed', 'corrected'],
            'changed_by': 'user@example.com',
        }
    ]

    result = td.update_records(updates)

For each update:

- Omit a field to leave it unchanged
- Set it to ``None`` (or ``[]`` for tags) to explicitly clear it
- Set it to a value to update it

The function returns:

.. code-block:: python

    {
        'updated': [...],        # List of updated records
        'skipped_no_ops': [...]  # List of skipped records (no changes)
    }
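The distinction between omitting a field and setting it to ``None`` matters in
practice. A minimal sketch, assuming the value IDs were obtained via
``td.read(return_value_id=True)`` and that the IDs shown are illustrative:

.. code-block:: python

    updates = [
        # Correct the value only; annotation and tags stay as they are
        {
            'value_id': 123,
            'value': 152.5,
            'changed_by': 'user@example.com',
        },
        # Clear a stale annotation and all tags without touching the value
        {
            'value_id': 124,
            'annotation': None,  # explicitly clear the annotation
            'tags': [],          # explicitly clear the tags
            'changed_by': 'user@example.com',
        },
    ]

    result = td.update_records(updates)
    print(len(result['updated']), "updated,",
          len(result['skipped_no_ops']), "skipped")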
API Server
----------

The SDK provides functions to start and manage the TimeDB REST API server.

Starting the API Server (Blocking)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Start the API server in the current process:

.. code-block:: python

    td.start_api()

    # With a custom host/port
    td.start_api(host="0.0.0.0", port=8080)

    # With auto-reload (development)
    td.start_api(reload=True)

This blocks until the server is stopped (Ctrl+C).

Starting the API Server (Non-blocking)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For use in notebooks, or when you need the API to run in the background:

.. code-block:: python

    # Start in a background thread
    started = td.start_api_background()

    if started:
        print("API server started")
    else:
        print("API server was already running")

Checking API Status
~~~~~~~~~~~~~~~~~~~

Check whether the API server is running:

.. code-block:: python

    if td.check_api():
        # API is running
        pass
    else:
        # API is not running
        td.start_api_background()

Error Handling
--------------

The SDK raises specific exceptions:

- ``ValueError``: If tables don't exist (run ``td.create()`` first), or if the
  DataFrame structure is invalid or units are inconsistent
- ``IncompatibleUnitError``: If unit conversion fails due to a dimensionality
  mismatch

Example error handling:

.. code-block:: python

    from timedb import IncompatibleUnitError

    try:
        result = td.insert_batch(df=df)
    except ValueError as e:
        if "TimeDB tables do not exist" in str(e):
            print("Please create the schema first: td.create()")
        else:
            raise
    except IncompatibleUnitError as e:
        print(f"Unit conversion error: {e}")

Best Practices
--------------

1. **Always use timezone-aware datetimes**: All time columns must be
   timezone-aware (UTC recommended)
2. **Use Pint for units**: Leverage Pint Quantity objects or pint-pandas
   Series for automatic unit handling
3. **Create the schema first**: Run ``td.create()`` before inserting data
4. **Use known_time for backfills**: When inserting historical data, set
   ``known_time`` to when the data was actually known
5. **Use workflow_id**: Set meaningful workflow IDs to track data sources

Complete Example
----------------

.. code-block:: python

    import timedb as td
    import pandas as pd
    import pint
    from datetime import datetime, timezone, timedelta

    # 1. Create the schema
    td.create()

    # 2. Prepare data
    ureg = pint.UnitRegistry()
    times = pd.date_range(
        start=datetime.now(timezone.utc),
        periods=24,
        freq='H',
        tz='UTC'
    )

    df = pd.DataFrame({
        "valid_time": times,
        "power": [100.0 + i for i in range(24)] * ureg.kW,
        # degC is an offset unit, so wrap it in a Quantity instead of
        # multiplying (plain multiplication raises OffsetUnitCalculusError)
        "temperature": ureg.Quantity([20.0 + i * 0.5 for i in range(24)], "degC"),
    })

    # 3. Insert data
    result = td.insert_batch(
        df=df,
        workflow_id="forecast-v1",
        run_params={"model": "wind-forecast-v2", "version": "1.0"}
    )

    # 4. Read data
    df_read = td.read(
        series_id=result.series_ids['power'],
        start_valid=times[0],
        end_valid=times[-1]
    )
    print(df_read.head())
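As a possible continuation of this example, a record can be corrected and the
resulting version history inspected. This is a sketch built from the functions
described above; the corrected value and annotation are illustrative, and the
``value_id`` column is assumed to be present when ``return_value_id=True``:

.. code-block:: python

    # 5. Fetch value_ids so that a record can be updated
    df_ids = td.read(
        series_id=result.series_ids['power'],
        return_value_id=True
    )

    # 6. Correct one value, leaving an audit trail
    td.update_records([
        {
            'value_id': df_ids['value_id'].iloc[0],
            'value': 101.5,  # illustrative corrected value
            'annotation': 'Sensor recalibration',
            'tags': ['corrected'],
            'changed_by': 'user@example.com',
        }
    ])

    # 7. Inspect the full version history of the series
    df_history = td.read(
        series_id=result.series_ids['power'],
        all_versions=True,
        return_value_id=True
    )
    print(df_history.head())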