---
license: mit
tags:
- Decision Transformer
- Heating-Ventilation-Air Conditioning (HVAC)
- Docker
- EnergyPlus
- Generalisation, General Models
- Transfer Learning and Zero Shot
---
# One for All: LLM-guided zero-shot HVAC control
## Abstract
|
|
HVAC controllers are widely deployed across buildings with different layouts, sensing configurations, climates, and occupancy patterns. In practice, controllers tuned for one building often degrade when applied to another, leading to inconsistent energy efficiency and occupant comfort. Many learning-based HVAC control methods rely on building-specific training, retraining, or expert intervention, which is often impractical or costly at scale.

To address these challenges, we present Gen-HVAC, an LLM-guided, zero-shot HVAC control platform for multi-zone buildings that is trained once and deployed across diverse buildings without retraining. We design a transformer-based HVAC controller that is trained on historical building operation data collected across multiple buildings and generates control actions by conditioning on recent system behavior rather than building-specific models. By using recent temperature measurements, past control actions, and observed system responses, the controller generates HVAC control actions that transfer across buildings and climates without retraining, enabling the same model to scale to new buildings.

To further improve occupant comfort, we integrate a lightweight language model that allows users to specify comfort preferences directly, without requiring human expertise, manual rule design, or paid external APIs. The system translates these preferences into control objectives that guide the controller without interfering with system dynamics or real-time control. By conditioning on these objectives, the controller switches between operating modes, such as energy-focused or comfort-focused behavior.

We evaluate Gen-HVAC across multiple climates and building scenarios using EnergyPlus and validate the system in a real building deployment. Results show consistent improvements over rule-based control, achieving 36.8% energy savings and a 28% comfort improvement under zero-shot deployment. We also release our platform to support reproducibility and enable future research on scalable, data-driven HVAC control.
|
|
---
|
|
## System Architecture, Training and Implementation

We divided this project into five phases. Please go through each step to use our system.
|
|
|
|
### EnergyPlus Setup
For this project we use [Sinergym](https://sinergym.readthedocs.io/en/latest/pages/installation.html) and EnergyPlus. Sinergym also provides a prebuilt Docker image, which you can find [here](https://ugr-sail.github.io/sinergym/compilation/v2.1.0/pages/installation.html). To install it:
| ```bash |
| docker pull ghcr.io/ugr-sail/sinergym:2.4.0 |
| ``` |
|
|
After this, run the Docker container and continue with the next steps:
|
|
```bash
docker run -it \
    --name genhvac_container \
    -v $(pwd):/workspace \
    ghcr.io/ugr-sail/sinergym:2.4.0 \
    /bin/bash
```
|
|
| ### Data generation |
| Trajectory generation is executed via the rollout runner coupled with a behavior policy. Use the data-generation script together with the rollout runner to generate |
| temporally consistent data across different buildings, climates, and envelope/occupancy variants. You can generate datasets using rule-based controllers, |
| learned policies, MPC-style rules, or real-building logs such as Ecobee traces, and the same pipeline will serialize them into a unified trajectory format. |
| The provided rollout utilities support targeted generation for a specific location or building type, as well as generation that mixes envelope variants, weather files, |
| and building archetypes to construct large, diverse training corpora. |
|
|
All of this must be run inside a Docker container that includes EnergyPlus. If you already have EnergyPlus installed natively, you can simply adapt the Sinergym setup and generate the sequential training data.
|
|
```bash
# Inside the Docker container
cd /workspace

python trajectory_generator.py \
    --manifest patched_reference_data_base/OfficeSmall/reference_database.json \
    --output_dir dataset \
    --behavior seasonal_reactive \
    --time_freq 900
```
|
|
| Optional multi-building combinations: |
|
|
| ```bash |
| python trajectory_generator.py \ |
| --manifest patched_reference_data_base/OfficeMedium/reference_database.json \ |
| --combine_climates True \ |
| --combine_envelopes True \ |
| --output_dir dataset_large |
| ``` |
|
|
Each episode is stored as a compressed `.npz` file:
|
|
```
dataset/
├── OfficeSmall__Buffalo__standard__episode_001.npz
├── OfficeSmall__Dubai__high_internal__episode_002.npz
└── metadata.json
```
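The double-underscore naming convention makes it easy to index episodes by building, climate, and envelope variant. A minimal sketch of a parser for it (the helper name `parse_episode_name` is ours, not part of the pipeline):

```python
from pathlib import Path

def parse_episode_name(filename: str) -> dict:
    """Split 'Building__Climate__Envelope__episode_NNN.npz' into its fields."""
    stem = Path(filename).stem  # drop the .npz extension
    building, climate, envelope, episode = stem.split("__")
    return {
        "building": building,
        "climate": climate,
        "envelope": envelope,
        "episode": int(episode.split("_")[1]),  # 'episode_001' -> 1
    }

meta = parse_episode_name("OfficeSmall__Buffalo__standard__episode_001.npz")
print(meta)  # {'building': 'OfficeSmall', 'climate': 'Buffalo', 'envelope': 'standard', 'episode': 1}
```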
|
|
| Each file contains: |
|
|
| ```python |
{
    "observations": np.ndarray,  # shape (T, state_dim)
    "actions": np.ndarray,       # shape (T, action_dim)
    "rewards": np.ndarray,       # shape (T,)
    "state_keys": list,          # names of the observation features
    "action_keys": list,         # names of the action dimensions
    "meta": dict                 # e.g., building, climate, envelope
}
| ``` |
|
|
- Temporal resolution: 15 minutes
- Episode length: 35,040 timesteps (one simulation year)
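The episode format above round-trips through standard NumPy I/O. A minimal sketch with toy shapes (the filename and sizes here are illustrative, not part of the pipeline):

```python
import numpy as np

T, state_dim, action_dim = 96, 12, 4  # one simulated day at 15-min resolution (toy sizes)

episode = {
    "observations": np.zeros((T, state_dim), dtype=np.float32),
    "actions": np.zeros((T, action_dim), dtype=np.float32),
    "rewards": np.zeros(T, dtype=np.float32),
}

# Serialize one episode in the compressed format described above; the list/dict
# fields (state_keys, action_keys, meta) can be stored as object arrays if needed.
np.savez_compressed("episode_demo.npz", **episode)

data = np.load("episode_demo.npz")
assert data["observations"].shape == (T, state_dim)
assert data["actions"].shape == (T, action_dim)
assert data["rewards"].shape == (T,)
```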
|
|
| ### Training Phase |
|
|
After data generation, you can proceed to the training phase. In our experiments, we generated 2,300+ building-weather-policy combinations, yielding 3M+ sequential state-action transitions. The training pipeline is modular and consists of the dataloader, the Decision Transformer model, the loss modules, and the main training loop.
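At a high level, the dataloader turns each trajectory into interleaved (RTG, state, action) tokens over a fixed context window, the standard Decision Transformer layout. A minimal sketch; the function name, shapes, and window size are illustrative assumptions, not the actual `data_loader.py` interface:

```python
import numpy as np

def build_context(obs, actions, rtg, K):
    """Take the last K timesteps of a trajectory and interleave them as
    (RTG_t, s_t, a_t) triples, the token order used by Decision Transformers."""
    obs, actions, rtg = obs[-K:], actions[-K:], rtg[-K:]
    tokens = []
    for t in range(len(obs)):
        tokens.append(("rtg", float(rtg[t])))
        tokens.append(("state", obs[t]))
        tokens.append(("action", actions[t]))
    return tokens

T, K = 10, 4
obs = np.zeros((T, 3))
actions = np.zeros((T, 2))
rtg = np.arange(T, 0, -1.0)          # monotonically decreasing return-to-go
tokens = build_context(obs, actions, rtg, K)
print(len(tokens))                   # 12 tokens = 3 per timestep x K
print(tokens[0])                     # ('rtg', 4.0)
```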
|
|
In most cases, the only required adaptation is mapping your raw sensor observations to the expected schema and defining the corresponding action keys; we provide validated mappings for OfficeSmall STD2013 (5-zone) and OfficeMedium STD2013 (15-zone), and the same interface extends directly to other HOT buildings as well as Ecobee or other real-building datasets. The training implementation is designed for generalization and zero-shot transfer: it supports heterogeneous buildings, zone counts, and sensing modalities. Scaling to larger and more diverse building types primarily requires increasing model capacity (d_model, layers, heads); the embedding and loss structure can remain unchanged.
| We condition the policy on multi-objective return-to-go (RTG) targets for energy and comfort, and optionally apply Top-K filtering/selection by RTG to bias training |
| toward higher-quality sub-trajectories, enabling the model to learn how different action sequences causally trade off energy consumption and comfort outcomes. |
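The RTG conditioning and Top-K selection described above can be sketched as follows. This is a simplified illustration under our own naming; the actual implementation lives in the training pipeline and handles per-channel normalization:

```python
import numpy as np

def returns_to_go(rewards: np.ndarray) -> np.ndarray:
    """RTG[t] = sum of rewards from t to the end of the trajectory."""
    return np.cumsum(rewards[::-1])[::-1]

def top_k_by_rtg(episodes, k: int):
    """Keep the k episodes with the highest combined initial RTG
    (energy + comfort), biasing training toward better sub-trajectories."""
    scored = [(returns_to_go(ep["energy"])[0] + returns_to_go(ep["comfort"])[0], ep)
              for ep in episodes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ep for _, ep in scored[:k]]

# Toy example: two reward channels (energy and comfort) per episode
eps = [{"energy": np.array([1.0, 1.0]), "comfort": np.array([0.0, 0.0])},
       {"energy": np.array([2.0, 2.0]), "comfort": np.array([1.0, 0.0])}]
best = top_k_by_rtg(eps, k=1)
print(returns_to_go(np.array([1.0, 2.0, 3.0])))  # [6. 5. 3.]
```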
| |
| |
| ### LLM deployment phase |
Gen-HVAC supports an LLM + Digital Human-in-the-Loop (DHIL) layer that modulates preference/RTG targets and high-level constraints. For local LLM hosting, install Ollama, pull a quantized model, and launch the service.
| |
On Linux/macOS, install Ollama with `curl -fsSL https://ollama.com/install.sh | sh` and start the daemon with `ollama serve` (leave it running). Then pull one of the recommended models:

- `ollama pull deepseek-r1:7b` (lightweight reasoning)
- `ollama pull llama3.1:8b` (strong general instruction-following)
- `ollama pull qwen2.5:7b` (efficient general model)
- `ollama pull mistral:instruct` (fast instruct model)

If you want a slightly heavier but still practical model, use `ollama pull deepseek-r1:14b` or `ollama pull qwen2.5:14b`. In our testing we chose DeepSeek R1.
| |
Once pulled, run `deepseek-r1:7b` with Ollama, then in another terminal point your Gen-HVAC LLM client at the default endpoint and run your integration from the `llm/` folder (e.g., `python -m llm.server --host 0.0.0.0 --port 8000` and `python -m llm.client --base_url http://localhost:xxxx --model deepseek-r1:7b`).
After the LLM endpoint is up, proceed to the inference-server step to bind the persona/prompt layer to RTG conditioning and the control loop in one end-to-end pipeline.
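The preference layer boils down to building a request for the local LLM and robustly parsing its reply into an RTG-style target. A minimal sketch: the request shape matches Ollama's `/api/generate` endpoint, but the prompt wording, the `comfort_weight` field, and both helper names are our own illustrative assumptions:

```python
import json

def build_preference_prompt(user_text: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {
        "model": "deepseek-r1:7b",
        "stream": False,
        "prompt": (
            "Map the occupant preference below to a comfort weight between 0 "
            "(pure energy saving) and 1 (pure comfort). Reply with JSON like "
            '{"comfort_weight": 0.7}.\n'
            f"Preference: {user_text}"
        ),
    }

def parse_comfort_weight(response_text: str, default: float = 0.5) -> float:
    """Extract the comfort weight from the model reply; fall back on failure
    so a malformed LLM answer never disturbs the control loop."""
    try:
        weight = float(json.loads(response_text)["comfort_weight"])
        return min(max(weight, 0.0), 1.0)  # clamp to the valid range
    except (ValueError, KeyError, json.JSONDecodeError):
        return default

print(parse_comfort_weight('{"comfort_weight": 0.8}'))  # 0.8
```

Keeping the parser failure-safe matters here: the DHIL layer only nudges RTG targets, so a bad reply should degrade to the default trade-off rather than crash inference.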
|
|
| ### Inference |
During inference, we deploy Gen-HVAC as a stateless HTTP microservice that loads the trained Decision Transformer checkpoint and normalization statistics at startup, maintains a short autoregressive context window internally, and returns multi-zone heating/cooling setpoints at each control step.
In our experiments, EnergyPlus/Sinergym executes inside the Docker container while the inference service runs on the host/server (CPU/GPU), so the simulator can stream observation vectors to `POST /predict` (payload: `{step, obs, info}`) and receive an action vector in the response, with `POST /reset` used to clear policy history at episode boundaries. When enabled, the DHIL module queries a local Ollama endpoint and updates the comfort RTG target at a low frequency (e.g., every 4 steps).
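The service's internal context handling can be sketched as a fixed-length rolling history that `/predict` appends to and `/reset` clears. The class name and window size below are illustrative assumptions, not the actual server code:

```python
from collections import deque
import numpy as np

class PolicyContext:
    """Rolling observation/action history for the Decision Transformer,
    cleared at episode boundaries (the /reset endpoint)."""

    def __init__(self, max_len: int = 20):
        self.obs = deque(maxlen=max_len)    # oldest entries drop automatically
        self.acts = deque(maxlen=max_len)

    def reset(self):
        self.obs.clear()
        self.acts.clear()

    def append(self, obs: np.ndarray, action: np.ndarray):
        self.obs.append(obs)
        self.acts.append(action)

    def window(self):
        """Stacked context fed to the model at each control step."""
        return np.stack(self.obs), np.stack(self.acts)

ctx = PolicyContext(max_len=3)
for t in range(5):                      # only the last 3 steps are kept
    ctx.append(np.full(4, t), np.full(2, t))
obs_win, act_win = ctx.window()
print(obs_win.shape, act_win.shape)     # (3, 4) (3, 2)
```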
|
|
| ## Repository Structure |
|
|
|
|
```text
Gen-HVAC:Controller/
│
├── data_generator/                    # Data generation pipeline (EnergyPlus/Sinergym rollouts)
│   ├── rollout_runner.py              # Runs rollouts for selected building-weather configs and logs outputs
│   └── trajectory_generator.py        # Creates trajectory datasets from rollouts
│
├── evaluation/                        # Evaluation scripts
│
├── Inference_&_LLM/                   # Inference + LLM/DHIL (Digital Human-in-the-Loop) components
│   ├── inference.py                   # Runs local inference (loads model + produces actions)
│   ├── inference_server.py            # Server wrapper for inference (API-based deployment)
│   ├── digital_human_in_the_loop.py   # DHIL logic
│   ├── llm_client/                    # LLM client utilities
│   └── ...
│
├── Model/                             # Saved model checkpoints + configs
│   ├── Model_V1/
│   │   ├── last.pt                    # Model checkpoint
│   │   ├── model_config.json          # Training/model parameters
│   │   └── report.json
│   ├── Model_V2/ ...
│   └── Model_V3/ ...
│
├── training/                          # Training code (DT model, embeddings, losses, trainer)
│   ├── data_loader.py                 # Loads trajectories, builds tokens/batches, normalization, RTG, etc.
│   ├── embeddings.py                  # Feature encoders + token embeddings (zone/global/RTG encodings)
│   ├── losses.py                      # Action loss + auxiliary losses (physics/value/etc. if enabled)
│   └── training.py                    # Main training entry point (train loop, checkpoints, logging)
│
└── utilities/                         # Shared utilities used across data-gen/training/eval/inference
    ├── comfort.py                     # Comfort metric helpers
    ├── data_generator.py              # Shared dataset helpers / schema utilities
    ├── policy.py                      # Policy wrappers (DT policy interface, action post-processing)
    ├── rewards.py
    ├── rollout.py                     # Rollout utilities (env stepping, logging, post-processing)
    └── tables.py
```