---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- gravityllm
- spatial-audio
- immersive-audio
- spatial9
- iamf
- instruction-tuning
- json
- lora
- qlora
- peft
- transformers
widget:
- text: |-
    INPUT:
    {
      "target_format": "iamf",
      "max_objects": 10,
      "style": "club",
      "section": "drop",
      "global": {"bpm": 128, "energy": 0.92},
      "stems": [
        {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
        {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
      ],
      "rules": [
        {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"type": "mono_low_end", "hz_below": 120}
      ]
    }
---

![GravityLLM banner](assets/heads9.png)

# GravityLLM

GravityLLM is a compact instruction-tuned model for **constraint-conditioned spatial scene generation**.  
It turns **music constraints + stem descriptors** into strict **Spatial9Scene JSON** for immersive audio pipelines such as IAMF, binaural, and bed-plus-object rendering workflows.

> **Status**  
> This repository is **training-ready and Hub-ready**: it includes code, schema, sample data, evaluation, and upload helpers.  
> It does **not** include fine-tuned weights yet. After training, upload the contents of your `outputs/...` folder as the actual model repo.

Demo at **[https://spatial9.ai/demo](https://spatial9.ai/demo)**

## What you will find in this repo

- Instruction fine-tuning with **prompt masking**, so the loss is computed only on the target JSON and not on the instruction prefix.
- **LoRA** and **QLoRA** training paths for efficient fine-tuning on small-to-medium GPUs.
- Strict **JSON Schema** validation for production-safe outputs.
- Built-in **evaluation** for parse rate, schema-valid rate, object-budget pass rate, and anchor-rule pass rate.
- A **Hugging Face upload** helper built on `upload_folder`.
- Ready-made **sample data**, **sample scene**, and **recommended training config**.
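
The prompt masking mentioned in the first bullet can be illustrated without any framework. This is a minimal sketch, not the code in `train.py`: the helper name `build_masked_example` and the toy token ids are assumptions made for the example. The label value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss, so positions carrying it do not contribute to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_masked_example(prompt_ids, completion_ids, max_length=2048):
    """Concatenate prompt and completion tokens and mask the prompt in the labels.

    Hypothetical helper for illustration; token ids are plain lists of ints.
    """
    input_ids = list(prompt_ids) + list(completion_ids)
    # Prompt positions get IGNORE_INDEX, so only completion tokens are scored.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    # Truncate both sequences identically so they stay aligned.
    return {"input_ids": input_ids[:max_length], "labels": labels[:max_length]}


example = build_masked_example(prompt_ids=[11, 12, 13], completion_ids=[21, 22])
print(example["labels"])  # [-100, -100, -100, 21, 22]
```

The model still sees the full prompt as context through `input_ids`; only the supervision signal is restricted to the JSON completion.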

## Model contract

### Input
A structured payload describing:

- target format
- object budget
- style and section
- per-stem descriptors
- hard rules such as anchors, low-end centering, width targets, and masking constraints

### Output
A single valid JSON object matching `schemas/scene.schema.json`.

### Example input
```json
{
  "target_format": "iamf",
  "max_objects": 10,
  "style": "club",
  "section": "drop",
  "global": {"bpm": 128, "energy": 0.92},
  "stems": [
    {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
    {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
  ],
  "rules": [
    {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
    {"type": "mono_low_end", "hz_below": 120}
  ]
}
```

### Example output
```json
{
  "version": "1.0",
  "bed": {"layout": "iamf", "loudness_target_lufs": -14.0, "room_preset": "club_medium"},
  "objects": [
    {
      "id": "v1",
      "class": "lead_vocal",
      "az_deg": 0,
      "el_deg": 10,
      "dist_m": 1.6,
      "width": 0.15,
      "gain_db": 0.0,
      "reverb_send": 0.18,
      "early_reflections": 0.22,
      "motion": [
        {"t": 0.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"t": 1.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6}
      ]
    }
  ],
  "constraints_applied": [
    "anchor:lead_vocal@0/10/1.6",
    "mono_low_end<120Hz"
  ]
}
```
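
To make the contract concrete, the sketch below checks whether a scene honors the `anchor` rules from its input payload. The field names come from the examples above; the function name `anchor_violations`, the numeric tolerance, and the choice to flag a missing anchored class as a violation are assumptions, not the behavior of the repository's own tools.

```python
def anchor_violations(payload, scene, tol=1e-6):
    """Return human-readable anchor violations; an empty list means the scene passes."""
    problems = []
    for rule in payload.get("rules", []):
        if rule.get("type") != "anchor":
            continue  # other rule types (e.g. mono_low_end) are not checked here
        matches = [o for o in scene.get("objects", []) if o.get("class") == rule["track_class"]]
        if not matches:
            problems.append(f"no object of class {rule['track_class']}")
        for obj in matches:
            for key in ("az_deg", "el_deg", "dist_m"):
                value = obj.get(key)
                if value is None or abs(value - rule[key]) > tol:
                    problems.append(f"{obj.get('id')}: {key}={value}, expected {rule[key]}")
    return problems


payload = {"rules": [
    {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
    {"type": "mono_low_end", "hz_below": 120},
]}
scene = {"objects": [
    {"id": "v1", "class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
]}
print(anchor_violations(payload, scene))  # []
```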

## Repository layout

```text
GravityLLM-HuggingFace-Repo/
├── README.md
├── LICENSE
├── Makefile
├── pyproject.toml
├── requirements.txt
├── train.py
├── infer.py
├── evaluate.py
├── upload_to_hub.py
├── assets/
│   └── gravityllm_banner.svg
├── configs/
│   └── recommended_train_args.json
├── data/
│   ├── train.jsonl
│   └── valid.jsonl
├── examples/
│   ├── sample_input.json
│   └── sample_output.json
├── schemas/
│   └── scene.schema.json
├── scripts/
│   ├── push_to_hub.sh
│   └── train_qlora.sh
└── tools/
    ├── make_synthetic_dataset.py
    └── validate_scene.py
```

## Quick start

### 1) Install
```bash
python -m pip install -r requirements.txt
```

### 2) Train with QLoRA
```bash
bash scripts/train_qlora.sh
```

Or run directly:

```bash
python train.py \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --train_file data/train.jsonl \
  --valid_file data/valid.jsonl \
  --output_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --max_length 2048 \
  --num_train_epochs 3 \
  --learning_rate 2e-4 \
  --train_batch_size 1 \
  --eval_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --warmup_ratio 0.03 \
  --save_steps 100 \
  --eval_steps 100 \
  --qlora --bf16
```

### 3) Generate a scene
```bash
python infer.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --input_json examples/sample_input.json \
  --validate \
  --output_json outputs/sample_prediction.json
```

### 4) Evaluate
```bash
python evaluate.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --data_file data/valid.jsonl \
  --report_path reports/eval_report.json
```
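
Two of the metrics listed earlier, parse rate and object-budget pass rate, are simple to reason about in isolation. The following is a rough sketch of how they could be computed, not a description of what `evaluate.py` does internally: the function name `score_predictions`, the `(max_objects, raw_text)` record shape, and the report keys are all assumptions.

```python
import json


def score_predictions(records):
    """Compute parse rate and object-budget pass rate.

    `records` is a list of (max_objects, raw_model_text) pairs; this shape is
    an assumption made for the example.
    """
    parsed = within_budget = 0
    for max_objects, raw_text in records:
        try:
            scene = json.loads(raw_text)
        except json.JSONDecodeError:
            continue  # unparseable output fails every metric
        if not isinstance(scene, dict):
            continue  # a scene must be a single JSON object
        parsed += 1
        objects = scene.get("objects")
        if isinstance(objects, list) and len(objects) <= max_objects:
            within_budget += 1
    total = max(len(records), 1)  # avoid division by zero on empty input
    return {"parse_rate": parsed / total, "budget_pass_rate": within_budget / total}


records = [
    (10, '{"version": "1.0", "objects": [{"id": "v1"}]}'),
    (1, '{"version": "1.0", "objects": [{"id": "v1"}, {"id": "k1"}]}'),  # over budget
    (10, '{"version": "1.0", "objects": ['),  # truncated generation
]
report = score_predictions(records)
print(report)
```

Both rates use the full record count as the denominator, so a model that emits broken JSON is penalized on the budget metric as well.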

### 5) Validate any output
```bash
python tools/validate_scene.py schemas/scene.schema.json outputs/sample_prediction.json
```

## Push to the Hugging Face Hub

### From a trained output folder
```bash
python upload_to_hub.py \
  --folder_path outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --repo_id YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```

### Or with the helper script
```bash
bash scripts/push_to_hub.sh outputs/GravityLLM-Qwen2.5-1.5B-S9 YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```

## Dataset format

Training files are JSONL with two fields per row:

```json
{
  "prompt": "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n{...}",
  "completion": "{... valid Spatial9Scene JSON ...}"
}
```
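
As an illustration of the row format above, the sketch below builds one `prompt`/`completion` row from an input payload and a target scene, then serializes it as JSONL. The prompt prefix is copied from the example row; the helper names `make_row` and `rows_to_jsonl` and the compact JSON serialization are assumptions, so match whatever your own data generator uses.

```python
import json

PROMPT_PREFIX = (
    "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n"
)


def make_row(payload, scene):
    """Build one training row; compact JSON serialization is an assumption."""
    return {
        "prompt": PROMPT_PREFIX + json.dumps(payload, separators=(",", ":")),
        "completion": json.dumps(scene, separators=(",", ":")),
    }


def rows_to_jsonl(rows):
    """Serialize rows as JSONL text: one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


payload = {"target_format": "iamf", "max_objects": 10,
           "stems": [{"id": "v1", "class": "lead_vocal"}]}
scene = {"version": "1.0",
         "objects": [{"id": "v1", "class": "lead_vocal", "az_deg": 0}]}
jsonl_text = rows_to_jsonl([make_row(payload, scene)])

# Round-trip check: every line parses and each completion is itself valid JSON.
for line in jsonl_text.splitlines():
    row = json.loads(line)
    assert set(row) == {"prompt", "completion"}
    json.loads(row["completion"])
```

Because `json.dumps` escapes the newlines inside the prompt, each row stays on a single physical line, which is what JSONL readers expect.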

The provided sample dataset is intentionally small. Replace it with your real production examples as soon as possible.

## Recommended data strategy

For a strong first release:

1. Collect a few hundred high-quality gold examples from expert-authored scenes.
2. Keep the schema stable and its numeric values quantized (rounded to fixed steps) so targets stay consistent across examples.
3. Encode hard rules explicitly instead of relying on vague prose.
4. Run evaluation after every fine-tune.
5. Add a post-processor to enforce hard constraints if the runtime must be deterministic.
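
The post-processor from step 5 might look like the sketch below, which snaps anchored objects to their rule positions and trims the scene to the object budget. It is an assumption-laden illustration, not part of this repository: the function name `enforce_hard_constraints` is hypothetical, overwriting motion keyframes assumes an anchored object should stay static, and dropping trailing objects is only one possible trimming policy.

```python
import copy

POSITION_KEYS = ("az_deg", "el_deg", "dist_m")


def enforce_hard_constraints(payload, scene):
    """Return a copy of `scene` with anchors snapped and the object budget applied."""
    fixed = copy.deepcopy(scene)  # never mutate the model's raw proposal
    anchors = {
        rule["track_class"]: rule
        for rule in payload.get("rules", [])
        if rule.get("type") == "anchor"
    }
    objects = fixed.get("objects", [])
    for obj in objects:
        rule = anchors.get(obj.get("class"))
        if rule is None:
            continue
        for key in POSITION_KEYS:
            obj[key] = rule[key]
        # Assumption: an anchored object is static, so keyframes are snapped too.
        for frame in obj.get("motion", []):
            for key in POSITION_KEYS:
                frame[key] = rule[key]
    budget = payload.get("max_objects")
    if budget is not None:
        objects = objects[:budget]  # assumption: keep the first `budget` objects
    fixed["objects"] = objects
    return fixed


payload = {"max_objects": 1, "rules": [
    {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
]}
scene = {"objects": [
    {"id": "v1", "class": "lead_vocal", "az_deg": 12, "el_deg": 10, "dist_m": 2.0},
    {"id": "k1", "class": "kick", "az_deg": 0, "el_deg": 0, "dist_m": 1.0},
]}
fixed = enforce_hard_constraints(payload, scene)
print([(o["id"], o["az_deg"]) for o in fixed["objects"]])  # [('v1', 0)]
```

Running a pass like this after generation keeps the hard rules deterministic even when the model drifts, while the model remains responsible for the softer placement decisions.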

## Suggested training roadmap

### v0
- Small curated dataset
- QLoRA adapter
- Schema-valid JSON only
- Anchor and budget constraints

### v1
- More genres and sections
- Better masking and width rules
- Object motion patterns
- Automatic validation and repair loop

### v2
- Preference tuning on human A/B judgments
- A dedicated reward signal for clarity, masking avoidance, and translation safety

## Intended use

GravityLLM is designed for:

- music-tech pipelines
- Spatial9 scene authoring
- assisted immersive-audio layout generation
- IAMF-ready authoring workflows
- renderer-side JSON generation

## Limitations

- This repo does not include trained weights out of the box.
- The model only knows what you teach it through your dataset.
- Raw audio is not consumed directly here; the training pipeline expects structured stem features.
- Production systems should still validate outputs and optionally apply a rule-based correction pass.

## Safety and reliability

- Always validate generated scenes against the JSON schema.
- Keep low-end centering as a hard rule outside the model if that is non-negotiable.
- Treat the model as a scene proposal engine, not an oracle.

## License

This repository is released under Apache-2.0.