---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- gravityllm
- spatial-audio
- immersive-audio
- spatial9
- iamf
- instruction-tuning
- json
- lora
- qlora
- peft
- transformers
widget:
- text: |-
    INPUT:
    {
      "target_format": "iamf",
      "max_objects": 10,
      "style": "club",
      "section": "drop",
      "global": {"bpm": 128, "energy": 0.92},
      "stems": [
        {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
        {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
      ],
      "rules": [
        {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"type": "mono_low_end", "hz_below": 120}
      ]
    }
---

# GravityLLM
GravityLLM is a compact instruction-tuned model for **constraint-conditioned spatial scene generation**.
It turns **music constraints + stem descriptors** into strict **Spatial9Scene JSON** for immersive audio pipelines such as IAMF, binaural, and bed-plus-object rendering workflows.
> **Status:** This repository is **training-ready and Hub-ready**.
> It includes code, schema, sample data, evaluation, and upload helpers.
> It does **not** include fine-tuned weights yet. After training, upload the contents of your `outputs/...` folder as the actual model repo.

Demo: **[https://spatial9.ai/demo](https://spatial9.ai/demo)**
## What you will find in this repo
- Instruction fine-tuning with **prompt masking**, so the loss is computed only on the target JSON and not on the instruction prefix.
- **LoRA** and **QLoRA** training paths for efficient fine-tuning on small-to-medium GPUs.
- Strict **JSON Schema** validation for production-safe outputs.
- Built-in **evaluation** for parse rate, schema-valid rate, object-budget pass rate, and anchor-rule pass rate.
- A **Hugging Face upload** helper built on `upload_folder`.
- Ready-made **sample data**, **sample scene**, and **recommended training config**.
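Prompt masking means the prompt tokens are still fed to the model as context but contribute nothing to the loss. A minimal sketch, assuming token IDs are already available as plain lists and using `-100` as the ignore value (the default `ignore_index` of PyTorch's `CrossEntropyLoss`, which Hugging Face causal-LM models use). The helper name `build_masked_example` is hypothetical and not taken from `train.py`.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def build_masked_example(
    prompt_ids: List[int],
    completion_ids: List[int],
    max_length: int = 2048,
) -> Dict[str, List[int]]:
    """Concatenate prompt + completion and mask the prompt out of the labels.

    Hypothetical helper; train.py may structure this differently.
    """
    input_ids = (prompt_ids + completion_ids)[:max_length]
    # Prompt positions get IGNORE_INDEX; completion positions keep their token IDs.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + completion_ids)[:max_length]
    attention_mask = [1] * len(input_ids)
    return {"input_ids": input_ids, "labels": labels, "attention_mask": attention_mask}


example = build_masked_example([101, 102, 103], [7, 8, 9, 2])
print(example["labels"])  # [-100, -100, -100, 7, 8, 9, 2]
```

Labels stay aligned with `input_ids`; Hugging Face causal-LM models shift them internally when computing the loss. Note that right-side truncation cuts the completion first, so a prompt longer than `max_length` would leave no supervised tokens at all.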
## Model contract
### Input
A structured payload describing:
- target format
- object budget
- style and section
- per-stem descriptors
- hard rules such as anchors, low-end centering, width targets, and masking constraints
### Output
A single valid JSON object matching `schemas/scene.schema.json`.
### Example input
```json
{
  "target_format": "iamf",
  "max_objects": 10,
  "style": "club",
  "section": "drop",
  "global": {"bpm": 128, "energy": 0.92},
  "stems": [
    {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
    {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
  ],
  "rules": [
    {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
    {"type": "mono_low_end", "hz_below": 120}
  ]
}
```
### Example output
```json
{
  "version": "1.0",
  "bed": {"layout": "iamf", "loudness_target_lufs": -14.0, "room_preset": "club_medium"},
  "objects": [
    {
      "id": "v1",
      "class": "lead_vocal",
      "az_deg": 0,
      "el_deg": 10,
      "dist_m": 1.6,
      "width": 0.15,
      "gain_db": 0.0,
      "reverb_send": 0.18,
      "early_reflections": 0.22,
      "motion": [
        {"t": 0.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"t": 1.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6}
      ]
    }
  ],
  "constraints_applied": [
    "anchor:lead_vocal@0/10/1.6",
    "mono_low_end<120Hz"
  ]
}
```
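The `constraints_applied` entries in the example use one compact tag per rule. If you build training targets programmatically, a helper along these lines keeps the tags consistent. This sketch only reproduces the two patterns visible in the example (`anchor:<class>@<az>/<el>/<dist>` and `mono_low_end<<hz>Hz`); the function name and the fallback for other rule types are assumptions, not part of the documented schema.

```python
from typing import Any, Dict, List


def _num(x: float) -> str:
    # Render 0 as "0" and 1.6 as "1.6", matching the example tags.
    return f"{x:g}"


def constraint_tags(rules: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical: map input rules to the tag strings seen in the example output."""
    tags = []
    for rule in rules:
        if rule["type"] == "anchor":
            tags.append(
                f"anchor:{rule['track_class']}@"
                f"{_num(rule['az_deg'])}/{_num(rule['el_deg'])}/{_num(rule['dist_m'])}"
            )
        elif rule["type"] == "mono_low_end":
            tags.append(f"mono_low_end<{_num(rule['hz_below'])}Hz")
        else:
            # Other rule types (width targets, masking) have no documented tag format.
            tags.append(rule["type"])
    return tags


rules = [
    {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
    {"type": "mono_low_end", "hz_below": 120},
]
print(constraint_tags(rules))  # ['anchor:lead_vocal@0/10/1.6', 'mono_low_end<120Hz']
```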
## Repository layout
```text
GravityLLM-HuggingFace-Repo/
├── README.md
├── LICENSE
├── Makefile
├── pyproject.toml
├── requirements.txt
├── train.py
├── infer.py
├── evaluate.py
├── upload_to_hub.py
├── assets/
│   └── gravityllm_banner.svg
├── configs/
│   └── recommended_train_args.json
├── data/
│   ├── train.jsonl
│   └── valid.jsonl
├── examples/
│   ├── sample_input.json
│   └── sample_output.json
├── schemas/
│   └── scene.schema.json
├── scripts/
│   ├── push_to_hub.sh
│   └── train_qlora.sh
└── tools/
    ├── make_synthetic_dataset.py
    └── validate_scene.py
```
## Quick start
### 1) Install
```bash
python -m pip install -r requirements.txt
```
### 2) Train with QLoRA
```bash
bash scripts/train_qlora.sh
```
Or run directly:
```bash
python train.py \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --train_file data/train.jsonl --valid_file data/valid.jsonl \
  --output_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --max_length 2048 --num_train_epochs 3 --learning_rate 2e-4 \
  --train_batch_size 1 --eval_batch_size 1 --gradient_accumulation_steps 16 \
  --warmup_ratio 0.03 --save_steps 100 --eval_steps 100 \
  --qlora --bf16
```
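These flags imply an effective batch of `train_batch_size × gradient_accumulation_steps` = 1 × 16 = 16 sequences per optimizer step on one GPU (multiply by the GPU count for data-parallel runs). A back-of-the-envelope check, assuming `train.py` uses a Trainer-style loop and a hypothetical 400-example dataset (the "few hundred examples" scale suggested in the data strategy section). Warmup is rounded up, as `transformers`' `TrainingArguments.get_warmup_steps` does; exact per-epoch rounding varies between library versions, but 400 divides evenly here.

```python
import math


def schedule(num_examples: int, batch_size: int, grad_accum: int,
             epochs: int, warmup_ratio: float, num_gpus: int = 1) -> dict:
    """Rough optimizer-step accounting for a Trainer-style loop (approximate sketch)."""
    effective_batch = batch_size * grad_accum * num_gpus
    batches_per_epoch = math.ceil(num_examples / (batch_size * num_gpus))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Values from the train.py command above; 400 examples is a hypothetical dataset size.
plan = schedule(num_examples=400, batch_size=1, grad_accum=16, epochs=3, warmup_ratio=0.03)
print(plan)
# {'effective_batch': 16, 'steps_per_epoch': 25, 'total_steps': 75, 'warmup_steps': 3}
```

At this scale the whole run is under 100 optimizer steps, so if `train.py` saves and evaluates on a step schedule, `--save_steps 100 --eval_steps 100` would never trigger mid-run; lowering both is worth considering for small datasets.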
### 3) Generate a scene
```bash
python infer.py --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 --input_json examples/sample_input.json --validate --output_json outputs/sample_prediction.json
```
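Even though the model is trained to emit only JSON, a small model can still wrap it in stray text. One defensive post-generation step is to decode the first complete JSON object with the standard library's `json.JSONDecoder.raw_decode`. This is a hypothetical helper, not necessarily what `infer.py` does.

```python
import json
from typing import Any, Dict, Optional


def extract_first_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object that decodes cleanly from `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj  # decoding from "{" always yields a dict
        except json.JSONDecodeError:
            # Not a complete object here; try the next opening brace.
            start = text.find("{", start + 1)
    return None


raw = 'Here is the scene:\n{"version": "1.0", "objects": []}\nDone.'
print(extract_first_json_object(raw))  # {'version': '1.0', 'objects': []}
```

For a truncated output this can return a nested fragment instead of the full scene, so the result still needs schema validation before use.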
### 4) Evaluate
```bash
python evaluate.py --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 --data_file data/valid.jsonl --report_path reports/eval_report.json
```
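Three of the four metrics listed in "What you will find in this repo" can be computed without the schema file. A minimal sketch, assuming each evaluation item pairs the input payload (with `max_objects` and `rules`, as in the example input) with the raw model output string. The anchor tolerance, the "every object of the anchored class" semantics, and the function names are assumptions; schema-valid rate is omitted because it depends on `schemas/scene.schema.json`, and renderer-side rules such as `mono_low_end` are not checked.

```python
import json
import math
from typing import Any, Dict, List, Tuple

POSITION_KEYS = ("az_deg", "el_deg", "dist_m")


def _close(value: Any, target: float, tol: float) -> bool:
    return isinstance(value, (int, float)) and math.isclose(value, target, abs_tol=tol)


def anchor_ok(scene: Dict[str, Any], rule: Dict[str, Any], tol: float = 1e-6) -> bool:
    """Assumed semantics: every object of the anchored class sits at the rule's position."""
    matches = [o for o in scene.get("objects", [])
               if isinstance(o, dict) and o.get("class") == rule["track_class"]]
    return bool(matches) and all(
        _close(o.get(k), rule[k], tol) for o in matches for k in POSITION_KEYS
    )


def evaluate(items: List[Tuple[Dict[str, Any], str]]) -> Dict[str, float]:
    """items: (input payload, raw model output) pairs. Rates are over all items."""
    parsed = budget_ok = anchors_ok = 0
    for payload, output in items:
        try:
            scene = json.loads(output)
        except json.JSONDecodeError:
            continue  # an unparseable output fails every check
        if not isinstance(scene, dict):
            continue
        parsed += 1
        objects = scene.get("objects", [])
        if isinstance(objects, list) and len(objects) <= payload["max_objects"]:
            budget_ok += 1
        anchor_rules = [r for r in payload.get("rules", []) if r.get("type") == "anchor"]
        # Items without anchor rules pass vacuously (all() of an empty list is True).
        if all(anchor_ok(scene, r) for r in anchor_rules):
            anchors_ok += 1
    n = max(len(items), 1)
    return {
        "parse_rate": parsed / n,
        "object_budget_pass_rate": budget_ok / n,
        "anchor_rule_pass_rate": anchors_ok / n,
    }


anchor = {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6}
payload = {"max_objects": 10, "rules": [anchor, {"type": "mono_low_end", "hz_below": 120}]}
on_anchor = json.dumps({"objects": [{"id": "v1", "class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6}]})
off_anchor = json.dumps({"objects": [{"id": "v1", "class": "lead_vocal", "az_deg": 30, "el_deg": 10, "dist_m": 1.6}]})
report = evaluate([(payload, on_anchor), (payload, off_anchor), (payload, "not json")])
print(report)  # parse 2/3, budget 2/3, anchor 1/3
```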
### 5) Validate any output
```bash
python tools/validate_scene.py schemas/scene.schema.json outputs/sample_prediction.json
```
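`tools/validate_scene.py` checks outputs against the real schema. For a quick dependency-free smoke test, a rough structural check like the one below can stand in. The required keys, numeric fields, and the ascending-`t` rule for motion keyframes are inferred from the example output only and are assumptions; `schemas/scene.schema.json` remains authoritative.

```python
import json
from typing import Any, List

# Assumed from the example output only; schemas/scene.schema.json is authoritative.
TOP_LEVEL_KEYS = ("version", "bed", "objects")
OBJECT_NUMERIC_KEYS = ("az_deg", "el_deg", "dist_m", "width", "gain_db")


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def structural_errors(scene: Any) -> List[str]:
    """Return human-readable problems; an empty list means the rough checks passed."""
    if not isinstance(scene, dict):
        return ["scene must be a JSON object"]
    errors = [f"missing top-level key: {k}" for k in TOP_LEVEL_KEYS if k not in scene]
    objects = scene.get("objects", [])
    if not isinstance(objects, list):
        return errors + ["objects must be an array"]
    seen_ids = set()
    for i, obj in enumerate(objects):
        if not isinstance(obj, dict):
            errors.append(f"objects[{i}] must be an object")
            continue
        obj_id = obj.get("id")
        if not isinstance(obj_id, str):
            errors.append(f"objects[{i}].id must be a string")
        elif obj_id in seen_ids:
            errors.append(f"objects[{i}]: duplicate id {obj_id!r}")
        else:
            seen_ids.add(obj_id)
        for key in OBJECT_NUMERIC_KEYS:
            if not _is_number(obj.get(key)):
                errors.append(f"objects[{i}].{key} must be a number")
        # Assumed rule: motion keyframe times "t" are numeric and ascending.
        times = [kf.get("t") for kf in obj.get("motion", []) if isinstance(kf, dict)]
        if not all(_is_number(t) for t in times) or times != sorted(times):
            errors.append(f"objects[{i}].motion keyframes need ascending numeric t")
    return errors


def check_file(path: str) -> List[str]:
    """Load a scene file and run the structural checks."""
    with open(path, encoding="utf-8") as f:
        return structural_errors(json.load(f))


scene = {
    "version": "1.0",
    "bed": {"layout": "iamf"},
    "objects": [{"id": "v1", "az_deg": 0, "el_deg": 10, "dist_m": 1.6,
                 "width": 0.15, "gain_db": 0.0,
                 "motion": [{"t": 0.0}, {"t": 1.0}]}],
}
print(structural_errors(scene))  # []
```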
## Push to the Hugging Face Hub
### From a trained output folder
```bash
python upload_to_hub.py --folder_path outputs/GravityLLM-Qwen2.5-1.5B-S9 --repo_id YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```
### Or with the helper script
```bash
bash scripts/push_to_hub.sh outputs/GravityLLM-Qwen2.5-1.5B-S9 YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```
## Dataset format
Training files are JSONL (one JSON object per line) with two fields per row. The example below is pretty-printed for readability:
```json
{
  "prompt": "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n{...}",
  "completion": "{... valid Spatial9Scene JSON ...}"
}
```
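A sketch of turning one input payload and one gold scene into a training row, assuming the prompt prefix from the dataset example is used verbatim and both JSON payloads are serialized compactly. The helper name `make_row` is hypothetical; `tools/make_synthetic_dataset.py` may format rows differently.

```python
import json
from typing import Any, Dict

PROMPT_PREFIX = (
    "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n"
)


def make_row(payload: Dict[str, Any], scene: Dict[str, Any]) -> str:
    """Return one JSONL line with `prompt` and `completion` fields (sketch)."""
    compact = {"separators": (",", ":"), "ensure_ascii": False}
    row = {
        "prompt": PROMPT_PREFIX + json.dumps(payload, **compact),
        "completion": json.dumps(scene, **compact),
    }
    # json.dumps escapes the "\n" in the prompt, so the row stays on one line.
    return json.dumps(row, ensure_ascii=False)


payload = {"target_format": "iamf", "max_objects": 10, "stems": [], "rules": []}
scene = {"version": "1.0", "objects": []}
line = make_row(payload, scene)
print(line)
```

Compact versus pretty-printed serialization is a free choice, but the model learns the exact formatting it sees, so inference prompts should be serialized the same way as the training prompts.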
The provided sample dataset is intentionally small. Replace it with your real production examples as soon as possible.
## Recommended data strategy
For a strong first release:
1. Collect a few hundred high-quality gold examples from expert-authored scenes.
2. Keep the schema stable and its numeric values quantized.
3. Encode hard rules explicitly instead of relying on vague prose.
4. Run evaluation after every fine-tune.
5. Add a post-processor to enforce hard constraints if the runtime must be deterministic.
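For item 5, a deterministic post-processor can overwrite the model's proposal wherever a hard rule applies. A minimal sketch under two assumed policies that are not documented here: an anchor pins every object of the anchored class (including its motion keyframes), and when the scene exceeds `max_objects`, objects whose input stems have the lowest `leadness` are dropped first.

```python
import copy
from typing import Any, Dict

POSITION_KEYS = ("az_deg", "el_deg", "dist_m")


def enforce_hard_rules(payload: Dict[str, Any], scene: Dict[str, Any]) -> Dict[str, Any]:
    """Return a corrected copy of `scene`; the input scene is left untouched."""
    fixed = copy.deepcopy(scene)
    objects = fixed.get("objects", [])

    # 1) Anchors: snap the static position and every motion keyframe to the rule.
    for rule in payload.get("rules", []):
        if rule.get("type") != "anchor":
            continue
        for obj in objects:
            if obj.get("class") == rule["track_class"]:
                for key in POSITION_KEYS:
                    obj[key] = rule[key]
                for keyframe in obj.get("motion", []):
                    for key in POSITION_KEYS:
                        keyframe[key] = rule[key]

    # 2) Object budget: keep the highest-leadness objects, preserving their order (assumed policy).
    leadness = {s["id"]: s.get("leadness", 0.0) for s in payload.get("stems", [])}
    budget = payload.get("max_objects", len(objects))
    if len(objects) > budget:
        ranked = sorted(objects, key=lambda o: leadness.get(o.get("id"), 0.0), reverse=True)
        keep = {id(o) for o in ranked[:budget]}
        objects[:] = [o for o in objects if id(o) in keep]
    return fixed


payload = {
    "max_objects": 1,
    "stems": [{"id": "v1", "leadness": 0.95}, {"id": "k1", "leadness": 0.25}],
    "rules": [{"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6}],
}
scene = {"objects": [
    {"id": "k1", "class": "kick", "az_deg": 0, "el_deg": 0, "dist_m": 2.0},
    {"id": "v1", "class": "lead_vocal", "az_deg": 15, "el_deg": 5, "dist_m": 2.0,
     "motion": [{"t": 0.0, "az_deg": 15, "el_deg": 5, "dist_m": 2.0}]},
]}
fixed = enforce_hard_rules(payload, scene)
print([o["id"] for o in fixed["objects"]])  # ['v1']
```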
## Suggested training roadmap
### v0
- Small curated dataset
- QLoRA adapter
- Schema-valid JSON only
- Anchor and budget constraints
### v1
- More genres and sections
- Better masking and width rules
- Object motion patterns
- Automatic validation and repair loop
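The validation and repair loop planned for v1 could take several forms. One simple version, sketched under assumptions: retry generation with the validator's error messages appended to the prompt, up to a fixed number of attempts. `generate` and `validate` are hypothetical callables standing in for the model call and the schema check, and the `PREVIOUS ERRORS:` prompt suffix is an invented format, so the loop runs without a model.

```python
import json
from typing import Callable, List, Optional, Tuple


def generate_with_repair(
    prompt: str,
    generate: Callable[[str], str],
    validate: Callable[[dict], List[str]],
    max_attempts: int = 3,
) -> Tuple[Optional[dict], int]:
    """Return (scene, attempts_used); scene is None if every attempt failed."""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        raw = generate(current_prompt)
        try:
            scene = json.loads(raw)
            errors = validate(scene) if isinstance(scene, dict) else ["output is not a JSON object"]
        except json.JSONDecodeError as exc:
            scene, errors = None, [f"invalid JSON: {exc.msg}"]
        if not errors:
            return scene, attempt
        # Feed the errors back so the next attempt can correct them (hypothetical format).
        current_prompt = prompt + "\n\nPREVIOUS ERRORS:\n" + "\n".join(errors)
    return None, max_attempts


# Stand-ins: the first "generation" is truncated JSON, the second is valid.
_outputs = iter(['{"objects": [', '{"version": "1.0", "objects": []}'])


def fake_generate(_prompt: str) -> str:
    return next(_outputs)


def fake_validate(scene: dict) -> List[str]:
    return [] if "version" in scene else ["missing version"]


result = generate_with_repair("INPUT: {...}", fake_generate, fake_validate)
print(result)  # ({'version': '1.0', 'objects': []}, 2)
```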
### v2
- Preference tuning on human A/B judgments
- A dedicated reward signal for clarity, masking avoidance, and translation safety
## Intended use
GravityLLM is designed for:
- music-tech pipelines
- Spatial9 scene authoring
- assisted immersive-audio layout generation
- IAMF-ready authoring workflows
- renderer-side JSON generation
## Limitations
- This repo does not include trained weights out of the box.
- The model only knows what you teach it through your dataset.
- Raw audio is not consumed directly here; the training pipeline expects structured stem features.
- Production systems should still validate outputs and optionally apply a rule-based correction pass.
## Safety and reliability
- Always validate generated scenes against the JSON schema.
- Keep low-end centering as a hard rule outside the model if that is non-negotiable.
- Treat the model as a scene proposal engine, not an oracle.
## License
This repository is released under Apache-2.0.