---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- gravityllm
- spatial-audio
- immersive-audio
- spatial9
- iamf
- instruction-tuning
- json
- lora
- qlora
- peft
- transformers
widget:
- text: |-
    INPUT:
    {
      "target_format": "iamf",
      "max_objects": 10,
      "style": "club",
      "section": "drop",
      "global": {"bpm": 128, "energy": 0.92},
      "stems": [
        {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
        {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
      ],
      "rules": [
        {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"type": "mono_low_end", "hz_below": 120}
      ]
    }
---
![GravityLLM banner](assets/gravityllm_banner.svg)
# GravityLLM
GravityLLM is a compact instruction-tuned model for **constraint-conditioned spatial scene generation**.
It turns **music constraints + stem descriptors** into strict **Spatial9Scene JSON** for immersive audio pipelines such as IAMF, binaural, and bed-plus-object rendering workflows.
> **Status**
> This repository is **training-ready and Hub-ready**: it includes code, schema, sample data, evaluation, and upload helpers.
> It does **not** include fine-tuned weights yet. After training, upload the contents of your `outputs/...` folder as the actual model repo.
Demo at **[https://spatial9.ai/demo](https://spatial9.ai/demo)**
## What you will find in this repo
- Proper instruction fine-tuning with **prompt masking**, so the loss is applied to the target JSON instead of the instruction prefix.
- **LoRA** and **QLoRA** training paths for efficient fine-tuning on small-to-medium GPUs.
- Strict **JSON Schema** validation for production-safe outputs.
- Built-in **evaluation** for parse rate, schema-valid rate, object-budget pass rate, and anchor-rule pass rate.
- Clean **Hugging Face upload** helper with `upload_folder`.
- Ready-made **sample data**, **sample scene**, and **recommended training config**.
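The prompt-masking idea above can be sketched in a few lines: prompt tokens receive the ignore index (`-100`) as their label, so cross-entropy is computed only on the completion (the target JSON). A toy illustration with fake token ids, assuming the usual causal-LM convention; this is not the repo's actual `train.py`:

```python
def build_labels(prompt_ids, completion_ids, ignore_index=-100):
    """Concatenate prompt and completion token ids, masking the prompt
    so the loss is computed only on the target JSON tokens."""
    input_ids = prompt_ids + completion_ids
    labels = [ignore_index] * len(prompt_ids) + completion_ids
    return input_ids, labels

# Toy example with fake token ids:
input_ids, labels = build_labels([11, 12, 13], [21, 22])
# input_ids == [11, 12, 13, 21, 22]
# labels    == [-100, -100, -100, 21, 22]
```

Frameworks such as `transformers` skip positions whose label equals `-100` when computing the loss, which is what makes this masking trick work.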
## Model contract
### Input
A structured payload describing:
- target format
- object budget
- style and section
- per-stem descriptors
- hard rules such as anchors, low-end centering, width targets, and masking constraints
### Output
A single valid JSON object matching `schemas/scene.schema.json`.
### Example input
```json
{
  "target_format": "iamf",
  "max_objects": 10,
  "style": "club",
  "section": "drop",
  "global": {"bpm": 128, "energy": 0.92},
  "stems": [
    {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
    {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
  ],
  "rules": [
    {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
    {"type": "mono_low_end", "hz_below": 120}
  ]
}
```
### Example output
```json
{
  "version": "1.0",
  "bed": {"layout": "iamf", "loudness_target_lufs": -14.0, "room_preset": "club_medium"},
  "objects": [
    {
      "id": "v1",
      "class": "lead_vocal",
      "az_deg": 0,
      "el_deg": 10,
      "dist_m": 1.6,
      "width": 0.15,
      "gain_db": 0.0,
      "reverb_send": 0.18,
      "early_reflections": 0.22,
      "motion": [
        {"t": 0.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"t": 1.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6}
      ]
    }
  ],
  "constraints_applied": [
    "anchor:lead_vocal@0/10/1.6",
    "mono_low_end<120Hz"
  ]
}
```
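Before running full validation against `schemas/scene.schema.json`, a few cheap structural checks catch most failures; they mirror the evaluation metrics listed above (parse, required keys, object budget). A stdlib sketch with a hypothetical helper name, not the repo's actual validator:

```python
import json

def basic_scene_checks(scene, max_objects):
    """Cheap structural checks to run before full JSON Schema validation:
    required top-level keys and the object budget."""
    errors = []
    for key in ("version", "bed", "objects"):
        if key not in scene:
            errors.append(f"missing key: {key}")
    if len(scene.get("objects", [])) > max_objects:
        errors.append("object budget exceeded")
    return errors

scene = json.loads('{"version": "1.0", "bed": {}, "objects": []}')
print(basic_scene_checks(scene, max_objects=10))  # []
```

Checks like these are fast enough to run on every generation; full schema validation then only has to confirm field types and ranges.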
## Repository layout
```text
GravityLLM-HuggingFace-Repo/
├── README.md
├── LICENSE
├── Makefile
├── pyproject.toml
├── requirements.txt
├── train.py
├── infer.py
├── evaluate.py
├── upload_to_hub.py
├── assets/
│   └── gravityllm_banner.svg
├── configs/
│   └── recommended_train_args.json
├── data/
│   ├── train.jsonl
│   └── valid.jsonl
├── examples/
│   ├── sample_input.json
│   └── sample_output.json
├── schemas/
│   └── scene.schema.json
├── scripts/
│   ├── push_to_hub.sh
│   └── train_qlora.sh
└── tools/
    ├── make_synthetic_dataset.py
    └── validate_scene.py
```
## Quick start
### 1) Install
```bash
python -m pip install -r requirements.txt
```
### 2) Train with QLoRA
```bash
bash scripts/train_qlora.sh
```
Or run directly:
```bash
python train.py \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --train_file data/train.jsonl \
  --valid_file data/valid.jsonl \
  --output_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --max_length 2048 \
  --num_train_epochs 3 \
  --learning_rate 2e-4 \
  --train_batch_size 1 \
  --eval_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --warmup_ratio 0.03 \
  --save_steps 100 \
  --eval_steps 100 \
  --qlora --bf16
```
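The flags above presumably mirror `configs/recommended_train_args.json`. A plausible fragment of that file, reflecting the same hyperparameters (the key names are assumptions, not the repo's actual file):

```json
{
  "model": "Qwen/Qwen2.5-1.5B-Instruct",
  "max_length": 2048,
  "num_train_epochs": 3,
  "learning_rate": 0.0002,
  "train_batch_size": 1,
  "eval_batch_size": 1,
  "gradient_accumulation_steps": 16,
  "warmup_ratio": 0.03,
  "save_steps": 100,
  "eval_steps": 100,
  "qlora": true,
  "bf16": true
}
```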
### 3) Generate a scene
```bash
python infer.py --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 --input_json examples/sample_input.json --validate --output_json outputs/sample_prediction.json
```
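Before `--validate` can do anything, `infer.py` has to pull a single JSON object out of the model's raw text. A stdlib sketch of one way to do that with a hypothetical helper, not the repo's actual code; note the naive brace counter does not handle braces inside string values:

```python
import json

def extract_first_json(text):
    """Find the first balanced {...} span in raw model output and parse it.
    Caveat: braces inside JSON string values would confuse this counter."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return json.loads(text[start:i + 1])
    raise ValueError("unbalanced JSON object")

print(extract_first_json('Here is the scene: {"version": "1.0"} done'))
# {'version': '1.0'}
```

Constrained decoding or a JSON grammar would be more robust, but extraction plus schema validation is a reasonable baseline.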
### 4) Evaluate
```bash
python evaluate.py --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 --data_file data/valid.jsonl --report_path reports/eval_report.json
```
### 5) Validate any output
```bash
python tools/validate_scene.py schemas/scene.schema.json outputs/sample_prediction.json
```
## Push to the Hugging Face Hub
### From a trained output folder
```bash
python upload_to_hub.py --folder_path outputs/GravityLLM-Qwen2.5-1.5B-S9 --repo_id YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```
### Or with the helper script
```bash
bash scripts/push_to_hub.sh outputs/GravityLLM-Qwen2.5-1.5B-S9 YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```
## Dataset format
Training files are JSONL with two fields per row:
```json
{
  "prompt": "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n{...}",
  "completion": "{... valid Spatial9Scene JSON ...}"
}
```
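Rows with that shape can be produced mechanically from input/output pairs. A stdlib sketch, where `make_row` is a hypothetical helper and the prompt prefix follows the example above:

```python
import json

PROMPT_TEMPLATE = ("GravityLLM: Output ONLY valid JSON matching the "
                   "Spatial9Scene schema.\n\nINPUT:\n{payload}")

def make_row(input_payload, target_scene):
    """Serialize one (prompt, completion) training pair as a JSONL line."""
    row = {
        "prompt": PROMPT_TEMPLATE.format(payload=json.dumps(input_payload)),
        "completion": json.dumps(target_scene),
    }
    return json.dumps(row)

# One line of train.jsonl:
line = make_row({"target_format": "iamf"}, {"version": "1.0"})
```

Keeping the prompt template in one place like this ensures training and inference see byte-identical instruction prefixes, which matters for a model trained with prompt masking.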
The provided sample dataset is intentionally small. Replace it with your real production examples as soon as possible.
## Recommended data strategy
For a strong first release:
1. Collect a few hundred high-quality gold examples from expert-authored scenes.
2. Keep the schema stable, and quantize continuous values (angles, gains, widths) to fixed steps so targets are easier to learn.
3. Encode hard rules explicitly instead of relying on vague prose.
4. Run evaluation after every fine-tune.
5. Add a post-processor to enforce hard constraints if the runtime must be deterministic.
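Point 5 can be as simple as overwriting fields after generation. A sketch for the anchor rule, using the field names from the example scene above (`enforce_anchor` is a hypothetical helper, not shipped in this repo):

```python
def enforce_anchor(scene, rule):
    """Overwrite the anchored object's position so the anchor rule always
    holds, regardless of what the model generated."""
    for obj in scene.get("objects", []):
        if obj.get("class") == rule["track_class"]:
            obj["az_deg"] = rule["az_deg"]
            obj["el_deg"] = rule["el_deg"]
            obj["dist_m"] = rule["dist_m"]
    return scene

rule = {"type": "anchor", "track_class": "lead_vocal",
        "az_deg": 0, "el_deg": 10, "dist_m": 1.6}
scene = {"objects": [{"class": "lead_vocal",
                      "az_deg": 35, "el_deg": 0, "dist_m": 2.0}]}
fixed = enforce_anchor(scene, rule)  # lead vocal snapped back to 0/10/1.6
```

With a pass like this in the runtime, the model only needs to be *usually* right about hard constraints; determinism comes from the post-processor, not from sampling.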
## Suggested training roadmap
### v0
- Small curated dataset
- QLoRA adapter
- Schema-valid JSON only
- Anchor and budget constraints
### v1
- More genres and sections
- Better masking and width rules
- Object motion patterns
- Automatic validation and repair loop
### v2
- Preference tuning on human A/B judgments
- A dedicated reward signal for clarity, masking avoidance, and translation safety
## Intended use
GravityLLM is designed for:
- music-tech pipelines
- Spatial9 scene authoring
- assisted immersive-audio layout generation
- IAMF-ready authoring workflows
- renderer-side JSON generation
## Limitations
- This repo does not include trained weights out of the box.
- The model only knows what you teach it through your dataset.
- Raw audio is not consumed directly here; the training pipeline expects structured stem features.
- Production systems should still validate outputs and optionally apply a rule-based correction pass.
## Safety and reliability
- Always validate generated scenes against the JSON schema.
- Keep low-end centering as a hard rule outside the model if that is non-negotiable.
- Treat the model as a scene proposal engine, not an oracle.
## License
This repository is released under Apache-2.0.