---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- gravityllm
- spatial-audio
- immersive-audio
- spatial9
- iamf
- instruction-tuning
- json
- lora
- qlora
- peft
- transformers
widget:
- text: |-
    INPUT:
    {
      "target_format": "iamf",
      "max_objects": 10,
      "style": "club",
      "section": "drop",
      "global": {"bpm": 128, "energy": 0.92},
      "stems": [
        {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
        {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
      ],
      "rules": [
        {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"type": "mono_low_end", "hz_below": 120}
      ]
    }
---

# GravityLLM

GravityLLM is a compact instruction-tuned model for **constraint-conditioned spatial scene generation**.
It turns **music constraints and stem descriptors** into strict **Spatial9Scene JSON** for immersive-audio pipelines such as IAMF, binaural, and bed-plus-object rendering workflows.

> **Status**
> This repository is **training-ready and Hub-ready**: it includes code, schema, sample data, evaluation, and upload helpers.
> It does **not** include fine-tuned weights yet. After training, upload the contents of your `outputs/...` folder as the actual model repo.

Demo at **[https://spatial9.ai/demo](https://spatial9.ai/demo)**

## What you will find in this repo

- Proper instruction fine-tuning with **prompt masking**, so the loss is applied to the target JSON rather than the instruction prefix.
- **LoRA** and **QLoRA** training paths for efficient fine-tuning on small-to-medium GPUs.
- Strict **JSON Schema** validation for production-safe outputs.
- Built-in **evaluation** for parse rate, schema-valid rate, object-budget pass rate, and anchor-rule pass rate.
- A clean **Hugging Face upload** helper built on `upload_folder`.
- Ready-made **sample data**, a **sample scene**, and a **recommended training config**.
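The prompt-masking idea can be sketched in a few lines. This is an illustrative, stdlib-only outline, not the actual collator in `train.py` (which works on real tokenizer output); `-100` is the PyTorch convention for label positions excluded from the cross-entropy loss.

```python
IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss skips targets with this value

def build_masked_example(prompt_ids, completion_ids):
    """Concatenate prompt and completion token ids, masking the prompt so
    the loss is computed only on the target JSON tokens."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels

# Toy token ids, just to show the shape of the labels:
input_ids, labels = build_masked_example([11, 22, 33], [44, 55])
print(labels)  # [-100, -100, -100, 44, 55]
```

Without this mask, the model would also be trained to reproduce the instruction prefix, wasting capacity on text it never needs to generate.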

## Model contract

### Input
A structured payload describing:

- target format
- object budget
- style and section
- per-stem descriptors
- hard rules such as anchors, low-end centering, width targets, and masking constraints

### Output
A single valid JSON object matching `schemas/scene.schema.json`.

### Example input
```json
{
  "target_format": "iamf",
  "max_objects": 10,
  "style": "club",
  "section": "drop",
  "global": {"bpm": 128, "energy": 0.92},
  "stems": [
    {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
    {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
  ],
  "rules": [
    {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
    {"type": "mono_low_end", "hz_below": 120}
  ]
}
```

### Example output
```json
{
  "version": "1.0",
  "bed": {"layout": "iamf", "loudness_target_lufs": -14.0, "room_preset": "club_medium"},
  "objects": [
    {
      "id": "v1",
      "class": "lead_vocal",
      "az_deg": 0,
      "el_deg": 10,
      "dist_m": 1.6,
      "width": 0.15,
      "gain_db": 0.0,
      "reverb_send": 0.18,
      "early_reflections": 0.22,
      "motion": [
        {"t": 0.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"t": 1.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6}
      ]
    }
  ],
  "constraints_applied": [
    "anchor:lead_vocal@0/10/1.6",
    "mono_low_end<120Hz"
  ]
}
```
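Before wiring outputs into a renderer, it helps to sanity-check a generated scene. A minimal stdlib sketch of the kind of pre-checks that precede full JSON Schema validation in `tools/validate_scene.py`; the specific bounds below are illustrative assumptions, not the schema's authoritative limits.

```python
import json

def basic_scene_checks(scene_text, max_objects=10):
    """Cheap pre-checks before full JSON Schema validation."""
    scene = json.loads(scene_text)  # raises ValueError on malformed JSON
    objects = scene.get("objects", [])
    if len(objects) > max_objects:
        raise ValueError("object budget exceeded")
    for obj in objects:
        if not -180 <= obj["az_deg"] <= 180:
            raise ValueError(f"azimuth out of range for {obj['id']}")
        if not -90 <= obj["el_deg"] <= 90:
            raise ValueError(f"elevation out of range for {obj['id']}")
    return scene

scene = basic_scene_checks('{"objects": [{"id": "v1", "az_deg": 0, "el_deg": 10}]}')
```

Checks like these fail fast and give clearer errors than a raw schema trace, but they complement the schema rather than replace it.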

## Repository layout

```text
GravityLLM-HuggingFace-Repo/
├── README.md
├── LICENSE
├── Makefile
├── pyproject.toml
├── requirements.txt
├── train.py
├── infer.py
├── evaluate.py
├── upload_to_hub.py
├── assets/
│   └── gravityllm_banner.svg
├── configs/
│   └── recommended_train_args.json
├── data/
│   ├── train.jsonl
│   └── valid.jsonl
├── examples/
│   ├── sample_input.json
│   └── sample_output.json
├── schemas/
│   └── scene.schema.json
├── scripts/
│   ├── push_to_hub.sh
│   └── train_qlora.sh
└── tools/
    ├── make_synthetic_dataset.py
    └── validate_scene.py
```

## Quick start

### 1) Install
```bash
python -m pip install -r requirements.txt
```

### 2) Train with QLoRA
```bash
bash scripts/train_qlora.sh
```

Or run directly:

```bash
python train.py \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --train_file data/train.jsonl \
  --valid_file data/valid.jsonl \
  --output_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --max_length 2048 \
  --num_train_epochs 3 \
  --learning_rate 2e-4 \
  --train_batch_size 1 \
  --eval_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --warmup_ratio 0.03 \
  --save_steps 100 \
  --eval_steps 100 \
  --qlora \
  --bf16
```

### 3) Generate a scene
```bash
python infer.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --input_json examples/sample_input.json \
  --validate \
  --output_json outputs/sample_prediction.json
```

### 4) Evaluate
```bash
python evaluate.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --data_file data/valid.jsonl \
  --report_path reports/eval_report.json
```
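The parse-rate metric reported by `evaluate.py` boils down to counting generations that are well-formed JSON objects. An illustrative stdlib version (the exact implementation in the repo may differ):

```python
import json

def parse_rate(generations):
    """Fraction of generated texts that parse as a JSON object."""
    ok = 0
    for text in generations:
        try:
            if isinstance(json.loads(text), dict):
                ok += 1
        except json.JSONDecodeError:
            pass
    return ok / max(len(generations), 1)

print(parse_rate(['{"version": "1.0"}', 'not json', '[1, 2]']))  # 1 of 3 parses as an object
```

Schema-valid rate, budget pass rate, and anchor pass rate follow the same pattern, each adding a stricter check on top of a successful parse.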

### 5) Validate any output
```bash
python tools/validate_scene.py schemas/scene.schema.json outputs/sample_prediction.json
```

## Push to the Hugging Face Hub

### From a trained output folder
```bash
python upload_to_hub.py \
  --folder_path outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --repo_id YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```

### Or with the helper script
```bash
bash scripts/push_to_hub.sh outputs/GravityLLM-Qwen2.5-1.5B-S9 YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```

## Dataset format

Training files are JSONL with two fields per row:

```json
{
  "prompt": "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n{...}",
  "completion": "{... valid Spatial9Scene JSON ...}"
}
```
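Rows can be produced programmatically from an input payload and its gold scene. A minimal sketch assuming the instruction prefix shown above; the helper name `make_row` is hypothetical:

```python
import json

INSTRUCTION = ("GravityLLM: Output ONLY valid JSON matching the "
               "Spatial9Scene schema.\n\nINPUT:\n")

def make_row(input_payload, gold_scene):
    """Build one JSONL training row from a payload and its gold scene."""
    return {
        "prompt": INSTRUCTION + json.dumps(input_payload),
        "completion": json.dumps(gold_scene),
    }

row = make_row({"max_objects": 10}, {"version": "1.0", "objects": []})
print(json.dumps(row))  # one line, ready to append to data/train.jsonl
```

Serializing both fields with `json.dumps` keeps completions byte-for-byte parseable, which is what the parse-rate and schema-valid metrics reward.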

The provided sample dataset is intentionally small. Replace it with your real production examples as soon as possible.

## Recommended data strategy

For a strong first release:

1. Collect a few hundred high-quality gold examples from expert-authored scenes.
2. Keep the schema stable and its numeric fields consistently quantized.
3. Encode hard rules explicitly instead of relying on vague prose.
4. Run evaluation after every fine-tune.
5. Add a post-processor to enforce hard constraints if the runtime must be deterministic.
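Such a post-processor can be as simple as overwriting fields that a hard rule fully determines. A sketch for the anchor rule (illustrative; the function name is hypothetical, and the rule shape mirrors the example input above):

```python
def enforce_anchor(scene, rule):
    """Force the anchored class to the exact position the rule demands,
    regardless of what the model generated."""
    for obj in scene.get("objects", []):
        if obj.get("class") == rule["track_class"]:
            obj["az_deg"] = rule["az_deg"]
            obj["el_deg"] = rule["el_deg"]
            obj["dist_m"] = rule["dist_m"]
    return scene

rule = {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6}
scene = enforce_anchor({"objects": [{"class": "lead_vocal", "az_deg": 5}]}, rule)
print(scene["objects"][0]["az_deg"])  # 0
```

With a pass like this in place, the model's job shifts to getting the soft decisions right (width, sends, motion), while hard rules are guaranteed by construction.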

## Suggested training roadmap

### v0
- Small curated dataset
- QLoRA adapter
- Schema-valid JSON only
- Anchor and budget constraints

### v1
- More genres and sections
- Better masking and width rules
- Object motion patterns
- Automatic validation and repair loop

### v2
- Preference tuning on human A/B judgments
- A dedicated reward signal for clarity, masking avoidance, and translation safety

## Intended use

GravityLLM is designed for:

- music-tech pipelines
- Spatial9 scene authoring
- assisted immersive-audio layout generation
- IAMF-ready authoring workflows
- renderer-side JSON generation

## Limitations

- This repo does not include trained weights out of the box.
- The model only knows what you teach it through your dataset.
- Raw audio is not consumed directly here; the training pipeline expects structured stem features.
- Production systems should still validate outputs and optionally apply a rule-based correction pass.

## Safety and reliability

- Always validate generated scenes against the JSON schema.
- Keep low-end centering as a hard rule outside the model if that is non-negotiable.
- Treat the model as a scene proposal engine, not an oracle.
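A propose-and-validate loop keeps the model firmly in that proposal role. A stdlib sketch, where `generate_fn` and `validate_fn` are placeholders for your generator and validator:

```python
import json

def generate_valid_scene(generate_fn, validate_fn, max_tries=3):
    """Regenerate until the validator accepts, then return the scene."""
    last_error = None
    for _ in range(max_tries):
        text = generate_fn()
        try:
            return validate_fn(text)
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            last_error = err
    raise RuntimeError(f"no schema-valid scene after {max_tries} tries: {last_error}")

# Stub usage: a generator whose second attempt is valid JSON.
attempts = iter(["not json", '{"version": "1.0"}'])
scene = generate_valid_scene(lambda: next(attempts), json.loads)
```

In production the validator would run the full schema check, and a rule-based repair pass can often fix a near-miss more cheaply than a regeneration.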

## License

This repository is released under Apache-2.0.