---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- gravityllm
- spatial-audio
- immersive-audio
- spatial9
- iamf
- instruction-tuning
- json
- lora
- qlora
- peft
- transformers
widget:
- text: |-
    INPUT:
    {
      "target_format": "iamf",
      "max_objects": 10,
      "style": "club",
      "section": "drop",
      "global": {"bpm": 128, "energy": 0.92},
      "stems": [
        {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
        {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
      ],
      "rules": [
        {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"type": "mono_low_end", "hz_below": 120}
      ]
    }
---

![GravityLLM banner](assets/heads9.png)

# GravityLLM

GravityLLM is a compact instruction-tuned model for **constraint-conditioned spatial scene generation**.

It turns **music constraints + stem descriptors** into strict **Spatial9Scene JSON** for immersive audio pipelines such as IAMF, binaural, and bed-plus-object rendering workflows.

> **Status**
> This repository is **training-ready and Hub-ready**.
> This includes code, schema, sample data, evaluation, and upload helpers.
> It does **not** include fine-tuned weights yet. After training, upload the contents of your `outputs/...` folder as the actual model repo.

Demo at **[https://spatial9.ai/demo](https://spatial9.ai/demo)**

## What you will find in this repo

- Proper instruction fine-tuning with **prompt masking**, so the loss is applied to the target JSON instead of the instruction prefix.
- **LoRA** and **QLoRA** training paths for efficient fine-tuning on small-to-medium GPUs.
- Strict **JSON Schema** validation for production-safe outputs.
- Built-in **evaluation** for parse rate, schema-valid rate, object-budget pass rate, and anchor-rule pass rate.
- Clean **Hugging Face upload** helper with `upload_folder`.
- Ready-made **sample data**, **sample scene**, and **recommended training config**.

## Model contract

### Input

A structured payload describing:

- target format
- object budget
- style and section
- per-stem descriptors
- hard rules such as anchors, low-end centering, width targets, and masking constraints

### Output

A single valid JSON object matching `schemas/scene.schema.json`.

### Example input

```json
{
  "target_format": "iamf",
  "max_objects": 10,
  "style": "club",
  "section": "drop",
  "global": {"bpm": 128, "energy": 0.92},
  "stems": [
    {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
    {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
  ],
  "rules": [
    {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
    {"type": "mono_low_end", "hz_below": 120}
  ]
}
```

### Example output

```json
{
  "version": "1.0",
  "bed": {"layout": "iamf", "loudness_target_lufs": -14.0, "room_preset": "club_medium"},
  "objects": [
    {
      "id": "v1",
      "class": "lead_vocal",
      "az_deg": 0,
      "el_deg": 10,
      "dist_m": 1.6,
      "width": 0.15,
      "gain_db": 0.0,
      "reverb_send": 0.18,
      "early_reflections": 0.22,
      "motion": [
        {"t": 0.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"t": 1.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6}
      ]
    }
  ],
  "constraints_applied": [
    "anchor:lead_vocal@0/10/1.6",
    "mono_low_end<120Hz"
  ]
}
```

## Repository layout

```text
GravityLLM-HuggingFace-Repo/
├── README.md
├── LICENSE
├── Makefile
├── pyproject.toml
├── requirements.txt
├── train.py
├── infer.py
├── evaluate.py
├── upload_to_hub.py
├── assets/
│   └── gravityllm_banner.svg
├── configs/
│   └── recommended_train_args.json
├── data/
│   ├── train.jsonl
│   └── valid.jsonl
├── examples/
│   ├── sample_input.json
│   └── sample_output.json
├── schemas/
│   └── scene.schema.json
├── scripts/
│   ├── push_to_hub.sh
│   └── train_qlora.sh
└── tools/
    ├── make_synthetic_dataset.py
    └── validate_scene.py
```

## Quick start

### 1) Install

```bash
python -m pip install -r requirements.txt
```

### 2) Train with QLoRA

```bash
bash scripts/train_qlora.sh
```

Or run directly:

```bash
python train.py \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --train_file data/train.jsonl \
  --valid_file data/valid.jsonl \
  --output_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --max_length 2048 \
  --num_train_epochs 3 \
  --learning_rate 2e-4 \
  --train_batch_size 1 \
  --eval_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --warmup_ratio 0.03 \
  --save_steps 100 \
  --eval_steps 100 \
  --qlora \
  --bf16
```

### 3) Generate a scene

```bash
python infer.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --input_json examples/sample_input.json \
  --validate \
  --output_json outputs/sample_prediction.json
```

### 4) Evaluate

```bash
python evaluate.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --data_file data/valid.jsonl \
  --report_path reports/eval_report.json
```

### 5) Validate any output

```bash
python tools/validate_scene.py schemas/scene.schema.json outputs/sample_prediction.json
```

## Push to the Hugging Face Hub

### From a trained output folder

```bash
python upload_to_hub.py \
  --folder_path outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --repo_id YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```

### Or with the helper script

```bash
bash scripts/push_to_hub.sh outputs/GravityLLM-Qwen2.5-1.5B-S9 YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```

## Dataset format

Training files are JSONL with two fields per row:

```json
{
  "prompt": "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n{...}",
  "completion": "{... valid Spatial9Scene JSON ...}"
}
```

The provided sample dataset is intentionally small. Replace it with your real production examples as soon as possible.

## Recommended data strategy

For a strong first release:

1. Collect a few hundred high-quality gold examples from expert-authored scenes.
2. Keep the schema stable and quantized.
3. Encode hard rules explicitly instead of relying on vague prose.
4. Run evaluation after every fine-tune.
5. Add a post-processor to enforce hard constraints if the runtime must be deterministic.

## Suggested training roadmap

### v0

- Small curated dataset
- QLoRA adapter
- Schema-valid JSON only
- Anchor and budget constraints

### v1

- More genres and sections
- Better masking and width rules
- Object motion patterns
- Automatic validation and repair loop

### v2

- Preference tuning on human A/B judgments
- A dedicated reward signal for clarity, masking avoidance, and translation safety

## Intended use

GravityLLM is designed for:

- music-tech pipelines
- Spatial9 scene authoring
- assisted immersive-audio layout generation
- IAMF-ready authoring workflows
- renderer-side JSON generation

## Limitations

- This repo does not include trained weights out of the box.
- The model only knows what you teach it through your dataset.
- Raw audio is not consumed directly here; the training pipeline expects structured stem features.
- Production systems should still validate outputs and optionally apply a rule-based correction pass.

## Safety and reliability

- Always validate generated scenes against the JSON schema.
- Keep low-end centering as a hard rule outside the model if that is non-negotiable.
- Treat the model as a scene proposal engine, not an oracle.

## License

This repository is released under Apache-2.0.
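
The rule-based correction pass mentioned under Limitations and Safety and reliability could look roughly like the sketch below. It is not part of this repository: the function names `enforce_anchors` and `budget_ok` are hypothetical, the field names are taken from the example input and output in this README, and pinning motion keyframes to the anchor position is an assumption about what an anchor rule should mean.

```python
import copy

POSITION_KEYS = ("az_deg", "el_deg", "dist_m")


def enforce_anchors(scene: dict, request: dict) -> dict:
    """Return a copy of `scene` with every anchor rule in `request` applied."""
    fixed = copy.deepcopy(scene)  # leave the model output untouched
    anchors = {
        rule["track_class"]: rule
        for rule in request.get("rules", [])
        if rule.get("type") == "anchor"
    }
    for obj in fixed.get("objects", []):
        rule = anchors.get(obj.get("class"))
        if rule is None:
            continue  # no anchor rule for this object class
        for key in POSITION_KEYS:
            obj[key] = rule[key]
        # Assumption: an anchored object must not move, so keyframes are pinned too.
        for frame in obj.get("motion", []):
            for key in POSITION_KEYS:
                frame[key] = rule[key]
    return fixed


def budget_ok(scene: dict, request: dict) -> bool:
    """Check the object budget; which objects to drop is left to the caller."""
    return len(scene.get("objects", [])) <= request["max_objects"]


# Illustrative data: a model output that drifted away from the requested anchor.
request = {
    "max_objects": 10,
    "rules": [
        {"type": "anchor", "track_class": "lead_vocal",
         "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
    ],
}
scene = {
    "objects": [
        {"id": "v1", "class": "lead_vocal", "az_deg": 25, "el_deg": 0, "dist_m": 2.0,
         "motion": [{"t": 0.0, "az_deg": 25, "el_deg": 0, "dist_m": 2.0}]},
    ],
}
fixed = enforce_anchors(scene, request)
```

Running a pass like this after schema validation keeps anchor placement deterministic regardless of what the model proposes. Rules such as `mono_low_end` act on the rendered signal, not on object coordinates, so they would need to be enforced in the renderer and are not covered here.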