Jordan Painter
jordanpainter
Recent Activity
Updated the DialLM Datasets collection about 7 hours ago
DialLM GRPO
Group Relative Policy Optimization fine-tunes for DialLM across Gemma, Llama, and Qwen models, covering all dialect variants.
DialLM SFT
DialLM SFT checkpoints across Gemma, Llama & Qwen for Australian, Northern British, Indian, and all-dialect conditions. Pre-RL alignment.
DialLM GSPO 🐙
DialLM GSPO checkpoints across Gemma, Llama & Qwen for Australian, Northern British, Indian, and all-dialect conditions. Post-SFT RL.
DialLM CPT 🌍
Continual pre-training checkpoints using ICE for DialLM across Gemma, Llama, and Qwen base models.
DialLM DPO
DialLM DPO checkpoints across Gemma, Llama & Qwen for Australian, Northern British, Indian, and all-dialect conditions. Post-SFT preference alignment.
DialLM Datasets