Add DomainTransformerForCausalLM: GPT-style NoPE model with SDPA attention, weight tying, and HF Trainer compatibility

Commit 0dec8e4 (verified), committed by rtferraz 8 days ago
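For readers unfamiliar with the terms in the commit title, the following is a minimal sketch of what a GPT-style model with no positional embeddings (NoPE), SDPA attention, and tied input/output embeddings might look like. It is not the code from this commit: the class names `TinyNoPEBlock` and `TinyNoPELM`, the hyperparameters, and the layer layout are all hypothetical, and it assumes PyTorch 2.0 or later for `torch.nn.functional.scaled_dot_product_attention`. Returning a dict with a `loss` entry when `labels` are passed follows the convention the Hugging Face `Trainer` expects; the real model presumably subclasses `PreTrainedModel`, which this sketch does not.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNoPEBlock(nn.Module):
    """Pre-norm decoder block (hypothetical layout, not the commit's code)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.ln1 = nn.LayerNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(self.ln1(x)).chunk(3, dim=-1)
        # (batch, seq, d_model) -> (batch, heads, seq, head_dim)
        q, k, v = (
            z.view(b, t, self.n_heads, d // self.n_heads).transpose(1, 2)
            for z in (q, k, v)
        )
        # SDPA with a causal mask; no positional signal is added anywhere.
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(attn.transpose(1, 2).reshape(b, t, d))
        return x + self.mlp(self.ln2(x))


class TinyNoPELM(nn.Module):
    """Causal LM with token embeddings only (NoPE) and a tied output head."""

    def __init__(self, vocab_size=100, d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.blocks = nn.ModuleList(
            [TinyNoPEBlock(d_model, n_heads) for _ in range(n_layers)]
        )
        self.ln_f = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        # Weight tying: the output projection shares the embedding matrix.
        self.lm_head.weight = self.embed.weight

    def forward(self, input_ids: torch.Tensor, labels: torch.Tensor = None):
        x = self.embed(input_ids)
        for block in self.blocks:
            x = block(x)
        logits = self.lm_head(self.ln_f(x))
        out = {"logits": logits}
        if labels is not None:
            # Shift so that position i predicts token i + 1.
            out["loss"] = F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                labels[:, 1:].reshape(-1),
            )
        return out
```

In this sketch the causal mask is the only source of order information, which is the usual argument for why a NoPE decoder can still model sequences. Calling `TinyNoPELM()(ids, labels=ids)` on a batch of token ids returns both `logits` and a scalar `loss`.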