LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts
Abstract
LM-Lexicon improves definition modeling through data clustering, semantic expert learning, and sparse mixture-of-experts architecture, achieving higher BLEU scores and better expert specialization.
We introduce LM-Lexicon, an innovative definition modeling approach that incorporates data clustering, semantic expert learning, and model merging using a sparse mixture-of-experts architecture. By decomposing the definition modeling task into specialized semantic domains, where small language models are trained as domain experts, LM-Lexicon achieves substantial improvements (+7% BLEU score compared with the prior state-of-the-art model) over existing methods on five widely used benchmarks. Empirically, we demonstrate that 1) the clustering strategy enables fine-grained expert specialization with nearly 10% improvement in definition quality; 2) the semantic-aware domain-level routing mechanism achieves higher expert efficacy (+1%) than conventional token-level routing; and 3) further performance gains can be obtained through test-time compute and semantic expert scaling. Our work advances definition modeling while providing insights into the development of efficient language models for semantic-intensive applications.
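The abstract only names the pipeline's stages, so the following is a minimal sketch of how such a pipeline could look, not the authors' implementation. It assumes an off-the-shelf sentence encoder and k-means for the semantic clustering stage, and it stands in the expert models with a placeholder list; the encoder choice, the number of domains (`n_domains`), and the `experts` container are all illustrative assumptions.

```python
# Hedged sketch of a cluster-then-route definition-modeling pipeline,
# loosely following the stages named in the LM-Lexicon abstract.
# Encoder, cluster count, and expert handles are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

# 1) Cluster (word, context) training pairs into semantic domains.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
corpus = [
    "bank: a financial institution that accepts deposits",
    "bank: the sloping land alongside a river",
]
embeddings = encoder.encode(corpus)

n_domains = 8  # hypothetical number of semantic domains / experts
kmeans = KMeans(n_clusters=n_domains, random_state=0, n_init="auto")
kmeans.fit(embeddings)

# 2) One small LM would be fine-tuned per cluster (training omitted);
#    experts[d] stands in for the expert trained on domain d's subset.
experts = [None] * n_domains  # placeholder for fine-tuned domain experts

# 3) Domain-level routing: embed the query once and dispatch the whole
#    sequence to the nearest domain expert, rather than routing per token.
def route(query: str) -> int:
    q = encoder.encode([query])
    return int(kmeans.predict(q)[0])

domain = route("bank: She deposited the check at the bank.")
# definition = experts[domain].generate(...)  # expert call is a placeholder
```

Routing once per query (domain level) rather than per token is what the abstract contrasts with conventional token-level routing; in this sketch that simply means a single nearest-centroid lookup before generation.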
Community
Accepted by EACL 2026 (Oral)
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts (2025)
- DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation (2026)
- CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning (2026)
- MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models (2026)
- LELA: an LLM-based Entity Linking Approach with Zero-Shot Domain Adaptation (2026)
- Token-Level LLM Collaboration via FusionRoute (2026)
- SemPA: Improving Sentence Embeddings of Large Language Models through Semantic Preference Alignment (2026)
Models citing this paper: 10
Datasets citing this paper: 5
Spaces citing this paper: 0
Collections including this paper: 0