FOCUS: Effective Embedding Initialization for Specializing Pretrained Multilingual Models on a Single Language Paper • 2305.14481 • Published May 23, 2023
AweDist: Attention-aware Embedding Distillation for New Input Token Embeddings Paper • 2505.20133 • Published May 26, 2025
kd-shared/fineweb-CC-MAIN-2023-50-and-CC-MAIN-2024-10-meta-llama_Llama-2-7b-hf Updated May 19, 2024
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token Paper • 2412.06676 • Published Dec 9, 2024
konstantindobler/mistral7b-ar-tokenizer-swap-pure-bf16 Text Generation • 7B • Updated Aug 23, 2024
konstantindobler/mistral7b-de-tokenizer-swap-mixed-bf16 Text Generation • 7B • Updated Aug 23, 2024