nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD • Image-Text-to-Text • Updated Nov 13, 2025
Enhancing Training Efficiency Using Packing with Flash Attention • Paper • 2407.09105 • Published Jul 12, 2024
Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 • Article • Aug 21, 2024
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs • Paper • 2408.07055 • Published Aug 13, 2024
Welcome Falcon Mamba: The first strong attention-free 7B model • Article • Aug 12, 2024