Yonghua Lin (Yonghua)
3 followers · 1 following
AI & ML interests
None yet
Recent Activity

New activity 15 days ago on deepseek-ai/DeepSeek-V4-Flash:
Run DeepSeek-V4-Flash on more hardware: FP8/BF16 adapted versions for 8 AI chips (ready to download)
Posted an update 15 days ago:
Run DeepSeek V4 on more AI GPUs with FlagOS

DeepSeek V4 just dropped with huge specs: 1.6T params, 1M context, MIT license. But there's a catch: the official weights use FP4+FP8 mixed precision, which mainly targets NVIDIA Blackwell / B200-class GPUs. So we built DeepSeek-V4-FlagOS.

On Day 0, the FlagOS community completed multi-chip adaptation across 8 AI hardware platforms:
- NVIDIA H100/H20: FP8/BF16
- Huawei Ascend: BF16
- Hygon DCU: BF16
- MetaX GPU: BF16
- Moore Threads MTT S5000: FP8
- Kunlunxin XPU: BF16
- T-Head/Alibaba Zhenwu: BF16
- Iluvatar GPU: BF16

What makes it work?

1. FlagGems operator replacement. DeepSeek V4 operators (MoE routing, Attention, RMSNorm, and more) are reimplemented with Triton, reducing dependency on CUDA-specific libraries. New V4 operators include Act Quant, hc_split_sinkhorn, FP8 MatMul, Sparse Attention, and Hadamard Transform.

2. Flexible tensor parallelism. DeepSeek V4 uses o_groups=8, which can limit tensor parallelism (TP). We added an independent communication group for o-groups while allowing the rest of the model to scale to higher TP, enabling deployment on 32GB/64GB cards.

3. FP4-to-BF16 conversion. For hardware without native FP4, we provide ready-to-use BF16 conversion and pre-converted model releases.

Pre-converted models are available on Hugging Face:

V4-Pro:
- FlagRelease/DeepSeek-V4-Pro-nvidia-FlagOS
- FlagRelease/DeepSeek-V4-Pro-metax-FlagOS
- FlagRelease/DeepSeek-V4-Pro-mthreads-FlagOS
- FlagRelease/DeepSeek-V4-Pro-hygon-FlagOS
- FlagRelease/DeepSeek-V4-Pro-ascend-FlagOS

V4-Flash:
- FlagRelease/DeepSeek-V4-Flash-nvidia-FlagOS
- FlagRelease/DeepSeek-V4-Flash-zhenwu-FlagOS
- FlagRelease/DeepSeek-V4-Flash-kunlunxin-FlagOS
- FlagRelease/DeepSeek-V4-Flash-iluvatar-FlagOS

Performance on NVIDIA H20, V4-Flash FP8:
- FlagGems C++ Wrapper + Triton: 70.7 tok/s
- DeepSeek TileLang: 62.99 tok/s

That's 12.24% faster.

Try it here: https://github.com/flagos-ai/DeepSeek-V4-FlagOS

Open models should run on open infrastructure.
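The post's third point mentions converting FP4 weights to BF16 for hardware without native FP4 support. As a rough illustration of what such a conversion involves (this is not the FlagOS code; the function names, block layout, and the assumption of the common E2M1 encoding with a shared per-block scale are all mine), a minimal Python sketch:

```python
# Hypothetical sketch: dequantize block-scaled FP4 (E2M1) values to full
# precision, as a higher-precision format like BF16 would then store them.
# E2M1 = 1 sign bit, 2 exponent bits, 1 mantissa bit; 16 possible codes.
FP4_E2M1_TABLE = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
                  -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]

def decode_fp4(nibble: int) -> float:
    """Decode one 4-bit E2M1 code via table lookup."""
    return FP4_E2M1_TABLE[nibble & 0xF]

def dequantize_block(packed: bytes, scale: float) -> list[float]:
    """Unpack two FP4 values per byte and apply a shared per-block scale,
    as block-scaled FP4 formats typically pair a group of 4-bit codes
    with one scale factor."""
    out = []
    for byte in packed:
        out.append(decode_fp4(byte & 0xF) * scale)   # low nibble first
        out.append(decode_fp4(byte >> 4) * scale)    # then high nibble
    return out
```

A real converter would additionally round the scaled results to BF16's 8-bit mantissa and vectorize over whole tensors, but the core step is this table lookup plus scale multiply.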
Authored a paper 8 months ago:
FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions
Yonghua's activity
Liked 2 datasets over 1 year ago:
- BAAI/Infinity-MM: Updated Dec 13, 2024 · 36.9k · 120
- Zoooora/BullyingAct (Viewer): Updated Oct 14, 2024 · 103 · 45 · 1

Liked a dataset almost 2 years ago:
- nyu-visionx/Cambrian-10M (Preview): Updated Jul 8, 2024 · 6.42k · 128

Liked a dataset over 2 years ago:
- BAAI/CCI-Data: Updated Dec 17, 2024 · 104 · 68