Mixture of Attention Heads: Selecting Attention Heads Per Token Paper • 2210.05144 • Published Oct 11, 2022 • 3 (a generic top-k head-routing sketch follows this list)
MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling Paper • 2602.03359 • Published Feb 3 • 10
MemoryLLM: Plug-n-Play Interpretable Feed-Forward Memory for Transformers Paper • 2602.00398 • Published Jan 30 • 5
Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers Paper • 2602.18292 • Published 17 days ago • 10 (a minimal top-k/top-p sampling sketch follows this list)
RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs Paper • 2602.05367 • Published Feb 5 • 7
DFlash: Block Diffusion for Flash Speculative Decoding Paper • 2602.06036 • Published Feb 5 • 42 (the standard speculative-sampling acceptance rule is sketched after this list)
POP: Prefill-Only Pruning for Efficient Large Model Inference Paper • 2602.03295 • Published Feb 3 • 4
Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in {±1, ±i} Paper • 2512.02901 • Published Dec 2, 2025 • 6
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models Paper • 2511.23319 • Published Nov 28, 2025 • 24
Metis: Training Large Language Models with Advanced Low-Bit Quantization Paper • 2509.00404 • Published Aug 30, 2025 • 7
Jamba 1.7 Collection The AI21 Jamba family comprises hybrid SSM-Transformer foundation models that blend speed, efficient long-context processing, and accuracy. • 4 items • Updated Jul 2, 2025 • 12
BitVLA Collection 1-bit Vision-Language-Action Models for Robotics Manipulation • 9 items • Updated 7 days ago • 4
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs Paper • 2504.18415 • Published Apr 25, 2025 • 49
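As background for the Mixture of Attention Heads entry (2210.05144) above: selecting attention heads per token is, in the MoE literature, typically done with a learned top-k router. The sketch below shows only that generic routing pattern, not necessarily the paper's exact formulation; the names (route_heads, W_router) and toy shapes are assumptions for illustration.

```python
import numpy as np

def route_heads(x: np.ndarray, W_router: np.ndarray, k: int):
    """Generic per-token top-k routing: score each head, keep the k best.

    x: (d,) token representation; W_router: (n_heads, d) router weights.
    Returns the indices of the selected heads and their softmax weights,
    so the token's output mixes only k of n_heads attention heads.
    """
    scores = W_router @ x                        # one routing score per head
    top = np.argsort(scores)[-k:]                # indices of the k best heads
    w = np.exp(scores[top] - scores[top].max())  # softmax over selected heads only
    return top, w / w.sum()

rng = np.random.default_rng(0)
d, n_heads, k = 16, 8, 2
x = rng.standard_normal(d)
W_router = rng.standard_normal((n_heads, d))
heads, weights = route_heads(x, W_router, k)     # run only these k heads per token
```

The point of the sparse selection is that the cost of attention scales with the number of active heads, not the total head count.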
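For the decoding-on-the-simplex entry (2602.18292): top-k and top-p (nucleus) are the standard truncated samplers the title starts from. Below is a minimal numpy sketch of those two standard filters, not the paper's optimisation-based view of them; the function names and the toy distribution are invented for the example.

```python
import numpy as np

def top_k_filter(probs: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k most probable tokens, then renormalize."""
    kept = np.argsort(probs)[-k:]          # indices of the k largest probabilities
    out = np.zeros_like(probs)
    out[kept] = probs[kept]
    return out / out.sum()

def top_p_filter(probs: np.ndarray, p: float) -> np.ndarray:
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    order = np.argsort(probs)[::-1]             # tokens, most to least probable
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1   # shortest prefix with mass >= p
    out = np.zeros_like(probs)
    out[order[:cutoff]] = probs[order[:cutoff]]
    return out / out.sum()

rng = np.random.default_rng(0)
probs = np.array([0.50, 0.25, 0.15, 0.07, 0.03])  # toy next-token distribution
print(top_k_filter(probs, 2))                     # mass restricted to top-2 tokens
print(top_p_filter(probs, 0.90))                  # nucleus: tokens 0-2 cover 0.90
token = rng.choice(len(probs), p=top_p_filter(probs, 0.90))
```

Both filters project the model's distribution onto a face of the probability simplex before sampling, which is the geometric framing the paper's title alludes to.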
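For the DFlash entry (2602.06036): speculative decoding, which the title builds on, verifies cheap draft tokens with an accept/reject rule that leaves the target distribution intact. The sketch below shows that standard rule, as in vanilla speculative sampling, not DFlash's block-diffusion drafter; the toy distributions are assumptions.

```python
import numpy as np

def verify_draft_token(p: np.ndarray, q: np.ndarray, x: int,
                       rng: np.random.Generator) -> int:
    """One step of the standard speculative-sampling accept/reject rule.

    p is the target model's next-token distribution, q the draft model's,
    and x the token the draft proposed (so q[x] > 0). The emitted token
    is distributed exactly according to p, so verification is lossless.
    """
    if rng.random() < min(1.0, p[x] / q[x]):
        return x                                   # accept the cheap draft token
    residual = np.maximum(p - q, 0.0)              # rejected: resample from the
    residual /= residual.sum()                     # normalized leftover mass
    return int(rng.choice(len(p), p=residual))

rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])                      # target model
q = np.array([0.3, 0.5, 0.2])                      # draft model
draft = int(rng.choice(3, p=q))                    # draft proposes a token
emitted = verify_draft_token(p, q, draft, rng)
```

Speedups come from verifying several drafted tokens against the target model in one parallel forward pass rather than decoding them one by one.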
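For the BitNet v2 entry (2504.18415): the title pairs native 4-bit activations with a Hadamard transformation, a common device for flattening activation outliers before low-bit quantization. The sketch below illustrates only that generic effect, assuming nothing about BitNet v2's actual quantizer: because the rotation is orthogonal it can be undone after quantization, and spreading an outlier's energy across coordinates shrinks the quantization scale and the overall error. The hadamard and quantize_4bit helpers and the toy vector are illustrative assumptions.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix via Sylvester's construction (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_4bit(x: np.ndarray):
    """Toy symmetric 4-bit quantizer: round onto 15 levels in [-7, 7] * scale."""
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -7, 7), scale

n = 8
H = hadamard(n)
x = np.array([9.0, 0.8, -0.6, 0.4, 0.7, -0.5, 0.3, -0.2])  # one activation outlier

q_plain, s_plain = quantize_4bit(x)      # quantize directly: outlier sets the scale
q_rot, s_rot = quantize_4bit(H @ x)      # rotate first: outlier energy is spread out

err_plain = np.linalg.norm(q_plain * s_plain - x)
err_rot = np.linalg.norm(H.T @ (q_rot * s_rot) - x)  # H is orthogonal: H.T undoes it
print(f"plain: {err_plain:.3f}   rotated: {err_rot:.3f}")
```

On this toy vector the rotated path reconstructs x with roughly a quarter of the error of direct quantization, because the single outlier no longer forces a coarse grid onto the seven small values.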