quant
SqueezeLLM: Dense-and-Sparse Quantization (arXiv:2306.07629)
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models (arXiv:2309.02784)
Extreme Compression of Large Language Models via Additive Quantization (arXiv:2401.06118)
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs (arXiv:2402.04291)
OneBit: Towards Extremely Low-bit Large Language Models (arXiv:2402.11295)
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models (arXiv:2402.14866)
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs (arXiv:2403.02775)
GPTVQ: The Blessing of Dimensionality for LLM Quantization (arXiv:2402.15319)
COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization (arXiv:2403.07134)
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models (arXiv:2306.02272)
QuantEase: Optimization-based Quantization for Language Models (arXiv:2309.01885)
SliceGPT: Compress Large Language Models by Deleting Rows and Columns (arXiv:2401.15024)
decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points (arXiv:2404.12759)
SpinQuant: LLM quantization with learned rotations (arXiv:2405.16406)
Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs (arXiv:2406.01721)
Attention-aware Post-training Quantization without Backpropagation (arXiv:2406.13474)
Accuracy is Not All You Need (arXiv:2407.09141)
FlatQuant: Flatness Matters for LLM Quantization (arXiv:2410.09426)