quant
SqueezeLLM: Dense-and-Sparse Quantization (arXiv:2306.07629)
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models (arXiv:2309.02784)
Extreme Compression of Large Language Models via Additive Quantization (arXiv:2401.06118)
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs (arXiv:2402.04291)
OneBit: Towards Extremely Low-bit Large Language Models (arXiv:2402.11295)
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models (arXiv:2402.14866)
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs (arXiv:2403.02775)
GPTVQ: The Blessing of Dimensionality for LLM Quantization (arXiv:2402.15319)
COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization (arXiv:2403.07134)
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models (arXiv:2306.02272)
QuantEase: Optimization-based Quantization for Language Models (arXiv:2309.01885)
SliceGPT: Compress Large Language Models by Deleting Rows and Columns (arXiv:2401.15024)
decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points (arXiv:2404.12759)
SpinQuant: LLM quantization with learned rotations (arXiv:2405.16406)
Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs (arXiv:2406.01721)
Attention-aware Post-training Quantization without Backpropagation (arXiv:2406.13474)
Accuracy is Not All You Need (arXiv:2407.09141)
FlatQuant: Flatness Matters for LLM Quantization (arXiv:2410.09426)