stereoplegic's Collection: Approximation
• Linear Self-Attention Approximation via Trainable Feedforward Kernel (arXiv:2211.04076)
• Greenformer: Factorization Toolkit for Efficient Deep Neural Networks (arXiv:2109.06762)
• COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models (arXiv:2305.17235)
• Exploring Low Rank Training of Deep Neural Networks (arXiv:2209.13569)
• Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator (arXiv:2305.15099)
• AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models (arXiv:2010.03688)
• Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition (arXiv:2107.11442)
• L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning (arXiv:2210.17357)
• arXiv:2312.17244
• Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models (arXiv:2312.07046)
• LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression (arXiv:2309.14021)
• PELA: Learning Parameter-Efficient Models with Low-Rank Approximation (arXiv:2310.10700)
• The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction (arXiv:2312.13558)
• NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning (arXiv:2307.08941)
• Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations (arXiv:2205.13571)
• Trained Rank Pruning for Efficient Deep Neural Networks (arXiv:1812.02402)
• TRP: Trained Rank Pruning for Efficient Deep Neural Networks (arXiv:2004.14566)
• Factorization Vision Transformer: Modeling Long Range Dependency with Local Window Cost (arXiv:2312.08614)
• Learning Low-Rank Representations for Model Compression (arXiv:2211.11397)
• Latent Space Factorisation and Manipulation via Matrix Subspace Projection (arXiv:1907.12385)
• Rethinking Attention with Performers (arXiv:2009.14794)
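Several entries above (Linear Self-Attention Approximation, Rethinking Attention with Performers, and Softmax-free Linear Transformers below) share one trick: replace the softmax in attention with a non-negative kernel feature map so the key-value contraction can be computed once and reused by every query, dropping the cost from quadratic to linear in sequence length. A minimal numpy sketch of that generic linearization follows; the feature map here is an arbitrary positive placeholder (a shifted ReLU), not the FAVOR+ random features used by Performers or any specific paper's map.

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # phi is a placeholder positive feature map (NOT a specific paper's kernel);
    # any phi with phi(x) > 0 keeps attention weights non-negative and normalizable.
    Qp, Kp = phi(Q), phi(K)          # (n, d) each
    KV = Kp.T @ V                    # (d, d): contract keys with values first
    Z = Qp @ Kp.sum(axis=0)          # (n,): per-query normalizer
    return (Qp @ KV) / Z[:, None]    # O(n * d^2) instead of O(n^2 * d)

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

Because the reordering is exact for a fixed kernel, the result matches the naive quadratic computation with the same feature map; the approximation error relative to softmax attention depends entirely on how well phi's induced kernel mimics exp(q·k).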
• Softmax-free Linear Transformers (arXiv:2207.03341)
• Generalization Bounds for Magnitude-Based Pruning via Sparse Matrix Sketching (arXiv:2305.18789)
• Efficient Storage of Fine-Tuned Models via Low-Rank Approximation of Weight Residuals (arXiv:2305.18425)
• Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models (arXiv:2112.00029)
• Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy (arXiv:2310.01334)
• LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters (arXiv:2405.16287)
• Effectively Compress KV Heads for LLM (arXiv:2406.07056)
• SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression (arXiv:2403.07378)
• On the Benefits of Rank in Attention Layers (arXiv:2407.16153)
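The other recurring theme in this collection (SVD-LLM, LORD, PELA, TRP, layer-selective rank reduction, low-rank weight residuals) is truncated SVD of weight matrices: replace an m-by-n weight W with two thin factors of rank r, cutting storage from m*n to r*(m+n) parameters. The sketch below is the generic Eckart-Young construction only, not any listed paper's specific truncation criterion or fine-tuning scheme; the synthetic nearly-low-rank W is an assumption for illustration.

```python
import numpy as np

def low_rank_factorize(W, r):
    """Best rank-r approximation of W (Eckart-Young), returned as thin
    factors A (m, r) and B (r, n) with W ~= A @ B."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # absorb singular values into the left factor
    B = Vt[:r]
    return A, B

rng = np.random.default_rng(0)
m, n, r = 64, 48, 8
# Synthetic nearly-low-rank weight: rank-8 signal plus small noise.
W = rng.normal(size=(m, r)) @ rng.normal(size=(r, n)) + 0.01 * rng.normal(size=(m, n))
A, B = low_rank_factorize(W, r)
params_before, params_after = m * n, r * (m + n)   # 3072 -> 896
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(params_before, params_after, round(err, 4))
```

In a network, the factorized layer is applied as `x @ B.T @ A.T`, i.e. two smaller matmuls; the papers above differ mainly in how they pick r per layer and how they recover accuracy after truncation.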