Interpretability - a Shafagh99 Collection

updated Feb 23

Research in LM interpretability

From Understanding to Utilization: A Survey on Explainability for Large Language Models
Paper • 2401.12874 • Published Jan 23, 2024 • 4

From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP
Paper • 2406.12618 • Published Jun 18, 2024 • 5

Rethinking Interpretability in the Era of Large Language Models
Paper • 2402.01761 • Published Jan 30, 2024 • 23

A Comprehensive Guide to Explainable AI: From Classical Models to LLMs
Paper • 2412.00800 • Published Dec 1, 2024 • 1

A Primer on the Inner Workings of Transformer-based Language Models
Paper • 2405.00208 • Published Apr 30, 2024 • 12

On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Paper • 2407.19200 • Published Jul 27, 2024 • 1

Mixture of Experts Made Intrinsically Interpretable
Paper • 2503.07639 • Published Mar 5, 2025 • 10

A Survey on Mixture of Experts
Paper • 2407.06204 • Published Jun 26, 2024 • 1

A Survey on Neural Network Interpretability
Paper • 2012.14261 • Published Dec 28, 2020

A Comprehensive Survey on Self-Interpretable Neural Networks
Paper • 2501.15638 • Published Jan 26, 2025 • 2

REFER: An End-to-end Rationale Extraction Framework for Explanation Regularization
Paper • 2310.14418 • Published Oct 22, 2023 • 1

ERASER: A Benchmark to Evaluate Rationalized NLP Models
Paper • 1911.03429 • Published Nov 8, 2019

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models
Paper • 2402.04614 • Published Feb 7, 2024 • 3