TemporalMesh Transformer: dynamic kNN graph attention + adaptive exit gates, 29.4 PPL at 48% compute
#6
by vigneshwar234 - opened
New open-source transformer architecture, directly relevant to this repo
TMT achieves 29.4 PPL on WikiText-2 at 48% compute (-30.2% vs vanilla, 120M params). Directly relevant to users comparing efficient attention and depth-adaptive architectures.
Five innovations: Mesh Attention (O(S·k) dynamic kNN), Temporal Decay (post-softmax multiplicative), Adaptive Exit Gate (per-token depth routing, avg 5.76/12 layers), Dual-Stream FFN, EMA Memory Anchors.
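For readers unfamiliar with the first two ideas, here is a minimal sketch of kNN-restricted attention with a post-softmax multiplicative decay. It is not taken from the TMT code: the function name, the `top_k` and `decay` parameters, the exponential form of the decay, the non-causal neighbour selection, and the choice not to renormalise after decay are all assumptions made for illustration. It also scores every key densely for clarity, so it does not achieve the O(S·k) cost claimed above.

```python
import math

def knn_decay_attention(q, k, v, top_k=2, decay=0.1):
    """Illustrative sparse attention: each query attends to its top_k keys.

    q, k, v: lists of S vectors (lists of floats).
    top_k and decay are hypothetical parameters, not names from TMT.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # Dense scoring for clarity; an O(S*k) version would avoid this step.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        # Indices of the top_k highest-scoring keys (the "neighbours").
        nbrs = sorted(range(len(k)), key=lambda j: scores[j], reverse=True)[:top_k]
        # Numerically stable softmax restricted to the selected neighbours.
        m = max(scores[j] for j in nbrs)
        exps = {j: math.exp(scores[j] - m) for j in nbrs}
        z = sum(exps.values())
        # Assumed decay: multiply each softmax weight by exp(-decay * |i - j|).
        weights = {j: (exps[j] / z) * math.exp(-decay * abs(i - j)) for j in nbrs}
        # Weighted sum of the neighbours' value vectors.
        out.append([sum(w * v[j][d] for j, w in weights.items())
                    for d in range(len(v[0]))])
    return out
```

With `top_k=1` and `decay=0.0` each query simply copies the value of its best-matching key. With `decay > 0` the weights on distant neighbours shrink, so they no longer sum to 1 in this sketch.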
vs. models in this category:
- Beats Mamba: 29.4 vs 31.8 PPL, same 120M params
- Beats Longformer: 29.4 vs 39.6 PPL, same compute class
- LongBench: 53.4 vs 51.3 for Mamba
Paper (DOI 10.5281/zenodo.20287197): https://zenodo.org/records/20287390
Code + 226 tests: https://github.com/vignesh2027/TemporalMesh-Transformer
Live demo: https://huggingface.co/spaces/vigneshwar234/TemporalMesh-Transformer-Demo