+---
+license: mit
+library_name: pytorch
+tags:
+- music
+- folk-music
+- irish-traditional-music
+- abc-notation
+- symbolic-music
+- representation-learning
+- self-supervised
+- transformer
+pipeline_tag: feature-extraction
+---
+# ABC2Vec: Self-Supervised Representation Learning for Irish Folk Music
+This is the official pre-trained ABC2Vec model from the paper:
+**"ABC2Vec: Self-Supervised Representation Learning for Irish Folk Music"**
+## Model Description
+ABC2Vec is a self-supervised Transformer encoder that learns dense, semantically meaningful embeddings from ABC notation, a text-based symbolic music format. It is designed specifically for Irish traditional folk music and was trained on 211,524 tunes.
+### Key Features
+- 🎵 **Purpose-built for folk music** - Addresses transposition equivalence, modal tonality, and variant detection
+- 🔄 **Transposition Invariance** - Novel TI objective for pitch-invariant representations
+- 📊 **Bar-level Patchification** - 16× sequence length reduction for efficiency
+- 🎯 **Self-supervised** - No text annotations or audio required
+- ⚡ **Efficient** - Trained in ~18 hours on an Apple M4 Mac
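The bar-level patchification idea can be illustrated with a minimal sketch. It is not the repository's `BarPatchifier`: the helper `patchify_bars`, the toy character vocabulary, and the padding id 0 are assumptions made for illustration. It splits a tune body on bar lines and pads each bar to a fixed length, so an encoder would see one patch per bar rather than one token per character.

```python
import re

PAD_ID = 0  # hypothetical padding id, not taken from the released vocab.json


def patchify_bars(abc_body, max_bars=64, max_bar_length=64):
    """Split an ABC tune body into bars and encode each bar as a fixed-length row.

    Illustrative only: characters are mapped to ids with a toy vocabulary built
    from the input itself, which is NOT how a trained model's vocabulary works.
    """
    # Split on simple bar-line tokens ("|", "|:", ":|", "||") and drop empty pieces.
    # Other ABC bar-line forms (e.g. "|]" or "::") are not handled in this sketch.
    bars = [b.strip() for b in re.split(r":?\|+:?", abc_body) if b.strip()]
    bars = bars[:max_bars]

    # Toy vocabulary: ids start at 1 so that 0 stays reserved for padding.
    chars = sorted({ch for bar in bars for ch in bar})
    char_to_id = {ch: i + 1 for i, ch in enumerate(chars)}

    rows = []
    for bar in bars:
        ids = [char_to_id[ch] for ch in bar[:max_bar_length]]
        ids += [PAD_ID] * (max_bar_length - len(ids))
        rows.append(ids)
    return bars, rows


bars, rows = patchify_bars("|:A2A ABc|ded cBA|A2A ABc|ded cAG|", max_bar_length=8)
print(bars)       # ['A2A ABc', 'ded cBA', 'A2A ABc', 'ded cAG']
print(len(rows))  # 4 patches, one per bar
```

Identical bars (the first and third here) map to identical rows, which is the kind of repetition a bar-level encoder can exploit.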
+## Model Architecture
+- **Layers:** 6
+- **Hidden Size (d_model):** 256
+- **Attention Heads:** 8
+- **FFN Size (d_ff):** 1024
+- **Embedding Size:** 128
+- **Vocabulary Size:** 98
+- **Max Bars:** 64
+- **Max Bar Length:** 64
+- **Parameters:** ~5M
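As a rough sanity check on the ~5M figure, the weight matrices of a standard Transformer encoder layer can be counted directly. This is a back-of-the-envelope sketch that assumes the conventional layout (four d_model × d_model attention projections plus two feed-forward matrices) and ignores biases, layer norms, embeddings, and any bar-level patch encoder. The actual ABC2Vec breakdown may differ.

```python
# Hyperparameters taken from the list above.
n_layers = 6
d_model = 256
d_ff = 1024

# Assumption: standard encoder layer with Q, K, V and output projections.
attention_weights = 4 * d_model * d_model
# Assumption: two-layer feed-forward block (d_model -> d_ff -> d_model).
ffn_weights = 2 * d_model * d_ff

per_layer = attention_weights + ffn_weights
encoder_total = n_layers * per_layer

print(f"per layer: {per_layer:,}")          # per layer: 786,432
print(f"encoder total: {encoder_total:,}")  # encoder total: 4,718,592
```

Roughly 4.7M weights in the encoder stack alone is in line with the stated ~5M once the remaining components are added.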
+## Training Details
+- **Dataset:** 211,524 Irish traditional tunes (IrishMAN corpus)
+- **Training Objectives:**
+  - Masked Music Modeling (MMM)
+  - Transposition Invariance (TI) contrastive learning
+- **Training Steps:** 40,000 (40 epochs)
+- **Final Validation Loss:** 2.36
+- **Hardware:** Apple M4 Mac (48GB unified memory)
+- **Training Time:** ~18 hours
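This card does not spell out the exact form of the TI loss. One common way to write a contrastive objective of this kind is InfoNCE, where each tune's embedding is pulled toward the embedding of its transposed copy and pushed away from the other tunes in the batch. The sketch below uses plain Python lists and a hypothetical temperature of 0.1. It is meant to show the shape of the computation, not ABC2Vec's actual implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss where positives[i] is the transposed copy of anchors[i].

    temperature is a hypothetical value, not taken from the model card.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D embeddings: each transposed copy lies close to its original.
originals = [[1.0, 0.0], [0.0, 1.0]]
transposed = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce(originals, transposed)
shuffled = info_nce(originals, transposed[::-1])
print(aligned < shuffled)  # True: matching pairs give the lower loss
```

Minimizing such a loss encourages a tune and its transpositions to share nearly the same embedding, which is the behaviour the TI objective is described as targeting.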
+## Performance
+| Task | Accuracy | Notes |
+|------|----------|-------|
+| Tune Type Classification | 78.4% ± 1.2% | 6 classes (jig, reel, polka, etc.) |
+| Mode Classification | 78.8% ± 1.6% | 4 classes (major, minor, dorian, mixolydian) |
+| Key Root (Linear Probe) | 62.3% ± 0.9% | 8 most common keys |
+| Tune Length (Linear Probe) | 89.5% ± 0.7% | 3 classes (short, medium, long) |
+## Usage
+```python
+import json
+import torch
+# Load model configuration
+config_path = "model_config.json"
+with open(config_path) as f:
+    config_dict = json.load(f)
+# Initialize model (you'll need the ABC2Vec model code)
+from abc2vec.core.model import ABC2VecModel
+from abc2vec.core.model.encoder import ABC2VecConfig
+config = ABC2VecConfig(**config_dict)
+model = ABC2VecModel(config)
+# Load pre-trained weights
+checkpoint = torch.load("best_model.pt", map_location="cpu")
+model.load_state_dict(checkpoint["model_state_dict"])
+model.eval()
+# Load the vocabulary and build the bar-level patchifier
+from abc2vec.core.tokenizer import ABCVocabulary, BarPatchifier
+vocab = ABCVocabulary.load("vocab.json")
+patchifier = BarPatchifier(
+    vocab=vocab,
+    max_bars=config.max_bars,
+    max_bar_length=config.max_bar_length
+)
+# Example ABC tune
+abc_tune = "M:6/8\nK:D\n|:A2A ABc|ded cBA|A2A ABc|ded cAG|"
+patches = patchifier.patchify(abc_tune)
+# Get embedding
+with torch.no_grad():
+    bar_indices = patches["bar_indices"].unsqueeze(0)
+    char_mask = patches["char_mask"].unsqueeze(0)
+    bar_mask = patches["bar_mask"].unsqueeze(0)
+    embedding = model.get_embedding(bar_indices, char_mask, bar_mask)
+    # embedding shape: (1, 128)
+```
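Once embeddings are extracted, a downstream use such as variant detection typically comes down to nearest-neighbour search. The sketch below assumes the embeddings have already been converted to plain Python lists (for example with `embedding.squeeze(0).tolist()`). The tune names and 3-dimensional vectors are made-up placeholders, not real model outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_by_similarity(query, library):
    """Return (name, similarity) pairs sorted from most to least similar."""
    q = l2_normalize(query)
    scored = []
    for name, vec in library.items():
        v = l2_normalize(vec)
        scored.append((name, sum(a * b for a, b in zip(q, v))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors standing in for 128-dimensional ABC2Vec embeddings.
library = {
    "tune_a": [0.9, 0.1, 0.0],
    "tune_b": [0.0, 1.0, 0.0],
    "tune_c": [0.7, 0.7, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, library)
print([name for name, _ in ranking])  # ['tune_a', 'tune_c', 'tune_b']
```

For a large collection, the same ranking would normally be computed with a batched matrix product or an approximate nearest-neighbour index rather than a Python loop.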
+## Code Repository
+Full training code, evaluation scripts, and usage examples:
+- **GitHub:** https://github.com/pianistprogrammer/ABC2VEC
+## Dataset
+The processed dataset with train/validation/test splits:
+- **HuggingFace:** https://huggingface.co/datasets/pianistprogrammer/abc2vec-irish-folk-dataset
+## Citation
+If you use this model, please cite:
+```bibtex
+@article{abc2vec2025,
+  title={ABC2Vec: Self-Supervised Representation Learning for Irish Folk Music},
+  author={[Your Name]},
+  journal={[Journal Name]},
+  year={2025},
+  note={Model: https://huggingface.co/pianistprogrammer/abc2vec-model}
+}
+```
+## License
+MIT License
+## Acknowledgements
+We thank The Session community for curating and maintaining the Irish traditional music archive that made this work possible.
+## Model Card Authors
+[Your Name]
+## Contact
+For questions or issues:
+- GitHub: https://github.com/pianistprogrammer/ABC2VEC
+- HuggingFace: https://huggingface.co/pianistprogrammer