PipeOwl
PipeOwl is a transformer-free geometric embedding package built on a static embedding field stored as NumPy arrays.
This repo provides:
- L1_base_embeddings.npy: float32 (V, 1024) embedding table (unit-normalized)
- L1_base_vocab.json: list of vocab strings aligned to embedding rows
- delta_base_scalar.npy: float32 (V,) optional scalar bias field
- Engine (engine.py) and usage script (quickstart.py)

The base embedding vectors were generated using BGE (Apache-2.0) via inference (model outputs). This repository does not redistribute any original BGE model weights.
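The array layout described above can be sanity-checked before use. The following is a minimal sketch, not the code in engine.py: `load_field` is a hypothetical helper, the 1e-3 tolerance is an arbitrary choice, and the demo writes a tiny synthetic table to a temporary directory instead of reading the real data/ files. How the scalar bias field is applied during retrieval is not described here, so the sketch only checks its shape.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def load_field(data_dir):
    """Load the embedding table, vocab, and optional scalar field; check alignment.

    Hypothetical helper for illustration; engine.py may load the files differently.
    """
    data_dir = Path(data_dir)
    emb = np.load(data_dir / "L1_base_embeddings.npy")
    with open(data_dir / "L1_base_vocab.json", encoding="utf-8") as f:
        vocab = json.load(f)
    delta_path = data_dir / "delta_base_scalar.npy"
    delta = np.load(delta_path) if delta_path.exists() else None

    # Row i of the table must correspond to vocab[i].
    assert emb.ndim == 2 and emb.dtype == np.float32
    assert len(vocab) == emb.shape[0]
    # Rows are documented as unit-normalized.
    assert np.allclose(np.linalg.norm(emb, axis=1), 1.0, atol=1e-3)
    if delta is not None:
        assert delta.shape == (emb.shape[0],)
    return emb, vocab, delta


# Demo on a tiny synthetic table (3 rows, 8 dims) so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    rng = np.random.default_rng(0)
    toy = rng.normal(size=(3, 8)).astype(np.float32)
    toy /= np.linalg.norm(toy, axis=1, keepdims=True)
    np.save(tmp / "L1_base_embeddings.npy", toy)
    np.save(tmp / "delta_base_scalar.npy", np.zeros(3, dtype=np.float32))
    (tmp / "L1_base_vocab.json").write_text(json.dumps(["a", "b", "c"]), encoding="utf-8")
    emb, vocab, delta = load_field(tmp)
```

With the real files, `load_field("data")` would be expected to return a (V, 1024) table, assuming the directory layout shown in this README.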
pip install numpy
python quickstart.py
Or minimal usage:
from engine import PipeOwlEngine, PipeOwlConfig
engine = PipeOwlEngine(PipeOwlConfig())
q = engine.encode("雪鴞好可愛")
# use q for similarity / retrieval
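One way to use a query vector for retrieval, sketched under assumptions: because the table rows are unit-normalized, a dot product against a normalized query equals cosine similarity. The `top_k` function is hypothetical and not part of engine.py, and the sketch assumes `engine.encode` returns a 1-D vector with the same dimension as the table, which this README does not confirm. Toy arrays stand in for the real data.

```python
import numpy as np


def top_k(query, table, k=3):
    """Return (indices, scores) of the k rows of `table` most similar to `query`.

    Hypothetical helper; assumes `table` rows are unit-normalized.
    """
    query = np.asarray(query, dtype=np.float32)
    query = query / np.linalg.norm(query)  # normalize so dot product == cosine
    scores = table @ query                 # one cosine score per row
    k = min(k, scores.shape[0])
    idx = np.argsort(-scores)[:k]          # highest scores first
    return idx, scores[idx]


# Toy stand-ins: a 4-row unit-normalized table and a query vector.
table = np.eye(4, dtype=np.float32)
q = np.array([0.1, 0.9, 0.2, 0.0], dtype=np.float32)
idx, scores = top_k(q, table, k=2)
```

For a large table, `np.argpartition` avoids a full sort; the plain `argsort` is kept here for readability.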
Files
Notes
No safetensors / pytorch_model.bin is included because this model is distributed as a static NumPy embedding field.
~165M embedding parameters (static matrix)
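As a rough check on what that figure implies, the arithmetic below derives an approximate row count and in-memory size from the documented shape (V, 1024) and float32 dtype. The 165,000,000 value is the rounded number from the note, so the results are estimates, not the exact vocabulary size.

```python
DIM = 1024                   # embedding width from the documented (V, 1024) shape
BYTES_PER_FLOAT32 = 4
approx_params = 165_000_000  # "~165M" from the note above; a rounded figure

approx_vocab_rows = approx_params // DIM            # estimate of V
approx_bytes = approx_params * BYTES_PER_FLOAT32    # raw array size in bytes
approx_gb = approx_bytes / 1e9                      # decimal gigabytes

print(f"V is roughly {approx_vocab_rows:,} rows, about {approx_gb:.2f} GB as float32")
```

This puts the table at roughly 161 thousand rows and about 0.66 GB in memory.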
| Model | in-domain MRR@10 | OOD MRR@10 |
|---|---|---|
| MiniLM | 0.019 | 0.026 |
| BGE | 0.026 | 0.009 |
| PipeOwl | 0.013 | 0.023 |
Note: This test uses a harder corpus and adversarial-style queries, so absolute scores are low for all models.
See full experimental notes here: https://hackmd.io/@galaxy4552/BkpUEnTwbl
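For readers unfamiliar with the metric in the table, MRR@10 averages, over all queries, the reciprocal rank of the first relevant result within the top 10 (0 if none appears). The sketch below shows the standard definition; `mrr_at_k` and the toy document IDs are illustrative, and the exact evaluation protocol behind the table is described only in the linked notes.

```python
def mrr_at_k(ranked_ids_per_query, relevant_ids_per_query, k=10):
    """Mean reciprocal rank of the first relevant hit within the top k results."""
    total = 0.0
    for ranked, relevant in zip(ranked_ids_per_query, relevant_ids_per_query):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(ranked_ids_per_query)


# Toy example: first relevant hit at rank 2, at rank 1, and not found.
ranked = [["d3", "d1", "d7"], ["d2", "d9"], ["d5", "d4"]]
relevant = [{"d1"}, {"d2"}, {"d8"}]
score = mrr_at_k(ranked, relevant)  # (1/2 + 1/1 + 0) / 3 = 0.5
```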
pipeowl/
│
├─ README.md
├─ LICENSE
│
├─ engine.py
├─ quickstart.py
│
└─ data/
   ├─ L1_base_embeddings.npy
   ├─ delta_base_scalar.npy
   └─ L1_base_vocab.json