Update README.md
README.md
CHANGED
|
@@ -14,110 +14,149 @@ tags:
|
|
| 14 |
- embedding
|
| 15 |
- codebase
|
| 16 |
- semantic-search
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 17 |
|
| 18 |
---
|
| 19 |
|
| 20 |
-
#
|
|
|
|
|
|
|
|
|
|
| 21 |
|
| 22 |
-
|
|
|
|
|
|
|
| 23 |
|
| 24 |
-
|
|
|
|
| 25 |
|
| 26 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 27 |
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
| FP32 (original) | 28.8 MB | 1.0x |
|
| 31 |
-
| **LF4 (this model)** | **4.7 MB** | **6.4x** |
|
| 32 |
|
| 33 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 34 |
|
| 35 |
-
|
| 36 |
|
| 37 |
-
|
| 38 |
-
vocab=29528 dim=256 bits=4 block_size=32 size=4.7MB
|
| 39 |
-
`
|
| 40 |
|
| 41 |
-
|
|
|
|
|
|
|
| 42 |
|
| 43 |
-
|
| 44 |
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 50 |
|
| 51 |
-
|
| 52 |
|
| 53 |
-
###
|
| 54 |
|
| 55 |
-
|
|
|
|
|
|
|
| 56 |
pip install vortexa
|
| 57 |
-
`
|
| 58 |
|
| 59 |
-
`
|
|
|
|
|
|
|
| 60 |
from vortexa.core.indexer import CodebaseIndexer
|
| 61 |
|
| 62 |
-
# vortexa
|
| 63 |
indexer = CodebaseIndexer(root='.')
|
| 64 |
stats = indexer.index()
|
| 65 |
-
results = indexer.search('find CSV parser', top_k=5)
|
| 66 |
-
`
|
| 67 |
|
| 68 |
-
#
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 69 |
|
| 70 |
-
|
|
|
|
|
|
|
| 71 |
from lf4_model import LF4StaticEmbedding
|
| 72 |
|
|
|
|
| 73 |
model = LF4StaticEmbedding.from_pretrained('VTXAI/Vortex-Embed-4.7M')
|
|
|
|
|
|
|
| 74 |
embeddings = model.encode(['search the web', 'read file'])
|
|
|
|
|
|
|
| 75 |
scores, indices = model.search(query_emb, doc_emb, top_k=10)
|
| 76 |
-
`
|
| 77 |
|
| 78 |
-
|
| 79 |
|
| 80 |
-
|
| 81 |
-
from sentence_transformers import SentenceTransformer
|
| 82 |
-
model = SentenceTransformer('VTXAI/Vortex-Embed-4.7M', backend='static')
|
| 83 |
-
embeddings = model.encode(['search the web', 'read file'])
|
| 84 |
-
`
|
| 85 |
|
| 86 |
-
|
| 87 |
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
| Cosine preservation vs FP32 | 0.9969 |
|
| 91 |
-
| MSE | 0.257 |
|
| 92 |
-
| Tool search accuracy | 100% (15/15) |
|
| 93 |
-
| Inference speed | ~0.15ms per text |
|
| 94 |
-
| Load time | ~144ms |
|
| 95 |
-
| Search (P50, 2707 chunks) | 14.6ms |
|
| 96 |
|
| 97 |
-
|
| 98 |
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
| Inference | **0.15ms** | ~50ms |
|
| 102 |
-
| Load time | **144ms** | ~5s |
|
| 103 |
-
| Disk | **4.7 MB** | ~400 MB |
|
| 104 |
-
| GPU | **No** | Recommended |
|
| 105 |
-
| Accuracy | Comparable | Higher (complex semantics) |
|
| 106 |
|
| 107 |
-
|
|
|
|
|
|
|
| 108 |
|
| 109 |
-
|
| 110 |
|
| 111 |
-
|
| 112 |
|
| 113 |
-
|
| 114 |
|
| 115 |
-
|
| 116 |
|
| 117 |
-
bibtex
|
| 118 |
@software{vortex-embed-4.7m,
|
| 119 |
-
title
|
| 120 |
author = {VortexAI},
|
| 121 |
-
year
|
| 122 |
-
url
|
| 123 |
}
|
|
|
|
|
|
|
|
|
| 14 |
- embedding
|
| 15 |
- codebase
|
| 16 |
- semantic-search
|
| 17 |
+
---
|
| 18 |
+
# Vortex-Embed-4.7M
|
| 19 |
+
|
| 20 |
+
`Vortex-Embed-4.7M` is an ultra-lightweight, **4-bit quantized static sentence embedding model** designed for high-throughput semantic code search and tool retrieval. It produces 256-dimensional embeddings from a **4.7 MB** file and runs without heavy deep learning frameworks such as PyTorch or Hugging Face Transformers, which makes it well suited to edge computing, local IDE plugins, and resource-constrained CLI tools.
|
| 21 |
+
|
| 22 |
+
This model is the default embedder in [**vortexa**](https://github.com/OEvortex/vortexa), the open-source AST-aware codebase indexing and semantic search engine.
|
| 23 |
|
| 24 |
---
|
| 25 |
|
| 26 |
+
## ⚡ Key Highlights
|
| 27 |
+
* **Zero Heavy Dependencies:** Built strictly on NumPy, Safetensors, and Tokenizers. No PyTorch, no execution graphs, no CUDA requirements.
|
| 28 |
+
* **Aggressive Compression:** Compressed **6.4×** via LF4 block-quantization while retaining **99.69%** cosine similarity relative to the unquantized FP32 baseline.
|
| 29 |
+
* **Blazing Fast Execution:** Sub-millisecond inference (~0.15ms per text string) with linear search scaling.
|
| 30 |
|
| 31 |
+
---
|
| 32 |
+
|
| 33 |
+
## Performance Benchmarks
|
| 34 |
|
| 35 |
+
### Quantization Fidelity & Speed
|
| 36 |
+
All metrics were measured on a commodity x86 CPU.
|
| 37 |
|
| 38 |
+
| Metric | Value | Notes |
|
| 39 |
+
| :--- | :--- | :--- |
|
| 40 |
+
| **Cosine Preservation (vs FP32)** | `0.9969` | Near-zero degradation in vector geometry |
|
| 41 |
+
| **Mean Squared Error (MSE)** | `0.257` | Squared error averaged across the vocabulary |
|
| 42 |
+
| **Inference Latency** | `~0.15ms` | Per encoded text |
|
| 43 |
+
| **Cold Boot / Load Time** | `~144ms` | From reading the file on disk to a ready in-memory model |
|
| 44 |
+
| **Local Search Latency** | `14.6ms` | P50 latency across 2,707 indexed code chunks |
|
| 45 |
+
| **Tool Search Accuracy** | `100%` | 15/15 strict functional tool-intent matches |
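
As a rough illustration of how the two fidelity metrics in the table could be computed, the sketch below compares an FP32 matrix with a reconstructed copy. The function name `fidelity_metrics` and the random test data are assumptions made for illustration; the evaluation procedure behind the reported numbers is not described in this README.

```python
import numpy as np


def fidelity_metrics(reference: np.ndarray, reconstructed: np.ndarray) -> tuple[float, float]:
    """Return (mean row-wise cosine similarity, mean squared error)."""
    ref = reference.astype(np.float64)
    rec = reconstructed.astype(np.float64)
    dots = np.sum(ref * rec, axis=1)
    norms = np.linalg.norm(ref, axis=1) * np.linalg.norm(rec, axis=1)
    cosine = float(np.mean(dots / np.maximum(norms, 1e-12)))
    mse = float(np.mean((ref - rec) ** 2))
    return cosine, mse


# Hypothetical data: an FP32 matrix and a slightly perturbed copy standing in
# for a dequantized matrix.
rng = np.random.default_rng(0)
fp32 = rng.standard_normal((1000, 256)).astype(np.float32)
perturbed = fp32 + 0.05 * rng.standard_normal(fp32.shape).astype(np.float32)

cosine, mse = fidelity_metrics(fp32, perturbed)
print(f"cosine={cosine:.4f} mse={mse:.4f}")
```

A cosine close to 1.0 means the direction of each vector is preserved, which is what matters for similarity search; MSE additionally reflects changes in magnitude.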
|
| 46 |
|
| 47 |
+
### Architectural Efficiency Comparison
|
| 48 |
+
Why choose a quantized static embedding over a traditional Transformer-based bi-encoder architecture?
|
|
|
|
|
|
|
| 49 |
|
| 50 |
+
| Architectural Feature | Vortex-Embed-4.7M (Static) | BGE / BERT-Base (Transformer) |
|
| 51 |
+
| :--- | :--- | :--- |
|
| 52 |
+
| **Inference Latency** | **0.15ms** | ~50.0ms |
|
| 53 |
+
| **Cold Start Latency** | **144ms** | ~5000ms |
|
| 54 |
+
| **On-Disk Footprint** | **4.7 MB** | ~400+ MB |
|
| 55 |
+
| **Hardware Prerequisite** | **Commodity CPU** | Dedicated GPU Highly Recommended |
|
| 56 |
+
| **Domain Performance** | **Optimized for Code / Tools** | General Text Semantics |
|
| 57 |
+
|
| 58 |
+
---
|
| 59 |
|
| 60 |
+
## 🛠️ Architecture & Quantization Details
|
| 61 |
|
| 62 |
+
The model is a learned static token-embedding matrix stored with custom **LF4 per-block quantization**. To encode a sentence, it tokenizes the text, looks up each token's row and dequantizes it inline, mean-pools the token vectors, and L2-normalizes the result.
|
|
|
|
|
|
|
| 63 |
|
| 64 |
+
### Structural Topology
|
| 65 |
+
```text
|
| 66 |
+
vocab_size = 29,528 | dimensions = 256 | bits = 4 | block_size = 32
|
| 67 |
|
| 68 |
+
```
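
The parameters above are enough to reproduce the headline size and compression figures. The arithmetic below is a sketch that assumes `float16` storage for the per-block scales and zero-points (as listed in the tensor layout table) and ignores file-header and tokenizer overhead.

```python
vocab_size, dim, bits, block_size = 29_528, 256, 4, 32

blocks_per_row = dim // block_size                   # 8 blocks of 32 values
packed_bytes = vocab_size * dim * bits // 8          # two 4-bit values per byte
scale_bytes = vocab_size * blocks_per_row * 2        # float16 = 2 bytes each
zero_bytes = vocab_size * blocks_per_row * 2         # float16 = 2 bytes each

lf4_bytes = packed_bytes + scale_bytes + zero_bytes
fp32_bytes = vocab_size * dim * 4                    # unquantized baseline

print(f"LF4: {lf4_bytes:,} bytes, FP32: {fp32_bytes:,} bytes, "
      f"ratio: {fp32_bytes / lf4_bytes:.1f}x")
# prints "LF4: 4,724,480 bytes, FP32: 30,236,672 bytes, ratio: 6.4x"
```

Per row this is 128 + 16 + 16 = 160 bytes against 1,024 bytes in FP32, which is where the 6.4× ratio comes from.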
|
| 69 |
|
| 70 |
+
### Tensor Layout Matrix
|
| 71 |
+
|
| 72 |
+
The weights are stored in a standard `.safetensors` file as three named tensors:
|
| 73 |
+
|
| 74 |
+
| Tensor Target | Data Type | Dimensions / Shape | Functional Description |
|
| 75 |
+
| --- | --- | --- | --- |
|
| 76 |
+
| `embedding_packed` | `uint8` | `(29528, 128)` | Packed 4-bit values, two per byte (256 dimensions in 128 bytes per row) |
|
| 77 |
+
| `embedding_scales` | `float16` | `(29528, 8)` | Per-block scale multiplier (one per 32-value block, 8 blocks per row) |
|
| 78 |
+
| `embedding_zeros` | `float16` | `(29528, 8)` | Per-block zero-point offset (one per 32-value block, 8 blocks per row) |
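
To make the layout concrete, here is a minimal NumPy sketch of a per-block 4-bit scheme with the same tensor shapes, followed by the lookup, dequantize, mean-pool, and normalize steps. It is not the actual LF4 implementation: the nibble order (low nibble first), the dequantization formula `q * scale + zero`, the min/max quantizer, and all function names are assumptions, and tokenization is replaced by hand-picked token IDs.

```python
import numpy as np

BLOCK = 32  # values per quantization block


def quantize_rows(weights: np.ndarray):
    """Per-block 4-bit min/max quantization (hypothetical LF4-like scheme)."""
    rows, dim = weights.shape
    blocks = weights.reshape(rows, dim // BLOCK, BLOCK)
    lo = blocks.min(axis=2, keepdims=True)
    hi = blocks.max(axis=2, keepdims=True)
    scale = np.maximum((hi - lo) / 15.0, 1e-8)
    q = np.clip(np.rint((blocks - lo) / scale), 0, 15).astype(np.uint8)
    q = q.reshape(rows, dim)
    packed = q[:, 0::2] | (q[:, 1::2] << 4)  # assumed order: low nibble first
    return packed, scale[..., 0].astype(np.float16), lo[..., 0].astype(np.float16)


def dequantize_rows(packed: np.ndarray, scales: np.ndarray, zeros: np.ndarray) -> np.ndarray:
    """Unpack two 4-bit values per byte and apply per-block scale and offset."""
    rows = packed.shape[0]
    q = np.empty((rows, packed.shape[1] * 2), dtype=np.float32)
    q[:, 0::2] = packed & 0x0F
    q[:, 1::2] = packed >> 4
    q = q.reshape(rows, -1, BLOCK)
    out = q * scales[..., None].astype(np.float32) + zeros[..., None].astype(np.float32)
    return out.reshape(rows, -1)


def encode(token_ids, packed, scales, zeros) -> np.ndarray:
    """Row lookup, dequantization, mean pooling, then L2 normalization."""
    vectors = dequantize_rows(packed[token_ids], scales[token_ids], zeros[token_ids])
    pooled = vectors.mean(axis=0)
    return pooled / max(float(np.linalg.norm(pooled)), 1e-12)


# Hypothetical stand-in for the real embedding matrix: 100 tokens x 256 dims.
rng = np.random.default_rng(0)
weights = rng.standard_normal((100, 256)).astype(np.float32)
packed, scales, zeros = quantize_rows(weights)
restored = dequantize_rows(packed, scales, zeros)
sentence = encode([3, 17, 42], packed, scales, zeros)

print(packed.shape, scales.shape, zeros.shape, sentence.shape)
```

Because only the rows for the tokens in a sentence are dequantized, the full FP32 matrix never has to be materialized in memory.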
|
| 79 |
+
|
| 80 |
+
---
|
| 81 |
+
|
| 82 |
+
## Quickstart Installation & Usage
|
| 83 |
+
|
| 84 |
+
### Prerequisite Environment
|
| 85 |
+
|
| 86 |
+
```bash
|
| 87 |
+
pip install numpy safetensors tokenizers
|
| 88 |
|
| 89 |
+
```
|
| 90 |
|
| 91 |
+
### 1. Seamless Codebase Indexing (Via `vortexa`)
|
| 92 |
|
| 93 |
+
For turnkey directory indexing, search, and MCP support, use the official core engine:
|
| 94 |
+
|
| 95 |
+
```bash
|
| 96 |
pip install vortexa
|
|
|
|
| 97 |
|
| 98 |
+
```
|
| 99 |
+
|
| 100 |
+
```python
|
| 101 |
from vortexa.core.indexer import CodebaseIndexer
|
| 102 |
|
| 103 |
+
# Native integration: vortexa resolves and loads Vortex-Embed-4.7M out of the box
|
| 104 |
indexer = CodebaseIndexer(root='.')
|
| 105 |
stats = indexer.index()
|
|
|
|
|
|
|
| 106 |
|
| 107 |
+
# Execute high-speed vector retrieval across code chunks
|
| 108 |
+
results = indexer.search('find CSV parser or file tokenizer', top_k=5)
|
| 109 |
+
|
| 110 |
+
```
|
| 111 |
+
|
| 112 |
+
### 2. Standalone Low-Level Inference (No Torch Pipeline)
|
| 113 |
|
| 114 |
+
For custom applications or minimal CLI tools requiring zero framework overhead:
|
| 115 |
+
|
| 116 |
+
```python
|
| 117 |
from lf4_model import LF4StaticEmbedding
|
| 118 |
|
| 119 |
+
# Load the pretrained model by its repository ID
|
| 120 |
model = LF4StaticEmbedding.from_pretrained('VTXAI/Vortex-Embed-4.7M')
|
| 121 |
+
|
| 122 |
+
# Encode source text directly into normalized NumPy arrays
|
| 123 |
embeddings = model.encode(['search the web', 'read file'])
|
| 124 |
+
|
| 125 |
+
# Rank documents against a query by similarity.
# query_emb and doc_emb are placeholders for embedding arrays, e.g. from model.encode(...)
|
| 126 |
scores, indices = model.search(query_emb, doc_emb, top_k=10)
|
|
|
|
| 127 |
|
| 128 |
+
```
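
The README does not show how `model.search` works internally. Given L2-normalized embeddings and linear search scaling, one plausible reading is a brute-force cosine search, sketched below. The function `top_k_cosine` and the random data are hypothetical and only illustrate the technique.

```python
import numpy as np


def top_k_cosine(query_emb: np.ndarray, doc_emb: np.ndarray, top_k: int = 10):
    """Brute-force top-k search over L2-normalized embeddings.

    query_emb: (n_queries, dim), doc_emb: (n_docs, dim).
    Returns (scores, indices), each of shape (n_queries, k).
    """
    scores = query_emb @ doc_emb.T  # dot product equals cosine for unit vectors
    k = min(top_k, doc_emb.shape[0])
    indices = np.argsort(-scores, axis=1)[:, :k]
    return np.take_along_axis(scores, indices, axis=1), indices


# Hypothetical corpus of 50 unit-length vectors; the query is document 3 itself.
rng = np.random.default_rng(0)
docs = rng.standard_normal((50, 256)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = docs[3:4]

scores, indices = top_k_cosine(query, docs, top_k=5)
print(indices[0], scores[0])
```

The cost grows linearly with the number of documents, which is typically acceptable for a few thousand code chunks.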
|
| 129 |
|
| 130 |
+
### 3. Sentence-Transformers Framework Compatibility
|
|
|
|
|
|
|
|
|
|
|
|
|
| 131 |
|
| 132 |
+
If you prefer running within standard ML pipelines, use the modern native static backend:
|
| 133 |
|
| 134 |
+
```bash
|
| 135 |
+
pip install sentence-transformers
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 136 |
|
| 137 |
+
```
|
| 138 |
|
| 139 |
+
```python
|
| 140 |
+
from sentence_transformers import SentenceTransformer
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 141 |
|
| 142 |
+
# Load using the explicit static processing engine
|
| 143 |
+
model = SentenceTransformer('VTXAI/Vortex-Embed-4.7M', backend='static')
|
| 144 |
+
embeddings = model.encode(['search the web', 'read file'])
|
| 145 |
|
| 146 |
+
```
|
| 147 |
|
| 148 |
+
---
|
| 149 |
|
| 150 |
+
## Citation & Attributions
|
| 151 |
|
| 152 |
+
If you use this model or the `vortexa` engine in research or production, please cite it with the following BibTeX entry:
|
| 153 |
|
| 154 |
+
```bibtex
|
| 155 |
@software{vortex-embed-4.7m,
|
| 156 |
+
title = {Vortex-Embed-4.7M: High-Performance 4-Bit Static Embedding Topology},
|
| 157 |
author = {VortexAI},
|
| 158 |
+
year = {2025},
|
| 159 |
+
url = {https://huggingface.co/VTXAI/Vortex-Embed-4.7M}
|
| 160 |
}
|
| 161 |
+
|
| 162 |
+
```
|