---

language: en
library_name: lf4
license: mit
pipeline_tag: sentence-similarity
tags:
- lf4
- lf4-static-embedding
- static-embedding
- 4-bit
- quantized
- code-search
- tool-search
- embedding
- codebase
- semantic-search
---

# Vortex-Embed-4.7M

`Vortex-Embed-4.7M` is an ultra-lightweight, **4-bit quantized static sentence embedding model** designed for high-throughput semantic code search and tool retrieval. Delivering a 256-dimensional space within a **4.7 MB** footprint, the model completely bypasses heavy deep learning frameworks like PyTorch or Hugging Face Transformers, making it ideal for edge computing, local IDE plugins, and resource-constrained CLI tools.

This model is deployed as the native, default embedder inside [**vortexa**](https://github.com/OEvortex/vortexa), the open-source AST-aware codebase indexing and semantic search engine.

---

## ⚡ Key Highlights
* **Zero Heavy Dependencies:** Built strictly on NumPy, Safetensors, and Tokenizers. No PyTorch, no execution graphs, no CUDA requirements.
* **Aggressive Compression:** Compressed **6.4×** via LF4 block-quantization while retaining **99.69%** cosine similarity relative to the unquantized FP32 baseline.
* **Blazing Fast Execution:** Sub-millisecond inference (~0.15ms per text string) with linear search scaling.

---

## 📊 Performance Benchmarks

### Quantization Fidelity & Speed
All metrics were measured on a commodity x86 CPU.

| Metric | Target Value | Notes |
| :--- | :--- | :--- |
| **Cosine Preservation (vs FP32)** | `0.9969` | Near-zero degradation in vector geometry |
| **Mean Squared Error (MSE)** | `0.257` | Absolute error tracking across the vocabulary |
| **Inference Latency** | `~0.15ms` | Per single text encoding execution |
| **Cold Boot / Load Time** | `~144ms` | Disk serialization to memory initialization |
| **Local Search Latency** | `14.6ms` | P50 latency across 2,707 indexed code chunks |
| **Tool Search Accuracy** | `100%` | 15/15 strict functional tool-intent matches |
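
The two fidelity rows above compare dequantized vectors against their FP32 originals. The sketch below shows one way such numbers can be computed. It assumes a generic per-block min/max 4-bit quantizer (the exact LF4 scheme is not described in this card) and uses a random matrix in place of the real weights, so the printed values will not match the table. The helper names `quantize_dequantize` and `fidelity` are hypothetical.

```python
import numpy as np

BLOCK, LEVELS = 32, 15  # block size from this card; 4 bits give codes 0..15


def quantize_dequantize(weights):
    """Round-trip each block through 4-bit codes (assumed min/max scheme)."""
    rows, dim = weights.shape
    blocks = weights.reshape(rows, dim // BLOCK, BLOCK)
    lo = blocks.min(axis=-1, keepdims=True)
    hi = blocks.max(axis=-1, keepdims=True)
    # Guard against constant blocks, where hi == lo would give a zero scale.
    scale = np.where(hi > lo, (hi - lo) / LEVELS, 1.0)
    codes = np.clip(np.round((blocks - lo) / scale), 0, LEVELS)
    return (codes * scale + lo).reshape(rows, dim)


def fidelity(reference, restored):
    """Return (mean row-wise cosine similarity, mean squared error)."""
    dots = np.sum(reference * restored, axis=1)
    norms = np.linalg.norm(reference, axis=1) * np.linalg.norm(restored, axis=1)
    cosine = float(np.mean(dots / norms))
    mse = float(np.mean((reference - restored) ** 2))
    return cosine, mse


# Random stand-in for an FP32 embedding table (not the model's weights).
rng = np.random.default_rng(0)
fp32 = rng.standard_normal((1000, 256)).astype(np.float32)
cosine, mse = fidelity(fp32, quantize_dequantize(fp32))
print(f"cosine={cosine:.4f} mse={mse:.5f}")
```

Running the same two functions on the real FP32 table and its dequantized counterpart would be the natural way to reproduce the reported figures, provided the quantizer is swapped for the actual LF4 routine.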

### Architectural Efficiency Comparison
Why choose a quantized static embedding over a traditional Transformer-based bi-encoder architecture?

| Architectural Feature | Vortex-Embed-4.7M (Static) | BGE / BERT-Base (Transformer) |
| :--- | :--- | :--- |
| **Inference Latency** | **🚀 0.15ms** | ~50.0ms |
| **Cold Start Latency** | **🚀 144ms** | ~5000ms |
| **On-Disk Footprint** | **🚀 4.7 MB** | ~400+ MB |
| **Hardware Prerequisite** | **Commodity CPU** | Dedicated GPU Highly Recommended |
| **Domain Performance** | **Optimized for Code / Tools** | General Text Semantics |

---

## 🛠️ Architecture & Quantization Details

The model utilizes a learned token-to-embedding static matrix combined with custom **LF4 per-block quantization**. Sentences are processed via tokenization, sequential row-lookup with inline dequantization, mean pooling, and final L2 normalization.
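
The pipeline described above (row lookup, inline dequantization, mean pooling, L2 normalization) can be sketched in plain NumPy. This is an illustration under stated assumptions, not the library's implementation: the nibble order (low nibble first) and the dequantization formula `code * scale + zero` are guesses, the functions `dequantize_rows` and `embed` are hypothetical, and a small random table stands in for the real weights. Tokenization is skipped; token IDs are passed in directly.

```python
import numpy as np

DIM, BLOCK = 256, 32        # dimensions and block size from this card
N_BLOCKS = DIM // BLOCK     # 8 blocks per embedding row


def dequantize_rows(packed, scales, zeros):
    """Unpack 4-bit codes and apply the per-block scale and zero-point.

    Assumed conventions: low nibble first, value = code * scale + zero.
    """
    low = packed & 0x0F
    high = packed >> 4
    # Interleave so that each byte yields two consecutive dimensions.
    codes = np.stack([low, high], axis=-1).reshape(packed.shape[0], DIM)
    codes = codes.reshape(-1, N_BLOCKS, BLOCK).astype(np.float32)
    scale = scales[..., None].astype(np.float32)
    zero = zeros[..., None].astype(np.float32)
    return (codes * scale + zero).reshape(-1, DIM)


def embed(token_ids, packed, scales, zeros):
    """Look up rows, dequantize them, mean-pool, then L2-normalize."""
    rows = dequantize_rows(packed[token_ids], scales[token_ids], zeros[token_ids])
    pooled = rows.mean(axis=0)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


# Toy table with 10 tokens standing in for the 29,528-row vocabulary.
rng = np.random.default_rng(0)
packed = rng.integers(0, 256, size=(10, DIM // 2), dtype=np.uint8)
scales = rng.uniform(0.01, 0.1, size=(10, N_BLOCKS)).astype(np.float16)
zeros = rng.uniform(-0.5, 0.0, size=(10, N_BLOCKS)).astype(np.float16)

vec = embed(np.array([1, 4, 7]), packed, scales, zeros)
print(vec.shape, float(np.linalg.norm(vec)))
```

Only the rows for the tokens in the input are dequantized, which is why the per-text cost stays small regardless of vocabulary size.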

### Structural Topology
```text
vocab_size = 29,528 | dimensions = 256 | bits = 4 | block_size = 32
```

### Tensor Layout Matrix

The weights are stored in a standard `.safetensors` file as three named tensors:

| Tensor Target | Data Type | Dimensions / Shape | Functional Description |
| --- | --- | --- | --- |
| `embedding_packed` | `uint8` | `(29528, 128)` | 4-bit packed array space (stores two 4-bit values per byte) |
| `embedding_scales` | `float16` | `(29528, 8)` | Half-precision per-block scale multiplier (one per 32-value block) |
| `embedding_zeros` | `float16` | `(29528, 8)` | Half-precision per-block zero-point offset (one per 32-value block) |
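
The 4.7 MB footprint and the 6.4× ratio can be checked from the shapes and data types in the table. The short calculation below counts tensor payload bytes only; it ignores the safetensors header and any tokenizer files, which is an assumption about what the headline figure includes.

```python
VOCAB, DIM, BITS, BLOCK = 29_528, 256, 4, 32

packed_bytes = VOCAB * DIM * BITS // 8   # uint8, two 4-bit codes per byte
n_blocks = DIM // BLOCK                  # 8 blocks per row
scale_bytes = VOCAB * n_blocks * 2       # float16 scales
zero_bytes = VOCAB * n_blocks * 2        # float16 zero-points

quantized = packed_bytes + scale_bytes + zero_bytes
fp32 = VOCAB * DIM * 4                   # unquantized float32 table

print(quantized)                 # 4724480 bytes, about 4.7 MB
print(fp32)                      # 30236672 bytes, about 30.2 MB
print(round(fp32 / quantized, 1))  # 6.4
```

Put differently, each weight costs 4 bits for its code plus 1 bit of amortized scale and zero-point overhead (32 bits shared across a 32-value block), and 32 / 5 = 6.4.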

---

## 🚀 Quickstart Installation & Usage

### Prerequisite Environment

```bash
pip install numpy safetensors tokenizers
```

### 1. Seamless Codebase Indexing (Via `vortexa`)

For turnkey directory indexing, search, and MCP support, use the official core engine:

```bash
pip install vortexa
```

```python
from vortexa.core.indexer import CodebaseIndexer

# Native integration: vortexa resolves and loads Vortex-Embed-4.7M out of the box
indexer = CodebaseIndexer(root='.')
stats = indexer.index()

# Execute high-speed vector retrieval across code chunks
results = indexer.search('find CSV parser or file tokenizer', top_k=5)
```

### 2. Standalone Low-Level Inference (No Torch Pipeline)

For custom applications or minimal CLI tools requiring zero framework overhead:

```python
from lf4_model import LF4StaticEmbedding

# Load the model by its repository ID
model = LF4StaticEmbedding.from_pretrained('VTXAI/Vortex-Embed-4.7M')

# Encode source text directly into normalized NumPy arrays
doc_emb = model.encode(['search the web', 'read file'])
query_emb = model.encode(['look something up online'])

# Rank the encoded documents against the query
scores, indices = model.search(query_emb, doc_emb, top_k=2)
```
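
Because the embeddings are L2-normalized, a dot product between two of them equals their cosine similarity, so a top-k search reduces to a matrix product followed by a sort. The sketch below illustrates that idea with a hypothetical helper, `top_k_cosine`; it is not the library's `search` method, whose internals are not documented here.

```python
import numpy as np


def top_k_cosine(query_emb, doc_emb, top_k=10):
    """Return (scores, indices) of the best-matching documents per query.

    Assumes both inputs are L2-normalized row vectors, so the dot
    product of two rows equals their cosine similarity.
    """
    sims = query_emb @ doc_emb.T              # shape: (n_queries, n_docs)
    k = min(top_k, sims.shape[1])             # never ask for more docs than exist
    idx = np.argsort(-sims, axis=1)[:, :k]    # highest similarity first
    scores = np.take_along_axis(sims, idx, axis=1)
    return scores, idx


# Three unit-length document vectors and one unit-length query.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
query = np.array([[0.8, 0.6]])

scores, indices = top_k_cosine(query, docs, top_k=2)
print(indices)  # [[2 0]]
print(scores)   # approximately [[0.96 0.8]]
```

This is a brute-force scan whose cost grows linearly with the number of documents, in line with the linear search scaling mentioned under Key Highlights.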

### 3. Sentence-Transformers Framework Compatibility

If you prefer running within standard ML pipelines, use the modern native static backend:

```bash
pip install sentence-transformers
```

```python
from sentence_transformers import SentenceTransformer

# Load using the explicit static processing engine
model = SentenceTransformer('VTXAI/Vortex-Embed-4.7M', backend='static')
embeddings = model.encode(['search the web', 'read file'])
```

---

## 📜 Citation & Attributions

If you use this model or the `vortexa` engine in research or production, please cite the repository with the following BibTeX entry:

```bibtex
@software{vortex-embed-4.7m,
  title  = {Vortex-Embed-4.7M: High-Performance 4-Bit Static Embedding Topology},
  author = {VortexAI},
  year   = {2025},
  url    = {https://huggingface.co/VTXAI/Vortex-Embed-4.7M}
}
```