---

language: en
library_name: lf4
license: mit
pipeline_tag: sentence-similarity
tags:
- lf4
- lf4-static-embedding
- static-embedding
- 4-bit
- quantized
- code-search
- tool-search
- embedding
- codebase
- semantic-search
---

# Vortex-Embed-4.7M

`Vortex-Embed-4.7M` is an ultra-lightweight, **4-bit quantized static sentence embedding model** designed for high-throughput semantic code search and tool retrieval. It produces 256-dimensional embeddings from a **4.7 MB** file and runs without heavy deep learning frameworks such as PyTorch or Hugging Face Transformers, which makes it well suited to edge computing, local IDE plugins, and resource-constrained CLI tools.

This model is the default embedder inside [**vortexa**](https://github.com/OEvortex/vortexa), the open-source AST-aware codebase indexing and semantic search engine.

---

## ⚡ Key Highlights
* **Zero Heavy Dependencies:** Built strictly on NumPy, Safetensors, and Tokenizers. No PyTorch, no execution graphs, no CUDA requirements.
* **Aggressive Compression:** Compressed **6.4×** via LF4 block-quantization while retaining **99.69%** cosine similarity relative to the unquantized FP32 baseline.
* **Blazing Fast Execution:** Sub-millisecond inference (~0.15ms per text string) with linear search scaling.

---

## 📊 Performance Benchmarks

### Quantization Fidelity & Speed
All metrics were measured on a commodity x86 CPU.

| Metric | Value | Notes |
| :--- | :--- | :--- |
| **Cosine Preservation (vs FP32)** | `0.9969` | Near-zero degradation in vector geometry |
| **Mean Squared Error (MSE)** | `0.257` | Absolute error tracking across the vocabulary |
| **Inference Latency** | `~0.15ms` | Per single text encoding execution |
| **Cold Boot / Load Time** | `~144ms` | Disk serialization to memory initialization |
| **Local Search Latency** | `14.6ms` | P50 latency across 2,707 indexed code chunks |
| **Tool Search Accuracy** | `100%` | 15/15 strict functional tool-intent matches |
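
As a rough illustration of how the cosine preservation and MSE rows could be computed, the sketch below quantizes a random FP32 matrix to 4 bits per block, restores it, and compares the two. It uses plain asymmetric min-max quantization as a stand-in, since the exact LF4 scheme is not described here. The function names and the random data are hypothetical, so the printed numbers will not match the table.

```python
import numpy as np

BLOCK = 32  # values per quantization block, as in the model's block_size

def quantize_blocks(weights):
    """Asymmetric min-max 4-bit quantization per block (a stand-in, not LF4 itself)."""
    n, dim = weights.shape
    blocks = weights.reshape(n, dim // BLOCK, BLOCK)
    lo = blocks.min(axis=-1, keepdims=True)
    hi = blocks.max(axis=-1, keepdims=True)
    scale = np.maximum((hi - lo) / 15.0, 1e-8)  # 16 levels: codes 0..15
    codes = np.clip(np.round((blocks - lo) / scale), 0, 15).astype(np.uint8)
    return codes, scale, lo

def dequantize_blocks(codes, scale, lo):
    """Map 4-bit codes back to floats and flatten the blocks into rows."""
    n = codes.shape[0]
    return (codes.astype(np.float32) * scale + lo).reshape(n, -1)

def fidelity(reference, restored):
    """Mean row-wise cosine similarity and mean squared error."""
    dots = (reference * restored).sum(axis=1)
    norms = np.linalg.norm(reference, axis=1) * np.linalg.norm(restored, axis=1)
    mean_cosine = float((dots / norms).mean())
    mse = float(((reference - restored) ** 2).mean())
    return mean_cosine, mse

# Hypothetical data: random rows instead of the real embedding table.
rng = np.random.default_rng(0)
reference = rng.normal(size=(1000, 256)).astype(np.float32)
restored = dequantize_blocks(*quantize_blocks(reference))
mean_cosine, mse = fidelity(reference, restored)
print(f"mean cosine: {mean_cosine:.4f}, MSE: {mse:.4f}")
```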

### Architectural Efficiency Comparison
Why choose a quantized static embedding over a traditional Transformer-based bi-encoder architecture?

| Architectural Feature | Vortex-Embed-4.7M (Static) | BGE / BERT-Base (Transformer) |
| :--- | :--- | :--- |
| **Inference Latency** | **🚀 0.15ms** | ~50.0ms |
| **Cold Start Latency** | **🚀 144ms** | ~5000ms |
| **On-Disk Footprint** | **🚀 4.7 MB** | ~400+ MB |
| **Hardware Prerequisite** | **Commodity CPU** | Dedicated GPU Highly Recommended |
| **Domain Performance** | **Optimized for Code / Tools** | General Text Semantics |

---

## 🛠️ Architecture & Quantization Details

The model utilizes a learned token-to-embedding static matrix combined with custom **LF4 per-block quantization**. Sentences are processed via tokenization, sequential row-lookup with inline dequantization, mean pooling, and final L2 normalization.
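
The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration on a random toy table, not the model's actual code. The nibble order (low nibble first) and the dequantization formula `code * scale + zero` are assumptions, and the function names are hypothetical.

```python
import numpy as np

DIM, BLOCK = 256, 32

def dequantize_rows(packed, scales, zeros):
    """Unpack two 4-bit codes per byte and rescale them block by block.

    Assumed layout (not confirmed by the model card): the low nibble holds
    the first value of each pair, and values are code * scale + zero.
    """
    n = packed.shape[0]
    low = packed & 0x0F
    high = packed >> 4
    codes = np.stack([low, high], axis=-1).reshape(n, -1)   # (n, DIM)
    codes = codes.reshape(n, -1, BLOCK).astype(np.float32)  # (n, blocks, BLOCK)
    values = (codes * scales[..., None].astype(np.float32)
              + zeros[..., None].astype(np.float32))
    return values.reshape(n, -1)

def encode_token_ids(token_ids, packed, scales, zeros):
    """Row lookup, inline dequantization, mean pooling, L2 normalization."""
    ids = np.asarray(token_ids)
    rows = dequantize_rows(packed[ids], scales[ids], zeros[ids])
    pooled = rows.mean(axis=0)
    return pooled / max(np.linalg.norm(pooled), 1e-12)

# Hypothetical toy table with random contents (100 tokens instead of 29,528).
rng = np.random.default_rng(0)
vocab = 100
packed = rng.integers(0, 256, size=(vocab, DIM // 2), dtype=np.uint8)
scales = rng.uniform(0.01, 0.1, size=(vocab, DIM // BLOCK)).astype(np.float16)
zeros = rng.uniform(-0.5, 0.0, size=(vocab, DIM // BLOCK)).astype(np.float16)

# Token IDs would normally come from the tokenizer; these are placeholders.
vec = encode_token_ids([3, 17, 42], packed, scales, zeros)
print(vec.shape, float(np.linalg.norm(vec)))
```

Because only the rows for the tokens in the input are dequantized, the full FP32 matrix never needs to be materialized in memory.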

### Structural Topology
```text
vocab_size = 29,528 | dimensions = 256 | bits = 4 | block_size = 32
```

### Tensor Layout Matrix

The weights are stored in a standard `.safetensors` file as three named tensors:

| Tensor Target | Data Type | Dimensions / Shape | Functional Description |
| --- | --- | --- | --- |
| `embedding_packed` | `uint8` | `(29528, 128)` | 4-bit packed array space (stores two 4-bit values per byte) |
| `embedding_scales` | `float16` | `(29528, 8)` | Half-precision per-block scale multiplier (8 blocks of 32 values per row) |
| `embedding_zeros` | `float16` | `(29528, 8)` | Half-precision per-block zero-point offset (8 blocks of 32 values per row) |
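
The 4.7 MB footprint and the 6.4× compression ratio follow from the shapes and data types in the table. The short calculation below checks the arithmetic. The variable names are illustrative only.

```python
# Sizes derived from the tensor shapes and dtypes listed in the table.
VOCAB, DIM, BLOCK = 29_528, 256, 32

blocks_per_row = DIM // BLOCK              # 8 blocks of 32 values per row
packed_bytes = VOCAB * (DIM // 2)          # uint8, two 4-bit codes per byte
scale_bytes = VOCAB * blocks_per_row * 2   # float16 is 2 bytes
zero_bytes = VOCAB * blocks_per_row * 2    # float16 is 2 bytes

quantized_bytes = packed_bytes + scale_bytes + zero_bytes
fp32_bytes = VOCAB * DIM * 4               # unquantized float32 baseline

print(f"quantized: {quantized_bytes / 1e6:.2f} MB")            # 4.72 MB
print(f"fp32:      {fp32_bytes / 1e6:.2f} MB")                 # 30.24 MB
print(f"ratio:     {fp32_bytes / quantized_bytes:.1f}x")       # 6.4x
```

Counting the scales and zero-points, each weight costs 5 bits on average rather than 4, which is why the ratio is 6.4× and not 8×.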

---

## 🚀 Quickstart Installation & Usage

### Prerequisite Environment

```bash
pip install numpy safetensors tokenizers
```

### 1. Seamless Codebase Indexing (Via `vortexa`)

For turnkey directory indexing, search, and MCP support, use the official core engine:

```bash
pip install vortexa
```

```python
from vortexa.core.indexer import CodebaseIndexer

# Native integration: vortexa resolves and loads Vortex-Embed-4.7M out of the box
indexer = CodebaseIndexer(root='.')
stats = indexer.index()

# Execute high-speed vector retrieval across code chunks
results = indexer.search('find CSV parser or file tokenizer', top_k=5)
```

### 2. Standalone Low-Level Inference (No Torch Pipeline)

For custom applications or minimal CLI tools requiring zero framework overhead:

```python
from lf4_model import LF4StaticEmbedding

# Load the model by its repository ID
model = LF4StaticEmbedding.from_pretrained('VTXAI/Vortex-Embed-4.7M')

# Encode text directly into normalized NumPy arrays
doc_emb = model.encode(['search the web', 'read file'])
query_emb = model.encode(['look something up online'])

# Rank the documents against the query (top_k capped at the two documents here)
scores, indices = model.search(query_emb, doc_emb, top_k=2)
```
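
For readers who want to see what a brute-force search over normalized embeddings involves, here is a minimal NumPy sketch. It is not the `lf4_model` implementation. The `top_k_cosine` function is hypothetical, and the random vectors stand in for real embeddings.

```python
import numpy as np

def top_k_cosine(query_emb, doc_emb, top_k=10):
    """Brute-force top-k by dot product; inputs are assumed L2-normalized."""
    scores = query_emb @ doc_emb.T                 # (n_queries, n_docs)
    k = min(top_k, doc_emb.shape[0])
    # argpartition selects the k best per query in linear time;
    # only those k candidates are then fully sorted.
    part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    part_scores = np.take_along_axis(scores, part, axis=1)
    order = np.argsort(-part_scores, axis=1)
    indices = np.take_along_axis(part, order, axis=1)
    return np.take_along_axis(scores, indices, axis=1), indices

# Hypothetical corpus: random unit vectors in place of real code-chunk embeddings.
rng = np.random.default_rng(0)
docs = rng.normal(size=(2707, 256)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

queries = docs[[5, 42]]  # reuse two documents as queries
scores, indices = top_k_cosine(queries, docs, top_k=3)
print(indices[:, 0])     # the best match for each query is the document itself
```

The cost is one matrix product plus a partial sort, so it grows linearly with the number of indexed chunks.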

### 3. Sentence-Transformers Framework Compatibility

If you prefer running within standard ML pipelines, use the modern native static backend:

```bash
pip install sentence-transformers
```

```python
from sentence_transformers import SentenceTransformer

# Load using the explicit static processing engine
model = SentenceTransformer('VTXAI/Vortex-Embed-4.7M', backend='static')
embeddings = model.encode(['search the web', 'read file'])
```

---

## 📜 Citation & Attributions

If you use this model or the `vortexa` engine in research or production, please cite it with the following BibTeX entry:

```bibtex
@software{vortex-embed-4.7m,
  title  = {Vortex-Embed-4.7M: High-Performance 4-Bit Static Embedding Topology},
  author = {VortexAI},
  year   = {2025},
  url    = {https://huggingface.co/VTXAI/Vortex-Embed-4.7M}
}
```