README.md · VTXAI/Vortex-Embed-4.7M at main

Abhaykoul

Update README.md

4ee431b verified 15 days ago

	---
	language: en
	library_name: lf4
	license: mit
	pipeline_tag: sentence-similarity
	tags:
	- lf4
	- lf4-static-embedding
	- static-embedding
	- 4-bit
	- quantized
	- code-search
	- tool-search
	- embedding
	- codebase
	- semantic-search
	---
	# Vortex-Embed-4.7M

	`Vortex-Embed-4.7M` is an ultra-lightweight, 4-bit quantized static sentence embedding model for high-throughput semantic code search and tool retrieval. It produces 256-dimensional embeddings from a 4.7 MB file and runs without heavy deep learning frameworks such as PyTorch or Hugging Face Transformers, which makes it a good fit for edge computing, local IDE plugins, and resource-constrained CLI tools.

	This model is the native, default embedder in [vortexa](https://github.com/OEvortex/vortexa), the open-source AST-aware codebase indexing and semantic search engine.

	---

	## ⚡ Key Highlights
	* Zero Heavy Dependencies: Built strictly on NumPy, Safetensors, and Tokenizers. No PyTorch, no execution graphs, no CUDA requirements.
	* Aggressive Compression: Compressed 6.4× via LF4 block-quantization while retaining 99.69% cosine similarity relative to the unquantized FP32 baseline.
	* Blazing Fast Execution: Sub-millisecond inference (~0.15ms per text string), with search time that grows linearly with the number of indexed vectors.

	---

	## 📊 Performance Benchmarks

	### Quantization Fidelity & Speed
	All metrics were measured on a commodity x86 CPU.

	| Metric | Target Value | Notes |
	| :--- | :--- | :--- |
	| Cosine Preservation (vs FP32) | `0.9969` | Near-zero degradation in vector geometry |
	| Mean Squared Error (MSE) | `0.257` | Absolute error tracking across the vocabulary |
	| Inference Latency | `~0.15ms` | Per single text encoding execution |
	| Cold Boot / Load Time | `~144ms` | Disk serialization to memory initialization |
	| Local Search Latency | `14.6ms` | P50 latency across 2,707 indexed code chunks |
	| Tool Search Accuracy | `100%` | 15/15 strict functional tool-intent matches |
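
Latency figures like the ones above depend heavily on the machine and workload. The following is a minimal sketch of how a P50 (median) latency could be measured; the `p50_latency_ms` helper and the stand-in workload are hypothetical and are not part of `vortexa` or the model's own tooling.

```python
import statistics
import time


def p50_latency_ms(fn, inputs, warmup=3):
    """Median wall-clock time of fn(x) over inputs, in milliseconds."""
    for x in inputs[:warmup]:
        fn(x)  # warm caches before timing
    samples = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Stand-in workload; swap in a real encode or search call to benchmark it.
queries = ["search the web", "read file", "parse csv"] * 10
latency = p50_latency_ms(lambda text: text.lower().split(), queries)
print(f"P50: {latency:.4f} ms")
```

Using the median rather than the mean keeps a few slow outliers (for example, the first call after a cold start) from dominating the result.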

	### Architectural Efficiency Comparison
	Why choose a quantized static embedding over a traditional Transformer-based bi-encoder architecture?

	| Architectural Feature | Vortex-Embed-4.7M (Static) | BGE / BERT-Base (Transformer) |
	| :--- | :--- | :--- |
	| Inference Latency | 🚀 0.15ms | ~50.0ms |
	| Cold Start Latency | 🚀 144ms | ~5000ms |
	| On-Disk Footprint | 🚀 4.7 MB | ~400+ MB |
	| Hardware Prerequisite | Commodity CPU | Dedicated GPU Highly Recommended |
	| Domain Performance | Optimized for Code / Tools | General Text Semantics |

	---

	## 🛠️ Architecture & Quantization Details

	The model uses a learned static token-to-embedding matrix stored with custom LF4 per-block quantization. A sentence is encoded in four steps: tokenization, row lookup with inline dequantization for each token, mean pooling over the token vectors, and final L2 normalization.
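
The lookup, pooling, and normalization steps can be sketched in a few lines of NumPy. This is an illustration only, not the model's actual code: the `embed` function is hypothetical, the table is random, and the token ids are hand-picked instead of coming from the tokenizer. Dequantization is left out, so the table is already in `float32`.

```python
import numpy as np


def embed(token_ids, table):
    """Look up token rows, mean-pool them, and L2-normalize the result."""
    rows = table[token_ids]        # (num_tokens, dims) row lookup
    pooled = rows.mean(axis=0)     # mean pooling over tokens
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


# Toy stand-ins: a random 10-token vocabulary and hand-picked token ids.
rng = np.random.default_rng(0)
table = rng.standard_normal((10, 256)).astype(np.float32)
vector = embed([1, 4, 7], table)
print(vector.shape, float(np.linalg.norm(vector)))
```

Because there is no attention or any other cross-token computation, the cost of encoding is one row read per token plus a single averaging pass.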

	### Structural Topology
	```text
	vocab_size = 29,528 | dimensions = 256 | bits = 4 | block_size = 32
	```
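
These four numbers determine the tensor shapes and the on-disk size. The check below is a quick sketch that assumes `float16` scales and zero-points, as listed in the tensor layout table of this README, and ignores the small safetensors header.

```python
# Figures taken from the topology line in this README.
vocab_size, dims, bits, block_size = 29_528, 256, 4, 32

bytes_per_row = dims * bits // 8      # two 4-bit codes per byte
blocks_per_row = dims // block_size   # one scale and one zero-point per block

packed_bytes = vocab_size * bytes_per_row
scale_bytes = vocab_size * blocks_per_row * 2  # float16 = 2 bytes each
zero_bytes = vocab_size * blocks_per_row * 2

quantized_bytes = packed_bytes + scale_bytes + zero_bytes
fp32_bytes = vocab_size * dims * 4

print(bytes_per_row, blocks_per_row)                       # 128 8
print(f"quantized: {quantized_bytes / 1e6:.2f} MB")        # quantized: 4.72 MB
print(f"fp32:      {fp32_bytes / 1e6:.2f} MB")             # fp32:      30.24 MB
print(f"ratio:     {fp32_bytes / quantized_bytes:.1f}x")   # ratio:     6.4x
```

Per row, that is 1,024 bytes in FP32 against 128 + 16 + 16 = 160 bytes quantized, which matches the 6.4× compression and the 4.7 MB footprint quoted above.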

	### Tensor Layout Matrix

	The weights are stored in a standard `.safetensors` file containing three tensors:

	| Tensor Name | Data Type | Shape | Description |
	| --- | --- | --- | --- |
	| `embedding_packed` | `uint8` | `(29528, 128)` | Packed 4-bit codes (two 4-bit values per byte) |
	| `embedding_scales` | `float16` | `(29528, 8)` | Half-precision per-block scale multiplier |
	| `embedding_zeros` | `float16` | `(29528, 8)` | Half-precision per-block zero-point offset |
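
To make the layout concrete, here is a minimal sketch of how rows with these shapes could be unpacked. Two details are assumptions that this README does not confirm: the nibble order (low nibble first) and the affine formula `value = code * scale + zero`. The `dequantize_rows` helper is hypothetical and is not the LF4 library's API.

```python
import numpy as np

BLOCK_SIZE = 32  # from the topology line in this README


def dequantize_rows(packed, scales, zeros):
    """Unpack 4-bit codes and apply per-block affine dequantization.

    Assumed, not confirmed: low nibble first, value = code * scale + zero.
    """
    low = packed & 0x0F
    high = packed >> 4
    # Interleave so that each byte yields two consecutive dimensions.
    codes = np.stack([low, high], axis=-1).reshape(packed.shape[0], -1)
    # Expand each per-block parameter across the 32 values it covers.
    scale = np.repeat(scales.astype(np.float32), BLOCK_SIZE, axis=1)
    zero = np.repeat(zeros.astype(np.float32), BLOCK_SIZE, axis=1)
    return codes.astype(np.float32) * scale + zero


# Toy tensors with the same per-row shapes as the table: (n, 128), (n, 8), (n, 8).
rng = np.random.default_rng(0)
packed = rng.integers(0, 256, size=(3, 128), dtype=np.uint8)
scales = rng.random((3, 8)).astype(np.float16)
zeros = rng.random((3, 8)).astype(np.float16)

rows = dequantize_rows(packed, scales, zeros)
print(rows.shape)  # (3, 256)
```

Only the rows for tokens that appear in the input need to be dequantized, so the full FP32 matrix never has to be held in memory.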

	---

	## 🚀 Quickstart Installation & Usage

	### Prerequisite Environment

	```bash
	pip install numpy safetensors tokenizers

	```

	### 1. Seamless Codebase Indexing (Via `vortexa`)

	For turnkey directory indexing, search, and MCP support, use the official core engine:

	```bash
	pip install vortexa

	```

	```python
	from vortexa.core.indexer import CodebaseIndexer

	# Native integration: vortexa resolves and loads Vortex-Embed-4.7M out of the box
	indexer = CodebaseIndexer(root='.')
	stats = indexer.index()

	# Execute high-speed vector retrieval across code chunks
	results = indexer.search('find CSV parser or file tokenizer', top_k=5)

	```

	### 2. Standalone Low-Level Inference (No Torch Pipeline)

	For custom applications or minimal CLI tools requiring zero framework overhead:

	```python
	from lf4_model import LF4StaticEmbedding

	# Load the quantized weights by repository id
	model = LF4StaticEmbedding.from_pretrained('VTXAI/Vortex-Embed-4.7M')

	# Encode documents and a query directly into normalized NumPy arrays
	doc_emb = model.encode(['search the web', 'read file'])
	query_emb = model.encode(['look something up online'])

	# Rank the documents against the query (top_k=2: only two documents here)
	scores, indices = model.search(query_emb, doc_emb, top_k=2)
	```
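
For intuition about what a search over normalized embeddings involves, the sketch below implements brute-force top-k retrieval in plain NumPy. It is not the implementation behind `model.search`; the `top_k_search` function is hypothetical and only mirrors the call shape used above. Because the vectors are L2-normalized, the dot product equals cosine similarity.

```python
import numpy as np


def top_k_search(query_emb, doc_emb, top_k=10):
    """Return (scores, indices) of the top_k documents for each query row."""
    scores = query_emb @ doc_emb.T  # cosine similarity for unit vectors
    k = min(top_k, doc_emb.shape[0])
    # argpartition selects the k best without a full sort; sort only those k.
    part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    part_scores = np.take_along_axis(scores, part, axis=1)
    order = np.argsort(-part_scores, axis=1)
    indices = np.take_along_axis(part, order, axis=1)
    return np.take_along_axis(scores, indices, axis=1), indices


docs = np.eye(4, dtype=np.float32)  # four orthogonal unit vectors
query = np.array([[0.6, 0.8, 0.0, 0.0]], dtype=np.float32)
scores, indices = top_k_search(query, docs, top_k=2)
print(indices)  # [[1 0]]
```

The work is one matrix product over all document vectors, so the cost grows linearly with the size of the index.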

	### 3. Sentence-Transformers Framework Compatibility

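	If you prefer to work within a standard ML pipeline, load the model through Sentence-Transformers with its static backend. Note that `sentence-transformers` installs PyTorch as a dependency: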

	```bash
	pip install sentence-transformers

	```

	```python
	from sentence_transformers import SentenceTransformer

	# Load using the explicit static processing engine
	model = SentenceTransformer('VTXAI/Vortex-Embed-4.7M', backend='static')
	embeddings = model.encode(['search the web', 'read file'])

	```

	---

	## 📜 Citation & Attributions

	If you use this model or the `vortexa` engine in research, production environments, or industrial applications, please cite the repository with the following BibTeX entry:

	```bibtex
	@software{vortex-embed-4.7m,
	title = {Vortex-Embed-4.7M: High-Performance 4-Bit Static Embedding Topology},
	author = {VortexAI},
	year = {2025},
	url = {https://huggingface.co/VTXAI/Vortex-Embed-4.7M}
	}

	```