| ---
|
| language: en
|
| library_name: lf4
|
| license: mit
|
| pipeline_tag: sentence-similarity
|
| tags:
|
| - lf4
|
| - lf4-static-embedding
|
| - static-embedding
|
| - 4-bit
|
| - quantized
|
| - code-search
|
| - tool-search
|
| - embedding
|
| - codebase
|
| - semantic-search
|
| ---
|
| # Vortex-Embed-4.7M
|
|
|
| `Vortex-Embed-4.7M` is an ultra-lightweight, **4-bit quantized static sentence embedding model** designed for high-throughput semantic code search and tool retrieval. It produces 256-dimensional embeddings from a **4.7 MB** footprint and runs without heavy deep learning frameworks such as PyTorch or Hugging Face Transformers, which makes it a good fit for edge computing, local IDE plugins, and resource-constrained CLI tools.
|
|
|
| This model is the native, default embedder inside [**vortexa**](https://github.com/OEvortex/vortexa), the open-source AST-aware codebase indexing and semantic search engine.
|
|
|
| ---
|
|
|
| ## Key Highlights
|
| * **Zero Heavy Dependencies:** Built strictly on NumPy, Safetensors, and Tokenizers. No PyTorch, no execution graphs, no CUDA requirements.
|
| * **Aggressive Compression:** Compressed **6.4×** via LF4 block-quantization while retaining **99.69%** cosine similarity relative to the unquantized FP32 baseline.
|
| * **Blazing Fast Execution:** Sub-millisecond inference (~0.15ms per text string), with search time that scales linearly with the number of indexed vectors.
|
|
|
| ---
|
|
|
| ## Performance Benchmarks
|
|
|
| ### Quantization Fidelity & Speed
|
| All metrics were measured on a commodity x86 CPU.
|
|
|
| | Metric | Value | Notes |
|
| | :--- | :--- | :--- |
|
| | **Cosine Preservation (vs FP32)** | `0.9969` | Near-zero degradation in vector geometry |
|
| | **Mean Squared Error (MSE)** | `0.257` | Squared error against FP32, averaged across the vocabulary |
|
| | **Inference Latency** | `~0.15ms` | Per single text encoding call |
|
| | **Cold Boot / Load Time** | `~144ms` | From reading the file on disk to a ready in-memory model |
|
| | **Local Search Latency** | `14.6ms` | P50 latency across 2,707 indexed code chunks |
|
| | **Tool Search Accuracy** | `100%` | 15/15 strict functional tool-intent matches |
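The card does not spell out how the fidelity figures were computed. One plausible reading, sketched below on synthetic data, is a mean row-wise cosine similarity and a mean squared error between the FP32 embedding matrix and its dequantized counterpart. The function name `fidelity` and the additive-noise stand-in for quantization error are illustrative assumptions, not the authors' evaluation script.

```python
import numpy as np

def fidelity(reference: np.ndarray, reconstructed: np.ndarray):
    """Return (mean row-wise cosine similarity, mean squared error)."""
    dots = np.sum(reference * reconstructed, axis=1)
    norms = np.linalg.norm(reference, axis=1) * np.linalg.norm(reconstructed, axis=1)
    cosine = float(np.mean(dots / np.maximum(norms, 1e-12)))  # guard against zero rows
    mse = float(np.mean((reference - reconstructed) ** 2))
    return cosine, mse

# Toy data only: random "FP32" rows plus small noise standing in for quantization error.
rng = np.random.default_rng(0)
reference = rng.normal(size=(1000, 256)).astype(np.float32)
noise = rng.normal(scale=0.05, size=reference.shape).astype(np.float32)
reconstructed = reference + noise

cosine, mse = fidelity(reference, reconstructed)
print(f"cosine={cosine:.4f}  mse={mse:.4f}")
```

To reproduce the table's numbers, `reference` would be the original FP32 matrix and `reconstructed` the dequantized LF4 matrix; neither is needed for the sketch to run.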
|
|
|
| ### Architectural Efficiency Comparison
|
| Why choose a quantized static embedding over a traditional Transformer-based bi-encoder architecture?
|
|
|
| | Architectural Feature | Vortex-Embed-4.7M (Static) | BGE / BERT-Base (Transformer) |
|
| | :--- | :--- | :--- |
|
| | **Inference Latency** | **0.15ms** | ~50.0ms |
|
| | **Cold Start Latency** | **144ms** | ~5000ms |
|
| | **On-Disk Footprint** | **4.7 MB** | ~400+ MB |
|
| | **Hardware Prerequisite** | **Commodity CPU** | Dedicated GPU Highly Recommended |
|
| | **Domain Performance** | **Optimized for Code / Tools** | General Text Semantics |
|
|
|
| ---
|
|
|
| ## Architecture & Quantization Details
|
|
|
| The model uses a learned static token-to-embedding matrix combined with custom **LF4 per-block quantization**. A sentence is encoded in four steps: tokenization, row lookup with inline dequantization, mean pooling, and final L2 normalization.
|
|
|
| ### Structural Topology
|
| ```text
|
| vocab_size = 29,528 | dimensions = 256 | bits = 4 | block_size = 32
|
|
|
| ```
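As a sanity check, the topology values above are enough to reproduce the headline size figures with plain arithmetic. The sketch below assumes one float16 scale and one float16 zero-point per block, as described in this card, and ignores the tokenizer file and any container metadata.

```python
# Sizes implied by the topology values (pure arithmetic, no model files needed).
VOCAB_SIZE = 29_528
DIMENSIONS = 256
BITS = 4
BLOCK_SIZE = 32

# Two 4-bit codes fit in one byte, so a row packs into DIMENSIONS * BITS / 8 bytes.
packed_bytes_per_row = DIMENSIONS * BITS // 8
# One scale and one zero-point are stored per block of BLOCK_SIZE values.
blocks_per_row = DIMENSIONS // BLOCK_SIZE

packed_bytes = VOCAB_SIZE * packed_bytes_per_row   # uint8 codes
scale_bytes = VOCAB_SIZE * blocks_per_row * 2      # float16 scales
zero_bytes = VOCAB_SIZE * blocks_per_row * 2       # float16 zero-points
quantized_bytes = packed_bytes + scale_bytes + zero_bytes

fp32_bytes = VOCAB_SIZE * DIMENSIONS * 4           # unquantized float32 baseline
ratio = fp32_bytes / quantized_bytes

print(f"bytes per packed row: {packed_bytes_per_row}, blocks per row: {blocks_per_row}")
print(f"quantized: {quantized_bytes / 1e6:.2f} MB")
print(f"fp32:      {fp32_bytes / 1e6:.2f} MB")
print(f"ratio:     {ratio:.1f}x")
```

Under these assumptions the weights come to about 4.72 MB against 30.24 MB for FP32, a ratio of 6.4×, which lines up with the footprint and compression figures quoted in this card.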
|
|
|
| ### Tensor Layout Matrix
|
|
|
| The weights are stored in a standard `.safetensors` file as three named tensors:
|
|
|
| | Tensor | Data Type | Shape | Description |
|
| | --- | --- | --- | --- |
|
| | `embedding_packed` | `uint8` | `(29528, 128)` | Packed 4-bit codes, two per byte (256 / 2 = 128 bytes per row) |
|
| | `embedding_scales` | `float16` | `(29528, 8)` | Per-block scale multiplier (256 / 32 = 8 blocks per row) |
|
| | `embedding_zeros` | `float16` | `(29528, 8)` | Per-block zero-point offset |
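The encoding path described in this section (row lookup, inline dequantization, mean pooling, L2 normalization) can be sketched in NumPy against the tensor shapes listed above. Two details are assumptions, since the card does not specify them: the nibble order inside each byte (low nibble first here) and the affine reconstruction `value = code * scale + zero`. The helper names are hypothetical, and the weights are random stand-ins rather than the real model.

```python
import numpy as np

BLOCK_SIZE = 32  # from the model card

def unpack_nibbles(packed: np.ndarray) -> np.ndarray:
    """Split each uint8 into two 4-bit codes. ASSUMPTION: low nibble comes first."""
    low = packed & 0x0F
    high = packed >> 4
    # Interleave so byte i yields codes 2i (low) and 2i + 1 (high).
    return np.stack([low, high], axis=-1).reshape(*packed.shape[:-1], -1)

def dequantize_rows(packed, scales, zeros):
    """ASSUMPTION: per-block affine scheme, value = code * scale + zero."""
    codes = unpack_nibbles(packed).astype(np.float32)      # (n, 256)
    n = codes.shape[0]
    blocks = codes.reshape(n, -1, BLOCK_SIZE)              # (n, 8, 32)
    s = scales.astype(np.float32)[:, :, None]              # broadcast over each block
    z = zeros.astype(np.float32)[:, :, None]
    return (blocks * s + z).reshape(n, -1)

def embed(token_ids, packed, scales, zeros):
    """Look up token rows, dequantize them, mean-pool, and L2-normalize."""
    rows = dequantize_rows(packed[token_ids], scales[token_ids], zeros[token_ids])
    pooled = rows.mean(axis=0)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled

# Random stand-in weights with the card's per-row shapes (tiny 10-token vocabulary).
rng = np.random.default_rng(0)
packed = rng.integers(0, 256, size=(10, 128), dtype=np.uint8)
scales = rng.uniform(0.01, 0.1, size=(10, 8)).astype(np.float16)
zeros = rng.uniform(-0.5, 0.0, size=(10, 8)).astype(np.float16)

vec = embed(np.array([1, 4, 7]), packed, scales, zeros)
print(vec.shape, float(np.linalg.norm(vec)))
```

Only the rows for the tokens in a sentence are dequantized, so the full FP32 matrix never has to be materialized in memory.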
|
|
|
| ---
|
|
|
| ## Quickstart Installation & Usage
|
|
|
| ### Prerequisite Environment
|
|
|
| ```bash
|
| pip install numpy safetensors tokenizers
|
|
|
| ```
|
|
|
| ### 1. Seamless Codebase Indexing (Via `vortexa`)
|
|
|
| For turnkey directory indexing, search, and MCP support, use the official core engine:
|
|
|
| ```bash
|
| pip install vortexa
|
|
|
| ```
|
|
|
| ```python
|
| from vortexa.core.indexer import CodebaseIndexer
|
|
|
| # Native integration: vortexa resolves and loads Vortex-Embed-4.7M out of the box
|
| indexer = CodebaseIndexer(root='.')
|
| stats = indexer.index()
|
|
|
| # Execute high-speed vector retrieval across code chunks
|
| results = indexer.search('find CSV parser or file tokenizer', top_k=5)
|
|
|
| ```
|
|
|
| ### 2. Standalone Low-Level Inference (No Torch Pipeline)
|
|
|
| For custom applications or minimal CLI tools requiring zero framework overhead:
|
|
|
| ```python
|
| from lf4_model import LF4StaticEmbedding
|
|
|
| # Load the quantized model by its Hub repository ID
|
| model = LF4StaticEmbedding.from_pretrained('VTXAI/Vortex-Embed-4.7M')
|
|
|
| # Encode documents and a query into normalized NumPy arrays
|
| doc_emb = model.encode(['search the web', 'read file'])
|
| query_emb = model.encode(['open a file from disk'])
|
|
|
| # Rank the documents against the query
|
| scores, indices = model.search(query_emb, doc_emb, top_k=2)
|
|
|
| ```
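The internals of `model.search` are not documented in this card. For L2-normalized embeddings, a brute-force top-k search reduces to a matrix product followed by a partial sort, roughly as sketched below. The function `top_k_cosine` is a hypothetical stand-in, not the library's implementation.

```python
import numpy as np

def top_k_cosine(query_emb: np.ndarray, doc_emb: np.ndarray, top_k: int = 10):
    """Brute-force top-k by dot product, which equals cosine similarity for unit-norm rows."""
    scores = query_emb @ doc_emb.T                        # (n_queries, n_docs)
    k = min(top_k, doc_emb.shape[0])
    # argpartition selects the k best per query without fully sorting every row.
    part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    part_scores = np.take_along_axis(scores, part, axis=1)
    order = np.argsort(-part_scores, axis=1)              # sort only the k survivors
    indices = np.take_along_axis(part, order, axis=1)
    return np.take_along_axis(scores, indices, axis=1), indices

# Toy example: four orthogonal unit-length "documents" and one unit-length query.
docs = np.eye(4, dtype=np.float32)
query = np.array([[0.6, 0.8, 0.0, 0.0]], dtype=np.float32)

scores, indices = top_k_cosine(query, docs, top_k=2)
print(indices.tolist())  # [[1, 0]]
```

The cost is one multiply-add per query dimension per document, which is consistent with the linear search scaling mentioned in the highlights.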
|
|
|
| ### 3. Sentence-Transformers Framework Compatibility
|
|
|
| If you prefer running within standard ML pipelines, load the model through `sentence-transformers`. Note that this route installs PyTorch as a dependency:
|
|
|
| ```bash
|
| pip install sentence-transformers
|
|
|
| ```
|
|
|
| ```python
|
| from sentence_transformers import SentenceTransformer
|
|
|
| # Load from the Hub. The `backend` argument only accepts 'torch', 'onnx', or 'openvino',
|
| # so it is omitted here; this assumes the repository ships a sentence-transformers config.
|
| model = SentenceTransformer('VTXAI/Vortex-Embed-4.7M')
|
| embeddings = model.encode(['search the web', 'read file'])
|
|
|
| ```
|
|
|
| ---
|
|
|
| ## Citation & Attributions
|
|
|
| If you use this model or the `vortexa` engine in research or production, please cite the repository with the following BibTeX entry:
|
|
|
| ```bibtex
|
| @software{vortex-embed-4.7m,
|
| title = {Vortex-Embed-4.7M: High-Performance 4-Bit Static Embedding Topology},
|
| author = {VortexAI},
|
| year = {2025},
|
|   url = {https://huggingface.co/VTXAI/Vortex-Embed-4.7M}
|
| }
|
|
|
| ``` |