---
title: FastEmbed EN Embeddings
emoji: 🚀
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
---

# FastEmbed Code Embeddings Server

CPU-optimized embedding server using **FastEmbed** with ONNX quantized models.

## Models

- Dense: BAAI/bge-base-en-v1.5 (768-dim)
- Sparse: Qdrant/bm25 (BM25, ~0.01 GB)
- Reranker: jinaai/jina-reranker-v1-turbo-en (~0.13 GB)

**Total: ~0.78 GB** - fits easily in CPU Basic (2 vCPU, 16 GB RAM)

## API Endpoints

### Dense Embeddings

```bash
curl -X POST https://YOUR_SPACE.hf.space/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["def hello(): pass", "class Foo: ..."], "model": "code-embed"}'
```

### Sparse BM25 Embeddings

```bash
curl -X POST https://YOUR_SPACE.hf.space/v1/sparse/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["search query", "document text"]}'
```

### Hybrid Search Embeddings

```bash
curl -X POST https://YOUR_SPACE.hf.space/v1/hybrid/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["code snippet"]}'
```

### Reranking

```bash
curl -X POST https://YOUR_SPACE.hf.space/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{"query": "python async function", "documents": ["doc1", "doc2", "doc3"]}'
```

## Features

- **ONNX Runtime**: optimized CPU inference with no PyTorch overhead
- **Model Caching**: models are loaded once and reused across requests (see the sketch below)
- **Hybrid Search**: dense + sparse (BM25) embeddings for better retrieval (see the fusion sketch below)
- **Fast Reranking**: the `jina-reranker-v1-turbo-en` cross-encoder reorders results with low latency

## Performance

Compared to PyTorch-based SentenceTransformers:

- **5-10x faster** inference on CPU
- **~5x smaller** model footprint (quantized ONNX weights)
- **Lower latency**: ONNX quantization plus model caching
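
## Python Client Example

A minimal client sketch. It assumes the dense endpoint returns an OpenAI-style `{"data": [{"embedding": [...]}, ...]}` payload (suggested by the `/v1/embeddings` path and `model` field, though not shown above) and that `/v1/rerank` returns a `results` array; check your server's actual responses.

```python
import requests

BASE_URL = "https://YOUR_SPACE.hf.space"  # replace with your Space URL


def embed(texts: list[str]) -> list[list[float]]:
    """Fetch dense embeddings; assumes an OpenAI-compatible response shape."""
    resp = requests.post(
        f"{BASE_URL}/v1/embeddings",
        json={"input": texts, "model": "code-embed"},
        timeout=60,
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]


def rerank(query: str, documents: list[str]) -> list[dict]:
    """Score documents against a query; the `results` field is an assumption."""
    resp = requests.post(
        f"{BASE_URL}/v1/rerank",
        json={"query": query, "documents": documents},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["results"]


if __name__ == "__main__":
    vectors = embed(["def hello(): pass"])
    print(len(vectors[0]))  # expect 768 for bge-base-en-v1.5
```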
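
## Model Caching Sketch

The "loaded once, reused across requests" behavior is straightforward with FastEmbed itself. The class names below are FastEmbed's public API, but this is an illustrative sketch, not necessarily this server's actual code:

```python
from functools import lru_cache

from fastembed import SparseTextEmbedding, TextEmbedding
from fastembed.rerank.cross_encoder import TextCrossEncoder


@lru_cache(maxsize=1)
def dense_model() -> TextEmbedding:
    # Downloads and loads the quantized ONNX model once per process.
    return TextEmbedding("BAAI/bge-base-en-v1.5")


@lru_cache(maxsize=1)
def sparse_model() -> SparseTextEmbedding:
    return SparseTextEmbedding("Qdrant/bm25")


@lru_cache(maxsize=1)
def reranker() -> TextCrossEncoder:
    return TextCrossEncoder("jinaai/jina-reranker-v1-turbo-en")


def dense_embed(texts: list[str]) -> list[list[float]]:
    # .embed() yields numpy arrays lazily; convert to lists for JSON.
    return [vec.tolist() for vec in dense_model().embed(texts)]


def sparse_embed(texts: list[str]) -> list[dict]:
    # BM25 embeddings come back as sparse (indices, values) pairs.
    return [
        {"indices": emb.indices.tolist(), "values": emb.values.tolist()}
        for emb in sparse_model().embed(texts)
    ]


def rerank_scores(query: str, documents: list[str]) -> list[float]:
    return list(reranker().rerank(query, documents))
```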
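
## Fusing Hybrid Results

The hybrid endpoint supplies both vector types; how to combine the two result lists at query time is up to the client. One common, model-free option is Reciprocal Rank Fusion (a standard technique, not something this server prescribes):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion over best-first lists of document IDs.

    Each list comes from one retriever (e.g. dense and BM25); documents
    that rank highly in either list float to the top of the fused order.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)


# Example: fuse a dense ranking with a BM25 ranking.
fused = rrf_fuse([["a", "b", "c"], ["c", "a", "d"]])
print(fused)  # "a" and "c" lead, since both retrievers rank them highly
```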