---
title: FastEmbed EN Embeddings
emoji: 🚀
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
---
# FastEmbed Code Embeddings Server

A CPU-optimized embedding server built on **FastEmbed**, using quantized ONNX models.
## Models

The server loads three models:

- **Dense:** `BAAI/bge-base-en-v1.5` (768 dimensions)
- **Sparse:** `Qdrant/bm25` (BM25, 0.01 GB)
- **Reranker:** `jinaai/jina-reranker-v1-turbo-en` (0.13 GB)

**Total: ~0.78 GB**, which fits comfortably in the CPU Basic hardware tier (2 vCPU, 16 GB RAM).
## API Endpoints

### Dense Embeddings

```bash
curl -X POST https://YOUR_SPACE.hf.space/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["def hello(): pass", "class Foo: ..."], "model": "code-embed"}'
```
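A minimal Python client for the dense endpoint might look like the sketch below. It uses only the standard library. The response format (`{"data": [{"index": ..., "embedding": [...]}]}`, following the OpenAI embeddings convention that the `/v1/embeddings` path suggests) is an assumption, as are the helper names `build_embeddings_request`, `parse_embeddings`, `cosine`, and `embed`.

```python
import json
import math
import urllib.request

# Placeholder URL, as in the curl examples; replace with your Space's address.
BASE_URL = "https://YOUR_SPACE.hf.space"


def build_embeddings_request(texts, model="code-embed", base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/embeddings."""
    body = json.dumps({"input": texts, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(response_json):
    """Extract vectors in input order.

    ASSUMPTION: the body is OpenAI-style, {"data": [{"index": i, "embedding": [...]}]}.
    """
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed(texts, model="code-embed", base_url=BASE_URL):
    """Send the request and return one vector per input text."""
    req = build_embeddings_request(texts, model, base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_embeddings(json.load(resp))
```

Once `BASE_URL` points at a running Space, `vecs = embed(["def hello(): pass", "class Foo: ..."])` followed by `cosine(vecs[0], vecs[1])` would give the similarity of the two snippets.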
### Sparse BM25 Embeddings

```bash
curl -X POST https://YOUR_SPACE.hf.space/v1/sparse/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["search query", "document text"]}'
```
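Sparse BM25 vectors are commonly exchanged as parallel `indices` and `values` arrays (FastEmbed's `SparseEmbedding` exposes attributes with these names). Assuming the endpoint returns one such object per input, a query-document score is the dot product over the indices the two vectors share. The sketch below shows that computation; `to_dict` and `sparse_dot` are hypothetical helpers.

```python
def to_dict(sparse):
    """Convert an {"indices": [...], "values": [...]} vector into {index: weight}."""
    return dict(zip(sparse["indices"], sparse["values"]))


def sparse_dot(query, document):
    """Dot product of two sparse vectors: only indices present in both contribute."""
    q, d = to_dict(query), to_dict(document)
    # Iterate over the smaller map; lookups in the larger one are O(1).
    if len(q) > len(d):
        q, d = d, q
    return sum(weight * d[idx] for idx, weight in q.items() if idx in d)
```

Because only overlapping indices contribute, scoring cost grows with the number of non-zero terms rather than with the vocabulary size.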
### Hybrid Search Embeddings

```bash
curl -X POST https://YOUR_SPACE.hf.space/v1/hybrid/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["code snippet"]}'
```
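How the dense and sparse signals should be combined is left to the caller here. One common client-side option is Reciprocal Rank Fusion (RRF), which merges rankings using only rank positions, so the two score scales never need to be calibrated against each other. The sketch below assumes you have already ranked your documents separately by dense and by sparse similarity; `reciprocal_rank_fusion` is a hypothetical helper, and `k=60` is the constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Documents missing from a list get nothing from it.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, fusing a dense ranking `["a", "b", "c"]` with a sparse ranking `["c", "a", "d"]` gives `["a", "c", "b", "d"]`: documents ranked well by both retrievers rise to the top.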
### Reranking

```bash
curl -X POST https://YOUR_SPACE.hf.space/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{"query": "python async function", "documents": ["doc1", "doc2", "doc3"]}'
```
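The reranker scores each document against the query. Assuming the response can be reduced to one score per document, in the same order as `documents` (the exact response shape of `/v1/rerank` is not specified here), the client only needs to sort. `order_by_score` below is a hypothetical helper.

```python
def order_by_score(documents, scores, top_k=None):
    """Return (document, score) pairs sorted by descending relevance.

    ASSUMPTION: scores[i] is the reranker score for documents[i].
    """
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]
```

Cross-encoder rerankers need a full model pass per query-document pair, so they are typically applied to a small candidate set, such as the top results from hybrid retrieval, rather than to a whole corpus.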
## Features

- **ONNX Runtime**: optimized CPU inference without PyTorch overhead
- **Model Caching**: models are loaded once and reused across requests
- **Hybrid Search**: combines dense and sparse (BM25) embeddings for better retrieval
- **Code-Optimized**: `jina-embeddings-v2-base-code` is trained specifically on code
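Model caching amounts to constructing each model once per process and returning the same instance on later requests. A minimal sketch with `functools.lru_cache`, where `_construct_model` is a hypothetical stand-in for an expensive call such as `fastembed.TextEmbedding(model_name=...)`:

```python
from functools import lru_cache

LOADED = []  # names of models actually constructed, to show each loads only once


def _construct_model(name):
    """Hypothetical stand-in for an expensive load, e.g.
    fastembed.TextEmbedding(model_name=name)."""
    LOADED.append(name)
    return object()  # placeholder for the real model instance


@lru_cache(maxsize=None)
def get_model(name):
    """Construct the named model on first use; later calls return the same instance."""
    return _construct_model(name)
```

Calling `get_model` for all three models at startup, rather than lazily, moves the load cost out of the first request.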
## Performance

Compared with PyTorch-based SentenceTransformers:

- **5-10x faster** inference on CPU
- **5x smaller** model footprint
- **Lower latency** from ONNX quantization and model caching
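To check these speedups on your own hardware and inputs, a small timing harness is enough. The sketch below (`median_latency_ms` is a hypothetical helper) discards warm-up calls so one-time costs such as model loading or ONNX session setup do not dominate, and reports the median to damp outliers.

```python
import statistics
import time


def median_latency_ms(fn, *args, warmup=3, runs=20):
    """Median wall-clock latency of fn(*args) in milliseconds.

    The first `warmup` calls are executed but not timed.
    """
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For instance, compare `median_latency_ms(lambda t: list(fe_model.embed(t)), texts)` with `median_latency_ms(st_model.encode, texts)`, where `fe_model` is a `fastembed.TextEmbedding`, `st_model` a `sentence_transformers.SentenceTransformer` loaded with the same model name, and `texts` a representative batch. The `list(...)` call matters because FastEmbed's `embed` returns a lazy generator.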