Use it from Swift
Add the package
Package.swift:
.package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),
// In your target:
.product(name: "CoreMLLLM", package: "CoreML-LLM"),
Platforms: iOS 18+ / macOS 15+.
Download + encode
import CoreMLLLM
let modelsDir = try FileManager.default.url(
    for: .applicationSupportDirectory, in: .userDomainMask,
    appropriateFor: nil, create: true)
let eg = try await EmbeddingGemma.downloadAndLoad(modelsDir: modelsDir)
// 768-dim L2-normalised embedding
let v = try eg.encode(text: "How do I list files in Swift?")
// Matryoshka: cheap-to-truncate dims (768 / 512 / 256 / 128)
let v256 = try eg.encode(text: "How do I list files in Swift?",
                         dim: 256)
// Task-prefixed (RAG document vs. query)
let q = try eg.encode(text: "list files",
                      task: .retrievalQuery)
let d = try eg.encode(text: "Use FileManager.contentsOfDirectory(...)",
                      task: .retrievalDocument)
See Gemma3EmbeddingGemma.swift for the full list of task prefixes and supported dims.
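Because the embeddings are L2-normalised, cosine similarity between a query and a document reduces to a plain dot product. The following is a minimal sketch of ranking documents for a query. It assumes `encode` returns `[Float]` (the return type is not stated here) and uses small hand-written vectors in place of real embeddings. `dot` and `rank` are hypothetical helpers, not part of CoreMLLLM.

```swift
/// Dot product of two equal-length vectors. For unit-norm embeddings this
/// equals cosine similarity.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same dimension")
    return zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
}

/// Returns document indices ordered from most to least similar to the query.
func rank(query: [Float], documents: [[Float]]) -> [Int] {
    let scores = documents.map { dot(query, $0) }
    return scores.indices.sorted { scores[$0] > scores[$1] }
}

// Toy 2-dim unit vectors standing in for real embeddings (assumption).
let query: [Float] = [1, 0]
let documents: [[Float]] = [
    [0, 1],        // orthogonal to the query
    [0.6, 0.8],    // partially aligned
    [1, 0],        // same direction as the query
]
let order = rank(query: query, documents: documents)
```

In a retrieval setup, `query` would come from an `encode` call with `.retrievalQuery` and each entry of `documents` from one with `.retrievalDocument`, all at the same `dim`.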
EmbeddingGemma-300M for Apple CoreML (ANE-optimized)
CoreML conversion of google/embeddinggemma-300m, produced with the CoreML-LLM pipeline. Targets iOS 26 / macOS 26.
What's in this repo
| File | Notes |
|---|---|
| encoder.mlmodelc/ | Compiled stateless bidirectional encoder (fp16, 588 MB) |
| model_config.json | I/O contract, Matryoshka dims, task prefixes |
| hf_model/ | Tokenizer files |
ANE residency
99.80% on Apple Neural Engine (1950/1954 dispatched ops, verified via
MLComputePlan on macOS 26). Achieved by:
- residual-stream rescaling (semantic-preserving fp16 fit)
- fp16-safe L2 normalize (divide by max-abs first to keep sum(x²) bounded)
- iOS 26 deployment target
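The fp16-safe L2 normalize item can be illustrated with a short sketch. fp16 tops out at 65504, so squaring and summing raw activations can overflow. Dividing by the largest absolute value first keeps every term at most 1 and the sum at most the vector length. The code below is an illustration of that idea only, written with `Float` so it runs on any platform. It is not the conversion pipeline's actual implementation, and `stableL2Normalize` is a hypothetical name.

```swift
/// Largest finite value representable in fp16.
let fp16Max: Float = 65504

/// L2-normalises `x` after scaling by its max-abs value, so the
/// intermediate sum of squares never exceeds `x.count`.
func stableL2Normalize(_ x: [Float]) -> [Float] {
    guard let maxAbs = x.map({ abs($0) }).max(), maxAbs > 0 else { return x }
    let scaled = x.map { $0 / maxAbs }                       // every |value| <= 1
    let sumSq: Float = scaled.reduce(0) { $0 + $1 * $1 }     // <= x.count
    let norm = sumSq.squareRoot()
    return scaled.map { $0 / norm }
}

let x: [Float] = [300, -400, 0, 0]

// Naive sum of squares: 90000 + 160000 = 250000, which is above fp16Max.
let naiveSumSq: Float = x.reduce(0) { $0 + $1 * $1 }

// Scaled path: [0.75, -1, 0, 0], sum of squares 1.5625, result [0.6, -0.8, 0, 0].
let y = stableL2Normalize(x)
```

The result is mathematically identical to the naive normalisation, since the max-abs factor cancels out. Only the intermediate range changes.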
Use it
Via the CoreML-LLM Swift package:
import CoreMLLLM
let bundleURL = try await Gemma3BundleDownloader.download(
    .embeddingGemma300m, into: appSupportDir)
let eg = try await EmbeddingGemma.load(bundleURL: bundleURL)
let vec = try eg.encode(text: "On-device embeddings",
                        task: .retrievalQuery,
                        dim: 768)   // or 512 / 256 / 128 (Matryoshka)
I/O contract:
- input_ids (1, 128) int32
- attention_mask (1, 128) fp16 (1.0 valid, 0.0 pad)
- embedding (1, 768) fp16: L2 unit norm; truncate the trailing dim and re-normalize for Matryoshka 512 / 256 / 128
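The Matryoshka step described in the contract (keep the leading components, then re-normalize) can be sketched as below. This assumes the embedding has already been copied into a `[Float]`. `truncateMatryoshka` is a hypothetical helper for illustration, and the 4-dim vector stands in for a real 768-dim output.

```swift
/// Keeps the first `dim` components of an embedding and rescales the
/// result back to unit L2 norm.
func truncateMatryoshka(_ embedding: [Float], to dim: Int) -> [Float] {
    precondition(dim > 0 && dim <= embedding.count, "dim out of range")
    let head = Array(embedding.prefix(dim))
    let sumSq: Float = head.reduce(0) { $0 + $1 * $1 }
    let norm = sumSq.squareRoot()
    guard norm > 0 else { return head }
    return head.map { $0 / norm }
}

// Toy unit-norm vector standing in for a 768-dim embedding (assumption).
let full: [Float] = [0.5, 0.5, 0.5, 0.5]
let half = truncateMatryoshka(full, to: 2)
```

Without the re-normalisation, the truncated vector would have norm below 1 and dot products between truncated vectors would no longer be cosine similarities.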
The bundle in this repo is built for max_seq_len=128. For longer inputs,
re-run python conversion/build_embeddinggemma_bundle.py --max-seq-len 2048.
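To make the fixed-length contract concrete, here is a rough sketch of shaping a token sequence to `max_seq_len=128` together with its attention mask. The Swift package presumably does this internally. The pad id of 0, right-side padding, plain truncation of longer inputs, and the token ids themselves are all assumptions made for illustration, and `padInput` is a hypothetical helper. The mask is built as `Float` here, whereas the model input is fp16.

```swift
/// Pads or truncates token ids to `maxSeqLen` and builds the matching mask
/// (1.0 for real tokens, 0.0 for padding).
func padInput(_ tokenIDs: [Int32], maxSeqLen: Int = 128, padID: Int32 = 0)
    -> (inputIDs: [Int32], attentionMask: [Float]) {
    let kept = Array(tokenIDs.prefix(maxSeqLen))     // drop tokens past the limit
    let padCount = maxSeqLen - kept.count
    let inputIDs = kept + Array(repeating: padID, count: padCount)
    let attentionMask = Array(repeating: Float(1), count: kept.count)
        + Array(repeating: Float(0), count: padCount)
    return (inputIDs, attentionMask)
}

// Made-up token ids; a real sequence comes from the tokenizer in hf_model/.
let (ids, mask) = padInput([2, 1841, 563, 1])
```

Anything past the 128th token is silently dropped in this sketch, which is why longer inputs call for a bundle rebuilt with a larger `--max-seq-len`.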
Sanity check
cosine("cat sat on mat", "feline rested on rug") = 0.7345 (high — similar)
cosine("cat sat on mat", "quantum mechanics") = 0.4650 (low — different)
License
Inherits Google's Gemma terms of use.