Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings
Abstract
Epicure presents skip-gram ingredient embeddings trained on multilingual recipes, utilizing LLM-augmented normalization and three Metapath2Vec variants with different random-walk schemas to capture ingredient relationships.
We present Epicure, a family of three sibling skip-gram ingredient embeddings retrained from scratch on a multilingual recipe corpus. We aggregate 4.14M recipes from 11 sources spanning seven languages (English, Chinese, Russian, Vietnamese, Spanish, Turkish, Indonesian, German, and Indian-English) and normalise the raw ingredient strings to 1,790 canonical entries via an LLM-augmented pipeline. A 203,508-edge ingredient-ingredient NPMI graph and an 80,019-edge typed FlavorDB ingredient-compound graph (2,247 typed compound nodes across 15 categories) seed three Metapath2Vec variants that share architecture and hyperparameters and differ only in the random-walk schema. Cooc walks the co-occurrence graph only, Chem walks the typed compound metapaths only, and Core blends both by injecting ingredient-ingredient walks at a controlled mixing ratio. Each model therefore sits at a distinct point on the chemistry-vs-recipe-context spectrum.
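To make the graph construction and the three walk schemas concrete, here is a minimal Python sketch. It is not the authors' implementation. The function names (`npmi_edges`, `to_adjacency`, `walk`), the `min_npmi` threshold, the per-step mixing probability `p_cooc`, and the toy recipes and compound links are all assumptions made for illustration. The sketch also ignores compound types, whereas the abstract describes typed metapaths over 15 categories.

```python
import math
import random
from collections import Counter
from itertools import combinations


def npmi_edges(recipes, min_npmi=0.0):
    """Return {(a, b): npmi} for ingredient pairs scoring above min_npmi.

    NPMI(a, b) = log(p(a, b) / (p(a) p(b))) / -log(p(a, b)),
    with probabilities estimated as fractions of recipes.
    """
    n = len(recipes)
    single, pair = Counter(), Counter()
    for recipe in recipes:
        items = sorted(set(recipe))  # de-duplicate; sorted gives stable pair keys
        single.update(items)
        pair.update(combinations(items, 2))
    edges = {}
    for (a, b), count in pair.items():
        p_ab = count / n
        if p_ab == 1.0:
            continue  # denominator is zero when the pair is in every recipe
        pmi = math.log(p_ab / ((single[a] / n) * (single[b] / n)))
        score = pmi / -math.log(p_ab)
        if score > min_npmi:
            edges[(a, b)] = score
    return edges


def to_adjacency(edges):
    """Turn undirected edge keys into a neighbour-list dictionary."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def walk(start, length, cooc, ing2comp, comp2ing, p_cooc, rng):
    """One random walk starting from an ingredient.

    p_cooc is a hypothetical per-step mixing probability:
    1.0 uses ingredient-ingredient steps only (Cooc-like),
    0.0 uses ingredient-compound-ingredient steps only (Chem-like),
    values in between blend the two (Core-like).
    """
    path, cur = [start], start
    while len(path) < length:
        use_cooc = rng.random() < p_cooc
        if use_cooc and cooc.get(cur):
            cur = rng.choice(cooc[cur])
            path.append(cur)
        elif not use_cooc and ing2comp.get(cur):
            comp = rng.choice(ing2comp[cur])
            cur = rng.choice(comp2ing[comp])
            path.extend([comp, cur])
        else:
            break  # no admissible step from this node under the chosen schema
    return path[:length]


# Toy data, invented for illustration only.
recipes = [["tomato", "basil"], ["tomato", "basil"],
           ["tomato", "onion"], ["rice", "onion"]]
edges = npmi_edges(recipes)
cooc = to_adjacency(edges)
ing2comp = {"basil": ["linalool"], "tomato": ["linalool"]}
comp2ing = {"linalool": ["basil", "tomato"]}

rng = random.Random(0)
for name, p in [("Cooc-like", 1.0), ("Chem-like", 0.0), ("Core-like", 0.5)]:
    print(name, walk("basil", 6, cooc, ing2comp, comp2ing, p, rng))
```

The resulting walks would then be fed as "sentences" to a skip-gram trainer. Holding that trainer and its hyperparameters fixed while changing only `p_cooc` mirrors the abstract's claim that the three variants differ only in the random-walk schema.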