# NuExtract-tiny-Resume-Data-Extractor

A fine-tuned version of `numind/NuExtract-tiny-v1.5` (Qwen2.5-0.5B backbone), specialised for structured extraction from resumes / CVs.

Given raw resume text in any format, the model returns a clean JSON object with name, contact details, skills, work experience, education, and other details, ready to plug into a hiring pipeline, ATS, or LangChain workflow.


## Model Details

| Property | Value |
| --- | --- |
| Base model | numind/NuExtract-tiny-v1.5 |
| Backbone | Qwen2.5-0.5B |
| Total parameters | 511,388,160 |
| Trainable (LoRA) | 17,596,416 (3.44%) |
| LoRA rank / alpha | r=32 / alpha=64 |
| Quantisation | Q4_K_M GGUF (Ollama-ready) |
| Vocabulary size | 151,665 (unchanged from base) |
| License | MIT |

## Training

| Property | Value |
| --- | --- |
| Method | QLoRA via Unsloth |
| Dataset | 3,000 synthetic resumes (generated) |
| Train / eval split | 95% / 5% (2,850 / 150) |
| Packed sequences | 1,125 |
| Epochs | 4 |
| Total steps | 284 |
| Batch size | 16 (2 per device × 8 grad accum) |
| Learning rate | 2e-4 (cosine schedule, 14 warmup steps) |
| Hardware | 1× NVIDIA Tesla T4 (Google Colab) |
| Training time | ~24 minutes |

### Loss Curve

| Step | Epoch | Train Loss | Val Loss |
| --- | --- | --- | --- |
| 100 | 1.0 | 0.2355 | 0.2354 |
| 200 | 2.8 | 0.2298 | 0.2313 |
| 284 | 4.0 | 0.2276 | 0.2296 |

The train/val gap stays near zero throughout, so no overfitting was observed. The best checkpoint (step 284, val loss 0.2296) is loaded automatically.


## Output Schema

```json
{
  "name":         "string or null",
  "email":        "string or null",
  "phone":        "string or null",
  "website":      "string or null",
  "skills":       ["string"],
  "experience":   [{"title": "string", "company": "string", "duration": "string"}],
  "education":    [{"degree": "string", "institution": "string", "year": "string"}],
  "other_details": ["string"]
}
```
- Missing scalar fields → `null`
- Missing list fields → `[]`
- `skills` contains technical skills only; soft skills are excluded
- `other_details` captures certifications, languages, awards, and publications
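
As a sketch, a small post-processing helper can enforce these defaults on parsed output before it enters a downstream pipeline (`apply_schema_defaults` is a hypothetical name, not part of the model or its tooling):

```python
# Hypothetical post-processing helper: enforce the schema's defaults
# (missing scalar fields -> None, missing/invalid list fields -> []).
SCALAR_FIELDS = ["name", "email", "phone", "website"]
LIST_FIELDS = ["skills", "experience", "education", "other_details"]

def apply_schema_defaults(data: dict) -> dict:
    out = dict(data)
    for field in SCALAR_FIELDS:
        out.setdefault(field, None)  # missing scalar -> None
    for field in LIST_FIELDS:
        value = out.get(field)
        out[field] = value if isinstance(value, list) else []  # missing list -> []
    return out
```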

## Inference Speed (Ollama, Tesla T4)

| Metric | Value |
| --- | --- |
| Prompt eval | 161 tokens in ~28 ms |
| Generation | 154 tokens in ~2,986 ms |
| Total (typical resume) | ~7.5 seconds |
| Throughput | ~52 tokens/sec |

## Usage

### Ollama (recommended)

**Step 1: create a `Modelfile`**

```
FROM hf.co/nimendraai/NuExtract-tiny-Resume-Data-Extractor:Q4_K_M

PARAMETER temperature 0
PARAMETER top_k 10
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
PARAMETER seed 42
PARAMETER num_ctx 2048
PARAMETER num_predict 600
PARAMETER stop "<|end-output|>"
PARAMETER stop "<|endoftext|>"

TEMPLATE """<|input|>
### Template:
{
    "name": "",
    "email": "",
    "phone": "",
    "website": "",
    "skills": [""],
    "experience": [{"title": "", "company": "", "duration": ""}],
    "education": [{"degree": "", "institution": "", "year": ""}],
    "other_details": [""]
}
### Text:
{{ .Prompt }}

<|output|>
"""

LICENSE """MIT License - https://opensource.org/licenses/MIT"""
```

**Step 2: create the model**

```shell
ollama create agenthire-extractor -f Modelfile
```

**Step 3: query**

```shell
curl http://localhost:11434/api/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "agenthire-extractor",
    "format": "json",
    "stream": false,
    "prompt": "<resume text here>"
  }'
```

Always apply brace-counting extraction to the `response` value; see the Python helper below.


### Python (transformers)

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nimendraai/NuExtract-tiny-Resume-Data-Extractor"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

TEMPLATE = json.dumps({
    "name": "", "email": "", "phone": "", "website": "",
    "skills": [""],
    "experience": [{"title": "", "company": "", "duration": ""}],
    "education":  [{"degree": "", "institution": "", "year": ""}],
    "other_details": [""],
}, indent=4)

def extract_first_json(text):
    """Return the first balanced {...} block in `text` via brace counting."""
    depth, start = 0, None
    for i, ch in enumerate(text):
        if ch == "{":
            if start is None:
                start = i
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0 and start is not None:
                return text[start:i + 1]
    return text

def extract(resume_text: str) -> dict:
    # Build the NuExtract prompt: template section, text section, output marker.
    prompt = (
        "<|input|>\n"
        f"### Template:\n{TEMPLATE}\n"
        f"### Text:\n{resume_text}\n\n"
        "<|output|>"
    )
    inputs = tokenizer(
        prompt, return_tensors="pt", truncation=True, max_length=2048
    ).to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs, max_new_tokens=512, do_sample=False
        )
    decoded = tokenizer.decode(out[0], skip_special_tokens=True)
    raw = decoded.split("<|output|>")[-1].strip()
    return json.loads(extract_first_json(raw))
```

### LangChain

```python
import json
from typing import Optional

from langchain_core.prompts import PromptTemplate
from langchain_ollama import OllamaLLM
from pydantic import BaseModel, Field

class Experience(BaseModel):
    title: str = Field(default="")
    company: str = Field(default="")
    duration: str = Field(default="")

class Education(BaseModel):
    degree: str = Field(default="")
    institution: str = Field(default="")
    year: str = Field(default="")

class ResumeExtraction(BaseModel):
    name: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    website: Optional[str] = None
    skills: list[str] = Field(default_factory=list)
    experience: list[Experience] = Field(default_factory=list)
    education: list[Education] = Field(default_factory=list)
    other_details: list[str] = Field(default_factory=list)

def extract_first_json(text):
    """Return the first balanced {...} block in `text` via brace counting."""
    depth, start = 0, None
    for i, ch in enumerate(text):
        if ch == "{":
            if start is None:
                start = i
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0 and start is not None:
                return text[start:i + 1]
    return text

llm = OllamaLLM(model="agenthire-extractor", format="json", temperature=0)

def extract_resume(text: str) -> ResumeExtraction:
    raw = llm.invoke(text)
    return ResumeExtraction(**json.loads(extract_first_json(raw)))

# Batch processing (resume_1, resume_2, resume_3 are raw resume strings)
resumes = [resume_1, resume_2, resume_3]
results = [
    ResumeExtraction(**json.loads(extract_first_json(r)))
    for r in llm.batch(resumes)
]

# Pipeline with scoring
scoring_prompt = PromptTemplate.from_template(
    "Job: {job_description}\n\nCandidate: {candidate}\n\n"
    "Score 1-10 and explain."
)
scorer = OllamaLLM(model="llama3", temperature=0.3)

def process_application(resume_text, job_description):
    candidate = extract_resume(resume_text).model_dump()
    evaluation = (scoring_prompt | scorer).invoke({
        "job_description": job_description,
        "candidate": json.dumps(candidate, indent=2),
    })
    return {"candidate": candidate, "evaluation": evaluation}
```

## Important Notes

**Always use brace-counting extraction on raw model output before `json.loads()`.** The model occasionally appends a small amount of text after the closing `}`; parsing the raw string directly will raise `JSONDecodeError: Extra data`.

```python
def extract_first_json(text):
    depth, start = 0, None
    for i, ch in enumerate(text):
        if ch == "{":
            if start is None:
                start = i
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0 and start is not None:
                return text[start:i + 1]
    return text

result = json.loads(extract_first_json(raw_output))
```

**Do not call the raw Hugging Face model directly via Ollama (`hf.co/nimendraai/...`) without a Modelfile.** The NuExtract `<|input|>` / `### Template:` / `### Text:` prompt format must be applied; the Modelfile `TEMPLATE` block handles this automatically.

**Skill capitalisation is normalised via `.title()` during training**, so FastAPI may appear as `Fastapi` in output. Apply a canonical map in post-processing if needed.
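
A minimal sketch of such a canonical map (the entries and the helper name are illustrative, not a shipped list):

```python
# Illustrative canonical-casing map for post-processing extracted skills.
# Extend with whatever vocabulary your pipeline cares about.
CANONICAL_SKILLS = {
    "fastapi": "FastAPI",
    "postgresql": "PostgreSQL",
    "javascript": "JavaScript",
    "node.js": "Node.js",
}

def canonicalise_skills(skills: list[str]) -> list[str]:
    # Look up each skill case-insensitively; keep unknown skills unchanged.
    return [CANONICAL_SKILLS.get(s.lower(), s) for s in skills]
```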


## Limitations

- Trained on synthetic English resumes; real-world resumes with unusual layouts may produce lower accuracy. Fine-tuning on 30+ real examples will improve results.
- Skills are extracted with light normalisation; canonical casing (`FastAPI` vs `Fastapi`) requires a post-processing map.
- Phone numbers are extracted as-is, without E.164 normalisation.
- Best suited for English resumes. Some multilingual capability exists from the Qwen2.5 backbone but was not tested.
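
If E.164 phone numbers are needed downstream, a naive post-processing sketch is shown below; it assumes a default country code and is no substitute for a dedicated library such as the `phonenumbers` package:

```python
import re

def to_e164(phone: str, default_cc: str = "1") -> str:
    """Naively normalise a phone string to E.164-style +<digits>.

    Assumes 10-digit numbers are national numbers missing the default
    country code; anything else is taken to already include one.
    """
    digits = re.sub(r"\D", "", phone)  # strip spaces, dashes, parentheses
    if phone.strip().startswith("+"):
        return "+" + digits
    if len(digits) == 10:
        return "+" + default_cc + digits
    return "+" + digits
```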

## Citation

If you use this model, please also cite the original NuExtract work:

```bibtex
@misc{nuextract2024,
  author = {NuMind},
  title  = {NuExtract: A Foundation Model for Structured Extraction},
  year   = {2024},
  url    = {https://numind.ai/blog/nuextract-a-foundation-model-for-structured-extraction}
}
```

## License

MIT, the same as the base model `numind/NuExtract-tiny-v1.5`.


*This model was trained 2x faster with Unsloth.*
