Instructions for using sriksven/ResumeForge-8b with libraries, notebooks, and local apps.

Libraries

How to use sriksven/ResumeForge-8b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sriksven/ResumeForge-8b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("sriksven/ResumeForge-8b")
model = AutoModelForCausalLM.from_pretrained("sriksven/ResumeForge-8b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use sriksven/ResumeForge-8b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sriksven/ResumeForge-8b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sriksven/ResumeForge-8b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
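
The same request can be issued from Python. The sketch below uses only the standard library and assumes the server started above is listening on `http://localhost:8000`; the helper names `build_chat_request` and `send` are illustrative, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="sriksven/ResumeForge-8b",
                       base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Requires a running server:
# print(send(build_chat_request("What is the capital of France?")))
```

For the SGLang server described further down, the same helper should work with `base_url="http://localhost:30000"`.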

Use Docker

docker model run hf.co/sriksven/ResumeForge-8b

SGLang

How to use sriksven/ResumeForge-8b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sriksven/ResumeForge-8b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sriksven/ResumeForge-8b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sriksven/ResumeForge-8b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sriksven/ResumeForge-8b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use sriksven/ResumeForge-8b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sriksven/ResumeForge-8b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sriksven/ResumeForge-8b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for sriksven/ResumeForge-8b to start chatting

Load model with FastModel

# Install Unsloth first (run in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="sriksven/ResumeForge-8b",
    max_seq_length=2048,
)

Docker Model Runner
How to use sriksven/ResumeForge-8b with Docker Model Runner:
```
docker model run hf.co/sriksven/ResumeForge-8b
```

krishna-resumatch-7b

A fine-tuned Qwen2.5-7B-Instruct model specialized for resume tailoring from job descriptions. Given a job description, it generates an ATS-optimized 1-line professional bio and 6 categorized technical skill sections matched to the JD's requirements.

Key Details


| Detail | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | QLoRA (4-bit NF4, rank 16, alpha 16) |
| Library | Unsloth + TRL SFTTrainer |
| Dataset | Custom JD-to-resume pairs (seed dataset) |
| Hardware | NVIDIA RTX A5000 (24 GB VRAM) on RunPod |
| Training time | ~6.5 minutes (300 steps) |
| Final loss | 0.218 |
| Parameters trained | 40.4M of 7.66B (0.53%) |
| Format | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
| Output | Merged 16-bit safetensors |
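
The trained-parameter figure can be sanity-checked with a short calculation. This is a sketch under one assumption the card does not state: that LoRA adapters were attached to all seven linear projections in each transformer block (a common Unsloth configuration). The layer shapes are those of the Qwen2.5-7B architecture; `lora_param_count` is an illustrative helper.

```python
def lora_param_count(rank, num_layers, shapes):
    """Count LoRA parameters: each (d_in -> d_out) layer adds rank * (d_in + d_out)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


# (input dim, output dim) of each projection in one Qwen2.5-7B block.
# ASSUMPTION: all seven projections were LoRA targets.
QWEN25_7B_SHAPES = {
    "q_proj": (3584, 3584),
    "k_proj": (3584, 512),
    "v_proj": (3584, 512),
    "o_proj": (3584, 3584),
    "gate_proj": (3584, 18944),
    "up_proj": (3584, 18944),
    "down_proj": (18944, 3584),
}

trainable = lora_param_count(rank=16, num_layers=28, shapes=QWEN25_7B_SHAPES)
print(trainable)                      # 40370176, i.e. about 40.4M
print(f"{trainable / 7.66e9:.2%}")    # 0.53%
```

Under that assumption the count agrees with the 40.4M (0.53%) reported in the table.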

What It Does

Input: A job description with role title, company context, and technical requirements.

Output: A structured resume optimization containing:

A 1-line professional bio emphasizing quantifiable business impact
Exactly 6 technical skill headers, each populated with relevant skills matched to the JD

The model is trained to act as both an ATS (Applicant Tracking System) and a technical recruiter: it maximizes keyword alignment with the job description while keeping skills grounded in realistic engineering experience.

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("sriksven/krishna-resumatch-7b")
tokenizer = AutoTokenizer.from_pretrained("sriksven/krishna-resumatch-7b")

messages = [
    {
        "role": "system",
        "content": (
            "You are a resume optimization expert. Given a job description, generate "
            "a tailored 1-line bio mentioning $1.5M USD impact and exactly 6 purely "
            "technical skill headers with relevant skills for each. No soft skills. "
            "Start the bio with Engineer."
        ),
    },
    {
        "role": "user",
        "content": (
            "Given this job description, generate a tailored 1-line resume bio and "
            "6 technical skill headers with relevant skills for each.\n\n"
            "Job Description: AI Engineer at a healthcare startup. Requirements: "
            "LangChain, RAG pipelines, FastAPI, Docker, PostgreSQL, OpenAI API, "
            "vector databases, Python, CI/CD, model evaluation."
        ),
    },
]

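# Request a dict of tensors so generate() receives input_ids and attention_mask
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))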

Expected Output Format

Bio: Engineer with production ML and AI systems experience delivering $1.5M USD in business impact through scalable architectures and data-driven solutions.

Skills:
LLM & Agent Frameworks: LangChain, OpenAI API, GPT-4, Prompt Engineering, RAG Pipelines, Model Evaluation
Vector Databases & Retrieval: ChromaDB, Qdrant, FAISS, Semantic Search, Embedding Models
Backend & APIs: FastAPI, REST APIs, Python, PostgreSQL, Redis
Cloud & DevOps: Docker, CI/CD, GitHub Actions, AWS, Deployment Automation
Data Engineering: ETL Pipelines, SQL, Data Modeling, Data Validation
Testing & Monitoring: Pytest, Unit Testing, Logging, Observability, CloudWatch

Unsloth (faster inference)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="sriksven/krishna-resumatch-7b",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

Design Philosophy

The model follows strict resume optimization rules:

Bio: Always 1 line, starts with "Engineer", mentions $1.5M USD impact, no years of experience, no skills listed in bio
Skills: Exactly 6 headers, all purely technical, no soft skills, no qualifiers like "Expert"
ATS alignment: Skills are selected to maximize keyword match with the job description
Grounded: Only includes skills that map to realistic ML/data/software engineering experience
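
These rules are mechanical enough to check in code. The following is a minimal sketch of such a check, assuming the output follows the `Bio:` / `Skills:` layout shown under Expected Output Format; `validate_resume_output` and its regular expressions are illustrative heuristics, not part of the model or its training pipeline.

```python
import re


def validate_resume_output(text):
    """Return a list of rule violations; an empty list means the output passes."""
    bio_part, sep, skills_part = text.partition("Skills:")
    if not sep:
        return ["missing 'Skills:' section"]

    problems = []
    # Collapse whitespace so a display-wrapped bio is still treated as one sentence.
    bio = " ".join(bio_part.replace("Bio:", "", 1).split())
    if not bio.startswith("Engineer"):
        problems.append("bio must start with 'Engineer'")
    if "$1.5M USD" not in bio:
        problems.append("bio must mention $1.5M USD impact")
    if re.search(r"\d+\+?\s*years", bio, re.IGNORECASE):
        problems.append("bio must not state years of experience")

    # Each non-empty line after 'Skills:' is expected to be 'Header: skill, skill, ...'.
    rows = [line for line in skills_part.splitlines() if line.strip()]
    headers = [row.split(":", 1)[0].strip() for row in rows if ":" in row]
    if len(headers) != 6:
        problems.append(f"expected 6 skill headers, found {len(headers)}")
    if re.search(r"\bexpert\b", skills_part, re.IGNORECASE):
        problems.append("skills must not use qualifiers like 'Expert'")
    return problems
```

A check like this can gate generated text before it reaches a resume, for example by regenerating whenever the returned list is non-empty. It cannot verify the "Grounded" rule, which depends on the candidate's actual background.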

Intended Use

Automated resume tailoring for job applications
ATS keyword optimization tools
Career coaching and job search platforms
Research on instruction-following for structured document generation

Limitations

Trained on a small seed dataset, so it may not generalize to all job-description categories
Outputs are templated to a specific resume style (bio + 6 skill headers)
Does not generate full resumes (experience bullets, education, projects)
Skill suggestions reflect training patterns and are not verified against the candidate's actual background
Best results with the specific system prompt format used during training
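
Because results depend on the training-time prompt format, it can help to build the messages with one helper rather than retyping the prompts. The sketch below reuses the system and user prompt text from the Transformers example in the Usage section; the names `SYSTEM_PROMPT`, `USER_PREFIX`, and `build_messages` are illustrative.

```python
# Prompt text copied from the Usage example on this card.
SYSTEM_PROMPT = (
    "You are a resume optimization expert. Given a job description, generate "
    "a tailored 1-line bio mentioning $1.5M USD impact and exactly 6 purely "
    "technical skill headers with relevant skills for each. No soft skills. "
    "Start the bio with Engineer."
)
USER_PREFIX = (
    "Given this job description, generate a tailored 1-line resume bio and "
    "6 technical skill headers with relevant skills for each.\n\n"
    "Job Description: "
)


def build_messages(job_description):
    """Wrap a raw job description in the system/user prompts shown in Usage."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PREFIX + job_description.strip()},
    ]


messages = build_messages(
    "AI Engineer at a healthcare startup. Requirements: LangChain, RAG pipelines, FastAPI."
)
```

The resulting list can be passed to `tokenizer.apply_chat_template` or sent as the `messages` field of an OpenAI-compatible request.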

Training Infrastructure


| Setting | Value |
|---|---|
| GPU | NVIDIA RTX A5000 24 GB |
| Cloud | RunPod ($0.27/hr) |
| Framework | Unsloth 2026.5.2 + TRL + Transformers 5.5.0 |
| Precision | BF16 training, 4-bit NF4 base quantization |
| Optimizer | AdamW 8-bit |
| Learning rate | 1e-4, cosine decay |
| Batch size | 8 effective (2 per device × 4 accumulation) |
| Packing | Disabled (small dataset) |
| Steps | 300 (150 epochs over seed data) |
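
The batch, step, and epoch figures can be related with simple arithmetic. The sketch below assumes a single GPU, that each epoch is one full pass over the seed data with no dropped batches, and that cost scales linearly with the reported ~6.5 minute training time (pod startup and model download are ignored). The dataset size it prints is implied by these numbers, not stated on the card; `training_summary` is an illustrative helper.

```python
def training_summary(steps, per_device_batch, grad_accum, epochs, minutes, usd_per_hour):
    """Derive effective batch size, samples seen, implied dataset size, and rough cost."""
    effective_batch = per_device_batch * grad_accum
    samples_seen = steps * effective_batch
    return {
        "effective_batch": effective_batch,
        "samples_seen": samples_seen,
        # ASSUMPTION: every epoch is exactly one pass over the dataset.
        "implied_dataset_size": samples_seen / epochs,
        "approx_cost_usd": round(minutes / 60 * usd_per_hour, 3),
    }


summary = training_summary(
    steps=300, per_device_batch=2, grad_accum=4,
    epochs=150, minutes=6.5, usd_per_hour=0.27,
)
print(summary)
# {'effective_batch': 8, 'samples_seen': 2400, 'implied_dataset_size': 16.0, 'approx_cost_usd': 0.029}
```

Under these assumptions the seed set would hold on the order of 16 examples, which is consistent with the generalization caveat under Limitations.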