Instructions for using lambda/pythia-6.9b-deduped-synthetic-instruct with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use lambda/pythia-6.9b-deduped-synthetic-instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="lambda/pythia-6.9b-deduped-synthetic-instruct")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lambda/pythia-6.9b-deduped-synthetic-instruct")
model = AutoModelForCausalLM.from_pretrained("lambda/pythia-6.9b-deduped-synthetic-instruct")
```
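As a quick sanity check, you can call the pipeline directly. A minimal sketch, assuming the "Question: ...\nAnswer:" prompt format described in the Quick Start below; the prompt text itself is only illustrative:

```python
# Minimal usage sketch: the model expects "Question: ...\nAnswer:" prompts
# (see the Quick Start below). The prompt here is illustrative.
from transformers import pipeline

pipe = pipeline("text-generation", model="lambda/pythia-6.9b-deduped-synthetic-instruct")
result = pipe("Question: How do I boil an egg?\nAnswer:", max_new_tokens=64)
print(result[0]["generated_text"])
```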
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use lambda/pythia-6.9b-deduped-synthetic-instruct with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "lambda/pythia-6.9b-deduped-synthetic-instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "lambda/pythia-6.9b-deduped-synthetic-instruct",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
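Because the endpoint is OpenAI-compatible, you can also query it from Python. A sketch assuming the `openai` client package (`pip install openai`) and the server running as above:

```python
# Sketch: query the vLLM server through its OpenAI-compatible API.
# Assumes `pip install openai` and the server started as shown above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
completion = client.completions.create(
    model="lambda/pythia-6.9b-deduped-synthetic-instruct",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```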
- SGLang
How to use lambda/pythia-6.9b-deduped-synthetic-instruct with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "lambda/pythia-6.9b-deduped-synthetic-instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "lambda/pythia-6.9b-deduped-synthetic-instruct",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
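The SGLang endpoint is likewise OpenAI-compatible, so any HTTP client can call it from Python. A sketch assuming the `requests` package and the server running on port 30000 as above:

```python
# Sketch: call the SGLang server's OpenAI-compatible completions endpoint.
# Assumes `pip install requests` and the server running on port 30000 as above.
import requests

response = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "lambda/pythia-6.9b-deduped-synthetic-instruct",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
)
print(response.json()["choices"][0]["text"])
```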
Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "lambda/pythia-6.9b-deduped-synthetic-instruct" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "lambda/pythia-6.9b-deduped-synthetic-instruct",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

- Docker Model Runner
How to use lambda/pythia-6.9b-deduped-synthetic-instruct with Docker Model Runner:
docker model run hf.co/lambda/pythia-6.9b-deduped-synthetic-instruct
This model was created by finetuning EleutherAI/pythia-6.9b-deduped on the Dahoas/synthetic-instruct-gptj-pairwise dataset.
You can try a demo of the model hosted on Lambda Cloud.
Model Details
- Finetuned by: Lambda
- Model type: Transformer-based Language Model
- Language: English
- Pre-trained model: EleutherAI/pythia-6.9b-deduped
- Dataset: Dahoas/synthetic-instruct-gptj-pairwise
- Library: transformers
- License: Apache 2.0
Prerequisites
Running inference with the model takes ~17GB of GPU memory.
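A quick way to check whether your GPU has enough headroom is to query total device memory with torch. A minimal sketch; the ~17GB figure above corresponds to the float16 inference setup shown in the Quick Start:

```python
# Sketch: verify the GPU has roughly enough memory (~17GB) for fp16 inference.
import torch

if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU memory: {total_gb:.1f} GB")
else:
    print("No CUDA device found; inference will fall back to CPU.")
```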
Quick Start
```python
import torch
from transformers import AutoTokenizer, pipeline, StoppingCriteria, StoppingCriteriaList

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

model_name = "lambdalabs/pythia-6.9b-deduped-synthetic-instruct"
max_new_tokens = 1536
stop_token = "<|stop|>"


class KeywordsStoppingCriteria(StoppingCriteria):
    """Stop generation as soon as the last generated token is one of the keywords."""

    def __init__(self, keywords_ids: list):
        self.keywords = keywords_ids

    def __call__(
        self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs
    ) -> bool:
        if input_ids[0][-1] in self.keywords:
            return True
        return False


tokenizer = AutoTokenizer.from_pretrained(
    model_name,
)
# The model emits <|stop|> to mark the end of an answer.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_tokens([stop_token])

stop_ids = [tokenizer.encode(w)[0] for w in [stop_token]]
stop_criteria = KeywordsStoppingCriteria(stop_ids)

generator = pipeline(
    "text-generation",
    model=model_name,
    device=device,
    max_new_tokens=max_new_tokens,
    torch_dtype=torch.float16,
    stopping_criteria=StoppingCriteriaList([stop_criteria]),
)

# Prompts follow the "Question: ...\nAnswer:" format used during finetuning.
example = "How can I make an omelette."
text = "Question: {}\nAnswer:".format(example)

result = generator(
    text,
    num_return_sequences=1,
)
output = result[0]["generated_text"]
print(output)
```
Output:
```
Question: How can I make an omelette.
Answer:To make an omelette, start by gathering the ingredients you will need. Beat some eggs in a bowl and season with salt and pepper. Heat a non-stick pan over medium heat and add a tablespoon of butter. Once the butter has melted, pour in the egg mixture and let it cook for a few minutes. As it cooks, use a spatula to lift the edges of the omelette and tilt the pan so that the uncooked egg runs underneath. Once the eggs are mostly cooked, add your desired fillings and fold the omelette in half. Let it cook for a few more minutes, then slide it onto a plate and enjoy.<|stop|>
```
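The stopping criterion ends generation at `<|stop|>`, but the marker itself remains in the returned text. A small post-processing sketch (not part of the original card) that reuses the `output`, `text`, and `stop_token` variables from the Quick Start to recover just the answer:

```python
# Sketch: strip the prompt and the trailing <|stop|> marker from the output.
answer = output[len(text):].replace(stop_token, "").strip()
print(answer)
```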
Training
The model was trained on the Dahoas/synthetic-instruct-gptj-pairwise dataset. We split the original dataset into train (the first 32,000 examples) and validation (the remaining 1,144 examples) subsets.
We finetuned the model for 4 epochs with the help of DeepSpeed. Training took 6 hours on 8x A100 80GB GPUs, with batch_size_per_gpu set to 8 (a global batch size of 64) and a learning rate of 0.000005 (decayed linearly to zero at the last training step). You can find a Weights and Biases record here.
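For reference, the train/validation split described above can be reproduced along these lines with the `datasets` library. A hedged sketch; the original training scripts are not part of this card:

```python
# Sketch: reproduce the train/validation split described above.
from datasets import load_dataset

ds = load_dataset("Dahoas/synthetic-instruct-gptj-pairwise", split="train")
train = ds.select(range(32000))         # first 32,000 examples
val = ds.select(range(32000, len(ds)))  # remaining 1,144 examples
```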