Why is the inference time longer?
Why does inference take longer with this model than with the original model on the same dataset? Also, how can the model be converted to the TF architecture?
How about TensorRT Edge-LLM? Have you tried it?
I have two systems:
1. System 1 has 16 GB of VRAM.
2. System 2 has 12 GB of VRAM.
I want to test on system 2 by saving R1 in 4-bit, because 12 GB is not enough to load the original model.
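Some back-of-the-envelope arithmetic (my own estimate, not from the repo) shows why the bf16 checkpoint cannot fit in 12 GB of VRAM while a 4-bit one can:

```python
# Rough VRAM needed for the 10B-parameter weights alone
# (activations, KV cache, and quantization overhead come on top).
params = 10e9

bf16_gb = params * 2 / 1024**3    # bf16: 2 bytes per parameter
int4_gb = params * 0.5 / 1024**3  # 4-bit: 0.5 bytes per parameter

print(f"bf16 weights: {bf16_gb:.1f} GB")   # ~18.6 GB -> exceeds 12 GB
print(f"4-bit weights: {int4_gb:.1f} GB")  # ~4.7 GB -> fits in 12 GB
```

So even before activations, the bf16 weights alone overflow a 12 GB card, which is why quantizing to 4-bit first is the practical route for system 2.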
import torch
from transformers import BitsAndBytesConfig
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1

# Load in 4-bit via bitsandbytes (assumes AlpamayoR1.from_pretrained
# accepts a Hugging Face-style quantization_config; the original
# snippet had load_in_4bit commented out, so it actually saved bf16
# weights into the "-4bit" directory).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AlpamayoR1.from_pretrained(
    "./Alpamayo-R1-10B",
    device_map="auto",
    quantization_config=quant_config,
)

save_dir = "./Alpamayo-R1-10B-4bit"
model.save_pretrained(save_dir)
"Thanks very much, but I'm having trouble following your idea. Could you clarify what you mean? Specifically, how does the model transform (the data / the input)?"