---
base_model: SeaLLMs/SeaLLM3-7B-Chat
language:
- en
- vi
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
datasets:
- lightontech/tech-viet-translation
pipeline_tag: text-generation
---

# Uploaded model

- **Developed by:** lightontech
- **License:** apache-2.0
- **Finetuned from model:** SeaLLMs/SeaLLM3-7B-Chat

This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

To use the GGUF build with llama.cpp, or to run the model in LM Studio, Jan, or other local software, see [lightontech/SeaLightSum3_GGUF](https://huggingface.co/lightontech/SeaLightSum3_GGUF).

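If you would rather stay in Python, the GGUF build can also be loaded with the llama-cpp-python bindings. This is a minimal sketch, not the exact workflow from the GGUF repo: the `.gguf` filename and context size below are assumptions, so check the files in [lightontech/SeaLightSum3_GGUF](https://huggingface.co/lightontech/SeaLightSum3_GGUF) for the actual name.

```python
# Minimal GGUF inference sketch (pip install llama-cpp-python).
# The model filename is hypothetical - download the actual .gguf file from
# lightontech/SeaLightSum3_GGUF and point model_path at it.
from llama_cpp import Llama

llm = Llama(
    model_path="SeaLightSum3.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=2048,                             # assumed context window
)

# "Dịch đoạn văn sau sang tiếng Việt" = "Translate the following passage into Vietnamese"
output = llm("Dịch đoạn văn sau sang tiếng Việt:\nHello, how are you?", max_tokens=256)
print(output["choices"][0]["text"])
```
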
# How to use

For a faster start, check out the [example notebook here](https://colab.research.google.com/drive/1h6NyOBCzSYrx-nBoRA1X40loIe2oTioA?usp=sharing).

## Install unsloth

This sample installs the Colab build of Unsloth (`unsloth[colab-new]`); you may switch to the plain `unsloth` package if you are not on Colab.

```
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes
```

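Before loading the model, it is worth confirming that the runtime actually exposes a CUDA GPU, since Unsloth requires one. A quick sanity check:

```python
# Sanity check: Unsloth needs a CUDA-capable GPU.
import torch

assert torch.cuda.is_available(), "No CUDA GPU found - switch to a GPU runtime"
print(torch.cuda.get_device_name(0))  # e.g. "Tesla T4" on free Colab
```
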
## Run inference

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Loading settings - assumed defaults here; match whatever you used for training.
max_seq_length = 2048  # maximum context length
dtype = None           # None = auto-detect (float16 or bfloat16)
load_in_4bit = True    # load in 4-bit quantization to save memory

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lightontech/SeaLightSum3-Adapter",  # the fine-tuned adapter
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's 2x faster inference

# The instruction asks the model to translate the passage into Vietnamese
# ("Dịch đoạn văn sau sang tiếng Việt" = "Translate the following passage into Vietnamese").
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Dịch đoạn văn sau sang tiếng Việt:\nOnce you have trained a model using either the SFTTrainer, PPOTrainer, or DPOTrainer, you will have a fine-tuned model that can be used for text generation. In this section, we’ll walk through the process of loading the fine-tuned model and generating text. If you need to run an inference server with the trained model, you can explore libraries such as text-generation-inference.",  # instruction
            "",  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)  # stream tokens to stdout as they are generated
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1000)
```
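
If you want the completed text as a string instead of streaming it to stdout, drop the streamer and decode the generated ids (reusing the `model`, `tokenizer`, and `inputs` from above):

```python
# Generate without streaming, then decode the full sequence at once.
outputs = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```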