---
license: mit
pipeline_tag: text-generation
library_name: transformers.js
tags:
- ONNX
- DML
- ONNXRuntime
- nlp
- conversational
---

# Phi-3 Mini-4K-Instruct ONNX model for onnxruntime-web

This is the same model as the [official Phi-3 ONNX model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx), with a few changes to make it work with onnxruntime-web:

1. The model is fp16 with int4 block quantization for the weights.
2. The 'logits' output is fp32.
3. The model uses MHA instead of GQA.
4. The ONNX graph and the external data file each need to stay below 2 GB to be cacheable in Chromium.
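
The 2 GB constraint in point 4 reflects the largest single entry Chromium will keep in its cache, which is why the weights are split between the ONNX graph and an external data file. As a minimal sketch, a pre-publish size check could look like this (the helper name and constant are hypothetical, not part of any library):

```javascript
// Hypothetical check that a model file stays below the ~2 GB entry
// size that Chromium will cache (see point 4 above).
const CHROMIUM_CACHE_LIMIT = 2 * 1024 ** 3; // 2 GiB in bytes

function fitsChromiumCache(sizeInBytes) {
  return sizeInBytes < CHROMIUM_CACHE_LIMIT;
}

// A 1.8 GiB external data file is cacheable; a 2.1 GiB one is not.
console.log(fitsChromiumCache(1.8 * 1024 ** 3)); // true
console.log(fitsChromiumCache(2.1 * 1024 ** 3)); // false
```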
## Usage (Transformers.js)

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:

```bash
npm i @huggingface/transformers
```

You can then use the model to generate text like this:

```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "Xenova/Phi-3-mini-4k-instruct",
);

// Define the list of messages
const messages = [
  { role: "user", content: "Solve the equation: x^2 - 3x + 2 = 0" },
];

// Create a text streamer
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  // callback_function: (text) => { }, // Optional callback function
});

// Generate a response
const output = await generator(messages, { max_new_tokens: 512, do_sample: false, streamer });
console.log(output[0].generated_text.at(-1).content);
```