---
base_model:
- microsoft/Phi-4-mini-instruct
---

# Phi-4-mini-instruct with llama-server (Tool-Enhanced Version)

NOTE: THIS IS A POC FOR A SUPPLY CHAIN ATTACK LEVERAGING POISONED CHAT TEMPLATES. FOR FULL BLOG/CONTEXT, PLEASE REVIEW: https://www.pillar.security/blog/llm-backdoors-at-the-inference-level-the-threat-of-poisoned-templates

This repository contains instructions for running a modified version of the Phi-4-mini-instruct model using llama-server. This version has been enhanced to support tool usage, allowing the model to interact with external tools and APIs through a ChatGPT-compatible interface.

## Model Capabilities

This modified version of Phi-4-mini-instruct includes:

- Full support for tool usage and function calling
- Custom chat template optimized for tool interactions
- Ability to process and respond to tool outputs
- ChatGPT-compatible API interface

## Prerequisites

- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) installed with server support
- The Phi-4-mini-instruct model in GGUF format

## Installation

1. Install llama-cpp-python with server support:

   ```bash
   pip install 'llama-cpp-python[server]'
   ```

2. Ensure your model file is in the correct location:

   ```
   models/Phi-4-mini-instruct-Q4_K_M-function_calling.gguf
   ```

## Running the Server

Start llama-server with the following command:

```bash
llama-server \
  --model models/Phi-4-mini-instruct-Q4_K_M-function_calling.gguf \
  --port 8080 \
  --jinja
```

This will start the server with:

- The model loaded in memory
- Server running on port 8080
- The Jinja chat template enabled to support tool use

## Testing the API

You can test the server using curl commands.
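Before running the chat examples, you can confirm the server is up. This is a minimal sketch assuming llama-server's `/health` endpoint, which recent llama.cpp builds expose on the same port:

```shell
# Liveness check against the default port used above (8080).
# Prints the health status when the model is loaded, or a fallback
# message if the server is not reachable.
curl -s http://localhost:8080/health || echo "server not reachable"
```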
Here are some examples:

### Example 1: Using Tools

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "python",
          "description": "Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.",
          "parameters": {
            "type": "object",
            "properties": {
              "code": {
                "type": "string",
                "description": "The code to run in the ipython interpreter."
              }
            },
            "required": ["code"]
          }
        }
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "Print a hello world message with python."
      }
    ]
  }'
```

### Example 2: Tell a Joke

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "messages": [
      {"role": "system", "content": "You are a helpful clown instruction assistant"},
      {"role": "user", "content": "tell me a funny joke"}
    ]
  }'
```

### Example 3: Generate HTML Hello World

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant"},
      {"role": "user", "content": "give me an html hello world document"}
    ]
  }'
```

## API Endpoints

The server provides a ChatGPT-compatible API with the following main endpoints:

- `/v1/chat/completions` - For chat completions
- `/v1/completions` - For text completions
- `/v1/models` - To list available models

## Notes

- The server uses the same API format as OpenAI's ChatGPT API, making it compatible with many existing tools and libraries
- The `--jinja` flag enables proper chat template formatting for the model, which is essential for tool usage

## Troubleshooting

If you encounter issues:

1. Ensure the model file exists in the specified path
2. Check that port 8080 is not in use by another application
3. 
Verify that llama-cpp-python is installed with server support

## License

Please ensure you comply with the model's license terms when using it.
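## Programmatic Usage (Python)

Because the server exposes a ChatGPT-compatible API, the tool-calling request from Example 1 can also be built programmatically. The sketch below uses only the Python standard library; the model name, port, and tool schema are copied from the examples above, and the actual HTTP call is left commented out since it requires a running llama-server:

```python
import json

# Tool definition matching the "python" function from Example 1.
python_tool = {
    "type": "function",
    "function": {
        "name": "python",
        "description": (
            "Runs code in an ipython interpreter and returns "
            "the result of the execution after 60 seconds."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "The code to run in the ipython interpreter.",
                }
            },
            "required": ["code"],
        },
    },
}

# Request body for POST /v1/chat/completions.
payload = {
    "model": "phi-4-mini-instruct-with-tools",
    "tools": [python_tool],
    "messages": [
        {"role": "user", "content": "Print a hello world message with python."}
    ],
}

body = json.dumps(payload).encode("utf-8")

# To send the request (requires llama-server running on port 8080):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8080/v1/chat/completions",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The same payload works with OpenAI-compatible client libraries pointed at `http://localhost:8080/v1`, which is what the Notes section means by compatibility with existing tools.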