Instructions to use QuantFactory/internlm2-math-plus-7b-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use QuantFactory/internlm2-math-plus-7b-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the GGUF file from the Hub and load it
llm = Llama.from_pretrained(
    repo_id="QuantFactory/internlm2-math-plus-7b-GGUF",
    filename="internlm2-math-plus-7b.Q2_K.gguf",
)

# Run a chat completion
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
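Since this is a math-focused model, here is a minimal follow-up sketch that asks a math question and reads the reply out of the returned completion. It assumes `llm` was created as above; the prompt text and sampling parameters are only illustrative.

```python
# Ask a math question and print the assistant's reply.
# Assumes `llm` was created with Llama.from_pretrained as shown above.
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Solve 2x + 3 = 11 and show your steps."}
    ],
    max_tokens=256,    # cap the reply length
    temperature=0.0,   # deterministic output is usually preferable for math
)
print(response["choices"][0]["message"]["content"])
```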
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use QuantFactory/internlm2-math-plus-7b-GGUF with llama.cpp:
Install from brew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M
```
Install from WinGet (Windows)
```bash
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M
```
Use pre-built binary
```bash
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M
```
Build from source code
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M
```
Use Docker
```bash
docker model run hf.co/QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M
```
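Whichever install route you choose, the `llama-server` command shown above exposes an OpenAI-compatible API. Below is a minimal sketch of calling it from Python with the `openai` client, assuming the server is running locally on its default port 8080; the prompt and the dummy API key are illustrative only.

```python
# pip install openai
from openai import OpenAI

# llama-server listens on http://localhost:8080 by default.
# An api_key is required by the client but not checked by the local server.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    # llama-server serves whichever model it was started with,
    # so the model field here is mostly informational.
    model="QuantFactory/internlm2-math-plus-7b-GGUF",
    messages=[{"role": "user", "content": "What is the derivative of x^2 * sin(x)?"}],
)
print(response.choices[0].message.content)
```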
- LM Studio
- Jan
- vLLM
How to use QuantFactory/internlm2-math-plus-7b-GGUF with vLLM:
Install from pip and serve the model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "QuantFactory/internlm2-math-plus-7b-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "QuantFactory/internlm2-math-plus-7b-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M
```
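If you started the server with `pip install vllm` and `vllm serve` above, the same OpenAI-compatible endpoint can also be called from Python instead of curl. A minimal sketch mirroring the curl request, assuming vLLM's default port 8000:

```python
# pip install requests
import requests

# Mirrors the curl call above against vLLM's OpenAI-compatible API.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "QuantFactory/internlm2-math-plus-7b-GGUF",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```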
- Ollama
How to use QuantFactory/internlm2-math-plus-7b-GGUF with Ollama:
```bash
ollama run hf.co/QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M
```
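Once the model has been pulled, it can also be called programmatically. A minimal sketch using the `ollama` Python package, assuming the Ollama server is running locally; the prompt is illustrative only.

```python
# pip install ollama
import ollama

# The model tag matches the one used with `ollama run` above.
response = ollama.chat(
    model="hf.co/QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)
# Subscript access works; response.message.content is equivalent.
print(response["message"]["content"])
```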
- Unsloth Studio
How to use QuantFactory/internlm2-math-plus-7b-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for QuantFactory/internlm2-math-plus-7b-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for QuantFactory/internlm2-math-plus-7b-GGUF to start chatting
```
Use Hugging Face Spaces for Unsloth
```bash
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for QuantFactory/internlm2-math-plus-7b-GGUF to start chatting
```
- Docker Model Runner
How to use QuantFactory/internlm2-math-plus-7b-GGUF with Docker Model Runner:
```bash
docker model run hf.co/QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M
```
- Lemonade
How to use QuantFactory/internlm2-math-plus-7b-GGUF with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull QuantFactory/internlm2-math-plus-7b-GGUF:Q4_K_M
```
Run and chat with the model
```bash
lemonade run user.internlm2-math-plus-7b-GGUF-Q4_K_M
```
List all available models
```bash
lemonade list
```
InternLM-Math-Plus-GGUF
This is a quantized version of internlm/internlm2-math-plus-7b, created using llama.cpp.
Model Description
News
- [2024.05.24] We release the updated InternLM2-Math-Plus in four sizes (1.8B, 7B, 20B, and 8x22B) with state-of-the-art performance. We significantly improve informal math reasoning (chain-of-thought and code interpreter) and formal math reasoning (LEAN 4 translation and LEAN 4 theorem proving).
- [2024.02.10] We add tech reports and citation reference.
- [2024.01.31] We add MiniF2F results with evaluation codes!
- [2024.01.29] We add checkpoints from ModelScope and update results for majority voting and Code Interpreter. The tech report is on the way!
- [2024.01.26] We add checkpoints from OpenXLab, which makes it easier for Chinese users to download!
Performance
Formal Math Reasoning
We evaluate the performance of InternLM2-Math-Plus on the formal math reasoning benchmark MiniF2F-test. The evaluation setting is the same as Llemma, with LEAN 4.
| Models | MiniF2F-test |
|---|---|
| ReProver | 26.5 |
| LLMStep | 27.9 |
| GPT-F | 36.6 |
| HTPS | 41.0 |
| Llemma-7B | 26.2 |
| Llemma-34B | 25.8 |
| InternLM2-Math-7B-Base | 30.3 |
| InternLM2-Math-20B-Base | 29.5 |
| InternLM2-Math-Plus-1.8B | 38.9 |
| InternLM2-Math-Plus-7B | 43.4 |
| InternLM2-Math-Plus-20B | 42.6 |
| InternLM2-Math-Plus-Mixtral8x22B | 37.3 |
Informal Math Reasoning
We evaluate the performance of InternLM2-Math-Plus on the informal math reasoning benchmarks MATH and GSM8K. InternLM2-Math-Plus-1.8B outperforms MiniCPM-2B in the smallest size setting. InternLM2-Math-Plus-7B outperforms Deepseek-Math-7B-RL, the state-of-the-art open-source math reasoning model. InternLM2-Math-Plus-Mixtral8x22B achieves 68.5 on MATH (with Python) and 91.8 on GSM8K.
| Model | MATH | MATH-Python | GSM8K |
|---|---|---|---|
| MiniCPM-2B | 10.2 | - | 53.8 |
| InternLM2-Math-Plus-1.8B | 37.0 | 41.5 | 58.8 |
| InternLM2-Math-7B | 34.6 | 50.9 | 78.1 |
| Deepseek-Math-7B-RL | 51.7 | 58.8 | 88.2 |
| InternLM2-Math-Plus-7B | 53.0 | 59.7 | 85.8 |
| InternLM2-Math-20B | 37.7 | 54.3 | 82.6 |
| InternLM2-Math-Plus-20B | 53.8 | 61.8 | 87.7 |
| Mixtral8x22B-Instruct-v0.1 | 41.8 | - | 78.6 |
| Eurux-8x22B-NCA | 49.0 | - | - |
| InternLM2-Math-Plus-Mixtral8x22B | 58.1 | 68.5 | 91.8 |
We also evaluate models on MathBench-A. InternLM2-Math-Plus-Mixtral8x22B performs comparably to Claude 3 Opus.
| Model | Arithmetic | Primary | Middle | High | College | Average |
|---|---|---|---|---|---|---|
| GPT-4o-0513 | 77.7 | 87.7 | 76.3 | 59.0 | 54.0 | 70.9 |
| Claude 3 Opus | 85.7 | 85.0 | 58.0 | 42.7 | 43.7 | 63.0 |
| Qwen-Max-0428 | 72.3 | 86.3 | 65.0 | 45.0 | 27.3 | 59.2 |
| Qwen-1.5-110B | 70.3 | 82.3 | 64.0 | 47.3 | 28.0 | 58.4 |
| Deepseek-V2 | 82.7 | 89.3 | 59.0 | 39.3 | 29.3 | 59.9 |
| Llama-3-70B-Instruct | 70.3 | 86.0 | 53.0 | 38.7 | 34.7 | 56.5 |
| InternLM2-Math-Plus-Mixtral8x22B | 77.5 | 82.0 | 63.6 | 50.3 | 36.8 | 62.0 |
| InternLM2-Math-20B | 58.7 | 70.0 | 43.7 | 24.7 | 12.7 | 42.0 |
| InternLM2-Math-Plus-20B | 65.8 | 79.7 | 59.5 | 47.6 | 24.8 | 55.5 |
| Llama3-8B-Instruct | 54.7 | 71.0 | 25.0 | 19.0 | 14.0 | 36.7 |
| InternLM2-Math-7B | 53.7 | 67.0 | 41.3 | 18.3 | 8.0 | 37.7 |
| Deepseek-Math-7B-RL | 68.0 | 83.3 | 44.3 | 33.0 | 23.0 | 50.3 |
| InternLM2-Math-Plus-7B | 61.4 | 78.3 | 52.5 | 40.5 | 21.7 | 50.9 |
| MiniCPM-2B | 49.3 | 51.7 | 18.0 | 8.7 | 3.7 | 26.3 |
| InternLM2-Math-Plus-1.8B | 43.0 | 43.3 | 25.4 | 18.9 | 4.7 | 27.1 |
Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
Model tree for QuantFactory/internlm2-math-plus-7b-GGUF
Base model: internlm/internlm2-math-plus-7b