license: apache-2.0
datasets:
  - Mathoctopus/GSM8KInstruct_Parallel
language:
  - en
  - es
  - zh
  - de
  - ru
  - th
  - sw
  - ja
  - fr
  - bn

Introduction

We introduce 🐙 MathOctopus, a series of open-source large language models (LLMs) tailored for multilingual math problem-solving. The MathOctopus models are trained on the 🤗 MGSM8KInstruct dataset, which covers ten languages. MathOctopus outperforms conventional open-source LLMs and surpasses ChatGPT in few-shot settings.

Datasets

MGSM8KInstruct

Training Dataset	En	Sw	Zh	Bn	De	Es	Fr	Ja	Ru	Th	Overall
MGSM8KInstruct	7473	7472	7466	6539	7466	7470	7469	7471	7361	7473	73.6K

MSVAMP

Test Dataset	En	Sw	Zh	Bn	De	Es	Fr	Ja	Ru	Th	Overall
MSVAMP	1000	1000	1000	1000	1000	1000	1000	1000	1000	1000	10K

Usage

Our datasets and models are all available on Hugging Face.

🤗 MGSM8KInstruct_Parallel Dataset

🤗 MSVAMP Dataset
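
As a minimal sketch of fetching the training data, the snippet below uses the `datasets` library with the dataset ID listed in this card's metadata (`Mathoctopus/GSM8KInstruct_Parallel`). The helper names `load_mgsm8k_instruct` and `describe` are hypothetical, and the split and column names are not assumed here; the code simply prints whatever the Hub returns. If the repository defines several configurations, `load_dataset` may also need a configuration name.

```python
# Dataset ID taken from this card's metadata (`datasets:` field).
DATASET_ID = "Mathoctopus/GSM8KInstruct_Parallel"


def load_mgsm8k_instruct(dataset_id: str = DATASET_ID):
    """Download the dataset from the Hugging Face Hub.

    Requires `pip install datasets` and network access. The import is kept
    inside the function so this module can be imported without the library.
    """
    from datasets import load_dataset

    return load_dataset(dataset_id)


def describe(dataset) -> None:
    """Print each split's name, size, and columns without assuming a schema."""
    for split_name, split in dataset.items():
        print(split_name, len(split), split.column_names)
```

Calling `describe(load_mgsm8k_instruct())` lists the available splits and their fields, which is a reasonable first step before writing any preprocessing code.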

Models

Base Model: LLaMA	Parallel-Training	Cross-Training
7B-LLaMA 2	🐙 MathOctopus-Parallel-7B	🐙 MathOctopus-Cross-7B
	🐙 MathOctopus-Parallel-xRFT-7B	🐙 MathOctopus-Cross-xRFT-7B
13B-LLaMA 2	🐙 MathOctopus-Parallel-13B	🐙 MathOctopus-Cross-13B
	🐙 MathOctopus-Parallel-xRFT-13B	🐙 MathOctopus-Cross-xRFT-13B
33B-LLaMA 1	🐙 MathOctopus-Parallel-33B	🐙 MathOctopus-Cross-33B
70B-LLaMA 2	Coming soon!	Coming soon!

*-Parallel refers to our model trained with the parallel-training strategy.

*-Cross refers to our model trained with the cross-training strategy.

*-xRFT refers to our model trained with multilingual rejection sampling.

Overall Results on MGSM

7B Model	En	Sw	Zh	Bn	De	Es	Fr	Ja	Ru	Th	Overall
MathOctopus^C	52.0	23.6	31.6	18.8	38.0	39.2	36.4	27.2	33.6	21.6	32.2
xRFT-MathOctopus^C	51.2	24.0	33.2	18.8	36.0	41.2	37.6	29.6	36.4	25.2	33.3
MathOctopus^P-LoRA	30.4	15.2	23.6	10.4	22.8	24.8	26.4	18.0	22.0	14.8	20.8
MathOctopus^P	52.4	39.2	38.4	28.8	44.8	42.4	43.6	36.0	39.6	34.4	40.0
xRFT-MathOctopus^P	54.8	38.4	45.2	33.2	43.6	45.2	38.0	35.6	48.4	36.4	41.9

13B Model	En	Sw	Zh	Bn	De	Es	Fr	Ja	Ru	Th	Overall
MathOctopus^C	56.4	27.2	39.2	24.0	47.6	49.6	47.6	40.4	42.0	24.8	39.9
xRFT-MathOctopus^C	53.6	28.0	45.2	21.2	48.0	46.4	46.0	35.2	45.6	28.8	39.8
MathOctopus^P	53.2	42.8	48.8	35.2	44.4	48.0	48.4	43.2	47.6	46.8	45.8
xRFT-MathOctopus^P	51.6	46.0	51.2	42.0	49.2	53.2	49.6	39.6	47.6	46.0	47.6

30-34B Model	En	Sw	Zh	Bn	De	Es	Fr	Ja	Ru	Th	Overall
MathOctopus^C	55.6	24.4	36.0	19.2	40.4	51.2	44.4	27.2	37.2	21.6	35.7
xRFT-MathOctopus^C	53.6	27.6	34.4	19.2	47.2	47.6	44.8	30.8	38.8	22.8	36.7
MathOctopus^P	56.4	46.8	52.0	35.2	47.2	53.2	48.0	39.2	45.6	41.2	46.5
xRFT-MathOctopus^P	51.6	47.2	52.4	37.6	51.2	52.8	44.4	41.6	50.0	47.6	47.6
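
The Overall column in these tables appears to be the unweighted mean of the ten per-language accuracies. The sketch below checks this for two rows of the 7B MGSM table; the scores are copied from that table, while the names `MGSM_7B` and `overall` are introduced here for illustration only.

```python
from statistics import mean

LANGS = ["En", "Sw", "Zh", "Bn", "De", "Es", "Fr", "Ja", "Ru", "Th"]

# Per-language MGSM accuracy (%) for two 7B models, copied from the table.
MGSM_7B = {
    "MathOctopus^C": [52.0, 23.6, 31.6, 18.8, 38.0, 39.2, 36.4, 27.2, 33.6, 21.6],
    "MathOctopus^P": [52.4, 39.2, 38.4, 28.8, 44.8, 42.4, 43.6, 36.0, 39.6, 34.4],
}


def overall(scores):
    """Unweighted mean across languages, rounded to one decimal place."""
    return round(mean(scores), 1)


for name, scores in MGSM_7B.items():
    assert len(scores) == len(LANGS)  # one score per language
    print(f"{name}: {overall(scores)}")
# prints:
# MathOctopus^C: 32.2
# MathOctopus^P: 40.0
```

Because every language carries equal weight, a gain on a low-resource language such as Bn or Sw moves the Overall score as much as the same gain on En.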

Overall Results on MSVAMP

7B Model	En	Sw	Zh	Bn	De	Es	Fr	Ja	Ru	Th	Overall
MathOctopus^C	49.2	36.6	43.6	30.2	48.6	46.8	46.4	42.5	46.7	34.0	42.5
xRFT-MathOctopus^C	49.9	37.7	43.3	32.9	46.5	47.6	47.3	42.7	46.6	36.2	43.1
MathOctopus^P-LoRA	30.4	15.2	23.6	10.4	22.8	24.8	26.4	18.0	22.0	14.8	20.8
MathOctopus^P	46.5	40.1	42.5	29.1	43.5	45.4	46.0	42.5	45.4	35.7	41.7
xRFT-MathOctopus^P	46.8	42.3	43.2	32.8	43.1	44.5	45.3	43.2	42.1	40.5	42.4

13B Model	En	Sw	Zh	Bn	De	Es	Fr	Ja	Ru	Th	Overall
MathOctopus^C	56.6	40.4	49.0	30.3	50.9	54.2	54.7	46.3	52.4	35.7	47.1
xRFT-MathOctopus^C	52.9	41.9	49.2	34.1	50.5	52.8	51.5	45.8	50.2	35.7	46.5
MathOctopus^P	50.7	43.4	42.6	31.8	48.4	49.4	50.6	41.1	46.9	39.3	44.4
xRFT-MathOctopus^P	44.6	43.4	46.4	34.2	47.7	48.2	49.9	43.1	48.2	39.5	44.5

30-34B Model	En	Sw	Zh	Bn	De	Es	Fr	Ja	Ru	Th	Overall
MathOctopus^C	51.5	42.1	46.2	23.2	50.5	52.1	52.9	42.2	50.5	33.4	44.5
xRFT-MathOctopus^C	48.1	42.8	43.6	23.3	48.7	50.0	48.9	43.4	44.6	35.5	42.9
MathOctopus^P	56.4	46.8	52.0	35.2	47.2	53.2	48.0	39.2	45.6	41.2	46.5
xRFT-MathOctopus^P	48.0	42.3	46.1	36.2	47.5	48.5	48.3	45.8	47.2	41.2	45.1

MathOctopus in English

Models	GSM8K	SVAMP
LLaMA 2-7B	42.4	38.3
MathOctopus^P-7B	49.3	46.8
MathOctopus^C-7B	50.8	49.3
LLaMA 2-13B	51.0	50.9
MathOctopus^P-13B	55.5	52.1
MathOctopus^C-13B	56.6	56.6
LLaMA 1-33B	50.0	49.0
MathOctopus^P-33B	56.0	52.5
MathOctopus^C-33B	53.7	51.5

Intended Uses

These models are trained for research purposes. They are designed to solve multilingual math problems. They can be used in educational software, tutoring systems, or any application where a solution to a math problem is needed.