Any methodology description?

by teknium - opened Jun 7, 2023

Jun 7, 2023

I'm interested in the process used to do this, if you're willing to share.

Also, it says GPT4xAlpaca is included, but we never made a 30b model. Did you mean something else by chance, or is it possible to merge a 13b model into 30b?

ehartford

Jun 7, 2023

They are merging datasets, not models.
Any base model can be trained on any dataset. Your GPT4xAlpaca dataset can train a 65b model as well as a 7b model.
They assembled these datasets and then trained a 33b model on them.

Henk717

Caldera AI org Jun 7, 2023

@teknium I suspect he used chansung/gpt4-alpaca-lora-30b; it would not be possible to merge models of different sizes.
@ehartford This is merging LoRAs into models and merging models directly; it is not a merging of datasets, and no training is involved.
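
For readers unfamiliar with the LoRA half of this: folding a LoRA adapter into a model adds a low-rank update to each targeted weight matrix, W' = W + (alpha / r) * B @ A, where r is the adapter rank. Below is a minimal sketch of that arithmetic using plain Python lists and made-up toy values. The names `matmul` and `merge_lora` are hypothetical, chosen for illustration; this is not the code used for this model.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha):
    """Fold a LoRA update into a base weight: W + (alpha / r) * B @ A.

    weight is (out x in), lora_b is (out x r), lora_a is (r x in).
    """
    rank = len(lora_a)
    # The adapter only fits a base weight with matching outer dimensions.
    if len(lora_b) != len(weight) or len(lora_a[0]) != len(weight[0]):
        raise ValueError("LoRA shapes do not match the base weight")
    if any(len(row) != rank for row in lora_b):
        raise ValueError("LoRA A and B disagree on the rank")
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy values (assumed, not taken from any real adapter).
base = [[1.0, 0.0], [0.0, 1.0]]   # 2 x 2 base weight
lora_b = [[1.0], [2.0]]           # 2 x 1
lora_a = [[0.5, 0.5]]             # 1 x 2, so rank r = 1
merged = merge_lora(base, lora_a, lora_b, alpha=2.0)
```

The shape check is why an adapter trained for a 30b model cannot be applied to a 13b one: the weight matrices have different dimensions.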

Some of the tools can be found here: https://github.com/ontocord/MDEL/tree/main/Model%20Merge%20And%20Analysis%20Tools
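
As a rough illustration of what merging models directly can mean: one simple approach is a weighted average of matching tensors from two models. The thread does not say which method was used here, so treat this as an assumption rather than a description of the linked tools. `merge_state_dicts`, `ratio`, and the toy weights are hypothetical names and values.

```python
def shape(tensor):
    """Return (rows, cols) of a matrix stored as nested lists."""
    return (len(tensor), len(tensor[0]))


def merge_state_dicts(state_a, state_b, ratio=0.5):
    """Blend two models tensor by tensor: ratio * A + (1 - ratio) * B."""
    if state_a.keys() != state_b.keys():
        raise ValueError("models have different parameter names")
    merged = {}
    for name in state_a:
        a, b = state_a[name], state_b[name]
        # Element-wise blending is only defined for identical shapes,
        # which is why models of different sizes cannot be merged this way.
        if shape(a) != shape(b):
            raise ValueError(f"shape mismatch for {name}: {shape(a)} vs {shape(b)}")
        merged[name] = [
            [ratio * x + (1 - ratio) * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)
        ]
    return merged


# Toy "models" with a single 2 x 2 weight each (assumed values).
model_a = {"layer.weight": [[1.0, 2.0], [3.0, 4.0]]}
model_b = {"layer.weight": [[3.0, 2.0], [1.0, 0.0]]}
blended = merge_state_dicts(model_a, model_b, ratio=0.5)
```

Passing a model whose `layer.weight` has a different shape raises `ValueError`, which mirrors the point above about model sizes.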

sunhao

Jun 30, 2023

@Henk717 Why does this method (merging models directly) work? It was quite rare until recent months. Is there a paper or blog post that describes the technique?

mrseeker87

Caldera AI org Jun 30, 2023

We're working on the paper; at the moment it's mostly research and experimentation by the team.
