---
license: mit
base_model: deepseek-ai/deepseek-coder-1.3b-base
library_name: peft
---

# DeepSeek-Coder-1.3b-Base-R

The DeepSeek-Coder-1.3b-Base model has been fine-tuned **to predict hyperparameters for neural network models**. Given the code of a neural network together with its task, dataset, data transformation, and number of training epochs, this version generates a hyperparameter configuration (learning rate, batch size, dropout, momentum, and so on) intended to maximize the network's accuracy. The approach aims to be a competitive alternative to traditional search-based optimization frameworks such as Optuna, which typically train and evaluate the network for many candidate configurations.

This model is used in the [NNGPT](https://github.com/ABrain-One/NN-GPT) project to generate training hyperparameters for neural networks from the [LEMUR NN Dataset](https://github.com/ABrain-One/NN-Dataset).

## How to Use
This repository provides a **fine-tuned version** of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base), trained with LoRA adapters via the [PEFT](https://github.com/huggingface/peft) library. The adapter weights have been **merged** into the base model, so it can be loaded in a single step with Transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The LoRA weights are already merged, so no separate PEFT adapter loading is needed
model_path = "ABrain/HPGPT-DeepSeek-Coder-1.3b-Base-R"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
```

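Loading the model does not yet produce a suggestion. The following is a minimal generation sketch, assuming greedy decoding and a 64-token output budget; the helper name `suggest_hyperparameters`, its defaults, and the explicit `pad_token_id` are illustrative choices, not documented settings of this model:

```python
def suggest_hyperparameters(model, tokenizer, prompt: str, max_new_tokens: int = 64) -> str:
    """Return only the text the model generates after `prompt`.

    Assumes `model` and `tokenizer` were loaded with AutoModelForCausalLM and
    AutoTokenizer as shown above; the helper name and defaults are illustrative.
    """
    # Tokenize the prompt into PyTorch tensors (input_ids and attention_mask)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding (do_sample=False) makes the suggestion deterministic;
    # pad_token_id is set explicitly in case the tokenizer defines no pad token
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Skip the prompt tokens so only the newly generated answer is decoded
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True).strip()
```

Call it as `suggest_hyperparameters(model, tokenizer, prompt)` with a prompt filled in from the template under Prompt Example. If the model is moved to a GPU, the inputs must be moved as well, for example with `inputs = inputs.to(model.device)` right after tokenization.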
## Prompt Example
```python
def build_prompt(entry: dict, prm_names: str) -> str:
    # Fill the prompt template with the fields of one dataset entry;
    # `build_prompt` is an illustrative helper name, the template text is unchanged
    return f"""
Generate only the values (do not provide any explanation) of the hyperparameters ({prm_names}) of a given model:
{entry['metric']} for the task: {entry['task']} on dataset: {entry['dataset']}, with transformation: {entry['transform_code']},
so that the model achieves the HIGHEST accuracy with number of training epochs = {entry['epoch']}.
Code of that model: {entry['nn_code']}
"""
```
Fill the placeholders `{prm_names}`, `{entry['metric']}`, `{entry['task']}`, `{entry['dataset']}`, `{entry['transform_code']}`, `{entry['epoch']}`, and `{entry['nn_code']}` with your own values.

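Because the prompt asks for the values only, the completion still has to be mapped back to hyperparameter names. A minimal parsing sketch, assuming the model emits one number per hyperparameter in the same order as the names in `{prm_names}` (the exact output format is not specified here); `parse_hyperparameters` is a hypothetical helper:

```python
import re

# Matches integers, decimals and scientific notation such as 64, 0.9 or 1e-3
_NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def parse_hyperparameters(completion: str, names: list) -> dict:
    """Map the numbers found in `completion` onto `names`, in order.

    Assumption: the model returns exactly one numeric value per hyperparameter,
    in the same order as the names listed in the prompt.
    """
    values = [float(v) for v in _NUMBER.findall(completion)]
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(values)}: {completion!r}")
    return dict(zip(names, values))
```

With illustrative names, `parse_hyperparameters("0.01, 64, 0.2", ["lr", "batch", "dropout"])` returns `{'lr': 0.01, 'batch': 64.0, 'dropout': 0.2}`. Raising on a count mismatch makes malformed completions visible instead of silently misassigning values; integer-valued parameters such as the batch size may need an explicit `int(...)` cast.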
## Model Details
- Developed by: Roman Kochnev / ABrain
- Fine-tuned from model: deepseek-ai/deepseek-coder-1.3b-base
- Model type: Causal Language Model (Transformer-based)
- Language(s) (NLP): English (prompts also embed the source code of the model and of the data transformation)
- License: MIT

## Model Sources
- Repository: ABrain/DeepSeek-Coder-1.3b-Base-R