---
library_name: transformers
language:
- en
- fr
- de
- es
- it
- pt
- ru
- zh
- ja
tags:
- mergekit
- merge
- trl
- conversational
- finetune
- general-purpose
license: apache-2.0
base_model:
- Retreatcost/KansenSakura-Erosion-CW-12b
---
# Evertide-RX-12B

A generalist model with some reasoning capabilities and multilingual support.

Supported languages:
- English
- French
- German
- Spanish
- Italian
- Portuguese
- Russian
- Chinese
- Japanese

This model was trained with full fine-tuning (FFT) on top of an unreleased co-writer model merge (it uses the same models as [Retreatcost/KansenSakura-Erosion-RP-12b](https://huggingface.co/Retreatcost/KansenSakura-Erosion-RP-12b); credit to all the original model authors), using an in-progress dataset that I am creating for another project.

Training stats can be found in the "Training metrics" tab.

Reasoning should work out of the box most of the time, though the model occasionally replies without it.
For fully consistent reasoning, you can prefill the model's responses with `<think>\n` (the think tag with no spaces, followed by a line break; the line break is preferred).
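As a rough illustration of the ChatML format with the reasoning prefill, the prompt can be built as a plain string. This is a sketch under assumptions: it uses the standard ChatML markers `<|im_start|>` and `<|im_end|>` and a literal `<think>` tag, and `build_chatml_prompt` is a hypothetical helper. The model's bundled chat template (for example `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` in transformers) may add details such as a BOS token, so when it is available, prefer it and append the prefill to its output.

```python
def build_chatml_prompt(messages, prefill_think=True):
    """Render chat messages as ChatML and open an assistant turn (hypothetical helper)."""
    parts = []
    for msg in messages:
        # Each message is wrapped in the standard ChatML start/end markers.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Open the assistant turn that the model is asked to complete.
    parts.append("<|im_start|>assistant\n")
    if prefill_think:
        # Prefill the think tag plus a line break so the reply starts with reasoning.
        parts.append("<think>\n")
    return "".join(parts)


print(build_chatml_prompt([{"role": "user", "content": "Who are you?"}]))
```

Because the prefill is part of the prompt, the generated text begins inside the reasoning block; the opening `<think>` tag has to be re-attached when displaying or parsing the full reply.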
## Intended use
- General conversation and chatting.
- Co-writing and brainstorming.
- Short roleplay.

## Inference Tips
1. **Temperature**: 0.7 (the 0.6 to 0.8 range should work fine)
2. **Repetition Penalty**: 1.05
3. **TOP_P**: 0.90
4. **TOP_K**: 0 (disabled)
5. **MIN_P**: 0.025
6. **Template Format**: ChatML
7. **Max Output**: 2048 (because of the extra reasoning budget, I suggest giving the model at least 768 tokens, preferably over 1K; it rarely produces answers longer than about 1.35K tokens, so 2K is a safe maximum)
8. **Context Length**: 8K

I haven't really tested or trained the model on long contexts, so it will probably break down earlier than regular models.
You can set a higher context, for example 16K, 24K or 32K, but I can't guarantee how it will behave.
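To show how the recommended sampler values interact, here is a small pure-Python sketch over a toy logits dictionary. It is illustrative only: the function names are hypothetical, the order (repetition penalty, temperature, top-k, min-p, top-p) is one reasonable choice, and real backends such as transformers, vLLM or llama.cpp may order or implement these samplers differently. The repetition penalty follows the CTRL-style rule (divide positive logits, multiply negative ones), which is the rule transformers uses for `repetition_penalty`.

```python
import math
import random


def filter_candidates(logits, prev_tokens, temperature=0.7, repetition_penalty=1.05,
                      top_p=0.90, top_k=0, min_p=0.025):
    """Return {token: probability} for the tokens that survive all samplers."""
    scaled = {}
    for tok, logit in logits.items():
        if tok in prev_tokens:
            # CTRL-style repetition penalty: shrink positive logits, grow negative ones.
            logit = logit / repetition_penalty if logit > 0 else logit * repetition_penalty
        scaled[tok] = logit / temperature
    # Numerically stable softmax, then rank tokens from most to least likely.
    peak = max(scaled.values())
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda item: item[1], reverse=True)
    if top_k > 0:  # top_k = 0 disables top-k filtering
        ranked = ranked[:top_k]
    # Min-p: keep tokens with at least min_p times the top token's probability.
    cutoff = min_p * ranked[0][1]
    ranked = [(tok, p) for tok, p in ranked if p >= cutoff]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, mass = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


def sample(logits, prev_tokens, rng=random, **settings):
    """Draw one token from the filtered, renormalized distribution."""
    candidates = filter_candidates(logits, prev_tokens, **settings)
    tokens = list(candidates)
    return rng.choices(tokens, weights=[candidates[t] for t in tokens], k=1)[0]


logits = {"the": 2.0, "a": 1.0, "zebra": -5.0}
print(filter_candidates(logits, prev_tokens=set()))  # "zebra" does not survive the filters
print(sample(logits, prev_tokens={"the"}, rng=random.Random(0)))
```

On the budget side, assuming 8K means 8,192 tokens, reserving the suggested 2,048 output tokens leaves 8192 - 2048 = 6144 tokens for the system prompt and chat history.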
## Training details
<details>
<summary>Spoiler warning</summary>

I trained 2 variants of the model:
- with unrolled turns (each turn in a separate sample)
- with regular turns (all turns in a single sample)

Unrolled turns teach local attention much better and train faster, but generalize worse to multi-turn conversations (Evertide-LA-12B, Local Attention).
Regular turns generalize much better to multi-turn conversations, but they tend to memorize instead of learning new capabilities (Evertide-GA-12B, Global Attention).
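To make the two data layouts concrete, the sketch below turns one conversation into training samples both ways. It assumes the literal reading of "each turn in a separate sample": an unrolled sample holds only the messages since the previous assistant reply plus that reply, with no earlier history. The function names and message-dict shape are hypothetical, and the card does not describe how loss masking was handled.

```python
def regular_sample(conversation):
    """All turns in a single sample (the GA variant)."""
    return [list(conversation)]


def unrolled_samples(conversation):
    """Each turn in its own sample (the LA variant), under the assumption above.

    A turn is every message since the previous assistant reply, ending with the
    next assistant reply. Trailing messages with no assistant reply are dropped,
    and a leading system message ends up only in the first sample.
    """
    samples, current = [], []
    for msg in conversation:
        current.append(msg)
        if msg["role"] == "assistant":
            samples.append(current)
            current = []
    return samples


conversation = [
    {"role": "user", "content": "u1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"},
    {"role": "assistant", "content": "a2"},
]
print(len(regular_sample(conversation)), len(unrolled_samples(conversation)))
```

Under this reading, unrolled samples are shorter and never require the model to attend back to earlier turns, which is consistent with the trade-off described above.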
I also trained these with a modified RoPE theta: 10K for GA and 10M for LA.
My reasoning is that during merging I "unrotate" these changes in the config, effectively creating a distribution that I haven't trained on.
LA gets compressed and becomes even more specialized in short context, while GA gets stretched to cover longer context.
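The "unrotate" effect can be quantified with the standard RoPE angle formula, where dimension pair `i` of a head of size `d` at position `p` rotates by `p * theta ** (-2i/d)`. The sketch assumes a head size of 128 and a served `rope_theta` of 1,000,000 (the values Mistral Nemo 12B uses); neither is stated in this card. A scale `s` means position `p` at serving time produces the angle that position `s * p` produced during training: below 1 the positions are stretched (GA), above 1 they are compressed (LA).

```python
def effective_position_scale(train_theta, serve_theta, pair_index, head_dim=128):
    """Ratio of the RoPE angle at serving time to the angle at training time.

    For the same position, the ratio is (train_theta / serve_theta) ** (2 * i / head_dim).
    """
    exponent = 2 * pair_index / head_dim
    return (train_theta / serve_theta) ** exponent


SERVE_THETA = 1_000_000  # assumption: Mistral Nemo's default rope_theta, not stated in the card

for name, train_theta in [("GA", 10_000), ("LA", 10_000_000)]:
    scales = [effective_position_scale(train_theta, SERVE_THETA, i) for i in (0, 32, 63)]
    print(name, [round(s, 3) for s in scales])
```

The fastest-rotating pair (`i = 0`) is unaffected, while the slow, long-range pairs shift the most, which is where the stretching or compression of context coverage comes from.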
Then I merged these training runs using passthrough in a 4:1 pattern (four LA layers, then one GA layer), similar to how Gemma 4 models interleave sliding-window attention (SWA) and global attention (GA) layers.

The following YAML configuration was used to produce this model:
```yaml
merge_method: passthrough
slices:
  - sources:
      - model: Evertide-LA-12B
        layer_range: [0, 4]
  - sources:
      - model: Evertide-GA-12B
        layer_range: [4, 5]
  - sources:
      - model: Evertide-LA-12B
        layer_range: [5, 9]
  - sources:
      - model: Evertide-GA-12B
        layer_range: [9, 10]
  - sources:
      - model: Evertide-LA-12B
        layer_range: [10, 14]
  - sources:
      - model: Evertide-GA-12B
        layer_range: [14, 15]
  - sources:
      - model: Evertide-LA-12B
        layer_range: [15, 19]
  - sources:
      - model: Evertide-GA-12B
        layer_range: [19, 20]
  - sources:
      - model: Evertide-LA-12B
        layer_range: [20, 24]
  - sources:
      - model: Evertide-GA-12B
        layer_range: [24, 25]
  - sources:
      - model: Evertide-LA-12B
        layer_range: [25, 29]
  - sources:
      - model: Evertide-GA-12B
        layer_range: [29, 30]
  - sources:
      - model: Evertide-LA-12B
        layer_range: [30, 34]
  - sources:
      - model: Evertide-GA-12B
        layer_range: [34, 35]
  - sources:
      - model: Evertide-LA-12B
        layer_range: [35, 39]
  - sources:
      - model: Evertide-GA-12B
        layer_range: [39, 40]
dtype: bfloat16
```
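The 4:1 layout is regular enough to generate rather than write by hand. This stdlib-only sketch (hypothetical helper names) rebuilds the same 16 slices for a 40-layer model, where each block of five layers takes four layers from Evertide-LA-12B and the fifth from Evertide-GA-12B, and renders them in mergekit's passthrough `slices` format. The 40-layer count is read off the configuration above; `layer_range` is end-exclusive, so `[0, 4]` covers layers 0 to 3.

```python
def passthrough_slices(num_layers=40, period=5,
                       local_model="Evertide-LA-12B", global_model="Evertide-GA-12B"):
    """Return (model, start, end) slices: period-1 local layers, then 1 global layer."""
    if num_layers % period:
        raise ValueError("num_layers must be a multiple of period")
    slices = []
    for start in range(0, num_layers, period):
        split = start + period - 1  # the last layer of each block comes from the global model
        slices.append((local_model, start, split))
        slices.append((global_model, split, start + period))
    return slices


def to_mergekit_yaml(slices, dtype="bfloat16"):
    """Render the slices as a mergekit passthrough config string."""
    lines = ["merge_method: passthrough", "slices:"]
    for model, start, end in slices:
        lines.append("  - sources:")
        lines.append(f"      - model: {model}")
        lines.append(f"        layer_range: [{start}, {end}]")
    lines.append(f"dtype: {dtype}")
    return "\n".join(lines) + "\n"


print(to_mergekit_yaml(passthrough_slices()))
```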
</details>

## FAQ
<details>
<summary>Spoiler warning</summary>

### Is this model better than X model?
Probably not.
### Is it an NSFW model?
Not exactly. With some prompting it is definitely capable of producing such content, but it was never designed to be an ERP model. I would rate it 4/10 in this department, and that's by design.
### Is it an uncensored model?
Same as above: it will absolutely refuse some of your more unhinged prompts. You can try to abliterate it, though.
### Why isn't it NSFW/uncensored by default?
ERP capability wasn't a goal for this model, so I'm happy with its current state.
### RP/ERP model when?
Soon™.
### Did you train with RL?
No, not yet, but that's one of my future plans.
### Is the reasoning performative?
It's hard to tell exactly. It definitely has some elements of that, but it was also trained with specific constraints that force causality between the thinking block and the answer, so I would say it's at least a hybrid. Any further improvement requires RL training.
### How many samples did you train on?
Only 451 samples, but they were all manually crafted and refined using the [score-samples](https://github.com/Retreatcost/score-samples) script.
</details>

## Special Thanks
- **[Team mradermacher](https://huggingface.co/mradermacher)**: for awesome quants in GGUF format
- **[DeathGodlike](https://huggingface.co/DeathGodlike)**: for awesome quants in EXL3 format