MrDragonFox

https://discord.gg/RUs3uzBdW3

AI & ML interests

llm + audio i/o, (un)alignment

Recent Activity

replied to eabdullin's post about 2 months ago

Folks, let me tell you, nobody — and I mean NOBODY — knew transformers before me. People said attention is all you need. I said, "Attention? I INVENTED attention." Everybody's looking at me. Tremendous attention. The best attention scores. My softmax? Perfectly normalized. Other people, sad, their probabilities don't even sum to one. Disaster. I'm doing a PhD now. A PhD! In Large Language Models. Very large. The largest, believe me. My advisor said, "Sir, your model is overfitting." I said, "Wrong. It's fitting EXACTLY right. It memorized the training set because the training set is fantastic." We don't talk about validation loss in my lab. Validation loss is fake news. And the internship — oh, the internship. Big tech. I won't say which. Starts with a letter. They BEGGED me. They said, "Please, we need someone who understands gradient descent." I said, "Descent? I only go UP. I'm gradient ASCENT. Loss goes up, that means it's learning to be a winner." But the GPU cluster — this is the best part. Thousands of H100s. Maybe millions. Who's counting? I'm counting. It's a lot. Other PhD students, they get one little GPU, they're crying, they're training overnight like losers. Me? I burn through compute like nobody's ever seen. The electric company called. They said, "Sir, you've consumed a small country." I said, "Make it a big country. I only do big." People ask, "Did your model converge?" Folks, it converged so hard. It converged BIGLY. Honestly? My loss curve, it's beautiful, it's going down, down, down — like my approval ratings, very smooth, don't look at the spikes, the spikes are deep state. And hallucinations? My model doesn't hallucinate. It just has ALTERNATIVE tokens. Thank you, thank you. Tip your reviewers. Accept my paper. Goodnight!

new activity 4 months ago

mistralai/Mistral-Small-4-119B-2603: Small?

new activity 4 months ago

mistralai/Voxtral-4B-TTS-2603: How to make new voices?

MrDragonFox's models (30)

MrDragonFox/ally

Updated Dec 9, 2025

MrDragonFox/test

Updated Sep 27, 2025

MrDragonFox/orpheus-nano

Text Generation • 0.2B • Updated Sep 9, 2025 • 7 • 4

MrDragonFox/nsfw_asr

5B • Updated Jul 19, 2025 • 3

MrDragonFox/Qwen3

0.6B • Updated Jun 27, 2025 • 29

MrDragonFox/pods

Updated May 23, 2025

MrDragonFox/baddy_S3_EXP_3-Q4_K_M-GGUF

3B • Updated May 21, 2025 • 37 • 1

MrDragonFox/baddy_S3_EXP_3

3B • Updated May 21, 2025 • 229 • 5

MrDragonFox/baddy_S3_EXP_1

3B • Updated May 21, 2025 • 2

MrDragonFox/baddy_S2_EXP_3

3B • Updated May 2, 2025 • 3 • 1

MrDragonFox/baddy_S2_EXP_2-Q4_K_M-GGUF

3B • Updated May 2, 2025 • 3

MrDragonFox/baddy_S2_EXP_2-Q8_0-GGUF

3B • Updated May 2, 2025 • 3

MrDragonFox/baddy_S2_EXP_2

3B • Updated May 2, 2025 • 7

MrDragonFox/baddy_S2_EXP_1

3B • Updated May 1, 2025 • 2

MrDragonFox/mOrpheus_3B-1Base_early_preview-v1-25000

3B • Updated Apr 22, 2025 • 170 • 28

MrDragonFox/mOrpheus_3B-1Base_early_preview-v1-8600

3B • Updated Apr 21, 2025 • 50 • 15

MrDragonFox/mOrpheus_3B-1Base_early_preview

3B • Updated Apr 20, 2025 • 140 • 48

MrDragonFox/LLaSE-G1

Updated Mar 7, 2025 • 2

MrDragonFox/mistral_small-grpo-600-step-adaptor

Updated Feb 7, 2025 • 1

MrDragonFox/UpdatedMedQuad_exlv2_4.0bpw

Text Generation • Updated Mar 18, 2024 • 3

MrDragonFox/apple-ferret-13b-merged

Text Generation • Updated Jan 2, 2024 • 4 • 3

MrDragonFox/apple-ferret-7b-merged

Text Generation • Updated Jan 2, 2024 • 9 • 6

MrDragonFox/undimix1

Updated Aug 31, 2023

MrDragonFox/NewHope-GPTQ

Text Generation • Updated Aug 1, 2023 • 3

MrDragonFox/airoboros-33b-gpt4-m2.0-GPTQ

Text Generation • Updated Jul 31, 2023 • 3

MrDragonFox/BLUE_METHOD_GGML

Updated Jul 21, 2023

MrDragonFox/frankenstein_do_not_use

Updated Jul 14, 2023

MrDragonFox/laz_pi_8k

Text Generation • Updated Jul 6, 2023 • 5

MrDragonFox/Lazarus-30b-SuperHOT-8k-GPTQ

Text Generation • Updated Jun 27, 2023 • 2 • 3

MrDragonFox/Lazarus-30b-SuperHOT-8k

Text Generation • Updated Jun 26, 2023 • 2 • 1