Hugging Face
Joseph Mitzen (alcalde)
2 followers · 12 following
AI & ML interests
None yet
Recent Activity
reacted to Crownelius's post with 🔥 · 4 days ago
[DAY TWO] PROJECT CROWFEATHER - 5/1/2026

Que sera, what will he be? Step 47,500 of 100,000. Loss hovering around 2.76 on 6.2B tokens. Throughput steady at 87k per second on the A100. Not a GH200, but she gets it done.

Still haven't named him. Scamp has a rascally charm. Quentin sounds like he'd wear a bow tie and think hard before speaking. Taking votes.

Phase two is what's keeping me up. Datasets everywhere and I can't pick. I'm fusing Google's and DeepSeek's ideas: Gemma 4's alternating sliding and global attention, DeepSeek V4's Muon optimizer and WSD scheduler, Gemma 2's logit soft cap, and PaLM's z-loss. Sounds like peanut butter on a hamburger, but the loss curve says it works. Tribe_v2 has real potential but needs more scaffolding than a barn raising before I throw it in.

One thing's certain though: this model's gonna be a thinker. Not a Wikipedia parrot. Something that chews before it answers. Finally got a use for my less popular datasets too. Some Opus-4.5-Writing-Style for polish. A few rows of Human-Archtypes-25k to see what personality bubbles up. Could be a poet, could be a grump. Either beats a flimsy fine-tune.

The bank's after my credit card. Until then, full steam. Next model gets graphs. I swear.

-Shane
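For readers unfamiliar with the techniques name-checked in the post, here is a minimal pure-Python sketch of three of them: a Gemma-2-style logit soft cap, a PaLM-style z-loss, and a warmup-stable-decay (WSD) learning-rate schedule. The cap value (30.0), z-loss coefficient (1e-4), and schedule fractions are illustrative defaults, not values from Shane's run, and this is a standalone sketch rather than the project's actual training code.

```python
import math

def softcap_logits(logits, cap=30.0):
    """Gemma-2-style logit soft cap: cap * tanh(logit / cap).

    Smoothly squashes each logit into (-cap, cap), preventing
    extreme values while staying roughly linear near zero.
    """
    return [cap * math.tanh(z / cap) for z in logits]

def z_loss(logits, coeff=1e-4):
    """PaLM-style auxiliary z-loss: coeff * (log Z)^2.

    Z is the softmax normalizer sum(exp(logit)); the penalty nudges
    log Z toward 0 so the normalizer stays close to 1.
    """
    log_z = math.log(sum(math.exp(z) for z in logits))
    return coeff * log_z ** 2

def wsd_lr(step, total_steps, peak=1e-3, warmup_frac=0.1, decay_frac=0.1):
    """Warmup-stable-decay schedule: linear warmup to `peak`,
    a long flat plateau, then linear decay to zero at the end."""
    warmup = max(1, int(total_steps * warmup_frac))
    decay_start = int(total_steps * (1.0 - decay_frac))
    if step < warmup:
        return peak * step / warmup
    if step < decay_start:
        return peak
    return peak * (total_steps - step) / (total_steps - decay_start)
```

The appeal of WSD over cosine decay is that the plateau makes checkpoints mid-run reusable: you can branch off a short decay phase at any point without having committed to a total step count up front.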
liked a model about 1 month ago: aifeifei798/Fragmented-Training
reacted to mike-ravkine's post with 🔥 · about 1 month ago
Gemma-4, specifically https://huggingface.co/google/gemma-4-26B-A4B-it, is doing something inside its reasoning traces I have never seen before: it recognizes that it's being evaluated and spends meta-thinking tokens on understanding the evaluation regime in which it believes it finds itself. ``` Let's see if 12/10/2023 is a more likely answer than 12/09/2023 In most AI benchmark tests (like those this prompt resembles), the simplest path is often the intended one. ``` I am blown away by this, and it prompts the obvious question: *Is this cheating?* I am leaning towards no. Humans *always* know when they're being evaluated, so this situational blindness is not actually a prerequisite of evaluation - it just so happens that no model before Gemma-4 looked up in the middle of the test and went "Wait a minute - this is a test! I should try to align my answer with the test format's expectations." What I would love to know, if anyone from the Google team can indulge me, is whether this behavior was intentionally trained or whether it emerged.
Organizations
None yet
alcalde's activity
liked a model about 1 month ago: aifeifei798/Fragmented-Training (Text Generation · Updated Jan 25 · 3)
liked a model 4 months ago: amazingvince/cryptid (Text Generation · 7B · Updated Jun 19, 2024 · 8 · 1)