Jaward Sesay (Jaward)
358 followers · 24 following
https://github.com/Jaykef
JawardSesay_
Jaykef
AI & ML interests
Building Lectūra Labs | CS Grad Student @BIT | AI/ML Research: Autonomous Agents, LLMs | Building The Cursor for Learning | Role Model Karpathy
Recent Activity
liked a model 3 days ago: CohereLabs/cohere-transcribe-03-2026
posted an update 9 days ago:
Supercool! You can now easily train a JEPA world model (15M params) end-to-end on a single GPU, with planning done in under 1s 🤯.
- trained with a classic prediction loss + SIGReg
- plans purely in raw pixels
- beats SOTA DINO-WM and PLDM
- a single hyperparameter, no heuristics
- fully open sourced!!
Paper/Code/Data: https://le-wm.github.io/
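The core JEPA recipe the post alludes to is: encode the current and next observations into latents, predict the next latent from the current one (plus an action), and train on the distance between predicted and target latents rather than on pixels. A toy numpy sketch of that loop, where all names, sizes, and the linear encoder/predictor are purely illustrative (SIGReg and the actual LeJEPA architecture are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    # toy encoder: raw "pixels" -> latent
    return np.tanh(x @ W)

def predictor(z, a, P):
    # predict the next latent from current latent + action
    return np.tanh(np.concatenate([z, a]) @ P)

# hypothetical sizes: 16-dim observation, 4-dim latent, 2-dim action
W = rng.normal(size=(16, 4)) * 0.1
P = rng.normal(size=(6, 4)) * 0.1

x_t = rng.normal(size=16)       # current observation
x_next = rng.normal(size=16)    # next observation
a_t = rng.normal(size=2)        # action taken

z_t = encoder(x_t, W)
z_next_target = encoder(x_next, W)   # target latent (stop-gradient in practice)
z_next_pred = predictor(z_t, a_t, P)

# JEPA-style prediction loss: match predicted latent to target latent
loss = np.mean((z_next_pred - z_next_target) ** 2)
```

Planning then amounts to searching over actions that drive the predicted latent toward a goal latent; regularizers like SIGReg exist to keep the latent space from collapsing, which a plain prediction loss alone does not prevent.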
posted an update 16 days ago:
The Kimi team dropped a major improvement to the transformer architecture, and it quietly targets one of the most taken-for-granted components: residual connections. For nearly a decade, transformers have relied on residuals that simply add previous layer outputs with equal weight since their introduction. It works, but it's also kind of… dumb. Kimi's new paper, "Attention Residuals (AttnRes)", replaces that with something much more intelligent:
→ instead of blindly summing past layers,
→ it learns which layers matter,
→ and dynamically weights contributions across depth.
So attention is no longer just over tokens… it's now also over layers (depth). This effectively turns depth into a dynamic memory system. Phenomenal!
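Based only on the post's description (not the paper itself), "attention over depth" could be sketched as follows: keep the outputs of all previous layers, score them against the current layer's state, and take a softmax-weighted combination instead of an unweighted sum. Everything below (matrices, shapes, scoring rule) is a hypothetical illustration of that idea, not the paper's actual formulation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d = 8
rng = np.random.default_rng(1)

# stored outputs of layers 0..3 for one token position
layer_outputs = [rng.normal(size=d) for _ in range(4)]

# classic residual stream: every previous layer contributes equally
plain_residual = sum(layer_outputs)

# depth-attention sketch: the current layer's output queries past layers
Wq = rng.normal(size=(d, d)) * 0.1   # learned query projection (hypothetical)
Wk = rng.normal(size=(d, d)) * 0.1   # learned key projection (hypothetical)
q = layer_outputs[-1] @ Wq
scores = np.array([q @ (h @ Wk) / np.sqrt(d) for h in layer_outputs])
weights = softmax(scores)            # data-dependent weights over depth
attn_residual = sum(w * h for w, h in zip(weights, layer_outputs))
```

The contrast with `plain_residual` is the whole point: the weighted sum lets the network amplify or suppress individual layers per token, which is what makes the depth dimension behave like an addressable memory.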
Jaward's Spaces (3)
Professor AI Feynman 🚀 (Running): Generate lecture materials and audio using AI
Optimus 🌍 (Running): Generate speech and translate audio using AI models
Seamless Speech Translator 📚 (Running)