Facu Vlad J

groxaxo

https://github.com/groxaxo

AI & ML interests

Cloud Engineer · AI Automation Engineer · Quantization Gremlin

I build cloud systems, automate the boring parts, and squeeze absurd efficiency out of AI models. Into infra, agents, vLLM, local GPU rigs, and quantizations that make big models run where they probably shouldn’t.

Recent Activity

liked 2 models about 6 hours ago

llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-GGUF

Image-Text-to-Text • Updated 3 days ago • 70.4k • 71

theoracleguy/Chatterbox-Multilingual-MLX-v2-Q8

Text-to-Speech • 0.3B • Updated Jan 19 • 243 • 1

liked 2 models about 22 hours ago

neuphonic/neutts-nano-spanish-q4-gguf

Text-to-Speech • 0.2B • Updated Feb 12 • 587 • 7

neuphonic/neutts-air-q8-gguf

Text-to-Speech • 0.7B • Updated Feb 10 • 10.7k • 42

reacted to RiverRider's post with 🔥 1 day ago

Post

4695

SRT-introspect: Live Token-by-Token Readout of LLM Internal Reasoning

I have released SRT-introspect, a new public demonstration that makes the hidden reasoning process of a frozen large language model visible in real time.

The interface runs a Qwen2.5-7B backbone equipped with the SRT Adapter and Activation Verbalizer. As the model generates each token, the system continuously measures divergence across attention heads, identifies high-signal moments, and translates the corresponding hidden-state object representations into natural-language verbalizations. You see exactly what the model is internally representing at the precise points where its computation is most active, complete with divergence scores, reflexivity estimates, and per-layer traces.
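To make "divergence across attention heads" concrete, here is a minimal sketch of one way such a score could be computed for a single generated token. The post does not say which metric SRT-introspect uses, so the choice of mean pairwise Jensen-Shannon divergence, the function names `js_divergence` and `head_divergence`, and the toy attention rows are all assumptions made for illustration.

```python
import math
from itertools import combinations

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing; b is the mixture, so b > 0 wherever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def head_divergence(heads):
    """Mean pairwise JS divergence across heads (hypothetical stand-in metric).

    `heads` is a list of at least two attention distributions, one per head,
    each summing to 1 over the same context positions.
    """
    pairs = list(combinations(heads, 2))
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)

# Toy data: two heads attending identically vs. two heads attending to disjoint positions.
agree = [[0.7, 0.2, 0.1], [0.7, 0.2, 0.1]]
disagree = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

print(head_divergence(agree))
print(head_divergence(disagree))
```

Under this assumed metric, heads that agree score 0 and heads that attend to entirely different positions score 1, so a per-token trace of the score would rise at the moments where heads disagree most.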

This is not a summary of the final output. It is a direct window into the model’s latent conceptual landscape, showing the dominant training-data attractors that activate even when the prompt asks for first-principles reasoning. The adaptive scheduler concentrates verbalizations precisely where the real internal work occurs, turning what used to be opaque black-box generation into observable, analyzable data.
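The post does not describe how the adaptive scheduler decides when to verbalize. One plausible reading is a threshold that adapts to the running statistics of the divergence score, sketched below. The class name `AdaptiveScheduler`, the parameters `k` and `warmup`, and the score values are hypothetical and not taken from SRT-introspect.

```python
import math

class AdaptiveScheduler:
    """Flags a step when its score exceeds running mean + k * running std.

    Illustrative assumption only: running statistics are tracked with
    Welford's online algorithm, and no step is flagged during warmup.
    """

    def __init__(self, k=2.0, warmup=3):
        self.k = k
        self.warmup = warmup
        self.n = 0        # number of scores seen so far
        self.mean = 0.0   # running mean of scores
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def step(self, score):
        fire = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / self.n)
            fire = score > self.mean + self.k * std
        # Update the running statistics after the decision, so the current
        # score is compared only against earlier steps.
        self.n += 1
        delta = score - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (score - self.mean)
        return fire

# Made-up per-token divergence scores with one spike at index 4.
scores = [0.10, 0.12, 0.11, 0.09, 0.80, 0.10]
sched = AdaptiveScheduler()
flags = [sched.step(s) for s in scores]
print(flags)
```

With these toy scores only the spike is flagged. Because the threshold tracks the recent distribution rather than a fixed cutoff, a scheme like this would spend verbalizations on steps that stand out relative to the rest of the generation.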

The result is the clearest public demonstration yet that modern LLMs possess a rich, structured semiotic infrastructure that can now be audited without retraining or fine-tuning.

Try it:
RiverRider/srt-introspect