Activity Feed

Recent Activity

zhiqiulin 
posted an update 1 day ago
🚀 VQAScore now supports text-to-video evaluation!

VQAScore scores how well a generated image or video matches a prompt by asking a VLM "does this show {prompt}?" and using P(Yes). It became a go-to evaluation metric and reward model for image generation (2M+ downloads), and we just added text-to-video support across 20+ VLMs (GPT, Gemini, Qwen). Free and open-source, and it keeps improving as VLMs improve.

💻 Code: https://github.com/linzhiqiu/t2v_metrics
📄 Paper: https://arxiv.org/abs/2404.01291
🧵 Launch thread + demo video: https://x.com/ZhiqiuLin/status/2064316582461841499
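
To make the P(Yes) idea above concrete, here is a minimal sketch, not the t2v_metrics implementation. It assumes you already have raw logits from some VLM for the candidate answer tokens. The names `build_question` and `p_yes` are hypothetical, and normalizing over only the listed answers (rather than the model's full vocabulary) is a simplifying assumption.

```python
import math
from typing import Mapping


def build_question(prompt: str) -> str:
    # Question template paraphrased from the post; the exact wording used
    # by the library may differ (assumption).
    return f"Does this show {prompt}?"


def p_yes(answer_logits: Mapping[str, float]) -> float:
    """Softmax over the candidate answer logits; return the mass on "Yes".

    answer_logits maps answer tokens (e.g. "Yes", "No") to raw logits that a
    VLM produced for the first answer token. Hypothetical input format.
    """
    top = max(answer_logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - top) for tok, v in answer_logits.items()}
    return exps["Yes"] / sum(exps.values())


# Illustrative logits only, not real model output.
question = build_question("a dog chasing a red ball")
score = p_yes({"Yes": 2.0, "No": 0.0})
```

A higher score means the model is more confident that the image or video matches the prompt, which is what lets the same number serve as an evaluation metric or a reward signal.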
hanzla 
posted an update about 1 month ago
Reinforcement learning can sometimes produce emergent behavior with much simpler training setups than large-scale pre-training.

I explored this idea by running a small GRPO experiment on Qwen3.5 4B, and the results were pretty exciting.

Hypothesis: improving visual mathematical reasoning may also improve the model’s ability to transcribe LaTeX from images.

I wrote a short breakdown of the experiment here:
https://hanzlajavaid.github.io/blog/grpo-experiment-exploring-emergent-properties/
TitleOS 
posted an update about 2 months ago
I taught an old dog, or in this case model, new tricks. Meet Galactic Reasoning 1.3B: https://huggingface.co/collections/TitleOS/galactic-reasoning-galactica-with-chain-of-thought. I finetuned Meta's (at the time Facebook) Galactica model against glaiveai/reasoning-v1-20m. After training for 1000 steps on my poor overworked Tesla P40 for 48 hours, I was able to produce merged FP16, LoRA, and Q8-quantized weights. Check out the readme.md for an example CoT.
Nymbo 
posted an update 3 months ago
We should really have a release date range slider on the /models page. Tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem considering the amount of noise to sift through.

Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.