Man Cub
mancub
AI & ML interests
None yet
Recent Activity
new activity about 20 hours ago
z-lab/Qwen3.6-27B-DFlash: RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
new activity 4 days ago
AesSedai/Qwen3.6-35B-A3B-GGUF: Q6_K?
Organizations
None yet
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
3
#10 opened 15 days ago
by
mancub
final v16 does not appear to work correctly, it stops after the first prompt.
🔥 2
2
#19 opened 8 days ago
by
mancub
v13 stops dead after the first response
👍 2
5
#14 opened 11 days ago
by
mancub
Crashes with newest vllm version (v0.20.1)
15
#1 opened 18 days ago
by
Neiko2002
v11/v12 performance considerations with Claude Code?
3
#11 opened 12 days ago
by
mancub
When using Claude Code, tool calls end up broken with this chat template in Qwen3.6-27B
7
#6 opened 14 days ago
by
mancub
Good quant!
12
#1 opened 20 days ago
by
qenme
INT8 version for TP=2 / dual Ampere GPUs?
🚀 1
#6 opened 16 days ago
by
mancub
Does not appear to work with the new google drafter MTP model
#2 opened 17 days ago
by
mancub
Is it supposed to work in vllm?
1
#2 opened 17 days ago
by
mancub
Avg Draft acceptance rate is low.
17
#2 opened 28 days ago
by
fouvy
OOM and context limits reached too soon
1
#5 opened about 1 month ago
by
mancub
Unable to run on 3090
1
#1 opened 30 days ago
by
mancub
How to split this model between 2 (3) GPUs and CPU/RAM ?
30
#12 opened 2 months ago
by
mancub
My personal vLLM launch cmd on my old personal 2x3090 workstation
7
#1 opened 3 months ago
by
tclf90
What was just updated and why?
👍 1
2
#1 opened about 2 months ago
by
mancub
How to use it with llama-server ?
👀 1
3
#1 opened about 2 months ago
by
mancub
Poor performance and pretty lobotomized
2
#1 opened 2 months ago
by
mancub
Love the license, confused by some of the decisions.
🤝👍 16
15
#15 opened 2 months ago
by
CyborgPaloma