cn0303 committed · Commit ee8ca43 · verified · 1 Parent(s): 1bbff15

Speed predictions with receipts: bandwidth roofline, real-runs chart, honest provenance

README.md CHANGED
@@ -43,6 +43,17 @@ chatbots to object detection, image generation, speech, and robotics.
    file size, a vendor-published number, community-reported, or estimated.
  - **Licenses up front.** AGPL, non-commercial, and gated models are labelled
    on every card — before you build your project on one.
+ - **Speed estimates with receipts, not vibes.** For LLMs, FitCheck predicts
+   decode tokens/sec from your memory bandwidth (decode is bandwidth-bound) and
+   shows where your machine lands among **real community benchmark runs**
+   ([LocalScore](https://www.localscore.ai)) on an interactive roofline chart.
+   A learned predictor — following IBM's
+   [LLM-Pilot methodology](https://arxiv.org/abs/2410.02425) (gradient boosting
+   over hardware features, validated leave-one-accelerator-out) — replaces the
+   analytical estimate **only if it beats it on hardware it never saw**;
+   otherwise the labelled baseline ships. Vision and diffusion models are
+   compute-bound, not bandwidth-bound, so they honestly keep memory verdicts
+   only rather than fake speed numbers.
  - **Conservative by design.** Three plain bands (Runs great / Tight, but works
    / Won't fit) that would rather under-promise than over-promise.
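The "only if it beats it on hardware it never saw" rule in the README bullet above can be sketched as a leave-one-accelerator-out comparison. This is a minimal illustration, not the project's training script (which is not shown in this diff): the row schema (`accel`, `bw`, `bytes_gb`, `tps`), the MAPE metric, and the toy one-parameter `fit_efficiency` learner standing in for gradient boosting are all assumptions. Only the method labels `"measured-model"` and `"roofline"` come from the commit.

```python
from collections import defaultdict
from statistics import mean


def mape(pairs):
    """Mean absolute percentage error over (predicted, measured) pairs."""
    return mean(abs(pred - actual) / actual for pred, actual in pairs)


def leave_one_accelerator_out(rows, fit, baseline):
    """Score a learned predictor and a baseline on accelerators unseen in training.

    rows: dicts with hypothetical keys "accel", "bw", "bytes_gb", "tps".
    fit: function(train_rows) -> predict(row) callable.
    baseline: function(row) -> predicted tok/s.
    """
    by_accel = defaultdict(list)
    for row in rows:
        by_accel[row["accel"]].append(row)

    learned, base = [], []
    for held_out, test_rows in by_accel.items():
        # Train only on rows from the other accelerators.
        train = [r for r in rows if r["accel"] != held_out]
        predict = fit(train)
        learned += [(predict(r), r["tps"]) for r in test_rows]
        base += [(baseline(r), r["tps"]) for r in test_rows]
    return mape(learned), mape(base)


def choose_method(rows, fit, baseline):
    """Ship the learned model only if it beats the baseline on held-out hardware."""
    learned_err, base_err = leave_one_accelerator_out(rows, fit, baseline)
    return "measured-model" if learned_err < base_err else "roofline"


def roofline_baseline(row, efficiency=0.60):
    """Analytical estimate: bandwidth / bytes per token, times an efficiency factor."""
    return efficiency * row["bw"] / row["bytes_gb"]


def fit_efficiency(train_rows):
    """Toy learner (assumption): fit a single efficiency scalar from the data."""
    eff = mean(r["tps"] * r["bytes_gb"] / r["bw"] for r in train_rows)
    return lambda row: eff * row["bw"] / row["bytes_gb"]


# Synthetic rows whose true efficiency is 0.5 (made-up data, illustration only).
ROWS = [
    {"accel": "A", "bw": 1000.0, "bytes_gb": 5.0, "tps": 100.0},
    {"accel": "A", "bw": 1000.0, "bytes_gb": 10.0, "tps": 50.0},
    {"accel": "B", "bw": 500.0, "bytes_gb": 5.0, "tps": 50.0},
    {"accel": "C", "bw": 250.0, "bytes_gb": 5.0, "tps": 25.0},
]

if __name__ == "__main__":
    print(choose_method(ROWS, fit_efficiency, roofline_baseline))
```

Grouping by accelerator rather than splitting rows at random is the point of the design: a random split would let the learner see the test hardware during training, which is exactly what the README's rule is meant to prevent.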
 
data/gpu_specs.json ADDED
@@ -0,0 +1,1198 @@
1
+ {
2
+ "generated_at": "2026-06-10T12:11:14+00:00",
3
+ "source": {
4
+ "repo": "https://github.com/midudev/canirun.ai",
5
+ "license": "MIT",
6
+ "note": "Spec tables hand-compiled by the canirun.ai project from official vendor product pages. bw = memory bandwidth in GB/s."
7
+ },
8
+ "gpus": {
9
+ "RTX 5090": {
10
+ "vram": 32.0,
11
+ "bw": 1792.0,
12
+ "cores": 21760.0
13
+ },
14
+ "RTX 5080": {
15
+ "vram": 16.0,
16
+ "bw": 960.0,
17
+ "cores": 10752.0
18
+ },
19
+ "RTX 5070 Ti": {
20
+ "vram": 16.0,
21
+ "bw": 896.0,
22
+ "cores": 8960.0
23
+ },
24
+ "RTX 5070": {
25
+ "vram": 12.0,
26
+ "bw": 672.0,
27
+ "cores": 6144.0
28
+ },
29
+ "RTX 5060 Ti 16GB": {
30
+ "vram": 16.0,
31
+ "bw": 448.0,
32
+ "cores": 4608.0
33
+ },
34
+ "RTX 5060 Ti": {
35
+ "vram": 8.0,
36
+ "bw": 448.0,
37
+ "cores": 4608.0
38
+ },
39
+ "RTX 5060": {
40
+ "vram": 8.0,
41
+ "bw": 448.0,
42
+ "cores": 3840.0
43
+ },
44
+ "RTX 5050": {
45
+ "vram": 8.0,
46
+ "bw": 320.0,
47
+ "cores": 2560.0
48
+ },
49
+ "RTX 4090": {
50
+ "vram": 24.0,
51
+ "bw": 1008.0,
52
+ "cores": 16384.0
53
+ },
54
+ "RTX 4080 SUPER": {
55
+ "vram": 16.0,
56
+ "bw": 736.0,
57
+ "cores": 10240.0
58
+ },
59
+ "RTX 4080": {
60
+ "vram": 16.0,
61
+ "bw": 717.0,
62
+ "cores": 9728.0
63
+ },
64
+ "RTX 4070 Ti SUPER": {
65
+ "vram": 16.0,
66
+ "bw": 672.0,
67
+ "cores": 8448.0
68
+ },
69
+ "RTX 4070 Ti": {
70
+ "vram": 12.0,
71
+ "bw": 504.0,
72
+ "cores": 7680.0
73
+ },
74
+ "RTX 4070 SUPER": {
75
+ "vram": 12.0,
76
+ "bw": 504.0,
77
+ "cores": 7168.0
78
+ },
79
+ "RTX 4070": {
80
+ "vram": 12.0,
81
+ "bw": 504.0,
82
+ "cores": 5888.0
83
+ },
84
+ "RTX 4060 Ti 16GB": {
85
+ "vram": 16.0,
86
+ "bw": 288.0,
87
+ "cores": 4352.0
88
+ },
89
+ "RTX 4060 Ti": {
90
+ "vram": 8.0,
91
+ "bw": 288.0,
92
+ "cores": 4352.0
93
+ },
94
+ "RTX 4060": {
95
+ "vram": 8.0,
96
+ "bw": 272.0,
97
+ "cores": 3072.0
98
+ },
99
+ "RTX 3090 Ti": {
100
+ "vram": 24.0,
101
+ "bw": 1008.0,
102
+ "cores": 10752.0
103
+ },
104
+ "RTX 3090": {
105
+ "vram": 24.0,
106
+ "bw": 936.0,
107
+ "cores": 10496.0
108
+ },
109
+ "RTX 3080 Ti": {
110
+ "vram": 12.0,
111
+ "bw": 912.0,
112
+ "cores": 10240.0
113
+ },
114
+ "RTX 3080 12GB": {
115
+ "vram": 12.0,
116
+ "bw": 912.0,
117
+ "cores": 8960.0
118
+ },
119
+ "RTX 3080": {
120
+ "vram": 10.0,
121
+ "bw": 760.0,
122
+ "cores": 8704.0
123
+ },
124
+ "RTX 3070 Ti": {
125
+ "vram": 8.0,
126
+ "bw": 608.0,
127
+ "cores": 6144.0
128
+ },
129
+ "RTX 3070": {
130
+ "vram": 8.0,
131
+ "bw": 448.0,
132
+ "cores": 5888.0
133
+ },
134
+ "RTX 3060 Ti": {
135
+ "vram": 8.0,
136
+ "bw": 448.0,
137
+ "cores": 4864.0
138
+ },
139
+ "RTX 3060": {
140
+ "vram": 12.0,
141
+ "bw": 360.0,
142
+ "cores": 3584.0
143
+ },
144
+ "RTX 3050": {
145
+ "vram": 8.0,
146
+ "bw": 224.0,
147
+ "cores": 2560.0
148
+ },
149
+ "RTX 5090 Laptop": {
150
+ "vram": 24.0,
151
+ "bw": 896.0,
152
+ "cores": 10496.0
153
+ },
154
+ "RTX 5080 Laptop": {
155
+ "vram": 16.0,
156
+ "bw": 896.0,
157
+ "cores": 7680.0
158
+ },
159
+ "RTX 5070 Ti Laptop": {
160
+ "vram": 12.0,
161
+ "bw": 672.0,
162
+ "cores": 5888.0
163
+ },
164
+ "RTX 5070 Laptop": {
165
+ "vram": 8.0,
166
+ "bw": 384.0,
167
+ "cores": 4608.0
168
+ },
169
+ "RTX 5060 Laptop": {
170
+ "vram": 8.0,
171
+ "bw": 384.0,
172
+ "cores": 3328.0
173
+ },
174
+ "RTX 5050 Laptop": {
175
+ "vram": 8.0,
176
+ "bw": 384.0,
177
+ "cores": 2560.0
178
+ },
179
+ "RTX 4090 Laptop": {
180
+ "vram": 16.0,
181
+ "bw": 576.0,
182
+ "cores": 9728.0
183
+ },
184
+ "RTX 4080 Laptop": {
185
+ "vram": 12.0,
186
+ "bw": 432.0,
187
+ "cores": 7424.0
188
+ },
189
+ "RTX 4070 Laptop": {
190
+ "vram": 8.0,
191
+ "bw": 256.0,
192
+ "cores": 4608.0
193
+ },
194
+ "RTX 4060 Laptop": {
195
+ "vram": 8.0,
196
+ "bw": 256.0,
197
+ "cores": 3072.0
198
+ },
199
+ "RTX 4050 Laptop": {
200
+ "vram": 6.0,
201
+ "bw": 192.0,
202
+ "cores": 2560.0
203
+ },
204
+ "RTX 3080 Ti Laptop": {
205
+ "vram": 16.0,
206
+ "bw": 512.0,
207
+ "cores": 7424.0
208
+ },
209
+ "RTX 3080 Laptop": {
210
+ "vram": 16.0,
211
+ "bw": 448.0,
212
+ "cores": 6144.0
213
+ },
214
+ "RTX 3070 Ti Laptop": {
215
+ "vram": 8.0,
216
+ "bw": 448.0,
217
+ "cores": 5888.0
218
+ },
219
+ "RTX 3070 Laptop": {
220
+ "vram": 8.0,
221
+ "bw": 448.0,
222
+ "cores": 5120.0
223
+ },
224
+ "RTX 3060 Laptop": {
225
+ "vram": 6.0,
226
+ "bw": 336.0,
227
+ "cores": 3840.0
228
+ },
229
+ "RTX 3050 Ti Laptop": {
230
+ "vram": 4.0,
231
+ "bw": 192.0,
232
+ "cores": 2560.0
233
+ },
234
+ "RTX 3050 Laptop": {
235
+ "vram": 4.0,
236
+ "bw": 192.0,
237
+ "cores": 2048.0
238
+ },
239
+ "RTX PRO 6000": {
240
+ "vram": 96.0,
241
+ "bw": 1792.0,
242
+ "cores": 24064.0
243
+ },
244
+ "RTX 6000 Ada": {
245
+ "vram": 48.0,
246
+ "bw": 960.0,
247
+ "cores": 18176.0
248
+ },
249
+ "RTX 5880 Ada": {
250
+ "vram": 48.0,
251
+ "bw": 960.0,
252
+ "cores": 14080.0
253
+ },
254
+ "RTX 5000 Ada": {
255
+ "vram": 32.0,
256
+ "bw": 800.0,
257
+ "cores": 12800.0
258
+ },
259
+ "RTX 4500 Ada": {
260
+ "vram": 24.0,
261
+ "bw": 432.0,
262
+ "cores": 7680.0
263
+ },
264
+ "RTX 4000 SFF Ada": {
265
+ "vram": 20.0,
266
+ "bw": 320.0,
267
+ "cores": 6144.0
268
+ },
269
+ "RTX 4000 Ada": {
270
+ "vram": 20.0,
271
+ "bw": 360.0,
272
+ "cores": 6144.0
273
+ },
274
+ "RTX 3500 Ada": {
275
+ "vram": 12.0,
276
+ "bw": 432.0,
277
+ "cores": 5120.0
278
+ },
279
+ "RTX 2000 Ada": {
280
+ "vram": 16.0,
281
+ "bw": 224.0,
282
+ "cores": 2816.0
283
+ },
284
+ "RTX A6000": {
285
+ "vram": 48.0,
286
+ "bw": 768.0,
287
+ "cores": 10752.0
288
+ },
289
+ "RTX A5500": {
290
+ "vram": 24.0,
291
+ "bw": 768.0,
292
+ "cores": 10240.0
293
+ },
294
+ "RTX A5000": {
295
+ "vram": 24.0,
296
+ "bw": 768.0,
297
+ "cores": 8192.0
298
+ },
299
+ "RTX A4500": {
300
+ "vram": 20.0,
301
+ "bw": 640.0,
302
+ "cores": 7168.0
303
+ },
304
+ "RTX A4000": {
305
+ "vram": 16.0,
306
+ "bw": 448.0,
307
+ "cores": 6144.0
308
+ },
309
+ "RTX A2000": {
310
+ "vram": 6.0,
311
+ "bw": 288.0,
312
+ "cores": 3328.0
313
+ },
314
+ "RTX 2080 Ti": {
315
+ "vram": 11.0,
316
+ "bw": 616.0,
317
+ "cores": 4352.0
318
+ },
319
+ "RTX 2080 SUPER": {
320
+ "vram": 8.0,
321
+ "bw": 496.0,
322
+ "cores": 3072.0
323
+ },
324
+ "RTX 2080": {
325
+ "vram": 8.0,
326
+ "bw": 448.0,
327
+ "cores": 2944.0
328
+ },
329
+ "RTX 2070 SUPER": {
330
+ "vram": 8.0,
331
+ "bw": 448.0,
332
+ "cores": 2560.0
333
+ },
334
+ "RTX 2070": {
335
+ "vram": 8.0,
336
+ "bw": 448.0,
337
+ "cores": 2304.0
338
+ },
339
+ "RTX 2060 SUPER": {
340
+ "vram": 8.0,
341
+ "bw": 448.0,
342
+ "cores": 2176.0
343
+ },
344
+ "RTX 2060": {
345
+ "vram": 6.0,
346
+ "bw": 336.0,
347
+ "cores": 1920.0
348
+ },
349
+ "RTX 2060 12GB": {
350
+ "vram": 12.0,
351
+ "bw": 336.0,
352
+ "cores": 2176.0
353
+ },
354
+ "RTX 3050 6GB": {
355
+ "vram": 6.0,
356
+ "bw": 168.0,
357
+ "cores": 2304.0
358
+ },
359
+ "A100": {
360
+ "vram": 80.0,
361
+ "bw": 2039.0,
362
+ "cores": 6912.0
363
+ },
364
+ "H100": {
365
+ "vram": 80.0,
366
+ "bw": 3350.0,
367
+ "cores": 14592.0
368
+ },
369
+ "GH200": {
370
+ "vram": 96.0,
371
+ "bw": 4000.0,
372
+ "cores": 16896.0
373
+ },
374
+ "DGX Spark": {
375
+ "vram": 128.0,
376
+ "bw": 273.0,
377
+ "cores": 6144.0
378
+ },
379
+ "L40S": {
380
+ "vram": 48.0,
381
+ "bw": 864.0,
382
+ "cores": 18176.0
383
+ },
384
+ "L4": {
385
+ "vram": 24.0,
386
+ "bw": 300.0,
387
+ "cores": 7424.0
388
+ },
389
+ "T4": {
390
+ "vram": 16.0,
391
+ "bw": 300.0,
392
+ "cores": 2560.0
393
+ },
394
+ "Tesla P40": {
395
+ "vram": 24.0,
396
+ "bw": 346.0,
397
+ "cores": 3840.0
398
+ },
399
+ "RX 7900 XTX": {
400
+ "vram": 24.0,
401
+ "bw": 960.0,
402
+ "cores": 6144.0
403
+ },
404
+ "RX 7900 XT": {
405
+ "vram": 20.0,
406
+ "bw": 800.0,
407
+ "cores": 5376.0
408
+ },
409
+ "RX 7800 XT": {
410
+ "vram": 16.0,
411
+ "bw": 624.0,
412
+ "cores": 3840.0
413
+ },
414
+ "RX 7700 XT": {
415
+ "vram": 12.0,
416
+ "bw": 432.0,
417
+ "cores": 3456.0
418
+ },
419
+ "RX 7600 XT": {
420
+ "vram": 16.0,
421
+ "bw": 288.0,
422
+ "cores": 2048.0
423
+ },
424
+ "RX 7600": {
425
+ "vram": 8.0,
426
+ "bw": 288.0,
427
+ "cores": 2048.0
428
+ },
429
+ "RX 6900 XT": {
430
+ "vram": 16.0,
431
+ "bw": 512.0,
432
+ "cores": 5120.0
433
+ },
434
+ "RX 6800 XT": {
435
+ "vram": 16.0,
436
+ "bw": 512.0,
437
+ "cores": 4608.0
438
+ },
439
+ "RX 6800": {
440
+ "vram": 16.0,
441
+ "bw": 512.0,
442
+ "cores": 3840.0
443
+ },
444
+ "RX 6750 XT": {
445
+ "vram": 12.0,
446
+ "bw": 432.0,
447
+ "cores": 2560.0
448
+ },
449
+ "RX 6700 XT": {
450
+ "vram": 12.0,
451
+ "bw": 384.0,
452
+ "cores": 2560.0
453
+ },
454
+ "RX 6650 XT": {
455
+ "vram": 8.0,
456
+ "bw": 280.0,
457
+ "cores": 2048.0
458
+ },
459
+ "RX 6600 XT": {
460
+ "vram": 8.0,
461
+ "bw": 256.0,
462
+ "cores": 2048.0
463
+ },
464
+ "RX 6600": {
465
+ "vram": 8.0,
466
+ "bw": 224.0,
467
+ "cores": 1792.0
468
+ },
469
+ "RX 6500 XT": {
470
+ "vram": 4.0,
471
+ "bw": 144.0,
472
+ "cores": 1024.0
473
+ },
474
+ "Arc A770": {
475
+ "vram": 16.0,
476
+ "bw": 560.0,
477
+ "cores": 4096.0
478
+ },
479
+ "Arc A750": {
480
+ "vram": 8.0,
481
+ "bw": 512.0,
482
+ "cores": 3584.0
483
+ },
484
+ "Arc A580": {
485
+ "vram": 8.0,
486
+ "bw": 512.0,
487
+ "cores": 3072.0
488
+ },
489
+ "Arc A380": {
490
+ "vram": 6.0,
491
+ "bw": 186.0,
492
+ "cores": 1024.0
493
+ },
494
+ "GTX 1660 Ti": {
495
+ "vram": 6.0,
496
+ "bw": 288.0,
497
+ "cores": 1536.0
498
+ },
499
+ "GTX 1660 SUPER": {
500
+ "vram": 6.0,
501
+ "bw": 336.0,
502
+ "cores": 1408.0
503
+ },
504
+ "GTX 1660": {
505
+ "vram": 6.0,
506
+ "bw": 192.0,
507
+ "cores": 1408.0
508
+ },
509
+ "GTX 1650 SUPER": {
510
+ "vram": 4.0,
511
+ "bw": 192.0,
512
+ "cores": 1280.0
513
+ },
514
+ "GTX 1650 Ti": {
515
+ "vram": 4.0,
516
+ "bw": 192.0,
517
+ "cores": 1024.0
518
+ },
519
+ "GTX 1650": {
520
+ "vram": 4.0,
521
+ "bw": 128.0,
522
+ "cores": 896.0
523
+ },
524
+ "GTX 1630": {
525
+ "vram": 4.0,
526
+ "bw": 96.0,
527
+ "cores": 512.0
528
+ },
529
+ "GTX 1080 Ti": {
530
+ "vram": 11.0,
531
+ "bw": 484.0,
532
+ "cores": 3584.0
533
+ },
534
+ "GTX 1080": {
535
+ "vram": 8.0,
536
+ "bw": 320.0,
537
+ "cores": 2560.0
538
+ },
539
+ "GTX 1070 Ti": {
540
+ "vram": 8.0,
541
+ "bw": 256.0,
542
+ "cores": 2432.0
543
+ },
544
+ "GTX 1070": {
545
+ "vram": 8.0,
546
+ "bw": 256.0,
547
+ "cores": 1920.0
548
+ },
549
+ "GTX 1060 6GB": {
550
+ "vram": 6.0,
551
+ "bw": 192.0,
552
+ "cores": 1280.0
553
+ },
554
+ "GTX 1060 3GB": {
555
+ "vram": 3.0,
556
+ "bw": 192.0,
557
+ "cores": 1152.0
558
+ },
559
+ "GTX 1060": {
560
+ "vram": 6.0,
561
+ "bw": 192.0,
562
+ "cores": 1280.0
563
+ },
564
+ "GTX 1050 Ti": {
565
+ "vram": 4.0,
566
+ "bw": 112.0,
567
+ "cores": 768.0
568
+ },
569
+ "GTX 1050": {
570
+ "vram": 2.0,
571
+ "bw": 112.0,
572
+ "cores": 640.0
573
+ },
574
+ "GTX 980 Ti": {
575
+ "vram": 6.0,
576
+ "bw": 336.0,
577
+ "cores": 2816.0
578
+ },
579
+ "GTX 980": {
580
+ "vram": 4.0,
581
+ "bw": 224.0,
582
+ "cores": 2048.0
583
+ },
584
+ "GTX 970": {
585
+ "vram": 4.0,
586
+ "bw": 224.0,
587
+ "cores": 1664.0
588
+ },
589
+ "GTX 960": {
590
+ "vram": 2.0,
591
+ "bw": 112.0,
592
+ "cores": 1024.0
593
+ },
594
+ "GTX 950": {
595
+ "vram": 2.0,
596
+ "bw": 105.0,
597
+ "cores": 768.0
598
+ },
599
+ "Quadro RTX 8000": {
600
+ "vram": 48.0,
601
+ "bw": 672.0,
602
+ "cores": 4608.0
603
+ },
604
+ "Quadro RTX 6000": {
605
+ "vram": 24.0,
606
+ "bw": 672.0,
607
+ "cores": 4608.0
608
+ },
609
+ "Quadro RTX 5000": {
610
+ "vram": 16.0,
611
+ "bw": 448.0,
612
+ "cores": 3072.0
613
+ },
614
+ "Quadro RTX 4000": {
615
+ "vram": 8.0,
616
+ "bw": 416.0,
617
+ "cores": 2304.0
618
+ },
619
+ "Quadro RTX 3000": {
620
+ "vram": 6.0,
621
+ "bw": 336.0,
622
+ "cores": 1920.0
623
+ },
624
+ "Quadro T2000": {
625
+ "vram": 4.0,
626
+ "bw": 128.0,
627
+ "cores": 1024.0
628
+ },
629
+ "Quadro T1000": {
630
+ "vram": 4.0,
631
+ "bw": 128.0,
632
+ "cores": 896.0
633
+ },
634
+ "T1200": {
635
+ "vram": 4.0,
636
+ "bw": 192.0,
637
+ "cores": 1024.0
638
+ },
639
+ "NVIDIA T600": {
640
+ "vram": 4.0,
641
+ "bw": 192.0,
642
+ "cores": 896.0
643
+ },
644
+ "NVIDIA T550": {
645
+ "vram": 4.0,
646
+ "bw": 112.0,
647
+ "cores": 1024.0
648
+ },
649
+ "NVIDIA T500": {
650
+ "vram": 4.0,
651
+ "bw": 80.0,
652
+ "cores": 896.0
653
+ },
654
+ "Quadro P5200": {
655
+ "vram": 16.0,
656
+ "bw": 230.0,
657
+ "cores": 2560.0
658
+ },
659
+ "Quadro P5000": {
660
+ "vram": 16.0,
661
+ "bw": 288.0,
662
+ "cores": 2560.0
663
+ },
664
+ "Quadro P4200": {
665
+ "vram": 8.0,
666
+ "bw": 224.0,
667
+ "cores": 1792.0
668
+ },
669
+ "Quadro P4000": {
670
+ "vram": 8.0,
671
+ "bw": 192.0,
672
+ "cores": 1792.0
673
+ },
674
+ "Quadro P3000": {
675
+ "vram": 6.0,
676
+ "bw": 168.0,
677
+ "cores": 1280.0
678
+ },
679
+ "Quadro P3200": {
680
+ "vram": 6.0,
681
+ "bw": 192.0,
682
+ "cores": 1792.0
683
+ },
684
+ "Quadro P2000": {
685
+ "vram": 5.0,
686
+ "bw": 140.0,
687
+ "cores": 1024.0
688
+ },
689
+ "Quadro P1000": {
690
+ "vram": 4.0,
691
+ "bw": 82.0,
692
+ "cores": 640.0
693
+ },
694
+ "Quadro P620": {
695
+ "vram": 4.0,
696
+ "bw": 96.0,
697
+ "cores": 512.0
698
+ },
699
+ "Quadro P600": {
700
+ "vram": 2.0,
701
+ "bw": 64.0,
702
+ "cores": 384.0
703
+ },
704
+ "Quadro P520": {
705
+ "vram": 2.0,
706
+ "bw": 48.0,
707
+ "cores": 384.0
708
+ },
709
+ "Quadro P500": {
710
+ "vram": 2.0,
711
+ "bw": 64.0,
712
+ "cores": 256.0
713
+ },
714
+ "Quadro M5500": {
715
+ "vram": 8.0,
716
+ "bw": 211.0,
717
+ "cores": 2048.0
718
+ },
719
+ "Quadro M5000M": {
720
+ "vram": 8.0,
721
+ "bw": 160.0,
722
+ "cores": 1536.0
723
+ },
724
+ "Quadro M4000M": {
725
+ "vram": 4.0,
726
+ "bw": 160.0,
727
+ "cores": 1024.0
728
+ },
729
+ "Quadro M3000M": {
730
+ "vram": 4.0,
731
+ "bw": 160.0,
732
+ "cores": 1024.0
733
+ },
734
+ "Quadro M2200": {
735
+ "vram": 4.0,
736
+ "bw": 140.0,
737
+ "cores": 1024.0
738
+ },
739
+ "Quadro M2000M": {
740
+ "vram": 4.0,
741
+ "bw": 80.0,
742
+ "cores": 640.0
743
+ },
744
+ "Quadro M1200": {
745
+ "vram": 4.0,
746
+ "bw": 128.0,
747
+ "cores": 640.0
748
+ },
749
+ "Quadro M1000M": {
750
+ "vram": 2.0,
751
+ "bw": 80.0,
752
+ "cores": 512.0
753
+ },
754
+ "Quadro M620": {
755
+ "vram": 2.0,
756
+ "bw": 80.0,
757
+ "cores": 512.0
758
+ },
759
+ "Quadro M600M": {
760
+ "vram": 2.0,
761
+ "bw": 64.0,
762
+ "cores": 384.0
763
+ },
764
+ "Quadro M520": {
765
+ "vram": 1.0,
766
+ "bw": 40.0,
767
+ "cores": 384.0
768
+ },
769
+ "Quadro M500M": {
770
+ "vram": 2.0,
771
+ "bw": 16.0,
772
+ "cores": 384.0
773
+ },
774
+ "Quadro K5100M": {
775
+ "vram": 8.0,
776
+ "bw": 160.0,
777
+ "cores": 1536.0
778
+ },
779
+ "Quadro K5000M": {
780
+ "vram": 4.0,
781
+ "bw": 173.0,
782
+ "cores": 1344.0
783
+ },
784
+ "Quadro K4100M": {
785
+ "vram": 4.0,
786
+ "bw": 115.0,
787
+ "cores": 1152.0
788
+ },
789
+ "Quadro K4000M": {
790
+ "vram": 4.0,
791
+ "bw": 134.0,
792
+ "cores": 960.0
793
+ },
794
+ "Quadro K3100M": {
795
+ "vram": 4.0,
796
+ "bw": 80.0,
797
+ "cores": 768.0
798
+ },
799
+ "Quadro K3000M": {
800
+ "vram": 2.0,
801
+ "bw": 80.0,
802
+ "cores": 576.0
803
+ },
804
+ "Quadro K2100M": {
805
+ "vram": 2.0,
806
+ "bw": 48.0,
807
+ "cores": 576.0
808
+ },
809
+ "Quadro K2000M": {
810
+ "vram": 2.0,
811
+ "bw": 64.0,
812
+ "cores": 384.0
813
+ },
814
+ "Quadro K1100M": {
815
+ "vram": 2.0,
816
+ "bw": 64.0,
817
+ "cores": 384.0
818
+ },
819
+ "Quadro K1000M": {
820
+ "vram": 2.0,
821
+ "bw": 64.0,
822
+ "cores": 384.0
823
+ },
824
+ "Quadro K620M": {
825
+ "vram": 2.0,
826
+ "bw": 16.0,
827
+ "cores": 384.0
828
+ },
829
+ "Quadro K610M": {
830
+ "vram": 1.0,
831
+ "bw": 29.0,
832
+ "cores": 192.0
833
+ },
834
+ "Quadro K510M": {
835
+ "vram": 1.0,
836
+ "bw": 19.2,
837
+ "cores": 192.0
838
+ },
839
+ "Quadro K500M": {
840
+ "vram": 2.0,
841
+ "bw": 28.8,
842
+ "cores": 192.0
843
+ },
844
+ "RTX A3000": {
845
+ "vram": 6.0,
846
+ "bw": 192.0,
847
+ "cores": 4096.0
848
+ },
849
+ "RTX A3000 12GB": {
850
+ "vram": 12.0,
851
+ "bw": 336.0,
852
+ "cores": 4096.0
853
+ },
854
+ "RTX A2000 8GB": {
855
+ "vram": 8.0,
856
+ "bw": 224.0,
857
+ "cores": 2560.0
858
+ },
859
+ "RTX A1000": {
860
+ "vram": 4.0,
861
+ "bw": 224.0,
862
+ "cores": 2048.0
863
+ },
864
+ "RTX A500": {
865
+ "vram": 4.0,
866
+ "bw": 112.0,
867
+ "cores": 2048.0
868
+ },
869
+ "RX 5700 XT": {
870
+ "vram": 8.0,
871
+ "bw": 448.0,
872
+ "cores": 2560.0
873
+ },
874
+ "RX 5700": {
875
+ "vram": 8.0,
876
+ "bw": 448.0,
877
+ "cores": 2304.0
878
+ },
879
+ "RX 5600 XT": {
880
+ "vram": 6.0,
881
+ "bw": 288.0,
882
+ "cores": 2304.0
883
+ },
884
+ "RX 5500 XT": {
885
+ "vram": 8.0,
886
+ "bw": 224.0,
887
+ "cores": 1408.0
888
+ },
889
+ "RX 590": {
890
+ "vram": 8.0,
891
+ "bw": 256.0,
892
+ "cores": 2304.0
893
+ },
894
+ "RX 580": {
895
+ "vram": 8.0,
896
+ "bw": 256.0,
897
+ "cores": 2304.0
898
+ },
899
+ "RX 570": {
900
+ "vram": 4.0,
901
+ "bw": 224.0,
902
+ "cores": 2048.0
903
+ },
904
+ "RX 560": {
905
+ "vram": 4.0,
906
+ "bw": 112.0,
907
+ "cores": 1024.0
908
+ },
909
+ "Radeon VII": {
910
+ "vram": 16.0,
911
+ "bw": 1024.0,
912
+ "cores": 3840.0
913
+ },
914
+ "Vega 64": {
915
+ "vram": 8.0,
916
+ "bw": 484.0,
917
+ "cores": 4096.0
918
+ },
919
+ "Vega 56": {
920
+ "vram": 8.0,
921
+ "bw": 410.0,
922
+ "cores": 3584.0
923
+ },
924
+ "RX 9070 XT": {
925
+ "vram": 16.0,
926
+ "bw": 640.0,
927
+ "cores": 4096.0
928
+ },
929
+ "RX 9070": {
930
+ "vram": 16.0,
931
+ "bw": 640.0,
932
+ "cores": 3584.0
933
+ },
934
+ "RX 7900M": {
935
+ "vram": 16.0,
936
+ "bw": 720.0,
937
+ "cores": 4608.0
938
+ },
939
+ "RX 7700S": {
940
+ "vram": 8.0,
941
+ "bw": 288.0,
942
+ "cores": 2048.0
943
+ },
944
+ "RX 7600M XT": {
945
+ "vram": 8.0,
946
+ "bw": 288.0,
947
+ "cores": 2048.0
948
+ },
949
+ "RX 7600M": {
950
+ "vram": 8.0,
951
+ "bw": 288.0,
952
+ "cores": 1792.0
953
+ },
954
+ "RX 7600S": {
955
+ "vram": 8.0,
956
+ "bw": 288.0,
957
+ "cores": 1792.0
958
+ },
959
+ "RX 6800M": {
960
+ "vram": 12.0,
961
+ "bw": 384.0,
962
+ "cores": 2560.0
963
+ },
964
+ "RX 6700M": {
965
+ "vram": 10.0,
966
+ "bw": 320.0,
967
+ "cores": 2304.0
968
+ },
969
+ "RX 6600M": {
970
+ "vram": 8.0,
971
+ "bw": 224.0,
972
+ "cores": 1792.0
973
+ },
974
+ "RX 6500M": {
975
+ "vram": 4.0,
976
+ "bw": 144.0,
977
+ "cores": 1024.0
978
+ },
979
+ "Ryzen AI MAX+ 395": {
980
+ "vram": 96.0,
981
+ "bw": 256.0,
982
+ "cores": 2560.0
983
+ },
984
+ "Radeon 890M": {
985
+ "vram": 0.0,
986
+ "bw": 89.0,
987
+ "cores": 1024.0
988
+ },
989
+ "Radeon 880M": {
990
+ "vram": 0.0,
991
+ "bw": 89.0,
992
+ "cores": 768.0
993
+ },
994
+ "Radeon 780M": {
995
+ "vram": 0.0,
996
+ "bw": 89.0,
997
+ "cores": 768.0
998
+ },
999
+ "Radeon 760M": {
1000
+ "vram": 0.0,
1001
+ "bw": 89.0,
1002
+ "cores": 512.0
1003
+ },
1004
+ "Radeon 680M": {
1005
+ "vram": 0.0,
1006
+ "bw": 77.0,
1007
+ "cores": 768.0
1008
+ },
1009
+ "Radeon 660M": {
1010
+ "vram": 0.0,
1011
+ "bw": 77.0,
1012
+ "cores": 384.0
1013
+ },
1014
+ "Vega 8": {
1015
+ "vram": 0.0,
1016
+ "bw": 51.0,
1017
+ "cores": 512.0
1018
+ },
1019
+ "Vega 7": {
1020
+ "vram": 0.0,
1021
+ "bw": 51.0,
1022
+ "cores": 448.0
1023
+ },
1024
+ "Arc A770M": {
1025
+ "vram": 16.0,
1026
+ "bw": 512.0,
1027
+ "cores": 4096.0
1028
+ },
1029
+ "Arc A550M": {
1030
+ "vram": 8.0,
1031
+ "bw": 224.0,
1032
+ "cores": 2048.0
1033
+ },
1034
+ "Arc A370M": {
1035
+ "vram": 4.0,
1036
+ "bw": 112.0,
1037
+ "cores": 1024.0
1038
+ },
1039
+ "Iris Xe": {
1040
+ "vram": 0.0,
1041
+ "bw": 68.0,
1042
+ "cores": 96.0
1043
+ },
1044
+ "Iris Plus": {
1045
+ "vram": 0.0,
1046
+ "bw": 50.0,
1047
+ "cores": 64.0
1048
+ },
1049
+ "UHD 770": {
1050
+ "vram": 0.0,
1051
+ "bw": 76.0,
1052
+ "cores": 32.0
1053
+ },
1054
+ "UHD 730": {
1055
+ "vram": 0.0,
1056
+ "bw": 76.0,
1057
+ "cores": 24.0
1058
+ },
1059
+ "UHD Graphics 630": {
1060
+ "vram": 0.0,
1061
+ "bw": 42.0,
1062
+ "cores": 24.0
1063
+ },
1064
+ "UHD Graphics 620": {
1065
+ "vram": 0.0,
1066
+ "bw": 34.0,
1067
+ "cores": 24.0
1068
+ }
1069
+ },
1070
+ "apple": {
1071
+ "m5 max": {
1072
+ "ram": 36.0,
1073
+ "bw": 614.0,
1074
+ "cpuCores": 18.0,
1075
+ "gpuCores": 40.0
1076
+ },
1077
+ "m5 pro": {
1078
+ "ram": 24.0,
1079
+ "bw": 307.0,
1080
+ "cpuCores": 18.0,
1081
+ "gpuCores": 20.0
1082
+ },
1083
+ "m5": {
1084
+ "ram": 16.0,
1085
+ "bw": 153.0,
1086
+ "cpuCores": 10.0,
1087
+ "gpuCores": 10.0
1088
+ },
1089
+ "m4 max": {
1090
+ "ram": 36.0,
1091
+ "bw": 546.0,
1092
+ "cpuCores": 16.0,
1093
+ "gpuCores": 40.0
1094
+ },
1095
+ "m4 pro": {
1096
+ "ram": 24.0,
1097
+ "bw": 273.0,
1098
+ "cpuCores": 14.0,
1099
+ "gpuCores": 20.0
1100
+ },
1101
+ "m4": {
1102
+ "ram": 16.0,
1103
+ "bw": 120.0,
1104
+ "cpuCores": 10.0,
1105
+ "gpuCores": 10.0
1106
+ },
1107
+ "m3 ultra": {
1108
+ "ram": 96.0,
1109
+ "bw": 819.0,
1110
+ "cpuCores": 32.0,
1111
+ "gpuCores": 80.0
1112
+ },
1113
+ "m3 max": {
1114
+ "ram": 36.0,
1115
+ "bw": 400.0,
1116
+ "cpuCores": 16.0,
1117
+ "gpuCores": 40.0
1118
+ },
1119
+ "m3 pro": {
1120
+ "ram": 18.0,
1121
+ "bw": 150.0,
1122
+ "cpuCores": 12.0,
1123
+ "gpuCores": 18.0
1124
+ },
1125
+ "m3": {
1126
+ "ram": 8.0,
1127
+ "bw": 100.0,
1128
+ "cpuCores": 8.0,
1129
+ "gpuCores": 10.0
1130
+ },
1131
+ "m2 ultra": {
1132
+ "ram": 64.0,
1133
+ "bw": 800.0,
1134
+ "cpuCores": 24.0,
1135
+ "gpuCores": 76.0
1136
+ },
1137
+ "m2 max": {
1138
+ "ram": 32.0,
1139
+ "bw": 400.0,
1140
+ "cpuCores": 12.0,
1141
+ "gpuCores": 38.0
1142
+ },
1143
+ "m2 pro": {
1144
+ "ram": 16.0,
1145
+ "bw": 200.0,
1146
+ "cpuCores": 12.0,
1147
+ "gpuCores": 19.0
1148
+ },
1149
+ "m2": {
1150
+ "ram": 8.0,
1151
+ "bw": 100.0,
1152
+ "cpuCores": 8.0,
1153
+ "gpuCores": 10.0
1154
+ },
1155
+ "m1 ultra": {
1156
+ "ram": 64.0,
1157
+ "bw": 800.0,
1158
+ "cpuCores": 20.0,
1159
+ "gpuCores": 64.0
1160
+ },
1161
+ "m1 max": {
1162
+ "ram": 32.0,
1163
+ "bw": 400.0,
1164
+ "cpuCores": 10.0,
1165
+ "gpuCores": 32.0
1166
+ },
1167
+ "m1 pro": {
1168
+ "ram": 16.0,
1169
+ "bw": 200.0,
1170
+ "cpuCores": 10.0,
1171
+ "gpuCores": 16.0
1172
+ },
1173
+ "m1": {
1174
+ "ram": 8.0,
1175
+ "bw": 68.0,
1176
+ "cpuCores": 8.0,
1177
+ "gpuCores": 8.0
1178
+ }
1179
+ },
1180
+ "sbc": {
1181
+ "Raspberry Pi 5 (8 GB)": {
1182
+ "ram": 8.0,
1183
+ "bw": 32.0
1184
+ },
1185
+ "Raspberry Pi 5 (4 GB)": {
1186
+ "ram": 4.0,
1187
+ "bw": 32.0
1188
+ },
1189
+ "Raspberry Pi 4 (8 GB)": {
1190
+ "ram": 8.0,
1191
+ "bw": 13.0
1192
+ },
1193
+ "Raspberry Pi 4 (4 GB)": {
1194
+ "ram": 4.0,
1195
+ "bw": 13.0
1196
+ }
1197
+ }
1198
+ }
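As a rough illustration of how a table shaped like `data/gpu_specs.json` might be queried, the sketch below parses a three-entry excerpt (values copied from the file above) and looks up memory bandwidth by device name. The helper `lookup_bandwidth` and its case-insensitive exact match are assumptions for illustration; the project's own `bandwidth_for_spec` is not shown in this part of the diff and may match names differently.

```python
import json

# Small excerpt with the same shape as data/gpu_specs.json (bw is in GB/s).
SAMPLE = """
{
  "gpus": {"RTX 4090": {"vram": 24.0, "bw": 1008.0, "cores": 16384.0}},
  "apple": {"m3 max": {"ram": 36.0, "bw": 400.0, "cpuCores": 16.0, "gpuCores": 40.0}},
  "sbc": {"Raspberry Pi 5 (8 GB)": {"ram": 8.0, "bw": 32.0}}
}
"""


def lookup_bandwidth(specs, name):
    """Return (bandwidth in GB/s, section name), or (None, None) if unknown.

    Hypothetical helper: matches the device name case-insensitively across
    the three top-level sections of the spec table.
    """
    wanted = name.strip().lower()
    for section in ("gpus", "apple", "sbc"):
        for key, entry in specs.get(section, {}).items():
            if key.lower() == wanted:
                return entry["bw"], section
    return None, None


if __name__ == "__main__":
    specs = json.loads(SAMPLE)
    print(lookup_bandwidth(specs, "rtx 4090"))
    print(lookup_bandwidth(specs, "Unknown GPU"))
```

Returning the section alongside the number keeps a record of where the bandwidth came from, in the spirit of the provenance labels this commit adds.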
engine/real_advisor.py CHANGED
@@ -26,6 +26,7 @@ from pathlib import Path
  
  from .hardware import HardwareSpec
  from .runtimes import pick_runtimes
+ from .speed import bandwidth_for_spec, predict_decode_tps, feel_text
  
  _CATALOGUE_PATH = Path(__file__).resolve().parent.parent / "catalogue.json"
  
@@ -292,9 +293,29 @@ def _evaluate(entry: dict, spec: HardwareSpec, uc: UC) -> dict:
  # Advise: full UI-shaped result
  # --------------------------------------------------------------------------
  
- def _option_json(r: dict, spec: HardwareSpec) -> dict:
+ def _speed_pred(r: dict, spec: HardwareSpec, bw: float | None) -> dict | None:
+     """Measured/roofline tok/s prediction for a GGUF option, if bandwidth known."""
+     e, v, est = r["entry"], r["verdict"], r["est"]
+     if not e.get("quants") or v == "no" or not bw:
+         return None
+     params = e.get("params_b") or 1.0
+     active = (e.get("active_params_b") or params) / params
+     if v == "tight":
+         # share of the read bytes that live in slow system RAM
+         fast_room = spec.fast_budget_gb * _SAFETY_FILL
+         offload = max(0.0, min(1.0, 1 - fast_room / max(est["total"], 0.1)))
+     else:
+         offload = 0.0
+     return predict_decode_tps(
+         bandwidth_gbs=bw, weights_gb=est["weights"], kv_gb=est["kv"],
+         active_fraction=active, offload_fraction=offload,
+     )
+
+
+ def _option_json(r: dict, spec: HardwareSpec, bw: float | None = None) -> dict:
      e, v = r["entry"], r["verdict"]
-     feel = _feel(e, v, spec)
+     pred = _speed_pred(r, spec, bw)
+     feel = feel_text(pred) if pred else _feel(e, v, spec)
      if not e.get("quants") and v == "tight" and not spec.has_fast_path:
          feel = "Runs on the processor — slow but workable"
      lic_label = e.get("license", "")
@@ -389,7 +410,8 @@ def advise_real(payload: dict, spec: HardwareSpec) -> dict:
      fast, total = spec.fast_budget_gb, spec.total_budget_gb
      headline, meets_goal = _pick_headline(results, uc)
  
-     options = [_option_json(r, spec) for r in results]
+     bw, bw_src = bandwidth_for_spec(spec)
+     options = [_option_json(r, spec, bw) for r in results]
  
      if headline:
          e, est, q = headline["entry"], headline["est"], headline["quant"]
@@ -457,6 +479,13 @@ def advise_real(payload: dict, spec: HardwareSpec) -> dict:
          ],
      }
  
+     speed = None
+     if headline:
+         pred = _speed_pred(headline, spec, bw)
+         if pred:
+             speed = {**pred, "bw": bw, "bw_source": bw_src,
+                      "model": headline["entry"]["name"]}
+
      if uc.family == "llm":
          tools = [{"name": r.name, "what": r.plain_what, "install": r.install_hint,
                    "tag": r.difficulty} for r in pick_runtimes(spec)]
@@ -485,7 +514,10 @@ def advise_real(payload: dict, spec: HardwareSpec) -> dict:
          "options": options,
          "tools": tools,
          "commands": commands,
-         "provenance": _provenance_line(headline),
+         "provenance": _provenance_line(headline) + (
+             f" Speed is {'predicted from real community measurements' if speed and speed['method'] == 'measured-model' else 'an analytical bandwidth estimate'}"
+             f" — see 'Why this speed?' below." if speed else ""),
+         "speed": speed,
          "meets_goal": meets_goal,
          "use_case": uc.plain_name,
          "headline_model": headline["entry"]["name"] if headline else "",
engine/speed.py ADDED
@@ -0,0 +1,201 @@
+ """
+ Speed estimation: how fast will it actually feel?
+
+ Two-tier design, with provenance the UI always shows:
+
+   1. TRAINED MODEL (when present): an XGBoost regressor trained on real
+      community measurements (LocalScore, ~33k data points), following the
+      methodology of LLM-Pilot (IBM, SC'24, arXiv:2410.02425 — gradient
+      boosting over hardware+model features, validated leave-one-accelerator-
+      out). Loaded from model/speed_model.skops if scripts/train_speed_model.py
+      has been run. method = "measured-model".
+   2. ROOFLINE BASELINE (always available, fully offline): decode is memory-
+      bandwidth-bound — tok/s ~ bandwidth / bytes-read-per-token (weights +
+      KV), times an empirical efficiency factor. See kipply's "Transformer
+      Inference Arithmetic" and the JAX scaling book inference chapter.
+      method = "roofline".
+
+ The anti-gimmick rule lives in the training script: the trained model ships
+ only if it beats this baseline on held-out hardware; otherwise the baseline
+ IS the product and the UI says so.
+
+ Scope note (honest): this predicts LLM/VLM decode speed. Vision (YOLO) and
+ diffusion models are COMPUTE-bound, not bandwidth-bound — FPS scales with
+ TFLOPS / model GFLOPs, a different axis with different data (Ultralytics
+ publishes per-size GFLOPs and official T4 latencies; dbgpu has per-GPU
+ TFLOPS). That path is designed in SPEED-BRICK-RESEARCH.md §8 but not built;
+ non-LLM families keep their provenance-labelled memory verdicts only, rather
+ than getting fake speed numbers.
+ """
+
+import json
+import re
+from functools import lru_cache
+from pathlib import Path
+
+_ROOT = Path(__file__).resolve().parent.parent
+_SPECS_PATH = _ROOT / "data" / "gpu_specs.json"
+_MODEL_PATH = _ROOT / "model" / "speed_model.skops"
+
+# Decode efficiency vs theoretical bandwidth roofline. Real stacks land well
+# under the ceiling; 0.55-0.70 is the typical consumer-GPU range in community
+# measurements. We centre conservatively and report a band, never a point.
+_EFF_MID, _EFF_LO, _EFF_HI = 0.60, 0.42, 0.78
+# Conservative system-RAM bandwidth for offload modelling (dual-channel DDR4/5).
+_RAM_BW_GBS = 48.0
+# Reading speed reference: ~4.5 words/s, ~0.75 words per token -> ~6 tok/s.
+_READING_TPS = 6.0
+
+
+@lru_cache(maxsize=1)
+def _specs() -> dict:
+    try:
+        return json.loads(_SPECS_PATH.read_text(encoding="utf-8"))
+    except OSError:
+        return {"gpus": {}, "apple": {}, "sbc": {}}
+
+
+def _norm(s: str) -> str:
+    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", " ", (s or "").lower())).strip()
+
+
+@lru_cache(maxsize=1)
+def _bw_index() -> tuple:
+    idx = []
+    for name, d in _specs()["gpus"].items():
+        idx.append((_norm(name), float(d["bw"]), float(d.get("vram", 0))))
+    idx.sort(key=lambda t: -len(t[0]))  # longest first: '4080 super' beats '4080'
+    return tuple(idx)
+
+
+# Apple chips: the UI only knows base/Pro/Max/Ultra, not the generation. We use
+# M2-generation numbers as the conservative representative (older = slower).
+_APPLE_TIER_BW = None
+
+
+def _apple_bw(tier_hint: str) -> float:
+    global _APPLE_TIER_BW
+    if _APPLE_TIER_BW is None:
+        a = {k: v["bw"] for k, v in _specs()["apple"].items()}
+        _APPLE_TIER_BW = {
+            "ultra": a.get("m2 ultra") or a.get("m1 ultra") or 800.0,
+            "max": a.get("m2 max") or 400.0,
+            "pro": a.get("m2 pro") or 200.0,
+            "base": a.get("m2") or 100.0,
+        }
+    t = (tier_hint or "").lower()
+    for key in ("ultra", "max", "pro"):
+        if key in t:
+            return _APPLE_TIER_BW[key]
+    return _APPLE_TIER_BW["base"]
+
+
+def bandwidth_for_spec(spec, gpu_label: str = "") -> tuple[float | None, str]:
+    """(memory bandwidth GB/s on the fast path, source-note) for a machine."""
+    if spec.is_apple_silicon:
+        return _apple_bw(gpu_label or spec.gpu_label), "Apple unified memory (conservative M2-gen figure)"
+    if spec.gpu_vendor in ("nvidia", "amd", "intel") and spec.vram_gb > 0:
+        n = _norm(gpu_label or spec.gpu_label)
+        for key, bw, vram in _bw_index():
+            if key and key in n:
+                # disambiguate VRAM variants (e.g. 5060 Ti 8 vs 16 GB)
+                if vram and spec.vram_gb and abs(vram - spec.vram_gb) > 4:
+                    continue
+                return bw, "vendor spec sheet"
+        return None, ""
+    return None, ""
+
+
+# --------------------------------------------------------------------------
+# Trained model (optional, loaded if scripts/train_speed_model.py produced it)
+# --------------------------------------------------------------------------
+
+@lru_cache(maxsize=1)
+def _trained_model():
+    if not _MODEL_PATH.exists():
+        return None
+    try:
+        from skops.io import load as skops_load
+        model = skops_load(_MODEL_PATH, trusted=None)
+        print(f"[FitCheck] speed predictor loaded from {_MODEL_PATH.name}", flush=True)
+        return model
+    except Exception as e:  # noqa: BLE001
+        # The file exists but won't load — say so loudly (a silent fallback
+        # here would hide a broken deploy behind plausible roofline numbers).
+        import sys
+        print(f"[FitCheck] WARNING: {_MODEL_PATH.name} exists but failed to "
+              f"load ({e!r}) — falling back to the labelled roofline estimate",
+              file=sys.stderr, flush=True)
+        return None
+
+
+# --------------------------------------------------------------------------
+# Prediction
+# --------------------------------------------------------------------------
+
+def predict_decode_tps(
+    *,
+    bandwidth_gbs: float,
+    weights_gb: float,
+    kv_gb: float = 0.0,
+    active_fraction: float = 1.0,
+    offload_fraction: float = 0.0,
+) -> dict:
+    """Predict decode tokens/sec.
+
+    active_fraction: MoE models only read their active experts per token.
+    offload_fraction: share of the model living in system RAM (0 = all on GPU).
+    """
+    # Bytes read per generated token: the (active) weights + the KV cache.
+    bytes_gb = max(weights_gb * active_fraction + kv_gb, 0.05)
+    if active_fraction < 0.9:
+        # MoE conservatism: expert routing scatters reads across the full
+        # weight file, so real MoE decode lands well under the active-bytes
+        # ideal. 1.5x is a deliberate under-promise until measured data
+        # corrects it (community MoE numbers run ~50-70% of ideal).
+        bytes_gb *= 1.5
+
+    eff_bw = bandwidth_gbs
+    if offload_fraction > 0:
+        f = min(max(offload_fraction, 0.0), 1.0)
+        eff_bw = 1.0 / ((1.0 - f) / bandwidth_gbs + f / _RAM_BW_GBS)
+
+    model = _trained_model()
+    if model is not None:
+        try:
+            import numpy as np
+            x = np.array([[eff_bw, bytes_gb, weights_gb, kv_gb,
+                           active_fraction, offload_fraction,
+                           eff_bw / bytes_gb]])
+            tps = float(model.predict(x)[0])
+            return {"tps": round(tps, 1),
+                    "lo": round(tps * 0.8, 1), "hi": round(tps * 1.2, 1),
+                    "bytes_gb": round(bytes_gb, 2), "eff_bw": round(eff_bw, 1),
+                    "method": "measured-model",
+                    "note": ("predicted by a model trained on real community "
+                             "measurements (LocalScore), LLM-Pilot methodology")}
+        except Exception:  # noqa: BLE001 — fall through to roofline
+            pass
+
+    base = eff_bw / bytes_gb
+    return {"tps": round(base * _EFF_MID, 1),
+            "lo": round(base * _EFF_LO, 1), "hi": round(base * _EFF_HI, 1),
+            "bytes_gb": round(bytes_gb, 2), "eff_bw": round(eff_bw, 1),
+            "method": "roofline",
+            "note": ("analytical estimate: decode speed is memory-bandwidth-bound "
+                     "(bandwidth divided by bytes read per token)")}
+
+
+def feel_text(pred: dict) -> str:
+    """One honest, plain-English line from a prediction."""
+    tps = pred["tps"]
+    lo, hi = pred["lo"], pred["hi"]
+    if tps >= _READING_TPS * 4:
+        speed_word = "much faster than you read"
+    elif tps >= _READING_TPS * 1.5:
+        speed_word = "faster than you read"
+    elif tps >= _READING_TPS * 0.7:
+        speed_word = "about reading speed"
+    else:
+        speed_word = "slower than reading — fine for short tasks"
+    return f"~{tps:g} tok/s (likely {lo:g}-{hi:g}) — {speed_word}"
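The roofline branch of `predict_decode_tps` can be checked by hand. The sketch below is a minimal re-derivation, not the shipped function: it reuses the efficiency constants and the 48 GB/s system-RAM figure from `engine/speed.py`, while the helper names `blended_bandwidth` and `roofline_tps` and the hardware numbers (a 360 GB/s GPU, 4.7 GB of weights, 0.3 GB of KV cache) are assumptions chosen only for illustration.

```python
# Illustrative re-derivation of the roofline branch. The constants match the
# ones in engine/speed.py; helper names and example inputs are assumptions.
EFF_MID, EFF_LO, EFF_HI = 0.60, 0.42, 0.78
RAM_BW_GBS = 48.0


def blended_bandwidth(gpu_bw_gbs: float, offload_fraction: float) -> float:
    """Weighted harmonic mean of GPU and system-RAM bandwidth."""
    f = min(max(offload_fraction, 0.0), 1.0)
    return 1.0 / ((1.0 - f) / gpu_bw_gbs + f / RAM_BW_GBS)


def roofline_tps(bw_gbs: float, weights_gb: float, kv_gb: float = 0.0) -> dict:
    """Ceiling = bandwidth / bytes read per token, then scaled by efficiency."""
    bytes_gb = max(weights_gb + kv_gb, 0.05)
    ceiling = bw_gbs / bytes_gb
    return {"ceiling": ceiling, "tps": ceiling * EFF_MID,
            "lo": ceiling * EFF_LO, "hi": ceiling * EFF_HI}


# Assumed example: everything resident on a 360 GB/s GPU.
on_gpu = roofline_tps(360.0, 4.7, 0.3)
# Same model with a quarter of its bytes served from system RAM.
offloaded = roofline_tps(blended_bandwidth(360.0, 0.25), 4.7, 0.3)
print(on_gpu)
print(offloaded)
```

The harmonic mean is the natural blend here because the time per token is the sum of the time spent reading each share of the bytes, so the slow share dominates: offloading a quarter of the model in this example cuts the effective bandwidth from 360 GB/s to roughly 137 GB/s.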
engine/ui_adapter.py CHANGED
@@ -75,10 +75,11 @@ def spec_from_payload(p: dict) -> HardwareSpec:
 
     # --- Apple Silicon: unified memory, no separate VRAM -------------------
     if "mac" in kind or provider == "apple":
+        chip = p.get("gpu") or "Apple Silicon"  # keep the tier (Pro/Max/Ultra) for bandwidth lookup
         return HardwareSpec(
             os="macos", ram_gb=ram, gpu_vendor="apple", vram_gb=0.0,
             is_apple_silicon=True,
-            gpu_label=f"Apple Silicon (shares your {ram:g} GB of memory)",
+            gpu_label=f"{chip} (shares your {ram:g} GB of memory)",
             form_factor="mac",
         )
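Why the label change matters can be seen with a small stand-in for `_apple_bw`. This is a sketch under assumptions: `TIER_BW` holds only the hard-coded fallback values from `engine/speed.py` (the real function prefers the M2-generation entries in `data/gpu_specs.json`), `tier_bandwidth` is a hypothetical name, and the `"M2 Max"` payload value is an assumed example of what the UI might send.

```python
# Fallback bandwidths (GB/s) copied from _apple_bw in engine/speed.py.
TIER_BW = {"ultra": 800.0, "max": 400.0, "pro": 200.0, "base": 100.0}


def tier_bandwidth(gpu_label: str) -> float:
    """Pick a bandwidth tier by substring match, highest tier first."""
    label = (gpu_label or "").lower()
    for tier in ("ultra", "max", "pro"):
        if tier in label:
            return TIER_BW[tier]
    return TIER_BW["base"]


old_label = "Apple Silicon (shares your 32 GB of memory)"
new_label = "M2 Max (shares your 32 GB of memory)"  # assumed payload value
print(tier_bandwidth(old_label), tier_bandwidth(new_label))
```

With the old fixed label every Mac falls through to the base tier; carrying the chip name through lets the Max tier be found. Because the match is a plain substring test, a label such as "MacBook Pro" would also select the Pro tier, which may or may not be intended.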
 
requirements.txt CHANGED
@@ -12,3 +12,5 @@ kernels>=0.12.0,<0.13 # transformers' own declared range — 0.15.x broke
  # mamba-ssm/causal-conv1d here, the build will fail)
 accelerate  # device placement / efficient loading
 einops  # required by the kernels-community mamba-ssm kernel
+skops  # safe loading of the trained speed predictor
+xgboost  # the speed predictor's runtime (engine/speed.py)
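These two dependencies serve the trained predictor, which `engine/speed.py` says ships only if it beats the roofline baseline on held-out hardware. The training script is not part of this diff, so the following is only a sketch of what a leave-one-accelerator-out gate could look like. Everything in it is hypothetical: the rows are invented, the "learned" model is a single fitted efficiency factor standing in for XGBoost, and mean absolute percentage error is an assumed metric.

```python
from statistics import mean

# Hypothetical rows: (accelerator, bandwidth GB/s, GB read per token, measured tok/s).
RUNS = [
    ("gpu-a", 360.0, 5.0, 40.0),
    ("gpu-a", 360.0, 9.0, 23.0),
    ("gpu-b", 1008.0, 5.0, 125.0),
    ("gpu-b", 1008.0, 9.0, 70.0),
    ("gpu-c", 100.0, 5.0, 11.0),
]


def roofline(bw, bytes_gb, eff=0.60):
    """Analytical baseline: efficiency * bandwidth / bytes per token."""
    return eff * bw / bytes_gb


def fit_efficiency(train):
    """Stand-in 'learned' model: one efficiency factor fitted to training runs."""
    return mean(tps * b / bw for _, bw, b, tps in train)


def mape(pairs):
    """Mean absolute percentage error over (predicted, actual) pairs."""
    return mean(abs(p - a) / a for p, a in pairs)


def leave_one_accelerator_out(runs):
    """Score both predictors on each accelerator after training on the others."""
    results = []
    for held in sorted({r[0] for r in runs}):
        train = [r for r in runs if r[0] != held]
        test = [r for r in runs if r[0] == held]
        eff = fit_efficiency(train)
        learned = mape([(roofline(bw, b, eff), tps) for _, bw, b, tps in test])
        baseline = mape([(roofline(bw, b), tps) for _, bw, b, tps in test])
        results.append((held, learned, baseline))
    return results


folds = leave_one_accelerator_out(RUNS)
# The gate: ship the learned predictor only if it wins on unseen hardware.
ship_learned = mean(l for _, l, _ in folds) < mean(b for _, _, b in folds)
print(folds, ship_learned)
```

The point of grouping by accelerator rather than splitting rows at random is that a random split would let the model see other runs from the same GPU, which overstates how well it generalises to a machine it has never seen.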
static/app.js CHANGED
@@ -438,6 +438,25 @@ function render(d) {
     ${d.provenance ? `<div class="prov">${d.provenance}</div>` : ""}
     </div>` : ""}
 
+    ${d.speed ? `
+    <details class="disc viz-disc" open>
+      <summary><span class="ic sum-ic" data-ic="speed"></span>Why this speed? <span class="sub-inline">real benchmark runs, and where you land</span> <span class="chev ic" data-ic="chevron"></span></summary>
+      <div class="disc-body">
+        <div id="roofline-chart" class="roofline-wrap"></div>
+        <p class="viz-caption">
+          Every grey dot is a <b>real benchmark run</b> from the
+          <a href="https://www.localscore.ai" target="_blank" rel="noopener">LocalScore</a> community database.
+          Generation speed tracks <b>memory bandwidth</b> — that's the one number that matters most for local AI
+          (<a href="https://kipp.ly/transformer-inference-arithmetic/" target="_blank" rel="noopener">why</a>).
+          The dashed line is the theoretical ceiling for <b>${d.speed.model}</b> at your setting;
+          your machine is the marked dot at ≈<b>${d.speed.tps} tok/s</b>
+          (${d.speed.method === "measured-model"
+            ? `predicted by a model trained on these measurements, following IBM's <a href="https://arxiv.org/abs/2410.02425" target="_blank" rel="noopener">LLM-Pilot</a> methodology`
+            : `an analytical estimate; a learned predictor trained on these runs — <a href="https://arxiv.org/abs/2410.02425" target="_blank" rel="noopener">LLM-Pilot</a> methodology — takes over once trained`}).
+        </p>
+      </div>
+    </details>` : ""}
+
     ${opts ? `<div class="section-title">What you can run <span class="sub">real models, biggest to smallest — names link to Hugging Face</span></div>
     <div class="opt-grid">${opts}</div>` : ""}
 
@@ -472,6 +491,7 @@ function render(d) {
     $("#results").firstElementChild.prepend(back);
   }
   hydrate($("#results"));
+  if (d.speed) drawRoofline(d.speed);
   $("#results").querySelectorAll(".copy-btn").forEach(b => b.addEventListener("click", () => {
     navigator.clipboard.writeText(decodeURIComponent(b.dataset.code));
     b.textContent = "Copied ✓"; b.classList.add("done");
@@ -480,6 +500,73 @@ function render(d) {
   wireAsk();
 }
 
+// ---- "Why this speed?" roofline scatter (real LocalScore runs) ------------
+let _rooflinePts = null;
+async function getRooflinePoints() {
+  if (_rooflinePts) return _rooflinePts;
+  try {
+    const r = await fetch("/static/roofline.json");
+    _rooflinePts = await r.json();
+  } catch (e) { _rooflinePts = { points: [] }; }
+  return _rooflinePts;
+}
+
+async function drawRoofline(speed) {
+  const host = $("#roofline-chart");
+  if (!host) return;
+  const data = await getRooflinePoints();
+  const pts = (data.points || []).filter(p => p.bw > 0 && p.tps > 0.5);
+  if (!pts.length && !speed) { host.innerHTML = ""; return; }
+
+  const W = 720, H = 320, L = 52, R = 16, T = 14, B = 40;
+  const xmin = 40, xmax = 2100, ymin = 0.8, ymax = 400;
+  const lx = v => L + (Math.log10(v) - Math.log10(xmin)) / (Math.log10(xmax) - Math.log10(xmin)) * (W - L - R);
+  const ly = v => H - B - (Math.log10(v) - Math.log10(ymin)) / (Math.log10(ymax) - Math.log10(ymin)) * (H - T - B);
+
+  let s = `<svg viewBox="0 0 ${W} ${H}" role="img" aria-label="Decode speed vs memory bandwidth, real benchmark runs">`;
+  // gridlines + labels
+  for (const gx of [50, 100, 200, 400, 800, 1600]) {
+    s += `<line x1="${lx(gx)}" y1="${T}" x2="${lx(gx)}" y2="${H - B}" class="rl-grid"/>` +
+      `<text x="${lx(gx)}" y="${H - B + 16}" class="rl-tick" text-anchor="middle">${gx}</text>`;
+  }
+  for (const gy of [1, 3, 10, 30, 100, 300]) {
+    s += `<line x1="${L}" y1="${ly(gy)}" x2="${W - R}" y2="${ly(gy)}" class="rl-grid"/>` +
+      `<text x="${L - 6}" y="${ly(gy) + 4}" class="rl-tick" text-anchor="end">${gy}</text>`;
+  }
+  s += `<text x="${(L + W - R) / 2}" y="${H - 6}" class="rl-axis" text-anchor="middle">memory bandwidth (GB/s, log)</text>`;
+  s += `<text x="14" y="${(T + H - B) / 2}" class="rl-axis" text-anchor="middle" transform="rotate(-90 14 ${(T + H - B) / 2})">decode tok/s (log)</text>`;
+
+  // real measurement dots, shaded by model size
+  const shade = p => p.params_b <= 2 ? "rl-p1" : (p.params_b <= 9 ? "rl-p8" : "rl-p14");
+  for (const p of pts) {
+    if (p.bw < xmin || p.tps < ymin) continue;
+    s += `<circle cx="${lx(Math.min(p.bw, xmax)).toFixed(1)}" cy="${ly(Math.min(p.tps, ymax)).toFixed(1)}" r="2.6" class="rl-dot ${shade(p)}"><title>${p.accel} — ${p.model}: ${p.tps} tok/s</title></circle>`;
+  }
+
+  if (speed) {
+    // theoretical ceiling for the recommended model: tps = 0.6 * bw / bytes
+    const bytes = speed.bytes_gb || 5;
+    const x1 = xmin, x2 = xmax;
+    const f = bw => Math.min(Math.max(0.6 * bw / bytes, ymin), ymax);
+    s += `<line x1="${lx(x1)}" y1="${ly(f(x1))}" x2="${lx(x2)}" y2="${ly(f(x2))}" class="rl-roof"/>`;
+    // your machine
+    const ux = lx(Math.min(Math.max(speed.eff_bw || speed.bw, xmin), xmax));
+    const uy = ly(Math.min(Math.max(speed.tps, ymin), ymax));
+    s += `<line x1="${ux}" y1="${ly(Math.min(Math.max(speed.lo, ymin), ymax))}" x2="${ux}" y2="${ly(Math.min(Math.max(speed.hi, ymin), ymax))}" class="rl-band"/>`;
+    s += `<circle cx="${ux}" cy="${uy}" r="6" class="rl-you"/>` +
+      `<text x="${ux + 10}" y="${uy + 4}" class="rl-you-label">you ≈${speed.tps} tok/s</text>`;
+  }
+  s += `</svg>
+    <div class="rl-legend">
+      <span class="item"><span class="sw rl-p1-sw"></span>~1B model runs</span>
+      <span class="item"><span class="sw rl-p8-sw"></span>~8B runs</span>
+      <span class="item"><span class="sw rl-p14-sw"></span>~14B runs</span>
+      <span class="item"><span class="sw rl-roof-sw"></span>theoretical ceiling (your pick)</span>
+      <span class="item"><span class="sw rl-you-sw"></span>your machine</span>
+    </div>`;
+  host.innerHTML = s;
+}
+
 // ---- Follow-up: the model brick (grounded explainer) ---------------------
 function wireAsk() {
   const input = $("#ask-input"), send = $("#ask-send");
@@ -572,7 +659,20 @@ function init() {
   fillGpu();
   $("#find-specs-body").innerHTML = findSpecsText();
   detectHardware();
-  // Pre-filled share/preview links: /?go renders results immediately.
-  if (new URLSearchParams(location.search).has("go")) check();
+  // Pre-filled share/preview links: ?go renders immediately; optional
+  // ?gpu=NVIDIA|RTX 3060 (12 GB)&ram=16&uc=chat pre-select a profile.
+  const q = new URLSearchParams(location.search);
+  if (q.has("gpu")) {
+    const [vendor, label] = (q.get("gpu") || "").split("|");
+    if (vendor) { state.provider = vendor.toLowerCase(); setActive("#provider-seg", state.provider); fillGpu(); }
+    if (label) { const sel = $("#gpu"); [...sel.options].forEach(o => { if (o.value === label) sel.value = label; }); }
+  }
+  if (q.has("ram")) $("#ram").value = `${q.get("ram")} GB`;
+  if (q.has("uc")) {
+    state.usecases = [q.get("uc")];
+    document.querySelectorAll(".uc-pill").forEach(p =>
+      p.classList.toggle("active", p.dataset.uc === q.get("uc")));
+  }
+  if (q.has("go")) check();
 }
 init();
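The chart places every point with the two log-scale mappings `lx` and `ly` in `drawRoofline`. The sketch below restates them in Python, to stay in the same language as the engine, so the geometry can be checked outside the browser. The constants are copied from the JavaScript above; the `clamp` helper and the upper-case names are assumptions made for readability.

```python
import math

# Chart geometry and axis ranges copied from drawRoofline in static/app.js.
W, H, L, R, T, B = 720, 320, 52, 16, 14, 40
XMIN, XMAX, YMIN, YMAX = 40, 2100, 0.8, 400


def clamp(v: float, lo: float, hi: float) -> float:
    """Keep a value inside the plotted range."""
    return min(max(v, lo), hi)


def lx(v: float) -> float:
    """Bandwidth (GB/s) to SVG x: equal ratios map to equal distances."""
    span = math.log10(XMAX) - math.log10(XMIN)
    return L + (math.log10(v) - math.log10(XMIN)) / span * (W - L - R)


def ly(v: float) -> float:
    """tok/s to SVG y: the axis is flipped because SVG y grows downward."""
    span = math.log10(YMAX) - math.log10(YMIN)
    return H - B - (math.log10(v) - math.log10(YMIN)) / span * (H - T - B)


# The axis extremes should land on the edges of the plot area.
print(lx(XMIN), lx(XMAX), ly(YMIN), ly(YMAX))
# A machine at 360 GB/s predicted at 43.2 tok/s (assumed example values).
print(lx(clamp(360, XMIN, XMAX)), ly(clamp(43.2, YMIN, YMAX)))
```

Because tok/s is proportional to bandwidth, the dashed line is straight on log-log axes and two endpoints are enough to draw it. That holds only while neither endpoint is clamped: with the ranges above, a model reading more than about 30 GB per token would have its left endpoint pinned at the bottom of the chart, which changes the drawn slope.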
static/roofline.json ADDED
The diff for this file is too large to render. See raw diff
 
static/style.css CHANGED
@@ -377,6 +377,29 @@ details.disc > summary:hover { color: var(--text-primary); }
 .copy-btn:hover { color: var(--text-primary); border-color: var(--border-hi); }
 .copy-btn.done { color: var(--ok); border-color: var(--ok); }
 
+/* "Why this speed?" roofline visualizer */
+.viz-disc { margin-top: var(--s-5); background: var(--bg-raised); }
+.viz-disc > summary .sub-inline { font-weight: 500; color: var(--text-muted); font-size: 12.5px; margin-left: 4px; }
+.roofline-wrap svg { width: 100%; height: auto; display: block; }
+.rl-grid { stroke: var(--border); stroke-width: 1; opacity: .5; }
+.rl-tick { fill: var(--text-muted); font: 500 10.5px var(--font-body); }
+.rl-axis { fill: var(--text-secondary); font: 600 11.5px var(--font-body); }
+.rl-dot { opacity: .55; }
+.rl-p1 { fill: #5B6472; }
+.rl-p8 { fill: #8B93A3; }
+.rl-p14 { fill: #B9C0CC; }
+.rl-roof { stroke: var(--warn); stroke-width: 1.8; stroke-dasharray: 6 5; opacity: .9; }
+.rl-band { stroke: var(--accent); stroke-width: 5; opacity: .35; stroke-linecap: round; }
+.rl-you { fill: var(--accent); stroke: #fff; stroke-width: 1.5; }
+.rl-you-label { fill: var(--text-primary); font: 700 12px var(--font-head); }
+.rl-legend { display: flex; flex-wrap: wrap; gap: var(--s-4); margin-top: var(--s-2); font-size: 12px; color: var(--text-muted); }
+.rl-legend .item { display: inline-flex; align-items: center; gap: 6px; }
+.rl-legend .sw { width: 10px; height: 10px; border-radius: 50%; }
+.rl-p1-sw { background: #5B6472; } .rl-p8-sw { background: #8B93A3; } .rl-p14-sw { background: #B9C0CC; }
+.rl-roof-sw { background: var(--warn); border-radius: 2px; height: 3px; width: 14px; }
+.rl-you-sw { background: var(--accent); }
+.viz-caption { font-size: 13px; color: var(--text-secondary); line-height: 1.6; margin-top: var(--s-3); }
+
 /* Multi-goal overview */
 .goal-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(230px, 1fr)); gap: var(--s-3); }
 .goal-card {