Upload 41 files
- MODEL_CARD.md +41 -0
- README.md +84 -3
- artifacts/benchmarks/accuracy_curve.html +0 -0
- artifacts/benchmarks/benchmark_dashboard.html +7 -0
- artifacts/benchmarks/gate_benchmarks.csv +4 -0
- artifacts/benchmarks/learned_gates.html +0 -0
- artifacts/benchmarks/loss_curve.html +0 -0
- artifacts/benchmarks/throughput_curve.html +0 -0
- artifacts/benchmarks_smoke/accuracy_curve.html +0 -0
- artifacts/benchmarks_smoke/benchmark_dashboard.html +7 -0
- artifacts/benchmarks_smoke/gate_benchmarks.csv +2 -0
- artifacts/benchmarks_smoke/learned_gates.html +0 -0
- artifacts/benchmarks_smoke/loss_curve.html +0 -0
- artifacts/benchmarks_smoke/throughput_curve.html +0 -0
- artifacts/runtime_gui/accuracy_curve.html +0 -0
- artifacts/runtime_gui/benchmark_dashboard.html +7 -0
- artifacts/runtime_gui/gate_benchmarks.csv +5 -0
- artifacts/runtime_gui/learned_gates.html +0 -0
- artifacts/runtime_gui/loss_curve.html +0 -0
- artifacts/runtime_gui/throughput_curve.html +0 -0
- pyproject.toml +33 -0
- src/openpeer_ntk_trainer.egg-info/PKG-INFO +109 -0
- src/openpeer_ntk_trainer.egg-info/SOURCES.txt +15 -0
- src/openpeer_ntk_trainer.egg-info/dependency_links.txt +1 -0
- src/openpeer_ntk_trainer.egg-info/entry_points.txt +2 -0
- src/openpeer_ntk_trainer.egg-info/requires.txt +23 -0
- src/openpeer_ntk_trainer.egg-info/top_level.txt +1 -0
- src/openpeer_trainer/__init__.py +4 -0
- src/openpeer_trainer/__pycache__/__init__.cpython-311.pyc +0 -0
- src/openpeer_trainer/__pycache__/benchmarks.cpython-311.pyc +0 -0
- src/openpeer_trainer/__pycache__/cli.cpython-311.pyc +0 -0
- src/openpeer_trainer/__pycache__/controller.cpython-311.pyc +0 -0
- src/openpeer_trainer/__pycache__/gui.cpython-311.pyc +0 -0
- src/openpeer_trainer/__pycache__/hardware.cpython-311.pyc +0 -0
- src/openpeer_trainer/__pycache__/smoke.cpython-311.pyc +0 -0
- src/openpeer_trainer/benchmarks.py +344 -0
- src/openpeer_trainer/cli.py +90 -0
- src/openpeer_trainer/controller.py +64 -0
- src/openpeer_trainer/gui.py +96 -0
- src/openpeer_trainer/hardware.py +95 -0
- src/openpeer_trainer/smoke.py +150 -0
MODEL_CARD.md
ADDED
|
@@ -0,0 +1,41 @@
|
| 1 |
+
# OpenPeerLLM NTK Trainer
|
| 2 |
+
|
| 3 |
+
## Model Overview
|
| 4 |
+
|
| 5 |
+
This package provides a LoRA-free training workflow for OpenPeerLLM-style causal language models by fitting signed log-gate controllers with ntkmirror. It also includes a tinygrad-based gate-controller smoke demo and a benchmark suite that generates charts for quick inspection.
|
| 6 |
+
|
| 7 |
+
## Authors
|
| 8 |
+
|
| 9 |
+
* Andrew Magdy Kamal Nassief
|
| 10 |
+
* Riemann Computing Inc.
|
| 11 |
+
* OpenPeer AI
|
| 12 |
+
|
| 13 |
+
## Intended Use
|
| 14 |
+
|
| 15 |
+
* Fit sparse forward-pass controllers on top of frozen Hugging Face causal language models.
|
| 16 |
+
* Run a low-cost local demo that validates gate training logic with tinygrad.
|
| 17 |
+
* Generate benchmark artifacts and charts for performance comparisons.
|
| 18 |
+
* Stop the demo run early once the requested accuracy target is reached.
|
| 19 |
+
|
| 20 |
+
## Dependencies
|
| 21 |
+
|
| 22 |
+
* ntkmirror: https://github.com/leochlon/ntkmirror
|
| 23 |
+
* Tinygrad: https://github.com/tinygrad/tinygrad
|
| 24 |
+
* Optional charting: OpenBB
|
| 25 |
+
|
| 26 |
+
## Benchmark Outputs
|
| 27 |
+
|
| 28 |
+
The benchmark runner records:
|
| 29 |
+
|
| 30 |
+
* epoch
|
| 31 |
+
* training steps
|
| 32 |
+
* wall-clock time
|
| 33 |
+
* memory usage
|
| 34 |
+
* process and thread counts
|
| 35 |
+
* samples per second
|
| 36 |
+
* initial and final accuracy
|
| 37 |
+
* final loss
|
| 38 |
+
* predictability score
|
| 39 |
+
* learned gate scales
|
| 40 |
+
|
| 41 |
+
Charts are written as HTML. The benchmark command writes a combined dashboard HTML plus companion charts, prefers OpenBB chart rendering when the optional chart extra is installed, and otherwise falls back to Plotly so the workflow stays runnable.
|
README.md
CHANGED
|
@@ -1,3 +1,84 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
| 1 |
+
# OpenPeer NTK Trainer
|
| 2 |
+
|
| 3 |
+
This workspace contains four related paths:
|
| 4 |
+
|
| 5 |
+
* A real fine-tuning path that uses [ntkmirror](https://github.com/leochlon/ntkmirror) to fit signed log-gate controllers on a frozen Hugging Face causal LLM.
|
| 6 |
+
* A tinygrad-backed smoke demo that trains only gate parameters on a synthetic task so the controller idea can be validated locally and cheaply.
|
| 7 |
+
* A benchmark pipeline that records accuracy, loss, memory, process counts, predictability, and throughput, then renders a combined dashboard plus OpenBB-backed charts.
|
| 8 |
+
* A runtime GUI for live benchmark runs with current hardware specs baked into the view.
|
| 9 |
+
|
| 10 |
+
The OpenPeerLLM model card currently points at `OpenPeerAI/OpenPeerLLM`, but that repository card is not a standard inference-ready Hugging Face example. The trainer therefore targets any causal LM that `transformers` can load, with `OpenPeerAI/OpenPeerLLM` as the primary model ID and a smaller fallback for local demos.
|
| 11 |
+
|
| 12 |
+
## Install
|
| 13 |
+
|
| 14 |
+
```powershell
|
| 15 |
+
pip install -e .
|
| 16 |
+
pip install tinygrad
|
| 17 |
+
pip install git+https://github.com/leochlon/ntkmirror.git
|
| 18 |
+
```
|
| 19 |
+
|
| 20 |
+
If you only want the local demo, install the demo extra instead:
|
| 21 |
+
|
| 22 |
+
```powershell
|
| 23 |
+
pip install -e ".[demo]"
|
| 24 |
+
```
|
| 25 |
+
|
| 26 |
+
To enable OpenBB-backed chart generation for benchmarks, install the chart extra too:
|
| 27 |
+
|
| 28 |
+
```powershell
|
| 29 |
+
pip install -e ".[demo,charts]"
|
| 30 |
+
```
|
| 31 |
+
|
| 32 |
+
To enable the runtime GUI, install the GUI extra:
|
| 33 |
+
|
| 34 |
+
```powershell
|
| 35 |
+
pip install -e ".[gui]"
|
| 36 |
+
```
|
| 37 |
+
|
| 38 |
+
## Run the tinygrad demo
|
| 39 |
+
|
| 40 |
+
```powershell
|
| 41 |
+
python -m openpeer_trainer.cli demo --steps 100 --target-accuracy 0.99
|
| 42 |
+
```
|
| 43 |
+
|
| 44 |
+
The demo stops early as soon as it reaches the requested accuracy target.
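
The early-stop behaviour can be illustrated with a minimal sketch. This is not the code in `smoke.py`, which this diff does not show: `train_until` and `toy_step` are hypothetical names, and the toy step simply gains one correct sample out of 128 per step, starting from 115/128.

```python
def train_until(step_fn, max_steps, target_accuracy):
    """Call step_fn up to max_steps times; stop once accuracy >= target_accuracy."""
    accuracy = 0.0
    for step in range(1, max_steps + 1):
        accuracy = step_fn(step)
        if accuracy >= target_accuracy:
            # Target reached: report how many steps were actually run.
            return step, accuracy, True
    return max_steps, accuracy, False


def toy_step(step):
    """Hypothetical stand-in for a training step: one more correct sample per step."""
    return min(128, 115 + step) / 128


steps_run, final_accuracy, reached = train_until(
    toy_step, max_steps=100, target_accuracy=0.99
)
print(steps_run, final_accuracy, reached)
```

With a budget of 100 steps, the loop returns as soon as the accuracy crosses the target, so the reported step count can be lower than the requested one.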
|
| 45 |
+
|
| 46 |
+
## Run benchmarks and charts
|
| 47 |
+
|
| 48 |
+
```powershell
|
| 49 |
+
python -m openpeer_trainer.cli bench --steps 10 25 50 --target-accuracy 0.99
|
| 50 |
+
```
|
| 51 |
+
|
| 52 |
+
The benchmark runner writes a CSV plus HTML charts under `artifacts/benchmarks/`. The main output is `benchmark_dashboard.html`, a multi-panel dashboard showing memory, processes, learned gates, loss, predictability, accuracy, training steps, and wall-clock time in seconds, each plotted against epoch. If the OpenBB charting extension is installed, the companion charts are rendered through OpenBB; otherwise the script falls back to Plotly with the same data.
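
The optional-backend fallback can be sketched as follows. How `benchmarks.py` actually detects OpenBB is not shown in this diff; the function name `pick_chart_backend` and the module names passed to it are assumptions for illustration.

```python
import importlib.util


def pick_chart_backend(preferred="openbb", fallback="plotly"):
    """Return the preferred backend name if its module is importable, else the fallback."""
    # find_spec returns None when a top-level module cannot be found,
    # without importing it.
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


print(pick_chart_backend())
```

Checking with `find_spec` avoids paying the import cost of a heavy charting package just to learn whether it is installed.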
|
| 53 |
+
|
| 54 |
+
## Launch the runtime GUI
|
| 55 |
+
|
| 56 |
+
```powershell
|
| 57 |
+
python -m openpeer_trainer.cli gui
|
| 58 |
+
```
|
| 59 |
+
|
| 60 |
+
The GUI shows the same dashboard, a live benchmark runner, and a hardware-spec table for this computer.
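
A hardware-spec table of this kind can be approximated with the standard library alone. The sketch below is not the contents of `hardware.py`, which this diff does not show; `hardware_summary` is a hypothetical name, and fields that need a third-party package (physical core count, memory totals, CUDA detection) are left out.

```python
import os
import platform
import shutil


def hardware_summary(path="."):
    """Collect a few host facts using only the standard library."""
    disk = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        "Hostname": platform.node(),
        "Platform": f"{platform.system()} {platform.release()}",
        # platform.processor() can be empty on some systems; fall back to the arch name.
        "CPU": platform.processor() or platform.machine(),
        "Logical Cores": os.cpu_count(),
        "Disk Total (GB)": round(disk.total / gib, 2),
        "Disk Free (GB)": round(disk.free / gib, 2),
        "Python": platform.python_version(),
    }


for metric, value in hardware_summary().items():
    print(f"{metric}: {value}")
```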
|
| 61 |
+
|
| 62 |
+
## Fit an ntkmirror controller
|
| 63 |
+
|
| 64 |
+
```powershell
|
| 65 |
+
python -m openpeer_trainer.cli fit --model OpenPeerAI/OpenPeerLLM --train-jsonl train.jsonl --out runs/openpeer_controller.pt
|
| 66 |
+
```
|
| 67 |
+
|
| 68 |
+
## JSONL format
|
| 69 |
+
|
| 70 |
+
Preferred schema:
|
| 71 |
+
|
| 72 |
+
```jsonl
|
| 73 |
+
{"prompt":"Question: 14 + 27 = ?\nAnswer:","completion":" 41"}
|
| 74 |
+
{"prompt":"Question: 36 + 18 = ?\nAnswer:","completion":" 54"}
|
| 75 |
+
```
|
| 76 |
+
|
| 77 |
+
The trainer also accepts `instruction`/`response`, `question`/`answer`, or `text` records when the underlying ntkmirror loader supports them.
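
Before running `fit`, a training file in the preferred schema can be checked with a few lines of Python. This is a minimal sketch, not part of the trainer: `parse_jsonl` is a hypothetical helper, and it only validates the `prompt`/`completion` schema.

```python
import json


def parse_jsonl(text):
    """Parse JSONL text and require string 'prompt' and 'completion' fields."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        for key in ("prompt", "completion"):
            if not isinstance(record.get(key), str):
                raise ValueError(f"line {line_no}: missing string field {key!r}")
        records.append(record)
    return records


# json.dumps escapes the newline inside each prompt, so every record stays on one line.
SAMPLE = "\n".join([
    json.dumps({"prompt": "Question: 14 + 27 = ?\nAnswer:", "completion": " 41"}),
    json.dumps({"prompt": "Question: 36 + 18 = ?\nAnswer:", "completion": " 54"}),
])

records = parse_jsonl(SAMPLE)
print(len(records), repr(records[0]["completion"]))
```

Note the leading space in each completion: the sample records keep it so that the completion continues directly after `Answer:`.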
|
| 78 |
+
|
| 79 |
+
## References
|
| 80 |
+
|
| 81 |
+
* OpenPeer AI / Riemann Computing Inc. / Andrew Magdy Kamal Nassief
|
| 82 |
+
* ntkmirror: https://github.com/leochlon/ntkmirror
|
| 83 |
+
* Tinygrad: https://github.com/tinygrad/tinygrad
|
| 84 |
+
|
artifacts/benchmarks/accuracy_curve.html
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
artifacts/benchmarks/benchmark_dashboard.html
ADDED
|
@@ -0,0 +1,7 @@
|
| 1 |
+
<html>
|
| 2 |
+
<head><meta charset="utf-8" /></head>
|
| 3 |
+
<body>
|
| 4 |
+
<div> <script>window.PlotlyConfig = {MathJaxConfig: 'local'};</script>
|
| 5 |
+
<script charset="utf-8" src="https://cdn.plot.ly/plotly-3.5.0.min.js" integrity="sha256-fHbNLP+GlIXN+efbQec78UkemUz3NJp7UmfGxC1tNxs=" crossorigin="anonymous"></script> <div id="03dcb642-6332-4448-9c66-e01c57110529" class="plotly-graph-div" style="height:1950px; width:2000px;"></div> <script> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("03dcb642-6332-4448-9c66-e01c57110529")) { Plotly.newPlot( "03dcb642-6332-4448-9c66-e01c57110529", [{"delta":{"reference":0.984375,"relative":false},"mode":"number+delta","number":{"font":{"color":"#22c55e","size":24},"valueformat":".1%"},"title":{"font":{"color":"#e2e8f0","size":14},"text":"Accuracy"},"value":0.9921875,"type":"indicator","domain":{"x":[0.0,0.2733333333333333],"y":[0.9183999999999999,0.9999999999999999]}},{"delta":{"reference":92.02626354294853,"relative":false},"mode":"number+delta","number":{"font":{"color":"#38bdf8","size":24},"valueformat":".2f"},"title":{"font":{"color":"#e2e8f0","size":14},"text":"Predictability"},"value":93.08860883605665,"type":"indicator","domain":{"x":[0.3333333333333333,0.6066666666666667],"y":[0.9183999999999999,0.9999999999999999]}},{"delta":{"reference":126.734375,"relative":false},"mode":"number+delta","number":{"font":{"color":"#f97316","size":24},"valueformat":".1f"},"title":{"font":{"color":"#e2e8f0","size":14},"text":"RSS 
MB"},"value":127.00390625,"type":"indicator","domain":{"x":[0.6666666666666666,0.94],"y":[0.9183999999999999,0.9999999999999999]}},{"line":{"color":"#22c55e","width":3},"mode":"lines+markers","name":"Accuracy","showlegend":false,"x":{"dtype":"i1","bdata":"AQID"},"y":{"dtype":"f8","bdata":"AAAAAACA7z8AAAAAAMDvPwAAAAAAwO8\u002f"},"type":"scatter","xaxis":"x","yaxis":"y"},{"line":{"color":"#38bdf8","width":3},"mode":"lines+markers","name":"Predictability","showlegend":false,"x":{"dtype":"i1","bdata":"AQID"},"y":{"dtype":"f8","bdata":"oIJITa4BV0D\u002fP2XEq0VXQP8\u002fZcSrRVdA"},"type":"scatter","xaxis":"x","yaxis":"y"},{"line":{"color":"#f97316","width":3},"mode":"lines+markers","name":"Loss","showlegend":false,"x":{"dtype":"i1","bdata":"AQID"},"y":{"dtype":"f8","bdata":"AAAAgMpFtT8AAADAlFWyPwAAAMCUVbI\u002f"},"type":"scatter","xaxis":"x2","yaxis":"y2"},{"line":{"color":"#a855f7","width":3},"mode":"lines+markers","name":"Wall Time (s)","showlegend":false,"x":{"dtype":"i1","bdata":"AQID"},"y":{"dtype":"f8","bdata":"AACIa7WnBkAAACCn17zyPwAA4Da0d+g\u002f"},"type":"scatter","xaxis":"x3","yaxis":"y3"},{"marker":{"color":"#f97316"},"name":"Memory MB","showlegend":false,"x":{"dtype":"i1","bdata":"AQID"},"y":{"dtype":"f8","bdata":"AAAAAACvX0AAAAAAQLpfQAAAAABAwF9A"},"type":"bar","xaxis":"x3","yaxis":"y4"},{"line":{"color":"#a855f7","width":3},"mode":"lines+markers","name":"Wall Time (s)","showlegend":false,"x":{"dtype":"i1","bdata":"AQID"},"y":{"dtype":"f8","bdata":"AACIa7WnBkAAACCn17zyPwAA4Da0d+g\u002f"},"type":"scatter","xaxis":"x4","yaxis":"y5"},{"line":{"color":"#14b8a6","width":3},"mode":"lines+markers","name":"Training 
Steps","showlegend":false,"x":{"dtype":"i1","bdata":"AQID"},"y":{"dtype":"i1","bdata":"ChIS"},"type":"scatter","xaxis":"x5","yaxis":"y6"},{"marker":{"color":"#38bdf8"},"name":"Samples\u002fsec","showlegend":false,"x":{"dtype":"i1","bdata":"AQID"},"y":{"dtype":"f8","bdata":"Pn\u002fX\u002f+s\u002fbEBFwH+o7ViVQIBK7qYgWbBA"},"type":"bar","xaxis":"x6","yaxis":"y7"},{"marker":{"color":"#a855f7"},"name":"Gate Scale","showlegend":false,"x":["c0","c1","c2","c3","c4","c5","c6","c7","c8","c9","c10","c11"],"y":{"dtype":"f8","bdata":"AAAAIN0N6z8AAAAg+1HlPwAAAEA9Ruc\u002fAAAAYIvn7j8AAABAjhDoPwAAAGA4BvU\u002fAAAAYO4N8D8AAABAMdvzPwAAAGAEdvI\u002fAAAAQPz89j8AAADA3O35PwAAAMDFnPo\u002f"},"type":"bar","xaxis":"x7","yaxis":"y8"},{"colorscale":[[0.0,"rgb(103,0,31)"],[0.1,"rgb(178,24,43)"],[0.2,"rgb(214,96,77)"],[0.3,"rgb(244,165,130)"],[0.4,"rgb(253,219,199)"],[0.5,"rgb(247,247,247)"],[0.6,"rgb(209,229,240)"],[0.7,"rgb(146,197,222)"],[0.8,"rgb(67,147,195)"],[0.9,"rgb(33,102,172)"],[1.0,"rgb(5,48,97)"]],"showscale":false,"x":["final_accuracy","predictability_score","final_loss","wall_time_sec","memory_rss_mb","samples_per_sec"],"y":["final_accuracy","predictability_score","final_loss","wall_time_sec","memory_rss_mb","samples_per_sec"],"z":{"dtype":"f8","bdata":"AAAAAAAA8D\u002fi\u002f\u002f\u002f\u002f\u002f\u002f\u002fvPwAAAAAAAPC\u002fdMdnsrRx778rNJdbTBDuP\u002fj7UN4WHec\u002f4v\u002f\u002f\u002f\u002f\u002f\u002f7z8AAAAAAADwPwAAAAAAAPC\u002fhsdnsrRx7781NJdbTBDuP+r7UN4WHec\u002fAAAAAAAA8L8AAAAAAADwvwAAAAAAAPA\u002fZcdnsrRx7z8dNJdbTBDuv+77UN4WHee\u002fdMdnsrRx77+Gx2eytHHvv2XHZ7K0ce8\u002fAAAAAAAA8D90pSJFaZPvv2QaMFGr0eq\u002fKzSXW0wQ7j81NJdbTBDuPx00l1tMEO6\u002fdKUiRWmT778AAAAAAADwP3cDtzPjS+0\u002f+PtQ3hYd5z\u002fq+1DeFh3nP+77UN4WHee\u002fZBowUavR6r93A7cz40vtPwAAAAAAAPA\u002f","shape":"6, 
6"},"zmid":0,"type":"heatmap","xaxis":"x8","yaxis":"y9"},{"marker":{"color":{"dtype":"f8","bdata":"AAAAAACvX0AAAAAAQLpfQAAAAABAwF9A"},"colorscale":[[0.0,"#440154"],[0.1111111111111111,"#482878"],[0.2222222222222222,"#3e4989"],[0.3333333333333333,"#31688e"],[0.4444444444444444,"#26828e"],[0.5555555555555556,"#1f9e89"],[0.6666666666666666,"#35b779"],[0.7777777777777778,"#6ece58"],[0.8888888888888888,"#b5de2b"],[1.0,"#fde725"]],"showscale":true,"size":14},"mode":"markers+text","name":"Accuracy\u002fLoss","showlegend":false,"text":["1","2","3"],"textposition":"top center","x":{"dtype":"f8","bdata":"AAAAgMpFtT8AAADAlFWyPwAAAMCUVbI\u002f"},"y":{"dtype":"f8","bdata":"AAAAAACA7z8AAAAAAMDvPwAAAAAAwO8\u002f"},"type":"scatter","xaxis":"x9","yaxis":"y10"},{"cells":{"align":"left","fill":{"color":"#111827"},"font":{"color":"#e2e8f0","size":12},"height":24,"values":[["Hostname","Platform","CPU","Physical Cores","Logical Cores","Memory Total (GB)","Memory Available (GB)","Disk Total (GB)","Disk Free (GB)","Python","CUDA","CUDA Device"],["MSI","Windows 10","Intel64 Family 6 Model 158 Stepping 13, GenuineIntel","6","12","63.85","41.58","864.58","397.57","3.11.5","no","cpu"]]},"header":{"align":"left","fill":{"color":"#0f172a"},"font":{"color":"#e2e8f0","size":14},"height":28,"values":["\u003cb\u003eMetric\u003c\u002fb\u003e","\u003cb\u003eValue\u003c\u002fb\u003e"]},"type":"table","domain":{"x":[0.0,0.94],"y":[0.0,0.16319999999999998]}}], 
{"template":{"data":{"barpolar":[{"marker":{"line":{"color":"rgb(17,17,17)","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"bar":[{"error_x":{"color":"#f2f5fa"},"error_y":{"color":"#f2f5fa"},"marker":{"line":{"color":"rgb(17,17,17)","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"carpet":[{"aaxis":{"endlinecolor":"#A2B1C6","gridcolor":"#506784","linecolor":"#506784","minorgridcolor":"#506784","startlinecolor":"#A2B1C6"},"baxis":{"endlinecolor":"#A2B1C6","gridcolor":"#506784","linecolor":"#506784","minorgridcolor":"#506784","startlinecolor":"#A2B1C6"},"type":"carpet"}],"choropleth":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"choropleth"}],"contourcarpet":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"contourcarpet"}],"contour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"contour"}],"heatmap":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"heatmap"}],"histogram2dcontour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2dcontour"}],"histogram2d":[{"colorbar":{"outlinewidth":0,"ticks":""
},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2d"}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"mesh3d":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"mesh3d"}],"parcoords":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"parcoords"}],"pie":[{"automargin":true,"type":"pie"}],"scatter3d":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatter3d"}],"scattercarpet":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattercarpet"}],"scattergeo":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergeo"}],"scattergl":[{"marker":{"line":{"color":"#283442"}},"type":"scattergl"}],"scattermapbox":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermapbox"}],"scattermap":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermap"}],"scatterpolargl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolargl"}],"scatterpolar":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolar"}],"scatter":[{"marker":{"line":{"color":"#283442"}},"type":"scatter"}],"scatterternary":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterternary"}],"surface":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"surface"}],"table":[{"cells":{"fill":{"color":"#506784"},"line":{"color":"rgb(17,17,17)"}}
,"header":{"fill":{"color":"#2a3f5f"},"line":{"color":"rgb(17,17,17)"}},"type":"table"}]},"layout":{"annotationdefaults":{"arrowcolor":"#f2f5fa","arrowhead":0,"arrowwidth":1},"autotypenumbers":"strict","coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]],"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]},"colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#f2f5fa"},"geo":{"bgcolor":"rgb(17,17,17)","lakecolor":"rgb(17,17,17)","landcolor":"rgb(17,17,17)","showlakes":true,"showland":true,"subunitcolor":"#506784"},"hoverlabel":{"align":"left"},"hovermode":"closest","mapbox":{"style":"dark"},"paper_bgcolor":"rgb(17,17,17)","plot_bgcolor":"rgb(17,17,17)","polar":{"angularaxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""},"bgcolor":"rgb(17,17,17)","radialaxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""}},"scene":{"xaxis":{"backgroundcolor":"rgb(17,17,17)","gridcolor":"#506784","gridwidth":2,"linecolor":"#506784","showbackground":true,"ticks":"","zerolinecolor":"#C8D4E3"},"yaxis":{"backgroundcolor":"rgb(17,17,17)","gridcolor":"#506784","gridwidth":2,"linecolor":"#506784","showbackground":true,"ticks":"","zerolinecolor":"#C8D4E3"},"zaxis":{"
backgroundcolor":"rgb(17,17,17)","gridcolor":"#506784","gridwidth":2,"linecolor":"#506784","showbackground":true,"ticks":"","zerolinecolor":"#C8D4E3"}},"shapedefaults":{"line":{"color":"#f2f5fa"}},"sliderdefaults":{"bgcolor":"#C8D4E3","bordercolor":"rgb(17,17,17)","borderwidth":1,"tickwidth":0},"ternary":{"aaxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""},"baxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""},"bgcolor":"rgb(17,17,17)","caxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""}},"title":{"x":0.05},"updatemenudefaults":{"bgcolor":"#506784","borderwidth":0},"xaxis":{"automargin":true,"gridcolor":"#283442","linecolor":"#506784","ticks":"","title":{"standoff":15},"zerolinecolor":"#283442","zerolinewidth":2},"yaxis":{"automargin":true,"gridcolor":"#283442","linecolor":"#506784","ticks":"","title":{"standoff":15},"zerolinecolor":"#283442","zerolinewidth":2}}},"xaxis":{"anchor":"y","domain":[0.0,0.2733333333333333],"title":{"text":"Epoch"},"tickmode":"linear","dtick":1},"yaxis":{"anchor":"x","domain":[0.6888,0.8383999999999999]},"xaxis2":{"anchor":"y2","domain":[0.3333333333333333,0.6066666666666667],"title":{"text":"Epoch"},"tickmode":"linear","dtick":1},"yaxis2":{"anchor":"x2","domain":[0.6888,0.8383999999999999]},"xaxis3":{"anchor":"y3","domain":[0.6666666666666666,0.94],"title":{"text":"Epoch"},"tickmode":"linear","dtick":1},"yaxis3":{"anchor":"x3","domain":[0.6888,0.8383999999999999],"title":{"text":"Seconds"}},"yaxis4":{"anchor":"x3","overlaying":"y3","side":"right","title":{"text":"Memory 
MB"}},"xaxis4":{"anchor":"y5","domain":[0.0,0.2733333333333333],"title":{"text":"Epoch"},"tickmode":"linear","dtick":1},"yaxis5":{"anchor":"x4","domain":[0.4728,0.6088]},"xaxis5":{"anchor":"y6","domain":[0.3333333333333333,0.6066666666666667],"title":{"text":"Epoch"},"tickmode":"linear","dtick":1},"yaxis6":{"anchor":"x5","domain":[0.4728,0.6088]},"xaxis6":{"anchor":"y7","domain":[0.6666666666666666,0.94],"title":{"text":"Epoch"},"tickmode":"linear","dtick":1},"yaxis7":{"anchor":"x6","domain":[0.4728,0.6088]},"xaxis7":{"anchor":"y8","domain":[0.0,0.2733333333333333]},"yaxis8":{"anchor":"x7","domain":[0.24319999999999997,0.3927999999999999]},"xaxis8":{"anchor":"y9","domain":[0.3333333333333333,0.6066666666666667]},"yaxis9":{"anchor":"x8","domain":[0.24319999999999997,0.3927999999999999]},"xaxis9":{"anchor":"y10","domain":[0.6666666666666666,0.94]},"yaxis10":{"anchor":"x9","domain":[0.24319999999999997,0.3927999999999999]},"annotations":[{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Final Accuracy","x":0.13666666666666666,"xanchor":"center","xref":"paper","y":0.9999999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Predictability","x":0.47,"xanchor":"center","xref":"paper","y":0.9999999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Peak Memory","x":0.8033333333333332,"xanchor":"center","xref":"paper","y":0.9999999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Accuracy vs Epoch","x":0.13666666666666666,"xanchor":"center","xref":"paper","y":0.8383999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Loss vs 
Epoch","x":0.47,"xanchor":"center","xref":"paper","y":0.8383999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Memory and Processes","x":0.8033333333333332,"xanchor":"center","xref":"paper","y":0.8383999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Training Time","x":0.13666666666666666,"xanchor":"center","xref":"paper","y":0.6088,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Training Steps","x":0.47,"xanchor":"center","xref":"paper","y":0.6088,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Throughput","x":0.8033333333333332,"xanchor":"center","xref":"paper","y":0.6088,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Learned Gates","x":0.13666666666666666,"xanchor":"center","xref":"paper","y":0.3927999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Metric Correlation","x":0.47,"xanchor":"center","xref":"paper","y":0.3927999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Accuracy vs Loss","x":0.8033333333333332,"xanchor":"center","xref":"paper","y":0.3927999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Hardware Specs","x":0.47,"xanchor":"center","xref":"paper","y":0.16319999999999998,"yanchor":"bottom","yref":"paper","yshift":10}],"title":{"text":"OpenPeer NTK Trainer Benchmark Dashboard","x":0.02},"font":{"color":"#e2e8f0","size":12},"margin":{"l":30,"r":30,"t":90,"b":30},"height":1950,"width":2000,"paper_bgcolor":"#0f172a","plot_bgcolor":"#0f172a","showlegend":false}, {"responsive": true} ) }; </script> </div>
|
| 6 |
+
</body>
|
| 7 |
+
</html>
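
The `bdata` strings in the dashboard traces are base64-encoded numeric arrays, so the plotted values can be read back without opening the chart. The sketch below assumes the `dtype` codes are NumPy-style (`i1` for 8-bit signed integers, `f8` for 64-bit floats) and little-endian; `decode_bdata` is a hypothetical helper, and `\u002f` in the HTML is the JSON escape for `/`.

```python
import base64
import struct

# Assumed mapping from the dtype codes seen in this file to struct format characters.
_FORMATS = {"i1": "b", "f8": "d"}


def decode_bdata(bdata, dtype):
    """Decode a base64 typed-array payload into a list of Python numbers."""
    raw = base64.b64decode(bdata)
    fmt = _FORMATS[dtype]
    count = len(raw) // struct.calcsize(fmt)
    return list(struct.unpack(f"<{count}{fmt}", raw))


epochs = decode_bdata("AQID", "i1")
steps = decode_bdata("ChIS", "i1")
accuracy = decode_bdata("AAAAAACA7z8AAAAAAMDvPwAAAAAAwO8/", "f8")
print(epochs, steps, accuracy)
```

The decoded training steps and accuracies agree with the `steps` and `final_accuracy` columns of `gate_benchmarks.csv`.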
|
artifacts/benchmarks/gate_benchmarks.csv
ADDED
|
@@ -0,0 +1,4 @@
|
| 1 |
+
epoch,steps,wall_time_sec,samples_per_sec,initial_accuracy,final_accuracy,final_loss,predictability_score,memory_rss_mb,child_processes,thread_count,reached_target,trained_steps,target_accuracy
|
| 2 |
+
1,10,2.831889000022784,225.99755851830736,0.8984375,0.984375,0.08309617638587952,92.02626354294853,126.734375,0.0,17.0,0,10,0.99
|
| 3 |
+
2,18,1.1711041000671685,1366.2320880852797,0.8984375,0.9921875,0.07161836326122284,93.08860883605665,126.91015625,0.0,17.0,1,18,0.99
|
| 4 |
+
3,18,0.7646123000886291,4185.127547162236,0.8984375,0.9921875,0.07161836326122284,93.08860883605665,127.00390625,0.0,17.0,1,18,0.99
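
As a worked check on these rows, multiplying `samples_per_sec` by `wall_time_sec` recovers the number of samples each run was credited with. The sketch below embeds a trimmed copy of the three rows, keeping only the columns it needs; `implied_samples` is a hypothetical helper.

```python
import csv
import io

# Trimmed copy of the rows above.
CSV_TEXT = """epoch,steps,wall_time_sec,samples_per_sec
1,10,2.831889000022784,225.99755851830736
2,18,1.1711041000671685,1366.2320880852797
3,18,0.7646123000886291,4185.127547162236
"""


def implied_samples(csv_text):
    """Return round(samples_per_sec * wall_time_sec) for each row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        round(float(row["samples_per_sec"]) * float(row["wall_time_sec"]))
        for row in rows
    ]


print(implied_samples(CSV_TEXT))
```

The products are 640, 1600, and 3200, that is 64 × 10, 64 × 25, and 64 × 50. This matches the `--steps 10 25 50` invocation in the README rather than the 10, 18, and 18 steps actually trained, so throughput for the early-stopped runs may be based on requested rather than completed steps. Both the batch size of 64 and that reading are inferences from the numbers; the diff shown here does not state either.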
|
artifacts/benchmarks/learned_gates.html
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
artifacts/benchmarks/loss_curve.html
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
artifacts/benchmarks/throughput_curve.html
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
artifacts/benchmarks_smoke/accuracy_curve.html
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
artifacts/benchmarks_smoke/benchmark_dashboard.html
ADDED
|
@@ -0,0 +1,7 @@
|
| 1 |
+
<html>
|
| 2 |
+
<head><meta charset="utf-8" /></head>
|
| 3 |
+
<body>
|
| 4 |
+
<div> <script>window.PlotlyConfig = {MathJaxConfig: 'local'};</script>
|
| 5 |
+
<script charset="utf-8" src="https://cdn.plot.ly/plotly-3.5.0.min.js" integrity="sha256-fHbNLP+GlIXN+efbQec78UkemUz3NJp7UmfGxC1tNxs=" crossorigin="anonymous"></script> <div id="86166dfc-e861-4b47-bd1f-68aa768a2a62" class="plotly-graph-div" style="height:1800px; width:1900px;"></div> <script> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("86166dfc-e861-4b47-bd1f-68aa768a2a62")) { Plotly.newPlot( "86166dfc-e861-4b47-bd1f-68aa768a2a62", [{"delta":{"reference":0.35546875,"relative":false},"mode":"number+delta","number":{"font":{"color":"#22c55e","size":28},"valueformat":".1%"},"title":{"font":{"color":"#e2e8f0","size":16},"text":"Accuracy"},"value":0.35546875,"type":"indicator","domain":{"x":[0.0,0.27999999999999997],"y":[0.8175,1.0]}},{"delta":{"reference":0.00011213638080533762,"relative":false},"mode":"number+delta","number":{"font":{"color":"#38bdf8","size":28},"valueformat":".1f"},"title":{"font":{"color":"#e2e8f0","size":16},"text":"Predictability"},"value":0.00011213638080533762,"type":"indicator","domain":{"x":[0.32999999999999996,0.6099999999999999],"y":[0.8175,1.0]}},{"delta":{"reference":165.609375,"relative":false},"mode":"number+delta","number":{"font":{"color":"#f97316","size":28},"valueformat":".1f"},"title":{"font":{"color":"#e2e8f0","size":16},"text":"RSS 
MB"},"value":165.609375,"type":"indicator","domain":{"x":[0.6599999999999999,0.94],"y":[0.8175,1.0]}},{"line":{"color":"#22c55e","width":3},"mode":"lines+markers","name":"Accuracy","x":{"dtype":"f8","bdata":"AAAAAAAA8D8="},"y":{"dtype":"f8","bdata":"AAAAAADA1j8="},"type":"scatter","xaxis":"x","yaxis":"y"},{"line":{"color":"#38bdf8","width":3},"mode":"lines+markers","name":"Predictability","x":{"dtype":"f8","bdata":"AAAAAAAA8D8="},"y":{"dtype":"f8","bdata":"DWZeWlhlHT8="},"type":"scatter","xaxis":"x","yaxis":"y"},{"line":{"color":"#f97316","width":3},"mode":"lines+markers","name":"Loss","x":{"dtype":"f8","bdata":"AAAAAAAA8D8="},"y":{"dtype":"f8","bdata":"AAAA4ORmK0A="},"type":"scatter","xaxis":"x2","yaxis":"y2"},{"line":{"color":"#a855f7","width":3},"mode":"lines+markers","name":"Wall Time","x":{"dtype":"f8","bdata":"AAAAAAAA8D8="},"y":{"dtype":"f8","bdata":"AAD6tIqSJ0A="},"type":"scatter","xaxis":"x3","yaxis":"y3"},{"marker":{"color":"#f97316"},"name":"Memory MB","x":{"dtype":"f8","bdata":"AAAAAAAA8D8="},"y":{"dtype":"f8","bdata":"AAAAAICzZEA="},"type":"bar","xaxis":"x3","yaxis":"y4"},{"line":{"color":"#a855f7","width":3},"mode":"lines+markers","name":"Time","x":{"dtype":"f8","bdata":"AAAAAAAA8D8="},"y":{"dtype":"f8","bdata":"AAD6tIqSJ0A="},"type":"scatter","xaxis":"x4","yaxis":"y5"},{"line":{"color":"#14b8a6","width":3},"mode":"lines+markers","name":"Training Steps","x":{"dtype":"f8","bdata":"AAAAAAAA8D8="},"y":{"dtype":"f8","bdata":"AAAAAAAAFEA="},"type":"scatter","xaxis":"x5","yaxis":"y6"},{"marker":{"color":"#38bdf8"},"name":"Samples\u002fsec","x":{"dtype":"f8","bdata":"AAAAAAAA8D8="},"y":{"dtype":"f8","bdata":"2DeVLX4mG0A="},"type":"bar","xaxis":"x6","yaxis":"y7"},{"marker":{"color":"#a855f7"},"name":"Gate 
Scale","x":["c0","c1","c2","c3","c4","c5","c6","c7","c8","c9","c10","c11","c12","c13","c14","c15","c16","c17","c18","c19","c20","c21","c22","c23"],"y":{"dtype":"f8","bdata":"AAAAwKDt6D8AAAAARDn0PwAAAIC9H\u002fQ\u002fAAAAwFLg8z8AAACAUhDpPwAAAMCiCOk\u002fAAAA4JJh6T8AAABAGIXwPwAAAOBoOOk\u002fAAAAoHpE6T8AAAAAvXvpPwAAAMDfaek\u002fAAAAIKRp6T8AAABApQbpPwAAAMAO++k\u002fAAAA4Pej6T8AAAAghxPpPwAAAMBg8fA\u002fAAAAgEQp6T8AAAAgvsHzPwAAAKBYIuk\u002fAAAAYAi\u002f8z8AAABAJD\u002fpPwAAAMBp7\u002fM\u002f"},"type":"bar","xaxis":"x7","yaxis":"y8"},{"colorscale":[[0.0,"rgb(103,0,31)"],[0.1,"rgb(178,24,43)"],[0.2,"rgb(214,96,77)"],[0.3,"rgb(244,165,130)"],[0.4,"rgb(253,219,199)"],[0.5,"rgb(247,247,247)"],[0.6,"rgb(209,229,240)"],[0.7,"rgb(146,197,222)"],[0.8,"rgb(67,147,195)"],[0.9,"rgb(33,102,172)"],[1.0,"rgb(5,48,97)"]],"showscale":false,"x":["final_accuracy","predictability_score","final_loss","wall_time_sec","memory_rss_mb","samples_per_sec"],"y":["final_accuracy","predictability_score","final_loss","wall_time_sec","memory_rss_mb","samples_per_sec"],"z":{"dtype":"f8","bdata":"AAAAAAAA+H8AAAAAAAD4fwAAAAAAAPh\u002fAAAAAAAA+H8AAAAAAAD4fwAAAAAAAPh\u002fAAAAAAAA+H8AAAAAAAD4fwAAAAAAAPh\u002fAAAAAAAA+H8AAAAAAAD4fwAAAAAAAPh\u002fAAAAAAAA+H8AAAAAAAD4fwAAAAAAAPh\u002fAAAAAAAA+H8AAAAAAAD4fwAAAAAAAPh\u002fAAAAAAAA+H8AAAAAAAD4fwAAAAAAAPh\u002fAAAAAAAA+H8AAAAAAAD4fwAAAAAAAPh\u002fAAAAAAAA+H8AAAAAAAD4fwAAAAAAAPh\u002fAAAAAAAA+H8AAAAAAAD4fwAAAAAAAPh\u002fAAAAAAAA+H8AAAAAAAD4fwAAAAAAAPh\u002fAAAAAAAA+H8AAAAAAAD4fwAAAAAAAPh\u002f","shape":"6, 
6"},"zmid":0,"type":"heatmap","xaxis":"x8","yaxis":"y9"},{"marker":{"color":{"dtype":"f8","bdata":"AAAAAICzZEA="},"colorscale":[[0.0,"#440154"],[0.1111111111111111,"#482878"],[0.2222222222222222,"#3e4989"],[0.3333333333333333,"#31688e"],[0.4444444444444444,"#26828e"],[0.5555555555555556,"#1f9e89"],[0.6666666666666666,"#35b779"],[0.7777777777777778,"#6ece58"],[0.8888888888888888,"#b5de2b"],[1.0,"#fde725"]],"showscale":true,"size":14},"mode":"markers+text","name":"Accuracy\u002fLoss","text":["1.0"],"textposition":"top center","x":{"dtype":"f8","bdata":"AAAA4ORmK0A="},"y":{"dtype":"f8","bdata":"AAAAAADA1j8="},"type":"scatter","xaxis":"x9","yaxis":"y10"}], {"template":{"data":{"barpolar":[{"marker":{"line":{"color":"rgb(17,17,17)","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"bar":[{"error_x":{"color":"#f2f5fa"},"error_y":{"color":"#f2f5fa"},"marker":{"line":{"color":"rgb(17,17,17)","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"carpet":[{"aaxis":{"endlinecolor":"#A2B1C6","gridcolor":"#506784","linecolor":"#506784","minorgridcolor":"#506784","startlinecolor":"#A2B1C6"},"baxis":{"endlinecolor":"#A2B1C6","gridcolor":"#506784","linecolor":"#506784","minorgridcolor":"#506784","startlinecolor":"#A2B1C6"},"type":"carpet"}],"choropleth":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"choropleth"}],"contourcarpet":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"contourcarpet"}],"contour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"contour"}],"heatmap":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.33
33333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"heatmap"}],"histogram2dcontour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2dcontour"}],"histogram2d":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2d"}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"mesh3d":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"mesh3d"}],"parcoords":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"parcoords"}],"pie":[{"automargin":true,"type":"pie"}],"scatter3d":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatter3d"}],"scattercarpet":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattercarpet"}],"scattergeo":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergeo"}],"scattergl":[{"marker":{"line":{"color":"#283442"}},"type":"scattergl"}],"scattermapbox":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermapbox"}],"scattermap":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermap"}],"scatterpolargl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolargl"}],"scatterpolar":[{"marker":{"colorbar":{"outl
inewidth":0,"ticks":""}},"type":"scatterpolar"}],"scatter":[{"marker":{"line":{"color":"#283442"}},"type":"scatter"}],"scatterternary":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterternary"}],"surface":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"surface"}],"table":[{"cells":{"fill":{"color":"#506784"},"line":{"color":"rgb(17,17,17)"}},"header":{"fill":{"color":"#2a3f5f"},"line":{"color":"rgb(17,17,17)"}},"type":"table"}]},"layout":{"annotationdefaults":{"arrowcolor":"#f2f5fa","arrowhead":0,"arrowwidth":1},"autotypenumbers":"strict","coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]],"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]},"colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#f2f5fa"},"geo":{"bgcolor":"rgb(17,17,17)","lakecolor":"rgb(17,17,17)","landcolor":"rgb(17,17,17)","showlakes":true,"showland":true,"subunitcolor":"#506784"
},"hoverlabel":{"align":"left"},"hovermode":"closest","mapbox":{"style":"dark"},"paper_bgcolor":"rgb(17,17,17)","plot_bgcolor":"rgb(17,17,17)","polar":{"angularaxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""},"bgcolor":"rgb(17,17,17)","radialaxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""}},"scene":{"xaxis":{"backgroundcolor":"rgb(17,17,17)","gridcolor":"#506784","gridwidth":2,"linecolor":"#506784","showbackground":true,"ticks":"","zerolinecolor":"#C8D4E3"},"yaxis":{"backgroundcolor":"rgb(17,17,17)","gridcolor":"#506784","gridwidth":2,"linecolor":"#506784","showbackground":true,"ticks":"","zerolinecolor":"#C8D4E3"},"zaxis":{"backgroundcolor":"rgb(17,17,17)","gridcolor":"#506784","gridwidth":2,"linecolor":"#506784","showbackground":true,"ticks":"","zerolinecolor":"#C8D4E3"}},"shapedefaults":{"line":{"color":"#f2f5fa"}},"sliderdefaults":{"bgcolor":"#C8D4E3","bordercolor":"rgb(17,17,17)","borderwidth":1,"tickwidth":0},"ternary":{"aaxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""},"baxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""},"bgcolor":"rgb(17,17,17)","caxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""}},"title":{"x":0.05},"updatemenudefaults":{"bgcolor":"#506784","borderwidth":0},"xaxis":{"automargin":true,"gridcolor":"#283442","linecolor":"#506784","ticks":"","title":{"standoff":15},"zerolinecolor":"#283442","zerolinewidth":2},"yaxis":{"automargin":true,"gridcolor":"#283442","linecolor":"#506784","ticks":"","title":{"standoff":15},"zerolinecolor":"#283442","zerolinewidth":2}}},"xaxis":{"anchor":"y","domain":[0.0,0.27999999999999997]},"yaxis":{"anchor":"x","domain":[0.5449999999999999,0.7274999999999999]},"xaxis2":{"anchor":"y2","domain":[0.32999999999999996,0.6099999999999999]},"yaxis2":{"anchor":"x2","domain":[0.5449999999999999,0.7274999999999999]},"xaxis3":{"anchor":"y3","domain":[0.6599999999999999,0.94]},"yaxis3":{"anchor":"x3","domain":[0.5449999999999999,0.7274999999999999]},"yaxis4":{"anch
or":"x3","overlaying":"y3","side":"right"},"xaxis4":{"anchor":"y5","domain":[0.0,0.27999999999999997]},"yaxis5":{"anchor":"x4","domain":[0.27249999999999996,0.45499999999999996]},"xaxis5":{"anchor":"y6","domain":[0.32999999999999996,0.6099999999999999]},"yaxis6":{"anchor":"x5","domain":[0.27249999999999996,0.45499999999999996]},"xaxis6":{"anchor":"y7","domain":[0.6599999999999999,0.94]},"yaxis7":{"anchor":"x6","domain":[0.27249999999999996,0.45499999999999996]},"xaxis7":{"anchor":"y8","domain":[0.0,0.27999999999999997]},"yaxis8":{"anchor":"x7","domain":[0.0,0.1825]},"xaxis8":{"anchor":"y9","domain":[0.32999999999999996,0.6099999999999999]},"yaxis9":{"anchor":"x8","domain":[0.0,0.1825]},"xaxis9":{"anchor":"y10","domain":[0.6599999999999999,0.94]},"yaxis10":{"anchor":"x9","domain":[0.0,0.1825]},"annotations":[{"font":{"size":16},"showarrow":false,"text":"Final Accuracy","x":0.13999999999999999,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Predictability","x":0.4699999999999999,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Peak Memory","x":0.7999999999999999,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Accuracy vs Epoch","x":0.13999999999999999,"xanchor":"center","xref":"paper","y":0.7274999999999999,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Loss vs Epoch","x":0.4699999999999999,"xanchor":"center","xref":"paper","y":0.7274999999999999,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Memory and Processes","x":0.7999999999999999,"xanchor":"center","xref":"paper","y":0.7274999999999999,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Training 
Time","x":0.13999999999999999,"xanchor":"center","xref":"paper","y":0.45499999999999996,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Training Steps","x":0.4699999999999999,"xanchor":"center","xref":"paper","y":0.45499999999999996,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Throughput","x":0.7999999999999999,"xanchor":"center","xref":"paper","y":0.45499999999999996,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Learned Gates","x":0.13999999999999999,"xanchor":"center","xref":"paper","y":0.1825,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Metric Correlation","x":0.4699999999999999,"xanchor":"center","xref":"paper","y":0.1825,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Accuracy vs Loss","x":0.7999999999999999,"xanchor":"center","xref":"paper","y":0.1825,"yanchor":"bottom","yref":"paper"}],"font":{"color":"#e2e8f0","size":12},"legend":{"orientation":"h","yanchor":"bottom","y":1.02,"xanchor":"left","x":0.01},"margin":{"l":30,"r":30,"t":80,"b":30},"height":1800,"width":1900,"title":{"text":"OpenPeer NTK Trainer Benchmark Dashboard"},"paper_bgcolor":"#0f172a","plot_bgcolor":"#0f172a"}, {"responsive": true} ) }; </script> </div>
</body>
</html>
artifacts/benchmarks_smoke/gate_benchmarks.csv
ADDED
@@ -0,0 +1,2 @@
epoch,steps,wall_time_sec,samples_per_sec,initial_accuracy,final_accuracy,final_loss,predictability_score,memory_rss_mb,child_processes,thread_count
1.0,5.0,11.786214499967173,6.787590706093361,0.35546875,0.35546875,13.70096492767334,0.00011213638080533762,165.609375,0.0,17.0
artifacts/benchmarks_smoke/learned_gates.html
ADDED
The diff for this file is too large to render.
artifacts/benchmarks_smoke/loss_curve.html
ADDED
The diff for this file is too large to render.
artifacts/benchmarks_smoke/throughput_curve.html
ADDED
The diff for this file is too large to render.
artifacts/runtime_gui/accuracy_curve.html
ADDED
The diff for this file is too large to render.
artifacts/runtime_gui/benchmark_dashboard.html
ADDED
@@ -0,0 +1,7 @@
<html>
<head><meta charset="utf-8" /></head>
<body>
<div> <script>window.PlotlyConfig = {MathJaxConfig: 'local'};</script>
<script charset="utf-8" src="https://cdn.plot.ly/plotly-3.5.0.min.js" integrity="sha256-fHbNLP+GlIXN+efbQec78UkemUz3NJp7UmfGxC1tNxs=" crossorigin="anonymous"></script> <div id="04226705-d693-46fc-8af6-e0789649c68d" class="plotly-graph-div" style="height:1950px; width:2000px;"></div> <script> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("04226705-d693-46fc-8af6-e0789649c68d")) { Plotly.newPlot( "04226705-d693-46fc-8af6-e0789649c68d", [{"delta":{"reference":0.9921875,"relative":false},"mode":"number+delta","number":{"font":{"color":"#22c55e","size":24},"valueformat":".1%"},"title":{"font":{"color":"#e2e8f0","size":14},"text":"Accuracy"},"value":1.0,"type":"indicator","domain":{"x":[0.0,0.2733333333333333],"y":[0.9183999999999999,0.9999999999999999]}},{"delta":{"reference":92.82344714585152,"relative":false},"mode":"number+delta","number":{"font":{"color":"#38bdf8","size":24},"valueformat":".2f"},"title":{"font":{"color":"#e2e8f0","size":14},"text":"Predictability"},"value":96.45363171816193,"type":"indicator","domain":{"x":[0.3333333333333333,0.6066666666666667],"y":[0.9183999999999999,0.9999999999999999]}},{"delta":{"reference":363.21875,"relative":false},"mode":"number+delta","number":{"font":{"color":"#f97316","size":24},"valueformat":".1f"},"title":{"font":{"color":"#e2e8f0","size":14},"text":"RSS 
MB"},"value":363.76171875,"type":"indicator","domain":{"x":[0.6666666666666666,0.94],"y":[0.9183999999999999,0.9999999999999999]}},{"line":{"color":"#22c55e","width":3},"mode":"lines+markers","name":"Accuracy","showlegend":false,"x":{"dtype":"i1","bdata":"AQIDBA=="},"y":{"dtype":"f8","bdata":"AAAAAADA7z8AAAAAAMDvPwAAAAAAwO8\u002fAAAAAAAA8D8="},"type":"scatter","xaxis":"x","yaxis":"y"},{"line":{"color":"#38bdf8","width":3},"mode":"lines+markers","name":"Predictability","showlegend":false,"x":{"dtype":"i1","bdata":"AQIDBA=="},"y":{"dtype":"f8","bdata":"rVqoW7M0V0A0lCphAKtXQMGqToBW\u002fldAw3tUTQgdWEA="},"type":"scatter","xaxis":"x","yaxis":"y"},{"line":{"color":"#f97316","width":3},"mode":"lines+markers","name":"Loss","showlegend":false,"x":{"dtype":"i1","bdata":"AQIDBA=="},"y":{"dtype":"f8","bdata":"AAAAoIYQsz8AAAAAlQisPwAAAGAVCqU\u002fAAAAgLh8oj8="},"type":"scatter","xaxis":"x2","yaxis":"y2"},{"line":{"color":"#a855f7","width":3},"mode":"lines+markers","name":"Wall Time (s)","showlegend":false,"x":{"dtype":"i1","bdata":"AQIDBA=="},"y":{"dtype":"f8","bdata":"AAAwlcjP8z8AABBFur\u002f3PwAAKLr1KgFAAACAXl0KBUA="},"type":"scatter","xaxis":"x3","yaxis":"y3"},{"marker":{"color":"#f97316"},"name":"Memory MB","showlegend":false,"x":{"dtype":"i1","bdata":"AQIDBA=="},"y":{"dtype":"f8","bdata":"AAAAAICzdkAAAAAAgLN2QAAAAACwtXZAAAAAADC8dkA="},"type":"bar","xaxis":"x3","yaxis":"y4"},{"line":{"color":"#a855f7","width":3},"mode":"lines+markers","name":"Wall Time (s)","showlegend":false,"x":{"dtype":"i1","bdata":"AQIDBA=="},"y":{"dtype":"f8","bdata":"AAAwlcjP8z8AABBFur\u002f3PwAAKLr1KgFAAACAXl0KBUA="},"type":"scatter","xaxis":"x4","yaxis":"y5"},{"line":{"color":"#14b8a6","width":3},"mode":"lines+markers","name":"Training 
Steps","showlegend":false,"x":{"dtype":"i1","bdata":"AQIDBA=="},"y":{"dtype":"i1","bdata":"ChkyQA=="},"type":"scatter","xaxis":"x5","yaxis":"y6"},{"marker":{"color":"#38bdf8"},"name":"Samples\u002fsec","showlegend":false,"x":{"dtype":"i1","bdata":"AQIDBA=="},"y":{"dtype":"f8","bdata":"9dVpnPAmoEC8ZkypxdewQAfWLo2mTLdAm6iqd4NV2EA="},"type":"bar","xaxis":"x6","yaxis":"y7"},{"marker":{"color":"#a855f7"},"name":"Gate Scale","showlegend":false,"x":["c0","c1","c2","c3","c4","c5","c6","c7","c8","c9","c10","c11"],"y":{"dtype":"f8","bdata":"AAAAQBGD6j8AAAAAhtHfPwAAAABAx+g\u002fAAAAICR78z8AAABAh\u002fryPwAAAOBifvA\u002fAAAAQFXR8j8AAADgqB\u002f7PwAAAODVoPo\u002fAAAAwBp9+j8AAADAV8H9PwAAAKC9nANA"},"type":"bar","xaxis":"x7","yaxis":"y8"},{"colorscale":[[0.0,"rgb(103,0,31)"],[0.1,"rgb(178,24,43)"],[0.2,"rgb(214,96,77)"],[0.3,"rgb(244,165,130)"],[0.4,"rgb(253,219,199)"],[0.5,"rgb(247,247,247)"],[0.6,"rgb(209,229,240)"],[0.7,"rgb(146,197,222)"],[0.8,"rgb(67,147,195)"],[0.9,"rgb(33,102,172)"],[1.0,"rgb(5,48,97)"]],"showscale":false,"x":["final_accuracy","predictability_score","final_loss","wall_time_sec","memory_rss_mb","samples_per_sec"],"y":["final_accuracy","predictability_score","final_loss","wall_time_sec","memory_rss_mb","samples_per_sec"],"z":{"dtype":"f8","bdata":"AAAAAAAA8D+r7Y1GQ1vjP+TaR3NdROO\u002fx12xEJx26T\u002fgANT0CvruPwXM4\u002fQboe8\u002fq+2NRkNb4z8AAAAAAADwPwDMGerk\u002f++\u002fN+G\u002f7y3b7T+EP5sMeO3nP8jhxt3N\u002f+Y\u002f5NpHc11E478AzBnq5P\u002fvvwAAAAAAAPA\u002f4p3+AlfM7b8mbNPvZdbnv6qLAPi76+a\u002fx12xEJx26T834b\u002fvLdvtP+Kd\u002fgJXzO2\u002fAAAAAAAA8D+KiOh4T1ftP2+kpGuC7+s\u002f4ADU9Ar67j+EP5sMeO3nPyZs0+9l1ue\u002fiojoeE9X7T8AAAAAAADwP4T2a0ylne8\u002fBczj9Buh7z\u002fI4cbdzf\u002fmP6qLAPi76+a\u002fb6Ska4Lv6z+E9mtMpZ3vPwAAAAAAAPA\u002f","shape":"6, 
6"},"zmid":0,"type":"heatmap","xaxis":"x8","yaxis":"y9"},{"marker":{"color":{"dtype":"f8","bdata":"AAAAAICzdkAAAAAAgLN2QAAAAACwtXZAAAAAADC8dkA="},"colorscale":[[0.0,"#440154"],[0.1111111111111111,"#482878"],[0.2222222222222222,"#3e4989"],[0.3333333333333333,"#31688e"],[0.4444444444444444,"#26828e"],[0.5555555555555556,"#1f9e89"],[0.6666666666666666,"#35b779"],[0.7777777777777778,"#6ece58"],[0.8888888888888888,"#b5de2b"],[1.0,"#fde725"]],"showscale":true,"size":14},"mode":"markers+text","name":"Accuracy\u002fLoss","showlegend":false,"text":["1","2","3","4"],"textposition":"top center","x":{"dtype":"f8","bdata":"AAAAoIYQsz8AAAAAlQisPwAAAGAVCqU\u002fAAAAgLh8oj8="},"y":{"dtype":"f8","bdata":"AAAAAADA7z8AAAAAAMDvPwAAAAAAwO8\u002fAAAAAAAA8D8="},"type":"scatter","xaxis":"x9","yaxis":"y10"},{"cells":{"align":"left","fill":{"color":"#111827"},"font":{"color":"#e2e8f0","size":12},"height":24,"values":[["Hostname","Platform","CPU","Physical Cores","Logical Cores","Memory Total (GB)","Memory Available (GB)","Disk Total (GB)","Disk Free (GB)","Python","CUDA","CUDA Device"],["MSI","Windows 10","Intel64 Family 6 Model 158 Stepping 13, GenuineIntel","6","12","63.85","41.35","864.58","397.54","3.11.5","no","cpu"]]},"header":{"align":"left","fill":{"color":"#0f172a"},"font":{"color":"#e2e8f0","size":14},"height":28,"values":["\u003cb\u003eMetric\u003c\u002fb\u003e","\u003cb\u003eValue\u003c\u002fb\u003e"]},"type":"table","domain":{"x":[0.0,0.94],"y":[0.0,0.16319999999999998]}}], 
{"template":{"data":{"barpolar":[{"marker":{"line":{"color":"rgb(17,17,17)","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"bar":[{"error_x":{"color":"#f2f5fa"},"error_y":{"color":"#f2f5fa"},"marker":{"line":{"color":"rgb(17,17,17)","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"carpet":[{"aaxis":{"endlinecolor":"#A2B1C6","gridcolor":"#506784","linecolor":"#506784","minorgridcolor":"#506784","startlinecolor":"#A2B1C6"},"baxis":{"endlinecolor":"#A2B1C6","gridcolor":"#506784","linecolor":"#506784","minorgridcolor":"#506784","startlinecolor":"#A2B1C6"},"type":"carpet"}],"choropleth":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"choropleth"}],"contourcarpet":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"contourcarpet"}],"contour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"contour"}],"heatmap":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"heatmap"}],"histogram2dcontour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2dcontour"}],"histogram2d":[{"colorbar":{"outlinewidth":0,"ticks":""
},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2d"}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"mesh3d":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"mesh3d"}],"parcoords":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"parcoords"}],"pie":[{"automargin":true,"type":"pie"}],"scatter3d":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatter3d"}],"scattercarpet":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattercarpet"}],"scattergeo":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergeo"}],"scattergl":[{"marker":{"line":{"color":"#283442"}},"type":"scattergl"}],"scattermapbox":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermapbox"}],"scattermap":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermap"}],"scatterpolargl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolargl"}],"scatterpolar":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolar"}],"scatter":[{"marker":{"line":{"color":"#283442"}},"type":"scatter"}],"scatterternary":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterternary"}],"surface":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"surface"}],"table":[{"cells":{"fill":{"color":"#506784"},"line":{"color":"rgb(17,17,17)"}}
,"header":{"fill":{"color":"#2a3f5f"},"line":{"color":"rgb(17,17,17)"}},"type":"table"}]},"layout":{"annotationdefaults":{"arrowcolor":"#f2f5fa","arrowhead":0,"arrowwidth":1},"autotypenumbers":"strict","coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]],"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]},"colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#f2f5fa"},"geo":{"bgcolor":"rgb(17,17,17)","lakecolor":"rgb(17,17,17)","landcolor":"rgb(17,17,17)","showlakes":true,"showland":true,"subunitcolor":"#506784"},"hoverlabel":{"align":"left"},"hovermode":"closest","mapbox":{"style":"dark"},"paper_bgcolor":"rgb(17,17,17)","plot_bgcolor":"rgb(17,17,17)","polar":{"angularaxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""},"bgcolor":"rgb(17,17,17)","radialaxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""}},"scene":{"xaxis":{"backgroundcolor":"rgb(17,17,17)","gridcolor":"#506784","gridwidth":2,"linecolor":"#506784","showbackground":true,"ticks":"","zerolinecolor":"#C8D4E3"},"yaxis":{"backgroundcolor":"rgb(17,17,17)","gridcolor":"#506784","gridwidth":2,"linecolor":"#506784","showbackground":true,"ticks":"","zerolinecolor":"#C8D4E3"},"zaxis":{"
backgroundcolor":"rgb(17,17,17)","gridcolor":"#506784","gridwidth":2,"linecolor":"#506784","showbackground":true,"ticks":"","zerolinecolor":"#C8D4E3"}},"shapedefaults":{"line":{"color":"#f2f5fa"}},"sliderdefaults":{"bgcolor":"#C8D4E3","bordercolor":"rgb(17,17,17)","borderwidth":1,"tickwidth":0},"ternary":{"aaxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""},"baxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""},"bgcolor":"rgb(17,17,17)","caxis":{"gridcolor":"#506784","linecolor":"#506784","ticks":""}},"title":{"x":0.05},"updatemenudefaults":{"bgcolor":"#506784","borderwidth":0},"xaxis":{"automargin":true,"gridcolor":"#283442","linecolor":"#506784","ticks":"","title":{"standoff":15},"zerolinecolor":"#283442","zerolinewidth":2},"yaxis":{"automargin":true,"gridcolor":"#283442","linecolor":"#506784","ticks":"","title":{"standoff":15},"zerolinecolor":"#283442","zerolinewidth":2}}},"xaxis":{"anchor":"y","domain":[0.0,0.2733333333333333],"title":{"text":"Epoch"},"tickmode":"linear","dtick":1},"yaxis":{"anchor":"x","domain":[0.6888,0.8383999999999999]},"xaxis2":{"anchor":"y2","domain":[0.3333333333333333,0.6066666666666667],"title":{"text":"Epoch"},"tickmode":"linear","dtick":1},"yaxis2":{"anchor":"x2","domain":[0.6888,0.8383999999999999]},"xaxis3":{"anchor":"y3","domain":[0.6666666666666666,0.94],"title":{"text":"Epoch"},"tickmode":"linear","dtick":1},"yaxis3":{"anchor":"x3","domain":[0.6888,0.8383999999999999],"title":{"text":"Seconds"}},"yaxis4":{"anchor":"x3","overlaying":"y3","side":"right","title":{"text":"Memory 
MB"}},"xaxis4":{"anchor":"y5","domain":[0.0,0.2733333333333333],"title":{"text":"Epoch"},"tickmode":"linear","dtick":1},"yaxis5":{"anchor":"x4","domain":[0.4728,0.6088]},"xaxis5":{"anchor":"y6","domain":[0.3333333333333333,0.6066666666666667],"title":{"text":"Epoch"},"tickmode":"linear","dtick":1},"yaxis6":{"anchor":"x5","domain":[0.4728,0.6088]},"xaxis6":{"anchor":"y7","domain":[0.6666666666666666,0.94],"title":{"text":"Epoch"},"tickmode":"linear","dtick":1},"yaxis7":{"anchor":"x6","domain":[0.4728,0.6088]},"xaxis7":{"anchor":"y8","domain":[0.0,0.2733333333333333]},"yaxis8":{"anchor":"x7","domain":[0.24319999999999997,0.3927999999999999]},"xaxis8":{"anchor":"y9","domain":[0.3333333333333333,0.6066666666666667]},"yaxis9":{"anchor":"x8","domain":[0.24319999999999997,0.3927999999999999]},"xaxis9":{"anchor":"y10","domain":[0.6666666666666666,0.94]},"yaxis10":{"anchor":"x9","domain":[0.24319999999999997,0.3927999999999999]},"annotations":[{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Final Accuracy","x":0.13666666666666666,"xanchor":"center","xref":"paper","y":0.9999999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Predictability","x":0.47,"xanchor":"center","xref":"paper","y":0.9999999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Peak Memory","x":0.8033333333333332,"xanchor":"center","xref":"paper","y":0.9999999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Accuracy vs Epoch","x":0.13666666666666666,"xanchor":"center","xref":"paper","y":0.8383999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Loss vs 
Epoch","x":0.47,"xanchor":"center","xref":"paper","y":0.8383999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Memory and Processes","x":0.8033333333333332,"xanchor":"center","xref":"paper","y":0.8383999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Training Time","x":0.13666666666666666,"xanchor":"center","xref":"paper","y":0.6088,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Training Steps","x":0.47,"xanchor":"center","xref":"paper","y":0.6088,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Throughput","x":0.8033333333333332,"xanchor":"center","xref":"paper","y":0.6088,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Learned Gates","x":0.13666666666666666,"xanchor":"center","xref":"paper","y":0.3927999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Metric Correlation","x":0.47,"xanchor":"center","xref":"paper","y":0.3927999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Accuracy vs Loss","x":0.8033333333333332,"xanchor":"center","xref":"paper","y":0.3927999999999999,"yanchor":"bottom","yref":"paper","yshift":10},{"font":{"size":13,"color":"#e2e8f0"},"showarrow":false,"text":"Hardware Specs","x":0.47,"xanchor":"center","xref":"paper","y":0.16319999999999998,"yanchor":"bottom","yref":"paper","yshift":10}],"title":{"text":"OpenPeer NTK Trainer Benchmark Dashboard","x":0.02},"font":{"color":"#e2e8f0","size":12},"margin":{"l":30,"r":30,"t":90,"b":30},"height":1950,"width":2000,"paper_bgcolor":"#0f172a","plot_bgcolor":"#0f172a","showlegend":false}, {"responsive": true} ) }; </script> </div>
</body>
</html>
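The traces in the dashboard above store their numeric arrays as `{"dtype": ..., "bdata": ...}` pairs, where `bdata` is base64 text. The following is a minimal sketch of how such a pair could be decoded back into numbers for inspection. It assumes that `f8` means 64-bit floats and `i1` means signed 8-bit integers, both little-endian; the helper name `decode_typed_array` is hypothetical and is not part of Plotly or of this repository. The `\u002f` escapes in the embedded JSON stand for `/`, so the strings below are written with a plain slash.

```python
import base64
import struct

# Assumed mapping from the dtype strings seen in the dashboard JSON to
# struct format characters. Only the two dtypes that appear above are covered.
_STRUCT_CODES = {"f8": "d", "i1": "b"}


def decode_typed_array(spec):
    """Decode a {"dtype": ..., "bdata": ...} mapping into a list of numbers."""
    code = _STRUCT_CODES[spec["dtype"]]
    raw = base64.b64decode(spec["bdata"])
    count = len(raw) // struct.calcsize(code)
    # "<" assumes little-endian byte order with no padding between items.
    return list(struct.unpack(f"<{count}{code}", raw))


# Values copied from the runtime_gui dashboard traces.
epochs = decode_typed_array({"dtype": "i1", "bdata": "AQIDBA=="})
steps = decode_typed_array({"dtype": "i1", "bdata": "ChkyQA=="})
accuracy = decode_typed_array(
    {"dtype": "f8", "bdata": "AAAAAADA7z8AAAAAAMDvPwAAAAAAwO8/AAAAAAAA8D8="}
)

print(epochs, steps, accuracy)
```

Decoded this way, the "Training Steps" and "Accuracy" traces can be compared directly against the `steps` and `final_accuracy` columns of `artifacts/runtime_gui/gate_benchmarks.csv`.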
artifacts/runtime_gui/gate_benchmarks.csv
ADDED
@@ -0,0 +1,5 @@
epoch,steps,wall_time_sec,samples_per_sec,initial_accuracy,final_accuracy,final_loss,predictability_score,memory_rss_mb,child_processes,thread_count,reached_target,trained_steps,target_accuracy
1,10,1.2382284000050277,2067.4699433396986,0.90625,0.9921875,0.07447091490030289,92.82344714585152,363.21875,0.0,28.0,0,10,0.999
2,25,1.4843085000757128,4311.772114539224,0.90625,0.9921875,0.05475297570228577,94.67189816625688,363.21875,0.0,28.0,0,25,0.999
3,50,2.1459764998871833,5964.650591780904,0.90625,0.9921875,0.04109255596995354,95.97402961427998,363.35546875,0.0,28.0,0,50,0.999
4,64,2.630060900002718,24918.05417887178,0.90625,1.0,0.03610779345035553,96.45363171816193,363.76171875,0.0,28.0,1,64,0.999
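The benchmark rows above can be checked for internal consistency without any third-party packages. The sketch below uses the standard `csv` module on the four rows copied verbatim from this file. The helper names (`load_rows`, `first_epoch_at_target`, `loss_is_decreasing`) are illustrative and do not come from `openpeer_trainer`, and the comparison `final_accuracy >= target_accuracy` is an assumption about how `reached_target` is derived.

```python
import csv
import io

# Rows copied from artifacts/runtime_gui/gate_benchmarks.csv.
CSV_TEXT = """\
epoch,steps,wall_time_sec,samples_per_sec,initial_accuracy,final_accuracy,final_loss,predictability_score,memory_rss_mb,child_processes,thread_count,reached_target,trained_steps,target_accuracy
1,10,1.2382284000050277,2067.4699433396986,0.90625,0.9921875,0.07447091490030289,92.82344714585152,363.21875,0.0,28.0,0,10,0.999
2,25,1.4843085000757128,4311.772114539224,0.90625,0.9921875,0.05475297570228577,94.67189816625688,363.21875,0.0,28.0,0,25,0.999
3,50,2.1459764998871833,5964.650591780904,0.90625,0.9921875,0.04109255596995354,95.97402961427998,363.35546875,0.0,28.0,0,50,0.999
4,64,2.630060900002718,24918.05417887178,0.90625,1.0,0.03610779345035553,96.45363171816193,363.76171875,0.0,28.0,1,64,0.999
"""


def load_rows(text):
    """Parse the CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def first_epoch_at_target(rows):
    """Return the first epoch whose final accuracy meets the target, or None."""
    for row in rows:
        if float(row["final_accuracy"]) >= float(row["target_accuracy"]):
            return int(row["epoch"])
    return None


def loss_is_decreasing(rows):
    """True when final_loss strictly decreases from one row to the next."""
    losses = [float(row["final_loss"]) for row in rows]
    return all(a > b for a, b in zip(losses, losses[1:]))


rows = load_rows(CSV_TEXT)
flagged = [int(row["epoch"]) for row in rows if row["reached_target"] == "1"]
print(first_epoch_at_target(rows), flagged, loss_is_decreasing(rows))
```

Under that assumption, the epoch found by the accuracy comparison agrees with the `reached_target` flag recorded in the file.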
artifacts/runtime_gui/learned_gates.html
ADDED
The diff for this file is too large to render.
artifacts/runtime_gui/loss_curve.html
ADDED
The diff for this file is too large to render.
artifacts/runtime_gui/throughput_curve.html
ADDED
The diff for this file is too large to render.
pyproject.toml
ADDED
@@ -0,0 +1,33 @@
[build-system]
requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "openpeer-ntk-trainer"
version = "0.1.0"
description = "OpenPeerLLM trainer using ntkmirror controllers and a tinygrad gate demo"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"torch>=2.2",
"transformers>=4.42",
"pandas>=2.2",
"plotly>=6.0",
"psutil>=6.0",
]

[project.optional-dependencies]
demo = ["tinygrad>=0.10.0"]
charts = ["openbb>=4.0"]
gui = ["streamlit>=1.36"]
ntk = ["ntkmirror @ git+https://github.com/leochlon/ntkmirror.git"]
all = ["tinygrad>=0.10.0", "openbb>=4.0", "streamlit>=1.36", "ntkmirror @ git+https://github.com/leochlon/ntkmirror.git"]

[project.scripts]
openpeer-trainer = "openpeer_trainer.cli:main"

[tool.setuptools]
package-dir = {"" = "src"}

[tool.setuptools.packages.find]
where = ["src"]
|
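Since most features sit behind optional extras, it can help to check which ones are importable before running a subcommand. The following is a minimal sketch, not part of this commit; the mapping from extra name to import name is an assumption (distribution names and import names can differ), and `available_extras` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed import name for each extra declared in pyproject.toml above.
EXTRA_MODULES = {
    "demo": "tinygrad",
    "charts": "openbb",
    "gui": "streamlit",
    "ntk": "ntkmirror",
}


def available_extras(mapping: dict[str, str] = EXTRA_MODULES) -> dict[str, bool]:
    """Report, per extra, whether its top-level module can be located."""
    status: dict[str, bool] = {}
    for extra, module in mapping.items():
        try:
            status[extra] = find_spec(module) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially initialised modules.
            status[extra] = False
    return status


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```

`find_spec` locates a module without importing it, so the check stays cheap even for heavy packages.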
src/openpeer_ntk_trainer.egg-info/PKG-INFO
ADDED
|
@@ -0,0 +1,109 @@
|
| 1 |
+
Metadata-Version: 2.4
|
| 2 |
+
Name: openpeer-ntk-trainer
|
| 3 |
+
Version: 0.1.0
|
| 4 |
+
Summary: OpenPeerLLM trainer using ntkmirror controllers and a tinygrad gate demo
|
| 5 |
+
Requires-Python: >=3.10
|
| 6 |
+
Description-Content-Type: text/markdown
|
| 7 |
+
Requires-Dist: torch>=2.2
|
| 8 |
+
Requires-Dist: transformers>=4.42
|
| 9 |
+
Requires-Dist: pandas>=2.2
|
| 10 |
+
Requires-Dist: plotly>=6.0
|
| 11 |
+
Requires-Dist: psutil>=6.0
|
| 12 |
+
Provides-Extra: demo
|
| 13 |
+
Requires-Dist: tinygrad>=0.10.0; extra == "demo"
|
| 14 |
+
Provides-Extra: charts
|
| 15 |
+
Requires-Dist: openbb>=4.0; extra == "charts"
|
| 16 |
+
Provides-Extra: gui
|
| 17 |
+
Requires-Dist: streamlit>=1.36; extra == "gui"
|
| 18 |
+
Provides-Extra: ntk
|
| 19 |
+
Requires-Dist: ntkmirror @ git+https://github.com/leochlon/ntkmirror.git ; extra == "ntk"
|
| 20 |
+
Provides-Extra: all
|
| 21 |
+
Requires-Dist: tinygrad>=0.10.0; extra == "all"
|
| 22 |
+
Requires-Dist: openbb>=4.0; extra == "all"
|
| 23 |
+
Requires-Dist: streamlit>=1.36; extra == "all"
|
| 24 |
+
Requires-Dist: ntkmirror @ git+https://github.com/leochlon/ntkmirror.git ; extra == "all"
|
| 25 |
+
|
| 26 |
+
# OpenPeer NTK Trainer
|
| 27 |
+
|
| 28 |
+
This workspace contains four related paths:
|
| 29 |
+
|
| 30 |
+
* A real fine-tuning path that uses [ntkmirror](https://github.com/leochlon/ntkmirror) to fit signed log-gate controllers on a frozen Hugging Face causal LM.
|
| 31 |
+
* A tinygrad-backed smoke demo that trains only gate parameters on a synthetic task so the controller idea can be validated locally and cheaply.
|
| 32 |
+
* A benchmark pipeline that records accuracy, loss, memory, process counts, predictability, and throughput, then renders a combined dashboard plus OpenBB-backed charts.
|
| 33 |
+
* A runtime GUI for live benchmark runs with current hardware specs baked into the view.
|
| 34 |
+
|
| 35 |
+
The OpenPeerLLM model card currently points at `OpenPeerAI/OpenPeerLLM`, but that repository card is not a standard inference-ready Hugging Face example. The trainer therefore targets any causal LM that `transformers` can load, with `OpenPeerAI/OpenPeerLLM` as the primary model ID and a smaller fallback for local demos.
|
| 36 |
+
|
| 37 |
+
## Install
|
| 38 |
+
|
| 39 |
+
```powershell
|
| 40 |
+
pip install -e .
|
| 41 |
+
pip install tinygrad
|
| 42 |
+
pip install git+https://github.com/leochlon/ntkmirror.git
|
| 43 |
+
```
|
| 44 |
+
|
| 45 |
+
If you only want the local demo, install the demo extra instead:
|
| 46 |
+
|
| 47 |
+
```powershell
|
| 48 |
+
pip install -e ".[demo]"
|
| 49 |
+
```
|
| 50 |
+
|
| 51 |
+
To enable OpenBB-backed chart generation for benchmarks, install the chart extra too:
|
| 52 |
+
|
| 53 |
+
```powershell
|
| 54 |
+
pip install -e ".[demo,charts]"
|
| 55 |
+
```
|
| 56 |
+
|
| 57 |
+
To enable the runtime GUI, install the GUI extra:
|
| 58 |
+
|
| 59 |
+
```powershell
|
| 60 |
+
pip install -e ".[gui]"
|
| 61 |
+
```
|
| 62 |
+
|
| 63 |
+
## Run the tinygrad demo
|
| 64 |
+
|
| 65 |
+
```powershell
|
| 66 |
+
python -m openpeer_trainer.cli demo --steps 100 --target-accuracy 0.99
|
| 67 |
+
```
|
| 68 |
+
|
| 69 |
+
The demo stops early as soon as it reaches the requested accuracy target.
|
| 70 |
+
|
| 71 |
+
## Run benchmarks and charts
|
| 72 |
+
|
| 73 |
+
```powershell
|
| 74 |
+
python -m openpeer_trainer.cli bench --steps 10 25 50 --target-accuracy 0.99
|
| 75 |
+
```
|
| 76 |
+
|
| 77 |
+
The benchmark runner writes a CSV plus HTML charts under `artifacts/benchmarks/`. The main output is `benchmark_dashboard.html`, a multi-panel dashboard showing memory, processes, learned gates, loss, predictability, accuracy, training steps, throughput, and wall-clock time in seconds per epoch. If the OpenBB charting extension is installed, the companion charts are rendered through OpenBB; otherwise the script falls back to Plotly with the same data.
|
| 78 |
+
|
| 79 |
+
## Launch the runtime GUI
|
| 80 |
+
|
| 81 |
+
```powershell
|
| 82 |
+
python -m openpeer_trainer.cli gui
|
| 83 |
+
```
|
| 84 |
+
|
| 85 |
+
The GUI shows the same dashboard, a live benchmark runner, and a hardware-spec table for this computer.
|
| 86 |
+
|
| 87 |
+
## Fit an ntkmirror controller
|
| 88 |
+
|
| 89 |
+
```powershell
|
| 90 |
+
python -m openpeer_trainer.cli fit --model OpenPeerAI/OpenPeerLLM --train-jsonl train.jsonl --out runs/openpeer_controller.pt
|
| 91 |
+
```
|
| 92 |
+
|
| 93 |
+
## JSONL format
|
| 94 |
+
|
| 95 |
+
Preferred schema:
|
| 96 |
+
|
| 97 |
+
```jsonl
|
| 98 |
+
{"prompt":"Question: 14 + 27 = ?\nAnswer:","completion":" 41"}
|
| 99 |
+
{"prompt":"Question: 36 + 18 = ?\nAnswer:","completion":" 54"}
|
| 100 |
+
```
|
| 101 |
+
|
| 102 |
+
The trainer also accepts `instruction`/`response`, `question`/`answer`, or `text` records when the underlying ntkmirror loader supports them.
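
One way to picture the accepted schemas is a small normaliser that maps each record onto the preferred `prompt`/`completion` shape. This is a sketch under stated assumptions: the key pairs come from the sentence above, the treatment of bare `text` records as completion-only is a guess, and `normalize_record` and `parse_jsonl` are hypothetical helpers rather than ntkmirror's loader.

```python
import json

# Key pairs named in the README text; how ntkmirror maps them internally is not shown here.
_PAIR_KEYS = [
    ("prompt", "completion"),
    ("instruction", "response"),
    ("question", "answer"),
]


def normalize_record(record: dict) -> dict:
    """Return a {"prompt", "completion"} dict for any supported record shape."""
    for prompt_key, completion_key in _PAIR_KEYS:
        if prompt_key in record and completion_key in record:
            return {"prompt": record[prompt_key], "completion": record[completion_key]}
    if "text" in record:
        # Assumption: a bare text record has no prompt part.
        return {"prompt": "", "completion": record["text"]}
    raise ValueError(f"unsupported record keys: {sorted(record)}")


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSONL text, skipping blank lines, and normalise every record."""
    return [normalize_record(json.loads(line)) for line in text.splitlines() if line.strip()]
```

Failing loudly on unknown keys makes schema mistakes visible before a training run starts.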
|
| 103 |
+
|
| 104 |
+
## References
|
| 105 |
+
|
| 106 |
+
* OpenPeer AI / Riemann Computing Inc. / Andrew Magdy Kamal Nassief
|
| 107 |
+
* ntkmirror: https://github.com/leochlon/ntkmirror
|
| 108 |
+
* Tinygrad: https://github.com/tinygrad/tinygrad
|
| 109 |
+
|
src/openpeer_ntk_trainer.egg-info/SOURCES.txt
ADDED
|
@@ -0,0 +1,15 @@
|
| 1 |
+
README.md
|
| 2 |
+
pyproject.toml
|
| 3 |
+
src/openpeer_ntk_trainer.egg-info/PKG-INFO
|
| 4 |
+
src/openpeer_ntk_trainer.egg-info/SOURCES.txt
|
| 5 |
+
src/openpeer_ntk_trainer.egg-info/dependency_links.txt
|
| 6 |
+
src/openpeer_ntk_trainer.egg-info/entry_points.txt
|
| 7 |
+
src/openpeer_ntk_trainer.egg-info/requires.txt
|
| 8 |
+
src/openpeer_ntk_trainer.egg-info/top_level.txt
|
| 9 |
+
src/openpeer_trainer/__init__.py
|
| 10 |
+
src/openpeer_trainer/benchmarks.py
|
| 11 |
+
src/openpeer_trainer/cli.py
|
| 12 |
+
src/openpeer_trainer/controller.py
|
| 13 |
+
src/openpeer_trainer/gui.py
|
| 14 |
+
src/openpeer_trainer/hardware.py
|
| 15 |
+
src/openpeer_trainer/smoke.py
|
src/openpeer_ntk_trainer.egg-info/dependency_links.txt
ADDED
|
@@ -0,0 +1 @@
|
| 1 |
+
|
src/openpeer_ntk_trainer.egg-info/entry_points.txt
ADDED
|
@@ -0,0 +1,2 @@
|
| 1 |
+
[console_scripts]
|
| 2 |
+
openpeer-trainer = openpeer_trainer.cli:main
|
src/openpeer_ntk_trainer.egg-info/requires.txt
ADDED
|
@@ -0,0 +1,23 @@
|
| 1 |
+
torch>=2.2
|
| 2 |
+
transformers>=4.42
|
| 3 |
+
pandas>=2.2
|
| 4 |
+
plotly>=6.0
|
| 5 |
+
psutil>=6.0
|
| 6 |
+
|
| 7 |
+
[all]
|
| 8 |
+
tinygrad>=0.10.0
|
| 9 |
+
openbb>=4.0
|
| 10 |
+
streamlit>=1.36
|
| 11 |
+
ntkmirror @ git+https://github.com/leochlon/ntkmirror.git
|
| 12 |
+
|
| 13 |
+
[charts]
|
| 14 |
+
openbb>=4.0
|
| 15 |
+
|
| 16 |
+
[demo]
|
| 17 |
+
tinygrad>=0.10.0
|
| 18 |
+
|
| 19 |
+
[gui]
|
| 20 |
+
streamlit>=1.36
|
| 21 |
+
|
| 22 |
+
[ntk]
|
| 23 |
+
ntkmirror @ git+https://github.com/leochlon/ntkmirror.git
|
src/openpeer_ntk_trainer.egg-info/top_level.txt
ADDED
|
@@ -0,0 +1 @@
|
| 1 |
+
openpeer_trainer
|
src/openpeer_trainer/__init__.py
ADDED
|
@@ -0,0 +1,4 @@
|
| 1 |
+
from .controller import TrainerConfig, fit_controller
|
| 2 |
+
from .smoke import run_tinygrad_gate_demo
|
| 3 |
+
|
| 4 |
+
__all__ = ["TrainerConfig", "fit_controller", "run_tinygrad_gate_demo"]
|
src/openpeer_trainer/__pycache__/__init__.cpython-311.pyc
ADDED
|
Binary file (397 Bytes). View file
|
|
|
src/openpeer_trainer/__pycache__/benchmarks.cpython-311.pyc
ADDED
|
Binary file (19.8 kB). View file
|
|
|
src/openpeer_trainer/__pycache__/cli.cpython-311.pyc
ADDED
|
Binary file (5.81 kB). View file
|
|
|
src/openpeer_trainer/__pycache__/controller.cpython-311.pyc
ADDED
|
Binary file (4.18 kB). View file
|
|
|
src/openpeer_trainer/__pycache__/gui.cpython-311.pyc
ADDED
|
Binary file (7.75 kB). View file
|
|
|
src/openpeer_trainer/__pycache__/hardware.cpython-311.pyc
ADDED
|
Binary file (4.78 kB). View file
|
|
|
src/openpeer_trainer/__pycache__/smoke.cpython-311.pyc
ADDED
|
Binary file (9.24 kB). View file
|
|
|
src/openpeer_trainer/benchmarks.py
ADDED
|
@@ -0,0 +1,344 @@
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
from dataclasses import dataclass
|
| 4 |
+
from pathlib import Path
|
| 5 |
+
import time
|
| 6 |
+
from math import exp
|
| 7 |
+
|
| 8 |
+
import pandas as pd
|
| 9 |
+
from plotly.subplots import make_subplots
|
| 10 |
+
import plotly.graph_objects as go
|
| 11 |
+
|
| 12 |
+
from .hardware import collect_hardware_specs, hardware_table_rows
|
| 13 |
+
from .smoke import run_tinygrad_gate_demo
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
@dataclass(slots=True)
|
| 17 |
+
class BenchmarkResult:
|
| 18 |
+
csv_path: str
|
| 19 |
+
chart_paths: list[str]
|
| 20 |
+
|
| 21 |
+
|
| 22 |
+
def _chart_backend():
|
| 23 |
+
try:
|
| 24 |
+
from openbb_charting.charts.generic_charts import bar_chart, line_chart # type: ignore[import-not-found]
|
| 25 |
+
|
| 26 |
+
return "openbb", line_chart, bar_chart
|
| 27 |
+
except Exception:
|
| 28 |
+
return "plotly", None, None
|
| 29 |
+
|
| 30 |
+
|
| 31 |
+
def _save_figure(fig, path: Path) -> None:
|
| 32 |
+
path.parent.mkdir(parents=True, exist_ok=True)
|
| 33 |
+
if hasattr(fig, "show"):
|
| 34 |
+
try:
|
| 35 |
+
fig = fig.show(external=True)
|
| 36 |
+
except TypeError:
|
| 37 |
+
pass
|
| 38 |
+
if hasattr(fig, "write_html"):
|
| 39 |
+
fig.write_html(str(path))
|
| 40 |
+
return
|
| 41 |
+
raise RuntimeError("chart object does not support HTML export")
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
def _make_dashboard(df: pd.DataFrame, gate_df: pd.DataFrame, output_path: Path) -> None:
|
| 45 |
+
hardware_specs = collect_hardware_specs()
|
| 46 |
+
hardware_rows = hardware_table_rows(hardware_specs)
|
| 47 |
+
hardware_table = pd.DataFrame(hardware_rows)
|
| 48 |
+
|
| 49 |
+
fig = make_subplots(
|
| 50 |
+
rows=5,
|
| 51 |
+
cols=3,
|
| 52 |
+
specs=[
|
| 53 |
+
[{"type": "indicator"}, {"type": "indicator"}, {"type": "indicator"}],
|
| 54 |
+
[{"type": "xy"}, {"type": "xy"}, {"type": "xy", "secondary_y": True}],
|
| 55 |
+
[{"type": "xy"}, {"type": "xy"}, {"type": "xy"}],
|
| 56 |
+
[{"type": "xy"}, {"type": "heatmap"}, {"type": "xy"}],
|
| 57 |
+
[{"type": "table", "colspan": 3}, None, None],
|
| 58 |
+
],
|
| 59 |
+
row_heights=[0.12, 0.22, 0.20, 0.22, 0.24],
|
| 60 |
+
subplot_titles=(
|
| 61 |
+
"Final Accuracy",
|
| 62 |
+
"Predictability",
|
| 63 |
+
"Peak Memory",
|
| 64 |
+
"Accuracy vs Epoch",
|
| 65 |
+
"Loss vs Epoch",
|
| 66 |
+
"Memory and Processes",
|
| 67 |
+
"Training Time",
|
| 68 |
+
"Training Steps",
|
| 69 |
+
"Throughput",
|
| 70 |
+
"Learned Gates",
|
| 71 |
+
"Metric Correlation",
|
| 72 |
+
"Accuracy vs Loss",
|
| 73 |
+
"Hardware Specs",
|
| 74 |
+
),
|
| 75 |
+
vertical_spacing=0.08,
|
| 76 |
+
horizontal_spacing=0.06,
|
| 77 |
+
)
|
| 78 |
+
|
| 79 |
+
latest = df.iloc[-1]
|
| 80 |
+
indicators = [
|
| 81 |
+
(latest["final_accuracy"], ".1%", "#22c55e", "Accuracy"),
|
| 82 |
+
(latest["predictability_score"], ".2f", "#38bdf8", "Predictability"),
|
| 83 |
+
(latest["memory_rss_mb"], ".1f", "#f97316", "RSS MB"),
|
| 84 |
+
]
|
| 85 |
+
initial_indicator_values = [
|
| 86 |
+
float(df.iloc[0]["final_accuracy"]),
|
| 87 |
+
float(df.iloc[0]["predictability_score"]),
|
| 88 |
+
float(df.iloc[0]["memory_rss_mb"]),
|
| 89 |
+
]
|
| 90 |
+
for idx, (value, fmt, color, title) in enumerate(indicators, start=1):
|
| 91 |
+
fig.add_trace(
|
| 92 |
+
go.Indicator(
|
| 93 |
+
mode="number+delta",
|
| 94 |
+
value=float(value),
|
| 95 |
+
number={"valueformat": fmt, "font": {"size": 24, "color": color}},
|
| 96 |
+
title={"text": title, "font": {"size": 14, "color": "#e2e8f0"}},
|
| 97 |
+
delta={"reference": initial_indicator_values[idx - 1], "relative": False},
|
| 98 |
+
),
|
| 99 |
+
row=1,
|
| 100 |
+
col=idx,
|
| 101 |
+
)
|
| 102 |
+
|
| 103 |
+
fig.add_trace(go.Scatter(x=df["epoch"], y=df["final_accuracy"], mode="lines+markers", line=dict(color="#22c55e", width=3), name="Accuracy", showlegend=False), row=2, col=1)
|
| 104 |
+
fig.add_trace(go.Scatter(x=df["epoch"], y=df["predictability_score"], mode="lines+markers", line=dict(color="#38bdf8", width=3), name="Predictability", showlegend=False), row=2, col=1)
|
| 105 |
+
fig.add_trace(go.Scatter(x=df["epoch"], y=df["final_loss"], mode="lines+markers", line=dict(color="#f97316", width=3), name="Loss", showlegend=False), row=2, col=2)
|
| 106 |
+
fig.add_trace(go.Scatter(x=df["epoch"], y=df["wall_time_sec"], mode="lines+markers", line=dict(color="#a855f7", width=3), name="Wall Time (s)", showlegend=False), row=2, col=3, secondary_y=False)
|
| 107 |
+
fig.add_trace(go.Bar(x=df["epoch"], y=df["memory_rss_mb"], marker_color="#f97316", name="Memory MB", showlegend=False), row=2, col=3, secondary_y=True)
|
| 108 |
+
|
| 109 |
+
fig.add_trace(go.Scatter(x=df["epoch"], y=df["wall_time_sec"], mode="lines+markers", line=dict(color="#a855f7", width=3), name="Wall Time (s)", showlegend=False), row=3, col=1)
|
| 110 |
+
fig.add_trace(go.Scatter(x=df["epoch"], y=df["steps"], mode="lines+markers", line=dict(color="#14b8a6", width=3), name="Training Steps", showlegend=False), row=3, col=2)
|
| 111 |
+
fig.add_trace(go.Bar(x=df["epoch"], y=df["samples_per_sec"], marker_color="#38bdf8", name="Samples/sec", showlegend=False), row=3, col=3)
|
| 112 |
+
|
| 113 |
+
fig.add_trace(go.Bar(x=gate_df["channel"], y=gate_df["gate_scale"], marker_color="#a855f7", name="Gate Scale", showlegend=False), row=4, col=1)
|
| 114 |
+
|
| 115 |
+
corr_df = df[["final_accuracy", "predictability_score", "final_loss", "wall_time_sec", "memory_rss_mb", "samples_per_sec"]].corr()
|
| 116 |
+
fig.add_trace(go.Heatmap(z=corr_df.values, x=corr_df.columns, y=corr_df.index, colorscale="RdBu", zmid=0, showscale=False), row=4, col=2)
|
| 117 |
+
|
| 118 |
+
fig.add_trace(go.Scatter(x=df["final_loss"], y=df["final_accuracy"], mode="markers+text", text=df["epoch"].astype(str), textposition="top center", marker=dict(size=14, color=df["memory_rss_mb"], colorscale="Viridis", showscale=True), name="Accuracy/Loss", showlegend=False), row=4, col=3)
|
| 119 |
+
|
| 120 |
+
fig.add_trace(
|
| 121 |
+
go.Table(
|
| 122 |
+
header=dict(
|
| 123 |
+
values=["<b>Metric</b>", "<b>Value</b>"],
|
| 124 |
+
fill_color="#0f172a",
|
| 125 |
+
font=dict(color="#e2e8f0", size=14),
|
| 126 |
+
align="left",
|
| 127 |
+
height=28,
|
| 128 |
+
),
|
| 129 |
+
cells=dict(
|
| 130 |
+
values=[hardware_table["Metric"], hardware_table["Value"]],
|
| 131 |
+
fill_color="#111827",
|
| 132 |
+
font=dict(color="#e2e8f0", size=12),
|
| 133 |
+
align="left",
|
| 134 |
+
height=24,
|
| 135 |
+
),
|
| 136 |
+
),
|
| 137 |
+
row=5,
|
| 138 |
+
col=1,
|
| 139 |
+
)
|
| 140 |
+
|
| 141 |
+
fig.update_layout(
|
| 142 |
+
template="plotly_dark",
|
| 143 |
+
height=1950,
|
| 144 |
+
width=2000,
|
| 145 |
+
title_text="OpenPeer NTK Trainer Benchmark Dashboard",
|
| 146 |
+
paper_bgcolor="#0f172a",
|
| 147 |
+
plot_bgcolor="#0f172a",
|
| 148 |
+
font=dict(color="#e2e8f0", size=12),
|
| 149 |
+
showlegend=False,
|
| 150 |
+
margin=dict(l=30, r=30, t=90, b=30),
|
| 151 |
+
title_x=0.02,
|
| 152 |
+
)
|
| 153 |
+
fig.update_annotations(font=dict(size=13, color="#e2e8f0"), yshift=10)
|
| 154 |
+
fig.update_yaxes(title_text="Seconds", row=2, col=3, secondary_y=False)
|
| 155 |
+
fig.update_yaxes(title_text="Memory MB", row=2, col=3, secondary_y=True)
|
| 156 |
+
fig.update_xaxes(title_text="Epoch", row=2, col=1)
|
| 157 |
+
fig.update_xaxes(title_text="Epoch", row=2, col=2)
|
| 158 |
+
fig.update_xaxes(title_text="Epoch", row=2, col=3)
|
| 159 |
+
fig.update_xaxes(title_text="Epoch", row=3, col=1)
|
| 160 |
+
fig.update_xaxes(title_text="Epoch", row=3, col=2)
|
| 161 |
+
fig.update_xaxes(title_text="Epoch", row=3, col=3)
|
| 162 |
+
fig.update_xaxes(tickmode="linear", dtick=1, row=2, col=1)
|
| 163 |
+
fig.update_xaxes(tickmode="linear", dtick=1, row=2, col=2)
|
| 164 |
+
fig.update_xaxes(tickmode="linear", dtick=1, row=2, col=3)
|
| 165 |
+
fig.update_xaxes(tickmode="linear", dtick=1, row=3, col=1)
|
| 166 |
+
fig.update_xaxes(tickmode="linear", dtick=1, row=3, col=2)
|
| 167 |
+
fig.update_xaxes(tickmode="linear", dtick=1, row=3, col=3)
|
| 168 |
+
fig.write_html(str(output_path), include_plotlyjs="cdn")
|
| 169 |
+
|
| 170 |
+
|
| 171 |
+
def _make_line_chart(df: pd.DataFrame, y: str, title: str, color: str, output_path: Path):
|
| 172 |
+
backend, line_chart, _ = _chart_backend()
|
| 173 |
+
if backend == "openbb" and line_chart is not None:
|
| 174 |
+
fig = line_chart(
|
| 175 |
+
data=df,
|
| 176 |
+
x="steps",
|
| 177 |
+
y=y,
|
| 178 |
+
title=title,
|
| 179 |
+
xtitle="Training steps",
|
| 180 |
+
ytitle=y.replace("_", " ").title(),
|
| 181 |
+
render=False,
|
| 182 |
+
layout_kwargs={
|
| 183 |
+
"template": "plotly_dark",
|
| 184 |
+
"paper_bgcolor": "#0f172a",
|
| 185 |
+
"plot_bgcolor": "#0f172a",
|
| 186 |
+
"font": {"color": "#e2e8f0"},
|
| 187 |
+
},
|
| 188 |
+
scatter_kwargs={"line": {"color": color, "width": 3}},
|
| 189 |
+
)
|
| 190 |
+
_save_figure(fig, output_path)
|
| 191 |
+
return
|
| 192 |
+
|
| 193 |
+
import plotly.express as px
|
| 194 |
+
|
| 195 |
+
fig = px.line(df, x="steps", y=y, markers=True, title=title, template="plotly_dark", color_discrete_sequence=[color])
|
| 196 |
+
fig.update_layout(paper_bgcolor="#0f172a", plot_bgcolor="#0f172a", font=dict(color="#e2e8f0"))
|
| 197 |
+
fig.write_html(str(output_path))
|
| 198 |
+
|
| 199 |
+
|
| 200 |
+
def _make_bar_chart(df: pd.DataFrame, x: str, y: str, title: str, color: str, output_path: Path):
|
| 201 |
+
backend, _, bar_chart = _chart_backend()
|
| 202 |
+
if backend == "openbb" and bar_chart is not None:
|
| 203 |
+
fig = bar_chart(
|
| 204 |
+
data=df,
|
| 205 |
+
x=x,
|
| 206 |
+
y=y,
|
| 207 |
+
title=title,
|
| 208 |
+
xtitle=x.replace("_", " ").title(),
|
| 209 |
+
ytitle=y.replace("_", " ").title(),
|
| 210 |
+
render=False,
|
| 211 |
+
colors=[color],
|
| 212 |
+
layout_kwargs={
|
| 213 |
+
"template": "plotly_dark",
|
| 214 |
+
"paper_bgcolor": "#0f172a",
|
| 215 |
+
"plot_bgcolor": "#0f172a",
|
| 216 |
+
"font": {"color": "#e2e8f0"},
|
| 217 |
+
},
|
| 218 |
+
)
|
| 219 |
+
_save_figure(fig, output_path)
|
| 220 |
+
return
|
| 221 |
+
|
| 222 |
+
import plotly.express as px
|
| 223 |
+
|
| 224 |
+
fig = px.bar(df, x=x, y=y, title=title, template="plotly_dark", color_discrete_sequence=[color])
|
| 225 |
+
fig.update_layout(paper_bgcolor="#0f172a", plot_bgcolor="#0f172a", font=dict(color="#e2e8f0"))
|
| 226 |
+
fig.write_html(str(output_path))
|
| 227 |
+
|
| 228 |
+
|
| 229 |
+
def run_benchmark_suite(
|
| 230 |
+
step_counts: list[int],
|
| 231 |
+
batch_size: int = 64,
|
| 232 |
+
seed: int = 0,
|
| 233 |
+
output_dir: str = "artifacts/benchmarks",
|
| 234 |
+
target_accuracy: float = 0.99,
|
| 235 |
+
) -> BenchmarkResult:
|
| 236 |
+
out_dir = Path(output_dir)
|
| 237 |
+
out_dir.mkdir(parents=True, exist_ok=True)
|
| 238 |
+
|
| 239 |
+
rows: list[dict[str, float]] = []
|
| 240 |
+
last_result = None
|
| 241 |
+
|
| 242 |
+
for epoch, steps in enumerate(step_counts, start=1):
|
| 243 |
+
started = time.perf_counter()
|
| 244 |
+
result = run_tinygrad_gate_demo(steps=steps, batch_size=batch_size, seed=seed, target_accuracy=target_accuracy)
|
| 245 |
+
elapsed = time.perf_counter() - started
|
| 246 |
+
memory_rss_mb = result.telemetry[-1].memory_rss_mb if result.telemetry else 0.0
|
| 247 |
+
child_processes = result.telemetry[-1].child_processes if result.telemetry else 0
|
| 248 |
+
thread_count = result.telemetry[-1].thread_count if result.telemetry else 0
|
| 249 |
+
predictability_score = exp(-result.final_loss) * 100.0
|
| 250 |
+
rows.append(
|
| 251 |
+
{
|
| 252 |
+
"epoch": int(epoch),
|
| 253 |
+
"steps": int(result.trained_steps),
|
| 254 |
+
"wall_time_sec": elapsed,
|
| 255 |
+
"samples_per_sec": (result.trained_steps * batch_size) / max(elapsed, 1e-9),  # count only steps actually run; early stopping can end before the requested count
|
| 256 |
+
"initial_accuracy": result.initial_accuracy,
|
| 257 |
+
"final_accuracy": result.final_accuracy,
|
| 258 |
+
"final_loss": result.final_loss,
|
| 259 |
+
"predictability_score": predictability_score,
|
| 260 |
+
"memory_rss_mb": memory_rss_mb,
|
| 261 |
+
"child_processes": float(child_processes),
|
| 262 |
+
"thread_count": float(thread_count),
|
| 263 |
+
"reached_target": int(1 if result.reached_target else 0),
|
| 264 |
+
"trained_steps": int(result.trained_steps),
|
| 265 |
+
"target_accuracy": result.target_accuracy,
|
| 266 |
+
}
|
| 267 |
+
)
|
| 268 |
+
last_result = result
|
| 269 |
+
|
| 270 |
+
df = pd.DataFrame(rows).sort_values("steps")
|
| 271 |
+
csv_path = out_dir / "gate_benchmarks.csv"
|
| 272 |
+
df.to_csv(csv_path, index=False)
|
| 273 |
+
|
| 274 |
+
if not df.empty and df["final_accuracy"].iloc[-1] < target_accuracy:
|
| 275 |
+
extended_step = int(max(df["steps"].iloc[-1] * 2, 256))
|
| 276 |
+
while df["final_accuracy"].iloc[-1] < target_accuracy and extended_step <= 4096:
|
| 277 |
+
started = time.perf_counter()
|
| 278 |
+
result = run_tinygrad_gate_demo(steps=extended_step, batch_size=batch_size, seed=seed, target_accuracy=target_accuracy)
|
| 279 |
+
elapsed = time.perf_counter() - started
|
| 280 |
+
memory_rss_mb = result.telemetry[-1].memory_rss_mb if result.telemetry else 0.0
|
| 281 |
+
child_processes = result.telemetry[-1].child_processes if result.telemetry else 0
|
| 282 |
+
thread_count = result.telemetry[-1].thread_count if result.telemetry else 0
|
| 283 |
+
predictability_score = exp(-result.final_loss) * 100.0
|
| 284 |
+
df = pd.concat([
|
| 285 |
+
df,
|
| 286 |
+
pd.DataFrame([
|
| 287 |
+
{
|
| 288 |
+
"epoch": int(df["epoch"].iloc[-1] + 1),
|
| 289 |
+
"steps": int(result.trained_steps),
|
| 290 |
+
"wall_time_sec": elapsed,
|
| 291 |
+
"samples_per_sec": (result.trained_steps * batch_size) / max(elapsed, 1e-9),  # count only steps actually run; early stopping can end before the requested count
|
| 292 |
+
"initial_accuracy": result.initial_accuracy,
|
| 293 |
+
"final_accuracy": result.final_accuracy,
|
| 294 |
+
"final_loss": result.final_loss,
|
| 295 |
+
"predictability_score": predictability_score,
|
| 296 |
+
"memory_rss_mb": memory_rss_mb,
|
| 297 |
+
"child_processes": float(child_processes),
|
| 298 |
+
"thread_count": float(thread_count),
|
| 299 |
+
"reached_target": int(1 if result.reached_target else 0),
|
| 300 |
+
"trained_steps": int(result.trained_steps),
|
| 301 |
+
"target_accuracy": result.target_accuracy,
|
| 302 |
+
}
|
| 303 |
+
])
|
| 304 |
+
], ignore_index=True)
|
| 305 |
+
extended_step *= 2
|
| 306 |
+
df.to_csv(csv_path, index=False)
|
| 307 |
+
|
| 308 |
+
chart_paths: list[str] = []
|
| 309 |
+
|
| 310 |
+
gate_df = pd.DataFrame(
|
| 311 |
+
{
|
| 312 |
+
"channel": [f"c{i}" for i in range(len(last_result.learned_gates))] if last_result is not None else [],
|
| 313 |
+
"gate_scale": last_result.learned_gates if last_result is not None else [],
|
| 314 |
+
}
|
| 315 |
+
)
|
| 316 |
+
|
| 317 |
+
dashboard_path = out_dir / "benchmark_dashboard.html"
|
| 318 |
+
_make_dashboard(df, gate_df, dashboard_path)
|
| 319 |
+
chart_paths.append(str(dashboard_path))
|
| 320 |
+
|
| 321 |
+
accuracy_chart = out_dir / "accuracy_curve.html"
|
| 322 |
+
_make_line_chart(df, "final_accuracy", "Gate Controller Accuracy vs Training Steps", "#22c55e", accuracy_chart)
|
| 323 |
+
chart_paths.append(str(accuracy_chart))
|
| 324 |
+
|
| 325 |
+
loss_chart = out_dir / "loss_curve.html"
|
| 326 |
+
_make_line_chart(df, "final_loss", "Gate Controller Loss vs Training Steps", "#f97316", loss_chart)
|
| 327 |
+
chart_paths.append(str(loss_chart))
|
| 328 |
+
|
| 329 |
+
throughput_chart = out_dir / "throughput_curve.html"
|
| 330 |
+
_make_line_chart(df, "samples_per_sec", "Gate Controller Throughput vs Training Steps", "#38bdf8", throughput_chart)
|
| 331 |
+
chart_paths.append(str(throughput_chart))
|
| 332 |
+
|
| 333 |
+
if last_result is not None:
|
| 334 |
+
gate_sample_df = pd.DataFrame(
|
| 335 |
+
{
|
| 336 |
+
"channel": [f"c{i}" for i in range(len(last_result.learned_gate_sample))],
|
| 337 |
+
"gate_scale": last_result.learned_gate_sample,
|
| 338 |
+
}
|
| 339 |
+
)
|
| 340 |
+
gate_chart = out_dir / "learned_gates.html"
|
| 341 |
+
_make_bar_chart(gate_sample_df, "channel", "gate_scale", "Learned Gate Scales", "#a855f7", gate_chart)
|
| 342 |
+
chart_paths.append(str(gate_chart))
|
| 343 |
+
|
| 344 |
+
return BenchmarkResult(csv_path=str(csv_path), chart_paths=chart_paths)
|
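When the last configured run misses the accuracy target, `run_benchmark_suite` keeps retrying with a doubled step budget. The sketch below isolates just that schedule so it is easier to reason about; `extension_schedule` is a hypothetical helper, and unlike the real loop it does not stop early once the target is reached.

```python
def extension_schedule(last_steps: int, floor: int = 256, cap: int = 4096) -> list[int]:
    """List the step budgets the extension loop would try, in order.

    Starts at max(2 * last_steps, floor) and doubles until the budget exceeds cap.
    """
    schedule: list[int] = []
    step = max(last_steps * 2, floor)
    while step <= cap:
        schedule.append(step)
        step *= 2
    return schedule


if __name__ == "__main__":
    # A last run of 50 steps is raised to the 256 floor, then doubled up to the cap.
    print(extension_schedule(50))
```

Under these defaults the loop makes at most five extra runs, and none at all when the last run already used more than 2048 steps.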
src/openpeer_trainer/cli.py
ADDED
|
@@ -0,0 +1,90 @@
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import argparse
|
| 4 |
+
from dataclasses import asdict
|
| 5 |
+
|
| 6 |
+
from .controller import TrainerConfig, fit_controller
|
| 7 |
+
from .benchmarks import run_benchmark_suite
|
| 8 |
+
from .gui import launch_runtime_gui
|
| 9 |
+
from .smoke import run_tinygrad_gate_demo
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
def build_parser() -> argparse.ArgumentParser:
|
| 13 |
+
parser = argparse.ArgumentParser(prog="openpeer-trainer")
|
| 14 |
+
sub = parser.add_subparsers(dest="command", required=True)
|
| 15 |
+
|
| 16 |
+
demo = sub.add_parser("demo", help="run the tinygrad gate-controller smoke demo")
|
| 17 |
+
demo.add_argument("--steps", type=int, default=80)
|
| 18 |
+
demo.add_argument("--batch-size", type=int, default=64)
|
| 19 |
+
demo.add_argument("--seed", type=int, default=0)
|
| 20 |
+
demo.add_argument("--target-accuracy", type=float, default=0.99)
|
| 21 |
+
|
| 22 |
+
fit = sub.add_parser("fit", help="fit an ntkmirror controller on a frozen causal LM")
|
| 23 |
+
fit.add_argument("--model", default="OpenPeerAI/OpenPeerLLM")
|
| 24 |
+
fit.add_argument("--fallback-model", default="sshleifer/tiny-gpt2")
|
| 25 |
+
fit.add_argument("--train-jsonl")
|
| 26 |
+
fit.add_argument("--out", default="runs/openpeer_controller.pt")
|
| 27 |
+
fit.add_argument("--gates", type=int, default=512)
+    fit.add_argument("--steps", type=int, default=40)
+    fit.add_argument("--demo-mode", action="store_true")
+
+    bench = sub.add_parser("bench", help="run local gate-controller benchmarks and generate charts")
+    bench.add_argument("--steps", type=int, nargs="+", default=[20, 40, 80, 120])
+    bench.add_argument("--batch-size", type=int, default=64)
+    bench.add_argument("--seed", type=int, default=0)
+    bench.add_argument("--output-dir", default="artifacts/benchmarks")
+    bench.add_argument("--target-accuracy", type=float, default=0.99)
+
+    gui = sub.add_parser("gui", help="launch the runtime GUI")
+
+    return parser
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    if args.command == "demo":
+        result = run_tinygrad_gate_demo(steps=args.steps, batch_size=args.batch_size, seed=args.seed, target_accuracy=args.target_accuracy)
+        print("tinygrad gate demo")
+        print(f"initial_accuracy={result.initial_accuracy:.3f}")
+        print(f"final_accuracy={result.final_accuracy:.3f}")
+        print(f"final_loss={result.final_loss:.4f}")
+        print(f"target_accuracy={result.target_accuracy:.3f}")
+        print(f"reached_target={result.reached_target}")
+        print(f"trained_steps={result.trained_steps}")
+        print(f"learned_gate_sample={result.learned_gate_sample}")
+        return 0
+
+    if args.command == "bench":
+        result = run_benchmark_suite(
+            step_counts=args.steps,
+            batch_size=args.batch_size,
+            seed=args.seed,
+            output_dir=args.output_dir,
+            target_accuracy=args.target_accuracy,
+        )
+        print(f"benchmarks_csv={result.csv_path}")
+        print(f"charts={result.chart_paths}")
+        return 0
+
+    if args.command == "gui":
+        return launch_runtime_gui()
+
+    config = TrainerConfig(
+        model_name=args.model,
+        fallback_model_name=args.fallback_model,
+        train_jsonl=args.train_jsonl,
+        out_path=args.out,
+        gates=args.gates,
+        steps=args.steps,
+        demo_mode=args.demo_mode,
+    )
+    output_path = fit_controller(config)
+    print(f"saved_controller={output_path}")
+    print(f"config={asdict(config)}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
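As a reading aid for the CLI wiring above, the following is a minimal standalone sketch of how the `bench` flags parse. It reproduces only the `bench` arguments shown in this hunk. The function name `build_bench_parser`, the `prog` string, and the `dest="command"` setting are assumptions: the top of `build_parser` is not visible here, and `dest="command"` is inferred from `main()` branching on `args.command`.

```python
import argparse


def build_bench_parser() -> argparse.ArgumentParser:
    # Hypothetical reduced parser: only the `bench` subcommand from the diff.
    parser = argparse.ArgumentParser(prog="openpeer-trainer-sketch")
    # Assumption: the subcommand name is stored under `command`.
    sub = parser.add_subparsers(dest="command")
    bench = sub.add_parser("bench")
    # nargs="+" collects one or more integers into a list.
    bench.add_argument("--steps", type=int, nargs="+", default=[20, 40, 80, 120])
    bench.add_argument("--batch-size", type=int, default=64)
    bench.add_argument("--seed", type=int, default=0)
    bench.add_argument("--output-dir", default="artifacts/benchmarks")
    bench.add_argument("--target-accuracy", type=float, default=0.99)
    return parser


args = build_bench_parser().parse_args(["bench", "--steps", "10", "25", "--seed", "7"])
print(args.command, args.steps, args.batch_size, args.seed, args.output_dir)
```

Note that `--steps` means different things per subcommand: a single integer for the fit path and a list of step counts for `bench`.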
src/openpeer_trainer/controller.py
ADDED
|
@@ -0,0 +1,64 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from pathlib import Path
+
+
+@dataclass(slots=True)
+class TrainerConfig:
+    model_name: str = "OpenPeerAI/OpenPeerLLM"
+    fallback_model_name: str = "sshleifer/tiny-gpt2"
+    gates: int = 512
+    steps: int = 40
+    out_path: str = "runs/openpeer_controller.pt"
+    train_jsonl: str | None = None
+    device: str = "auto"
+    demo_mode: bool = False
+
+
+def _build_demo_records() -> list[dict[str, str]]:
+    return [
+        {"prompt": "Question: 14 + 27 = ?\nAnswer:", "completion": " 41"},
+        {"prompt": "Question: 36 + 18 = ?\nAnswer:", "completion": " 54"},
+        {"prompt": "Question: 47 + 36 = ?\nAnswer:", "completion": " 83"},
+        {"prompt": "Question: 19 + 8 = ?\nAnswer:", "completion": " 27"},
+    ]
+
+
+def fit_controller(config: TrainerConfig) -> str:
+    try:
+        from transformers import AutoModelForCausalLM, AutoTokenizer
+        from ntkmirror import ForwardFineTuner, load_jsonl_examples
+    except ImportError as exc:  # pragma: no cover - dependency gate
+        raise RuntimeError(
+            "ntkmirror mode requires transformers, torch, and ntkmirror to be installed"
+        ) from exc
+
+    train_path = Path(config.train_jsonl) if config.train_jsonl else Path("runs/demo_train.jsonl")
+    if not train_path.exists():
+        train_path.parent.mkdir(parents=True, exist_ok=True)
+        import json
+
+        with train_path.open("w", encoding="utf-8") as handle:
+            for record in _build_demo_records():
+                handle.write(json.dumps(record) + "\n")
+
+    model_name = config.model_name
+    tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+    try:
+        model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
+    except Exception:
+        if not config.demo_mode:
+            raise
+        model_name = config.fallback_model_name
+        tokenizer = AutoTokenizer.from_pretrained(model_name)
+        model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
+
+    tuner = ForwardFineTuner(model, tokenizer, gates=config.gates)
+    tuner.fit(load_jsonl_examples(str(train_path)), steps=config.steps)
+
+    out_path = Path(config.out_path)
+    out_path.parent.mkdir(parents=True, exist_ok=True)
+    tuner.save(str(out_path))
+    return str(out_path)
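The demo-data step in `fit_controller` writes one JSON object per line. A small stdlib-only sketch of that round trip may help when preparing a custom `--train-jsonl` file. The helpers `write_jsonl` and `read_jsonl` are hypothetical names introduced here; in particular, `read_jsonl` is not `ntkmirror.load_jsonl_examples`, whose behavior is not shown in this diff.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path: Path, records: list[dict[str, str]]) -> None:
    # Same shape as the demo writer in the diff: one JSON object per line.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_jsonl(path: Path) -> list[dict[str, str]]:
    # Hypothetical reader for illustration; skips blank lines.
    with path.open("r", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


records = [{"prompt": "Question: 14 + 27 = ?\nAnswer:", "completion": " 41"}]
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "runs" / "demo_train.jsonl"
    write_jsonl(target, records)
    raw_lines = target.read_text(encoding="utf-8").splitlines()
    loaded = read_jsonl(target)

print(len(raw_lines), loaded == records)
```

The newline inside each prompt is escaped by `json.dumps`, so every record still occupies exactly one physical line in the file.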
src/openpeer_trainer/gui.py
ADDED
|
@@ -0,0 +1,96 @@
+from __future__ import annotations
+
+import subprocess
+import sys
+from pathlib import Path
+
+if __package__ in {None, ""}:
+    sys.path.append(str(Path(__file__).resolve().parents[1]))
+    from openpeer_trainer.benchmarks import run_benchmark_suite
+    from openpeer_trainer.hardware import collect_hardware_specs, hardware_table_rows
+else:
+    from .benchmarks import run_benchmark_suite
+    from .hardware import collect_hardware_specs, hardware_table_rows
+
+
+def run_app() -> None:
+    try:
+        import streamlit as st
+    except ImportError as exc:  # pragma: no cover - optional GUI dependency
+        raise RuntimeError("Streamlit is required for the runtime GUI. Install with `pip install -e \".[gui]\"`") from exc
+
+    st.set_page_config(page_title="OpenPeer NTK Trainer", page_icon="🧠", layout="wide")
+
+    specs = collect_hardware_specs()
+    st.title("OpenPeer NTK Trainer Runtime GUI")
+    st.caption("Live benchmark dashboard with hardware specs, target-accuracy training, and OpenBB-backed charts when available.")
+
+    with st.sidebar:
+        st.header("Runtime Controls")
+        step_text = st.text_input("Step schedule", value="10, 25, 50")
+        batch_size = st.slider("Batch size", min_value=8, max_value=256, value=64, step=8)
+        seed = st.number_input("Seed", min_value=0, max_value=10_000, value=0, step=1)
+        target_accuracy = st.slider("Target accuracy", min_value=0.90, max_value=0.999, value=0.99, step=0.001, format="%.3f")
+        output_dir = st.text_input("Output directory", value="artifacts/runtime_gui")
+        run_label = st.button("Run benchmark")
+
+        st.divider()
+        st.subheader("Hardware Specs")
+        st.write(f"Hostname: {specs.hostname}")
+        st.write(f"Platform: {specs.platform}")
+        st.write(f"CPU: {specs.cpu_model}")
+        st.write(f"Cores: {specs.physical_cores} physical / {specs.logical_cores} logical")
+        st.write(f"Memory: {specs.memory_available_gb:.2f} GB free of {specs.memory_total_gb:.2f} GB")
+        st.write(f"Disk: {specs.disk_free_gb:.2f} GB free of {specs.disk_total_gb:.2f} GB")
+        st.write(f"Python: {specs.python_version}")
+        st.write(f"CUDA: {'yes' if specs.cuda_available else 'no'}")
+
+    if run_label:
+        step_counts = [int(part.strip()) for part in step_text.split(",") if part.strip()]
+        result = run_benchmark_suite(
+            step_counts=step_counts,
+            batch_size=batch_size,
+            seed=int(seed),
+            output_dir=output_dir,
+            target_accuracy=float(target_accuracy),
+        )
+        st.success(f"Saved benchmark artifacts to {result.csv_path}")
+        st.session_state["latest_result"] = result
+        st.session_state["latest_output_dir"] = output_dir
+
+    output_dir_path = Path(st.session_state.get("latest_output_dir", output_dir))
+    dashboard_path = output_dir_path / "benchmark_dashboard.html"
+    csv_path = output_dir_path / "gate_benchmarks.csv"
+
+    cols = st.columns([1.1, 1.1, 1.1])
+    cols[0].metric("Hostname", specs.hostname)
+    cols[1].metric("CPU Cores", f"{specs.physical_cores}/{specs.logical_cores}")
+    cols[2].metric("Memory Free GB", f"{specs.memory_available_gb:.2f}")
+
+    st.subheader("Current Hardware")
+    st.table(hardware_table_rows(specs))
+
+    if csv_path.exists():
+        import pandas as pd
+
+        df = pd.read_csv(csv_path)
+        st.subheader("Benchmark Data")
+        st.dataframe(df, use_container_width=True)
+    else:
+        st.info("Run a benchmark to populate the table and dashboard.")
+
+    if dashboard_path.exists():
+        st.subheader("Dashboard Preview")
+        st.components.v1.html(dashboard_path.read_text(encoding="utf-8"), height=1200, scrolling=True)
+
+
+def launch_runtime_gui() -> int:
+    app_path = Path(__file__).resolve()
+    command = [sys.executable, "-m", "streamlit", "run", str(app_path)]
+    print("Launching runtime GUI at http://localhost:8501")
+    completed = subprocess.run(command, check=False)
+    return int(completed.returncode)
+
+
+if __name__ == "__main__":
+    run_app()
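The GUI turns the free-text "Step schedule" field into a list of integers with a bare `int(...)` comprehension, which raises an unhelpful `ValueError` on input such as `10, ten`. One possible way to validate that field is sketched below. `parse_step_schedule` is a hypothetical helper, not part of the committed code, and the error wording is an assumption.

```python
def parse_step_schedule(text: str) -> list[int]:
    """Hypothetical helper: turn "10, 25, 50" into [10, 25, 50]."""
    steps: list[int] = []
    for part in text.split(","):
        token = part.strip()
        if not token:
            continue  # tolerate trailing or doubled commas
        if not token.isdecimal() or int(token) == 0:
            raise ValueError(f"step counts must be positive integers, got {token!r}")
        steps.append(int(token))
    if not steps:
        raise ValueError("step schedule is empty")
    return steps


print(parse_step_schedule("10, 25, 50"))
print(parse_step_schedule("10,,25,"))
```

In a Streamlit app the `ValueError` could be caught and shown with `st.error(...)` instead of letting the script run fail.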
src/openpeer_trainer/hardware.py
ADDED
|
@@ -0,0 +1,95 @@
+from __future__ import annotations
+
+from dataclasses import asdict, dataclass
+from pathlib import Path
+import os
+import platform
+import shutil
+
+
+@dataclass(slots=True)
+class HardwareSpecs:
+    hostname: str
+    platform: str
+    cpu_model: str
+    physical_cores: int
+    logical_cores: int
+    memory_total_gb: float
+    memory_available_gb: float
+    disk_total_gb: float
+    disk_free_gb: float
+    python_version: str
+    cuda_available: bool
+    cuda_device: str
+
+    def to_dict(self) -> dict[str, object]:
+        return asdict(self)
+
+
+def collect_hardware_specs() -> HardwareSpecs:
+    try:
+        import psutil
+    except Exception:  # pragma: no cover - psutil is expected to be present in the venv
+        psutil = None
+
+    hostname = platform.node() or os.environ.get("COMPUTERNAME", "unknown")
+    platform_name = f"{platform.system()} {platform.release()}"
+    cpu_model = platform.processor() or platform.machine() or "unknown"
+
+    physical_cores = psutil.cpu_count(logical=False) if psutil else 0
+    logical_cores = psutil.cpu_count(logical=True) if psutil else os.cpu_count() or 0
+
+    memory_total_gb = 0.0
+    memory_available_gb = 0.0
+    disk_total_gb = 0.0
+    disk_free_gb = 0.0
+    if psutil:
+        memory = psutil.virtual_memory()
+        memory_total_gb = memory.total / (1024**3)
+        memory_available_gb = memory.available / (1024**3)
+        disk = psutil.disk_usage(str(Path.home().anchor or Path.cwd().anchor or Path.cwd()))
+        disk_total_gb = disk.total / (1024**3)
+        disk_free_gb = disk.free / (1024**3)
+
+    cuda_available = False
+    cuda_device = "cpu"
+    try:
+        import torch
+
+        cuda_available = bool(torch.cuda.is_available())
+        if cuda_available:
+            cuda_device = torch.cuda.get_device_name(0)
+    except Exception:
+        pass
+
+    return HardwareSpecs(
+        hostname=hostname,
+        platform=platform_name,
+        cpu_model=cpu_model,
+        physical_cores=int(physical_cores or 0),
+        logical_cores=int(logical_cores or 0),
+        memory_total_gb=memory_total_gb,
+        memory_available_gb=memory_available_gb,
+        disk_total_gb=disk_total_gb,
+        disk_free_gb=disk_free_gb,
+        python_version=platform.python_version(),
+        cuda_available=cuda_available,
+        cuda_device=cuda_device,
+    )
+
+
+def hardware_table_rows(specs: HardwareSpecs) -> list[dict[str, str]]:
+    return [
+        {"Metric": "Hostname", "Value": specs.hostname},
+        {"Metric": "Platform", "Value": specs.platform},
+        {"Metric": "CPU", "Value": specs.cpu_model},
+        {"Metric": "Physical Cores", "Value": str(specs.physical_cores)},
+        {"Metric": "Logical Cores", "Value": str(specs.logical_cores)},
+        {"Metric": "Memory Total (GB)", "Value": f"{specs.memory_total_gb:.2f}"},
+        {"Metric": "Memory Available (GB)", "Value": f"{specs.memory_available_gb:.2f}"},
+        {"Metric": "Disk Total (GB)", "Value": f"{specs.disk_total_gb:.2f}"},
+        {"Metric": "Disk Free (GB)", "Value": f"{specs.disk_free_gb:.2f}"},
+        {"Metric": "Python", "Value": specs.python_version},
+        {"Metric": "CUDA", "Value": "yes" if specs.cuda_available else "no"},
+        {"Metric": "CUDA Device", "Value": specs.cuda_device},
+    ]
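`hardware.py` imports `shutil` but reads disk figures only through `psutil`, so the disk fields stay at `0.0` when `psutil` is missing. A possible stdlib fallback is sketched here. The helper `disk_usage_gb` and the constant `GIB` are hypothetical names, and this is one option under those assumptions rather than the committed behavior.

```python
from __future__ import annotations

import shutil
from pathlib import Path

GIB = 1024**3


def disk_usage_gb(path: str | None = None) -> tuple[float, float]:
    """Return (total_gb, free_gb) for the filesystem containing `path`."""
    # Default to the filesystem root of the home directory, as the diff does.
    target = path or Path.home().anchor or str(Path.cwd())
    usage = shutil.disk_usage(target)  # named tuple: total, used, free
    return usage.total / GIB, usage.free / GIB


total_gb, free_gb = disk_usage_gb()
print(f"Disk: {free_gb:.2f} GB free of {total_gb:.2f} GB")
```

`shutil.disk_usage` reports bytes, so dividing by `1024**3` matches the unit conversion already used for the `psutil` values.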
src/openpeer_trainer/smoke.py
ADDED
|
@@ -0,0 +1,150 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from math import exp
+from time import perf_counter
+
+
+@dataclass(slots=True)
+class StepTelemetry:
+    epoch: int
+    steps: int
+    wall_time_sec: float
+    memory_rss_mb: float
+    child_processes: int
+    thread_count: int
+    predictability_score: float
+    final_accuracy: float
+    final_loss: float
+    learned_gate_mean: float
+    learned_gate_std: float
+
+
+@dataclass(slots=True)
+class GateDemoResult:
+    initial_accuracy: float
+    final_accuracy: float
+    final_loss: float
+    reached_target: bool
+    trained_steps: int
+    target_accuracy: float
+    learned_gates: list[float]
+    learned_gate_sample: list[float]
+    telemetry: list[StepTelemetry]
+
+
+def _process_snapshot() -> tuple[float, int, int]:
+    try:
+        import psutil
+
+        process = psutil.Process()
+        memory_rss_mb = process.memory_info().rss / (1024 * 1024)
+        child_processes = len(process.children(recursive=True))
+        thread_count = process.num_threads()
+        return memory_rss_mb, child_processes, thread_count
+    except Exception:
+        return 0.0, 0, 0
+
+
+def run_tinygrad_gate_demo(
+    steps: int = 80,
+    batch_size: int = 64,
+    seed: int = 0,
+    target_accuracy: float = 0.99,
+) -> GateDemoResult:
+    try:
+        from tinygrad import Tensor, nn
+        from tinygrad.nn.state import get_parameters
+    except ImportError as exc:  # pragma: no cover - dependency gate
+        raise RuntimeError("tinygrad demo requires tinygrad to be installed") from exc
+
+    Tensor.manual_seed(seed)
+
+    input_dim = 12
+    classes = 2
+    samples = 128
+
+    features = Tensor.randn(samples, input_dim)
+
+    class GatedProbe:
+        def __init__(self) -> None:
+            self.base_weights = Tensor.linspace(0.5, 1.5, input_dim).is_param_(False)
+            self.log_gates = Tensor.zeros(input_dim)
+
+        def __call__(self, x: Tensor) -> Tensor:
+            score = (x * self.base_weights * self.log_gates.exp()).sum(axis=1)
+            return Tensor.stack(-score, score, dim=1)
+
+    teacher = GatedProbe()
+    teacher.log_gates = Tensor.linspace(-0.25, 0.75, input_dim).is_param_(False)
+    labels = teacher(features).argmax(-1)
+
+    student = GatedProbe()
+    optimizer = nn.optim.SGD(get_parameters(student), lr=0.8)
+
+    def accuracy(model: GatedProbe) -> float:
+        logits = model(features)
+        pred = logits.argmax(-1)
+        return float((pred == labels).sum().item()) / samples
+
+    initial_accuracy = accuracy(student)
+
+    telemetry: list[StepTelemetry] = []
+    start_time = perf_counter()
+    Tensor.training = True
+    reached_target = False
+    trained_steps = 0
+    for epoch in range(1, steps + 1):
+        batch_x = features
+        batch_y = labels
+        optimizer.zero_grad()
+        loss = student(batch_x).sparse_categorical_crossentropy(batch_y).backward()
+        optimizer.step()
+        trained_steps = epoch
+
+        if epoch == steps or epoch % max(1, steps // 8) == 0:
+            current_logits = student(features)
+            current_loss = float(current_logits.sparse_categorical_crossentropy(labels).item())
+            current_accuracy = accuracy(student)
+            memory_rss_mb, child_processes, thread_count = _process_snapshot()
+            learned_gates = [float(x) for x in student.log_gates.exp().tolist()]
+            telemetry.append(
+                StepTelemetry(
+                    epoch=epoch,
+                    steps=epoch,
+                    wall_time_sec=perf_counter() - start_time,
+                    memory_rss_mb=memory_rss_mb,
+                    child_processes=child_processes,
+                    thread_count=thread_count,
+                    predictability_score=float(exp(-current_loss) * 100.0),
+                    final_accuracy=current_accuracy,
+                    final_loss=current_loss,
+                    learned_gate_mean=sum(learned_gates) / max(len(learned_gates), 1),
+                    learned_gate_std=(
+                        (sum((x - (sum(learned_gates) / max(len(learned_gates), 1))) ** 2 for x in learned_gates) / max(len(learned_gates), 1))
+                        ** 0.5
+                    ),
+                )
+            )
+            if current_accuracy >= target_accuracy:
+                reached_target = True
+                break
+
+    Tensor.training = False
+    final_logits = student(features)
+    final_loss = float(final_logits.sparse_categorical_crossentropy(labels).item())
+    final_accuracy = accuracy(student)
+    learned_gates = [float(x) for x in student.log_gates.exp().tolist()]
+    gate_sample = learned_gates[:8]
+
+    return GateDemoResult(
+        initial_accuracy=initial_accuracy,
+        final_accuracy=final_accuracy,
+        final_loss=final_loss,
+        reached_target=reached_target,
+        trained_steps=trained_steps,
+        target_accuracy=target_accuracy,
+        learned_gates=learned_gates,
+        learned_gate_sample=gate_sample,
+        telemetry=telemetry,
+    )
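The telemetry arithmetic in `run_tinygrad_gate_demo` can be checked without tinygrad. The sketch below restates three pieces of it in plain Python: the checkpoint cadence `epoch == steps or epoch % max(1, steps // 8) == 0`, the gate mean and population standard deviation, and `predictability_score = exp(-loss) * 100`. The function names are hypothetical and were introduced only for this illustration.

```python
from math import exp


def checkpoint_epochs(steps: int) -> list[int]:
    """Epochs at which the training loop records a StepTelemetry row."""
    stride = max(1, steps // 8)
    return [e for e in range(1, steps + 1) if e == steps or e % stride == 0]


def gate_summary(gates: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation of the learned gates."""
    n = max(len(gates), 1)
    mean = sum(gates) / n
    std = (sum((g - mean) ** 2 for g in gates) / n) ** 0.5
    return mean, std


def predictability_score(loss: float) -> float:
    """Map a cross-entropy loss to a 0-100 score; zero loss gives 100."""
    return exp(-loss) * 100.0


print(checkpoint_epochs(80))   # every 10th epoch
print(checkpoint_epochs(20))   # every 2nd epoch
print(gate_summary([1.0, 3.0]))
print(predictability_score(0.0))
```

Because the target-accuracy check appears to sit inside the checkpoint branch, early stopping can only trigger at these epochs, so `trained_steps` is always one of the values returned by `checkpoint_epochs`.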