Spaces: dystrio/gpu-optimizer


dystrio committed on Jan 1

Commit c5be449 (verified) · 1 Parent(s): 6459e10

Update index.html

Files changed (1)

index.html +50 -1

@@ -18,6 +18,54 @@
             <p class="subtitle">GPU Placement Advisor for PyTorch/NCCL Workloads</p>
         </header>
         <!-- Step 1: Authentication -->
         <section class="card">
             <div class="card-title">
@@ -141,4 +189,5 @@
     <script src="app.js"></script>
 </body>
-</html>

             <p class="subtitle">GPU Placement Advisor for PyTorch/NCCL Workloads</p>
         </header>
+        <!-- How It Works (collapsible) -->
+        <details class="help-section">
+            <summary class="help-toggle">📖 How It Works</summary>
+            <div class="help-content">
+                <div class="help-block">
+                    <h3>What is Dystrio?</h3>
+                    <p>Dystrio analyzes the communication patterns of your PyTorch distributed training job and generates
+                    Kubernetes pod affinity rules that co-locate the pods whose GPUs communicate the most.</p>
+                </div>
+                <div class="help-block">
+                    <h3>How do I get a PyTorch trace?</h3>
+                    <p>Add this to your training script:</p>
+                    <pre><code>from torch.profiler import profile, ProfilerActivity
+with profile(
+    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
+    record_shapes=True,
+    with_stack=True
+) as prof:
+    # Your training step here
+    model(inputs)
+prof.export_chrome_trace("trace.json")</code></pre>
+                    <p>Upload the resulting <code>trace.json</code> file.</p>
+                </div>
+                <div class="help-block">
+                    <h3>What is Session ID / Multi-Run?</h3>
+                    <p><strong>Single run:</strong> Leave Session ID empty. You'll get recommendations based on one trace.</p>
+                    <p><strong>Multi-run (recommended):</strong> Use the same Session ID across multiple uploads.
+                    Dystrio tracks which communication patterns are <em>stable</em> vs <em>noisy</em>,
+                    giving you higher-confidence recommendations.</p>
+                    <p>Example: Upload 3 traces from different training runs with Session ID "llama-70b-training".
+                    Dystrio then identifies the consistent patterns and raises confidence from LOW to HIGH.</p>
+                </div>
+                <div class="help-block">
+                    <h3>How do I use the output?</h3>
+                    <ol>
+                        <li>Copy the generated Kubernetes YAML</li>
+                        <li>Add the <code>affinity:</code> block to your Pod spec</li>
+                        <li>Deploy – Kubernetes will schedule communicating pods together</li>
+                    </ol>
+                </div>
+            </div>
+        </details>
         <!-- Step 1: Authentication -->
         <section class="card">
             <div class="card-title">
     <script src="app.js"></script>
 </body>
+</html>
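
Note on the profiler snippet in the help text: it writes a fixed `trace.json`, so in a multi-process job every rank would write to the same filename. The commit does not say whether Dystrio expects one trace or one per rank, so the following is only a minimal sketch of a per-rank variant. It assumes the job is launched with `torchrun`, which sets the `RANK` environment variable; `trace_path_for_rank` and `profile_one_step` are hypothetical helper names, not part of Dystrio or PyTorch.

```python
import os


def trace_path_for_rank(prefix="trace", env=None):
    """Build a per-rank trace filename (hypothetical helper).

    Assumes the launcher (e.g. torchrun) sets RANK; falls back to rank 0
    for single-process runs.
    """
    env = os.environ if env is None else env
    rank = int(env.get("RANK", "0"))
    return f"{prefix}_rank{rank}.json"


def profile_one_step(step_fn, prefix="trace"):
    """Profile one call of step_fn and export a Chrome trace for this rank.

    torch is imported lazily so trace_path_for_rank stays usable without it.
    """
    import torch
    from torch.profiler import profile, ProfilerActivity

    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        # Only request CUDA activity when a GPU is actually present.
        activities.append(ProfilerActivity.CUDA)

    with profile(activities=activities, record_shapes=True, with_stack=True) as prof:
        step_fn()  # e.g. lambda: model(inputs)

    path = trace_path_for_rank(prefix)
    prof.export_chrome_trace(path)
    return path
```

Inside a training script this could be called as `profile_one_step(lambda: model(inputs))`; which of the resulting files to upload depends on what Dystrio expects, which the commit does not state.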