Buckets:

hf-doc-build/doc-dev / transformers /pr_33913 /it /perf_hardware.html
download
raw
26 kB
<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Hardware ottimizzato per l’addestramento&quot;,&quot;local&quot;:&quot;hardware-ottimizzato-per-laddestramento&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;GPU&quot;,&quot;local&quot;:&quot;gpu&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;Potenza e Raffreddamento&quot;,&quot;local&quot;:&quot;potenza-e-raffreddamento&quot;,&quot;sections&quot;:[],&quot;depth&quot;:3},{&quot;title&quot;:&quot;Connettività multi-GPU&quot;,&quot;local&quot;:&quot;connettività-multi-gpu&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;NVlink&quot;,&quot;local&quot;:&quot;nvlink&quot;,&quot;sections&quot;:[],&quot;depth&quot;:4}],&quot;depth&quot;:3}],&quot;depth&quot;:2}],&quot;depth&quot;:1}">
<link href="/docs/transformers/pr_33913/it/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/transformers/pr_33913/it/_app/immutable/entry/start.cef8f86b.js">
<link rel="modulepreload" href="/docs/transformers/pr_33913/it/_app/immutable/chunks/scheduler.36a0863c.js">
<link rel="modulepreload" href="/docs/transformers/pr_33913/it/_app/immutable/chunks/singletons.39cd3b38.js">
<link rel="modulepreload" href="/docs/transformers/pr_33913/it/_app/immutable/chunks/index.733708bb.js">
<link rel="modulepreload" href="/docs/transformers/pr_33913/it/_app/immutable/chunks/paths.64312358.js">
<link rel="modulepreload" href="/docs/transformers/pr_33913/it/_app/immutable/entry/app.d399d34f.js">
<link rel="modulepreload" href="/docs/transformers/pr_33913/it/_app/immutable/chunks/index.9c13489a.js">
<link rel="modulepreload" href="/docs/transformers/pr_33913/it/_app/immutable/nodes/0.7859fb6f.js">
<link rel="modulepreload" href="/docs/transformers/pr_33913/it/_app/immutable/chunks/each.e59479a4.js">
<link rel="modulepreload" href="/docs/transformers/pr_33913/it/_app/immutable/nodes/17.24a3a2e6.js">
<link rel="modulepreload" href="/docs/transformers/pr_33913/it/_app/immutable/chunks/CodeBlock.05d8ec32.js">
<link rel="modulepreload" href="/docs/transformers/pr_33913/it/_app/immutable/chunks/EditOnGithub.e88f2b7b.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Hardware ottimizzato per l’addestramento&quot;,&quot;local&quot;:&quot;hardware-ottimizzato-per-laddestramento&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;GPU&quot;,&quot;local&quot;:&quot;gpu&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;Potenza e Raffreddamento&quot;,&quot;local&quot;:&quot;potenza-e-raffreddamento&quot;,&quot;sections&quot;:[],&quot;depth&quot;:3},{&quot;title&quot;:&quot;Connettività multi-GPU&quot;,&quot;local&quot;:&quot;connettività-multi-gpu&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;NVlink&quot;,&quot;local&quot;:&quot;nvlink&quot;,&quot;sections&quot;:[],&quot;depth&quot;:4}],&quot;depth&quot;:3}],&quot;depth&quot;:2}],&quot;depth&quot;:1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="hardware-ottimizzato-per-laddestramento" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#hardware-ottimizzato-per-laddestramento"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Hardware ottimizzato per l’addestramento</span></h1> <p data-svelte-h="svelte-1ead13v">L’hardware utilizzato per eseguire l’addestramento del modello e l’inferenza può avere un grande effetto sulle prestazioni. Per un analisi approfondita delle GPUs, assicurati di dare un’occhiata all’eccellente <a href="https://timdettmers.com/2020/09/07/which-gpu-for-deep-learning/" rel="nofollow">blog post</a> di Tim Dettmer.</p> <p data-svelte-h="svelte-1br1fvm">Diamo un’occhiata ad alcuni consigli pratici per la configurazione della GPU.</p> <h2 class="relative group"><a id="gpu" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#gpu"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>GPU</span></h2> <p data-svelte-h="svelte-yqi27l">Quando si addestrano modelli più grandi ci sono essenzialmente tre opzioni:</p> <ul data-svelte-h="svelte-1nmki5w"><li>GPUs piu’ grandi</li> <li>Piu’ GPUs</li> <li>Piu’ CPU e piu’ NVMe (scaricato da <a href="main_classes/deepspeed#nvme-support">DeepSpeed-Infinity</a>)</li></ul> <p data-svelte-h="svelte-1ytyi1x">Iniziamo dal caso in cui ci sia una singola GPU.</p> <h3 class="relative group"><a id="potenza-e-raffreddamento" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#potenza-e-raffreddamento"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Potenza e Raffreddamento</span></h3> <p data-svelte-h="svelte-1yjfvva">Se hai acquistato una costosa GPU di fascia alta, assicurati di darle la potenza corretta e un raffreddamento sufficiente.</p> <p data-svelte-h="svelte-zy6waq"><strong>Potenza</strong>:</p> <p data-svelte-h="svelte-1g53k7p">Alcune schede GPU consumer di fascia alta hanno 2 e talvolta 3 prese di alimentazione PCI-E a 8 pin. Assicurati di avere tanti cavi PCI-E a 8 pin indipendenti da 12 V collegati alla scheda quante sono le prese. Non utilizzare le 2 fessure a un’estremità dello stesso cavo (noto anche come cavo a spirale). Cioè se hai 2 prese sulla GPU, vuoi 2 cavi PCI-E a 8 pin che vanno dall’alimentatore alla scheda e non uno che abbia 2 connettori PCI-E a 8 pin alla fine! In caso contrario, non otterrai tutte le prestazioni ufficiali.</p> <p data-svelte-h="svelte-tpu5y0">Ciascun cavo di alimentazione PCI-E a 8 pin deve essere collegato a una guida da 12 V sul lato dell’alimentatore e può fornire fino a 150 W di potenza.</p> <p data-svelte-h="svelte-1w4tr9k">Alcune altre schede possono utilizzare connettori PCI-E a 12 pin e questi possono fornire fino a 500-600 W di potenza.</p> <p data-svelte-h="svelte-frcoas">Le schede di fascia bassa possono utilizzare connettori a 6 pin, che forniscono fino a 75 W di potenza.</p> <p data-svelte-h="svelte-uvs3kl">Inoltre vuoi un alimentatore (PSU) di fascia alta che abbia una tensione stabile. Alcuni PSU di qualità inferiore potrebbero non fornire alla scheda la tensione stabile di cui ha bisogno per funzionare al massimo.</p> <p data-svelte-h="svelte-1d6nft9">E ovviamente l’alimentatore deve avere abbastanza Watt inutilizzati per alimentare la scheda.</p> <p data-svelte-h="svelte-110coev"><strong>Raffreddamento</strong>:</p> <p data-svelte-h="svelte-17ayxfb">Quando una GPU si surriscalda, inizierà a rallentare e non fornirà le prestazioni mssimali e potrebbe persino spegnersi se diventasse troppo calda.</p> <p data-svelte-h="svelte-yst7a4">È difficile dire l’esatta temperatura migliore a cui aspirare quando una GPU è molto caricata, ma probabilmente qualsiasi cosa al di sotto di +80°C va bene, ma più bassa è meglio - forse 70-75°C è un intervallo eccellente in cui trovarsi. È probabile che il rallentamento inizi a circa 84-90°C. Ma oltre alla limitazione delle prestazioni, una temperatura molto elevata prolungata è probabile che riduca la durata di una GPU.</p> <p data-svelte-h="svelte-1xbhyar">Diamo quindi un’occhiata a uno degli aspetti più importanti quando si hanno più GPU: la connettività.</p> <h3 class="relative group"><a id="connettività-multi-gpu" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#connettività-multi-gpu"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Connettività multi-GPU</span></h3> <p data-svelte-h="svelte-1ep8son">Se utilizzi più GPU, il modo in cui le schede sono interconnesse può avere un enorme impatto sul tempo totale di allenamento. Se le GPU si trovano sullo stesso nodo fisico, puoi eseguire:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->nvidia-smi topo -m<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1ytlzkk">e ti dirà come sono interconnesse le GPU. Su una macchina con doppia GPU e collegata a NVLink, molto probabilmente vedrai qualcosa del tipo:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --> <span class="hljs-attribute">GPU0</span> GPU1 CPU Affinity NUMA Affinity
<span class="hljs-attribute">GPU0</span> X NV2 <span class="hljs-number">0</span>-<span class="hljs-number">23</span> N/A
<span class="hljs-attribute">GPU1</span> NV2 X <span class="hljs-number">0</span>-<span class="hljs-number">23</span> N/A<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1n0iwi9">su una macchina diversa senza NVLink potremmo vedere:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --> <span class="hljs-attribute">GPU0</span> GPU1 CPU Affinity NUMA Affinity
<span class="hljs-attribute">GPU0</span> X PHB <span class="hljs-number">0</span>-<span class="hljs-number">11</span> N/A
<span class="hljs-attribute">GPU1</span> PHB X <span class="hljs-number">0</span>-<span class="hljs-number">11</span> N/A<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-pl67un">Il rapporto include questa legenda:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --> X = Self
SYS = Connection traversing PCIe <span class="hljs-keyword">as</span> well <span class="hljs-keyword">as</span> <span class="hljs-keyword">the</span> SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe <span class="hljs-keyword">as</span> well <span class="hljs-keyword">as</span> <span class="hljs-keyword">the</span> interconnect between PCIe Host Bridges <span class="hljs-keyword">within</span> <span class="hljs-keyword">a</span> NUMA node
PHB = Connection traversing PCIe <span class="hljs-keyword">as</span> well <span class="hljs-keyword">as</span> <span class="hljs-keyword">a</span> PCIe Host Bridge (typically <span class="hljs-keyword">the</span> CPU)
PXB = Connection traversing multiple PCIe bridges (<span class="hljs-keyword">without</span> traversing <span class="hljs-keyword">the</span> PCIe Host Bridge)
PIX = Connection traversing <span class="hljs-keyword">at</span> most <span class="hljs-keyword">a</span> single PCIe bridge
NV<span class="hljs-comment"># = Connection traversing a bonded set of # NVLinks</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-poa7qe">Quindi il primo rapporto <code>NV2</code> ci dice che le GPU sono interconnesse con 2 NVLinks e nel secondo report <code>PHB</code> abbiamo una tipica configurazione PCIe+Bridge a livello di consumatore.</p> <p data-svelte-h="svelte-1mtptpa">Controlla che tipo di connettività hai sulla tua configurazione. Alcuni di questi renderanno la comunicazione tra le carte più veloce (es. NVLink), altri più lenta (es. PHB).</p> <p data-svelte-h="svelte-pvckob">A seconda del tipo di soluzione di scalabilità utilizzata, la velocità di connettività potrebbe avere un impatto maggiore o minore. Se le GPU devono sincronizzarsi raramente, come in DDP, l’impatto di una connessione più lenta sarà meno significativo. Se le GPU devono scambiarsi messaggi spesso, come in ZeRO-DP, una connettività più veloce diventa estremamente importante per ottenere un addestramento più veloce.</p> <h4 class="relative group"><a id="nvlink" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#nvlink"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>NVlink</span></h4> <p data-svelte-h="svelte-19nbh14"><a href="https://en.wikipedia.org/wiki/NVLink" rel="nofollow">NVLink</a> è un collegamento di comunicazione a corto raggio multilinea seriale basato su cavo sviluppato da Nvidia.</p> <p data-svelte-h="svelte-48x68i">Ogni nuova generazione fornisce una larghezza di banda più veloce, ad es. ecco una citazione da <a href="https://www.nvidia.com/content/dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf" rel="nofollow">Nvidia Ampere GA102 GPU Architecture</a>:</p> <blockquote data-svelte-h="svelte-14khom1"><p>Third-Generation NVLink®
GA102 GPUs utilize NVIDIA’s third-generation NVLink interface, which includes four x4 links,
with each link providing 14.0625 GB/sec bandwidth in each direction between two GPUs. Four
links provide 56.25 GB/sec bandwidth in each direction, and 112.5 GB/sec total bandwidth
between two GPUs. Two RTX 3090 GPUs can be connected together for SLI using NVLink.
(Note that 3-Way and 4-Way SLI configurations are not supported.)</p></blockquote> <p data-svelte-h="svelte-o2ocg3">Quindi più <code>X</code> si ottiene nel rapporto di <code>NVX</code> nell’output di <code>nvidia-smi topo -m</code>, meglio è. La generazione dipenderà dall’architettura della tua GPU.</p> <p data-svelte-h="svelte-rs31fi">Confrontiamo l’esecuzione di un training del modello di linguaggio openai-community/gpt2 su un piccolo campione di wikitext</p> <p data-svelte-h="svelte-8m003l">I risultati sono:</p> <table data-svelte-h="svelte-1hvmx1i"><thead><tr><th>NVlink</th> <th align="right">Time</th></tr></thead> <tbody><tr><td>Y</td> <td align="right">101s</td></tr> <tr><td>N</td> <td align="right">131s</td></tr></tbody></table> <p data-svelte-h="svelte-1l5ijyg">Puoi vedere che NVLink completa l’addestramento circa il 23% più velocemente. Nel secondo benchmark utilizziamo <code>NCCL_P2P_DISABLE=1</code> per dire alle GPU di non utilizzare NVLink.</p> <p data-svelte-h="svelte-16pifow">Ecco il codice benchmark completo e gli output:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-comment"># DDP w/ NVLink</span>
<span class="hljs-built_in">rm</span> -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 torchrun \
--nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path openai-community/gpt2 \
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train \
--output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
{<span class="hljs-string">&#x27;train_runtime&#x27;</span>: 101.9003, <span class="hljs-string">&#x27;train_samples_per_second&#x27;</span>: 1.963, <span class="hljs-string">&#x27;epoch&#x27;</span>: 0.69}
<span class="hljs-comment"># DDP w/o NVLink</span>
<span class="hljs-built_in">rm</span> -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 NCCL_P2P_DISABLE=1 torchrun \
--nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path openai-community/gpt2 \
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train
--output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
{<span class="hljs-string">&#x27;train_runtime&#x27;</span>: 131.4367, <span class="hljs-string">&#x27;train_samples_per_second&#x27;</span>: 1.522, <span class="hljs-string">&#x27;epoch&#x27;</span>: 0.69}<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-pittpg">Hardware: 2x TITAN RTX 24GB each + NVlink with 2 NVLinks (<code>NV2</code> in <code>nvidia-smi topo -m</code>)
Software: <code>pytorch-1.8-to-be</code> + <code>cuda-11.0</code> / <code>transformers==4.3.0.dev0</code></p> <a class="!text-gray-400 !no-underline text-sm flex items-center not-prose mt-4" href="https://github.com/huggingface/transformers/blob/main/docs/source/it/perf_hardware.md" target="_blank"><span data-svelte-h="svelte-1kd6by1">&lt;</span> <span data-svelte-h="svelte-x0xyl0">&gt;</span> <span data-svelte-h="svelte-1dajgef"><span class="underline ml-1.5">Update</span> on GitHub</span></a> <p></p>
<script>
{
__sveltekit_728xau = {
assets: "/docs/transformers/pr_33913/it",
base: "/docs/transformers/pr_33913/it",
env: {}
};
const element = document.currentScript.parentElement;
const data = [null,null];
Promise.all([
import("/docs/transformers/pr_33913/it/_app/immutable/entry/start.cef8f86b.js"),
import("/docs/transformers/pr_33913/it/_app/immutable/entry/app.d399d34f.js")
]).then(([kit, app]) => {
kit.start(app, element, {
node_ids: [0, 17],
data,
form: null,
error: null
});
});
}
</script>

Xet Storage Details

Size:
26 kB
·
Xet hash:
4376a2d2f4af7385e07208076f39b356b617d3e21902f436a80de34f0e89d3b0

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.