Buckets:
| <meta charset="utf-8" /><meta name="hf:doc:metadata" content="{"title":"Trainer API로 모델 미세 조정하기","local":"fine-tuning-a-model-with-the-trainer-api","sections":[{"title":"훈련","local":"training","sections":[],"depth":3},{"title":"평가","local":"evaluation","sections":[],"depth":3},{"title":"고급 훈련 기능","local":"advanced-training-features","sections":[],"depth":3},{"title":"섹션 퀴즈","local":"section-quiz","sections":[{"title":"1. Trainer 에서 <code> processing_class </code> 매개변수의 목적은 무엇인가요?","local":"1-trainer-에서-code-processingclass-code-매개변수의-목적은-무엇인가요","sections":[],"depth":3},{"title":"2. 훈련 중 평가가 얼마나 자주 발생하는지를 제어하는 TrainingArguments 매개변수는 무엇인가요?","local":"2-훈련-중-평가가-얼마나-자주-발생하는지를-제어하는-trainingarguments-매개변수는-무엇인가요","sections":[],"depth":3},{"title":"3. TrainingArguments에서 <code> fp16=True </code> 는 무엇을 활성화하나요?","local":"3-trainingarguments에서-code-fp16true-code-는-무엇을-활성화하나요","sections":[],"depth":3},{"title":"4. Trainer에서 <code> compute_metrics </code> 함수의 역할은 무엇인가요?","local":"4-trainer에서-code-computemetrics-code-함수의-역할은-무엇인가요","sections":[],"depth":3},{"title":"5. Trainer에 <code> eval_dataset </code> 을 제공하지 않으면 어떻게 되나요?","local":"5-trainer에-code-evaldataset-code-을-제공하지-않으면-어떻게-되나요","sections":[],"depth":3},{"title":"6. 그레이디언트 누적이란 무엇이며 어떻게 활성화하나요?","local":"6-그레이디언트-누적이란-무엇이며-어떻게-활성화하나요","sections":[],"depth":3}],"depth":2}],"depth":1}"> | |
| <link href="/docs/course/pr_1069/ko/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/entry/start.667ba29b.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/scheduler.37c15a92.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/singletons.8de3d8e2.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/index.18351ede.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/paths.6047fff5.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/entry/app.65e35200.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/index.2bf4358c.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/nodes/0.9169bc8c.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/each.e59479a4.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/nodes/22.e0b3784e.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/Tip.363c041f.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/Youtube.1e50a667.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/CodeBlock.4e987730.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/CourseFloatingBanner.9ff4c771.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/DocNotebookDropdown.efc1fb7c.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/Question.668688bc.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/FrameworkSwitchCourse.8d4d4ab6.js"> | |
| <link rel="modulepreload" href="/docs/course/pr_1069/ko/_app/immutable/chunks/getInferenceSnippets.24b50994.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"Trainer API로 모델 미세 조정하기","local":"fine-tuning-a-model-with-the-trainer-api","sections":[{"title":"훈련","local":"training","sections":[],"depth":3},{"title":"평가","local":"evaluation","sections":[],"depth":3},{"title":"고급 훈련 기능","local":"advanced-training-features","sections":[],"depth":3},{"title":"섹션 퀴즈","local":"section-quiz","sections":[{"title":"1. Trainer 에서 <code> processing_class </code> 매개변수의 목적은 무엇인가요?","local":"1-trainer-에서-code-processingclass-code-매개변수의-목적은-무엇인가요","sections":[],"depth":3},{"title":"2. 훈련 중 평가가 얼마나 자주 발생하는지를 제어하는 TrainingArguments 매개변수는 무엇인가요?","local":"2-훈련-중-평가가-얼마나-자주-발생하는지를-제어하는-trainingarguments-매개변수는-무엇인가요","sections":[],"depth":3},{"title":"3. TrainingArguments에서 <code> fp16=True </code> 는 무엇을 활성화하나요?","local":"3-trainingarguments에서-code-fp16true-code-는-무엇을-활성화하나요","sections":[],"depth":3},{"title":"4. Trainer에서 <code> compute_metrics </code> 함수의 역할은 무엇인가요?","local":"4-trainer에서-code-computemetrics-code-함수의-역할은-무엇인가요","sections":[],"depth":3},{"title":"5. Trainer에 <code> eval_dataset </code> 을 제공하지 않으면 어떻게 되나요?","local":"5-trainer에-code-evaldataset-code-을-제공하지-않으면-어떻게-되나요","sections":[],"depth":3},{"title":"6. 그레이디언트 누적이란 무엇이며 어떻게 활성화하나요?","local":"6-그레이디언트-누적이란-무엇이며-어떻게-활성화하나요","sections":[],"depth":3}],"depth":2}],"depth":1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <div class="bg-white leading-none border border-gray-100 rounded-lg flex p-0.5 w-56 text-sm mb-4"><a class="flex justify-center flex-1 py-1.5 px-2.5 focus:outline-none !no-underline rounded-l bg-red-50 dark:bg-transparent text-red-600" href="?fw=pt"><svg class="mr-1.5" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><defs><clipPath id="a"><rect x="3.05" y="0.5" width="25.73" height="31" fill="none"></rect></clipPath></defs><g clip-path="url(#a)"><path d="M24.94,9.51a12.81,12.81,0,0,1,0,18.16,12.68,12.68,0,0,1-18,0,12.81,12.81,0,0,1,0-18.16l9-9V5l-.84.83-6,6a9.58,9.58,0,1,0,13.55,0ZM20.44,9a1.68,1.68,0,1,1,1.67-1.67A1.68,1.68,0,0,1,20.44,9Z" fill="#ee4c2c"></path></g></svg> Pytorch </a><a class="flex justify-center flex-1 py-1.5 px-2.5 focus:outline-none !no-underline rounded-r text-gray-500 filter grayscale" href="?fw=tf"><svg class="mr-1.5" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="0.94em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 274"><path d="M145.726 42.065v42.07l72.861 42.07v-42.07l-72.86-42.07zM0 84.135v42.07l36.43 21.03V105.17L0 84.135zm109.291 21.035l-36.43 21.034v126.2l36.43 21.035v-84.135l36.435 21.035v-42.07l-36.435-21.034V105.17z" fill="#E55B2D"></path><path d="M145.726 42.065L36.43 105.17v42.065l72.861-42.065v42.065l36.435-21.03v-84.14zM255.022 63.1l-36.435 21.035v42.07l36.435-21.035V63.1zm-72.865 84.135l-36.43 21.035v42.07l36.43-21.036v-42.07zm-36.43 63.104l-36.436-21.035v84.135l36.435-21.035V210.34z" fill="#ED8E24"></path><path d="M145.726 0L0 84.135l36.43 21.035l109.296-63.105l72.861 42.07L255.022 63.1L145.726 0zm0 126.204l-36.435 21.03l36.435 21.036l36.43-21.035l-36.43-21.03z" fill="#F8BF3C"></path></svg> TensorFlow </a></div> <h1 class="relative group"><a id="fine-tuning-a-model-with-the-trainer-api" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#fine-tuning-a-model-with-the-trainer-api"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Trainer API로 모델 미세 조정하기</span></h1> <div class="flex space-x-1 absolute z-10 right-0 top-0"><a href="https://discuss.huggingface.co/t/chapter-3-questions" target="_blank"><img alt="Ask a Question" class="!m-0" src="https://img.shields.io/badge/Ask%20a%20question-ffcb4c.svg?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgLTEgMTA0IDEwNiI+PGRlZnM+PHN0eWxlPi5jbHMtMXtmaWxsOiMyMzFmMjA7fS5jbHMtMntmaWxsOiNmZmY5YWU7fS5jbHMtM3tmaWxsOiMwMGFlZWY7fS5jbHMtNHtmaWxsOiMwMGE5NGY7fS5jbHMtNXtmaWxsOiNmMTVkMjI7fS5jbHMtNntmaWxsOiNlMzFiMjM7fTwvc3R5bGU+PC9kZWZzPjx0aXRsZT5EaXNjb3Vyc2VfbG9nbzwvdGl0bGU+PGcgaWQ9IkxheWVyXzIiPjxnIGlkPSJMYXllcl8zIj48cGF0aCBjbGFzcz0iY2xzLTEiIGQ9Ik01MS44NywwQzIzLjcxLDAsMCwyMi44MywwLDUxYzAsLjkxLDAsNTIuODEsMCw1Mi44MWw1MS44Ni0uMDVjMjguMTYsMCw1MS0yMy43MSw1MS01MS44N1M4MCwwLDUxLjg3LDBaIi8+PHBhdGggY2xhc3M9ImNscy0yIiBkPSJNNTIuMzcsMTkuNzRBMzEuNjIsMzEuNjIsMCwwLDAsMjQuNTgsNjYuNDFsLTUuNzIsMTguNEwzOS40LDgwLjE3YTMxLjYxLDMxLjYxLDAsMSwwLDEzLTYwLjQzWiIvPjxwYXRoIGNsYXNzPSJjbHMtMyIgZD0iTTc3LjQ1LDMyLjEyYTMxLjYsMzEuNiwwLDAsMS0zOC4wNSw0OEwxOC44Niw4NC44MmwyMC45MS0yLjQ3QTMxLjYsMzEuNiwwLDAsMCw3Ny40NSwzMi4xMloiLz48cGF0aCBjbGFzcz0iY2xzLTQiIGQ9Ik03MS42MywyNi4yOUEzMS42LDMxLjYsMCwwLDEsMzguOCw3OEwxOC44Niw4NC44MiwzOS40LDgwLjE3QTMxLjYsMzEuNiwwLDAsMCw3MS42MywyNi4yOVoiLz48cGF0aCBjbGFzcz0iY2xzLTUiIGQ9Ik0yNi40Nyw2Ny4xMWEzMS42MSwzMS42MSwwLDAsMSw1MS0zNUEzMS42MSwzMS42MSwwLDAsMCwyNC41OCw2Ni40MWwtNS43MiwxOC40WiIvPjxwYXRoIGNsYXNzPSJjbHMtNiIgZD0iTTI0LjU4LDY2LjQxQTMxLjYxLDMxLjYxLDAsMCwxLDcxLjYzLDI2LjI5YTMxLjYxLDMxLjYxLDAsMCwwLTQ5LDM5LjYzbC0zLjc2LDE4LjlaIi8+PC9nPjwvZz48L3N2Zz4="></a> <a href="https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter3/section3.ipynb" target="_blank"><img alt="Open In Colab" class="!m-0" src="https://colab.research.google.com/assets/colab-badge.svg"></a> <a href="https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter3/section3.ipynb" target="_blank"><img alt="Open In Studio Lab" class="!m-0" src="https://studiolab.sagemaker.aws/studiolab.svg"></a></div> <iframe class="w-full xl:w-4/6 h-80" src="https://www.youtube-nocookie.com/embed/nvBXf7s7vTI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> <p data-svelte-h="svelte-knkc89">🤗 Transformers는 <code>Trainer</code> 클래스를 제공합니다. 이 클래스를 사용하면 사전 학습된 모델을 여러분의 데이터셋에 맞춰 최신 기법으로 쉽게 미세 조정할 수 있습니다. 이전 섹션에서 데이터 전처리 작업을 모두 마쳤다면 이제 몇 단계만 거치면 <code>Trainer</code>를 정의할 수 있습니다. 가장 어려운 부분은 <code>Trainer.train()</code>을 실행할 환경을 준비하는 과정일 수 있습니다. 이 작업은 CPU에서 매우 느리게 실행되기 때문입니다. 만약 GPU가 없다면 <a href="https://colab.research.google.com/" rel="nofollow">Google Colab</a>에서 무료로 제공하는 GPU나 TPU를 이용할 수 있습니다.</p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-1d8z8sh">📚 <strong>훈련 리소스</strong>: 훈련을 시작하기 전에 포괄적인 <a href="https://huggingface.co/docs/transformers/main/en/training" rel="nofollow">🤗 Transformers 훈련 가이드</a>를 숙지하고 <a href="https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu" rel="nofollow">미세 조정 쿡북</a>의 실용적인 예제를 살펴보세요.</p></div> <p data-svelte-h="svelte-hs8a7l">아래 코드 예시는 이전 섹션의 코드를 모두 실행했다는 가정하에 작동합니다. 즉, 다음 사항들이 필요합니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> load_dataset | |
| <span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoTokenizer, DataCollatorWithPadding | |
| raw_datasets = load_dataset(<span class="hljs-string">"glue"</span>, <span class="hljs-string">"mrpc"</span>) | |
| checkpoint = <span class="hljs-string">"bert-base-uncased"</span> | |
| tokenizer = AutoTokenizer.from_pretrained(checkpoint) | |
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">tokenize_function</span>(<span class="hljs-params">example</span>): | |
| <span class="hljs-keyword">return</span> tokenizer(example[<span class="hljs-string">"sentence1"</span>], example[<span class="hljs-string">"sentence2"</span>], truncation=<span class="hljs-literal">True</span>) | |
| tokenized_datasets = raw_datasets.<span class="hljs-built_in">map</span>(tokenize_function, batched=<span class="hljs-literal">True</span>) | |
| data_collator = DataCollatorWithPadding(tokenizer=tokenizer)<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="training" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#training"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>훈련</span></h3> <p data-svelte-h="svelte-1enyha7"><code>Trainer</code>를 정의하기 전 첫 번째 단계는 <code>Trainer</code>가 훈련 및 평가에 사용할 모든 하이퍼파라미터를 담을<code>TrainingArguments</code> 클래스를 정의하는 것입니다. 필수로 제공해야 하는 유일한 인수는 훈련된 모델과 중간 체크포인트가 저장될 디렉토리입니다. 나머지는 기본값으로 둘 수 있으며, 기본적인 미세 조정 작업에는 충분한 설정입니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> TrainingArguments | |
| training_args = TrainingArguments(<span class="hljs-string">"test-trainer"</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-18au6nv">훈련 중에 모델을 Hub에 자동으로 업로드하려면 <code>TrainingArguments</code>에서 <code>push_to_hub=True</code>를 전달하세요. 이 기능에 대해서는 <a href="/course/chapter4/3">Chapter 4</a>에서 자세히 알아보겠습니다.</p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-oywe3r">🚀 <strong>고급 설정</strong>: 사용 가능한 모든 훈련 인수와 최적화 전략에 대한 자세한 정보는 <a href="https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments" rel="nofollow">TrainingArguments 문서</a>와 <a href="https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu" rel="nofollow">훈련 구성 쿡북</a>을 참고하세요.</p></div> <p data-svelte-h="svelte-nt9c4b">두 번째 단계는 모델을 정의하는 것입니다. <a href="/course/chapter2">이전 챕터</a>에서와 같이 두 개의 라벨과 함께 <code>AutoModelForSequenceClassification</code> 클래스를 사용하겠습니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoModelForSequenceClassification | |
| model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=<span class="hljs-number">2</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-s3sck9"><a href="/course/chapter2">Chapter 2</a>와 달리 이 사전 훈련된 모델을 인스턴스화하면 경고 메시지가 나타나는 것을 확인할 수 있습니다. 이는 BERT가 문장 쌍 분류를 위해 사전 훈련되지 않았기 때문에, 사전 훈련된 모델의 헤드가 제거되고 시퀀스 분류에 적합한 새로운 헤드가 추가되었기 때문입니다. 경고는 일부 가중치(제거된 사전 훈련 헤드에 해당하는 가중치)가 사용되지 않았고, 일부 다른 가중치(새로운 헤드용)가 무작위로 초기화되었다는 것을 나타냅니다. 마지막으로 모델을 훈련시키라는 메시지가 나오는데, 바로 지금부터 그 작업을 시작하겠습니다.</p> <p data-svelte-h="svelte-4ai6qh">모델이 준비되면, 지금까지 구성한 모든 객체(<code>model</code>, <code>training_args</code>, 훈련 및 검증 데이터셋, <code>data_collator</code>, <code>processing_class</code>)를 전달하여 <code>Trainer</code>를 정의할 수 있습니다. <code>processing_class</code> 매개변수는 비교적 최근에 추가된 기능으로, <code>Trainer</code>에게 어떤 토크나이저를 사용해 데이터를 처리할지 알려줍니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> Trainer | |
| trainer = Trainer( | |
| model, | |
| training_args, | |
| train_dataset=tokenized_datasets[<span class="hljs-string">"train"</span>], | |
| eval_dataset=tokenized_datasets[<span class="hljs-string">"validation"</span>], | |
| data_collator=data_collator, | |
| processing_class=tokenizer, | |
| )<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1m2yas3"><code>processing_class</code>에 토크나이저를 전달하면, <code>Trainer</code>가 기본적으로 <code>DataCollatorWithPadding</code>을 <code>data_collator</code>로 사용합니다. 따라서 이 경우에는 <code>data_collator=data_collator</code> 줄을 생략할 수 있지만, 데이터 처리 파이프라인의 중요한 부분을 보여드리기 위해 코드에 포함했습니다.</p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-1vnjsqm">📖 <strong>더 자세히 알아보기</strong>: Trainer 클래스와 그 매개변수에 대한 자세한 내용은 <a href="https://huggingface.co/docs/transformers/main/en/main_classes/trainer" rel="nofollow">Trainer API 문서</a>를 방문하고 <a href="https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu" rel="nofollow">훈련 쿡북 레시피</a>에서 고급 사용 패턴을 살펴보세요.</p></div> <p data-svelte-h="svelte-13y4n8x">데이터셋에서 모델을 미세 조정하려면 <code>Trainer</code>의 <code>train()</code> 메소드를 호출하기만 하면 됩니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->trainer.train()<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1vb1ztl">이렇게 하면 미세 조정이 시작됩니다(GPU에서는 몇 분 정도 소요됩니다). 500단계마다 훈련 손실이 출력되지만, 모델의 성능이 얼마나 좋은지(또는 나쁜지)는 알려주지 않습니다. 그 이유는 다음과 같습니다.</p> <ol data-svelte-h="svelte-1kijd9m"><li><code>TrainingArguments</code>에서 <code>eval_strategy</code>를 <code>"steps"</code> (매 <code>eval_steps</code>마다 평가) 또는 <code>"epoch"</code> (각 에포크 종료 시 평가)로 설정하지 않았습니다.</li> <li>평가 중에 메트릭을 계산하기 위한 <code>compute_metrics()</code> 함수를 <code>Trainer</code>에 제공하지 않았습니다. 이 함수가 없으면 평가에서 손실 값만 출력되는데, 이 값만으로는 성능을 파악하기 어렵습니다.</li></ol> <h3 class="relative group"><a id="evaluation" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#evaluation"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>평가</span></h3> <p data-svelte-h="svelte-1kov8l9">이제 유용한 <code>compute_metrics()</code> 함수를 어떻게 만들고 다음 훈련 시 사용할 수 있는지 알아보겠습니다. 이 함수는 <code>EvalPrediction</code> 객체(<code>predictions</code> 필드와 <code>label_ids</code> 필드를 갖는 명명된 튜플)를 입력받습니다. 그리고 각 메트릭의 이름을 키(문자열)로, 성능을 값(부동소수점)으로 갖는 딕셔너리를 반환해야 합니다. 모델의 예측값을 얻기 위해 <code>Trainer.predict()</code>를 사용할 수 있습니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->predictions = trainer.predict(tokenized_datasets[<span class="hljs-string">"validation"</span>]) | |
| <span class="hljs-built_in">print</span>(predictions.predictions.shape, predictions.label_ids.shape)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->(<span class="hljs-number">408</span>, <span class="hljs-number">2</span>) (<span class="hljs-number">408</span>,)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-44hwg3"><code>predict()</code> 메소드의 출력은 <code>predictions</code>, <code>label_ids</code>, <code>metrics</code> 세 개의 필드가 있는 또 다른 명명된 튜플입니다. <code>metrics</code> 필드에는 전달된 데이터셋에 대한 손실 값과 시간 관련 메트릭(총 예측 시간, 평균 예측 시간)만 포함됩니다. 하지만 <code>compute_metrics()</code> 함수를 완성하여 <code>Trainer</code>에 전달하면, 이 필드에 <code>compute_metrics()</code>가 반환하는 메트릭들도 함께 포함됩니다.</p> <p data-svelte-h="svelte-1f3lia8">보시다시피, <code>predictions</code>는 408 x 2 모양의 2차원 배열입니다 (408은 predict()에 전달한 데이터셋의 샘플 개수입니다). 이 값들은 <code>predict()</code>에 전달한 데이터셋의 각 샘플에 대한 로짓입니다 (<a href="/course/chapter2">이전 챕터</a>에서 보았듯이 모든 Transformer 모델은 로짓을 반환합니다). 이 로짓을 우리가 가진 레이블과 비교할 수 있는 예측값으로 변환하려면, 두 번째 축에서 최댓값을 가진 인덱스를 구해야 합니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np | |
| preds = np.argmax(predictions.predictions, axis=-<span class="hljs-number">1</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-4of9x9">이제 이 <code>preds</code>를 라벨과 비교할 수 있습니다. <code>compute_metric()</code> 함수를 빌드하기 위해 🤗 <a href="https://github.com/huggingface/evaluate/" rel="nofollow">Evaluate</a> 라이브러리의 메트릭을 활용하겠습니다. 데이터셋을 로드했던 것처럼, MRPC 데이터셋과 관련된 메트릭도 <code>evaluate.load()</code> 함수로 쉽게 로드할 수 있습니다. 반환된 객체의 <code>compute()</code> 메서드를 사용해 메트릭을 계산할 수 있습니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">import</span> evaluate | |
| metric = evaluate.load(<span class="hljs-string">"glue"</span>, <span class="hljs-string">"mrpc"</span>) | |
| metric.compute(predictions=preds, references=predictions.label_ids)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->{<span class="hljs-string">'accuracy'</span>: <span class="hljs-number">0.8578431372549019</span>, <span class="hljs-string">'f1'</span>: <span class="hljs-number">0.8996539792387542</span>}<!-- HTML_TAG_END --></pre></div> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-3mjkop">다양한 평가 메트릭과 전략에 대해 알아보려면 <a href="https://huggingface.co/docs/evaluate/" rel="nofollow">🤗 Evaluate 문서</a>를 참고하세요.</p></div> <p data-svelte-h="svelte-1ylr8sj">모델 헤드의 가중치가 무작위로 초기화되기 때문에 얻게 되는 결과는 조금씩 다를 수 있습니다. 결과를 보면 우리 모델이 검증 세트에서 85.78%의 정확도와 89.97%의 F1 점수를 달성했음을 볼 수 있습니다. 이 두 가지는 GLUE 벤치마크의 MRPC 데이터셋에서 결과를 평가하는 데 사용되는 메트릭입니다. <a href="https://arxiv.org/pdf/1810.04805.pdf" rel="nofollow">BERT 논문</a>에서는 기본 모델의 F1 점수를 88.9로 보고하였습니다. 당시에는 <code>uncased</code> 모델을 사용했지만, 우리는 현재 <code>cased</code> 모델을 사용하고 있기 때문에 더 나은 결과가 나온 것입니다.</p> <p data-svelte-h="svelte-2jb62l">이 모든 것을 종합하면, 다음처럼 <code>compute_metrics()</code> 함수를 다음과 같이 정의할 수 있습니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">def</span> <span class="hljs-title function_">compute_metrics</span>(<span class="hljs-params">eval_preds</span>): | |
| metric = evaluate.load(<span class="hljs-string">"glue"</span>, <span class="hljs-string">"mrpc"</span>) | |
| logits, labels = eval_preds | |
| predictions = np.argmax(logits, axis=-<span class="hljs-number">1</span>) | |
| <span class="hljs-keyword">return</span> metric.compute(predictions=predictions, references=labels)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-vtoh1f">각 에폭이 끝날 때마다 메트릭이 출력되도록, 이 <code>compute_metrics()</code> 함수가 포함하여 <code>Trainer</code>를 새로 정의해 보겠습니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->training_args = TrainingArguments(<span class="hljs-string">"test-trainer"</span>, eval_strategy=<span class="hljs-string">"epoch"</span>) | |
| model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=<span class="hljs-number">2</span>) | |
| trainer = Trainer( | |
| model, | |
| training_args, | |
| train_dataset=tokenized_datasets[<span class="hljs-string">"train"</span>], | |
| eval_dataset=tokenized_datasets[<span class="hljs-string">"validation"</span>], | |
| data_collator=data_collator, | |
| processing_class=tokenizer, | |
| compute_metrics=compute_metrics, | |
| )<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-14c26ht">참고로, 우리는 <code>eval_strategy</code>를 <code>"epoch"</code>으로 설정한 새로운 <code>TrainingArguments</code>와 새로운 모델을 생성합니다. 이렇게 하지 않으면 이미 훈련된 모델의 훈련을 계속하게 될 겁니다. 새로운 훈련을 시작하려면 다음을 실행하세요.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->trainer.train()<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1yytiyu">이번에는 훈련 손실 외에도 각 에폭이 끝날 때마다 검증 손실과 메트릭이 함께 출력될 겁니다. 앞서 말했듯이 모델 헤드의 무작위 초기화 때문에 여러분이 얻는 정확도/F1 점수는 우리가 얻은 결과와 약간 다를 수 있지만, 비슷한 범위에 있을 겁니다.</p> <h3 class="relative group"><a id="advanced-training-features" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#advanced-training-features"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>고급 훈련 기능</span></h3> <p data-svelte-h="svelte-1lmwzzc"><code>Trainer</code>는 현대 딥러닝의 모범 사례들을 쉽게 활용할 수 있도록 다양한 내장 기능을 제공합니다.</p> <p data-svelte-h="svelte-1h0qf0f"><strong>혼합 정밀도 훈련</strong>: 더 빠른 훈련과 메모리 사용량 감소를 위해 훈련 인수에서 <code>fp16=True</code>를 설정하세요.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->training_args = TrainingArguments( | |
| <span class="hljs-string">"test-trainer"</span>, | |
| eval_strategy=<span class="hljs-string">"epoch"</span>, | |
| fp16=<span class="hljs-literal">True</span>, <span class="hljs-comment"># 혼합 정밀도 활성화</span> | |
| )<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-pmu9yq"><strong>그레이디언트 누적</strong>: GPU 메모리가 부족할 때 더 큰 배치 크기로 학습하는 효과를 낼 수 있습니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->training_args = TrainingArguments( | |
| <span class="hljs-string">"test-trainer"</span>, | |
| eval_strategy=<span class="hljs-string">"epoch"</span>, | |
| per_device_train_batch_size=<span class="hljs-number">4</span>, | |
| gradient_accumulation_steps=<span class="hljs-number">4</span>, <span class="hljs-comment"># 유효 배치 크기 = 4 * 4 = 16</span> | |
| )<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-umpq8u"><strong>학습률 스케줄링</strong>: Trainer는 기본적으로 선형 감소 방식을 사용하지만, 사용자 맞춤 설정이 가능합니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->training_args = TrainingArguments( | |
| <span class="hljs-string">"test-trainer"</span>, | |
| eval_strategy=<span class="hljs-string">"epoch"</span>, | |
| learning_rate=<span class="hljs-number">2e-5</span>, | |
| lr_scheduler_type=<span class="hljs-string">"cosine"</span>, <span class="hljs-comment"># 다른 스케줄러 시도</span> | |
| )<!-- HTML_TAG_END --></pre></div> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-1avfr8p">🎯 <strong>성능 최적화</strong>: 분산 훈련, 메모리 최적화, 하드웨어별 최적화를 포함한 고급 훈련 기술에 대해서는 <a href="https://huggingface.co/docs/transformers/main/en/performance" rel="nofollow">🤗 Transformers 성능 가이드</a>를 살펴보세요.</p></div> <p data-svelte-h="svelte-kboajh"><code>Trainer</code>는 여러 GPU 또는 TPU에서 즉시 작동하며 분산 훈련을 위한 많은 옵션을 제공합니다. 이와 관련된 모든 내용은 Chapter 10에서 다루겠습니다.</p> <p data-svelte-h="svelte-1p27re0">이것으로 <code>Trainer</code> API를 사용한 미세 조정 소개를 마칩니다. 대부분의 일반적인 NLP 작업에 대한 예제는 <a href="/course/chapter7">Chapter 7</a>에서 다룰 예정이며, 다음으로는 순수 PyTorch 코드로 동일한 작업을 수행하는 방법을 살펴보겠습니다.</p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-1j34o6j">📝 <strong>더 많은 예제</strong>: <a href="https://huggingface.co/docs/transformers/main/en/notebooks" rel="nofollow">🤗 Transformers 노트북</a>에 있는 방대한 자료를 확인해 보세요.</p></div> <h2 class="relative group"><a id="section-quiz" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#section-quiz"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>섹션 퀴즈</span></h2> <p data-svelte-h="svelte-1f5akmu">Trainer API와 미세 조정 개념에 대한 이해를 테스트해보세요.</p> <h3 class="relative group"><a id="1-trainer-에서-code-processingclass-code-매개변수의-목적은-무엇인가요" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#1-trainer-에서-code-processingclass-code-매개변수의-목적은-무엇인가요"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>1. Trainer 에서 <code> processing_class </code> 매개변수의 목적은 무엇인가요?</span></h3> <div><form><label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="0"> <!-- HTML_TAG_START -->사용할 모델 아키텍처를 지정합니다.<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="1"> <!-- HTML_TAG_START -->데이터 처리에 사용할 토크나이저를 Trainer에 알려줍니다.<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="2"> <!-- HTML_TAG_START -->훈련을 위한 배치 크기를 결정합니다.<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="3"> <!-- HTML_TAG_START -->평가 빈도를 제어합니다.<!-- HTML_TAG_END --></label> <div class="flex flex-row items-center mt-3"><button class="btn px-4 mr-4" type="submit" disabled>Submit</button> </div></form></div> <h3 class="relative group"><a id="2-훈련-중-평가가-얼마나-자주-발생하는지를-제어하는-trainingarguments-매개변수는-무엇인가요" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#2-훈련-중-평가가-얼마나-자주-발생하는지를-제어하는-trainingarguments-매개변수는-무엇인가요"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>2. 훈련 중 평가가 얼마나 자주 발생하는지를 제어하는 TrainingArguments 매개변수는 무엇인가요?</span></h3> <div><form><label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="0"> <!-- HTML_TAG_START -->eval_frequency<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="1"> <!-- HTML_TAG_START -->eval_strategy<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="2"> <!-- HTML_TAG_START -->evaluation_steps<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="3"> <!-- HTML_TAG_START -->do_eval<!-- HTML_TAG_END --></label> <div class="flex flex-row items-center mt-3"><button class="btn px-4 mr-4" type="submit" disabled>Submit</button> </div></form></div> <h3 class="relative group"><a id="3-trainingarguments에서-code-fp16true-code-는-무엇을-활성화하나요" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#3-trainingarguments에서-code-fp16true-code-는-무엇을-활성화하나요"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>3. TrainingArguments에서 <code> fp16=True </code> 는 무엇을 활성화하나요?</span></h3> <div><form><label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="0"> <!-- HTML_TAG_START -->더 빠른 훈련을 위한 16비트 정수 정밀도<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="1"> <!-- HTML_TAG_START -->더 빠른 훈련과 메모리 사용량 감소를 위한 16비트 부동소수점 수를 사용한 혼합 정밀도 훈련<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="2"> <!-- HTML_TAG_START -->정확히 16 에포크 동안 훈련<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="3"> <!-- HTML_TAG_START -->분산 훈련을 위한 16개 GPU 사용<!-- HTML_TAG_END --></label> <div class="flex flex-row items-center mt-3"><button class="btn px-4 mr-4" type="submit" disabled>Submit</button> </div></form></div> <h3 class="relative group"><a id="4-trainer에서-code-computemetrics-code-함수의-역할은-무엇인가요" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#4-trainer에서-code-computemetrics-code-함수의-역할은-무엇인가요"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>4. Trainer에서 <code> compute_metrics </code> 함수의 역할은 무엇인가요?</span></h3> <div><form><label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="0"> <!-- HTML_TAG_START -->훈련 중 손실을 계산합니다.<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="1"> <!-- HTML_TAG_START -->로짓을 예측으로 변환하고 정확도 및 F1과 같은 평가 메트릭을 계산합니다.<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="2"> <!-- HTML_TAG_START -->사용할 옵티마이저를 결정합니다.<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="3"> <!-- HTML_TAG_START -->훈련 데이터를 전처리합니다.<!-- HTML_TAG_END --></label> <div class="flex flex-row items-center mt-3"><button class="btn px-4 mr-4" type="submit" disabled>Submit</button> </div></form></div> <h3 class="relative group"><a id="5-trainer에-code-evaldataset-code-을-제공하지-않으면-어떻게-되나요" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#5-trainer에-code-evaldataset-code-을-제공하지-않으면-어떻게-되나요"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>5. Trainer에 <code> eval_dataset </code> 을 제공하지 않으면 어떻게 되나요?</span></h3> <div><form><label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="0"> <!-- HTML_TAG_START -->훈련이 오류와 함께 실패합니다.<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="1"> <!-- HTML_TAG_START -->Trainer가 자동으로 훈련 데이터를 평가용으로 분할합니다.<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="2"> <!-- HTML_TAG_START -->훈련 중 평가 메트릭을 얻을 수 없지만 훈련은 여전히 작동합니다.<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="3"> <!-- HTML_TAG_START -->모델이 평가를 위해 훈련 데이터를 사용합니다.<!-- HTML_TAG_END --></label> <div class="flex flex-row items-center mt-3"><button class="btn px-4 mr-4" type="submit" disabled>Submit</button> </div></form></div> <h3 class="relative group"><a id="6-그레이디언트-누적이란-무엇이며-어떻게-활성화하나요" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#6-그레이디언트-누적이란-무엇이며-어떻게-활성화하나요"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>6. 그레이디언트 누적이란 무엇이며 어떻게 활성화하나요?</span></h3> <div><form><label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="0"> <!-- HTML_TAG_START -->그레이디언트를 디스크에 저장하는 것으로, save_gradients=True로 활성화됩니다.<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="1"> <!-- HTML_TAG_START -->업데이트 전에 여러 배치에 걸쳐 그레이디언트를 누적하는 것으로, gradient_accumulation_steps로 활성화됩니다.<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="2"> <!-- HTML_TAG_START -->그레이디언트 계산을 가속화하는 것으로, fp16과 함께 자동으로 활성화됩니다.<!-- HTML_TAG_END --></label> <label class="block"><input autocomplete="off" class="form-input -mt-1.5 mr-2" name="choice" type="checkbox" value="3"> <!-- HTML_TAG_START -->그레이디언트 오버플로우를 방지하는 것으로, gradient_clipping=True로 활성화됩니다.<!-- HTML_TAG_END --></label> <div class="flex flex-row items-center mt-3"><button class="btn px-4 mr-4" type="submit" disabled>Submit</button> </div></form></div> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-3aouuq">💡 <strong>핵심 요점:</strong></p> <ul data-svelte-h="svelte-56lnro"><li><code>Trainer</code> API는 대부분의 훈련 복잡성을 처리하는 높은 수준의 인터페이스를 제공합니다.</li> <li><code>processing_class</code>는 적절한 데이터 처리를 위해 토크나이저를 저장하는 데 사용됩니다.</li> <li><code>TrainingArguments</code>는 학습률, 배치 크기, 평가 전략, 최적화 등 훈련의 모든 측면을 제어합니다.</li> <li><code>compute_metrics</code>를 사용하면 훈련 손실 외에 사용자 정의 평가 메트릭을 활용할 수 있습니다.</li> <li>혼합 정밀도(<code>fp16=True</code>)와 그레이디언트 누적과 같은 최신 기능은 훈련 효율성을 크게 향상시킬 수 있습니다.</li></ul></div> <a class="!text-gray-400 !no-underline text-sm flex items-center not-prose mt-4" href="https://github.com/huggingface/course/blob/main/chapters/ko/chapter3/3.mdx" target="_blank"><span data-svelte-h="svelte-1kd6by1"><</span> <span data-svelte-h="svelte-x0xyl0">></span> <span data-svelte-h="svelte-1dajgef"><span class="underline ml-1.5">Update</span> on GitHub</span></a> <p></p> | |
| <script> | |
| { | |
| __sveltekit_5pcgyl = { | |
| assets: "/docs/course/pr_1069/ko", | |
| base: "/docs/course/pr_1069/ko", | |
| env: {} | |
| }; | |
| const element = document.currentScript.parentElement; | |
| const data = [null,null]; | |
| Promise.all([ | |
| import("/docs/course/pr_1069/ko/_app/immutable/entry/start.667ba29b.js"), | |
| import("/docs/course/pr_1069/ko/_app/immutable/entry/app.65e35200.js") | |
| ]).then(([kit, app]) => { | |
| kit.start(app, element, { | |
| node_ids: [0, 22], | |
| data, | |
| form: null, | |
| error: null | |
| }); | |
| }); | |
| } | |
| </script> | |
Xet Storage Details
- Size:
- 71.8 kB
- Xet hash:
- 76fc77ef71d17d714b7df9b3d0dd4ba4b5112a6d5b5bb420cb2e402fde731e90
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.