---
license: apache-2.0
language:
- ko
base_model:
- beomi/kcbert-base
pipeline_tag: token-classification
tags:
- Korean
- PII
- KoreanPII
- PIIMasking
- Anonymization
- Privacy
---

# Korean-PII-Masking-BERT

**GitHub Repository**: [alphagyuu/Korean-PII-Masking-BERT](https://github.com/alphagyuu/Korean-PII-Masking-BERT)

Korean-PII-Masking-BERT is a token classification model for detecting personally identifiable information (PII) in Korean text. It was built by fine-tuning KcBERT (`beomi/kcbert-base`) with a **token-classification** head, using a processed version of the "Korean SNS" dataset from **AI-Hub**.

## 🖥️ Python Implementation
- **Tokenizer**:
```python
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('beomi/kcbert-base', do_lower_case=False)  # no lowercasing
```
- **Model**:
```python
from transformers import TFBertForTokenClassification  # TensorFlow model class
model = TFBertForTokenClassification.from_pretrained('alphagyuu/Korean-PII-Masking-BertForTokenClassification')
```
- **LabelMap**:
```python
# BIO tag -> label name. "B-" marks the first token of a PII span, "I-" a continuation.
LabelMAP = {
    'O': 'LABEL0',        # outside any PII span
    'B-URL': 'LABEL1',
    'I-URL': 'LABEL2',
    'B-계정': 'LABEL3',   # account
    'I-계정': 'LABEL4',
    'B-금융': 'LABEL5',   # financial
    'I-금융': 'LABEL6',
    'B-번호': 'LABEL7',   # number
    'I-번호': 'LABEL8',
    'B-소속': 'LABEL9',   # affiliation
    'I-소속': 'LABEL10',
    'B-신원': 'LABEL11',  # identity
    'I-신원': 'LABEL12',
    'B-이름': 'LABEL13',  # name
    'I-이름': 'LABEL14',
    'B-주소': 'LABEL15',  # address
    'I-주소': 'LABEL16',
}
```
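
A minimal inference sketch, assuming `tokenizer` and `model` were loaded with the two `from_pretrained` calls above and that NumPy is installed. The helper name `predict_label_ids` is hypothetical, and this card does not state how output indices relate to the label names; the sketch only returns raw indices (the natural reading, index *i* = `LABEL{i}`, is an assumption). It takes the argmax of the logits for each WordPiece token and drops special tokens such as `[CLS]` and `[SEP]`.

```python
import numpy as np


def predict_label_ids(text, tokenizer, model):
    """Return (token, label_id) pairs for `text`, skipping special tokens.

    Hypothetical helper: `tokenizer` is assumed to be the KcBERT BertTokenizer
    and `model` the TFBertForTokenClassification loaded above.
    """
    # Encode as TensorFlow tensors, since the model class is a TF model.
    enc = tokenizer(text, return_tensors="tf", truncation=True)
    # logits has shape (batch=1, seq_len, num_labels); pick the best label per token.
    logits = np.asarray(model(**enc).logits)
    label_ids = logits.argmax(axis=-1)[0].tolist()
    input_ids = np.asarray(enc["input_ids"])[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(input_ids)
    special = set(tokenizer.all_special_tokens)
    return [(tok, lid) for tok, lid in zip(tokens, label_ids) if tok not in special]


# Usage (the weights are downloaded on the first from_pretrained call):
# pairs = predict_label_ids("제 이름은 홍길동입니다", tokenizer, model)
```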
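
Once each token has a label index, the map above can be inverted to recover BIO tags and mask the detected spans. The following is a hypothetical sketch, not the repository's own masking code: `CATEGORIES`, `ID2TAG`, `LABEL_MAP`, `mask_tokens`, `detokenize`, and `mask_pii` are illustrative names, it assumes index *i* means `LABEL{i}`, it replaces each `B-`/`I-` span with a single `[category]` placeholder, and it re-joins WordPiece pieces (`##`) with a simple heuristic, so the original spacing is only approximated.

```python
# Hypothetical helpers; the tag order below reproduces LabelMAP (LABEL0..LABEL16).
CATEGORIES = ["URL", "계정", "금융", "번호", "소속", "신원", "이름", "주소"]
ID2TAG = ["O"] + [f"{prefix}-{cat}" for cat in CATEGORIES for prefix in ("B", "I")]
LABEL_MAP = {tag: f"LABEL{i}" for i, tag in enumerate(ID2TAG)}  # equals LabelMAP


def mask_tokens(tokens, label_ids):
    """Replace each B-/I- span with a single "[category]" placeholder token."""
    out = []
    prev_cat = None  # category of the span currently open, if any
    for tok, lid in zip(tokens, label_ids):
        tag = ID2TAG[lid]
        if tag == "O":
            out.append(tok)
            prev_cat = None
            continue
        prefix, cat = tag.split("-", 1)
        if prefix == "I" and cat == prev_cat:
            continue  # continuation of a span that was already replaced
        # A B- tag, or an I- tag without a matching open span, starts a new span.
        out.append(f"[{cat}]")
        prev_cat = cat
    return out


def detokenize(tokens):
    """Glue WordPiece continuations ("##...") onto the previous word."""
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]
        else:
            words.append(tok)
    return " ".join(words)


def mask_pii(tokens, label_ids):
    return detokenize(mask_tokens(tokens, label_ids))


# Hand-written predictions (13 = B-이름, 14 = I-이름):
print(mask_pii(["제", "이름", "##은", "홍", "##길동", "##입니다"], [0, 0, 0, 13, 14, 0]))
# prints: 제 이름은 [이름]입니다
```

Collapsing a whole span into one placeholder hides the length of the original value; emitting one placeholder per token would be an equally valid choice if alignment with the input matters.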