Instructions to use modelling101/CodeBERT-SO with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use modelling101/CodeBERT-SO with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="modelling101/CodeBERT-SO")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("modelling101/CodeBERT-SO")
model = AutoModelForSequenceClassification.from_pretrained("modelling101/CodeBERT-SO")
```
- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED
@@ -15,6 +15,8 @@ Repository for CodeBERT, fine-tuned on Stack Overflow snippets with respect to N
## Training Objective
This model is initialized with [CodeBERT-base](https://huggingface.co/microsoft/codebert-base) and trained to classify whether a user will drop out given their posts and code snippets.
## Training Regime
+ Preprocessing of input texts includes Unicode normalisation (NFC form), removal of extraneous whitespace, removal of punctuation (except within links), lowercasing, and removal of stopwords.
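The text preprocessing steps listed here could look roughly like the sketch below. It is illustrative only, not the authors' pipeline: the order of the steps, the `STOPWORDS` set, the URL pattern, and the choice to leave links entirely untouched (case included) are all assumptions, since the README does not specify them.

```python
import re
import string
import unicodedata

# Hypothetical stopword list for illustration; the README does not say which list was used.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it", "this"}

# Assumption: a "link" is any whitespace-delimited token starting with http:// or https://.
URL_PATTERN = re.compile(r"https?://\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_text(text: str) -> str:
    """Apply the listed preprocessing steps; the step order is an assumption."""
    text = unicodedata.normalize("NFC", text)  # Unicode normalisation (NFC form)
    cleaned = []
    for token in text.split():  # split() also collapses extraneous whitespace
        if URL_PATTERN.fullmatch(token):
            cleaned.append(token)  # keep links intact, punctuation included
            continue
        token = token.translate(PUNCT_TABLE).lower()  # strip ASCII punctuation, lowercase
        if token and token not in STOPWORDS:
            cleaned.append(token)
    return " ".join(cleaned)


print(preprocess_text("The  model is   GREAT, see https://example.com/a.b?c=1 !"))
# prints: model great see https://example.com/a.b?c=1
```

Note that `string.punctuation` covers ASCII punctuation only; non-ASCII punctuation would need separate handling.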
+ In-line comments and docstrings were also stripped from code snippets (cf. the main manuscript).
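One way to strip comments and docstrings is sketched below, under the strong assumption that a snippet is syntactically valid Python. The README does not state which languages the snippets are in or how the authors removed comments, so `strip_comments_and_docstrings` is a hypothetical helper. It relies on the fact that Python's `ast` parser discards comments, then drops leading string expressions before regenerating the source with `ast.unparse` (Python 3.9+).

```python
import ast


def strip_comments_and_docstrings(source: str) -> str:
    """Return `source` without comments or docstrings (valid Python input only)."""
    tree = ast.parse(source)  # comments never reach the AST
    for node in ast.walk(tree):
        if not isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        body = node.body
        is_docstring = (
            body
            and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)
        )
        if is_docstring:
            remaining = body[1:]
            # A function or class body must not be empty, so fall back to `pass`.
            if not remaining and not isinstance(node, ast.Module):
                remaining = [ast.Pass()]
            node.body = remaining
    return ast.unparse(tree)  # regenerated code; original formatting is not preserved


snippet = 'def f(x):\n    """Add one."""\n    return x + 1  # increment\n'
print(strip_comments_and_docstrings(snippet))
```

Snippets that fail to parse raise `SyntaxError`, so a real pipeline over Stack Overflow code would need a fallback for incomplete or non-Python fragments.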
Training ran for 8 epochs with a batch size of 8, a learning rate of 1e-5, and an epsilon (weight update denominator) of 1e-8.
A random 20% sample of the entire dataset was used as the validation set.
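The stated hyperparameters and the random 20% validation split can be captured in a small sketch like the one below. `TrainingConfig`, `train_validation_split`, and the `seed` argument are hypothetical names introduced for illustration. The README names neither the optimizer nor a random seed, although the epsilon description reads like the `eps` term of an Adam-style optimizer.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as stated in the README."""
    epochs: int = 8
    batch_size: int = 8
    learning_rate: float = 1e-5
    epsilon: float = 1e-8  # constant in the weight-update denominator
    validation_fraction: float = 0.2


def train_validation_split(examples, validation_fraction, seed=0):
    """Randomly hold out a fraction of `examples`; the seed is an assumption."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    n_val = round(len(examples) * validation_fraction)
    val_idx = set(indices[:n_val])
    train = [ex for i, ex in enumerate(examples) if i not in val_idx]
    validation = [ex for i, ex in enumerate(examples) if i in val_idx]
    return train, validation


config = TrainingConfig()
train, validation = train_validation_split(list(range(100)), config.validation_fraction)
print(len(train), len(validation))  # prints: 80 20
```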
## Performance