How to use huggingface/CodeBERTa-language-id with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="huggingface/CodeBERTa-language-id")
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("huggingface/CodeBERTa-language-id")
model = AutoModelForSequenceClassification.from_pretrained("huggingface/CodeBERTa-language-id")
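As a rough sketch of how the pipeline might be applied to a snippet of source code: the helper names `best_prediction` and `identify_language` below are hypothetical and not part of Transformers. The sketch assumes the pipeline returns a list of dicts with `label` and `score` keys, which is the usual shape for a single text input to a text-classification pipeline.

```python
def best_prediction(predictions):
    """Return the (label, score) pair with the highest score.

    `predictions` is assumed to be a list of dicts with "label" and "score"
    keys, as returned by a text-classification pipeline for one input.
    """
    top = max(predictions, key=lambda p: p["score"])
    return top["label"], top["score"]


def identify_language(code, model_id="huggingface/CodeBERTa-language-id"):
    """Hypothetical helper: classify one code snippet with the pipeline."""
    # Imported lazily so best_prediction can be used without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-classification", model=model_id)
    # For a single string, the pipeline returns a list of prediction dicts.
    predictions = pipe(code)
    return best_prediction(predictions)


if __name__ == "__main__":
    # Downloads the model on first use; the labels come from the model's own config.
    label, score = identify_language("def add(a, b):\n    return a + b\n")
    print(label, round(score, 3))
```

The model can only return labels it was trained on, so content outside that label set will still be assigned to one of the known languages.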
It would be nice if the model could be updated to detect not only programming languages but also other kinds of file content. For example, CSS and JSON aren't recognized...