How to use Ian-Khalzov/article-topic-service-scibert with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="Ian-Khalzov/article-topic-service-scibert")
```

```python
# Load the model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Ian-Khalzov/article-topic-service-scibert")
model = AutoModelForSequenceClassification.from_pretrained("Ian-Khalzov/article-topic-service-scibert")
```

SciBERT text classifier for scientific article topic prediction from the article title and abstract. Trained on a balanced 12-class subset built from librarian-bots/arxiv-metadata-snapshot.
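A balanced subset like this can be produced by downsampling every class to the size of the smallest one. A minimal, generic sketch, assuming records are simple label/text dicts (the field names here are illustrative; the actual snapshot schema differs):

```python
import random
from collections import defaultdict

# Toy stand-in for the raw records; in practice these come from the
# arxiv-metadata-snapshot dataset.
records = [
    {"label": "hep-ph", "text": "..."},
    {"label": "cs.LG", "text": "..."},
    {"label": "cs.LG", "text": "..."},
    {"label": "hep-ph", "text": "..."},
    {"label": "hep-ph", "text": "..."},
]

# Group records by class label.
by_label = defaultdict(list)
for r in records:
    by_label[r["label"]].append(r)

# Downsample every class to the size of the smallest class.
n = min(len(v) for v in by_label.values())
random.seed(0)
balanced = [r for v in by_label.values() for r in random.sample(v, n)]
```

After this step every class contributes exactly `n` examples, which is what "balanced" means above.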
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_id = "Ian-Khalzov/article-topic-service-scibert"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# The model expects the title and abstract concatenated into one string.
text = "Title: Large language models for scientific document classification\n\nAbstract: We study..."

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.inference_mode():
    probs = torch.softmax(model(**inputs).logits[0], dim=-1)

predicted_label = model.config.id2label[int(probs.argmax())]
print(predicted_label)
```
The current baseline is strongest on physics-heavy classes and weakest on the broad Machine Learning category, where topical overlap with AI, NLP, CV, and Statistics remains high.
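Because these categories overlap, inspecting the top-k predictions rather than only the argmax can make borderline cases visible. A minimal sketch using a hypothetical probability vector (in practice, use the `probs` tensor and `model.config.id2label` from the snippet above; the label names here are placeholders):

```python
import torch

# Hypothetical 12-class probability vector standing in for the model's
# softmax output.
probs = torch.tensor([0.05, 0.30, 0.25, 0.20, 0.05, 0.03,
                      0.04, 0.02, 0.02, 0.02, 0.01, 0.01])
id2label = {i: f"class_{i}" for i in range(12)}  # placeholder mapping

# Take the three most probable classes.
top = torch.topk(probs, k=3)
for p, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{id2label[idx]}: {p:.2f}")
```

A small gap between the first and second probability is a quick signal that the example sits in the ML/AI/NLP/CV/Statistics overlap zone.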