CV Embedding & Recruitment Classification Models

This directory contains the models used for the fictional IT recruitment classification application.

Embedding Model (all-MiniLM-L6-v2)

We use the sentence-transformers/all-MiniLM-L6-v2 model to encode each candidate CV into a 384-dimensional dense vector.
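The encoding step can be sketched as follows. This is a minimal illustration, not the application's actual code: the helper name encode_cvs and the optional model argument are hypothetical, and normalizing the vectors is an assumption made so that a plain dot product later equals cosine similarity.

```python
from typing import Any, Optional, Sequence

MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"


def encode_cvs(cv_texts: Sequence[str], model: Optional[Any] = None):
    """Encode CV texts into dense vectors, one 384-dimensional row per CV.

    `model` is a hypothetical hook: pass an already loaded SentenceTransformer
    to avoid reloading it on every call (or a stub object in unit tests).
    """
    if model is None:
        # Imported lazily so this module can be imported without the heavy
        # dependency; the model weights are downloaded on first use.
        from sentence_transformers import SentenceTransformer

        model = SentenceTransformer(MODEL_NAME)

    # Assumption: unit-length vectors are wanted, so that a dot product
    # between two embeddings is their cosine similarity.
    return model.encode(list(cv_texts), normalize_embeddings=True)
```

Calling encode_cvs(["Senior Python developer with 5 years of Django experience"]) would return an array with one row of 384 values. In a web application the model would normally be loaded once at startup and passed in, rather than loaded per request.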

Why all-MiniLM-L6-v2?

  1. Efficiency and Speed: It is a compact 6-layer model with roughly 22 million parameters, which keeps inference fast enough for web applications without requiring a GPU.
  2. Quality of Sentence Embeddings: Despite its small size, it performs well on Semantic Textual Similarity (STS) tasks. It maps sentences and paragraphs to a 384-dimensional dense vector space, capturing much of the semantic meaning of candidate CVs and their alignment with job descriptions.
  3. Versatility: It can be used both as a feature extractor for downstream classification tasks (as done here) and for finding the most similar historical candidates through cosine similarity.

Classifiers

Based on the embeddings generated by the model above, three distinct classifiers were trained:

  • Random Forest (rf_model.pkl): An ensemble method that provides robust predictions. Feature importances are available, although here they refer to embedding dimensions rather than human-readable terms.
  • Support Vector Machine (svm_model.pkl): A linear-kernel SVM, which is well suited to high-dimensional inputs such as our 384-dimensional text embeddings.
  • PyTorch Neural Network (nn_model.pt): A Multi-Layer Perceptron (MLP) with a hidden layer and dropout. This model typically achieves the highest accuracy and F1 score for this specific task and is used as the primary prediction engine in the application.
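For orientation, an MLP of the kind described above could look like the sketch below. Only the 384-dimensional input comes from this README; the hidden size, dropout rate, number of classes, and the names CVClassifier and predict are assumptions chosen for illustration, so the architecture stored in nn_model.pt may differ.

```python
import torch
from torch import nn

EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2 (stated above)
HIDDEN_DIM = 128     # assumption: not specified in this README
NUM_CLASSES = 2      # assumption: the label set is not specified here
DROPOUT = 0.3        # assumption: not specified in this README


class CVClassifier(nn.Module):
    """Hypothetical MLP: one hidden layer with dropout over CV embeddings."""

    def __init__(self, embedding_dim=EMBEDDING_DIM, hidden_dim=HIDDEN_DIM,
                 num_classes=NUM_CLASSES, dropout=DROPOUT):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),  # active only in training mode
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # Returns raw logits with shape (batch_size, num_classes).
        return self.net(embeddings)


def predict(model: nn.Module, embeddings: torch.Tensor) -> torch.Tensor:
    """Return the predicted class index for each embedding row."""
    model.eval()  # disables dropout for inference
    with torch.no_grad():
        logits = model(embeddings)
    return logits.argmax(dim=1)
```

Switching the model to evaluation mode before prediction matters here because dropout would otherwise randomly zero hidden activations and make predictions non-deterministic.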

Files

  • cv_embeddings.pt: The pre-computed embeddings for all 10,000 records in the dataset, used for fast semantic search (k-nearest neighbors) during inference.
  • cv_metadata.json: The raw text data and labels corresponding to the pre-computed embeddings.
  • best_model_info.json: Specifies which classifier performed best during training.
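The k-nearest-neighbor search over the pre-computed embeddings can be sketched as below. To stay dependency-free, this illustration uses plain Python lists and toy 3-dimensional vectors; the function names are hypothetical, and the application itself would more likely run the same computation as a tensor operation over the contents of cv_embeddings.pt.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_candidates(
    query: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (row_index, similarity) for the k rows most similar to `query`."""
    scored = (
        (index, cosine_similarity(query, row))
        for index, row in enumerate(embeddings)
    )
    # nlargest keeps only the k best pairs, ordered by descending similarity.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy data standing in for the stored CV embeddings (real rows have 384 values).
stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
matches = nearest_candidates([1.0, 0.0, 0.0], stored, k=2)
print(matches)  # rows 0 and 2 are returned, most similar first
```

Each returned row index can then be used to look up the corresponding text and label in cv_metadata.json, assuming the metadata is stored in the same order as the embeddings.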