Model Card for se-bert

Provides the generic software engineering language model (LM) from "Enhancing Automated Software Traceability by Transfer Learning from Open-World Data".

Model Details

This language model was trained on the Git Corpus and Git Links data from 2016 to 2021. The data contains four types of records: comments, issues, pull requests, and commits.

Uses

This model is intended as a set of starting weights for fine-tuning on software engineering tasks, including:

requirements classification
traceability link prediction
retrieval / search
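
As a rough illustration of the retrieval and traceability use cases, the sketch below embeds text with the model and ranks candidate artifacts by cosine similarity. It assumes the checkpoint is published as `thearod5/se-bert` and loads through the `transformers` `AutoModel` and `AutoTokenizer` classes. The mean pooling, the 512-token limit, and the helper names `load_encoder`, `cosine`, and `rank_candidates` are illustrative choices, not something this card specifies. Because the card positions the model as starting weights, scores from the unmodified checkpoint are best treated as a baseline before fine-tuning.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Repository id as it appears on this card; everything else is illustrative.
MODEL_ID = "thearod5/se-bert"

Encoder = Callable[[str], List[float]]


def load_encoder(model_id: str = MODEL_ID) -> Encoder:
    """Build a text -> vector function using mean pooling over token states.

    Assumes the checkpoint loads through AutoModel/AutoTokenizer. The pooling
    strategy and max_length=512 are common BERT defaults, not card details.
    """
    # Imported lazily so the ranking helpers below work without torch installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    def encode(text: str) -> List[float]:
        inputs = tokenizer(
            text, return_tensors="pt", truncation=True, max_length=512
        )
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # (1, tokens, hidden)
        # Average only over real tokens, using the attention mask as weights.
        mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
        return pooled[0].tolist()

    return encode


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    query: str, candidates: Sequence[str], encode: Encoder
) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs sorted from most to least similar."""
    query_vec = encode(query)
    scored = [(text, cosine(query_vec, encode(text))) for text in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With `torch` and `transformers` installed, calling `rank_candidates(issue_text, commit_messages, load_encoder())` would return the commit messages ordered by similarity to the issue. Passing the encoder in as an argument keeps the ranking logic independent of the model, so a fine-tuned checkpoint can be swapped in later.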

Training, Evaluation, and Results

Please see the cited paper for complete details on the training method.

Technical Specifications

Model Architecture and Objective

Masked language model (MLM) trained on the SE corpus described under Model Details.
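
Since the model was trained with a masked language modeling objective, one quick way to inspect it is to hide a word and ask for the most likely replacements. The sketch below is a minimal example under stated assumptions: the checkpoint id `thearod5/se-bert` works with the `transformers` `fill-mask` pipeline, and the `PLACEHOLDER` marker and the helpers `insert_mask` and `predict_masked` are hypothetical names introduced here for illustration.

```python
from typing import List

# Repository id as it appears on this card.
MODEL_ID = "thearod5/se-bert"

# Hypothetical marker for the position to be predicted. It is swapped for the
# tokenizer's real mask token, so no specific mask string is assumed.
PLACEHOLDER = "<mask-here>"


def insert_mask(template: str, mask_token: str) -> str:
    """Replace the first PLACEHOLDER in `template` with `mask_token`."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template must contain {PLACEHOLDER!r}")
    return template.replace(PLACEHOLDER, mask_token, 1)


def predict_masked(
    template: str, top_k: int = 5, model_id: str = MODEL_ID
) -> List[str]:
    """Return the top_k predicted tokens for the masked position."""
    # Imported lazily so insert_mask can be used without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    text = insert_mask(template, fill.tokenizer.mask_token)
    return [result["token_str"] for result in fill(text, top_k=top_k)]
```

For example, `predict_masked("Fix null pointer <mask-here> in the parser")` would return the model's five highest-scoring tokens for the hidden word. The output depends on the checkpoint, so none is shown here.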

Hardware

1 GPU with CUDA 10.2 or 11.1

Software

Python >= 3.7, PyTorch 1.1.0

Citation

BibTeX:

@misc{lin2022enhancing,
  title={Enhancing Automated Software Traceability by Transfer Learning from Open-World Data},
  author={Jinfeng Lin and Amrit Poudel and Wenhao Yu and Qingkai Zeng and Meng Jiang and Jane Cleland-Huang},
  year={2022},
  eprint={2207.01084},
  archivePrefix={arXiv},
  primaryClass={cs.SE}
}

Model Card Authors

Jinfeng Lin, Amrit Poudel, Wenhao Yu, Qingkai Zeng, Jane Cleland-Huang

Model Card Contact

Alberto Rodriguez (arodri39@nd.edu)

Safetensors

Model size: 0.1B params
Tensor type: F32

Paper for thearod5/se-bert

Enhancing Automated Software Traceability by Transfer Learning from Open-World Data

arXiv 2207.01084, published Jul 3, 2022