---
language:
- en
tags:
- biology
- dna
- genomics
- metagenomics
- classifier
- awd-lstm
- transfer-learning
license: mit
pipeline_tag: text-classification
library_name: pytorch
---

# LookingGlass Reading Frame Classifier

Identifies the correct reading frame start position (1, 2, 3, -1, -2, or -3) for DNA reads. Note: this model is currently intended only for prokaryotic sequences with a low proportion of noncoding DNA.

This is a **pure PyTorch implementation** fine-tuned from the LookingGlass base model.
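As a refresher on the task: a DNA read has six candidate reading frames, i.e. three start offsets on the forward strand and three on the reverse complement. A minimal sketch in plain Python (illustration only, not part of this repository) of how those six labels map to subsequences of a read:

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (uppercase A/C/G/T only)."""
    comp = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    return ''.join(comp[b] for b in reversed(seq))

def six_frames(seq):
    """Return the six reading frames of a read, keyed 1..3 and -1..-3."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = seq[offset:]    # forward frames 1, 2, 3
        frames[-(offset + 1)] = rc[offset:]  # reverse frames -1, -2, -3
    return frames

print(six_frames("ATGGCC"))
```

The classifier's job is to pick which of these six framings is the correct one for a given read.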
## Links

- **Paper**: [Deep learning of a bacterial and archaeal universal language of life enables transfer learning and illuminates microbial dark matter](https://doi.org/10.1038/s41467-022-30070-8) (Nature Communications, 2022)
- **GitHub**: [ahoarfrost/LookingGlass](https://github.com/ahoarfrost/LookingGlass)
- **Base Model**: [HoarfrostLab/lookingglass-v1](https://huggingface.co/HoarfrostLab/lookingglass-v1)

## Citation

```bibtex
@article{hoarfrost2022deep,
  title={Deep learning of a bacterial and archaeal universal language of life
         enables transfer learning and illuminates microbial dark matter},
  author={Hoarfrost, Adrienne and Aptekmann, Ariel and Farfanuk, Gaetan and Bromberg, Yana},
  journal={Nature Communications},
  volume={13},
  number={1},
  pages={2606},
  year={2022},
  publisher={Nature Publishing Group}
}
```

## Model

| Property | Value |
|---|---|
| Architecture | LookingGlass encoder + classification head |
| Encoder | AWD-LSTM (3-layer, unidirectional) |
| Classes | 6 (frames 1, 2, 3, -1, -2, -3) |
| Parameters | ~17M |

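The six classes correspond to the six frame labels above. The index-to-frame order below is an assumption for illustration (verify it against the repository's label map before relying on it); a sketch of converting between class indices and frame labels:

```python
# Assumed index -> frame order; check the repository's label map.
CLASS_TO_FRAME = {0: 1, 1: 2, 2: 3, 3: -1, 4: -2, 5: -3}
FRAME_TO_CLASS = {frame: idx for idx, frame in CLASS_TO_FRAME.items()}

def frame_label(class_idx):
    """Map a predicted class index to its reading-frame label."""
    return CLASS_TO_FRAME[class_idx]

print(frame_label(4))  # -2
```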
## Installation

```bash
pip install torch
git clone https://huggingface.co/HoarfrostLab/LGv1_ReadingFrameClassifier
cd LGv1_ReadingFrameClassifier
```

## Usage

```python
from lookingglass_classifier import LookingGlassClassifier, LookingGlassTokenizer

model = LookingGlassClassifier.from_pretrained('.')
tokenizer = LookingGlassTokenizer()
model.eval()

inputs = tokenizer(["GATTACA", "ATCGATCGATCG"], return_tensors=True)

# Get predicted class indices
predictions = model.predict(inputs['input_ids'])
print(predictions)  # tensor([class_idx, class_idx])

# Get per-class probabilities
probs = model.predict_proba(inputs['input_ids'])
print(probs.shape)  # torch.Size([2, 6])

# Get raw logits
logits = model(inputs['input_ids'])
print(logits.shape)  # torch.Size([2, 6])
```
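Once a frame is predicted, downstream tools typically shift the read (and reverse-complement it for negative frames) before translation. A hedged sketch in plain Python, assuming the conventional meaning of the frame labels (offset into the forward or reverse strand); verify the convention against the repository before use:

```python
def apply_frame(seq, frame):
    """Shift a read into the given reading frame.

    Assumed convention: positive frames offset into the forward strand,
    negative frames offset into the reverse complement.
    """
    comp = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    if frame < 0:
        seq = ''.join(comp[b] for b in reversed(seq))
    return seq[abs(frame) - 1:]

print(apply_frame("GATTACA", 2))   # ATTACA
print(apply_frame("GATTACA", -1))  # TGTAATC
```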

## License

MIT License