Update README.md
README.md
CHANGED
@@ -23,7 +23,7 @@ Typical downstream tasks (with finetuning heads):
 - Protein-only regression/classification.
 - PSI (**protein-small molecule interactions**) prediction when combined with a SMILES encoder.

-GitHub code: [foldvision_github](https://github.com/

 ## Model Details

@@ -33,18 +33,6 @@ GitHub code: [foldvision_github](https://github.com/<YOUR_ORG_OR_USER>/foldvisio
 - Input channels: 5 atom-type channels (`C`, `N`, `S`, `O`, `P`)
 - Output: `(B, 1024)` embedding

-## Intended Use
-
-Use this model to compute protein structure embeddings for:
-- similarity and retrieval workflows,
-- downstream supervised tasks (classification/regression),
-- multimodal PSI pipelines with a molecule language model.
-
-## Out-of-Scope Use
-
-- Clinical decision making.
-- Any safety-critical use without task-specific validation.
-- Interpretation as direct biochemical or medical truth without experimental verification.

 ## Input and Preprocessing

@@ -84,34 +72,6 @@ FoldVision pipelines support repeated runs with random 3D rotations (test-time a
 - per-run predictions can be used to inspect spread/uncertainty,
 - averaged predictions are recommended for reporting.

-## Training and Evaluation Data
-
-Please document here the exact datasets used for pretraining and downstream evaluation.
-
-Example datasets referenced in this repository:
-- PTEN activity
-- SPOT
-- Davis
-- small dummy data files for smoke tests (not representative for benchmarking)
-
-## Metrics
-
-Report the official metrics from your manuscript for your release version.
-
-Suggested metrics by task:
-- Regression: Spearman, Pearson, MAE, RMSE, R2
-- Binary: Accuracy, MCC, ROC-AUC
-
-## Limitations
-
-- Performance depends strongly on preprocessing consistency.
-- Rotational augmentation can change single-run outputs; use multi-run means for stability.
-- Generalization to new protein families/domains must be validated per task.
-
-## Risks and Biases
-
-- Dataset composition can bias performance across protein classes.
-- Downstream labels and splits can introduce benchmark-specific bias.

 ## Citation

@@ -23,7 +23,7 @@ Typical downstream tasks (with finetuning heads):
 - Protein-only regression/classification.
 - PSI (**protein-small molecule interactions**) prediction when combined with a SMILES encoder.

+GitHub code: [foldvision_github](https://github.com/AlexanderKroll/foldvision)

 ## Model Details

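The context lines kept by this change recommend averaging predictions over repeated runs with random 3D rotations and using the per-run values to inspect spread. A minimal sketch of that idea follows, using only the Python standard library. Everything named here is an assumption for illustration: `predict` is a hypothetical callable standing in for whatever model and head are used, the input is a plain list of `(x, y, z)` coordinates rotated about the origin, and none of this is FoldVision's own API or preprocessing.

```python
import math
import random
import statistics


def random_rotation(rng):
    """Sample a 3x3 rotation matrix from a random unit quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def rotate(points, rot):
    """Apply a rotation matrix to a list of (x, y, z) points."""
    return [
        tuple(sum(rot[i][j] * p[j] for j in range(3)) for i in range(3))
        for p in points
    ]


def predict_with_rotations(predict, coords, n_runs=8, seed=0):
    """Run a hypothetical `predict` on randomly rotated copies of `coords`.

    Returns (mean, spread, runs): the mean for reporting, the sample
    standard deviation as a rough spread estimate, and the raw per-run values.
    """
    rng = random.Random(seed)
    runs = [predict(rotate(coords, random_rotation(rng))) for _ in range(n_runs)]
    spread = statistics.stdev(runs) if n_runs > 1 else 0.0
    return statistics.mean(runs), spread, runs


if __name__ == "__main__":
    # Toy stand-in for a model: deliberately rotation-dependent.
    def toy_predict(points):
        return sum(p[0] for p in points)

    coords = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
    mean, spread, runs = predict_with_rotations(toy_predict, coords, n_runs=8)
    print(f"mean={mean:.3f} spread={spread:.3f} over {len(runs)} runs")
```

A fixed `seed` keeps the set of rotations reproducible between runs, which makes the reported mean comparable across experiments.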