Upload README.md with huggingface_hub
README.md CHANGED
@@ -10,6 +10,11 @@ The GitHub with the implementation and requirements.txt can be found [here](http
 [ESM++](https://github.com/Synthyra/ESMplusplus) is a faithful implementation of [ESMC](https://www.evolutionaryscale.ai/blog/esm-cambrian) ([license](https://www.evolutionaryscale.ai/policies/cambrian-open-license-agreement)) that allows for batching and standard Huggingface compatibility without requiring the ESM Python package.
 The small version corresponds to the 300 million parameter version of ESMC.
 
+## Attention backend defaults
+Flex Attention with a block mask that ignores pad tokens is the default attention backend. If Flex Attention is unavailable, ESM++ falls back to native PyTorch attention.
+
+For throughput and memory efficiency, `torch.compile(...)` is heavily recommended, especially when using Flex Attention.
+
 
 ## Use with 🤗 transformers
 ```python
@@ -119,7 +124,7 @@ For a more thorough example of fine-tuning, check out our example script [here]
 
 
 ## Returning attention maps
-
+Flex Attention with a pad-token block mask is used by default for attention calculations, and native PyTorch attention is the fallback. Optimized attention paths do not return attention maps directly.
 ESM++ has the option to ```output_attentions```, which will calculate attention manually. This is much slower, so do not use unless you need the attention maps.
 
 ```python
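The backend selection described in the added "Attention backend defaults" section — prefer Flex Attention with a pad-ignoring block mask, otherwise fall back to native PyTorch attention — can be sketched roughly as follows. All names here (`attend`, `keep`, `pad_mask`) are illustrative assumptions for the sketch, not ESM++'s actual internals:

```python
import torch
import torch.nn.functional as F

# Prefer Flex Attention when available, as in the diff above; otherwise use
# native PyTorch scaled-dot-product attention. Illustrative sketch only.
try:
    from torch.nn.attention.flex_attention import flex_attention, create_block_mask
    HAS_FLEX = True
except ImportError:
    HAS_FLEX = False

def attend(q, k, v, pad_mask=None):
    """q, k, v: (batch, heads, seq, head_dim); pad_mask: (batch, seq), True = real token."""
    if HAS_FLEX and pad_mask is not None:
        def keep(b, h, q_idx, kv_idx):
            # Keep a score only when both query and key positions are real tokens.
            return pad_mask[b, q_idx] & pad_mask[b, kv_idx]
        block_mask = create_block_mask(keep, q.size(0), None, q.size(2), k.size(2))
        return flex_attention(q, k, v, block_mask=block_mask)
    attn_mask = None
    if pad_mask is not None:
        # Broadcast (batch, seq) -> (batch, 1, 1, seq) so pad keys are never attended to.
        attn_mask = pad_mask[:, None, None, :]
    return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```

Wrapping the chosen path in `torch.compile(...)`, as the README recommends, is what makes the Flex Attention route fast in practice.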
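Since the optimized backends never materialize the full attention matrix, `output_attentions` has to compute attention manually, which is why the README warns it is much slower. A minimal sketch of such a manual path (illustrative, not ESM++'s exact code — `attention_with_maps` is a hypothetical name):

```python
import torch
import torch.nn.functional as F

# Manual attention materializes the full (batch, heads, seq, seq) weight
# matrix so it can be returned; optimized kernels deliberately avoid this,
# which is the source of the slowdown when requesting attention maps.
def attention_with_maps(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    weights = torch.softmax(scores, dim=-1)  # the attention map that gets returned
    return weights @ v, weights
```

On a Hugging Face-style model this would typically be requested with something like `model(**batch, output_attentions=True)`; consult the model card for the exact call.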