---
language:
- en
- fr
- ro
- de
license: apache-2.0
library_name: transformers
datasets:
- c4
---

# Model Card for EncT5

EncT5 is a variant of T5 that mainly uses the encoder for non-autoregressive (i.e., classification and regression)
tasks. The model is from the paper [Fine-tuning T5 Encoder for Non-autoregressive Tasks](https://arxiv.org/abs/2110.08426)
by Frederick Liu, Terry Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, and Jing Li.

## Model Details

### Model Description

EncT5 uses the same base weights as T5, but **must be fine-tuned before use**. EncT5 has several special
features (a rough sketch of the architecture follows the list below):

1. There are fewer decoder layers (a single decoder layer by default), so the model has fewer parameters and requires
   fewer resources than the standard T5.
2. There is a separate decoder word embedding, with the decoder input ids being predefined constants. During
   fine-tuning, the decoder embedding learns to use these constants as "prompts" to the encoder for the corresponding
   classification/regression tasks.
3. There is a classification head on top of the decoder output.

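The following is a minimal, hypothetical sketch of how these pieces fit together, assuming a standard `transformers`
T5 backbone. The class and variable names are illustrative and do not come from the official implementation (see the
repository below for the real code).

```python
import torch
import torch.nn as nn
from transformers import T5Model

class EncT5Sketch(nn.Module):
    """Illustrative sketch of the EncT5 design, not the official implementation."""

    def __init__(self, num_labels: int, model_name: str = "google-t5/t5-base"):
        super().__init__()
        # Load the T5 backbone with a single decoder layer; the remaining
        # pretrained decoder layers are dropped (from_pretrained warns but proceeds).
        self.t5 = T5Model.from_pretrained(model_name, num_decoder_layers=1)
        hidden_size = self.t5.config.d_model
        # Separate decoder word embedding over a tiny "prompt" vocabulary
        # (a single constant id here); it is learned during fine-tuning.
        self.decoder_embedding = nn.Embedding(1, hidden_size)
        # Classification head on top of the decoder output.
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        batch_size = input_ids.size(0)
        # The decoder input ids are a predefined constant (0) for every example.
        decoder_input_ids = torch.zeros(batch_size, 1, dtype=torch.long, device=input_ids.device)
        decoder_inputs_embeds = self.decoder_embedding(decoder_input_ids)
        outputs = self.t5(
            input_ids=input_ids,
            attention_mask=attention_mask,
            decoder_inputs_embeds=decoder_inputs_embeds,
        )
        # Single decoder position -> logits for classification/regression.
        return self.classifier(outputs.last_hidden_state[:, 0, :])
```
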
Research has shown that this model can be more efficient than, and as effective as, T5 and BERT for
non-autoregressive tasks such as classification and regression.

- **Developed by:** Frederick Liu, Terry Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, and Jing Li. See the
  [associated paper](https://arxiv.org/abs/2110.08426).
- **Model type:** Language Model
- **Language(s) (NLP):** English, French, Romanian, German
- **License:** Apache 2.0
- **Based on model:** [T5](https://huggingface.co/google-t5/t5-base)
- **Repository:** [GitHub repo](https://github.com/hackyon/EncT5)
- **Paper:** [Fine-tuning T5 Encoder for Non-autoregressive Tasks](https://arxiv.org/abs/2110.08426)

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForSequenceClassification

# trust_remote_code=True is required because EncT5 uses custom model code.
model = AutoModelForSequenceClassification.from_pretrained("hackyon/enct5-base", trust_remote_code=True)
# Fine-tune the model before use.
```

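Once the model has been fine-tuned, inference might look like the sketch below. It assumes the t5-base tokenizer
matches (the repository may ship its own) and that the custom model returns a standard output with a `logits` field.

```python
import torch
from transformers import AutoTokenizer

# Assumption: the t5-base tokenizer matches, since the weights are copied from t5-base.
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")

inputs = tokenizer("This movie was great!", return_tensors="pt")
with torch.no_grad():
    # `model` is the fine-tuned classifier loaded above; assuming it returns
    # a standard SequenceClassifierOutput with a `logits` field.
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
```
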
See the [GitHub repo](https://github.com/hackyon/EncT5) for a more comprehensive guide.

## Training Details

### Training Data

The weights of this model are directly copied from [t5-base](https://huggingface.co/google-t5/t5-base).

### Training Procedure

This model **must be fine-tuned** before use. The decoder word embedding and classification head are both untrained.

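A minimal fine-tuning sketch using the `transformers` Trainer is shown below. The dataset, column names, and
hyperparameters are illustrative assumptions, not prescribed by the paper or the repository.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "hackyon/enct5-base", num_labels=2, trust_remote_code=True
)
# Assumption: the t5-base tokenizer matches, since the weights are copied from t5-base.
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")

# Illustrative dataset choice: GLUE SST-2 (binary sentiment classification).
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="enct5-sst2", num_train_epochs=1),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```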