Update README.md
README.md
```diff
@@ -31,7 +31,7 @@ Custom decoder-only transformer:
 - **Tokens seen:** ~4B
 - **Steps:** 30,000
 - **Optimizer:** AdamW (lr=3e-4, cosine decay to 3e-5)
-- **Hardware:** Single A100
+- **Hardware:** Single A100 40GB
 
 ## Installation
 
@@ -57,4 +57,5 @@ print(tokenizer.decode(out[0], skip_special_tokens=True))
 
 ## License
 
-Model weights: MIT.
+Model weights: MIT.
+Training data: This work uses the FineWeb-Edu dataset, available under the Open Data Commons Attribution License (ODC-By 1.0).
```
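The optimizer entry in the README gives only the endpoints of the learning-rate schedule (3e-4 decaying to 3e-5). Below is a minimal Python sketch of a standard half-cosine decay between those endpoints. It assumes the decay spans all 30,000 steps with no warmup, which the README does not state. `cosine_lr` is a hypothetical helper for illustration, not code from this repository.

```python
import math

# Endpoints and step count come from the README. Decaying over all
# 30,000 steps with no warmup phase is an ASSUMPTION.
PEAK_LR = 3e-4
MIN_LR = 3e-5
TOTAL_STEPS = 30_000


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS,
              peak_lr: float = PEAK_LR, min_lr: float = MIN_LR) -> float:
    """Hypothetical helper: half-cosine decay from peak_lr at step 0 to min_lr at total_steps."""
    # Clamp the step so values outside [0, total_steps] stay at the endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    # Goes from 1.0 at progress 0 to 0.0 at progress 1.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


if __name__ == "__main__":
    for step in (0, 7_500, 15_000, 22_500, 30_000):
        print(f"step {step:>6}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the learning rate sits at the midpoint of the two endpoints halfway through training. It changes slowly near the start and end of the run and fastest in the middle.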