utter-project/TowerVision-2B
Image-Text-to-Text • 3B
Extending Tower capabilities to the vision modality
Note: TowerVision with a 2B instruct backbone (TowerPlus), trained on the full dataset.
Note: TowerVision with a 9B instruct backbone (TowerPlus), trained on the full dataset.
Note: TowerVideo builds upon TowerVision-2B and is further trained on larger video data.