Multimodal models that understand images. On-device image description, visual question answering, and scene understanding — all running on a phone.