Generate vivid images from text prompts in seconds
Segment objects in images using click-based masks
Multimodal instruction-based editing and generation
Generate expressive speech from text and voice reference
Co-doodle with Gemini 2.0 native image generation
Transcribe audio recordings into written text
Answer questions about your images
Ask questions about images and get text answers
Segment objects in images directly in your browser