Multimodal capabilities
The LLM Mesh provides multimodal capabilities to handle:
Image inputs. Images can be mixed with text in an LLM query, for example to ask “please describe what is in this image”.
Image outputs. The LLM Mesh can generate images.
Image input
Supported models
Image input is supported with the following providers and models:
OpenAI (GPT-4o)
Azure OpenAI (GPT-4o)
Vertex Gemini Pro
Bedrock Claude 3
Bedrock Claude 3.5
API
Image input is available in the LLM Mesh API.
For more details, please see the LLM Mesh documentation.
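As a minimal sketch of what an image query could look like from Python, the helper below sends one text prompt together with one image and returns the answer text. The function name `describe_image` is hypothetical. The method names on the `llm` handle (`new_completion`, `new_multipart_message`, `with_text`, `with_inline_image`, `add`, `execute`) and the `success` and `text` attributes of the response are assumptions based on the Dataiku Python client, not something this page confirms; please check them against the LLM Mesh API reference for your version.

```python
def describe_image(llm, image_bytes, prompt="Please describe what is in this image"):
    """Send one text + image query and return the model's answer as text.

    `llm` is assumed to be an LLM Mesh handle exposing the methods used below;
    these names are assumptions to verify against the API reference.
    """
    completion = llm.new_completion()

    # Build a single message that mixes text and an image.
    message = completion.new_multipart_message()
    message.with_text(prompt)
    message.with_inline_image(image_bytes)  # assumed to accept raw image bytes
    message.add()  # attach the multipart message to the completion request

    response = completion.execute()
    if not response.success:
        raise RuntimeError("The image query did not succeed")
    return response.text
```

Inside Dataiku, the `llm` handle would typically be obtained with something like `dataiku.api_client().get_default_project().get_llm(llm_id)`, where `llm_id` identifies a connection to one of the supported models listed above. Passing the handle in as a parameter keeps the helper easy to exercise with a stub object.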
Answers
Image input is available in Dataiku Answers.
Image output
Note
Image output is available in Private Preview. Please contact your Dataiku Customer Success Manager or Sales Engineer to get access to this capability.
Supported models
Image generation is supported with the following providers:
OpenAI (DALL-E 3)
Azure OpenAI (DALL-E 3)
Google Vertex (Imagen 1 and Imagen 2)
Stability AI (Stable Image Core, Stable Diffusion 3.0, Stable Diffusion 3.0 Turbo)
Bedrock Titan Image Generator
Bedrock Stable Diffusion XL 1
API
Image output is available in the LLM Mesh API.
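As a rough sketch of image generation from Python, the helper below requests one image for a text prompt and writes it to a file. The function name `generate_image` and the `output_path` parameter are hypothetical. The method names `new_images_generation`, `with_prompt`, `execute` and `first_image` are assumptions based on the Dataiku Python client, and `first_image` is assumed to return raw image bytes; since this capability is in Private Preview, the actual API may differ.

```python
def generate_image(llm, prompt, output_path):
    """Generate one image from a text prompt and save it to `output_path`.

    `llm` is assumed to be an LLM Mesh handle for an image generation model;
    the method names below are assumptions to verify against the API reference.
    """
    generation = llm.new_images_generation()
    generation.with_prompt(prompt)

    response = generation.execute()
    image_bytes = response.first_image()  # assumed to return raw image bytes

    # Write the generated image to disk in binary mode.
    with open(output_path, "wb") as f:
        f.write(image_bytes)
    return output_path
```

A call such as `generate_image(llm, "A lighthouse at sunset", "lighthouse.png")` would then produce a local file, provided `llm` points to one of the image generation models listed above.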