Multimodal capabilities

The LLM Mesh provides multimodal capabilities to handle:

  • Image inputs. Images can be mixed with text in LLM queries, for example to ask a model to “please describe what is in this image”.

  • Image outputs. The LLM Mesh can generate images.

Image input

Supported models

Image input is supported with the following providers and models:

  • OpenAI (GPT-4o)

  • Azure OpenAI (GPT-4o)

  • Vertex Gemini Pro

  • Bedrock Claude 3

  • Bedrock Claude 3.5

API

Image input is available in the LLM Mesh API.

For more details, please see LLM Mesh.
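As an illustration, a text-plus-image query could be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper name `describe_image` is hypothetical, and the calls on the LLM handle (`new_completion`, `new_multipart_message`, `with_text`, `with_inline_image`, `add`, `execute`) reflect the Dataiku Python client as we understand it, so please check them against the LLM Mesh API reference for your DSS version. The LLM id and file name in the usage comment are placeholders.

```python
def describe_image(llm, image_bytes, prompt="Please describe what is in this image."):
    """Send one message mixing text and an image, and return the model's reply.

    `llm` is assumed to be an LLM Mesh handle, e.g. obtained with
    project.get_llm("<LLM id>"); the chosen model must support image input.
    """
    completion = llm.new_completion()

    # Build a single multipart message: a text part followed by an image part.
    message = completion.new_multipart_message()
    message.with_text(prompt)
    message.with_inline_image(image_bytes)
    message.add()  # attach the multipart message to the completion request

    response = completion.execute()
    if not response.success:
        raise RuntimeError("The LLM Mesh query failed")
    return response.text


# Usage inside a Dataiku notebook or recipe (placeholders, not run here):
#
#   import dataiku
#   project = dataiku.api_client().get_default_project()
#   llm = project.get_llm("<LLM id>")
#   with open("photo.png", "rb") as f:
#       print(describe_image(llm, f.read()))
```

Passing the LLM handle as an argument keeps the helper independent of how the connection is obtained, which also makes it easy to exercise with a stub outside of Dataiku.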

Answers

Image input is available in Dataiku Answers.

Image output

Note

Image output is available in Private Preview. Please contact your Dataiku Customer Success Manager or Sales Engineer to get access to this capability.

Supported models

Image generation is supported with the following providers and models:

  • OpenAI (DALL-E 3)

  • Azure OpenAI (DALL-E 3)

  • Google Vertex (Imagen 1 and Imagen 2)

  • Stability AI (Stable Image Core, Stable Diffusion 3.0, Stable Diffusion 3.0 Turbo)

  • Bedrock Titan Image Generator

  • Bedrock Stable Diffusion XL 1

API

Image output is available in the LLM Mesh API.
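As an illustration, generating an image from a text prompt could be sketched as follows. This is a minimal sketch under stated assumptions: the helper name `generate_image` is hypothetical, and the calls on the LLM handle (`new_images_generation`, `with_prompt`, `execute`, `first_image`) reflect the Dataiku Python client as we understand it, including the assumption that `first_image()` returns raw image bytes. Please verify them against the LLM Mesh API reference for your DSS version. The LLM id, prompt, and output path in the usage comment are placeholders.

```python
def generate_image(llm, prompt, output_path):
    """Generate one image from a text prompt and write it to `output_path`.

    `llm` is assumed to be an LLM Mesh handle for an image generation model,
    e.g. obtained with project.get_llm("<LLM id>").
    """
    generation = llm.new_images_generation().with_prompt(prompt)
    response = generation.execute()

    # Assumption: first_image() returns the generated image as bytes.
    image_bytes = response.first_image()

    with open(output_path, "wb") as f:
        f.write(image_bytes)
    return output_path


# Usage inside a Dataiku notebook or recipe (placeholders, not run here):
#
#   import dataiku
#   project = dataiku.api_client().get_default_project()
#   llm = project.get_llm("<LLM id>")
#   generate_image(llm, "A lighthouse at sunset, watercolor", "lighthouse.png")
```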