Multimodal capabilities

The LLM Mesh provides multimodal capabilities to handle:

  • Image inputs. Images can be mixed with text in LLM queries, in order to answer requests such as “please describe what is in this image”.

  • Image outputs. The LLM Mesh can generate images.

Image input

Supported models

Multimodal input with images is supported with the following providers:

  • OpenAI (GPT-4o)

  • Azure OpenAI (GPT-4o)

  • Google Vertex (Gemini Pro)

  • Bedrock Claude 3

  • Bedrock Claude 3.5

  • Local HuggingFace models

API

Image input is available through the LLM Mesh API.

For more details, please see LLM Mesh.
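
As an illustration, the snippet below sketches how an image and a text prompt can be combined in a single query with the LLM Mesh Python API. The LLM id and the image path are placeholders, and the exact method names may vary across Dataiku versions; refer to the LLM Mesh API reference for the authoritative signatures.

    import dataiku

    # Get a handle on a multimodal-capable LLM declared in the LLM Mesh
    # ("openai:my-openai-connection:gpt-4o" is a placeholder id)
    client = dataiku.api_client()
    project = client.get_default_project()
    llm = project.get_llm("openai:my-openai-connection:gpt-4o")

    # Build a completion mixing text and an inline image
    completion = llm.new_completion()
    with open("photo.jpg", "rb") as f:
        image_bytes = f.read()

    message = completion.new_multipart_message()
    message.with_text("Please describe what is in this image")
    message.with_inline_image(image_bytes)
    message.add()

    response = completion.execute()
    if response.success:
        print(response.text)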

Answers

Image input is available in Dataiku Answers.

Image output

Supported models

Image generation is supported with the following providers:

  • OpenAI (DALL-E 3)

  • Azure OpenAI (DALL-E 3)

  • Google Vertex (Imagen 3 and Imagen 3 Fast)

  • Stability AI

  • Bedrock Titan Image Generator

  • Bedrock Stability AI models

  • Local HuggingFace models

Image-to-image support

Bedrock Amazon Titan Image Generator G1 supports image-to-image with the following image editing modes:

  • Image-to-image with prompt

  • Image-to-image without prompt

  • Image-to-image inpainting

    • Black mask mode

Bedrock Stability AI SDXL 1.0 also supports image-to-image with the following image editing modes:

  • Image-to-image with prompt

  • Image-to-image inpainting

    • Black mask mode

    • Transparent original image mode

Stability AI Stable Diffusion 3 Large supports image-to-image mode with prompt.

The Stability AI connection also supports the CONTROLNET_SKETCH and CONTROLNET_STRUCTURE image editing modes.
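
For illustration, here is a minimal sketch of an image-to-image request with a prompt through the LLM Mesh Python API. The LLM id, file names, and the with_original_image helper (including its weight parameter) are indicative assumptions; check the LLM Mesh API reference for the exact parameters supported by your provider and Dataiku version.

    import dataiku

    client = dataiku.api_client()
    project = client.get_default_project()
    # Placeholder id for an image model supporting image-to-image (e.g. Titan Image Generator)
    img_llm = project.get_llm("bedrock:my-bedrock-connection:amazon.titan-image-generator-v1")

    generation = img_llm.new_images_generation()
    generation.with_prompt("Turn this sketch into a watercolor painting")

    with open("sketch.png", "rb") as f:
        original = f.read()
    # Attach the source image; the weight (assumed parameter) controls how strongly
    # the original image constrains the generated result
    generation.with_original_image(original, weight=0.7)

    response = generation.execute()
    # first_image() is assumed to return the raw bytes of the first generated image
    with open("watercolor.png", "wb") as f:
        f.write(response.first_image())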

API

Image output is available through the LLM Mesh API. Note that some parameters are not supported by all providers and models.

For more details, please see LLM Mesh.
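
As an example, a basic text-to-image generation can be sketched as follows with the LLM Mesh Python API. The LLM id is a placeholder, and provider-specific parameters (image size, quality, number of images, ...) are omitted since they are not supported by all providers and models; method names may vary across Dataiku versions.

    import dataiku

    client = dataiku.api_client()
    project = client.get_default_project()
    # Placeholder id for an image generation model configured in the LLM Mesh
    img_llm = project.get_llm("openai:my-openai-connection:dall-e-3")

    generation = img_llm.new_images_generation()
    generation.with_prompt("A lighthouse on a cliff at sunset, in watercolor style")
    # Provider-specific options (size, quality, ...) can also be set on the query,
    # but they are not supported by every provider/model

    response = generation.execute()
    # first_image() is assumed to return the raw bytes of the first generated image
    with open("lighthouse.png", "wb") as f:
        f.write(response.first_image())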