Use multimodal options in the Answer Generation node

The Answer Generation node now supports image, file, and audio analysis as well as image and file generation. You can freely combine the tools and models you want for each node in the app, so a single conversation flow can cover everything from document-, FAQ-, and web-based RAG agents to everyday work tasks such as analyzing images and files and generating outputs.

1. How to configure the node

  1. Click the Answer Generation node within the interactive app.

  2. Select RAG Agent as the execution type.

  3. Select the base model.

    • The RAG Agent uses this model for intent classification, that is, to decide which tool an incoming question should be routed to. The same base model is also used to generate the text responses for image generation.

  4. Select the search sources. (You must select at least one search source.)

  5. Enable the required multimodal options.
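The configuration steps above can be summarized as a small checklist in code. This is only an illustrative sketch: the `AnswerGenerationConfig` class, its field names, and the `example-model` value are hypothetical and not part of any product API. The node itself is configured through the UI.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AnswerGenerationConfig:
    """Hypothetical stand-in for the node's settings panel (not a product API)."""

    execution_type: str
    base_model: str
    search_sources: list[str]
    multimodal_options: set[str] = field(default_factory=set)

    def problems(self) -> list[str]:
        """Return the reasons this configuration would be incomplete."""
        issues = []
        if self.execution_type != "RAG Agent":
            issues.append("execution type must be RAG Agent")
        if not self.base_model:
            issues.append("a base model must be selected")
        if not self.search_sources:
            issues.append("at least one search source must be selected")
        return issues


config = AnswerGenerationConfig(
    execution_type="RAG Agent",
    base_model="example-model",  # placeholder name, not a real model
    search_sources=["documents"],
    multimodal_options={"file_analysis", "image_generation"},
)
print(config.problems())  # an empty list means every required step is covered
```

The multimodal options are kept as an optional set because only the execution type, base model, and search source are described as required selections.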

2. How to set multimodal options

The multimodal options are divided into two categories: Analysis and Generation. Below are the tools and conditions available for each category.


2-1. Analysis

Using the analysis options allows files, images, and audio to be input within the conversation and analyzed by the model. The conditions for each tool are as follows.

File analysis

If you turn this tool ON, you can upload files within the conversation and have them analyzed by the specified model. When OFF, files cannot be attached within the conversation.

You can upload up to 5 files at a time, and the total size cannot exceed 100MB.

  • Example: If one file is 100MB, only 1 file can be uploaded

  • If each file is 1MB, you can still upload only 5 files, even though the total is well under 100MB.

The file extensions that can be submitted are as follows.

Image analysis

If you turn this tool ON, you can upload images within the conversation and have them analyzed by the specified model. When OFF, image attachment is not possible.

You can upload up to 5 images at a time, and the total size cannot exceed 50MB.

  • Example: If one image is 50MB, only 1 image can be uploaded

  • If each image is 1MB, you can still upload only 5 images, even though the total is well under 50MB.

The image extensions that can be submitted are as follows.

Audio analysis

If you turn this tool ON, you can upload audio files within the conversation and have them analyzed by the specified model. When OFF, audio attachment is not possible.

You can upload up to 5 audio files at a time, and the total size cannot exceed 100MB.

  • Example: If one audio file is 100MB, only 1 file can be uploaded

  • If each audio file is 1MB, you can still upload only 5 files, even though the total is well under 100MB.

  • Audio file size does not necessarily correlate with playback time. For stable processing, we recommend uploading audio files shorter than 1 hour.

The audio extensions that can be submitted are as follows.

2-2. Generation

Using the generation options allows the model to generate files or images within the conversation. The conditions for each tool are as follows.

File generation

If you turn this tool ON, files can be generated in response to questions during the conversation. When OFF, file generation is not possible, and if file generation is requested during the conversation the agent will display a message stating "File generation is currently unavailable."

The following four extensions are supported for generation.

Image generation

If you turn this tool ON, images can be generated in response to questions during the conversation. When OFF, image generation is not possible, and if image generation is requested during the conversation the agent will display a message stating "Image generation is currently unavailable."

Image editing is not currently provided. We plan to offer the ability to edit or regenerate generated and uploaded images in the future.
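The ON/OFF behavior of the two generation tools can be sketched as follows. The function name, the tool keys, and the placeholder output are assumptions made for illustration. Only the two "currently unavailable" messages are taken from the descriptions above.

```python
from __future__ import annotations

# Messages shown when a generation tool is OFF (quoted from this guide).
UNAVAILABLE_MESSAGES = {
    "file_generation": "File generation is currently unavailable.",
    "image_generation": "Image generation is currently unavailable.",
}


def handle_generation_request(tool: str, enabled_tools: set[str]) -> str:
    """Return a placeholder result if the tool is ON, else the unavailable message."""
    if tool not in UNAVAILABLE_MESSAGES:
        raise ValueError(f"unknown generation tool: {tool}")
    if tool in enabled_tools:
        return f"<{tool} output>"  # stands in for the real generated file or image
    return UNAVAILABLE_MESSAGES[tool]


enabled = {"file_generation"}  # image generation is left OFF in this example
print(handle_generation_request("file_generation", enabled))
print(handle_generation_request("image_generation", enabled))
```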

4. Conversation history

In the dashboard's conversation history, you can review the files and images uploaded by users as well as the generated content.

5. Check consumed credits

You can check consumed credits under Payments > Answer Generation.

6. Notices

6-1. App preview

  • Currently only the Works screen supports uploading multiple images and multiple files; this is not supported in the SDK.

  • Therefore, in the app preview (SDK screen) the file attachment icon will not be displayed, and you cannot properly test the multimodal features.

To work around this limitation, test the multimodal features using the following procedure.

  1. First, publish the app and designate yourself as an app viewer.

  2. Proceed with verification.

  3. When verification is complete, adjust the access permissions list and republish the app.

Following this procedure lets you test the multimodal features safely.

6-2. How to upload files within the conversation

To attach images, files, or audio to the conversation, click the attachment button at the bottom right, choose 'Upload in conversation', select one or more files, and send them along with your message. Drag-and-drop and copy/paste are also supported. The transfer limits for each type are as follows.

  • Images: up to 50MB, up to 5 files

  • Files: up to 100MB, up to 5 files

  • Audio: up to 100MB, up to 5 files
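As a rough illustration of how the two limits interact (file count and total size are checked independently), here is a minimal sketch of a pre-upload check. The `check_upload` helper is hypothetical. It also assumes that 1MB means 1024 × 1024 bytes and that a total exactly equal to the limit is allowed, neither of which is stated in this guide.

```python
from __future__ import annotations

MB = 1024 * 1024  # assumption: the guide does not say whether MB is 10^6 or 2^20 bytes

# (max items per upload, max total bytes) per attachment type, from the list above
LIMITS = {
    "image": (5, 50 * MB),
    "file": (5, 100 * MB),
    "audio": (5, 100 * MB),
}


def check_upload(kind: str, sizes_in_bytes: list[int]) -> list[str]:
    """Return a list of limit violations; an empty list means the batch is acceptable."""
    max_count, max_total = LIMITS[kind]
    errors = []
    if len(sizes_in_bytes) > max_count:
        errors.append(f"{kind}: {len(sizes_in_bytes)} items exceeds the limit of {max_count}")
    total = sum(sizes_in_bytes)
    if total > max_total:
        errors.append(f"{kind}: total of {total} bytes exceeds the limit of {max_total}")
    return errors


print(check_upload("image", [50 * MB]))           # one 50MB image fits
print(check_upload("file", [1 * MB] * 6))         # too many files, although only 6MB in total
print(check_upload("image", [30 * MB, 30 * MB]))  # only two images, but 60MB in total
```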

6-3. Credit cautions

Within the token allowance, the agent keeps earlier conversation turns, files, audio, images, and so on in memory and consults them for each question. As a result, when you enter a new image or question, files uploaded earlier in the conversation may also be referenced, and credits may be charged for them.

If you want to work based only on the file you just uploaded, independent of existing content, click 'Start new conversation' to refresh the conversation.
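The effect of conversation memory on what is sent with each question can be illustrated with a toy model. This is not the agent's actual memory strategy, which this guide does not describe. The sketch assumes a simple newest-first selection under a token allowance, and the item names and token counts are invented for the example.

```python
from __future__ import annotations


def build_context(history: list[tuple[str, int]], token_allowance: int) -> list[str]:
    """Select the most recent items that fit in the allowance, in original order."""
    selected, used = [], 0
    for name, tokens in reversed(history):  # walk from newest to oldest
        if used + tokens > token_allowance:
            break
        selected.append(name)
        used += tokens
    return list(reversed(selected))


# Ongoing conversation: the earlier report is still in memory when chart.png arrives.
ongoing = [("report.pdf", 3000), ("question 1", 50), ("chart.png", 800), ("question 2", 40)]
print(build_context(ongoing, token_allowance=5000))

# After 'Start new conversation': only the newly uploaded image and question remain.
fresh = [("chart.png", 800), ("question 2", 40)]
print(build_context(fresh, token_allowance=5000))
```

In the ongoing conversation the earlier `report.pdf` is still part of the context for the new question, which is why starting a new conversation is the way to work from the latest upload alone.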

6-4. Difference between personal document store and shared document store

Documents uploaded to the existing document store are organized and stored through the DI pipeline and then used as reference material when the model generates answers. In contrast, files or materials uploaded during a conversation are sent directly to the model without a separate pipeline and used immediately for answer generation. Note that, unlike documents in the document store, they cannot be referenced in other conversations and do not go through the RAG process.
