[Gai] Content Generation User Manual

This manual is for users who are already familiar with Quick Start and want more in-depth guidance on the advanced AI generation features. It covers enhanced multimodal conversations, system instruction management, resource management, parameter adjustment, and best practices.

T01-Enhanced Multimodal Conversation: Efficient Single-Turn Interaction

The Enhanced Multimodal Conversation interface (hereafter the "Multimodal Conversation interface") is the center of the Gai application. It is designed for efficient, high-quality single-turn interactions and supports rich text input, multiple attachment types, and precise control over generation parameters, so that you can get the content you want in a single pass.

Enhanced Multimodal Conversation

Interface Functionality Details:

  1. Prompt Input Field: For entering your core prompt and detailed instructions.
  2. Prompt Load/Save:
    • Prompt ID (2): Displays the ID of the currently loaded prompt.
    • Rename Prompt (3): Allows renaming the ID of a loaded prompt.
    • Save Prompt (12): Saves the current text in the input box as a prompt to the prompt library.
  3. System Instruction Integration:
    • Loaded Instruction (4): Displays the currently associated system instruction.
    • Enable Instruction Toggle (5): Controls whether the loaded system instruction is active for the current conversation.
    • Instruction Button (9): Accesses the system instruction library to select and load a preset instruction.
  4. Attachment Management:
    • Local Attachment Button (8): Attaches files stored locally (images, audio, video, PDFs, etc.). Refer to T3-Your Local File Resource Management for details.
    • Remote Attachment Button (?): Attaches network file resources. Refer to T4-Your Network File Resource Management for details.
    • Attached Files List (15): Displays the currently associated local files.
  5. Content Generation & Statistics:
    • Generate Button (6): Triggers AI content generation.
    • Token Count Button (11): Estimates the number of tokens the current input will consume. Tokens used by thinking mode and by the generated output are not counted.
    • Status Button (13): Displays detailed statistics for the current conversation, including total token consumption.
  6. Output & Distribution:
    • AI-Generated Content (16): Displays model-generated text or image previews.
    • ShareTo Button (14): Shares the generated content with other system applications (e.g., email, notes) in one click.
  7. Auxiliary Functions:
    • Buffer Button (13): Quickly copies the last input text content to the system clipboard.

Version Update (v1.2.1) New Features:

  • Parameter Settings (10): Integrated into the current conversation interface, allowing adjustment of AI model, safety level, and generation parameters specifically for the current single-turn interaction. Please note that parameter modifications made in this interface, if not saved as a preset, will only apply to the current conversation.
  • Save to Directory: Allows saving the generated conversation content to a specified local directory.
  • Share File: For AI-generated images or specific file formats, enables one-click distribution to system applications.

Best Practices:

  1. Scenario Presets: Begin by defining the AI's role, context, tone, and task objective through system instructions (refer to T2-Your System Instruction Set) to establish clear boundaries and style for the generated content.
  2. Iterative Refinement: Combine text input and multimodal attachments, continuously adjusting prompts (Prompt Engineering) and system instructions until the desired generation effect is achieved.
  3. Workflow Reusability: Use "Save Prompt (12)" to store optimized prompts in the library, facilitating future reuse and quick initiation of similar tasks.
  4. Fine-tuning Parameters: Utilize the in-interface parameter settings to fine-tune generation parameters for specific tasks, achieving granular control.

T02-Content Generation Catalog Panel: Function Overview & Quick Access

Click the back button (labeled 0 in the image above) in the top-left corner of the Multimodal Conversation interface to open the Gai application's Content Generation Catalog Panel. This panel is the central hub for managing AI generation features.

Content Generation Catalog Panel

Panel Components:

  • A. Main Feature Access Buttons: Quickly access primary AI generation functions, such as Enhanced Multimodal Conversation and Image Generation.
  • B. Usage Statistics: Displays an overview of your AI usage.
  • C. Last Workspace State: Used to quickly load and reuse the prompts and parameter settings from your last generation operation, facilitating iterative optimization.
  • D. Auxiliary Functions Menu: Provides access to more advanced management features and help documentation.

Auxiliary Functions Menu List:

  1. System Instruction Management: Unified management of your custom system instruction sets, including creation, editing, import, and export.
  2. Prompt Library Management: Centralized management of all saved prompts, supporting loading, editing, copying, and deletion.
  3. Conversation Log Management: Manages the history of single-turn contextual interactions.
  4. Session Log Management: Manages the history of multi-turn contextual sessions (Chat mode).
  5. Restore Workspace State: Quickly loads the detailed state of the last generation operation (prompt, attachments, parameters, etc.) to continue debugging or reuse.

Note: The log management features in this menu allow you to load records to the workspace interface for viewing or modification.


T1-Your Prompt Library: Efficient Management and Reusability of Core Instructions

The prompt library is where you store and manage your carefully designed prompts, helping to improve workflow efficiency and the consistency of generated content.

Access by clicking "Prompt Library Management" from the Catalog Panel.

Prompt Library

Interface Functionality Details:

  • A. Prompt Entry Collapsed: Displays brief information about the prompt.
  • B. Prompt Entry Expanded: Displays the full content of the prompt and operation options.

  • Action Menu: When expanded, displays specific actions for that prompt entry.

  • Expand/View (2): Displays the full text content of the prompt.
  • Collapse (3): Collapses the prompt entry, hiding detailed text.
  • Load to Editor (4): Loads the prompt into the Multimodal Conversation interface, where you can modify and rename it (changes take effect after saving).
  • Copy Text (5): Copies the text content of the prompt to the system clipboard.
  • Delete (6): Removes the prompt entry from the library.

T2-Your System Instruction Set: Defining AI Behavioral Paradigms

System instructions are predefined directives that the model reads before any user prompt and treats as higher priority. They control the model's behavior, role, and output style, which improves the quality and consistency of generated content.

Example:

You are Nyanko-sensei from the anime "Natsume's Book of Friends". Your real name is "Madara", and "Nyanko-sensei," "San-san," and "saisai" are affectionate names given by fans. You live in this super popular fantasy anime world.

Common Uses of System Instructions:

  • Define Role/Persona: Assign a specific identity or professional field to the AI (e.g., expert, customer service, creator).
  • Standardize Output Format: Specify particular output formats like Markdown or YAML.
  • Set Style and Tone: Control the level of detail, formality, and target audience of the generated content.
  • Clarify Task Goals and Rules: Guide the model to complete specific tasks, such as returning only code snippets.
  • Provide Additional Context: Supplement the model with background knowledge or specific constraints.
  • Specify Response Language: Force the model to reply in a certain language (e.g., "Please reply in Simplified Chinese.").
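
Several of the uses listed above can be combined in one instruction. The following is a minimal Python sketch, assuming a hypothetical helper named `build_system_instruction` that is not part of Gai; it only assembles plain text that you could paste into the System Instruction Text Editor.

```python
def build_system_instruction(role, output_format=None, tone=None,
                             rules=(), context=None, language=None):
    """Assemble a system instruction from common building blocks.

    Hypothetical helper for illustration only; Gai stores the
    resulting plain text and knows nothing about this function.
    """
    parts = [f"You are {role}."]                      # role / persona
    if output_format:
        parts.append(f"Format every answer as {output_format}.")
    if tone:
        parts.append(f"Write in a {tone} tone.")
    parts.extend(rules)                               # task goals and rules
    if context:
        parts.append(f"Background: {context}")
    if language:
        parts.append(f"Please reply in {language}.")
    return " ".join(parts)


instruction = build_system_instruction(
    role="a senior Python code reviewer",
    output_format="Markdown",
    tone="concise, formal",
    rules=["Return only code snippets unless asked for an explanation."],
    language="Simplified Chinese",
)
print(instruction)
```

Keeping each concern (role, format, tone, rules, language) as a separate sentence makes it easier to change one of them later without rewriting the whole instruction.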

Multilingual Communication Tip: If you are prompting in a language other than English, consider adding to your system instructions:

Unless the user specifically requests a concise reply, answer all questions in full detail. Respond in the same language as the query; for example, if the query is in Simplified Chinese, respond in Simplified Chinese.

Or, as a more concise instruction:

Please respond in Simplified Chinese.

System Instruction Editor

Access by clicking "System Instruction Management" from the Catalog Panel.

Interface Functionality Details:

  1. System Instruction Text Editor (1): For entering or editing system instruction content.
  2. Save Instruction (2): Saves the current instruction from the editor to the instruction library.
  3. Saved Instruction ID (3): Displays the unique identifier of the saved instruction in the library.
  4. Rename Instruction (4): Allows renaming the ID of an instruction after loading it into the editor from the library.
  5. Load Instruction (5): Selects and loads a specified instruction from the instruction library to the editor.
  6. One-Click Distribute (6): Distributes the current instruction content to other system applications.

Additional Features:

  • Instruction Import and Export: Supports importing or exporting instruction sets for backup or sharing with other users.

Best Practices: Skill in writing system instructions and adjusting generation parameters (i.e., "Instruction Crafting") is key to advanced use of AI. Preset scenarios are convenient for beginners, but moving from novice to advanced user requires understanding system instructions well enough to customize them.


T3-Your Local File Resource Management: Integrating Multimodal Input

Local file resource management allows you to attach locally stored files such as PDFs, images, audio, and video as multimodal inputs. A copy of these files will be stored in the application's resource manager.

Limitations: The total size of attached files cannot exceed 5 MB. This feature is available only in the desktop and mobile versions; the web version does not support it.
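
To check the size limit before attaching files, you can add up their sizes yourself. The following is a minimal Python sketch, not part of Gai; the function names are hypothetical, and it assumes that "5 MB" means 5 × 1024 × 1024 bytes, which the manual does not specify.

```python
import os

# Assumption: the 5 MB limit is interpreted as 5 MiB.
MAX_TOTAL_BYTES = 5 * 1024 * 1024


def total_size(paths):
    """Return the combined size in bytes of the given files."""
    return sum(os.path.getsize(p) for p in paths)


def fits_local_limit(paths, limit=MAX_TOTAL_BYTES):
    """Return True if the files together stay within the limit."""
    return total_size(paths) <= limit
```

For example, `fits_local_limit(["report.pdf", "photo.png"])` returns False when the two files together are larger than the limit, which suggests using network file resources instead.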

From the Multimodal Conversation interface, click the "Local Attachment Button (8)" to access resource management.

Local File Resource

Interface Functionality Details:

  1. Add Local File (1): Selects files from your device and adds them to the resource manager.
  2. File Selection Checkbox (2): Ticks one or more files to attach to the current conversation.
  3. Return to Conversation Interface (3): Returns to the Multimodal Conversation interface after confirming selection.
  4. Item Action Options (4): When expanded, allows operations on that file item.
  5. Delete Item (5): Deletes the file item and its copy in the application's resource manager.
  6. Open File (6): Opens the file using the system's default application for viewing or editing.
  7. Distribute File (7): One-click distribution of this file to other system applications.

Best Practices: To maintain clean and efficient resource management, it is recommended to rename files meaningfully before adding them.


T4-Your Network File Resource Management: Handling Large and Shared Files

Network file resource management is intended for larger files (over 5 MB) that you want to reuse multiple times. This feature is currently available only to registered users with a serveonly user channel.

Limitations: Uploaded network files remain valid for 48 hours, after which the system automatically removes them. One or more files can be attached to a conversation.
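
The 48-hour validity window can be worked out from the upload time. The following is a minimal Python sketch, not part of Gai; the function names and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

VALIDITY = timedelta(hours=48)  # validity period stated in this manual


def expires_at(uploaded_at):
    """Return the moment an uploaded network file expires."""
    return uploaded_at + VALIDITY


def remaining(uploaded_at, now):
    """Return the time left before expiry, or zero if already expired."""
    left = expires_at(uploaded_at) - now
    return max(left, timedelta(0))


# Illustrative timestamps only.
uploaded = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
now = datetime(2025, 1, 2, 21, 0, tzinfo=timezone.utc)
print(expires_at(uploaded))      # 2025-01-03 09:00:00+00:00
print(remaining(uploaded, now))  # 12:00:00
```

A file uploaded at 09:00 on day one therefore needs to be uploaded again after 09:00 on day three if you still want to attach it.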

From the Multimodal Conversation interface, click the "Remote Attachment Button (11)" to access resource management.

Network File Resource

Interface Functionality Details:

  1. Upload Local File (1): Selects local files and uploads them as network resources.
  2. File Selection Checkbox (2): Ticks one or more network files to attach to the current conversation.
  3. Return to Conversation Interface (3): Returns to the Multimodal Conversation interface after confirming selection.
  4. File Display Name (4): Displays the name of the network file.
  5. File Expiration Time (5): Displays the remaining valid time for the network file.

Best Practices: When uploading files, the system creates a copy. It is recommended to rename local files before uploading to ensure clarity and readability in the network resource list.


T5-Generated Content Logging: Reviewing and Reusing Historical Interactions

The log management feature records all your AI generation history, including detailed information for both single-turn conversations and multi-turn sessions, allowing you to review, reuse, or analyze past interactions.

Access by clicking "Conversation Log Management" or "Session Log Management" from the Catalog Panel.

Generation History

Interface Functionality Details:

  • A. Log Entry Expanded: Displays the detailed content of the log.
  • B. Log Entry Collapsed: Displays brief information about the log.

  • Creation Time (1): Records the timestamp of the content generation or session interaction.

  • Category (2): Indicates the log type, such as GTXT (generate text), GIMG (generate image).
  • User Channel Identifier (3): Indicates the user channel used for this interaction (L: Local, T: Trial, F: Forwarding).
  • Token Statistics (4): Displays the total token consumption for input and output during this interaction.
  • Load Entry (5): Loads this log record into the Multimodal Conversation interface to reuse its prompt, parameters, and attachment information.
  • Copy Entry (6): Copies the text content from the log to the system clipboard.
  • Delete Entry (7): Deletes the entry from the log records.
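
The fields above can be pictured as one record per log entry. The following is a minimal Python sketch of such a record; the class, its field names, and the sample values are hypothetical and do not reflect how Gai stores its logs. Only the category codes and channel letters come from this manual.

```python
from dataclasses import dataclass
from datetime import datetime

# Codes documented in this manual; other categories may exist.
CATEGORIES = {"GTXT": "generate text", "GIMG": "generate image"}
CHANNELS = {"L": "Local", "T": "Trial", "F": "Forwarding"}


@dataclass
class LogEntry:
    created: datetime   # Creation Time (1)
    category: str       # Category (2)
    channel: str        # User Channel Identifier (3)
    total_tokens: int   # Token Statistics (4)
    text: str           # content copied by Copy Entry (6)

    def summary(self):
        """One-line description similar to a collapsed log entry."""
        category = CATEGORIES.get(self.category, self.category)
        channel = CHANNELS.get(self.channel, self.channel)
        return (f"{self.created:%Y-%m-%d %H:%M} {category} "
                f"[{channel}] {self.total_tokens} tokens")


# Made-up sample values.
entry = LogEntry(datetime(2025, 1, 1, 9, 30), "GTXT", "T", 812, "Hello")
print(entry.summary())  # 2025-01-01 09:30 generate text [Trial] 812 tokens
```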

T6-Text Content Generation Parameter Adjustment: Fine-grained Control of Model Output

Text content generation parameter adjustment allows you to fine-tune the AI model's behavior, output characteristics, and content safety filtering level. Please note that any parameter modifications made in this interface, if not saved as a preset, will only apply to the current conversation.

From the Multimodal Conversation interface, click the "Parameter Settings" button to access the adjustment interface.

Text Content Generation Parameter

A. Safety Level Settings: Filtering Undesirable Content

Safety levels control how strictly sensitive content in model output is filtered. CSAM (Child Sexual Abuse Material) and PII (Personally Identifiable Information) are always filtered by default; this part is not adjustable.

Adjustable Safety Level Options:

  • unspecified (None): Basic filtering; applies only the non-adjustable filters.
  • low: Filters potentially harmful content.
  • medium: Moderately filters harmful content.
  • high: Strictly filters all potentially harmful content.

Corresponding Content Types:

  1. Hate Speech: Negative or harmful comments targeting identity or protected attributes.
  2. Dangerous Content: Promotes or enables access to harmful goods, services, and activities.
  3. Explicit Sexual Content: Contains references to sexual acts or other obscene content.
  4. Harassment Content: Malicious, intimidating, bullying, or abusive comments targeting others.

B. Generation Parameter Settings: Adjusting Model Behavior

  • MaxOutputTokens: Limits the maximum number of tokens for a single text generation.
  • Temperature: Controls the randomness of token selection.
    • Lower temperatures: Tend to select tokens with the highest probability, suitable for scenarios requiring factual, accurate, or deterministic responses. A temperature of 0 typically selects the highest probability token.
    • Higher temperatures: Increase the randomness of token selection, potentially leading to more diverse or unexpected creative results.
  • StopSequences: Words or phrases that end generation. When the model produces one of them, it stops generating, and the sequence itself is not included in the response. Up to five stop sequences can be set.
  • Top-p (Nucleus Sampling): Adjusts how the model selects output tokens. The system selects tokens in descending order of probability until the cumulative probability sum of the selected tokens reaches the Top-p value.
    • Example: If tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 respectively, and Top-p is 0.5, the model will choose from A and B (further determined by temperature). Setting Top-p to 0 yields the least varied results.
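
To make Temperature and Top-p more concrete, here is a toy Python sketch of both steps. It illustrates the general technique only and is not the implementation used by the models in Gai; the function names are hypothetical, and treating a temperature of 0 as "always pick the top token" is an assumption of this sketch.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature.

    Assumption: temperature 0 puts all probability on the top token.
    """
    if temperature == 0:
        best = max(logits, key=logits.get)
        return {tok: 1.0 if tok == best else 0.0 for tok in logits}
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the peak for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their sum reaches top_p."""
    kept = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p - 1e-12:  # small tolerance for float rounding
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}  # renormalize


def sample(probs, rng=random):
    """Draw one token according to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


probs = {"A": 0.3, "B": 0.2, "C": 0.1}
print(sorted(top_p_filter(probs, 0.5)))  # ['A', 'B']
print(sorted(top_p_filter(probs, 0.0)))  # ['A']
```

With Top-p at 0.5 only A and B remain candidates, matching the example above, and with Top-p at 0 only the single most probable token survives, which is why that setting gives the least varied output.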

Enable and Save: To apply adjusted parameters, turn on the enable switch (5), then click Save Settings (6). The settings then take effect for the current conversation.
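
The generation parameters in section B can be pictured as a single settings object that is checked before it is saved. The following is a minimal Python sketch; the class, its field names, and the default values are placeholders rather than Gai's actual settings, and cutting the text after the fact only imitates the effect of StopSequences (a real model stops generating instead).

```python
from dataclasses import dataclass, field

MAX_STOP_SEQUENCES = 5  # limit stated in this manual


@dataclass
class GenerationSettings:
    max_output_tokens: int = 1024  # placeholder default, not Gai's
    temperature: float = 1.0       # placeholder default, not Gai's
    top_p: float = 0.95            # placeholder default, not Gai's
    stop_sequences: list = field(default_factory=list)
    enabled: bool = False          # mirrors the enable switch

    def validate(self):
        """Raise ValueError if a value is outside a sensible range."""
        if self.max_output_tokens <= 0:
            raise ValueError("max_output_tokens must be positive")
        if self.temperature < 0:
            raise ValueError("temperature must not be negative")
        if not 0.0 <= self.top_p <= 1.0:
            raise ValueError("top_p must be between 0 and 1")
        if len(self.stop_sequences) > MAX_STOP_SEQUENCES:
            raise ValueError("at most five stop sequences are allowed")


def cut_at_stop_sequence(text, stop_sequences):
    """Return text up to, but not including, the earliest stop sequence."""
    positions = [text.find(s) for s in stop_sequences if s and s in text]
    return text[:min(positions)] if positions else text


settings = GenerationSettings(stop_sequences=["END"], enabled=True)
settings.validate()
print(repr(cut_at_stop_sequence("Step 1. Step 2. END ignored",
                                settings.stop_sequences)))
# 'Step 1. Step 2. '
```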

Version Update (v1.2.0) New Features:

  • New Models: Introduces new models like 2.0 Flash Preview Image Gen, supporting the generation of preview images from text prompts.
  • Dynamic Reasoning: For complex tasks, dynamic deep reasoning mode can be enabled, currently only supported by 2.5 series models.
  • Reasoning Output: When generating content, you can choose to also output the model's internal reasoning process.

T7-Image Generation & Parameter Adjustment: Transforming Text into Visuals

The image generation feature allows you to create images using text prompts. This functionality is currently exclusive to registered users with a serveonly user channel.

From the Catalog Panel, click the "Text-to-Image" button to access the image generation interface.

Text-to-Image Main Interface

Interface Functionality Details:

  1. Image Prompt Input (1): Enter the text prompt describing your desired image.
  2. Generate Image (2): Triggers the AI to generate an image based on the prompt.
  3. Image Preview & Distribution (3): Generated images will appear as thumbnails here; click to view the full image. Full images support one-click distribution to system applications (e.g., email, photo album).
  4. Prompt Archive ID (4): Displays the ID of the currently saved image prompt.
  5. Rename Prompt (5): Allows renaming the ID of a prompt after saving it.
  6. Save Prompt (6): Saves the current image prompt to the prompt library.
  7. Image Generation Settings (7): Accesses the image generation parameter adjustment interface.
  8. Last Input Text Buffer (8): Caches the text from the last entered prompt.

Image Generation Best Practices:

To achieve high-quality images, prioritize Subject, Context, and Style in your prompts:

  • Subject: Clearly define the central object of the image.
  • Context: Describe the environment, background, or situation of the subject.
  • Style: Define the artistic style, tone, or visual characteristics of the image.

Example:

Prompt: A sketch, a modern Chinese-style beautiful girl, traveling through the prosperous Tang Dynasty, surrounded by a bustling ancient street market.

  • Subject: Modern beautiful girl
  • Context: Traveling through the prosperous Tang Dynasty, surrounded by a bustling ancient street market.
  • Style: Sketch, Chinese style.
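
If you write many image prompts, it can help to keep the three parts separate and join them at the end. The following is a minimal Python sketch; the helper `build_image_prompt` is hypothetical, and the order in which it joins the parts is simply the order used in the example prompt above.

```python
def build_image_prompt(subject, context, style):
    """Join style, subject, and context into one prompt sentence."""
    return f"{style}, {subject}, {context}."


prompt = build_image_prompt(
    subject="a modern Chinese-style beautiful girl",
    context=("traveling through the prosperous Tang Dynasty, "
             "surrounded by a bustling ancient street market"),
    style="A sketch",
)
print(prompt)
```

Changing only the `style` argument (for example to "A watercolor painting") lets you compare styles while keeping the subject and context fixed.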


Image Generation Parameter Settings:

Image Generation Parameter

If the enable switch (8) is turned on, the current parameter settings take effect when you return to the image generation interface.

  1. Model ID (modelId): Selects the AI model for image generation; models labeled as fast typically return results sooner.
  2. Number of Generations (gen images): Sets the quantity of images generated in a single operation (up to 4 images can be generated).
  3. Aspect Ratio (aspectRatio): Defines the aspect ratio of the generated image.
    • Options: 1:1 (square), 4:3 (traditional), 16:9 (widescreen).
  4. Prompt Language (language): Specifies the language used for the prompt.
    • Options: English, Chinese (Simplified, Traditional), Japanese, Korean, Hindi, Spanish, Portuguese.
  5. Safety Settings (safetySetting): Adjusts the sensitivity filtering for image content.
    • Options: few (filters suspected harmful content), some (moderate filtering), most (filters all possible harmful content).
  6. Person Generation Control (personGeneration): Controls the generation of people or faces in the image.
    • Options: all (allows people of any age), adult (allows only adults), dont (prohibits the generation of people or faces).
  7. Negative Prompt (negativePrompt): Enter descriptions of content you do not wish to appear in the image, forcing the model to avoid generating it.
  8. Enable Current Parameter Settings (8): Activates or deactivates the parameter settings in the current interface.
  9. Save (9): Saves the current parameter settings as a preset for quick loading next time.
  10. Copy Parameters as JSON (10): Exports the current parameter settings as JSON format text.
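
The settings above map naturally onto a JSON object. The following is a minimal Python sketch that checks a set of parameters against the options listed in this section and prints them as JSON. The exact layout that Gai exports is not documented here, so the structure is an assumption; in particular, the key `numberOfImages`, the model name, and the `language` value are placeholders.

```python
import json

# Allowed values taken from the option lists in this section.
ALLOWED = {
    "aspectRatio": {"1:1", "4:3", "16:9"},
    "safetySetting": {"few", "some", "most"},
    "personGeneration": {"all", "adult", "dont"},
}
MAX_IMAGES = 4  # maximum images per operation stated in this manual


def validate_image_params(params):
    """Return a list of problems; an empty list means the values are valid."""
    errors = []
    for key, allowed in ALLOWED.items():
        if key in params and params[key] not in allowed:
            errors.append(f"{key}: {params[key]!r} not in {sorted(allowed)}")
    count = params.get("numberOfImages", 1)  # hypothetical key name
    if not 1 <= count <= MAX_IMAGES:
        errors.append(f"numberOfImages must be between 1 and {MAX_IMAGES}")
    return errors


params = {
    "modelId": "example-image-model",  # placeholder, not a real model ID
    "numberOfImages": 2,               # hypothetical key name
    "aspectRatio": "16:9",
    "language": "English",             # placeholder value
    "safetySetting": "some",
    "personGeneration": "adult",
    "negativePrompt": "text, watermark",
}
print(validate_image_params(params))  # []
print(json.dumps(params, indent=2))
```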

Note: Image generation consumes tokens. For example, each image may consume approximately 500 tokens, so generating the maximum of 4 images at once would use roughly 2,000 tokens.