AI-NEWS · October 26, 2024

What You Need Is Not an AI Agent, But an AI-Friendly Workflow

## AI Agents and Workflows

### Overview
AI agents are gaining popularity, but some argue they are overhyped and have few real-world applications. Andrew Ng's multi-agent translation example uses three agents (direct translation, review, paraphrasing) to improve quality. However, similar results can be achieved without separate agents by guiding an LLM through the same steps as a "Chain of Thought" (CoT) process.
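To make the comparison concrete, here is a minimal sketch of the CoT alternative: one prompt that asks the model to perform the three translation steps in order. The prompt wording, the function names, and the `llm` callable (prompt in, completion out) are assumptions for illustration, not taken from the original article or from any particular SDK.

```python
from typing import Callable


def build_translation_prompt(text: str, target_language: str) -> str:
    """Build one prompt that walks the model through three labeled steps."""
    return (
        f"Translate the text below into {target_language}.\n"
        "Work through these steps in order and label each one:\n"
        "1. Direct translation: translate literally, keeping every detail.\n"
        "2. Review: list concrete problems in the direct translation.\n"
        "3. Paraphrase: rewrite the translation so it fixes those problems.\n\n"
        f"Text:\n{text}"
    )


def translate(text: str, target_language: str, llm: Callable[[str], str]) -> str:
    """Send the single CoT prompt through a caller-supplied LLM function.

    `llm` is a hypothetical stand-in for whatever client the reader uses.
    """
    return llm(build_translation_prompt(text, target_language))
```

Because the model call is injected, the same prompt can be tried against any provider, and the three steps stay visible in one place instead of being spread across agent definitions.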

### Key Points
- **Using AI Effectively**: The core principle is designing an AI-friendly workflow rather than relying solely on AI agents.
- **Avoid Anthropomorphizing AI**: Instead of applying human methods to AI, leverage the strengths of LLMs using CoT processes.
- **AI as a Copilot**: Use AI to assist humans in decision-making and to handle well-defined tasks that do not require complex decisions.

### Designing an AI-Friendly Workflow
1. **Avoid Limiting AI Solutions to Human Methods**:
   - Break down tasks into steps suitable for AI, like direct translation, reflection, and paraphrasing.
   - Avoid mimicking human job roles (project managers, developers); those roles do not map well onto how AI works effectively.

2. **Don't Rely Solely on AI for Decisions—Use AI to Assist**:
   - Use AI for simpler tasks like sentiment analysis and response generation in handling customer reviews.
   - GitHub Copilot is one example: it generates code inside an existing software workflow without being asked to make complex decisions.

3. **Combine AI Models and Tools Across Domains for Better Solutions**:
   - Integrate LLMs with domain-specific tools to create efficient solutions.
   
4. **Return to the Root Problem—AI Is Just a Tool**:
   - Apply first principles thinking: define the core problem, break it down into essential components, and create a solution from the ground up.
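As an illustration of the copilot principle in point 2, the customer-review case could be sketched as two narrow LLM tasks (sentiment classification and reply drafting) whose output is a draft that a human still has to approve. The `ReviewDraft` type, the prompt wording, and the `llm` callable are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


@dataclass
class ReviewDraft:
    review: str
    sentiment: str
    reply: str
    approved: bool = False  # set by a human reviewer, never by the model


def draft_reply(review: str, llm: LLM) -> ReviewDraft:
    """Run two well-defined LLM tasks and return a draft for human approval."""
    # Task 1: narrow classification with a constrained answer space.
    raw = llm(
        "Classify the sentiment of this review as positive, negative, "
        f"or neutral. Answer with one word.\n\n{review}"
    )
    sentiment = raw.strip().lower()
    if sentiment not in {"positive", "negative", "neutral"}:
        sentiment = "unknown"  # do not trust free-form output

    # Task 2: draft a response; sending it remains a human decision.
    reply = llm(
        f"Write a short, polite reply to this {sentiment} customer review:\n\n{review}"
    ).strip()
    return ReviewDraft(review=review, sentiment=sentiment, reply=reply)
```

The `approved` flag is the design choice that matters here: the model prepares the work, and the decision to send stays with a person.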

### Examples of AI-Optimized Workflows
1. **Converting PDFs to Markdown**
   - Use PyMuPDF to detect images, figures, and tables.
   - Convert pages into images with red boxes marking figures and tables.
   - Have GPT-4 interpret the marked images and generate the corresponding Markdown content.

2. **Translating Comics**
   - Detect speech bubbles using a specialized model.
   - Extract text from these bubbles using OCR.
   - Remove original text, translate it using GPT-4, and re-insert the translated text into the original layout.
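The comic-translation steps above can be sketched as a plain pipeline in which each stage is a separate, swappable tool. The `detect`, `ocr`, and `translate` callables are hypothetical placeholders for a bubble-detection model, an OCR engine, and an LLM call; erasing the original text and drawing the translation back into the layout is left to a rendering step that is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # x0, y0, x1, y1 in pixels (assumed convention)


@dataclass
class Bubble:
    box: Box
    source_text: str = ""
    translated_text: str = ""


def translate_page(
    image: object,
    detect: Callable[[object], List[Box]],
    ocr: Callable[[object, Box], str],
    translate: Callable[[str], str],
) -> List[Bubble]:
    """Detect bubbles, read their text, and translate each non-empty one."""
    bubbles = [Bubble(box=box) for box in detect(image)]
    for bubble in bubbles:
        bubble.source_text = ocr(image, bubble.box)
        # Skip empty bubbles so no LLM call is spent on them.
        if bubble.source_text.strip():
            bubble.translated_text = translate(bubble.source_text)
    return bubbles
```

Each returned `Bubble` keeps its bounding box, so a later step has what it needs to place the translated text where the original was.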

### Conclusion
To maximize AI's potential, design workflows that fit its strengths. Whether you use an AI agent or call an LLM directly, focus on solving the core problem, and treat AI as a tool rather than an end in itself.

Source: https://baoyu.io/blog/ai-agent/what-you-need-is-ai-friendly-workflow