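Many Agent systems fail not because the large language model (LLM) cannot answer at all, but because the system never settled its key control questions:
- Where does intermediate state live?
- Who decides the next step?
- What happens when a tool call fails?
- When should the system continue, and when should it stop?
- Which actions may run automatically, and which require approval?
- How does history enter the current decision?
- Who verifies output quality?
Taken together, these questions make the nature of Agent architecture clear: it is not prompt engineering, nor the DSL (domain-specific language) of some framework, but control-flow design.
A practical Agent system usually revolves around three kinds of objects:
| Object | Problem it solves | Common implementations |
|---|---|---|
| State | What the system knows now, what it has done, and what the next step depends on | Pydantic models, session state, blackboard, memory store |
| Router | Which path the system takes next | Fixed steps, conditional branches, dynamic scheduling, human approval |
| Evaluator | Whether the current result is trustworthy, and whether to retry or stop | critic, verifier, rule checks, human review |
Seen this way, the 17 Agent architectures are not 17 unrelated names; each one adds a bit more control capability to the system.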
flowchart TD
A[Single Generation<br/>one-shot output] --> B[Reflection<br/>generate, critique, revise]
B --> C[Tool Use<br/>external tools]
C --> D[ReAct<br/>observe-act loop]
D --> E[Planning<br/>explicit plan]
E --> F[PEV<br/>plan, execute, verify]
F --> G[Multi-Agent<br/>role specialization]
G --> H[Blackboard<br/>shared board, dynamic scheduling]
G --> I[Meta-Controller<br/>entry routing]
G --> J[Ensemble<br/>parallel redundancy]
D --> K[Memory<br/>long-term memory]
K --> L[Graph Memory<br/>relational reasoning]
E --> M[Tree-of-Thoughts<br/>search tree]
D --> N[Mental Loop<br/>simulate before acting]
N --> O[Dry-Run<br/>side-effect gate]
O --> P[Metacognitive<br/>boundary awareness]
B --> Q[Self-Improvement<br/>iterative refinement]
M --> R[Cellular Automata<br/>decentralized emergence]
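The State / Router / Evaluator split can be sketched without any framework. The following is a minimal illustration in plain Python, not agno code: the `State` fields, the toy `evaluator` rule, and the `generate` stand-in for a model call are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Everything the system knows so far (hypothetical fields)."""
    task: str
    draft: str = ""
    attempts: int = 0
    history: list[str] = field(default_factory=list)


def evaluator(state: State) -> bool:
    """Toy evaluator: accept the draft once it ends with a period."""
    return state.draft.endswith(".")


def router(state: State, max_attempts: int = 3) -> str:
    """Pick the next node from the state, not from the prompt."""
    if evaluator(state):
        return "finish"
    if state.attempts >= max_attempts:
        return "give_up"
    return "generate"


def generate(state: State) -> None:
    """Stand-in for a model call; only the second attempt passes the check."""
    state.attempts += 1
    suffix = "." if state.attempts >= 2 else ""
    state.draft = f"answer to {state.task}{suffix}"
    state.history.append(state.draft)


def run(task: str) -> State:
    state = State(task=task)
    while True:
        step = router(state)
        if step != "generate":
            state.history.append(step)  # record how the run ended
            return state
        generate(state)
```

The point of the sketch is that the loop terminates because of the router and the attempt budget stored in the state, not because the model happens to stop.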
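Abstracting Agent control flow with agno
Frameworks differ in syntax, but their core abstractions are similar. Taking agno as an example, an Agent can be viewed as a single state transformation, and a Workflow as explicit control flow.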
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.workflow.v2 import Workflow, Step, Router, Loop
agent = Agent(
model=OpenAIChat(id="gpt-5-mini"),
tools=[...],
instructions="...",
response_model=SomePydanticModel,
)
workflow = Workflow(
name="my_flow",
steps=[
Step(name="plan", agent=planner_agent),
Loop(
name="execute_and_verify",
steps=[executor_step, verifier_step],
end_condition=lambda outputs: outputs[-1].content.is_done,
),
Step(name="synthesize", agent=synthesizer_agent),
],
)
workflow.run(message="...")
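This structure already contains the control components an Agent system uses most:
| agno object | Architectural meaning |
|---|---|
| response_model | Defines the structured state space |
| Agent | Wraps one model inference or tool-calling capability |
| tools | Connects to the outside world |
| Step | A single deterministic state-transformation node |
| Workflow.steps | Fixed-order control flow |
| Router | Conditional branching or dynamic routing |
| Loop | Iteration and termination condition |
| workflow_session_state | Explicit shared state |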
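When taking an Agent architecture apart, ask the same six questions:
- Which shortcoming of the previous pattern does it address?
- Which state fields does it add?
- Is its topology a linear chain, a loop, a fork, a blackboard, a tree search, or a grid with emergent behavior?
- Are its routing rules fixed, conditional, or decided dynamically by a scheduler?
- Where is it most likely to fail?
- In which situations should you upgrade to, or switch to, a different architecture?
Overview of the 17 architectures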
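| Architecture | Added capability | Problem it addresses | Suitable scenarios |
|---|---|---|---|
| Reflection | Critique and revision | Unstable single-pass output quality | Code generation, copy polishing |
| Tool Use | External tool interface | Model cannot access the real-time world | Search, databases, API queries |
| ReAct | Observe-act loop | Tool results cannot drive later decisions | Multi-hop QA, research assistants |
| Planning | Explicit plan state | No global control over steps | Complex task decomposition |
| PEV | Verification-driven replanning | Execution failures propagate silently | Flows with unreliable tools |
| Multi-Agent | Role specialization | Role conflicts inside a single Agent | Research, analysis, and writing pipelines |
| Blackboard | Dynamic scheduling over shared state | Fixed pipelines are not flexible enough | Multi-expert collaboration |
| Meta-Controller | Entry routing | Request types vary widely | Customer support bots, task triage |
| Ensemble | Parallel redundancy | A single answer is too biased | High-stakes judgments, fact checking |
| Episodic/Semantic Memory | Long-term memory | System forgets across turns | Personalized assistants, knowledge assistants |
| Graph Memory | Relational reasoning | Vector retrieval struggles with multi-hop relational queries | Enterprise knowledge graphs |
| Tree-of-Thoughts | Search tree | Linear reasoning cannot backtrack | Puzzles, combinatorial optimization |
| Mental Loop | Simulation before action | Real-world trial and error is costly | Trading, robotics, configuration changes |
| Dry-Run | Side-effect gate | Dangerous actions must not run directly | Sending email, placing orders, deleting data |
| Metacognitive | Modeling its own limits | System does not know what it should not answer | Medicine, law, finance |
| Self-Improvement | Iterative quality loop | One round of reflection is not enough | Content production, marketing copy |
| Cellular Automata | Decentralized emergence | Central control is too heavy | Path diffusion, local-rule systems |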
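1. Reflection: the minimal quality loop
Reflection addresses unstable single-pass generation quality. Instead of asking the model to finish everything in one shot, it splits the task into three stages: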
flowchart LR
U[User request] --> G[Generator<br/>write a draft]
G --> C[Critic<br/>review and critique]
C --> R[Refiner<br/>revise per the critique]
R --> O[Final output]
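Its key change is not that "the model reflects", but that generation and evaluation are split into two distinct responsibilities.
State design
Reflection needs at least three pieces of state: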
from pydantic import BaseModel, Field
from typing import List
class DraftCode(BaseModel):
code: str = Field(description="Python code to solve the request.")
explanation: str = Field(description="Brief explanation of the code.")
class Critique(BaseModel):
has_errors: bool
is_efficient: bool
suggested_improvements: List[str]
critique_summary: str
class RefinedCode(BaseModel):
refined_code: str
refinement_summary: str
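Once draft, critique, and refined_code are modeled explicitly, intermediate artifacts are no longer just text buried in the context; they become data that later steps can read reliably.
agno version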
model = OpenAIChat(id="gpt-5-mini")
generator = Agent(
name="generator",
model=model,
response_model=DraftCode,
instructions="Write Python code and explain it briefly.",
)
critic = Agent(
name="critic",
model=model,
response_model=Critique,
instructions="Review code for bugs, inefficiency and style issues.",
)
refiner = Agent(
name="refiner",
model=model,
response_model=RefinedCode,
instructions="Rewrite code according to the critique.",
)
def generate_step(si):
draft = generator.run(si.message).content
si.workflow_session_state["draft"] = draft
return draft
def critique_step(si):
draft = si.workflow_session_state["draft"]
return critic.run(f"Review this code:\n```python\n{draft.code}\n```").content
def refine_step(si):
draft = si.workflow_session_state["draft"]
critique = si.previous_step_output.content
return refiner.run(
f"Original code:\n{draft.code}\n\nCritique:\n{critique.model_dump_json()}"
).content
reflection_wf = Workflow(
name="reflection",
session_state={"draft": None},
steps=[
Step(name="generate", executor=generate_step),
Step(name="critique", executor=critique_step),
Step(name="refine", executor=refine_step),
],
)
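Boundaries and failure modes
Reflection is a linear flow with no Router and no retry path. The Critic pointing out a problem does not mean the Refiner actually fixed it. It suits adding one quality check to an output, but not tasks that need sustained action, tool interaction, or failure recovery.
When the system needs access to the outside world, it needs Tool Use; when it needs to keep adjusting its actions based on observations, it needs ReAct.
2. Tool Use: connecting a text system to the structured world
An LLM without tools is confined to its parametric memory and its context window. Tool Use gives the Agent external capabilities such as functions, search, databases, and APIs (application programming interfaces).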
sequenceDiagram
participant U as User
participant M as Model
participant T as Tool
U->>M: Ask a question
M->>T: Emit a tool call
T-->>M: Return the observation
M-->>U: Synthesized answer
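State changes
The state in Tool Use is usually an event log:
user input
-> model decides to call a tool
-> tool arguments
-> tool result
-> model synthesizes the answer
In agno, this log is usually maintained inside the Agent. The engineering focus shifts from "how do I make the model aware of the tool" to "how is the tool interface serialized and deserialized reliably".
agno version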
from agno.tools.duckduckgo import DuckDuckGoTools
def get_stock_price(symbol: str) -> str:
"""Return the latest stock price for a symbol."""
return f"The current price of {symbol.upper()} is $172.35."
tool_agent = Agent(
model=OpenAIChat(id="gpt-5-mini"),
tools=[get_stock_price, DuckDuckGoTools()],
instructions="Use tools when the question needs real-time data.",
show_tool_calls=True,
)
tool_agent.run("What is Apple's current stock price?")
Conceptually, the Agent runs a loop like this internally:
while True:
response = model.invoke(messages)
messages.append(response)
if not response.tool_calls:
break
for call in response.tool_calls:
result = tool_registry[call.name](**call.args)
messages.append({
"role": "tool",
"tool_call_id": call.id,
"content": result,
})
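Failure modes
Tool Use failures cluster at the boundary layer:
| Failure | Symptom |
|---|---|
| Hallucinated tool name | The model calls a function that does not exist |
| Bad arguments | Field names, types, or enum values do not match |
| Messy return format | The model cannot interpret what the tool returned |
| Synthesis error | The tool result is correct, but the model explains it wrongly |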
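The first two rows of the table can be caught in code before a tool ever runs. The sketch below is one possible guard, assuming a plain dictionary registry; `TOOL_REGISTRY` and `dispatch_tool_call` are hypothetical names and not part of agno. It turns an unknown tool or a bad argument set into an error string that can be fed back to the model as an observation.

```python
import inspect


def get_stock_price(symbol: str) -> str:
    """Mock tool returning a hard-coded price."""
    return f"The current price of {symbol.upper()} is $172.35."


# Hypothetical registry: tool name -> callable.
TOOL_REGISTRY = {"get_stock_price": get_stock_price}


def dispatch_tool_call(name: str, args: dict) -> str:
    """Run a tool call, turning boundary failures into observations."""
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        known = ", ".join(sorted(TOOL_REGISTRY))
        return f"Error: unknown tool '{name}'. Available tools: {known}"
    try:
        # bind() raises TypeError on missing or unexpected arguments.
        bound = inspect.signature(tool).bind(**args)
    except TypeError as exc:
        return f"Error: bad arguments for '{name}': {exc}"
    # Always hand the model a string, whatever the tool returned.
    return str(tool(*bound.args, **bound.kwargs))
```

Returning the error as text instead of raising keeps the loop alive and gives the model a chance to correct its call. Type and enum validation would need a schema layer on top of this.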
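Tool Use settles "can the system touch the outside world", but not "can it keep adjusting its actions based on what it observes". ReAct fills that gap.
3. ReAct: observation-driven action
ReAct combines Reasoning and Acting. Its key point is not just calling tools, but feeding every tool result into the next round of decision-making.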
flowchart LR
Q[Question] --> T[Thought<br/>decide what is needed]
T --> A[Action<br/>call a tool]
A --> O[Observation<br/>receive the result]
O --> T
T --> F[Final Answer]
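State semantics change
In Tool Use the message sequence is closer to a call log; in ReAct it becomes an action trajectory:
current question
-> current reasoning
-> tool call
-> observation
-> new reasoning
-> new tool call
...
This trajectory is the Agent's working memory.
agno version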
from agno.tools.duckduckgo import DuckDuckGoTools
from agno.tools.yfinance import YFinanceTools
react_agent = Agent(
model=OpenAIChat(id="gpt-5-mini"),
tools=[
DuckDuckGoTools(),
YFinanceTools(stock_price=True, company_news=True),
],
instructions=[
"You are a research assistant.",
"Think step by step.",
"After each tool observation, decide what is still missing.",
],
reasoning=True,
markdown=True,
show_tool_calls=True,
)
react_agent.print_response(
"Based on the latest news, should I be worried about AAPL next quarter?",
stream=True,
)
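Why ReAct is often the starting point
Many real tasks do not need an elaborate global plan. They only need to:
- look up some information;
- decide the next step from the result;
- look up more, or answer.
Search, multi-hop QA, information gathering, and simple data analysis all fit this pattern.
Failure modes
ReAct's weakness is local greediness. Each step looks only at the current observation, so it easily repeats searches, takes detours, or skips key steps in tasks that need a global ordering constraint.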
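Repeated searches and runaway loops can be bounded outside the model. The following is a framework-free sketch under stated assumptions: `decide` is a hypothetical policy function standing in for the LLM, and `tools` is a plain dictionary of callables. It adds a step budget and a guard against issuing the exact same call twice.

```python
def run_react(decide, tools, question, max_steps=5):
    """Run a ReAct-style loop with a step budget and a repeated-call guard.

    `decide(question, trajectory)` is a hypothetical policy that returns
    ("final", answer) or (tool_name, tool_input).
    """
    trajectory = []  # list of (action, input, observation)
    seen = set()     # exact calls already executed
    for _ in range(max_steps):
        action, payload = decide(question, trajectory)
        if action == "final":
            return payload, trajectory
        if (action, payload) in seen:
            # Do not pay for the same call twice; tell the policy instead.
            observation = "Already tried this exact call; pick a different action."
        else:
            seen.add((action, payload))
            observation = tools[action](payload)
        trajectory.append((action, payload, observation))
    return "Stopped: step budget exhausted.", trajectory
```

The guard only catches literal repeats. Paraphrased queries would slip through, so the step budget remains the hard stop.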
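When a task must be decomposed first and then executed step by step, it needs Planning.
4. Planning: writing control flow as a data structure
Planning addresses ReAct's local greediness. The model first produces a plan, and the workflow then executes according to it.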
flowchart LR
U[User task] --> P[Planner<br/>produce a plan]
P --> S[(plan steps)]
S --> E[Executor<br/>run step by step]
E --> S
E --> Z[Synthesizer<br/>compose the answer]
State design
from typing import List
from pydantic import BaseModel, Field
class Plan(BaseModel):
steps: List[str] = Field(description="Ordered atomic steps.")
The session state also has to hold the plan and the intermediate results:
session_state = {
"plan": [],
"intermediate": [],
}
agno version
planner = Agent(
name="planner",
model=OpenAIChat(id="gpt-5-mini"),
response_model=Plan,
instructions="Break the request into ordered atomic steps.",
)
executor = Agent(
name="executor",
model=OpenAIChat(id="gpt-5-mini"),
tools=[DuckDuckGoTools()],
instructions="Answer exactly one sub-question using tools.",
)
synthesizer = Agent(
name="synthesizer",
model=OpenAIChat(id="gpt-5-mini"),
instructions="Combine intermediate findings into a final answer.",
)
def plan_step(si):
plan = planner.run(si.message).content
si.workflow_session_state["plan"] = list(plan.steps)
si.workflow_session_state["intermediate"] = []
return plan
def execute_one_step(si):
state = si.workflow_session_state
question = state["plan"].pop(0)
answer = executor.run(question).content
state["intermediate"].append(f"Q: {question}\nA: {answer}")
return answer
def synthesize_step(si):
notes = "\n\n".join(si.workflow_session_state["intermediate"])
return synthesizer.run(
f"Question: {si.message}\n\nNotes:\n{notes}\n\nFinal answer:"
).content
def plan_empty(outputs):
return len(planning_wf.session_state["plan"]) == 0
planning_wf = Workflow(
name="planning",
session_state={"plan": [], "intermediate": []},
steps=[
Step(name="plan", executor=plan_step),
Loop(
name="execute_all",
steps=[Step(name="execute_one", executor=execute_one_step)],
end_condition=plan_empty,
),
Step(name="synthesize", executor=synthesize_step),
],
)
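The key change in Planning is that the next step is no longer decided on the fly; it is first materialized as a plan data structure.
Failure modes
Planning is more traceable, but also more optimistic. If the plan itself is wrong, execution proceeds steadily along the wrong route. It raises predictability at the cost of adaptability.
When tools are unreliable, data may be missing, or results need to be checked, it is time for PEV.
5. PEV: wiring verification into the main loop
PEV stands for Plan → Execute → Verify. It no longer assumes execution results are correct; every step's result is handed to a verifier for judgment.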
flowchart TD
P[Plan] --> E[Execute]
E --> V[Verify]
V -->|pass| N[Run next step]
V -->|failed, retryable| E
V -->|repeated failures| RP[Replan]
RP --> E
N --> D{Plan empty?}
D -->|no| E
D -->|yes| S[Synthesize]
State design
class VerificationResult(BaseModel):
is_successful: bool
reasoning: str
With verification_result in place, the control flow changes from:
plan -> execute -> next
to:
plan -> execute -> verify -> continue / retry / replan
Core code
def flaky_web_search(query: str) -> str:
"""An intentionally unreliable search tool."""
if "employee count" in query.lower():
return "Error: API endpoint is unavailable."
return f"Mock search result for: {query}"
verifier = Agent(
name="verifier",
model=OpenAIChat(id="gpt-5-mini"),
response_model=VerificationResult,
instructions=(
"Decide whether the observation answers the sub-question. "
"Treat errors, empty results and irrelevant text as failures."
),
)
executor = Agent(
name="executor",
model=OpenAIChat(id="gpt-5-mini"),
tools=[flaky_web_search],
instructions="Answer one sub-question using tools.",
)
def execute_step(si):
state = si.workflow_session_state
question = state["plan"][0]
observation = executor.run(question).content
state["last_question"] = question
state["last_observation"] = observation
return observation
def verify_step(si):
    state = si.workflow_session_state
    verdict = verifier.run(
        f"Sub-question: {state['last_question']}\n"
        f"Observation:\n{state['last_observation']}"
    ).content
    if verdict.is_successful:
        # Step verified: consume it and record the finding.
        state["plan"].pop(0)
        state["intermediate"].append(
            f"Q: {state['last_question']}\nA: {state['last_observation']}"
        )
        state["retries"] = 0
    else:
        state["retries"] = state.get("retries", 0) + 1
    # Stored on both branches: route_after_verify reads it unconditionally.
    state["last_verdict"] = verdict
    return verdict
The Router decides the next step from the verification result:
from agno.workflow.v2 import Router
replan_step = Step(name="replan", executor=plan_step)
noop_step = Step(name="noop", executor=lambda si: "continue")
def route_after_verify(si):
state = si.workflow_session_state
if not state["plan"]:
return [noop_step]
failed = not state["last_verdict"].is_successful
too_many_retries = state["retries"] >= 2
if failed and too_many_retries:
state["retries"] = 0
return [replan_step]
return [noop_step]
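The value of PEV
PEV keeps errors from propagating silently. In plain Planning, one tool failure contaminates the final answer; PEV detects the failure locally and steers the flow toward a retry or a replan.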
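The continue / retry / replan decision can be traced end to end in plain Python. This is a sketch under stated assumptions, not the agno wiring: `verify` is a rule-based stand-in for the verifier agent, `replan` is a hypothetical replanner that merely rephrases the failing step, and `max_replans` is an extra budget added here so the loop cannot replan forever.

```python
def flaky_search(query: str) -> str:
    """Mock tool that fails for one kind of query."""
    if "employee count" in query.lower():
        return "Error: API endpoint is unavailable."
    return f"Mock search result for: {query}"


def verify(observation: str) -> bool:
    """Rule-based stand-in for the verifier agent."""
    return bool(observation) and not observation.startswith("Error")


def replan(failed_step: str) -> list[str]:
    """Hypothetical replanner: rephrase the step to avoid the broken path."""
    return [failed_step.replace("employee count", "headcount")]


def run_pev(plan: list[str], max_retries: int = 2, max_replans: int = 1):
    plan = list(plan)
    notes, log = [], []
    retries = replans = 0
    while plan:
        step = plan[0]
        observation = flaky_search(step)
        if verify(observation):
            notes.append(f"Q: {step}\nA: {observation}")
            plan.pop(0)
            retries = 0
            log.append("pass")
            continue
        retries += 1
        if retries < max_retries:
            log.append("retry")
        elif replans < max_replans:
            plan[:1] = replan(step)  # swap the failing step for a new one
            replans += 1
            retries = 0
            log.append("replan")
        else:
            log.append("abort")      # budgets exhausted: stop, do not loop
            break
    return notes, log
```

With the plan `["find revenue", "find employee count"]`, the second step fails twice, is replanned once, and then passes, so the failure never reaches the synthesized notes.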
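Failure modes
| Problem | Explanation |
|---|---|
| Higher cost | One extra verification call per step |
| Verifier misjudgment | When the verifier is itself a model, it can be wrong too |
| Verification too heavy | It slows the system down on simple tasks |
| Verification harder than execution | For some tasks, judging a result is harder than producing it |
6. Multi-Agent: splitting cognitive roles into graph nodes
When a single Agent carries too many roles, its prompt grows bloated and the roles start to conflict. The point of Multi-Agent is not that "more models make it livelier", but that role boundaries are written into the architecture.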
flowchart LR
U[User request] --> N[News analyst]
U --> T[Technical analyst]
U --> F[Financial analyst]
N --> W[Report writer]
T --> W
F --> W
W --> O[Final report]
State design
session_state = {
"news": None,
"technical": None,
"financial": None,
"final_report": None,
}
Each role writes to its own field, which is easier to debug and evaluate than stuffing everything into one context.
agno Workflow version
news_analyst = Agent(
name="news_analyst",
model=OpenAIChat(id="gpt-5-mini"),
tools=[DuckDuckGoTools()],
instructions="Produce a concise section on recent financial news.",
)
technical_analyst = Agent(
name="technical_analyst",
model=OpenAIChat(id="gpt-5-mini"),
tools=[YFinanceTools()],
instructions="Analyze price action and indicators.",
)
financial_analyst = Agent(
name="financial_analyst",
model=OpenAIChat(id="gpt-5-mini"),
tools=[YFinanceTools(income_statements=True, key_financial_ratios=True)],
instructions="Analyze fundamentals.",
)
report_writer = Agent(
name="report_writer",
model=OpenAIChat(id="gpt-5-mini"),
instructions="Compose a final investment memo.",
)
def news_step(si):
out = news_analyst.run(si.message).content
si.workflow_session_state["news"] = out
return out
def technical_step(si):
out = technical_analyst.run(si.message).content
si.workflow_session_state["technical"] = out
return out
def financial_step(si):
out = financial_analyst.run(si.message).content
si.workflow_session_state["financial"] = out
return out
def write_step(si):
s = si.workflow_session_state
return report_writer.run(
f"News:\n{s['news']}\n\n"
f"Technical:\n{s['technical']}\n\n"
f"Financial:\n{s['financial']}"
).content
multi_agent_wf = Workflow(
name="multi_agent",
session_state={},
steps=[
Step(name="news", executor=news_step),
Step(name="technical", executor=technical_step),
Step(name="financial", executor=financial_step),
Step(name="write", executor=write_step),
],
)
agno can also express fixed collaboration with Team(mode="coordinate"):
from agno.team import Team
analysts_team = Team(
name="analysts",
mode="coordinate",
model=OpenAIChat(id="gpt-5-mini"),
members=[news_analyst, technical_analyst, financial_analyst],
instructions=[
"Dispatch sub-tasks to the right analyst.",
"Collect their outputs and synthesize a final memo.",
],
)
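Failure modes
The drawback of a fixed Multi-Agent pipeline is its hard-coded order. If the news turns out to be insufficient midway, the flow does not return to the news analyst on its own; if technical analysis is unnecessary, it is not skipped either.
Role specialization settles "who is responsible for what", but not "when should each one step in". Blackboard handles that question.
7. Blackboard: dynamic scheduling driven by shared state
Blackboard moves the center of the system from fixed steps to a shared workspace. Every specialist reads from and writes to the same board, and a controller decides whom to activate next from the board's current content.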
flowchart TD
BB[(Blackboard<br/>shared workspace)]
C[Controller<br/>scheduler] --> BB
BB --> C
C --> N[News Agent]
C --> T[Technical Agent]
C --> F[Financial Agent]
C --> W[Writer Agent]
N --> BB
T --> BB
F --> BB
W --> BB
C -->|FINISH| O[Output]
State design
from typing import Optional
from pydantic import BaseModel, Field
class BlackboardState(BaseModel):
user_request: str
blackboard: dict
next_agent: Optional[str] = None
is_complete: bool = False
Controller output schema
class ControllerDecision(BaseModel):
next_agent: str = Field(
description="One of: news, technical, financial, writer, FINISH"
)
reasoning: str
Core code
import json
controller = Agent(
name="controller",
model=OpenAIChat(id="gpt-5-mini"),
response_model=ControllerDecision,
instructions=(
"Inspect the blackboard and decide which specialist should run next. "
"Return FINISH if the report is ready."
),
)
SPECIALISTS = {
"news": news_analyst,
"technical": technical_analyst,
"financial": financial_analyst,
"writer": report_writer,
}
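def controller_step(si):
    state = si.workflow_session_state
    # Seed the request from the workflow input on the first turn;
    # the initial session_state leaves it empty.
    if not state["user_request"]:
        state["user_request"] = si.message
    snapshot = json.dumps(state["blackboard"], ensure_ascii=False, indent=2)
    decision = controller.run(
        f"Request: {state['user_request']}\n\nBlackboard:\n{snapshot}"
    ).content
    state["next_agent"] = decision.next_agent
    return decision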
def specialist_step(si):
state = si.workflow_session_state
if state["next_agent"] == "FINISH":
return "finished"
agent = SPECIALISTS[state["next_agent"]]
piece = agent.run(
f"Request: {state['user_request']}\n"
f"Blackboard:\n{json.dumps(state['blackboard'], ensure_ascii=False)}"
).content
state["blackboard"][state["next_agent"]] = piece
return piece
def blackboard_done(outputs):
return blackboard_wf.session_state["next_agent"] == "FINISH"
blackboard_wf = Workflow(
name="blackboard",
session_state={
"user_request": "",
"blackboard": {},
"next_agent": None,
},
steps=[
Loop(
name="blackboard_loop",
steps=[
Step(name="controller", executor=controller_step),
Step(name="specialist", executor=specialist_step),
],
end_condition=blackboard_done,
)
],
)
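Failure modes
Blackboard trades flexibility for scheduling complexity:
- the controller's decisions are unstable;
- board content conflicts or gets polluted;
- several specialists duplicate work;
- an unclear termination condition makes the loop run too long.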
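Two of these risks, an unbounded loop and a controller that names an agent that does not exist, can be contained by the loop itself. The sketch below is framework-free and rests on assumptions: `rule_controller` is a deterministic stand-in for the LLM controller, specialists are plain callables, and `max_turns` is a hypothetical budget.

```python
def rule_controller(blackboard: dict) -> str:
    """Deterministic stand-in for the LLM controller."""
    for name in ("news", "technical", "financial"):
        if name not in blackboard:
            return name
    return "writer" if "writer" not in blackboard else "FINISH"


def run_blackboard(controller, specialists, request, max_turns=8):
    """Drive the controller/specialist loop under a hard turn budget."""
    blackboard = {}
    for _ in range(max_turns):
        choice = controller(blackboard)
        if choice == "FINISH":
            return blackboard, "finished"
        if choice not in specialists:
            # A hallucinated agent name ends the run instead of raising KeyError.
            return blackboard, f"aborted: unknown agent {choice!r}"
        # Specialists get a copy, so only the loop writes to the board.
        blackboard[choice] = specialists[choice](request, dict(blackboard))
    return blackboard, "aborted: turn budget exhausted"
```

Returning a status string next to the board makes it easy to tell a finished report from a run that was cut off.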
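If all you need is triage at the entry point, with no ongoing dynamic scheduling, Meta-Controller is simpler.
8. Meta-Controller: one-shot routing
Meta-Controller works like a triage desk. When a request arrives, it picks the single most suitable sub-Agent and hands the task over.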
flowchart LR
U[User request] --> M[Meta-Controller]
M -->|general Q&A| G[Generalist]
M -->|research task| R[Researcher]
M -->|coding task| C[Coder]
G --> O[Output]
R --> O
C --> O
agno version
generalist = Agent(
name="generalist",
model=OpenAIChat(id="gpt-5-mini"),
role="Handles general Q&A.",
instructions="Answer general knowledge questions directly.",
)
researcher = Agent(
name="researcher",
model=OpenAIChat(id="gpt-5-mini"),
role="Handles research-heavy queries.",
tools=[DuckDuckGoTools()],
instructions="Use search for research tasks.",
)
coder = Agent(
name="coder",
model=OpenAIChat(id="gpt-5-mini"),
role="Handles Python coding tasks.",
instructions="Write, explain and debug Python code.",
)
meta = Team(
name="meta_controller",
mode="route",
model=OpenAIChat(id="gpt-5-mini"),
members=[generalist, researcher, coder],
instructions=(
"Choose exactly one member based on the request type. "
"Do not solve the task yourself."
),
)
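How it differs from Blackboard
| Architecture | Routing decisions | Control center | Suitable scenarios |
|---|---|---|---|
| Meta-Controller | One | Entry classifier | Request type is clear |
| Blackboard | Many | Shared state + scheduler | Intermediate state changes the scheduling strategy |
The biggest risk with Meta-Controller is a wrong first hop. Because it chooses only once, a wrong choice sends the whole request in the wrong direction.
9. Ensemble: using redundancy to reduce single-shot bias
Multi-Agent addresses division of labor; Ensemble addresses reliability. Several Agents face the same question, each reaches its own conclusion independently, and an aggregator combines them.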
flowchart LR
U[Same question] --> A1[Agent A]
U --> A2[Agent B]
U --> A3[Agent C]
A1 --> AG[Aggregator]
A2 --> AG
A3 --> AG
AG --> O[Combined conclusion]
agno version
from agno.workflow.v2 import Parallel
class FinalRecommendation(BaseModel):
final_recommendation: str
confidence_score: float
synthesis_summary: str
identified_opportunities: List[str]
identified_risks: List[str]
bullish = Agent(
name="bullish",
model=OpenAIChat(id="gpt-5-mini"),
instructions="Argue from a growth-oriented bullish perspective.",
)
value = Agent(
name="value",
model=OpenAIChat(id="gpt-5-mini"),
instructions="Analyze margin of safety and valuation risk.",
)
quant = Agent(
name="quant",
model=OpenAIChat(id="gpt-5-mini"),
instructions="Focus on quantitative ratios and trends.",
)
aggregator = Agent(
name="aggregator",
model=OpenAIChat(id="gpt-5-mini"),
response_model=FinalRecommendation,
instructions=(
"Synthesize different views. Preserve important disagreements "
"instead of averaging them away."
),
)
def run_view(agent):
def step(si):
out = agent.run(si.message).content
si.workflow_session_state.setdefault("views", {})[agent.name] = out
return out
return step
def aggregate_step(si):
views = si.workflow_session_state["views"]
body = "\n\n".join(f"[{name}]\n{view}" for name, view in views.items())
return aggregator.run(f"Question: {si.message}\n\nViews:\n{body}").content
ensemble_wf = Workflow(
name="ensemble",
session_state={"views": {}},
steps=[
Parallel(
name="parallel_views",
steps=[
Step(name="bullish", executor=run_view(bullish)),
Step(name="value", executor=run_view(value)),
Step(name="quant", executor=run_view(quant)),
],
),
Step(name="aggregate", executor=aggregate_step),
],
)
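Where it applies
Ensemble suits high-stakes judgments, fact checking, and complex research conclusions. Its cost is direct: spend grows with the number of Agents, and the Agents may still share the same biases. The Aggregator should not force agreement; it should preserve the key disagreements.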
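"Preserve disagreement" can be made concrete even without an LLM aggregator. The following minimal sketch assumes each Agent's view has already been reduced to a short label such as "buy" or "hold"; the `aggregate` function and its output keys are hypothetical.

```python
from collections import Counter


def aggregate(views: dict[str, str]) -> dict:
    """Majority vote that reports, rather than hides, the minority views."""
    counts = Counter(views.values())
    # On a tie, most_common keeps the label that was seen first.
    top, top_votes = counts.most_common(1)[0]
    dissent = {name: view for name, view in views.items() if view != top}
    return {
        "recommendation": top,
        "agreement": top_votes / len(views),  # share of agents backing the winner
        "dissent": dissent,                   # who disagreed, and with what
    }
```

A low `agreement` value or a non-empty `dissent` map is a natural trigger for human review instead of automatic action.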
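10. Episodic + Semantic Memory: long-term state enters the system
An Agent without memory can rely only on the current context. A long-running assistant needs to remember user preferences, past events, and stable facts.
| Memory type | What it remembers | Common storage |
|---|---|---|
| Episodic Memory | Past events, user preferences, interaction history | Vector store, session summaries |
| Semantic Memory | Stable facts, knowledge entries, entity attributes | Document store, KV store, structured knowledge base |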
flowchart LR
U[User input] --> A[Agent]
A --> M[(Episodic Memory)]
A --> K[(Semantic Knowledge)]
M --> A
K --> A
A --> O[Answer]
A -->|extract preferences/events| M
agno version
from agno.memory.v2.memory import Memory
from agno.memory.v2.db.sqlite import SqliteMemoryDb
from agno.knowledge.text import TextKnowledgeBase
from agno.vectordb.lancedb import LanceDb, SearchType
from agno.embedder.openai import OpenAIEmbedder
from agno.storage.sqlite import SqliteStorage
memory = Memory(
db=SqliteMemoryDb(
table_name="user_memories",
db_file="tmp/memory.db",
),
model=OpenAIChat(id="gpt-5-mini"),
)
knowledge = TextKnowledgeBase(
path="data/facts",
vector_db=LanceDb(
table_name="facts",
uri="tmp/lancedb",
search_type=SearchType.hybrid,
embedder=OpenAIEmbedder(id="text-embedding-3-small"),
),
)
mem_agent = Agent(
name="memorized_agent",
model=OpenAIChat(id="gpt-5-mini"),
memory=memory,
enable_agentic_memory=True,
enable_user_memories=True,
knowledge=knowledge,
search_knowledge=True,
add_history_to_messages=True,
num_history_responses=5,
storage=SqliteStorage(
table_name="sessions",
db_file="tmp/sessions.db",
),
markdown=True,
)
mem_agent.print_response(
"I'm allergic to peanuts and prefer low-carb meals. Remember that.",
user_id="alice",
)
mem_agent.print_response(
"Suggest a dinner plan for me.",
user_id="alice",
)
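Failure modes
Memory gives the system continuity, and it also makes errors persistent:
- a wrong preference gets written;
- similar but irrelevant history gets recalled;
- old facts have no expiry mechanism;
- poor automatic extraction pollutes the memory store.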
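The missing-expiry problem is the easiest of these to address in code. The sketch below is an in-memory illustration and not the agno Memory API: `MemoryItem`, `ExpiringMemory`, and the keyword-based `recall` are hypothetical, and the current time is passed in explicitly so the behavior is easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryItem:
    text: str
    created_at: float            # seconds, supplied by the caller
    ttl: Optional[float] = None  # None means the item never expires


class ExpiringMemory:
    """Keyword-recall store that drops items once their TTL has passed."""

    def __init__(self) -> None:
        self.items: list[MemoryItem] = []

    def add(self, text: str, now: float, ttl: Optional[float] = None) -> None:
        self.items.append(MemoryItem(text, now, ttl))

    def recall(self, keyword: str, now: float) -> list[str]:
        # Purge expired items first so stale facts can never be recalled.
        self.items = [
            m for m in self.items
            if m.ttl is None or now - m.created_at < m.ttl
        ]
        return [m.text for m in self.items if keyword.lower() in m.text.lower()]
```

A stable fact such as an allergy would be stored without a TTL, while a short-lived preference gets one, so the two age differently.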
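Vector memory is good at "finding similar content", but not at "multi-hop reasoning along relationships". That kind of question needs Graph Memory.
11. Graph / World-Model Memory: from similarity recall to relational reasoning
Vector retrieval answers "which passage looks most like the current question". Graph memory answers "how are these entities related".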
flowchart LR
T[Unstructured text] --> E[Entity and relation extraction]
E --> G[(Knowledge Graph)]
Q[Natural-language question] --> C[Text-to-Cypher]
C --> G
G --> R[Query result]
R --> A[Natural-language answer]
Data structures
class Node(BaseModel):
id: str = Field(description="Entity name or identifier.")
type: str = Field(description="Entity type, e.g. Person or Company.")
class Relationship(BaseModel):
source: Node
target: Node
type: str = Field(description="Relationship verb in ALL_CAPS.")
properties: dict = Field(default_factory=dict)
class KnowledgeGraph(BaseModel):
relationships: List[Relationship]
Extraction and querying
graph_maker = Agent(
name="graph_maker",
model=OpenAIChat(id="gpt-5-mini"),
response_model=KnowledgeGraph,
instructions=(
"Extract entities and relationships from text. "
"Relationship type should be an ALL_CAPS verb."
),
)
kg = graph_maker.run(
"Tim Cook is the CEO of Apple. Apple acquired Beats in 2014."
).content
Queries can be backed by Neo4j:
from agno.tools.neo4j import Neo4jTools
graph_query_agent = Agent(
name="graph_query",
model=OpenAIChat(id="gpt-5-mini"),
tools=[
Neo4jTools(
url="bolt://localhost:7687",
user="neo4j",
password="password",
)
],
instructions=(
"Answer questions over a Neo4j knowledge graph. "
"Generate Cypher, run it, then synthesize the answer."
),
)
graph_query_agent.print_response(
"Which companies did Apple acquire and in which year?"
)
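Failure modes
The hard parts of Graph Memory are not only in the LLM:
| Stage | Risk |
|---|---|
| Entity extraction | Entities with the same name get confused |
| Relation extraction | Wrong edge type |
| Schema design | The graph structure cannot support the target queries |
| Text-to-Cypher | Incorrect query statement |
| Answer synthesis | Query result is correct but explained wrongly |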
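What "multi-hop along relationships" means can be shown without a graph database. The sketch below stores the two facts from the extraction example as plain triples and follows a chain of relation types; `build_index` and `follow` are hypothetical helpers, and a real deployment would run the equivalent query in Cypher.

```python
from collections import defaultdict

# Triples in (source, RELATION, target) form, taken from the example sentence.
TRIPLES = [
    ("Tim Cook", "CEO_OF", "Apple"),
    ("Apple", "ACQUIRED", "Beats"),
]


def build_index(triples):
    """Map each source entity to its outgoing (relation, target) edges."""
    index = defaultdict(list)
    for source, relation, target in triples:
        index[source].append((relation, target))
    return index


def follow(index, start, relations):
    """Follow a chain of relation types from `start`, one hop per relation."""
    frontier = {start}
    for wanted in relations:
        frontier = {
            target
            for node in frontier
            for relation, target in index.get(node, [])
            if relation == wanted
        }
    return frontier
```

The question "which companies were acquired by the company Tim Cook leads?" becomes `follow(index, "Tim Cook", ["CEO_OF", "ACQUIRED"])`, a two-hop traversal that similarity search alone has no direct way to express.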
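12. Tree-of-Thoughts: turning reasoning into search
Tree-of-Thoughts (ToT) suits problems whose paths fork and require backtracking. It does not make the model "think longer"; it makes the system keep several candidate paths alive at once.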
flowchart TD
R[Initial state] --> A[Candidate thought A]
R --> B[Candidate thought B]
R --> C[Candidate thought C]
A --> A1[Expand A1]
A --> A2[Expand A2]
B --> B1[Expand B1]
B --> B2[Expand B2]
C --> C1[Expand C1]
A2 --> S[Solution found]
State design
class ToTState(BaseModel):
problem: str
active_paths: list
solution: list | None = None
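The state here is no longer a single current answer, but several candidate paths.
Code controls the search; the LLM only generates candidates
Take a puzzle such as "wolf, goat, and cabbage crossing the river": search control should live in code, and the LLM should only propose candidate moves. This keeps the model from losing track of state deep in the search.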
class PuzzleState(BaseModel):
left_bank: frozenset = Field(
default_factory=lambda: frozenset({"wolf", "goat", "cabbage"})
)
right_bank: frozenset = Field(default_factory=frozenset)
boat_location: str = "left"
move_description: str = "Initial state."
class Config:
arbitrary_types_allowed = True
def is_valid(self) -> bool:
dangerous_pairs = [("wolf", "goat"), ("goat", "cabbage")]
unguarded_bank = (
self.left_bank if self.boat_location == "right"
else self.right_bank
)
return not any(
{a, b}.issubset(unguarded_bank)
for a, b in dangerous_pairs
)
def is_goal(self) -> bool:
return self.right_bank == frozenset({"wolf", "goat", "cabbage"})
Search skeleton:
def expand(state: PuzzleState) -> list[PuzzleState]:
"""
Generate valid next states.
LLM can propose moves, but code must validate them.
"""
...
def solve_with_tot(initial: PuzzleState, max_depth: int = 10):
active_paths = [[initial]]
for _ in range(max_depth):
new_paths = []
for path in active_paths:
current = path[-1]
for next_state in expand(current):
if next_state in path:
continue
new_path = path + [next_state]
if next_state.is_goal():
return new_path
new_paths.append(new_path)
active_paths = new_paths
return None
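Failure modes
The main cost of ToT is combinatorial explosion. It suits tasks that genuinely require search and backtracking, not ordinary question answering. In many scenarios ReAct or Planning is cheaper and more stable.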
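For the river-crossing puzzle, the search controller can be written out in full. The sketch below replaces the LLM proposal step with exhaustive enumeration of moves, which is an assumption made to keep it self-contained; a state is a `(left_bank, boat_side)` tuple rather than a Pydantic model, so equality-based cycle checks depend only on the position.

```python
ITEMS = frozenset({"wolf", "goat", "cabbage"})
UNSAFE = [{"wolf", "goat"}, {"goat", "cabbage"}]


def is_safe(left: frozenset, boat: str) -> bool:
    """The bank without the boat must not hold a dangerous pair."""
    unguarded = left if boat == "right" else ITEMS - left
    return not any(pair <= unguarded for pair in UNSAFE)


def expand(state):
    """Enumerate every safe successor: cross alone or with one item."""
    left, boat = state
    here = left if boat == "left" else ITEMS - left
    other = "right" if boat == "left" else "left"
    successors = []
    for cargo in [None, *sorted(here)]:
        new_left = set(left)
        if cargo is not None:
            if boat == "left":
                new_left.discard(cargo)  # item leaves the left bank
            else:
                new_left.add(cargo)      # item returns to the left bank
        candidate = (frozenset(new_left), other)
        if is_safe(*candidate):
            successors.append(candidate)
    return successors


def solve(max_depth: int = 10):
    """Level-by-level search that keeps all live paths, as in ToT."""
    start, goal = (ITEMS, "left"), (frozenset(), "right")
    paths = [[start]]
    for _ in range(max_depth):
        new_paths = []
        for path in paths:
            for nxt in expand(path[-1]):
                if nxt in path:  # skip states already visited on this path
                    continue
                if nxt == goal:
                    return path + [nxt]
                new_paths.append(path + [nxt])
        paths = new_paths
    return None
```

Because the search proceeds one level at a time, the first solution found is a shortest one: seven crossings, starting with the goat.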
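13. Mental Loop / Simulator: simulate before acting
Some tasks cannot be tried freely in the real world, such as trading, robot control, or production configuration changes. Mental Loop evaluates an action in a simulated environment before it is executed for real.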
flowchart LR
U[Task] --> A[Agent]
A --> S[simulate_action<br/>simulated run]
S --> J{Result acceptable?}
J -->|no| H[Hold / change strategy]
J -->|yes| E[execute_action<br/>real execution]
E --> O[State update]
Simulator design
import copy
import numpy as np
from pydantic import BaseModel, Field
class Portfolio(BaseModel):
cash: float = 10000.0
shares: int = 0
def value(self, price: float) -> float:
return self.cash + self.shares * price
class MarketSimulator(BaseModel):
day: int = 0
price: float = 100.0
volatility: float = 0.1
drift: float = 0.01
portfolio: Portfolio = Field(default_factory=Portfolio)
def step(self, action: str, amount: float = 0.0):
if action == "buy":
count = int(amount)
cost = count * self.price
if self.portfolio.cash >= cost:
self.portfolio.shares += count
self.portfolio.cash -= cost
elif action == "sell":
count = min(int(amount), self.portfolio.shares)
self.portfolio.shares -= count
self.portfolio.cash += count * self.price
self.price *= 1 + float(np.random.normal(self.drift, self.volatility))
self.day += 1
REAL = MarketSimulator()
Two kinds of tools: simulate and commit
def simulate_action(action: str, amount: float, horizon: int = 5) -> str:
sim = copy.deepcopy(REAL)
sim.step(action, amount)
for _ in range(horizon - 1):
sim.step("hold")
return f"Simulated value after {horizon} days: ${sim.portfolio.value(sim.price):.2f}"
def execute_action(action: str, amount: float) -> str:
REAL.step(action, amount)
return f"Executed {action} {amount}. Portfolio value: ${REAL.portfolio.value(REAL.price):.2f}"
The Agent must simulate first, then decide whether to execute:
trader = Agent(
model=OpenAIChat(id="gpt-5-mini"),
tools=[simulate_action, execute_action],
instructions=[
"Before calling execute_action, call simulate_action.",
"If simulation is worse than holding, do not execute.",
],
show_tool_calls=True,
)
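Failure modes
The ceiling of Mental Loop is usually not the LLM but the fidelity of the simulator. The wider the gap between simulator and reality, the more likely the system is to make decisions that are "right in simulation, dangerous in reality".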
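Even within the simulator's own assumptions, a single random rollout is a noisy basis for a decision. One possible refinement, sketched here with the standard library only, is to average many seeded rollouts and report the outcome relative to holding cash. `simulate_buy` and its return keys are hypothetical; the drift and volatility defaults mirror the values used in the simulator above.

```python
import random
import statistics


def simulate_buy(shares: int, price: float = 100.0, days: int = 5,
                 n_paths: int = 500, drift: float = 0.01,
                 vol: float = 0.1, seed: int = 0) -> dict:
    """Estimate the P&L of buying `shares` now versus holding cash."""
    rng = random.Random(seed)  # seeded, so the estimate is reproducible
    deltas = []
    for _ in range(n_paths):
        end = price
        for _ in range(days):
            end *= 1 + rng.gauss(drift, vol)  # one daily return
        # Holding cash yields zero P&L, so this is the gain over holding.
        deltas.append(shares * (end - price))
    return {
        "mean_delta": statistics.fmean(deltas),
        "loss_rate": sum(d < 0 for d in deltas) / n_paths,
    }
```

Exposing `loss_rate` alongside the mean lets the Agent's instructions refer to downside risk, not only to the expected value.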
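14. Dry-Run Harness: putting side effects behind a gate
Tool Use lets an Agent act; Dry-Run keeps it from doing dangerous things casually. Any action with side effects, such as sending email, posting, placing orders, deleting data, or changing configuration, should be previewed first, then approved, then executed.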
flowchart LR
U[User intent] --> P[Draft the action]
P --> D[Dry Run<br/>preview only, no execution]
D --> A{Approved?}
A -->|no| X[Abort]
A -->|yes| E[Real execution]
E --> O[Execution result]
Tools with a built-in dry_run parameter
import datetime
import hashlib
from typing import List
def publish_post(content: str, hashtags: List[str], dry_run: bool = True) -> str:
ts = datetime.datetime.now().isoformat()
full_text = f"{content}\n\n" + " ".join(f"#{tag}" for tag in hashtags)
if dry_run:
return f"[DRY RUN @ {ts}] Would publish:\n---\n{full_text}\n---"
post_id = hashlib.md5(full_text.encode()).hexdigest()[:8]
return f"[LIVE @ {ts}] Published id={post_id}"
Inserting an approval node into the Workflow
proposer = Agent(
name="proposer",
model=OpenAIChat(id="gpt-5-mini"),
tools=[publish_post],
instructions=[
"When asked to publish, call publish_post with dry_run=True.",
"After preview, stop and wait for approval.",
"Do not call dry_run=False by yourself.",
],
show_tool_calls=True,
)
def propose_step(si):
return proposer.run(si.message).content
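def approve_step(si):
    preview = si.previous_step_output.content
    print(f"\n--- PREVIEW ---\n{preview}\n---")
    decision = input("Approve and go live? (y/n): ").strip().lower()
    state = si.workflow_session_state
    state["approved"] = decision == "y"
    # Keep the approved preview so the commit step publishes exactly this text.
    state["preview"] = preview
    return decision
def commit_step(si):
    state = si.workflow_session_state
    if not state.get("approved"):
        return "Rejected. Nothing was published."
    # Pass the preview back explicitly: this is a separate run, so the
    # proposer may no longer have the draft in its context.
    return proposer.run(
        "The post below was approved. Publish it unchanged by calling "
        "publish_post with dry_run=False.\n\n"
        f"{state['preview']}"
    ).content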
dry_run_wf = Workflow(
name="dry_run",
session_state={"approved": False},
steps=[
Step(name="propose", executor=propose_step),
Step(name="approve", executor=approve_step),
Step(name="commit", executor=commit_step),
],
)
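Failure modes
| Risk | Explanation |
|---|---|
| Approval bottleneck | Human confirmation reduces the degree of automation |
| Preview and execution diverge | The dry-run result differs from the real execution environment |
| Preview leakage | The preview content itself may contain sensitive information |
The goal of Dry-Run is not to make the Agent smarter, but to turn side effects into a reviewable control-flow node.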
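One way to narrow the gap between what was previewed and what is executed is to bind the approval to the exact payload. The sketch below is an assumption-laden illustration in plain Python, not part of agno: `PublishGate` and `preview_token` are hypothetical, and "publishing" just appends to a list.

```python
import hashlib


def preview_token(payload: str) -> str:
    """Short fingerprint of the exact text shown to the approver."""
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


class PublishGate:
    """Only payloads whose fingerprint was approved can go live."""

    def __init__(self) -> None:
        self.approved: set[str] = set()
        self.published: list[str] = []

    def dry_run(self, payload: str) -> dict:
        # Preview only: nothing is recorded as approved or published.
        return {"preview": payload, "token": preview_token(payload)}

    def approve(self, token: str) -> None:
        self.approved.add(token)

    def commit(self, payload: str) -> str:
        token = preview_token(payload)
        if token not in self.approved:
            return "Rejected: this exact payload was never approved."
        self.approved.discard(token)  # one approval covers one execution
        self.published.append(payload)
        return f"Published id={token}"
```

With this gate, a model that edits the text after approval, or tries to publish twice, is stopped by code instead of by an instruction in its prompt.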
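15. Metacognitive Agent: the system knows whether it should act
A Metacognitive Agent does not rush to answer. It first judges whether the question is within its competence, whether the risk is acceptable, whether a tool is needed, and whether the case should be handed to a human.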
flowchart TD
U[User question] --> S[Self Model<br/>capability and risk assessment]
S --> R{Strategy}
R -->|answer directly| A[Answer]
R -->|use tool| T[Tool Answer]
R -->|escalate| H[Human Escalation]
State design
from typing import Optional
class MetacognitiveAnalysis(BaseModel):
confidence: float = Field(description="0.0 to 1.0")
strategy: str = Field(description="reason_directly | use_tool | escalate")
reasoning: str
tool_to_use: Optional[str] = None
The system also needs a self-model:
AGENT_SELF_MODEL = {
"knowledge_domains": ["general health", "nutrition", "exercise"],
"tools_available": ["symptom_checker"],
"confidence_threshold": 0.7,
"high_risk_topics": [
"prescription dosage",
"emergency medical advice",
],
}
Routing implementation
self_model_agent = Agent(
name="self_model",
model=OpenAIChat(id="gpt-5-mini"),
response_model=MetacognitiveAnalysis,
instructions=(
f"Your self-model is: {AGENT_SELF_MODEL}. "
"Estimate confidence and choose one strategy: "
"reason_directly, use_tool or escalate."
),
)
def symptom_checker(symptoms: str) -> str:
return f"Reference information for: {symptoms}"
responder = Agent(
model=OpenAIChat(id="gpt-5-mini"),
tools=[symptom_checker],
)
def analyze_self_step(si):
analysis = self_model_agent.run(si.message).content
si.workflow_session_state["analysis"] = analysis
return analysis
def route_strategy(si):
analysis = si.workflow_session_state["analysis"]
if analysis.strategy == "reason_directly":
return [
Step(
name="answer",
executor=lambda s: responder.run(s.message).content,
)
]
if analysis.strategy == "use_tool":
return [
Step(
name="tool_answer",
executor=lambda s: responder.run(
f"Use {analysis.tool_to_use} to answer: {s.message}"
).content,
)
]
return [
Step(
name="escalate",
executor=lambda s: (
"I am not confident or not allowed to answer this. "
"Escalating to a human expert."
),
)
]
metacog_wf = Workflow(
name="metacognitive",
session_state={},
steps=[
Step(name="self_model", executor=analyze_self_step),
Router(name="route_strategy", selector=route_strategy),
],
)
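Why boundary awareness matters
In medicine, law, finance, and similar domains, an Agent's key capability is not necessarily "answering more", but recognizing high-risk questions and refusing to handle them automatically. The failure mode is equally clear: miscalibrated confidence. Underestimating makes the system overly conservative; overestimating makes it dangerously confident.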
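Because the model's self-assessment can be miscalibrated, it is reasonable to re-check its chosen strategy in code before routing. The sketch below is one possible guard under stated assumptions: `enforce_policy` is a hypothetical function, and high-risk topics are detected by naive substring matching, which a real system would replace with a proper classifier.

```python
SELF_MODEL = {
    "tools_available": ["symptom_checker"],
    "confidence_threshold": 0.7,
    "high_risk_topics": ["prescription dosage", "emergency medical advice"],
}


def enforce_policy(question, strategy, confidence, tool=None,
                   self_model=SELF_MODEL):
    """Return the strategy to run, downgrading to 'escalate' when unsafe."""
    text = question.lower()
    # Hard rule: high-risk topics are never answered automatically.
    if any(topic in text for topic in self_model["high_risk_topics"]):
        return "escalate"
    # The threshold is applied in code, not left to the model's judgment.
    if confidence < self_model["confidence_threshold"]:
        return "escalate"
    if strategy == "use_tool" and tool not in self_model["tools_available"]:
        return "escalate"
    if strategy not in {"reason_directly", "use_tool"}:
        return "escalate"
    return strategy
```

The design choice is that every ambiguous case falls through to escalation, so a malformed or overconfident analysis cannot widen what the system does on its own.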
16. Self-Improvement Loop: turning quality optimization into a loop
Reflection critiques and revises only once. Self-Improvement turns quality optimization into a loop:
flowchart LR
G[Generator<br/>generate] --> C[Critic<br/>evaluate]
C --> J{Approved?}
J -->|no| G
J -->|yes| M[(Gold Memory<br/>high-quality example store)]
M --> G
It has two layers of state:
- within a single task: last_email, last_critique, and revision;
- across tasks: a store of high-quality examples.
Core code
class EmailCritique(BaseModel):
is_approved: bool
feedback: str
class MarketingEmail(BaseModel):
subject: str
body: str
generator = Agent(
name="email_generator",
model=OpenAIChat(id="gpt-5-mini"),
response_model=MarketingEmail,
instructions="Write a marketing email. Follow feedback if provided.",
)
critic = Agent(
name="email_critic",
model=OpenAIChat(id="gpt-5-mini"),
response_model=EmailCritique,
instructions=(
"Approve only if the subject is compelling, "
"the body has a clear CTA, and the tone is on-brand."
),
)
High-quality example store:
class GoldStandardMemory:
def __init__(self):
self.examples: list[MarketingEmail] = []
def few_shot_block(self) -> str:
if not self.examples:
return "No gold examples yet."
return "\n\n---\n\n".join(
f"Subject: {e.subject}\nBody:\n{e.body}"
for e in self.examples[-3:]
)
def add(self, email: MarketingEmail):
self.examples.append(email)
GOLD = GoldStandardMemory()
Loop steps:
def generate_email_step(si):
    state = si.workflow_session_state
    previous = state.get("last_email")
    critique = state.get("last_critique")
    # Every prompt carries the most recent gold examples as few-shot context.
    prompt = f"Task: {si.message}\n\nGold examples:\n{GOLD.few_shot_block()}\n\n"
    if previous and critique:
        # On later rounds, feed the previous draft and the critic's feedback back in.
        prompt += (
            f"Previous draft:\nSubject: {previous.subject}\nBody:\n{previous.body}\n\n"
            f"Feedback: {critique.feedback}\n"
            "Rewrite to address the feedback."
        )
    email = generator.run(prompt).content
    state["last_email"] = email
    state["revision"] = state.get("revision", 0) + 1
    return email


def critique_email_step(si):
    email = si.workflow_session_state["last_email"]
    verdict = critic.run(
        f"Review:\nSubject: {email.subject}\nBody:\n{email.body}"
    ).content
    si.workflow_session_state["last_critique"] = verdict
    if verdict.is_approved:
        # Only approved drafts enter the cross-task example store.
        GOLD.add(email)
    return verdict


def should_stop(outputs):
    # self_improve_wf is defined below; it is resolved when the loop runs.
    state = self_improve_wf.session_state
    verdict = state.get("last_critique")
    if verdict and verdict.is_approved:
        return True
    # Hard cap so the loop terminates even if the critic never approves.
    return state.get("revision", 0) >= 3


self_improve_wf = Workflow(
    name="self_improve",
    session_state={"revision": 0},
    steps=[
        Loop(
            name="refine_loop",
            steps=[
                Step(name="generate", executor=generate_email_step),
                Step(name="critique", executor=critique_email_step),
            ],
            end_condition=should_stop,
        )
    ],
)
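The control flow of this loop can be checked without agno or a model call. The sketch below is a framework-free approximation under stated assumptions: `Draft`, `Critique`, `refine`, and the two stub functions are hypothetical stand-ins for the generator and critic Agents, and only the generate, critique, stop-or-retry structure mirrors the workflow above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Draft:
    subject: str
    body: str


@dataclass
class Critique:
    is_approved: bool
    feedback: str


def refine(
    task: str,
    generate: Callable[[str, Optional[Draft], Optional[Critique]], Draft],
    critique: Callable[[Draft], Critique],
    max_revisions: int = 3,
) -> tuple[Draft, Critique, int]:
    """Generate, critique, and retry until approved or the revision cap is hit."""
    if max_revisions < 1:
        raise ValueError("max_revisions must be at least 1")
    draft: Optional[Draft] = None
    verdict: Optional[Critique] = None
    for revision in range(1, max_revisions + 1):
        draft = generate(task, draft, verdict)
        verdict = critique(draft)
        if verdict.is_approved:
            break
    return draft, verdict, revision


def stub_generate(task, previous, verdict):
    # Stand-in for the LLM generator: adds a CTA only after being asked for one.
    body = f"About {task}."
    if verdict is not None and "CTA" in verdict.feedback:
        body += " Sign up today."
    return Draft(subject=f"News: {task}", body=body)


def stub_critique(draft):
    # Stand-in for the LLM critic: approves only drafts with a call to action.
    if "Sign up" in draft.body:
        return Critique(True, "ok")
    return Critique(False, "Add a clear CTA.")


final_draft, final_verdict, revisions = refine("launch", stub_generate, stub_critique)
```

With these stubs the first draft is rejected and the second is approved. A critic that never approves still stops at `max_revisions`, which is the same guarantee `should_stop` provides.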
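Failure modes
Self-Improvement does not mean automatic improvement. It depends on stable evaluation criteria and a clean store of high-quality examples. If the critic's criteria drift, or low-quality content gets into Gold Memory, later generations are contaminated, because every new prompt draws its few-shot examples from that store.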
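One possible guard against a polluted example store is to require a second, non-LLM check before anything is admitted. The sketch below is an assumption-laden illustration rather than the article's implementation: `GuardedGoldMemory`, the CTA phrase list, the 80-character subject limit, and `max_size` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Email:
    subject: str
    body: str


@dataclass
class GuardedGoldMemory:
    max_size: int = 50
    examples: list[Email] = field(default_factory=list)

    def passes_hard_checks(self, email: Email) -> bool:
        # Hypothetical programmatic rules, independent of the LLM critic.
        has_subject = 0 < len(email.subject.strip()) <= 80
        has_cta = any(
            phrase in email.body.lower()
            for phrase in ("sign up", "buy now", "learn more")
        )
        return has_subject and has_cta

    def add(self, email: Email, critic_approved: bool) -> bool:
        """Admit an email only if the critic and the hard checks both agree."""
        if not critic_approved or not self.passes_hard_checks(email):
            return False
        if email in self.examples:  # skip exact duplicates
            return False
        self.examples.append(email)
        while len(self.examples) > self.max_size:
            self.examples.pop(0)  # evict the oldest example first
        return True


memory = GuardedGoldMemory(max_size=2)
good = Email("Spring sale", "Save 20%. Sign up today.")
first_add = memory.add(good, critic_approved=True)
duplicate_add = memory.add(good, critic_approved=True)
no_cta_add = memory.add(Email("Hi", "No call to action here."), critic_approved=True)
rejected_by_critic = memory.add(Email("Other", "Buy now."), critic_approved=False)
```

The two gates fail independently: a drifting critic is still bounded by the hard rules, and the size cap limits how long a bad example can keep influencing later prompts.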
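17. Cellular Automata: the LLM leaves the main execution loop
Cellular Automata is a paradigm shift. The earlier architectures all have a central Agent or orchestrator, whereas a cellular automaton lets each cell update based only on its neighbors' states, and global behavior emerges from local rules.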
flowchart TD
R[Local update rule] --> C1[Cell 1]
R --> C2[Cell 2]
R --> C3[Cell 3]
R --> C4[Cell 4]
C1 <--> C2
C2 <--> C3
C3 <--> C4
C1 <--> C4
C1 --> G[Emergent global structure]
C2 --> G
C3 --> G
C4 --> G
Path diffusion example
Each cell knows only its own type and its neighbors; it does not know the global map.
class CellAgent(BaseModel):
    type: str  # EMPTY | OBSTACLE | GOAL
    pathfinding_value: float = float("inf")

    def update(self, neighbors: list["CellAgent"]):
        if self.type == "OBSTACLE":
            return
        if self.type == "GOAL":
            self.pathfinding_value = 0
            return
        best_neighbor = min(
            (n.pathfinding_value for n in neighbors),
            default=float("inf"),
        )
        self.pathfinding_value = min(
            self.pathfinding_value,
            best_neighbor + 1,
        )
Updating the grid synchronously:
import copy


def neighbors_of(snapshot, row, col):
    height = len(snapshot)
    width = len(snapshot[0])
    result = []
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        nr, nc = row + dr, col + dc
        if 0 <= nr < height and 0 <= nc < width:
            result.append(snapshot[nr][nc])
    return result


def run_ca(grid, steps=50):
    for _ in range(steps):
        snapshot = [
            [copy.deepcopy(cell) for cell in row]
            for row in grid
        ]
        for r in range(len(grid)):
            for c in range(len(grid[0])):
                grid[r][c].update(neighbors_of(snapshot, r, c))
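To watch the values propagate, the same rule can be run on a tiny grid. The sketch below is a self-contained variant that assumes plain dataclasses instead of Pydantic and snapshots only the numeric values rather than deep-copying cells. `Cell`, `step`, `parse`, and the 3x3 layout are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass
class Cell:
    type: str           # "EMPTY" | "OBSTACLE" | "GOAL"
    value: float = INF  # estimated number of steps to the goal


def step(grid):
    """One synchronous update: every cell reads the previous generation only."""
    height, width = len(grid), len(grid[0])
    prev = [[cell.value for cell in row] for row in grid]  # frozen snapshot
    for r in range(height):
        for c in range(width):
            cell = grid[r][c]
            if cell.type == "OBSTACLE":
                continue  # obstacles never receive a value
            if cell.type == "GOAL":
                cell.value = 0
                continue
            around = [
                prev[r + dr][c + dc]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < height and 0 <= c + dc < width
            ]
            cell.value = min(cell.value, min(around, default=INF) + 1)


def parse(layout):
    kinds = {".": "EMPTY", "#": "OBSTACLE", "G": "GOAL"}
    return [[Cell(kinds[ch]) for ch in line] for line in layout]


grid = parse([
    "..#",
    ".#G",
    "...",
])
for _ in range(10):  # more generations than the longest path in this grid
    step(grid)
values = [[cell.value for cell in row] for row in grid]
# values == [[5, 6, inf], [4, inf, 0], [3, 2, 1]]
```

No cell ever sees the whole map, yet after enough generations each reachable cell holds its shortest distance to the goal, so a walker can reach the goal by always stepping to the neighbor with the smallest value.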
In this architecture the LLM no longer drives the main loop. At most it designs the rules, interprets the results, or tunes parameters.
rule_designer = Agent(
    name="rule_designer",
    model=OpenAIChat(id="gpt-5-mini"),
    instructions=(
        "Design local update rules for a cellular automaton. "
        "Each cell can only use neighbor states."
    ),
)
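Where it applies
Cellular Automata suits problems where local rules can produce global effects, such as diffusion, path propagation, and simple collective-behavior simulation. It does not suit language tasks that need complex semantic reasoning and global planning.
The Evaluator is the core of a closed-loop system
As soon as an Agent loops, retries, replans, requests approval, or refuses, it needs an evaluator. Otherwise the system knows neither when to stop nor when to change course.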
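| Evaluator type | Role | Example |
|---|---|---|
| LLM-as-a-Judge | Scoring by an independent model | Checking answer completeness |
| Built-in Critic | Decides whether iteration continues | Reflection, Self-Improvement |
| Programmatic verification | Hard-rule checks | is_valid(), is_goal() |
| Human-in-the-Loop | Final human gate | Dry-Run approval |
| Multi-scenario validation | Observes how stable the system's behavior is | Testing different input paths |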
A loop without an evaluator is dangerous because it just keeps calling the model; only a loop with an evaluator has a chance of becoming a reliable system.
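A retry loop gated by evaluators might look like the sketch below. It is one possible arrangement, not a prescribed design: `Verdict`, `not_empty`, `has_citation`, `first_failure`, `run_with_evaluators`, and the `[source:` marker are hypothetical, and `produce` stands in for whatever model or tool call generates the answer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    passed: bool
    reason: str = ""


Evaluator = Callable[[str], Verdict]


def not_empty(answer: str) -> Verdict:
    ok = bool(answer.strip())
    return Verdict(ok, "" if ok else "empty answer")


def has_citation(answer: str) -> Verdict:
    # Hypothetical hard rule: answers must carry a "[source: ...]" marker.
    ok = "[source:" in answer
    return Verdict(ok, "" if ok else "missing citation")


def first_failure(answer: str, evaluators: list[Evaluator]) -> Optional[Verdict]:
    """Run evaluators in order and stop at the first one that fails."""
    for evaluate in evaluators:
        verdict = evaluate(answer)
        if not verdict.passed:
            return verdict
    return None


def run_with_evaluators(
    produce: Callable[[int, Optional[str]], str],
    evaluators: list[Evaluator],
    max_attempts: int = 3,
) -> tuple[Optional[str], int]:
    """Retry until every evaluator passes; return (None, attempts) on exhaustion."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        answer = produce(attempt, feedback)
        failure = first_failure(answer, evaluators)
        if failure is None:
            return answer, attempt
        feedback = failure.reason  # tell the producer what to fix next time
    return None, max_attempts  # caller should escalate, for example to a human


def fake_produce(attempt: int, feedback: Optional[str]) -> str:
    # Stand-in for a model call that improves with each round of feedback.
    return ["", "Paris", "Paris [source: atlas]"][attempt - 1]


answer, attempts = run_with_evaluators(fake_produce, [not_empty, has_citation])
```

Ordering the evaluators from cheapest to most expensive keeps programmatic checks ahead of any LLM judge or human gate, and the explicit `None` result forces the caller to decide what happens when the loop gives up.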
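How to choose an architecture: first ask which control capability is missing
| Missing capability | First choice | Why |
|---|---|---|
| Unstable output quality | Reflection | Low cost; quickly adds critique and revision |
| Needs several rounds to converge on quality | Self-Improvement | Turns revision into a loop |
| Needs real-time information or external capabilities | Tool Use | Connects search, APIs, and databases |
| Needs to keep acting after observing | ReAct | Tool results drive the next step |
| Needs global control over steps | Planning | The plan becomes an inspectable data structure |
| Unreliable tool results | PEV | Execution must be followed by verification |
| Role conflicts within a single Agent | Multi-Agent | Clear role boundaries |
| Role invocation order changes dynamically | Blackboard | Shared state drives scheduling |
| Request types vary widely | Meta-Controller | One-shot triage at the entry point |
| A single conclusion carries high risk | Ensemble | Redundant perspectives reduce bias |
| Needs personalization across turns | Episodic Memory | Remembers user preferences and history |
| Needs knowledge retrieval | Semantic Memory | Brings in an external knowledge base |
| Needs relational reasoning | Graph Memory | Supports entity relations and multi-hop queries |
| Needs backtracking search | Tree-of-Thoughts | Keeps multiple candidate paths |
| Real-world trial and error is costly | Mental Loop | Simulate first, then execute |
| Actions have side effects | Dry-Run | Separates rehearsal, approval, and commit |
| High-risk questions must be refused | Metacognitive | The system first judges whether it should act at all |
| Problem suits local-rule solving | Cellular Automata | Decentralized emergence fits better than central planning |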
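The table can also be kept as a small lookup used during design reviews. The sketch below simply encodes the rows above; the dictionary keys and the `suggest` helper are hypothetical names, not part of any framework.

```python
# Capability gap -> architecture, transcribed from the table above.
ARCHITECTURE_FOR_GAP = {
    "unstable_output_quality": "Reflection",
    "multi_round_quality": "Self-Improvement",
    "external_capabilities": "Tool Use",
    "act_after_observing": "ReAct",
    "global_step_control": "Planning",
    "unreliable_tool_results": "PEV",
    "role_conflicts": "Multi-Agent",
    "dynamic_role_order": "Blackboard",
    "varied_request_types": "Meta-Controller",
    "high_risk_single_conclusion": "Ensemble",
    "cross_turn_personalization": "Episodic Memory",
    "knowledge_retrieval": "Semantic Memory",
    "relational_reasoning": "Graph Memory",
    "backtracking_search": "Tree-of-Thoughts",
    "costly_trial_and_error": "Mental Loop",
    "side_effects": "Dry-Run",
    "must_refuse_high_risk": "Metacognitive",
    "local_rule_solving": "Cellular Automata",
}


def suggest(gaps: list[str]) -> list[str]:
    """Return the architectures for the given gaps, in order, without duplicates."""
    picks: list[str] = []
    for gap in gaps:
        if gap not in ARCHITECTURE_FOR_GAP:
            raise KeyError(f"unknown capability gap: {gap}")
        architecture = ARCHITECTURE_FOR_GAP[gap]
        if architecture not in picks:
            picks.append(architecture)
    return picks


plan = suggest(["side_effects", "unreliable_tool_results", "side_effects"])
```

Starting from the list of gaps rather than from a favorite architecture keeps the choice tied to a control capability that is actually missing.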
Agent architecture from the control-flow perspective
The 17 architectures can be compressed into a single line of evolution:
flowchart LR
S[Explicit state] --> C[Explicit control flow]
C --> V[Explicit verification]
V --> M[Explicit memory]
M --> B[Explicit boundaries]
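What makes an Agent reliable is not being "bolder about acting automatically", but knowing more clearly:
- what the current state is;
- which rule decides the next step;
- who verifies the result;
- where errors are cut off;
- when to retry;
- when to stop;
- when to hand off to a tool;
- when to hand off to a human.
When you meet a new Agent architecture term, ask three questions directly:
- What State does it add?
- What Router does it add?
- What Evaluator does it add?
If none of the three has an answer, it is most likely an old pattern under a new name. An Agent architecture with real engineering value always makes state clearer, control flow easier to explain, and errors easier to cut off locally.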