Over the past year, we've worked with dozens of teams building large language model (LLM) agents across industries. Consistently, the most successful implementations weren't using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns.
In this post, we share what we’ve learned from working with our customers and building agents ourselves, and give practical advice for developers on building effective agents.
"Agent" can be defined in several ways. Some customers define agents as fully autonomous systems that operate independently over extended periods, using various tools to accomplish complex tasks. Others use the term to describe more prescriptive implementations that follow predefined workflows. At Anthropic, we categorize all these variations as agentic systems, but draw an important architectural distinction between workflows and agents:

- Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
- Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
Below, we will explore both types of agentic systems in detail. In Appendix 1 (“Agents in Practice”), we describe two domains where customers have found particular value in using these kinds of systems.
When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all. Agentic systems often trade latency and cost for better task performance, and you should consider when this tradeoff makes sense.
When more complexity is warranted, workflows offer predictability and consistency for well-defined tasks, whereas agents are the better option when flexibility and model-driven decision-making are needed at scale. For many applications, however, optimizing single LLM calls with retrieval and in-context examples is usually enough.
There are many frameworks that make agentic systems easier to implement, including:

- LangGraph from LangChain;
- Amazon Bedrock's AI Agent framework;
- Rivet, a drag and drop GUI LLM workflow builder; and
- Vellum, another GUI tool for building and testing complex workflows.
These frameworks make it easy to get started by simplifying standard low-level tasks like calling LLMs, defining and parsing tools, and chaining calls together. However, they often create extra layers of abstraction that can obscure the underlying prompts and responses, making them harder to debug. They can also make it tempting to add complexity when a simpler setup would suffice.
We suggest that developers start by using LLM APIs directly: many patterns can be implemented in a few lines of code. If you do use a framework, ensure you understand the underlying code. Incorrect assumptions about what's under the hood are a common source of customer error.
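To make the "few lines of code" point concrete, here is a minimal sketch of calling a Messages-style API directly, with no framework. The model id, token limit, and endpoint shown are illustrative placeholders; consult the current API reference for real values.

```python
# Build the JSON body for a single Messages API call by hand -- a direct
# HTTP POST is all a "framework" would be doing under the hood.
# Model id, max_tokens, and URL below are illustrative, not authoritative.
import json

API_URL = "https://api.anthropic.com/v1/messages"

def build_request(prompt: str, system: str = "") -> dict:
    """Assemble the request body for one model call."""
    body = {
        "model": "claude-3-5-sonnet-latest",  # placeholder model id
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    if system:
        body["system"] = system
    return body

body = build_request("Summarize this ticket in one sentence.",
                     system="You are a support assistant.")
print(json.dumps(body, indent=2))
# Sending it is one call with any HTTP client, e.g.:
#   requests.post(API_URL, json=body, headers={"x-api-key": KEY, ...})
```

Because the prompt and parameters are assembled in plain sight, there is no abstraction layer to obscure what the model actually receives.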
See our cookbook for some sample implementations.
In this section, we’ll explore the common patterns for agentic systems we’ve seen in production. We'll start with our foundational building block—the augmented LLM—and progressively increase complexity, from simple compositional workflows to autonomous agents.
Building block: The augmented LLM
The basic building block of agentic systems is an LLM enhanced with augmentations such as retrieval, tools, and memory. Our current models can actively use these capabilities—generating their own search queries, selecting appropriate tools, and determining what information to retain.
The augmented LLM
We recommend focusing on two key aspects of the implementation: tailoring these capabilities to your specific use case and ensuring they provide an easy, well-documented interface for your LLM. While there are many ways to implement these augmentations, one approach is through our recently released Model Context Protocol, which allows developers to integrate with a growing ecosystem of third-party tools with a simple client implementation.
For the remainder of this post, we'll assume each LLM call has access to these augmented capabilities.
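A single augmented-LLM turn can be sketched as follows. The `fake_model` function and the `get_weather` tool are deterministic stand-ins for a real model call and a real integration; the point is the shape of the loop, in which a tool-use request from the model is executed and its result fed back.

```python
# Sketch of one "augmented LLM" turn: the model is offered tool
# definitions, and when its reply requests a tool we run it and return
# the result. `fake_model` and `get_weather` are illustrative stubs.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"          # canned result for the sketch

TOOLS = {
    "get_weather": {
        "fn": get_weather,
        "description": "Get current weather. Input: {'city': str}",
    }
}

def fake_model(prompt: str) -> dict:
    """Placeholder for a real LLM call; always asks for the weather tool."""
    return {"tool": "get_weather", "input": {"city": "Paris"}}

def run_turn(prompt: str) -> str:
    reply = fake_model(prompt)
    if reply.get("tool"):              # model chose to use a tool
        result = TOOLS[reply["tool"]]["fn"](**reply["input"])
        return result                   # in a real loop, fed back to the model
    return reply.get("text", "")

print(run_turn("What's the weather in Paris?"))
```

The tool registry doubles as the documentation surface: each entry's description is exactly what the model sees, which is why it deserves the same care as the prompt itself.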
Workflow: Prompt chaining
Prompt chaining decomposes a task into a sequence of steps, where each LLM call processes the output of the previous one. You can add programmatic checks (see "gate" in the diagram below) on any intermediate steps to ensure that the process is still on track.
The prompt chaining workflow
When to use this workflow: This workflow is ideal for situations where the task can be easily and cleanly decomposed into fixed subtasks. The main goal is to trade off latency for higher accuracy, by making each LLM call an easier task.
Examples where prompt chaining is useful:

- Generating marketing copy, then translating it into a different language.
- Writing an outline of a document, checking that the outline meets certain criteria, then writing the document based on the outline.
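A minimal chain with a gate between steps might look like the sketch below. The `llm` function is a deterministic placeholder for a real model call; the gate is an ordinary Python check on the intermediate output.

```python
# Prompt chaining sketch: step 1 drafts an outline, a programmatic gate
# validates it, and step 2 consumes it. `llm` is a stub model call.
def llm(prompt: str) -> str:
    """Placeholder for a real model call; deterministic for the sketch."""
    if prompt.startswith("OUTLINE:"):
        return "1. Intro 2. Methods 3. Conclusion"
    return "Draft following: " + prompt

def gate(outline: str) -> bool:
    """Programmatic check between steps: demand at least three sections."""
    return "3." in outline

def chain(topic: str) -> str:
    outline = llm("OUTLINE: " + topic)
    if not gate(outline):               # stop early if off track
        raise ValueError("outline failed the gate check")
    return llm(outline)                 # second call consumes step one's output

print(chain("quarterly report"))
```

Each call sees a narrower, easier task than the combined prompt would, which is exactly the latency-for-accuracy trade described above.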
Workflow: Routing

Routing classifies an input and directs it to a specialized followup task. This workflow allows for separation of concerns, and building more specialized prompts. Without this workflow, optimizing for one kind of input can hurt performance on other inputs.
The routing workflow
When to use this workflow: Routing works well for complex tasks where there are distinct categories that are better handled separately, and where classification can be handled accurately, either by an LLM or a more traditional classification model/algorithm.
Examples where routing is useful:

- Directing different types of customer service queries (general questions, refund requests, technical support) into different downstream processes, prompts, and tools.
- Routing easy/common questions to smaller, cheaper models and hard/unusual questions to more capable models to optimize cost and speed.
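Routing reduces to a classifier in front of specialized handlers. In this sketch the keyword classifier is a stand-in for an LLM or a traditional classification model; the route names and prompts are illustrative.

```python
# Routing sketch: a cheap classifier selects a specialized system prompt.
# The keyword rules below stand in for an LLM or trained classifier.
ROUTES = {
    "refund": "You handle refund requests. Be precise about policy.",
    "technical": "You are a technical support specialist.",
    "general": "You answer general questions concisely.",
}

def classify(query: str) -> str:
    q = query.lower()
    if "refund" in q or "money back" in q:
        return "refund"
    if "error" in q or "crash" in q:
        return "technical"
    return "general"

def route(query: str) -> str:
    label = classify(query)
    system_prompt = ROUTES[label]
    # In production: call the model here with `system_prompt` + query.
    return label

print(route("My app crashes on startup"))
```

Because each route has its own prompt, you can tune the refund prompt aggressively without risking regressions on technical queries, which is the separation-of-concerns benefit described above.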
Workflow: Parallelization
LLMs can sometimes work simultaneously on a task and have their outputs aggregated programmatically. This workflow, parallelization, manifests in two key variations:

- Sectioning: breaking a task into independent subtasks run in parallel.
- Voting: running the same task multiple times to get diverse outputs.
The parallelization workflow
When to use this workflow: Parallelization is effective when the divided subtasks can be parallelized for speed, or when multiple perspectives or attempts are needed for higher confidence results. For complex tasks with multiple considerations, LLMs generally perform better when each consideration is handled by a separate LLM call, allowing focused attention on each specific aspect.
Examples where parallelization is useful:

- Sectioning: implementing guardrails where one model instance processes user queries while another screens them for inappropriate content; automating evals where each LLM call evaluates a different aspect of the model's performance.
- Voting: reviewing a piece of code for vulnerabilities, where several different prompts review and flag the code if they find a problem; evaluating whether a given piece of content is inappropriate, with multiple prompts balancing false positives and negatives.
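The voting variant can be sketched with a thread pool. Here `review` is a deterministic placeholder for an LLM reviewer; in practice each reviewer would be a model call with a differently phrased prompt, and the majority aggregation is ordinary code.

```python
# Parallelization sketch, "voting" variant: run the same check several
# times concurrently and aggregate by majority. `review` stands in for
# an LLM call that flags a vulnerability.
from concurrent.futures import ThreadPoolExecutor

def review(code: str, reviewer_id: int) -> bool:
    # Placeholder reviewer: deterministic here; a real one would be an
    # LLM call whose prompt varies with reviewer_id.
    return "eval(" in code

def vote(code: str, n: int = 3) -> bool:
    with ThreadPoolExecutor(max_workers=n) as pool:
        flags = list(pool.map(lambda i: review(code, i), range(n)))
    return sum(flags) > n // 2          # majority wins

print(vote("result = eval(user_input)"))
```

The sectioning variant uses the same pool, but maps it over different subtasks instead of repeated runs of one task.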
Workflow: Orchestrator-workers
In the orchestrator-workers workflow, a central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results.
The orchestrator-workers workflow
When to use this workflow: This workflow is well-suited for complex tasks where you can't predict the subtasks needed (in coding, for example, the number of files that need to be changed and the nature of the change in each file likely depend on the task). While it's topographically similar to parallelization, the key difference is its flexibility: subtasks aren't pre-defined, but determined by the orchestrator based on the specific input.
Examples where orchestrator-workers is useful:

- Coding products that make complex changes to multiple files each time.
- Search tasks that involve gathering and analyzing information from multiple sources for possible relevant information.
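The pattern's three roles fit in a few lines. Here the orchestrator and worker are deterministic stubs; in production each would be its own LLM call, and the subtask list would genuinely depend on the input rather than being canned.

```python
# Orchestrator-workers sketch: the orchestrator plans subtasks at
# runtime, workers handle each one, and a synthesis step combines the
# results. All three roles are illustrative stubs for LLM calls.
def orchestrate(task: str) -> list[str]:
    # A real orchestrator LLM would produce this plan from the task.
    return [f"edit {f}" for f in ("parser.py", "tests/test_parser.py")]

def worker(subtask: str) -> str:
    return f"done: {subtask}"

def synthesize(results: list[str]) -> str:
    return "; ".join(results)

def run(task: str) -> str:
    subtasks = orchestrate(task)            # plan depends on the input
    results = [worker(s) for s in subtasks]
    return synthesize(results)

print(run("fix the parser bug"))
```

The structural difference from parallelization shows in `orchestrate`: the subtask list is a model output, not a constant wired into the workflow.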
Workflow: Evaluator-optimizer
In the evaluator-optimizer workflow, one LLM call generates a response while another provides evaluation and feedback in a loop.
The evaluator-optimizer workflow
When to use this workflow: This workflow is particularly effective when we have clear evaluation criteria, and when iterative refinement provides measurable value. The two signs of good fit are, first, that LLM responses can be demonstrably improved when a human articulates their feedback; and second, that the LLM can provide such feedback. This is analogous to the iterative writing process a human writer might go through when producing a polished document.
Examples where evaluator-optimizer is useful:

- Literary translation, where there are nuances the translator LLM might not capture initially, but where an evaluator LLM can provide useful critiques.
- Complex search tasks that require multiple rounds of searching and analysis to gather comprehensive information, where the evaluator decides whether further searches are warranted.
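The loop itself is simple: generate, critique, and regenerate until the evaluator passes the draft or an iteration cap is hit. Both roles below are deterministic stubs for separate LLM calls; the cap mirrors the stopping conditions discussed later for agents.

```python
# Evaluator-optimizer sketch: one call drafts, another critiques, and
# the loop ends when the critique is empty or the cap is reached.
# `generate` and `evaluate` are stand-ins for two LLM calls.
def generate(task: str, feedback: str = "") -> str:
    # Stub generator: incorporates one round of feedback.
    return task + (" [revised]" if feedback else " [draft]")

def evaluate(draft: str) -> str:
    """Return "" when the draft is acceptable, else critique text."""
    return "" if "[revised]" in draft else "needs revision"

def refine(task: str, max_iters: int = 3) -> str:
    feedback = ""
    for _ in range(max_iters):
        draft = generate(task, feedback)
        feedback = evaluate(draft)
        if not feedback:                # evaluator is satisfied
            return draft
    return draft                        # give up after the cap

print(refine("translate the poem"))
```

The empty-string convention for "no complaints" keeps the stopping check trivial; a real evaluator prompt would be asked to output a pass/fail flag plus critique.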
Agents

Agents are emerging in production as LLMs mature in key capabilities: understanding complex inputs, engaging in reasoning and planning, using tools reliably, and recovering from errors. Agents begin their work with either a command from, or interactive discussion with, the human user. Once the task is clear, agents plan and operate independently, potentially returning to the human for further information or judgement. During execution, it's crucial for agents to gain "ground truth" from the environment at each step (such as tool call results or code execution) to assess their progress. Agents can then pause for human feedback at checkpoints or when encountering blockers. The task often terminates upon completion, but it's also common to include stopping conditions (such as a maximum number of iterations) to maintain control.
Agents can handle sophisticated tasks, but their implementation is often straightforward. They are typically just LLMs using tools based on environmental feedback in a loop. It is therefore crucial to design toolsets and their documentation clearly and thoughtfully. We expand on best practices for tool development in Appendix 2 ("Prompt Engineering your Tools").
Autonomous agent
When to use agents: Agents can be used for open-ended problems where it’s difficult or impossible to predict the required number of steps, and where you can’t hardcode a fixed path. The LLM will potentially operate for many turns, and you must have some level of trust in its decision-making. Agents' autonomy makes them ideal for scaling tasks in trusted environments.
The autonomous nature of agents means higher costs, and the potential for compounding errors. We recommend extensive testing in sandboxed environments, along with the appropriate guardrails.
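The "LLM using tools based on environmental feedback in a loop" description is almost directly executable. In this sketch the model policy and environment are deterministic stubs; the iteration cap is the kind of stopping condition recommended above.

```python
# Bare agent loop: the model picks an action, the environment returns
# ground truth, and a max-iteration guard keeps control. `model_step`
# and `environment` are illustrative stubs for an LLM and real tools.
def model_step(observations: list[str]) -> str:
    # Stub policy: keep running tests until they pass, then finish.
    if observations and observations[-1] == "tests passed":
        return "finish"
    return "run_tests"

def environment(action: str, state: dict) -> str:
    if action == "run_tests":
        state["runs"] += 1
        return "tests passed" if state["runs"] >= 2 else "2 tests failed"
    return ""

def agent(max_iters: int = 10) -> list[str]:
    state, observations = {"runs": 0}, []
    for _ in range(max_iters):          # stopping condition keeps control
        action = model_step(observations)
        if action == "finish":
            break
        observations.append(environment(action, state))
    return observations

print(agent())  # → ['2 tests failed', 'tests passed']
```

Everything agent-specific lives in the prompt and toolset behind `model_step`; the loop itself is the same for every agent, which is why tool design dominates the engineering effort.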
Examples where agents are useful:
The following examples are from our own implementations:

- A coding agent to resolve SWE-bench tasks, which involve edits to many files based on a task description.
- Our "computer use" reference implementation, where Claude uses a computer to accomplish tasks.
High-level flow of a coding agent
These building blocks aren't prescriptive. They're common patterns that developers can shape and combine to fit different use cases. The key to success, as with any LLM features, is measuring performance and iterating on implementations. To repeat: you should consider adding complexity only when it demonstrably improves outcomes.
Success in the LLM space isn't about building the most sophisticated system. It's about building the right system for your needs. Start with simple prompts, optimize them with comprehensive evaluation, and add multi-step agentic systems only when simpler solutions fall short.
When implementing agents, we try to follow three core principles:

1. Maintain simplicity in your agent's design.
2. Prioritize transparency by explicitly showing the agent's planning steps.
3. Carefully craft your agent-computer interface (ACI) through thorough tool documentation and testing.
Frameworks can help you get started quickly, but don't hesitate to reduce abstraction layers and build with basic components as you move to production. By following these principles, you can create agents that are not only powerful but also reliable, maintainable, and trusted by their users.
Written by Erik Schluntz and Barry Zhang. This work draws upon our experiences building agents at Anthropic and the valuable insights shared by our customers, for which we're deeply grateful.
Appendix 1: Agents in practice

Our work with customers has revealed two particularly promising applications for AI agents that demonstrate the practical value of the patterns discussed above. Both applications illustrate how agents add the most value for tasks that require both conversation and action, have clear success criteria, enable feedback loops, and integrate meaningful human oversight.
Customer support

Customer support combines familiar chatbot interfaces with enhanced capabilities through tool integration. This is a natural fit for more open-ended agents because:

- Support interactions naturally follow a conversation flow while requiring access to external information and actions;
- Tools can be integrated to pull customer data, order history, and knowledge base articles;
- Actions such as issuing refunds or updating tickets can be handled programmatically; and
- Success can be clearly measured through user-defined resolutions.
Several companies have demonstrated the viability of this approach through usage-based pricing models that charge only for successful resolutions, showing confidence in their agents' effectiveness.
Coding agents

The software development space has shown remarkable potential for LLM features, with capabilities evolving from code completion to autonomous problem-solving. Agents are particularly effective because:

- Code solutions are verifiable through automated tests;
- Agents can iterate on solutions using test results as feedback;
- The problem space is well-defined and structured; and
- Output quality can be measured objectively.
In our own implementation, agents can now solve real GitHub issues in the SWE-bench Verified benchmark based on the pull request description alone. However, whereas automated testing helps verify functionality, human review remains crucial for ensuring solutions align with broader system requirements.
Appendix 2: Prompt engineering your tools

No matter which agentic system you're building, tools will likely be an important part of your agent. Tools enable Claude to interact with external services and APIs by specifying their exact structure and definition in our API. When Claude responds, it will include a tool use block in the API response if it plans to invoke a tool. Tool definitions and specifications should be given just as much prompt engineering attention as your overall prompts. In this brief appendix, we describe how to prompt engineer your tools.
There are often several ways to specify the same action. For instance, you can specify a file edit by writing a diff, or by rewriting the entire file. For structured output, you can return code inside markdown or inside JSON. In software engineering, differences like these are cosmetic and can be converted losslessly from one to the other. However, some formats are much more difficult for an LLM to write than others. Writing a diff requires knowing how many lines are changing in the chunk header before the new code is written. Writing code inside JSON (compared to markdown) requires extra escaping of newlines and quotes.
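The escaping overhead is easy to see concretely: the same snippet written inside a markdown fence appears verbatim, while the JSON-serialized version escapes every newline and quote, all of which the model must emit correctly.

```python
# Compare the formatting "overhead" of markdown vs JSON for the same
# code: markdown leaves it untouched, JSON escapes newlines and quotes.
import json

code = 'print("hello")\nprint("world")\n'

as_markdown = f"```python\n{code}```"      # code appears unchanged
as_json = json.dumps({"code": code})        # \n and \" throughout

print(as_json)
```

Every escape is an extra token-level constraint with no semantic content, which is why formats closer to naturally occurring text are easier for the model to write.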
Our suggestions for deciding on tool formats are the following:

- Give the model enough tokens to "think" before it writes itself into a corner.
- Keep the format close to what the model has seen naturally occurring in text on the internet.
- Make sure there's no formatting "overhead", such as having to keep an accurate count of thousands of lines of code, or string-escaping any code it writes.
One rule of thumb is to think about how much effort goes into human-computer interfaces (HCI), and plan to invest just as much effort in creating good agent-computer interfaces (ACI). Here are some thoughts on how to do so:

- Put yourself in the model's shoes. Is it obvious how to use the tool, based on the description and parameters, or would you need to think carefully about it? A good tool definition often includes example usage, edge cases, input format requirements, and clear boundaries from other tools.
- Think about how to change parameter names or descriptions to make things more obvious, as if writing a great docstring for a junior developer on your team.
- Test how the model uses your tools: run many example inputs, see what mistakes the model makes, and iterate.
- Poka-yoke your tools: change the arguments so that it is harder to make mistakes.
While building our agent for SWE-bench, we actually spent more time optimizing our tools than the overall prompt. For example, we found that the model would make mistakes with tools using relative filepaths after the agent had moved out of the root directory. To fix this, we changed the tool to always require absolute filepaths—and we found that the model used this method flawlessly.
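The absolute-path fix is a poka-yoke in miniature: validate the argument so the mistake is impossible rather than merely discouraged. The tool name and error wording below are illustrative, not the actual SWE-bench agent's implementation.

```python
# Poka-yoke sketch of the filepath fix: the tool rejects relative paths
# outright, so the model cannot silently depend on the agent's current
# working directory. Tool name and messages are illustrative.
import os

def read_file(path: str) -> str:
    if not os.path.isabs(path):
        # Returned to the model as a tool error, prompting a retry
        # with an absolute path.
        return "error: path must be absolute, e.g. /repo/src/main.py"
    return f"(contents of {path})"

print(read_file("src/main.py"))       # rejected
print(read_file("/repo/src/main.py"))
```

Because the error message states the rule and shows a valid example, a single failed call is usually enough for the model to self-correct on the next attempt.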
Original article: https://www.anthropic.com/research/building-effective-agents