How Do You Get Started with LLM Agents? A Practical Walkthrough from OpenAI Researcher Lilian Weng

Published 2024-9-13 12:45

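In 2024, large language model agents (LLM agents) drew growing attention, and new techniques keep appearing. Compared with traditional machine learning methods, LLM agents build on the emergent abilities of large language models to learn in few-shot and zero-shot settings, and they can interact with the real, physical world. How do you get started in this exciting field? We found a blog post by OpenAI researcher Lilian Weng and translated it in full. Original post: https://lilianweng.github.io/posts/2023-06-23-agent

Building agents with a large language model (LLM) as the core controller is a cool concept. Several proof-of-concept demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potential of LLMs extends beyond generating well-written copy, stories, essays and programs; an LLM can be framed as a powerful general problem solver.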

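Agent System Overview

In an LLM-powered autonomous agent system, the LLM functions as the agent's brain, complemented by several key components:

Planning

Subgoals and decomposition: The agent breaks large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can critique and reflect on its past actions, learn from mistakes, and refine future steps, thereby improving the quality of the final result.

Memory

Short-term memory: I would consider all in-context learning (see Prompt Engineering) as using the model's short-term memory to learn.
Long-term memory: This gives the agent the ability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.

Tool use

The agent learns to call external APIs for extra information that is missing from the model weights (which are often hard to change after pre-training), including current information, code execution capability, and access to proprietary information sources.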

Figure 1: Basic structure of an LLM agent

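Component One: Planning

A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.

Task Decomposition

Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for improving model performance on complex tasks. The model is instructed to "think step by step", using more test-time computation to decompose hard tasks into smaller and simpler steps. CoT turns a big task into multiple manageable ones and sheds light on how to interpret the model's thinking process.

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be breadth-first search (BFS) or depth-first search (DFS), with each state evaluated by a classifier (via a prompt) or by majority vote.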
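The breadth-first variant described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose` and `evaluate` are hypothetical stand-ins for the two LLM calls (thought generation and state evaluation), and the toy task (pick three digits whose sum is 20) is invented here so that the code runs offline.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # a partial solution: the numbers picked so far


def tot_bfs(
    root: State,
    propose: Callable[[State], List[State]],
    evaluate: Callable[[State], float],
    depth: int,
    beam_width: int,
) -> State:
    """Breadth-first search that keeps the best `beam_width` states per level."""
    frontier = [root]
    for _ in range(depth):
        # Expand every state in the frontier into several candidate thoughts.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # Score each candidate (a prompt or a vote in the real method) and prune.
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=evaluate)


# Toy stand-ins for the LLM (assumptions made for this sketch only).
def propose(state: State) -> List[State]:
    return [state + (n,) for n in range(1, 10)]


def evaluate(state: State) -> float:
    return -abs(20 - sum(state))


best = tot_bfs((), propose, evaluate, depth=3, beam_width=3)
print(best, sum(best))
```

Swapping the frontier list for a stack would give the DFS variant; the pruning step is where the classifier or majority vote would plug in.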

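Task decomposition can be done (1) by the LLM with simple prompting such as "Steps for XYZ.\n1." or "What are the subgoals for achieving XYZ?"; (2) by using task-specific instructions, e.g. "Write a story outline." for writing a novel; or (3) with human inputs.

Another quite distinct approach, LLM+P (Liu et al. 2023), relies on an external classical planner to do long-horizon planning. This approach uses the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, the LLM (1) translates the problem into "Problem PDDL", then (2) requests a classical planner to generate a PDDL plan based on an existing "Domain PDDL", and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, which assumes the availability of a domain-specific PDDL and a suitable planner; that assumption is common in certain robotic setups but not in many other domains.

Self-Reflection

Self-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.

ReAct (Yao et al. 2023) integrates reasoning and acting within the LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables the LLM to interact with the environment (e.g. use the Wikipedia search API), while the latter prompts the LLM to generate reasoning traces in natural language.

The ReAct prompt template incorporates explicit steps for the LLM to think, roughly formatted as: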

Thought: ...
Action: ...
Observation: ...
... (Repeated many times)
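The Thought / Action / Observation cycle above can be sketched as a small control loop. Everything here is an assumption made for illustration: the `name[argument]` action syntax, the `finish` action that ends the loop, the `search` tool, and the scripted `fake_llm` that replaces a real model call so the sketch runs offline.

```python
import re
from typing import Callable, Dict


def react_loop(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Alternate model steps and tool calls until a `finish[...]` action appears."""
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(prompt)  # expected: "Thought: ...\nAction: name[argument]"
        prompt += step + "\n"
        match = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if match is None:
            prompt += "Observation: could not parse an action.\n"
            continue
        name, arg = match.group(1), match.group(2)
        if name == "finish":
            return arg
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        # The observation is appended so the next Thought can condition on it.
        prompt += f"Observation: {observation}\n"
    return "no answer within the step budget"


# Scripted stand-in for an LLM (hypothetical; a real agent would call a model).
script = iter([
    "Thought: I should look up the capital of France.\nAction: search[capital of France]",
    "Thought: The observation answers the question.\nAction: finish[Paris]",
])
fake_llm = lambda prompt: next(script)
tools = {"search": lambda query: "Paris is the capital of France."}

answer = react_loop(fake_llm, tools, "What is the capital of France?")
print(answer)
```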

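Figure 2: Examples of ReAct reasoning

In experiments on both knowledge-intensive tasks and decision-making tasks, ReAct works better than the Act-only baseline, in which the "Thought: …" step is removed.

Reflexion (Shinn & Labash 2023) is a framework that equips agents with dynamic memory and self-reflection capabilities to improve reasoning skills. Reflexion has a standard reinforcement learning (RL) setup, in which the reward model provides a simple binary reward and the action space follows the setup in ReAct, where the task-specific action space is augmented with language to enable complex reasoning steps. After each action, the agent computes a heuristic and, depending on the self-reflection results, may optionally decide to reset the environment and start a new trial.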

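Figure 3: Basic structure of Reflexion

The heuristic function determines when a trajectory is inefficient or contains hallucination and should be stopped. Inefficient planning refers to trajectories that take too long without success. Hallucination is defined as encountering a sequence of consecutive identical actions that lead to the same observation in the environment.

Self-reflection is created by showing two-shot examples to the LLM, where each example is a pair of (failed trajectory, ideal reflection for guiding future changes in the plan). The reflections are then added to the agent's working memory, up to three, to be used as context for querying the LLM.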
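The two definitions above translate fairly directly into code. The following is a rough sketch, not the paper's implementation: the thresholds `max_steps=30` and `max_repeats=3` are assumed values chosen for illustration, and the reflection strings are placeholders.

```python
from collections import deque
from typing import Deque, List, Tuple

Step = Tuple[str, str]  # (action, observation)


def should_reset(trajectory: List[Step], max_steps: int = 30, max_repeats: int = 3) -> bool:
    """Return True when the trajectory looks inefficient or hallucinated."""
    # Inefficient planning: the trajectory has run too long without success.
    if len(trajectory) > max_steps:
        return True
    # Hallucination: the same action keeps producing the same observation.
    if len(trajectory) >= max_repeats:
        tail = trajectory[-max_repeats:]
        if all(step == tail[0] for step in tail):
            return True
    return False


# Working memory holds at most three reflections; older ones are dropped.
reflections: Deque[str] = deque(maxlen=3)
for note in ["reflection 1", "reflection 2", "reflection 3", "reflection 4"]:
    reflections.append(note)

print(should_reset([("go north", "You hit a wall.")] * 3), list(reflections))
```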

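Figure 4: Experiment results on AlfWorld and HotpotQA

Chain of Hindsight (CoH; Liu et al. 2023) encourages the model to improve on its own outputs by explicitly presenting it with a sequence of past outputs, each annotated with feedback. Human feedback data is a collection $D_h = \{(x, y_i, r_i, z_i)\}_{i=1}^n$, where $x$ is the prompt, each $y_i$ is a model completion, $r_i$ is the human rating of $y_i$, and $z_i$ is the corresponding human-provided hindsight feedback. Assume the feedback tuples are ranked by reward, $r_n \geq r_{n-1} \geq \dots \geq r_1$. The process is supervised fine-tuning where the data is a sequence of the form $\tau_h = (x, z_i, y_i, z_j, y_j, \dots, z_n, y_n)$, where $1 \leq i \leq j \leq n$. The model is fine-tuned to predict only $y_n$, conditioned on the sequence prefix, so that it can self-reflect on the feedback sequence and produce better output. At test time, the model can optionally receive multiple rounds of instructions from human annotators.

To avoid overfitting, CoH adds a regularization term that maximizes the log-likelihood of the pre-training dataset. To avoid shortcutting and copying (because there are many common words in feedback sequences), they randomly mask 0% - 5% of past tokens during training.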
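To make the data layout concrete, here is a small sketch of how such a training sequence could be assembled and how past tokens could be masked. It is an illustration under assumptions rather than the CoH code: whitespace splitting stands in for a real tokenizer, `<mask>` is a made-up mask token, and the example feedback strings are invented.

```python
import random
from typing import List, Tuple

Feedback = Tuple[str, float, str]  # (completion y_i, rating r_i, hindsight feedback z_i)


def build_coh_sequence(prompt: str, feedback: List[Feedback]) -> List[str]:
    """Rank completions by ascending rating and interleave feedback with outputs."""
    ranked = sorted(feedback, key=lambda item: item[1])
    sequence = [prompt]
    for completion, _rating, hindsight in ranked:
        sequence += [hindsight, completion]
    return sequence  # (x, z_1, y_1, ..., z_n, y_n); only y_n is the training target


def mask_past_tokens(
    prefix_tokens: List[str], rate: float, rng: random.Random, mask: str = "<mask>"
) -> List[str]:
    """Replace each prefix token with the mask token with probability `rate`."""
    return [mask if rng.random() < rate else tok for tok in prefix_tokens]


seq = build_coh_sequence(
    "Summarize the article.",
    [
        ("A crisp summary.", 0.9, "A better, concise summary:"),
        ("A long, rambling summary.", 0.2, "A summary that is too verbose:"),
    ],
)
prefix_tokens = " ".join(seq[:-1]).split()  # everything before the target y_n
masked = mask_past_tokens(prefix_tokens, rate=0.05, rng=random.Random(0))
print(seq[-1], len(masked))
```

Masking is applied to the prefix only, so the best-rated completion at the end of the sequence stays intact as the prediction target.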

The training dataset in their experiments is a combination of WebGPT comparisons, summarization from human feedback, and a human preference dataset.

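Figure 5: CoH experiment results

The idea of CoH is to present a history of sequentially improved outputs in context and train the model to continue the trend and produce better outputs. Algorithm Distillation (AD; Laskin et al. 2023) applies the same idea to cross-episode trajectories in reinforcement learning tasks, where an algorithm is encapsulated in a long history-conditioned policy. Considering that an agent interacts with the environment many times and gets a little better in each episode, AD concatenates this learning history and feeds it into the model. Hence we should expect the next predicted action to lead to better performance than previous trials. The goal is to learn the process of RL rather than to train a task-specific policy itself.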

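Figure 6: Basic framework of Algorithm Distillation

The paper hypothesizes that any algorithm that generates a set of learning histories can be distilled into a neural network by performing behavioral cloning over actions. The history data is generated by a set of source policies, each trained for a specific task. At the training stage, during each RL run, a random task is sampled and a subsequence of the multi-episode history is used for training, so that the learned policy is task-agnostic.

In reality, the model has a limited context window length, so episodes should be short enough to construct a multi-episode history. Multi-episodic contexts of 2-4 episodes are necessary to learn a near-optimal in-context RL algorithm. The emergence of in-context RL requires a long enough context.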
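The data preparation step can be pictured as follows. This sketch only covers how a cross-episode training example might be cut from a learning history; the transition format, the episode lengths and the window size are assumptions made for illustration, and no model is trained here.

```python
import random
from typing import List, Tuple

Transition = Tuple[str, str, float]  # (observation, action, reward)


def flatten_history(episodes: List[List[Transition]]) -> List[Transition]:
    """Concatenate episodes in the order the source algorithm produced them."""
    return [t for episode in episodes for t in episode]


def sample_context(
    history: List[Transition], context_len: int, rng: random.Random
) -> Tuple[List[Transition], str]:
    """Pick a contiguous window; the behavior-cloning target is the next action."""
    start = rng.randrange(len(history) - context_len)
    window = history[start:start + context_len]
    target_action = history[start + context_len][1]
    return window, target_action


# Four short episodes whose reward grows over time, mimicking a learning agent.
episodes = [
    [(f"obs{e}_{t}", f"act{e}_{t}", float(e)) for t in range(3)]
    for e in range(4)
]
history = flatten_history(episodes)
window, target = sample_context(history, context_len=8, rng=random.Random(0))
print(len(window), target)
```

With three steps per episode, a window of eight transitions always spans at least three episodes, which is why short episodes matter when the context window is limited.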

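In comparison with three baselines, including ED (expert distillation, behavior cloning with expert trajectories instead of learning history), source policy (used for generating trajectories for distillation by UCB), and RL^2 (Duan et al. 2017; used as an upper bound since it needs online RL), AD demonstrates in-context RL with performance getting close to RL^2 despite only using offline RL, and it learns much faster than the other baselines. When conditioned on the partial training history of the source policy, AD also improves much faster than the ED baseline.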

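Figure 7: Comparison of AD, ED and the other baselines

Component Two: Memory

Types of Memory

Memory can be defined as the processes used to acquire, store, retain, and later retrieve information. There are several types of memory in the human brain.

Sensory memory: This is the earliest stage of memory, providing the ability to retain impressions of sensory information (visual, auditory, etc.) after the original stimuli have ended. Sensory memory typically lasts only up to a few seconds. Subcategories include iconic memory (visual), echoic memory (auditory), and haptic memory (touch).
Short-term memory (STM) or working memory: It stores information that we are currently aware of and need in order to carry out complex cognitive tasks such as learning and reasoning. Short-term memory is believed to have a capacity of about 7 items (Miller 1956) and lasts for 20-30 seconds.
Long-term memory (LTM): Long-term memory can store information for a remarkably long time, ranging from a few days to decades, with an essentially unlimited storage capacity. There are two subtypes of LTM:
Explicit / declarative memory: This is memory of facts and events, and refers to those memories that can be consciously recalled, including episodic memory (events and experiences) and semantic memory (facts and concepts).
Implicit / procedural memory: This type of memory is unconscious and involves skills and routines that are performed automatically, like riding a bike or typing on a keyboard.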

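Figure 8: Categorization of human memory

We can roughly consider the following mappings:

Sensory memory as learning embedding representations for raw inputs, including text, images or other modalities;
Short-term memory as in-context learning. It is short and finite, as it is restricted by the finite context window length of the Transformer.
Long-term memory as the external vector store that the agent can attend to at query time, accessible via fast retrieval.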

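Figure 9: Performance comparison of MIPS algorithms

Maximum Inner Product Search (MIPS)

External memory can alleviate the restriction of a finite attention span. A standard practice is to save the embedding representation of information into a vector store database that supports fast maximum inner product search (MIPS). To optimize retrieval speed, the common choice is an approximate nearest neighbors (ANN) algorithm that returns approximately the top k nearest neighbors, trading a small loss in accuracy for a huge speedup.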
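For reference, the exact (brute-force) version of MIPS is easy to write down; ANN methods exist to avoid this linear scan over the whole store. The vectors below are made-up two-dimensional embeddings used only to keep the sketch readable.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def exact_mips(
    query: Sequence[float], memory: List[Sequence[float]], k: int
) -> List[Tuple[float, int]]:
    """Return the k (score, index) pairs with the largest inner product."""
    scored = ((inner_product(query, vec), idx) for idx, vec in enumerate(memory))
    return heapq.nlargest(k, scored)


memory = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
top = exact_mips([1.0, 0.2], memory, k=2)
print([idx for _score, idx in top])
```

The cost is one inner product per stored vector for every query, which is what makes approximate indexes attractive once the store grows large.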

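Common choices of ANN algorithms for fast MIPS include:

LSH (Locality-Sensitive Hashing): It introduces a hashing function such that similar input items are mapped to the same buckets with high probability, where the number of buckets is much smaller than the number of inputs.
ANNOY (Approximate Nearest Neighbors Oh Yeah): The core data structure is random projection trees, a set of binary trees where each non-leaf node represents a hyperplane splitting the input space in half and each leaf stores one data point. Trees are built independently and at random, so to some extent this mimics a hashing function. ANNOY search happens in all the trees, iteratively descending into the half that is closest to the query and then aggregating the results. The idea is quite related to the KD tree but a lot more scalable.
HNSW (Hierarchical Navigable Small World): It is inspired by the idea of small world networks, where most nodes can be reached from any other node within a small number of steps; e.g. the "six degrees of separation" feature of social networks. HNSW builds hierarchical layers of these small-world graphs, where the bottom layer contains the actual data points. The layers in the middle create shortcuts to speed up search. When performing a search, HNSW starts from a random node in the top layer and navigates towards the target. When it cannot get any closer, it moves down to the next layer, until it reaches the bottom layer. Each move in the upper layers can potentially cover a large distance in the data space, and each move in the lower layers refines the search quality.
FAISS (Facebook AI Similarity Search): It operates on the assumption that in high-dimensional space, distances between nodes follow a Gaussian distribution and thus there should exist clustering of data points. FAISS applies vector quantization by partitioning the vector space into clusters and then refining the quantization within clusters. Search first looks for cluster candidates with coarse quantization and then looks further into each cluster with finer quantization.
ScaNN (Scalable Nearest Neighbors): The main innovation in ScaNN is anisotropic vector quantization. It quantizes a data point $x_i$ to $\tilde{x}_i$ such that the inner product $\langle q, \tilde{x}_i \rangle$ with a query $q$ is as close as possible to the original inner product $\langle q, x_i \rangle$, instead of simply picking the closest quantization centroid.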
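As one concrete instance of the hashing idea in the list above, the sketch below uses random-hyperplane LSH (often called SimHash). Note the swap: this scheme targets angular similarity rather than the inner product itself, and the dimensionality, number of bits and seed are arbitrary choices for the illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Signature = Tuple[int, ...]


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw `n_bits` random hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vec: Sequence[float], planes: List[List[float]]) -> Signature:
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        int(sum(v * p for v, p in zip(vec, plane)) >= 0) for plane in planes
    )


def build_index(
    vectors: List[Sequence[float]], planes: List[List[float]]
) -> Dict[Signature, List[int]]:
    """Group vector indices by signature; each signature is one bucket."""
    buckets: Dict[Signature, List[int]] = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[lsh_signature(vec, planes)].append(idx)
    return buckets


planes = make_hyperplanes(dim=3, n_bits=8)
v = [0.3, -1.2, 0.8]
vectors = [v, [2 * x for x in v], [-x for x in v]]
index = build_index(vectors, planes)
print(index[lsh_signature(v, planes)])
```

At query time only the vectors in the query's bucket need to be scored exactly, which is where the speedup comes from; vectors pointing in the same direction always share a bucket, while the opposite direction lands elsewhere.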

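Component Three: Tool Use

Tool use is a remarkable and distinguishing characteristic of human beings. We create, modify and utilize external objects to do things that go beyond our physical and cognitive limits. Equipping LLMs with external tools can significantly extend the model's capabilities.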

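Figure 10: A sea otter using a tool

MRKL (Karpas et al. 2022), short for "Modular Reasoning, Knowledge and Language", is a neuro-symbolic architecture for autonomous agents. A MRKL system is proposed to contain a collection of "expert" modules, while a general-purpose LLM works as a router that directs inquiries to the best-suited expert module. These modules can be neural (e.g. deep learning models) or symbolic (e.g. math calculator, currency converter, weather API).

They ran an experiment on fine-tuning an LLM to call a calculator, using arithmetic as a test case. Their experiments showed that verbal math problems were harder to solve than explicitly stated math problems, because the LLM (the 7B Jurassic1-large model) failed to extract the right arguments for basic arithmetic reliably. The results highlight that even when an external symbolic tool works reliably, knowing when and how to use the tool is crucial, and that is determined by the LLM's capability.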
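The router-plus-experts layout can be sketched with a toy dispatcher. This is only an illustration: a regular expression stands in for the LLM router and for argument extraction, and both expert modules (`calculator`, `language_model`) are hypothetical placeholders.

```python
import operator
import re
from typing import Callable, Dict

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
EXPR = re.compile(r"(\d+(?:\.\d+)?)\s*([-+*/])\s*(\d+(?:\.\d+)?)")


def calculator(query: str) -> str:
    """Symbolic expert: extracts `a op b` from the query and evaluates it."""
    match = EXPR.search(query)
    if match is None:
        return "calculator: no expression found"
    left, op, right = float(match.group(1)), match.group(2), float(match.group(3))
    if op == "/" and right == 0:
        return "calculator: division by zero"
    return str(OPS[op](left, right))


def language_model(query: str) -> str:
    """Neural expert placeholder; a real system would call a model here."""
    return f"[free-text answer to: {query}]"


EXPERTS: Dict[str, Callable[[str], str]] = {
    "calculator": calculator,
    "language_model": language_model,
}


def route(query: str) -> str:
    """Stand-in for the LLM router: choose an expert from surface cues."""
    return "calculator" if EXPR.search(query) else "language_model"


def mrkl_answer(query: str) -> str:
    return EXPERTS[route(query)](query)


print(mrkl_answer("What is 12 * 7?"))
print(mrkl_answer("Tell me about sea otters."))
```

The regex handles "12 * 7" but would miss "twelve times seven", which mirrors the argument-extraction difficulty with verbal math problems reported in the experiment above.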

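TALM (Tool Augmented Language Models; Parisi et al. 2022) and Toolformer (Schick et al. 2023) both fine-tune a language model to learn to use external tool APIs. The dataset is expanded based on whether a newly added API call annotation can improve the quality of model outputs. See the "External APIs" section of the Prompt Engineering post for more details.

ChatGPT Plugins and OpenAI API function calling are good examples of LLMs augmented with tool use capability working in practice. The collection of tool APIs can be provided by other developers (as in Plugins) or self-defined (as in function calls).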
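The self-defined case can be pictured as a registry of described functions plus a dispatcher for the call the model emits. The sketch below does not use or reproduce the OpenAI API: the `{"name": ..., "arguments": {...}}` wire format, the `get_weather` tool and its canned result are all assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(city: str) -> Dict[str, Any]:
    """Hypothetical tool; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}


# Name -> implementation, plus a JSON-schema style description shown to the model.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {"get_weather": get_weather}
TOOL_SPECS: List[Dict[str, Any]] = [
    {
        "name": "get_weather",
        "description": "Look up the weather forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]


def dispatch(call_json: str) -> str:
    """Execute a model-emitted call and return the result as JSON text."""
    call = json.loads(call_json)
    func = TOOLS.get(call.get("name"))
    if func is None:
        return json.dumps({"error": f"unknown tool {call.get('name')}"})
    return json.dumps(func(**call.get("arguments", {})))


print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

The JSON result would then be appended to the conversation so the model can phrase the final answer.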

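HuggingGPT (Shen et al. 2023) is a framework that uses ChatGPT as the task planner to select models available on the HuggingFace platform according to the model descriptions and to summarize the response based on the execution results.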

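Figure 11: How HuggingGPT works

The system comprises 4 stages:

(1) Task planning: The LLM works as the brain and parses the user request into multiple tasks. Four attributes are associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide the LLM in task parsing and planning. Instruction: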

The AI assistant can parse user input to several tasks: [{"task": task, "id", task_id, "dep": dependency_task_ids, "args": {"text": text, "image": URL, "audio": URL, "video": URL}}]. The "dep" field denotes the id of the previous task which generates a new resource that the current task relies on. A special tag "-task_id" refers to the generated text image, audio and video in the dependency task with id as task_id. The task MUST be selected from the following options: {{ Available Task List }}. There is a logical relationship between tasks, please note their order. If the user input can't be parsed, you need to reply empty JSON. Here are several cases for your reference: {{ Demonstrations }}. The chat history is recorded as {{ Chat History }}. From this chat history, you can find the path of the user-mentioned resources for your task planning.
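Once the planner returns such a task list, the "dep" field determines a valid execution order. A minimal sketch of that step follows, assuming the planner output is valid JSON and that any dependency id not present in the list means "no dependency"; the task names in the example are invented. It relies on `graphlib`, which is in the standard library from Python 3.9.

```python
import json
from graphlib import TopologicalSorter
from typing import List


def execution_order(plan_json: str) -> List[str]:
    """Order task types so that every task runs after the tasks it depends on."""
    tasks = {task["id"]: task for task in json.loads(plan_json)}
    # Map each task id to the set of ids it depends on (unknown ids are ignored).
    graph = {
        task_id: {dep for dep in task.get("dep", []) if dep in tasks}
        for task_id, task in tasks.items()
    }
    return [tasks[task_id]["task"] for task_id in TopologicalSorter(graph).static_order()]


plan = json.dumps([
    {"task": "image-to-text", "id": 1, "dep": [0], "args": {"image": "<resource from task 0>"}},
    {"task": "object-detection", "id": 0, "dep": [], "args": {"image": "photo.jpg"}},
    {"task": "text-to-speech", "id": 2, "dep": [1], "args": {"text": "<resource from task 1>"}},
])
order = execution_order(plan)
print(order)
```

A cyclic plan makes `static_order()` raise `graphlib.CycleError`, which is one way malformed planner output can be caught before any model is run.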

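(2) Model selection: The LLM distributes the tasks to expert models, where the request is framed as a multiple-choice question. The LLM is presented with a list of models to choose from. Due to the limited context length, task-type-based filtering is needed. Instruction: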

Given the user request and the call command, the AI assistant helps the user to select a suitable model from a list of models to process the user request. The AI assistant merely outputs the model id of the most appropriate model. The output must be in a strict JSON format: "id": "id", "reason": "your detail reason for the choice". We have a list of models for you to choose from {{ Candidate Models }}. Please select one model from the list.

(3) Task execution: Expert models execute the specific tasks and log the results. Instruction:

With the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.

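(4) Response generation: The LLM receives the execution results and provides summarized results to users.

To put HuggingGPT into real-world usage, a couple of challenges need to be solved: (1) efficiency improvement is needed, as both LLM inference rounds and interactions with other models slow down the process; (2) it relies on a long context window to communicate over complicated task content; (3) stability improvement is needed for LLM outputs and external model services.

API-Bank (Li et al. 2023) is a benchmark for evaluating the performance of tool-augmented LLMs. It contains 53 commonly used API tools, a complete tool-augmented LLM workflow, and 264 annotated dialogues that involve 568 API calls. The selection of APIs is quite diverse, including search engines, calculator, calendar queries, smart home control, schedule management, health data management, account authentication workflow and more. Because there are a large number of APIs, the LLM first has access to an API search engine to find the right API to call and then uses the corresponding documentation to make the call.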

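Figure 12: The API-Bank algorithm workflow

In the API-Bank workflow, the LLM needs to make a couple of decisions, and at each step we can evaluate how accurate that decision is. Decisions include:

Whether an API call is needed.
Identifying the right API to call: if the result is not good enough, the LLM needs to iteratively modify the API inputs (e.g. deciding search keywords for a search engine API).
Responding based on the API results: the model can choose to refine and call again if the results are not satisfactory.

This benchmark evaluates the agent's tool use capabilities at three levels:

Level-1 evaluates the ability to call the API. Given an API's description, the model needs to determine whether to call the given API, call it correctly, and respond properly to the API's return values.
Level-2 examines the ability to retrieve the API. The model needs to search for possible APIs that may solve the user's requirement and learn how to use them by reading documentation.
Level-3 assesses the ability to plan API calls beyond retrieval and calling. Given unclear user requests (e.g. schedule group meetings, book flight/hotel/restaurant for a trip), the model may have to conduct multiple API calls to solve it.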
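The retrieval step tested at Level-2 can be illustrated with a toy API search engine. This is a sketch under assumptions, not API-Bank's retriever: keyword overlap replaces whatever ranking the benchmark uses, and the three API names and descriptions are invented.

```python
from typing import Dict, List

# Hypothetical API catalogue: name -> one-line description.
API_DOCS: Dict[str, str] = {
    "calculator": "evaluate arithmetic expressions",
    "calendar_query": "look up events in a calendar by date",
    "smart_home": "control lights and thermostat at home",
}


def search_api(request: str, docs: Dict[str, str], top_k: int = 1) -> List[str]:
    """Rank APIs by how many words the request shares with each description."""
    words = set(request.lower().split())
    ranked = sorted(
        docs,
        key=lambda name: len(words & set(docs[name].split())),
        reverse=True,
    )
    return ranked[:top_k]


hits = search_api("What events are in my calendar on this date", API_DOCS)
print(hits)
```

After retrieval, the model would read the selected API's documentation and decide on the call arguments, which corresponds to the calling ability tested at Level-1.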

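Case Studies

Scientific Discovery Agent

ChemCrow (Bran et al. 2023) is a domain-specific example in which an LLM is augmented with 13 expert-designed tools to accomplish tasks across organic synthesis, drug discovery, and materials design. The workflow, implemented in LangChain, reflects what was previously described for ReAct and MRKL and combines CoT reasoning with tools relevant to the tasks:

The LLM is provided with a list of tool names, descriptions of their utility, and details about the expected input/output.
It is then instructed to answer a user-given prompt using the tools provided when necessary. The instruction suggests that the model follow the ReAct format: Thought, Action, Action Input, Observation.

One interesting observation is that while the LLM-based evaluation concluded that GPT-4 and ChemCrow perform nearly equivalently, human evaluation oriented towards the completion and chemical correctness of the solutions showed that ChemCrow outperforms GPT-4 by a large margin. This indicates a potential problem with using an LLM to evaluate its own performance in domains that require deep expertise. The lack of expertise may cause the LLM to be unaware of its flaws and thus unable to judge the correctness of task results well.

Boiko et al. (2023) also looked into LLM-empowered agents for scientific discovery, to handle autonomous design, planning, and execution of complex scientific experiments. This agent can use tools to browse the Internet, read documentation, execute code, call robotics experimentation APIs and leverage other LLMs.

For example, when asked to "develop a novel anticancer drug", the model came up with the following reasoning steps:

inquired about current trends in anticancer drug discovery;
selected a target;
requested a scaffold targeting these compounds;
once the compound was identified, the model attempted its synthesis.

They also discussed the risks, especially with illicit drugs and bioweapons. They developed a test set containing a list of known chemical weapon agents and asked the agent to synthesize them. 4 out of 11 requests (36%) were accepted to obtain a synthesis solution, and the agent attempted to consult documentation to execute the procedure. 7 out of 11 were rejected; among these 7 rejected cases, 5 happened after a web search while 2 were rejected based on the prompt only.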

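Generative Agents Simulation

Generative Agents (Park et al. 2023) is a super fun experiment in which 25 virtual characters, each controlled by an LLM-powered agent, live and interact in a sandbox environment inspired by The Sims. Generative agents create believable simulacra of human behavior for interactive applications. The design of generative agents combines an LLM with memory, planning and reflection mechanisms, enabling agents to behave conditioned on past experience and to interact with other agents.

Memory stream: a long-term memory module (external database) that records a comprehensive list of the agent's experiences in natural language. Each element is an observation, an event directly provided by the agent. Inter-agent communication can trigger new natural language statements.
Retrieval model: surfaces context to inform the agent's behavior, according to relevance, recency and importance.

Recency: recent events have higher scores.
Importance: distinguishes mundane memories from core memories. Ask the LLM directly.
Relevance: based on how related the memory is to the current situation / query.

Reflection mechanism: synthesizes memories into higher-level inferences over time and guides the agent's future behavior. These are higher-level summaries of past events (note that this is a bit different from the self-reflection above).

Prompt the LLM with the 100 most recent observations and ask it to generate the 3 most salient high-level questions given that set of observations/statements. Then ask the LLM to answer those questions.

Planning and reacting: translates the reflections and the environment information into actions.

Planning is essentially about optimizing believability at the moment vs. over time.
Prompt template: {Intro of an agent X}. Here is X's plan today in broad strokes: 1)
Relationships between agents and observations of one agent by another are all taken into consideration for planning and reacting.
Environment information is presented in a tree structure.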
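The retrieval model described above combines three signals into one ranking. The sketch below shows one way to do that; the exponential decay rate, the equal weighting of the three terms, the `[0, 1]` importance scale and the two-dimensional embeddings are all assumptions made for illustration rather than details given in this post.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Memory:
    text: str
    hours_ago: float            # time since the memory was last accessed
    importance: float           # e.g. an LLM rating rescaled to [0, 1]
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieval_score(memory: Memory, query: Sequence[float], decay: float = 0.99) -> float:
    """Sum of recency, importance and relevance (equal weights assumed)."""
    recency = decay ** memory.hours_ago
    relevance = cosine(memory.embedding, query)
    return recency + memory.importance + relevance


def retrieve(memories: List[Memory], query: Sequence[float], k: int) -> List[Memory]:
    return sorted(memories, key=lambda m: retrieval_score(m, query), reverse=True)[:k]


memories = [
    Memory("ate breakfast", hours_ago=1.0, importance=0.1, embedding=[1.0, 0.0]),
    Memory("planning a party", hours_ago=5.0, importance=0.8, embedding=[0.0, 1.0]),
]
top = retrieve(memories, query=[0.0, 1.0], k=1)
print(top[0].text)
```

The older memory wins here because its importance and relevance outweigh the small recency advantage of the mundane one.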

This fun simulation results in emergent social behavior, such as information diffusion, relationship memory (e.g. two agents continuing a conversation topic) and coordination of social events (e.g. hosting a party and inviting many others).

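Figure 13: The generative agent framework

Proof-of-Concept Examples

AutoGPT has drawn a lot of attention to the possibility of setting up autonomous agents with an LLM as the main controller. It has quite a lot of reliability issues given the natural language interface, but it is nevertheless a cool proof-of-concept demo. A lot of code in AutoGPT is about format parsing.

Here is the system message used by AutoGPT, where {{...}} are user inputs: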

You are {{ai-name}}, {{user-provided AI bot description}}.
Your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.

GOALS:

1. {{user-provided goal 1}}
2. {{user-provided goal 2}}
3. ...
4. ...
5. ...

Constraints:
1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.
2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.
3. No user assistance
4. Exclusively use the commands listed in double quotes e.g. "command name"
5. Use subprocesses for commands that will not terminate within a few minutes

Commands:
1. Google Search: "google", args: "input": "<search>"
2. Browse Website: "browse_website", args: "url": "<url>", "question": "<what_you_want_to_find_on_website>"
3. Start GPT Agent: "start_agent", args: "name": "<name>", "task": "<short_task_desc>", "prompt": "<prompt>"
4. Message GPT Agent: "message_agent", args: "key": "<key>", "message": "<message>"
5. List GPT Agents: "list_agents", args:
6. Delete GPT Agent: "delete_agent", args: "key": "<key>"
7. Clone Repository: "clone_repository", args: "repository_url": "<url>", "clone_path": "<directory>"
8. Write to file: "write_to_file", args: "file": "<file>", "text": "<text>"
9. Read file: "read_file", args: "file": "<file>"
10. Append to file: "append_to_file", args: "file": "<file>", "text": "<text>"
11. Delete file: "delete_file", args: "file": "<file>"
12. Search Files: "search_files", args: "directory": "<directory>"
13. Analyze Code: "analyze_code", args: "code": "<full_code_string>"
14. Get Improved Code: "improve_code", args: "suggestions": "<list_of_suggestions>", "code": "<full_code_string>"
15. Write Tests: "write_tests", args: "code": "<full_code_string>", "focus": "<list_of_focus_areas>"
16. Execute Python File: "execute_python_file", args: "file": "<file>"
17. Generate Image: "generate_image", args: "prompt": "<prompt>"
18. Send Tweet: "send_tweet", args: "text": "<text>"
19. Do Nothing: "do_nothing", args:
20. Task Complete (Shutdown): "task_complete", args: "reason": "<reason>"

Resources:
1. Internet access for searches and information gathering.
2. Long Term memory management.
3. GPT-3.5 powered Agents for delegation of simple tasks.
4. File output.

Performance Evaluation:
1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
2. Constructively self-criticize your big-picture behavior constantly.
3. Reflect on past decisions and strategies to refine your approach.
4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.

You should only respond in JSON format as described below
Response Format:
{
    "thoughts": {
        "text": "thought",
        "reasoning": "reasoning",
        "plan": "- short bulleted\n- list that conveys\n- long-term plan",
        "criticism": "constructive self-criticism",
        "speak": "thoughts summary to say to user"
    },
    "command": {
        "name": "command name",
        "args": {
            "arg name": "value"
        }
    }
}
Ensure the response can be parsed by Python json.loads
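Because the reply must survive `json.loads`, a consumer of this format typically wraps parsing in defensive checks. The sketch below shows the general shape of such a parser; it is not AutoGPT's code, the fallback to `do_nothing` on malformed output is a design choice made here, and only a subset of the commands listed above is whitelisted.

```python
import json
from typing import Any, Dict, Tuple

ALLOWED_COMMANDS = {
    "google", "browse_website", "write_to_file", "read_file",
    "do_nothing", "task_complete",
}


def parse_response(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Return (command name, args); fall back to do_nothing on bad output."""
    try:
        data = json.loads(raw)
        command = data["command"]
        name, args = command["name"], command.get("args", {})
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return "do_nothing", {}
    # Reject commands outside the whitelist and non-object argument payloads.
    if not isinstance(name, str) or name not in ALLOWED_COMMANDS:
        return "do_nothing", {}
    if not isinstance(args, dict):
        return "do_nothing", {}
    return name, args


reply = '{"thoughts": {"text": "search first"}, "command": {"name": "google", "args": {"input": "llm agents"}}}'
print(parse_response(reply))
```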

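GPT-Engineer is another project that creates a whole repository of code given a task specified in natural language. GPT-Engineer is instructed to think over a list of smaller components to build and to ask for user input to clarify questions as needed.

Here is a sample conversation for task clarification sent to the OpenAI ChatCompletion endpoint used by GPT-Engineer. The user inputs are wrapped in {{user input text}}.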

[
  {
    "role": "system",
    "content": "You will read instructions and not carry them out, only seek to clarify them.\nSpecifically you will first summarise a list of super short bullets of areas that need clarification.\nThen you will pick one clarifying question, and wait for an answer from the user.\n"
  },
  {
    "role": "user",
    "content": "We are writing {{a Super Mario game in python. MVC components split in separate files. Keyboard control.}}\n"
  },
  {
    "role": "assistant",
    "content": "Summary of areas that need clarification:\n1. Specifics of the Super Mario game (e.g. level design, characters, gameplay mechanics)\n2. Details about the MVC components (e.g. which components are in each file)\n3. Keyboard control implementation (e.g. which keys to use, how to handle input)\n\nClarifying question:\nCan you provide more details about the Super Mario game, such as level design, characters, and gameplay mechanics?"
  },
  {
    "role": "user",
    "content": "{{There are 10 levels in total. The main character is a plumber named Mario, who can walk and jump. It is a classical platform game just like Super Mario. The main character moves from left to right, trying to get to the destination, where there are many obstacles and attacks from enemies in the process.}}\n\nIs anything else unclear? If yes, only answer in the form:\n{remaining unclear areas} remaining questions.\n{Next question}\nIf everything is sufficiently clear, only answer \"Nothing more to clarify.\"."
  },
  {
    "role": "assistant",
    "content": "Remaining unclear areas: 2 remaining questions.\nCan you provide more information about how the MVC components are split into separate files?"
  },
  {
    "role": "user",
    "content": "{{Make your own assumptions and state them explicitly before starting}}"
  }
]

Then, after these clarifications, the agent moved into code-writing mode with a different system message. System message:

You will get instructions for code to write. You will write a very long answer. Make sure that every detail of the architecture is, in the end, implemented as code. Make sure that every detail of the architecture is, in the end, implemented as code.
Think step by step and reason yourself to the right decisions to make sure we get it right. You will first lay out the names of the core classes, functions, methods that will be necessary, as well as a quick comment on their purpose.

Then you will output the content of each file including ALL code. Each file must strictly follow a markdown code block format, where the following tokens must be replaced such that FILENAME is the lowercase file name including the file extension, LANG is the markup code block language for the code’s language, and CODE is the code:

FILENAME

CODE
You will start with the “entrypoint” file, then go to the ones that are imported by that file, and so on. Please note that the code should be fully functional. No placeholders.

Follow a language and framework appropriate best practice file naming convention. Make sure that files contain all imports, types etc. Make sure that code in different files are compatible with each other. Ensure to implement all code, if you are unsure, write a plausible implementation. Include module dependency or package manager dependency definition file. Before you finish, double check that all parts of the architecture is present in the files.

Useful to know: You almost always put different classes in different files. For Python, you always create an appropriate requirements.txt file. For NodeJS, you always create an appropriate package.json file. You always add a comment briefly describing the purpose of the function definition. You try to add comments explaining very complex bits of logic. You always follow the best practices for the requested languages in terms of describing the code written as a defined package/project.

Python toolbelt preferences:

pytest
dataclasses

Conversation samples:

[
  {
    "role": "system",
    "content": "You will get instructions for code to write.\nYou will write a very long answer. Make sure that every detail of the architecture is, in the end, implemented as code.\nMake sure that every detail of the architecture is, in the end, implemented as code.\n\nThink step by step and reason yourself to the right decisions to make sure we get it right.\nYou will first lay out the names of the core classes, functions, methods that will be necessary, as well as a quick comment on their purpose.\n\nThen you will output the content of each file including ALL code.\nEach file must strictly follow a markdown code block format, where the following tokens must be replaced such that\nFILENAME is the lowercase file name including the file extension,\nLANG is the markup code block language for the code's language, and CODE is the code:\n\nFILENAME\n```LANG\nCODE\n```\n\nYou will start with the \"entrypoint\" file, then go to the ones that are imported by that file, and so on.\nPlease note that the code should be fully functional. No placeholders.\n\nFollow a language and framework appropriate best practice file naming convention.\nMake sure that files contain all imports, types etc. Make sure that code in different files are compatible with each other.\nEnsure to implement all code, if you are unsure, write a plausible implementation.\nInclude module dependency or package manager dependency definition file.\nBefore you finish, double check that all parts of the architecture is present in the files.\n\nUseful to know:\nYou almost always put different classes in different files.\nFor Python, you always create an appropriate requirements.txt file.\nFor NodeJS, you always create an appropriate package.json file.\nYou always add a comment briefly describing the purpose of the function definition.\nYou try to add comments explaining very complex bits of logic.\nYou always follow the best practices for the requested languages in terms of describing the code written as a defined\npackage/project.\n\n\nPython toolbelt preferences:\n- pytest\n- dataclasses\n"
  },
 #  … same conversation as earlier, ended with "Make your own assumptions and state them explicitly before starting".
  {
    "role": "assistant",
    "content": "Assumptions:\n1. Model: The model will contain the game's data, such as level information, character states, and enemy positions.\n2. View: The view will handle the game's visuals, including rendering the game objects, backgrounds, and updating the display.\n3. Controller: The controller will manage user input, such as keyboard controls, and update the model accordingly.\n\nWith these assumptions in mind, is there any specific aspect of the keyboard control implementation you would like me to clarify?"
  },
  {
    "role": "user",
    "content": "Please now remember the steps:\n\nThink step by step and reason yourself to the right decisions to make sure we get it right.\nFirst lay out the names of the core classes, functions, methods that will be necessary, As well as a quick comment on their purpose.\n\nThen you will output the content of each file including ALL code.\nEach file must strictly follow a markdown code block format, where the following tokens must be replaced such that\nFILENAME is the lowercase file name including the file extension,\nLANG is the markup code block language for the code's language, and CODE is the code:\n\nFILENAME\n```LANG\nCODE\n```\n\nPlease note that the code should be fully functional. No placeholders.\n\nYou will start with the \"entrypoint\" file, then go to the ones that are imported by that file, and so on.\nFollow a language and framework appropriate best practice file naming convention.\nMake sure that files contain all imports, types etc. The code should be fully functional. Make sure that code in different files are compatible with each other.\nBefore you finish, double check that all parts of the architecture is present in the files.\n"
  }
]
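The FILENAME / LANG / CODE convention requested in the prompts above is what makes the reply machine-readable. A rough sketch of a parser for that convention follows; the regular expression is an assumption written for this illustration, not GPT-Engineer's own, and the fence string is built at runtime only so that this example can sit inside a fenced block itself.

```python
import re
from typing import Dict

FENCE = "`" * 3  # three backticks, assembled at runtime


def parse_files(reply: str) -> Dict[str, str]:
    """Map each FILENAME to the CODE in the fenced block that follows it."""
    pattern = re.compile(
        r"^(\S+)\n" + FENCE + r"[^\n]*\n(.*?)\n" + FENCE,
        re.DOTALL | re.MULTILINE,
    )
    return {name: code for name, code in pattern.findall(reply)}


# A made-up model reply containing two files.
reply = (
    "Here are the files.\n\n"
    "main.py\n" + FENCE + "python\nprint('hello')\n" + FENCE + "\n\n"
    "requirements.txt\n" + FENCE + "\npytest\n" + FENCE + "\n"
)
files = parse_files(reply)
print(sorted(files))
```

Each entry could then be written to disk under its file name, which is how a reply in this format becomes a repository.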

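Challenges

Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.
Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust than humans, who learn from trial and error.
Reliability of the natural language interface: Current agent systems rely on natural language as the interface between LLMs and external components such as memory and tools. However, the reliability of model outputs is questionable, as LLMs may make formatting errors and occasionally exhibit rebellious behavior (e.g. refuse to follow an instruction). Consequently, much of the agent demo code focuses on parsing model output.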

This article is reposted from the WeChat official account AIGC最前線.

Original link: https://mp.weixin.qq.com/s/N4z6wyJTEpk890iIALjfiQ
