DeepSeek-R1 GUI Agent Guide

The rise of open-source large language models (LLMs) has made it easier than ever to build AI-powered tools that rival proprietary solutions such as OpenAI's ChatGPT Operator.

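Among these open-source models, DeepSeek R1 stands out for its strong reasoning ability, free availability, and adaptability. By combining DeepSeek R1 with tools such as Browser Use, you can build a capable, fully open-source alternative to ChatGPT Operator without paying hundreds of dollars for a premium subscription.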

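This article walks you through setting up DeepSeek R1 and Browser Use to create AI agents that can perform complex tasks, including web automation, reasoning, and natural-language interaction.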

Whether you are a beginner or an experienced developer, this step-by-step guide will help you get started.

1. Overview of ChatGPT Operator

ChatGPT Operator is a premium OpenAI offering: an AI agent that can carry out complex tasks such as reasoning, web automation, and multi-step problem solving.

For example, ChatGPT Operator can book plane tickets on the user's behalf.

2. Why an Open-Source Alternative to ChatGPT Operator?

Although ChatGPT Operator is powerful, several limitations make open-source alternatives attractive:

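Cost: the $200/month subscription may be too expensive for many users.
Data privacy: using a proprietary API means sending data to external servers, which may conflict with privacy policies or regulatory requirements.
Limited customization: proprietary solutions usually restrict fine-tuning and task-specific optimization, which limits how well they adapt to specialized use cases.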

Choosing open-source tools such as DeepSeek R1 and Browser Use overcomes these challenges and brings several benefits:

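Cost savings: the DeepSeek R1 model weights and Browser Use are both free and open source, so there is no subscription fee (the hosted DeepSeek API, if you use it, is billed per token).
Full control: hosting the tools locally or on your own servers keeps your data private and secure.
Customizability: you can fine-tune the model for specific tasks, integrate it with other tools, and modify the system to meet your own requirements.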

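An open-source approach reduces your dependence on proprietary platforms. It also lets you build solutions tailored to your needs while you keep control over cost and data.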

3. Key Components

Our open-source alternative uses two key components: DeepSeek R1 and Browser Use.

3.1 DeepSeek R1

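DeepSeek R1 is an open-source LLM optimized for reasoning tasks. It excels at chain-of-thought problem solving, coding assistance, and natural-language understanding. The full model is very large (671B parameters in a mixture-of-experts design). DeepSeek also released smaller distilled versions (e.g., 1.5B and 7B parameters), so you can choose a size that fits your hardware.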

3.2 Browser Use

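Browser Use is an open-source tool that lets AI agents perform browser-based tasks such as web scraping, form filling, and automated navigation. It offers a user-friendly web interface and can be integrated with LLMs such as DeepSeek R1.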

4. Setting Up the Environment

Hardware requirements:

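For smaller versions of DeepSeek R1 (e.g., 1.5B parameters), a CPU or a mid-range GPU (8 GB VRAM) is sufficient.
Larger versions need high-end GPUs (e.g., an NVIDIA RTX 4090 or A100). The largest distilled model (70B) generally needs more memory than a single 24 GB card provides.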

Operating system:

Linux or macOS is recommended for easy setup. Windows users can use WSL (Windows Subsystem for Linux).

Python environment:

Create a Python virtual environment to isolate dependencies:

python -m venv venv
source venv/bin/activate  # On Linux/macOS
# On Windows:
# venv\Scripts\activate
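You may want to confirm from Python that the virtual environment is actually active before installing packages. A minimal check compares `sys.prefix` with `sys.base_prefix`. The helper name `in_virtualenv` is our own:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
```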

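Install the libraries needed to run models directly with Hugging Face Transformers. This step is optional if you only use the DeepSeek API or Ollama, both described below: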

pip install torch torchvision transformers sentencepiece

5. Accessing DeepSeek-R1

There are two ways to access DeepSeek-R1: through the cloud API, or by running the DeepSeek-R1 model locally with Ollama.

5.1 Using the DeepSeek API

To interact with the DeepSeek API, follow these steps:

First, sign up on the DeepSeek platform and generate an API key in the "API Keys" section. Store the key securely, because it will not be shown again.

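Then make your first API call. The DeepSeek API is compatible with OpenAI's API format, so it works with the existing OpenAI SDK and other OpenAI-compatible software. Here is an example in Python: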

from openai import OpenAI

client = OpenAI(api_key="<Your_DeepSeek_API_Key>", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # Use 'deepseek-reasoner' for DeepSeek-R1
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum entanglement."}
    ],
    stream=False  # Set to True if you want streaming responses
)

print(response.choices[0].message.content)
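`deepseek-reasoner` returns its chain of thought in a separate `reasoning_content` field, so multi-turn conversations need a little care. DeepSeek's reasoning-model guide states that `reasoning_content` must not be sent back in later requests. The sketch below keeps only the final answer in the history. It assumes the key is stored in an environment variable named `DEEPSEEK_API_KEY`. The helper names `next_messages` and `ask` are hypothetical:

```python
import os


def next_messages(history: list[dict], assistant_reply: dict, user_text: str) -> list[dict]:
    """Return a new message list extended with one answer and the next question.

    Only the final answer ("content") is kept. The reasoning trace
    ("reasoning_content") is dropped because DeepSeek's docs say it must
    not be sent back as input.
    """
    return history + [
        {"role": "assistant", "content": assistant_reply["content"]},
        {"role": "user", "content": user_text},
    ]


def ask(messages: list[dict]) -> dict:
    """Call deepseek-reasoner and return both the answer and its reasoning."""
    from openai import OpenAI  # imported here so next_messages works without the SDK

    client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                    base_url="https://api.deepseek.com")
    response = client.chat.completions.create(model="deepseek-reasoner",
                                              messages=messages)
    message = response.choices[0].message
    return {
        "content": message.content,
        # DeepSeek-specific field; getattr avoids an error if it is absent.
        "reasoning_content": getattr(message, "reasoning_content", None),
    }


if __name__ == "__main__":
    history = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
    first = ask(history)
    history = next_messages(history, first, "How many Rs are in 'strawberry'?")
    print(ask(history)["content"])
```

`next_messages` returns a new list rather than mutating `history`, which makes it easy to branch a conversation.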

If you prefer cURL, you can send the request like this:

curl https://api.deepseek.com/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <Your_DeepSeek_API_Key>" \
-d '{
    "model": "deepseek-reasoner",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "stream": false
}'

Choose the model as follows:

Specify model="deepseek-reasoner" for DeepSeek-R1.
Use model="deepseek-chat" for general chat tasks.

For OpenAI-compatible configurations, base_url can also be set to https://api.deepseek.com/v1. The /v1 path has nothing to do with the model version.
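Only the `model` field chooses between DeepSeek-R1 and the chat model; the base URL just determines the endpoint. A small request builder makes that split explicit. This is a sketch: `build_request` and the `MODELS` keys are illustrative names, and the function only prepares the URL and JSON body without sending anything:

```python
import json

# Model names from the DeepSeek API; the dictionary keys are illustrative.
MODELS = {"reasoning": "deepseek-reasoner", "chat": "deepseek-chat"}


def build_request(kind: str, question: str,
                  base_url: str = "https://api.deepseek.com") -> tuple[str, str]:
    """Return (endpoint URL, JSON body) for an OpenAI-style chat request."""
    # Both https://api.deepseek.com and https://api.deepseek.com/v1 work;
    # the path does not pick the model, the "model" field does.
    url = base_url.rstrip("/") + "/chat/completions"
    body = {
        "model": MODELS[kind],
        "messages": [{"role": "user", "content": question}],
        "stream": False,
    }
    return url, json.dumps(body)
```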

5.2 Running DeepSeek-R1 Locally with Ollama

Ollama makes it easy to run large language models such as DeepSeek-R1 on your local machine. Here is how to set it up and use it.

First, download and install Ollama from its official website.

Then download a specific version of DeepSeek-R1 with one of the following commands:

# For the 7B model:
ollama pull deepseek-r1:7b

# For a smaller 1.5B model:
ollama pull deepseek-r1:1.5b

# For larger models like 70B:
ollama pull deepseek-r1:70b
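To check from code which models are already pulled, Ollama's local REST API exposes `GET /api/tags` on its default port 11434. Below is a minimal standard-library sketch. The helper names are our own:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def parse_model_names(tags_json: str) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    data = json.loads(tags_json)
    return [model["name"] for model in data.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout) as resp:
        return parse_model_names(resp.read().decode("utf-8"))


if __name__ == "__main__":
    names = list_local_models()
    print("deepseek-r1:7b pulled:", "deepseek-r1:7b" in names)
```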

Once downloaded, run the model with:

ollama run deepseek-r1:7b

This starts an interactive session in which you can chat with the model directly.

Model variants

The smaller DeepSeek-R1 models on Ollama are distilled versions built on the Qwen and Llama architectures, so the size tags above already select them. For example:

DeepSeek-R1-Distill-Qwen-7B:

ollama run deepseek-r1:7b

DeepSeek-R1-Distill-Llama-70B:

ollama run deepseek-r1:70b

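Hardware considerations:

Smaller models (such as 1.5B or 7B) can run on consumer GPUs or even CPUs.
Larger models (e.g., 70B) need high-end GPUs with plenty of VRAM (e.g., an NVIDIA A100). A single 24 GB RTX 4090 can run a quantized 70B model only by offloading part of it to the CPU, which is much slower.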
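A rough rule of thumb explains these requirements: the weights alone take about (parameter count × bits per weight ÷ 8) bytes. The sketch below computes that estimate. It deliberately ignores the KV cache, activations, and runtime overhead, so actual memory use is higher:

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough size of the model weights alone, in gigabytes (10**9 bytes).

    Excludes the KV cache, activations and runtime overhead.
    """
    # (params_billion * 1e9 weights) * (bits / 8 bytes) / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for size in (1.5, 7, 70):
        print(f"{size}B @ 4-bit: ~{weight_memory_gb(size, 4):.1f} GB")
```

For example, a 70B model quantized to 4 bits needs about 35 GB just for its weights, which is more than a 24 GB RTX 4090 holds. A 7B model at 4 bits needs about 3.5 GB.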

Interactive chat through the API

Ollama provides an API for integrating locally running models into your own applications:

curl http://localhost:11434/api/chat -d '{
    "model": "deepseek-r1:7b",
    "messages": [
        {"role": "user", "content": "Write a short poem about the stars."}
    ],
    "stream": false
}'
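The same endpoint can be called from Python with only the standard library. The sketch below assumes Ollama is on its default port and sets `"stream": false` so the reply arrives as a single JSON object. R1 models also emit their reasoning. With many Ollama versions it appears inline between `<think>` and `</think>` tags in the content, though newer versions may return it separately. A small helper strips that block when you only want the answer. The names `strip_think` and `ollama_chat` are our own:

```python
import json
import re
import urllib.request

# Non-greedy match across newlines for the inline reasoning block.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Drop an inline <think>...</think> reasoning block and trim whitespace."""
    return THINK_BLOCK.sub("", text).strip()


def ollama_chat(prompt: str, model: str = "deepseek-r1:7b",
                base_url: str = "http://localhost:11434") -> str:
    """Send one chat message to a local Ollama server and return the reply text."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of streamed chunks
    }).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/api/chat",
        data=payload,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["message"]["content"]


if __name__ == "__main__":
    reply = ollama_chat("Write a short poem about the stars.")
    print(strip_think(reply))
```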

6. Installing Browser Use

Browser Use lets your AI agent interact with a web browser. Follow the steps below.

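First, clone the Browser Use WebUI repository from GitHub, then install its dependencies and the Playwright browsers:

git clone https://github.com/browser-use/web-ui.git
cd web-ui
pip install -r requirements.txt
playwright install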

Then launch the Browser Use WebUI:

python webui.py

Open the WebUI in your browser to configure the agent settings. You can specify:

The LLM model (e.g., DeepSeek R1)
Browser settings (e.g., window size)

7. Combining DeepSeek R1 and Browser Use

To create a working AI agent that integrates both tools:

7.1 Agent Configuration

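Modify the agent settings in Browser Use to connect it to DeepSeek R1. The settings look roughly like this; the exact field names depend on your Browser Use version: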

{
  "model": "deepseek-r1:7b",
  "base_url": "http://localhost:11434",
  "browser_settings": {
    "window_height": 1080,
    "window_width": 1920,
    "keep_browser_open": true
  }
}
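If you prefer to generate these settings from code, a dataclass keeps the defaults in one place. This is a hypothetical helper: `AgentConfig` and `BrowserSettings` are our names, and the field names simply mirror the JSON above rather than a documented Browser Use schema. It assumes the model is served by Ollama on its default port 11434:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BrowserSettings:
    window_height: int = 1080
    window_width: int = 1920
    keep_browser_open: bool = True


@dataclass
class AgentConfig:
    # Hypothetical schema mirroring the JSON example; not an official Browser Use format.
    model: str = "deepseek-r1:7b"
    base_url: str = "http://localhost:11434"  # Ollama's default port
    browser_settings: BrowserSettings = field(default_factory=BrowserSettings)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses into plain dictionaries.
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    print(AgentConfig().to_json())
```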

7.2 Running the Agent

Start DeepSeek R1 and Browser Use:

# Start the local model server (Ollama) if it is not already running
ollama serve

# Start Browser Use WebUI
python webui.py
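Before handing tasks to the agent, you can verify that both services respond. Below is a minimal standard-library sketch. The URLs are assumptions: 11434 is Ollama's default port, and 7788 is the port the WebUI has used by default. Adjust both to your setup:

```python
import urllib.error
import urllib.request

# Assumed addresses; change them to match your setup.
SERVICES = {
    "Ollama": "http://127.0.0.1:11434/",
    "Browser Use WebUI": "http://127.0.0.1:7788/",
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` without a 5xx error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx/3xx status.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name}: {'up' if is_up(url) else 'down'}")
```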

Once both services are running, the agent can fill in forms, scrape data, or browse websites autonomously.

8. Prompt Engineering

To optimize your AI agent's performance, use prompt engineering techniques. For example:

8.1 General Prompt Template

<instructions>
You are an AI assistant tasked with automating web tasks using Browser Use.
Follow these steps:
1. Navigate to [website].
2. Perform [specific task].
3. Return results in a structured format.
</instructions>
<example>
Navigate to https://example.com and extract all hyperlinks.
</example>

This structure keeps the instructions clear and improves the accuracy of task execution.
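Filling the template programmatically keeps the structure consistent across tasks. Here is a minimal sketch using Python's `str.format`. The placeholder names `website`, `task`, and `example` are our own stand-ins for the bracketed slots in the template:

```python
TEMPLATE = """<instructions>
You are an AI assistant tasked with automating web tasks using Browser Use.
Follow these steps:
1. Navigate to {website}.
2. Perform {task}.
3. Return results in a structured format.
</instructions>
<example>
{example}
</example>"""


def build_prompt(website: str, task: str,
                 example: str = "Navigate to https://example.com and extract all hyperlinks.") -> str:
    """Fill the general prompt template with a concrete site, task, and example."""
    return TEMPLATE.format(website=website, task=task, example=example)


if __name__ == "__main__":
    print(build_prompt("https://example.com", "a search for open-source LLMs"))
```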

You can try some demos from a checkout of the main browser-use repository by running:

uv pip install gradio

python examples/gradio_demo.py

8.2 Example 1

Prompt:

Write a letter in Google Docs to my Papa, thanking him for everything, and save the document as a PDF.

8.3 Example 2

Prompt:

Find flights on kayak.com from Zurich to Beijing from 25.12.2024 to 02.02.2025.

8.4 Example 3

Prompt:

Read my CV & find ML jobs, save them to a file, and then start applying for them in new tabs, if you need help, ask me.

9. Conclusion

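By combining DeepSeek R1 with Browser Use, you can build a fully functional alternative to ChatGPT Operator that is free, open source, and highly customizable. This setup saves money and gives you full control over data privacy and system behavior.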

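You can use this guide as a starting point for automating web tasks, building conversational agents, or experimenting with advanced AI capabilities such as retrieval-augmented generation. Embrace the power of open source and build your own intelligent assistant today!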

Original article: How to Use DeepSeek R1 to Build an Open Source ChatGPT Operator Alternative

Translated and compiled by 汇智网; please credit the source when reposting.