A DeepSeek-R1-Powered Financial Analyst

We will focus on building an agent dedicated to extracting relevant news insights, using DeepSeek-R1 to deliver comprehensive market insight.

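In today's fast-paced financial markets, access to accurate and timely information is essential for making sound investment decisions. Imagine an AI financial analyst that can analyze stock data, extract relevant news insights, and synthesize actionable recommendations, all in real time.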

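In the previous part, we built a financial analyst that analyzes stock data. Now we focus on an agent dedicated to extracting relevant news insights. The agent uses Yahoo Finance (through the yfinance library) to fetch stock-related news, MarkItDown to scrape web pages, DeepSeek-R1 as the LLM that does the reasoning, and LangChain to build the application. With these tools, we aim to build a streamlined workflow that delivers comprehensive market insight and helps investors make informed decisions.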

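This news-insight agent will:

  • Fetch news URLs: use Yahoo Finance to collect news URLs relevant to the given stock.
  • Extract news content: use Microsoft's MarkItDown to extract the full text from the fetched URLs.
  • Analyze and summarize: use DeepSeek-R1 to reason over and analyze the extracted content in depth.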

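All the libraries and models used in this project are open source and free to use, and none of them requires an API key, which keeps setup simple for developers and hobbyists:

  • Yahoo Finance: fetches news URLs and other stock-related data (accessed through the yfinance library).
  • MarkItDown: a Microsoft tool, used here to extract text content from news URLs.
  • LangChain: a framework for building applications powered by language models.
  • DeepSeek-R1: an open-source AI model known for its reasoning ability, which we use here to produce structured financial insights. This project uses the 1.5-billion-parameter version (deepseek-r1:1.5b, a smaller model distilled from DeepSeek-R1). It is compact but capable, and suits systems with 8 GB of RAM and no NVIDIA GPU.
  • Python libraries: including pandas, python-dotenv, and datetime, for data handling and environment setup.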

The code for this tutorial is available on GitHub.

1. Set Up the Environment

First, install the required libraries:

pip install -U langgraph langchain langchain-ollama pandas python-dotenv yfinance markitdown
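Before moving on, it can help to confirm two things: that the packages import, and that a local Ollama server is running, because the LLM step later talks to Ollama. The following is a minimal sketch. It assumes Ollama listens on its default address, http://localhost:11434. The helper names missing_modules and ollama_is_up are hypothetical and do not belong to any library.

```python
import importlib.util
import urllib.error
import urllib.request

# Top-level import names of the packages installed above
# (python-dotenv is imported as "dotenv").
REQUIRED_MODULES = ["yfinance", "markitdown", "langchain_core", "langchain_ollama", "pandas", "dotenv"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url (assumed Ollama address)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat the server as unavailable
        return False


if __name__ == "__main__":
    print("Missing packages:", missing_modules(REQUIRED_MODULES) or "none")
    print("Ollama reachable:", ollama_is_up())
```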

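2. Fetch Stock News

This step retrieves the latest news articles about a specific stock from Yahoo Finance.

The get_news(stock) function uses Yahoo Finance to retrieve news articles about the given stock. It keeps only the items classified as stories (contentType='STORY'), then extracts key details such as the title, summary, URL, and publication date.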

import yfinance as yf
import pandas as pd

def get_news(stock: str) -> list:
    """
    Fetch relevant news articles for a given stock ticker.

    Parameters:
    - stock (str): The stock ticker symbol.

    Returns:
    - list: A list of dictionaries containing title, summary, URL, and publication date of relevant news articles.
    """
    try:
        # Fetch the ticker object and retrieve its news
        ticker = yf.Ticker(stock)
        news = ticker.news

        if not news:
            print(f"No news found for {stock}.")
            return []

        # Filter news with contentType='STORY'
        relevant_news = [
            item for item in news if item.get('content', {}).get('contentType') == 'STORY'
        ]

        all_news = []
        for i, item in enumerate(relevant_news):
            try:
                content = item.get('content', {})
                current_news = {
                    'title': content.get('title'),
                    'summary': content.get('summary'),
                    'url': content.get('canonicalUrl', {}).get('url'),
                    'pubdate': content.get('pubDate', '').split('T')[0],
                }
                all_news.append(current_news)
            except Exception as e:
                print(f"Error processing news {i}: {e}")
                continue

        return all_news

    except Exception as e:
        print(f"An error occurred while fetching news for {stock}: {e}")
        return []  # Return an empty list (not None) so callers can always iterate


# news = get_news('SOFI')
# news[0]
# {'title': "This Cathie Wood Fintech Stock Just Hit a New 52-Week High -- but I'm Not Selling a Single Share",
# 'summary': "Cathie Wood's ARK Invest offers several popular exchange-traded funds (ETFs), and they tend to be rather concentrated, with all of them holding three dozen or fewer stocks.  The banking innovator is the sixth-largest holding in the ARK Fintech Innovation ETF (NYSEMKT: ARKF), making up 5% of the fund's total assets.  You'll also find about $95 million worth of SoFi stock in the flagship ARK Innovation ETF (NYSEMKT: ARKK), and it's also worth noting that the SoFi app is the exclusive distribution partner for the ARK Venture Fund (NASDAQMUTFUND: ARKVX), which allows investors to get exposure to companies like SpaceX and OpenAI before their initial public offering.",
# 'url': 'https://www.fool.com/investing/2025/01/25/this-cathie-wood-fintech-stock-just-hit-a-new-52-w/?source=eptyholnk0000202&utm_source=yahoo-host-full&utm_medium=feed&utm_campaign=article&referring_guid=8eb186e2-b253-418b-b12f-4c58f97e79b2',
# 'pubdate': '2025-01-25'}
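ticker.news depends on a live network call, so the parsing logic is easier to check offline against a hand-built payload. This sketch assumes the nested shape that get_news already reads: a content dict with contentType, canonicalUrl.url, and pubDate. These field names can change between yfinance releases. The helper parse_news_items and the sample data are hypothetical.

```python
def parse_news_items(items):
    """Keep STORY items and flatten the fields get_news reads (title, summary, url, pubdate)."""
    parsed = []
    for item in items:
        content = item.get("content") or {}
        if content.get("contentType") != "STORY":
            continue  # Skip videos and other non-article content
        parsed.append({
            "title": content.get("title"),
            "summary": content.get("summary"),
            # canonicalUrl may be missing or None, so fall back to an empty dict
            "url": (content.get("canonicalUrl") or {}).get("url"),
            # Keep only the YYYY-MM-DD part of an ISO timestamp
            "pubdate": (content.get("pubDate") or "").split("T")[0],
        })
    return parsed


# Hand-built sample shaped like the fields get_news() accesses (not real API output)
sample = [
    {"content": {"contentType": "STORY", "title": "Example headline", "summary": "Example summary.",
                 "canonicalUrl": {"url": "https://example.com/story"}, "pubDate": "2025-01-25T12:00:00Z"}},
    {"content": {"contentType": "VIDEO", "title": "Example video"}},
]
print(parse_news_items(sample))
```

The `or {}` / `or ""` fallbacks handle keys that are present but set to None. A plain `.get(key, default)` would miss that case.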

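3. Extract Clean News Content

Once we have the relevant news URLs from Yahoo Finance, the next step is to extract clean, readable text from those links. Raw web content often contains elements such as links, special characters, and formatting that get in the way of analysis. In this step we use MarkItDown, a Microsoft library that converts web pages (and other documents) to Markdown, and then clean its output to make it easier to use.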

from markitdown import MarkItDown
import requests
import re

# Create a session for reliable requests
session = requests.Session()
session.headers.update({'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'})

# Initialize MarkItDown
md = MarkItDown(requests_session=session)

# Function to clean unnecessary links and special characters
def remove_links(text):
    text = re.sub(r'http\S+', '', text)  # Remove URLs
    text = re.sub(r'\[.*?\]', '', text)  # Remove markdown link text in [brackets]
    text = re.sub(r'[#*()+\-]', '', text)  # Remove markdown symbols and special characters
    text = re.sub(r'/\S*', '', text)  # Remove leftover paths that start with a slash
    text = re.sub(r'\s+', ' ', text)  # Collapse newlines and repeated spaces into a single space
    return text.strip()

# Function to extract news content from a URL
def extract_news(link):
    # Use MarkItDown to fetch the URL and convert the page to Markdown
    information_to_extract = md.convert(link)
    # Some pages have no title (None), so fall back to an empty string
    text_title = (information_to_extract.title or '').strip()
    text_content = information_to_extract.text_content.strip()  # Extract main content

    # Clean the content and prepend the title
    return text_title + '\n' + remove_links(text_content)


# extract_news(news[1]['url'])

# 'This Cathie Wood Fintech Stock Just Hit a New 52-Week High --......
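To see what the cleaning regexes actually do, we can trace them on a small Markdown sample. This sketch mirrors the substitutions in remove_links, with one deliberate variation: it collapses runs of whitespace into single spaces instead of deleting newlines, so words on adjacent lines do not run together. The names CLEANING_STEPS and trace_cleaning and the sample text are hypothetical.

```python
import re

# Each step: (what it removes, regex pattern, replacement)
CLEANING_STEPS = [
    ("URLs", r"http\S+", ""),
    ("bracketed link text", r"\[.*?\]", ""),
    ("markdown symbols", r"[#*()+\-]", ""),
    ("slash-prefixed paths", r"/\S*", ""),
    ("runs of whitespace", r"\s+", " "),
]


def trace_cleaning(text):
    """Apply each cleaning step in order and return (step name, result) pairs."""
    stages = []
    for name, pattern, replacement in CLEANING_STEPS:
        text = re.sub(pattern, replacement, text)
        stages.append((name, text))
    return stages


sample = "# Headline\n\nRead [the report](https://example.com/r) now.\n* Shares rose 5%."
for name, result in trace_cleaning(sample):
    print(f"{name:>22}: {result!r}")
```

One side effect becomes visible this way. Stripping hyphens also turns terms like "52-Week" into "52Week". If that matters for the model, drop `\-` from the character class.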

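4. Extract Full News Articles

In this step, we build on the previous functions to fetch and extract the full news articles for a given stock. Headlines and summaries often lack the depth needed for meaningful analysis. Extracting the full text of each article gives the model richer material, surfaces more information, and lets it support its financial conclusions with detailed reasoning.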

def extract_full_news(stock: str) -> list:
    """
    Fetch full news articles.

    Parameters:
    - stock (str): The stock ticker symbol.

    Returns:
    - list: A list of dictionaries containing full_news of relevant news articles.
    """
    # Step 1: Fetch news using the get_news function (guard against None or empty results)
    news = get_news(stock) or []
    
    # Step 2: Iterate through each news article
    for i, item in enumerate(news):
        try:
            # Step 3: Extract the full news content using the URL
            full_news = extract_news(item['url'])
            item['full_news'] = full_news
        except Exception as e:
            # Step 4: Handle errors gracefully
            print(f"Error extracting news {i}: {e}")
            continue

    # Step 5: Return the list of enriched news articles
    return news
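extract_full_news fetches articles one at a time. The work is mostly waiting on the network, so a thread pool is one possible way to speed it up. The following is a minimal sketch. It assumes the extractor is passed in as a function (extract_news in this article) and that it is safe to call from several threads. requests does not formally guarantee thread safety for a shared Session, so keep max_workers small. The names enrich_concurrently and fake_extract are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def enrich_concurrently(news, extract, max_workers=4):
    """Add a 'full_news' key to each item by calling extract(url) in a thread pool.

    Items whose extraction fails (or that have no URL) are left without 'full_news',
    mirroring the skip-on-error behaviour of extract_full_news.
    """
    def fetch(item):
        url = item.get("url")
        if not url:
            return item, None
        try:
            return item, extract(url)
        except Exception as exc:  # Network errors, parse errors, etc.
            print(f"Error extracting {url}: {exc}")
            return item, None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `news`
        for item, text in pool.map(fetch, news):
            if text is not None:
                item["full_news"] = text
    return news


# Usage with a stand-in extractor (in the article, this would be extract_news)
def fake_extract(url):
    if "bad" in url:
        raise ValueError("simulated failure")
    return f"text from {url}"


articles = [{"url": "https://example.com/a"}, {"url": "https://example.com/bad"}, {"url": None}]
enrich_concurrently(articles, fake_extract)
print(articles)
```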

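5. Perform Sentiment Analysis and Generate Investment Recommendations

In the previous step we enriched the news items with the full article text. In this step we use LangChain and DeepSeek-R1 to analyze those articles. The model evaluates the sentiment of each article, then produces an overall summary with an investment recommendation.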
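The {articles} placeholder is filled by string formatting, so whatever object is passed in gets converted with str(). It is therefore cleaner to hand the model plain text than a Python list. The following is a minimal sketch. It assumes that capping each article at a fixed number of characters is an acceptable way to stay within a small model's context window. The cap counts characters, not tokens. format_articles and its max_chars default are hypothetical choices, not values from the original project.

```python
def format_articles(news, max_chars=4000):
    """Join enriched news items into one text block for the {articles} placeholder.

    Items without 'full_news' (failed extractions) are skipped, and each article is
    cut to max_chars characters so a small model's context window is not exhausted.
    """
    blocks = []
    for i, item in enumerate(news, start=1):
        text = item.get("full_news")
        if not text:
            continue
        blocks.append(f"Article {i}: {item.get('title') or 'Untitled'}\n{text[:max_chars]}")
    return "\n\n".join(blocks)


sample_news = [
    {"title": "Headline A", "full_news": "Body A " * 3},
    {"title": "Headline B"},  # Extraction failed, so there is no full_news
]
print(format_articles(sample_news, max_chars=10))
```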

from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM

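# Step 1: Initialize the LLM with the DeepSeek-R1 model served by a local Ollama instance.
# Pull the model first with `ollama pull deepseek-r1:1.5b`; larger variants can be swapped in.
# https://ollama.com/library/deepseek-r1:1.5b
llm = OllamaLLM(model="deepseek-r1:1.5b")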

# Step 2: Define the prompt template
PROMPT = """
You are an expert financial analyst. I will provide you with a list of news articles related to a specific stock. Your tasks are as follows:

1. **Sentiment Analysis:**
   - For each news article, evaluate its sentiment as 'Positive', 'Negative', or 'Neutral'.
   - Present your evaluation in a dictionary format where each key is the article's title, and the corresponding value is the assessed sentiment.

2. **Comprehensive Summary and Investment Recommendation:**
   - After analyzing all the articles, provide a concise summary that encapsulates the overall sentiment and key points from the news.
   - Based on this summary, advise whether investing in the stock is advisable at this time, supporting your recommendation with reasons derived from the news analysis.

**News Articles:**  

{articles}

**Output Format:**  

1. **Sentiment Analysis Dictionary:**

   ```json
   {{
       "Article Title 1": "Positive",
       "Article Title 2": "Negative",
       "Article Title 3": "Neutral",
       ...
   }}
   ```

2. **Summary:** [Your summary here]
3. **Investment Recommendation:** [Your recommendation here]
"""

# Step 3: Create a ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_messages(
    [
        ('system', PROMPT),
        ('human', "I would like to analyze the news articles related to the stock {stock}.")
    ]
)

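# Step 4: Chain the prompt and the model with the LangChain Expression Language (LCEL) pipe operator
structure = prompt_template | llm

# Step 5: Fetch and enrich the news, then invoke the chain
from pprint import pprint

stock = 'SOFI'
full_news = extract_full_news(stock)

result = structure.invoke(
    {
        "stock": stock,
        # Pass the full text of every article that was extracted successfully
        "articles": "\n\n".join(fn['full_news'] for fn in full_news if 'full_news' in fn),
    }
)

pprint(result)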

The output is as follows:

Thinking 

* Alright, so I need to analyze some news articles about the stock SOFI.
  First, let me look at each article provided and try to extract relevant
  information.

* Starting with Mike Bailey's CNBC interview from January 23, 2025. He
  mentioned that the macroeconomic conditions in the U.S., especially job
  growth beyond 2025, are positive. So, I can note that he's confident large
  caps outperform small caps for long-term growth. That seems like a key
  point.

* Looking at SOFI's performance: they had a 12.5% return over 30 days from
  $15.63 to $17.59. Their revenue grew by 30%, mainly due to two segments
  increasing their market share. The customer base is over 10 million, and
  they've raised funding of $73 billion in loans.

* Next, I should check if there's any conflict between Bailey's optimism and
  SOFI's positive stock performance. His confidence in large caps being better
  positioned for earnings growth might suggest that while he thinks large caps
  are great for earning money, the real challenge is whether those large caps
  can deliver on their promises within the same timeframe as his
  expectations.

* I also need to consider if there are any conflicting viewpoints or if one
  article supports the other. It seems both points align:Bailey's confidence
  in large caps and SOFI's positive performance support that perspective.

* So, putting it all together, I can create a sentiment analysis dictionary 
  reflecting this alignment. I'll summarize the key points about his optimism 
  and the stock's positive performance, then recommend an investment decision
  based on these supporting factors.
Summary:
   Mike Bailey, FBB Capital Partners' director of research, expressed
   confidence that large-cap stocks outperform small caps for long-term
   earnings growth. This sentiment aligns with SOFI's positive performance,
   which rose 12.5% over three months and generated revenue growth driven by
   two financial services segments.

Investment Recommendation:
  Invest in SOFI Technologies (NASDAQ:SOFI) due to its supportive
  macroeconomic outlook and solid performance as a fintech company. The
  confidence in large caps aligning with SOFI's positive returns reinforces
  the potential for high returns within the investment timeframe.
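The raw string returned by the chain mixes the model's reasoning with its answer. By default, DeepSeek-R1 typically wraps its reasoning in <think>...</think> tags, and the "Thinking" section above appears to be a rendering of that block. This means the two parts can be separated and the sentiment dictionary parsed for downstream use. The following is a minimal sketch. It assumes the model follows the requested fenced json format. Small models often do not, so the parser falls back to the first {...} it finds, and then to an empty dict. split_reasoning and extract_sentiments are hypothetical helpers.

```python
import json
import re

FENCE = "`" * 3  # Three backticks, built at runtime so the snippet stays easy to embed


def split_reasoning(raw):
    """Split a DeepSeek-R1 response into (reasoning, answer) using its <think> tags."""
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if not match:
        return "", raw.strip()
    return match.group(1).strip(), raw[match.end():].strip()


def extract_sentiments(answer):
    """Return the first flat JSON object in the answer as a dict, or {} if none parses."""
    fenced = re.search(r"`{3}json\s*(\{.*?\})\s*`{3}", answer, flags=re.DOTALL)
    candidate = fenced.group(1) if fenced else None
    if candidate is None:
        loose = re.search(r"\{.*?\}", answer, flags=re.DOTALL)
        candidate = loose.group(0) if loose else None
    if candidate is None:
        return {}
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return {}


raw = (
    "<think>Checking each article...</think>\n"
    + "1. Sentiment Analysis Dictionary:\n"
    + FENCE + 'json\n{"Headline A": "Positive", "Headline B": "Neutral"}\n' + FENCE + "\n"
    + "2. Summary: ..."
)
reasoning, answer = split_reasoning(raw)
print(reasoning)
print(extract_sentiments(answer))
```

The non-greedy `\{.*?\}` pattern is enough for the flat title-to-sentiment dictionary the prompt asks for. It would not handle nested JSON objects.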

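6. Conclusion

In this part, we built a complete workflow that fetches news articles, performs sentiment analysis, generates a summary, and provides an investment recommendation. Because it is grounded in current news data, the system offers an intelligent approach to stock analysis that can help investors make better-informed decisions.

Integrating DeepSeek-R1 strengthens the reasoning and analytical capabilities of our financial analyst. With DeepSeek-R1, the analyst can process and interpret complex financial information and produce more insightful analysis. The result is a tool that not only automates data processing but also applies step-by-step reasoning to inform investment decisions.

Check out the full project on GitHub. Fork it, experiment with it, and let me know what you think!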


Original article: Building an Agentic Financial Analyst with DeepSeek-R1 — Part II

Translated and edited by 汇智网 (Huizhi Net). Please credit the source when reposting.