DSPy.Image: Vision Model Support

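DSPy recently added beta support for vision language models (VLMs). This article shows how to use DSPy to extract attributes from images. As an example, we extract useful attributes from a screenshot of a website.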

1. Define the Signature

Define the DSPy signature. Note the input field of type dspy.Image:

import dspy
class WebsiteDataExtractionSignature(dspy.Signature):
    """Website data extraction"""
    website_screenshot: dspy.Image = dspy.InputField(
        desc="A screenshot of the website"
    )
    hero_text: str = dspy.OutputField(
        desc="The hero text of the website"
    )
    website_description: str = dspy.OutputField(
        desc="A description of the website"
    )
    call_to_action: str = dspy.OutputField(
        desc="The call to action of the website"
    )
    color_palette: list[str] = dspy.OutputField(
        desc="The color palette of the website"
    )
    font_palette: list[str] = dspy.OutputField(
        desc="The font palette of the website"
    )
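The signature declares five typed outputs: three strings and two lists of strings. Downstream code often wants a plain typed record instead of a DSPy object. The sketch below assumes you hand it a dict of the output fields; WebsiteData, from_mapping and to_dict are hypothetical names, not part of DSPy. It ignores extra keys (such as the extra reasoning output a chain-of-thought predictor produces) and, defensively, turns a lone string into a one-element list.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Mapping


def _as_str_list(value: Any) -> List[str]:
    """Normalize None, a single string, or an iterable into a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


@dataclass
class WebsiteData:
    """Hypothetical typed record mirroring WebsiteDataExtractionSignature's outputs."""

    hero_text: str = ""
    website_description: str = ""
    call_to_action: str = ""
    color_palette: List[str] = field(default_factory=list)
    font_palette: List[str] = field(default_factory=list)

    @classmethod
    def from_mapping(cls, data: Mapping[str, Any]) -> "WebsiteData":
        # Only the five declared output fields are read; any extra keys are ignored.
        return cls(
            hero_text=str(data.get("hero_text", "")),
            website_description=str(data.get("website_description", "")),
            call_to_action=str(data.get("call_to_action", "")),
            color_palette=_as_str_list(data.get("color_palette")),
            font_palette=_as_str_list(data.get("font_palette")),
        )

    def to_dict(self) -> Dict[str, Any]:
        # Plain dict, e.g. for json.dumps or storing in a database.
        return asdict(self)
```

Building the input dict from a DSPy result is left to the caller, for example by reading the output attributes of the returned prediction (result.hero_text, result.color_palette, and so on).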

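2. Define the Module

Next, define a simple program that wraps the signature from the previous step in a ChainOfThought module. (ChainOfThought is a predictor module, not an optimizer: it prompts the LM to reason step by step before filling in the output fields.)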

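class WebsiteDataExtraction(dspy.Module):
    """Module for extracting structured data from website screenshots."""

    def __init__(self):
        super().__init__()
        # ChainOfThought asks the LM to reason before producing the output fields
        self.website_data_extraction = dspy.ChainOfThought(
            WebsiteDataExtractionSignature
        )

    # pylint: disable=missing-function-docstring
    def forward(self, website_screenshot: dspy.Image):
        # DSPy predictors must be called with keyword arguments
        # named after the signature's input fields
        website_data = self.website_data_extraction(
            website_screenshot=website_screenshot
        )
        return website_data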

3. Final Code

Finally, write a function that reads the image and extracts the attributes by calling the program from the previous step:

import base64


def extract_website_data(website_screenshot_path: str):
    """Extract data from a website screenshot.

    Args:
        website_screenshot_path (str): Path to the website screenshot image

    Returns:
        dspy.Prediction: The extracted fields (hero_text, website_description,
        call_to_action, color_palette, font_palette)
    """
    # Load the image and encode it as a base64 data URI
    with open(website_screenshot_path, "rb") as image_file:
        base64_data = base64.b64encode(image_file.read()).decode("utf-8")
    image_data_uri = f"data:image/png;base64,{base64_data}"

    # Wrap the data URI in dspy.Image to match the signature's input type
    website_screenshot = dspy.Image(url=image_data_uri)

    website_data_extraction = WebsiteDataExtraction()
    return website_data_extraction(website_screenshot=website_screenshot)


if __name__ == "__main__":
    # Configure a vision-capable LM for all DSPy modules
    dspy_lm = dspy.LM(model="openai/gpt-4o-mini")
    dspy.configure(lm=dspy_lm)
    result = extract_website_data(
        "src/vision_lm/data/langtrace-screenshot.png"
    )
    print(result)
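The function above always labels the image as image/png. If your screenshots may also be JPEG or another format, one option is to guess the MIME type from the file extension with Python's standard mimetypes module. This is a minimal sketch; bytes_to_data_uri and image_file_to_data_uri are hypothetical helper names, and like the original code it falls back to image/png when no image type can be determined.

```python
import base64
import mimetypes
from pathlib import Path


def bytes_to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URI.

    The MIME type is guessed from the filename's extension; anything not
    recognized as an image type falls back to image/png.
    """
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = "image/png"  # same default the original code hardcodes
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_file_to_data_uri(path: str) -> str:
    """Read an image file from disk and return it as a data URI."""
    file_path = Path(path)
    return bytes_to_data_uri(file_path.read_bytes(), file_path.name)
```

You could then call image_file_to_data_uri(website_screenshot_path) in place of the inline encoding in extract_website_data. Keeping the pure bytes-to-URI step separate from the file read makes it easy to test without touching the disk.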

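4. Observability

That's it! If you need observability during development, call langtrace.init() from the Langtrace SDK at the start of your script to get deeper insight into each run from its traces.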

5. Source Code

You can find the full source code for this example here.


Original article: Attribute Extraction from Images using DSPy

Translated and edited by 汇智网. Please credit the source when reposting.