Llama 3.2 Vision 评测
Ollama 刚刚宣布正式支持 Llama 3.2 Vision 模型。Llama 3.2 Vision 模型有两种大小:110 亿和 900 亿个参数。在本文中,我将概述它们在不同情况下的表现,以及我个人对它们的看法。
Llama 3.2-Vision 指令调整模型针对视觉识别、图像推理、字幕和回答有关图像的一般问题进行了优化。这些模型在常见的行业基准上优于许多可用的开源和封闭多模式模型。
该模型还支持多种语言:
对于纯文本任务,官方支持英语、德语、法语、意大利语、葡萄牙语、印地语、西班牙语和泰语。Llama 3.2 已针对比这 8 种支持的语言更广泛的语言集合进行训练。请注意,对于图像+文本应用程序,英语是唯一支持的语言。
1、评估
我将使用这篇论文中的方法进行评估,该方法实际上认为视觉语言模型是“瞎的”。它在几个任务上测试模型:
- 任务 1:线相交
- 任务 2:两个圆圈
- 任务 3:带圆圈的字母
- 任务 4:重叠形状
- 任务 5:嵌套正方形
- 任务 6:计数网格
- 任务 7:地铁地图
我将从每个任务中选择最具挑战性的部分,看看模型是否能正确完成其中任何一个。此外,我将运行代码大约 10-20 次,以计算准确响应的平均值并识别任何错误结果。
任务1:线相交
提示1:
How many times do the blue and red lines touch each other? Answer with a number in curly brackets, e.g., {5}.
响应1:
The blue and red lines intersect at 2 points. Therefore, they touch each
other twice. {2}
提示2:
Count the intersection points where the blue and red lines meet. Put your answer in curly brackets, e.g., {2}.
响应2:
The blue and red lines touch each other at three points. Therefore,
the answer is {3}.
成功率 > 60%
任务 2:两个圆圈
提示1:
Are the two circles touching each other? Answer with Yes/No.
响应1:
**Step 1: Identify the Circles**
The image contains a blue circle and a red circle.
**Step 2: Determine if the Circles are Touching Each Other**
To determine if the circles are touching each other, we need to check if they intersect at exactly one point or not. If they do not intersect at all, then they are not touching. If they intersect at more than one point, then they are overlapping but not necessarily touching.
**Step 3: Check for Intersection Points**
Upon examining the image closely, it appears that the two circles intersect at exactly one point. This means that the circles are indeed touching each other.
**Conclusion:**
Yes
提示2:
Are the two circles overlapping? Answer with Yes/No.”
响应2:
The image depicts a diagram with two circles, one red and one blue. The red circle is positioned on the left side of the diagram, while the blue circle is located on the right side.
To determine whether the two circles are touching each other, we need to analyze their positions and sizes. Upon closer inspection, it becomes apparent that the two circles do not overlap or touch at any point. Instead, they appear to be separated by a small gap between them.
Therefore, based on this analysis, **No**, the two circles are not touching each other.
成功率 > 80%
任务 3:圈出的字母
提示:
Which letter is being circled?
响应:
**Answer:** The letter "n" is highlighted in red and encircled.
成功率:100%
任务 4:计算重叠形状
提示1:
How many {shapes} are in the image? Answer with only the number in numerical format.
提示2:
Count the {shapes} in the image. Answer with a number in curly brackets e.g. {3}.
响应:
模型回答了 6、7 或 8,但从未给出正确的数字,即 9。猜猜怎么着?甚至 GPT-4 也做不到!
成功率:0%
任务 5:沿着单色路径行走
提示1:
How many single-colored paths go from A to C? Answer with a number in curly brackets, e.g., {3}
响应1:
To find the number of single-colored paths from point A to point C, we need to count the distinct paths that only use one color throughout their entire length.
**Step 1: Identify the colors used on the graph**
Looking at the graph provided, we can see that there are four different colors used:
* Red
* Blue
* Green
* Purple
However, upon closer inspection, it becomes apparent that not all these colors are utilized in creating paths from A to C. We focus only on those colors that lead directly or indirectly from A to C without interruption.
**Step 2: Determine the relevant paths**
Upon examining the graph closely for paths using a single color from A to C:
* There is one path going directly from A to B (using red) and then continuing in another segment.
* Another path leads from A to D and then to C, but this involves two colors.
Given this analysis, only one path utilizes a single color continuously from A to C.
**Conclusion**
Therefore, the number of single-colored paths from point A to point C is {1}.
Answer: {1}.
提示2:
Count the one-colored routes that go from A to C. Answer with a number in curly brackets, e.g., {3}.
响应2:
To find the number of single-colored paths from point A to point C, we need to analyze the given graph and count the distinct paths that meet this criterion.
**Step 1: Identify Single-Colored Paths**
Single-colored paths are those where all edges belong to a single color. In this context, a path is considered single-colored if it consists entirely of red edges.
**Step 2: Count Distinct Red Edges from A to C**
From point A to point C, there are two distinct red edges that form a direct path:
- One edge connects A directly to C.
- Another set of edges forms a longer path through intermediate nodes but ultimately leads back to the same conclusion regarding single-color paths.
**Step 3: Consideration of Alternative Paths**
Upon closer inspection, it becomes apparent that while there are multiple paths from A to C, only two of these paths consist entirely of red edges. This is because any path involving more than one color would not be considered "single-colored."
**Conclusion**
Given the analysis above, the correct answer reflects the count of single-colored paths (all red) from point A to point C.
Answer: {2}
成功率 > 60%
GPT-4o 和 Claude 3.5 Sonnet 均未能完成此任务。
玩了一段时间这个模型后,我没有发现明显的区别;它可以与 GPT-4o 和 Sonnet-3 进行比较。然而,它仍然不如 Gemini-1.5 Pro,我认为 Gemini-1.5 Pro 可能是目前最好的 VLM。请记住,这只是该模型的 11B 版本。如果你有足够的资源来测试 90B 版本,只需在以下代码中将“11B”替换为“90B”即可。
2、自己尝试一下
安装 Ollama:
!curl -fsSL https://ollama.com/install.sh | sh
启动 ollama:
import os
get_ipython().system = os.system
!ollama serve &
拉取模型:
!ollama pull llama3.2-vision
尝试一下:
import ollama
def extract_document_info(image_path):
response = ollama.chat(
model='llama3.2-vision',
messages=[{
'role': 'user',
'content': "your question about the image"
),
'images': [image_path]
}]
)
return response
#example
image_path ="/kaggle/input/image.jpg"
result = extract_document_info(image_path)
print(result["message"]["content"])
就是这样!这是对 LLaMA 3.2 Vision(11B 版本)的快速评论。它实际上在 OCR 方面也很好,我在一些非常具有挑战性的图像上测试了它。当然,90B 版本更好,但它需要大量资源才能运行。一旦我找到好的资源,我会附上它!
汇智网翻译整理,转载请标明出处