We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation.
We built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy-respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We are releasing the Segment Anything Model (SAM) and the corresponding dataset (SA-1B) at https://segment-anything.com to foster research into foundation models for computer vision.
Large language models pre-trained on web-scale datasets are revolutionizing NLP with strong zero-shot and few-shot generalization. These "foundation models" can generalize to tasks and data distributions beyond those seen during training.
Foundation models have also been explored in computer vision, albeit to a lesser extent.
Our goal is to build a foundation model for image segmentation. That is, we seek to develop a promptable model and pre-train it on a broad dataset using a task that enables powerful generalization. With this model, we aim to solve a range of downstream segmentation problems on new data distributions using prompt engineering.
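To make "prompt engineering for segmentation" concrete, the sketch below uses the interface of the released `segment_anything` package (`sam_model_registry`, `SamPredictor`); the checkpoint path and the input image are placeholders:

```python
# A minimal sketch of promptable segmentation with the released
# segment_anything package; the checkpoint path is a placeholder.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real RGB image
predictor.set_image(image)  # compute the image embedding once per image

# A single foreground point prompt at pixel (x, y); label 1 marks foreground.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks for an ambiguous prompt
)
```

Here the prompt is a single click; the same call accepts boxes or coarse masks, so adapting to a new downstream task amounts to choosing the right prompts rather than retraining.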
The success of this plan hinges on three components: task, model, and data. To develop them, we address the following questions about image segmentation:
1. What task will enable zero-shot generalization?
2. What is the corresponding model architecture?
3. What data can power this task and model?
These questions are entangled and require a comprehensive solution.
Surprisingly, we find that a simple design satisfies all three constraints: a powerful image encoder computes an image embedding, a prompt encoder embeds prompts, and then the two information sources are combined in a lightweight mask decoder that predicts segmentation masks. We refer to this model as the Segment Anything Model, or SAM.
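The three-part decomposition can be summarized in a few lines of PyTorch; the module names below (`ImageEncoderSketch` etc.) are illustrative stand-ins, not the released implementation:

```python
import torch
import torch.nn as nn

class SegmentAnythingSketch(nn.Module):
    """Illustrative three-part structure: a heavy image encoder,
    a light prompt encoder, and a light mask decoder (a sketch,
    not the released SAM code)."""

    def __init__(self, image_encoder: nn.Module,
                 prompt_encoder: nn.Module,
                 mask_decoder: nn.Module):
        super().__init__()
        self.image_encoder = image_encoder    # expensive, run once per image
        self.prompt_encoder = prompt_encoder  # cheap, run once per prompt
        self.mask_decoder = mask_decoder      # cheap, combines both embeddings

    def forward(self, image: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
        image_embedding = self.image_encoder(image)     # amortized across prompts
        prompt_embedding = self.prompt_encoder(prompt)  # fast enough for interactive use
        return self.mask_decoder(image_embedding, prompt_embedding)
```

The asymmetry is the point of the design: the image embedding is computed once, after which many prompts can be decoded against it at interactive speed.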
Our data engine has three stages: assisted-manual, semi-automatic, and fully automatic.
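Schematically, the engine is a model-in-the-loop annotation cycle. The outline below is a hypothetical sketch of that cycle; every helper it takes is a trivial stand-in, not the actual pipeline:

```python
# Hypothetical outline of the three-stage, model-in-the-loop data
# engine; all callables passed in are stand-ins, not the real pipeline.
from typing import Callable, List, Tuple

Mask = dict  # stand-in type for one mask annotation

def data_engine(
    images: List[object],
    model: object,
    train: Callable,                       # retrains the model on masks so far
    annotate_with_model_assist: Callable,  # stage 1: humans label with model help
    annotate_remaining_objects: Callable,  # stage 2: humans fill gaps in model proposals
    auto_generate_masks: Callable,         # stage 3: point-grid prompting, no humans
) -> Tuple[List[Mask], object]:
    masks: List[Mask] = []
    # Stage 1: assisted-manual annotation.
    masks += annotate_with_model_assist(images, model)
    model = train(model, masks)
    # Stage 2: semi-automatic annotation.
    masks += annotate_remaining_objects(images, model)
    model = train(model, masks)
    # Stage 3: fully automatic mask generation.
    masks += auto_generate_masks(images, model)
    return masks, model
```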
2. Segment Anything Task