|
280 | 280 | {
|
281 | 281 | "cell_type": "markdown",
|
282 | 282 | "id": "fdaad7cd-5835-4ae5-9fa1-f8ba3d4e10c4",
|
283 |
| - "metadata": {}, |
| 283 | + "metadata": { |
| 284 | + "jp-MarkdownHeadingCollapsed": true, |
| 285 | + "tags": [] |
| 286 | + }, |
284 | 287 | "source": [
|
285 | 288 | "### Inference on a single image\n",
|
286 | 289 | "We can use these models in fastdup in a few lines of code.\n",
|
|
427 | 430 | {
|
428 | 431 | "cell_type": "markdown",
|
429 | 432 | "id": "aaddc03a-d330-4bb3-ac4e-a3e8730282ad",
|
430 |
| - "metadata": {}, |
| 433 | + "metadata": { |
| 434 | + "jp-MarkdownHeadingCollapsed": true, |
| 435 | + "tags": [] |
| 436 | + }, |
431 | 437 | "source": [
|
432 | 438 | "### Inference on a DataFrame of images\n",
|
433 | 439 | "\n",
|
|
778 | 784 | {
|
779 | 785 | "cell_type": "markdown",
|
780 | 786 | "id": "41feec30-7b9d-4b94-94c4-dca8cef14466",
|
781 |
| - "metadata": {}, |
| 787 | + "metadata": { |
| 788 | + "jp-MarkdownHeadingCollapsed": true, |
| 789 | + "tags": [] |
| 790 | + }, |
782 | 791 | "source": [
|
783 | 792 | "### Inference on a single image\n",
|
784 | 793 | "fastdup provides an easy way to load the Grounding DINO model and run inference.\n",
|
|
1093 | 1102 | {
|
1094 | 1103 | "cell_type": "markdown",
|
1095 | 1104 | "id": "9fb0bdfd-cc1a-420c-bb16-c4e13b75bcda",
|
1096 |
| - "metadata": {}, |
| 1105 | + "metadata": { |
| 1106 | + "jp-MarkdownHeadingCollapsed": true, |
| 1107 | + "tags": [] |
| 1108 | + }, |
1097 | 1109 | "source": [
|
| 1110 | + "### Inference on a DataFrame of images\n", |
| 1111 | + "\n", |
1098 | 1112 | "To run the enrichment on a DataFrame, use the `.enrich` method and specify `model=grounding-dino`. By default, fastdup loads the smaller variant (Swin-T backbone) for enrichment.\n",
|
1099 | 1113 | "\n",
|
1100 | 1114 | "Also specify the DataFrame to run the enrichment on and the name of the column to use as input to the Grounding DINO model. In this example, we take the text prompt from the `ram_tags` column, which we computed earlier."
|
|
1399 | 1413 | {
|
1400 | 1414 | "cell_type": "markdown",
|
1401 | 1415 | "id": "8c114568-abeb-4159-bbdb-22ee1d1f4b3e",
|
1402 |
| - "metadata": {}, |
| 1416 | + "metadata": { |
| 1417 | + "jp-MarkdownHeadingCollapsed": true, |
| 1418 | + "tags": [] |
| 1419 | + }, |
1403 | 1420 | "source": [
|
1404 | 1421 | "### Searching for Specific Objects with a Custom Text Prompt\n",
|
1405 | 1422 | "\n",
|
|
1921 | 1938 | {
|
1922 | 1939 | "cell_type": "markdown",
|
1923 | 1940 | "id": "79a1d14c-4075-424f-abda-b640c3630bd9",
|
1924 |
| - "metadata": {}, |
| 1941 | + "metadata": { |
| 1942 | + "tags": [] |
| 1943 | + }, |
1925 | 1944 | "source": [
|
1926 |
| - "## Zero-Shot Segmentation with SAM" |
| 1945 | + "## Zero-Shot Segmentation with SAM\n", |
| 1946 | + "\n", |
| 1947 | + "In addition to the zero-shot classification and detection models, fastdup also supports zero-shot segmentation using the [Segment Anything Model (SAM)](https://github.com/facebookresearch/segment-anything) from Meta AI.\n", |
| 1948 | + "\n", |
| 1949 | + "SAM produces high-quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image." |
1927 | 1950 | ]
|
1928 | 1951 | },
|
1929 | 1952 | {
|
1930 | 1953 | "cell_type": "markdown",
|
1931 | 1954 | "id": "dc58c743-d8e3-45b8-ae32-7cfc6474afd1",
|
1932 |
| - "metadata": {}, |
| 1955 | + "metadata": { |
| 1956 | + "jp-MarkdownHeadingCollapsed": true, |
| 1957 | + "tags": [] |
| 1958 | + }, |
1933 | 1959 | "source": [
|
1934 |
| - "For single image and single bounding box." |
| 1960 | + "### Inference on a single image\n", |
| 1961 | + "\n", |
| 1962 | + "To run inference with the SAM model, import the `SegmentAnythingModel` class and provide an image and a corresponding bounding box as input." |
1935 | 1963 | ]
|
1936 | 1964 | },
|
1937 | 1965 | {
|
|
1958 | 1986 | "result = model.run_inference(image_path=\"coco_minitrain_25k/images/val2017/000000449996.jpg\", bboxes=torch.tensor((1.47, 1.45, 638.46, 241.37)))"
|
1959 | 1987 | ]
|
1960 | 1988 | },
|
| 1989 | + { |
| 1990 | + "cell_type": "markdown", |
| 1991 | + "id": "9072cc27-98ab-486e-b559-a02f2be0a64c", |
| 1992 | + "metadata": { |
| 1993 | + "tags": [] |
| 1994 | + }, |
| 1995 | + "source": [ |
| 1996 | + "The result is a list of binary masks." |
| 1997 | + ] |
| 1998 | + }, |
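| + { |
| + "cell_type": "markdown", |
| + "id": "c3e58a10-9f21-4d2e-8b5a-6f1d2a7c4e90", |
| + "metadata": {}, |
| + "source": [ |
| + "As a minimal sketch, you can visualize the first mask with matplotlib, assuming the returned masks convert to NumPy arrays." |
| + ] |
| + }, |
| + { |
| + "cell_type": "code", |
| + "execution_count": null, |
| + "id": "d4f69b21-0a32-4e3f-9c6b-7a2e3b8d5fa1", |
| + "metadata": {}, |
| + "outputs": [], |
| + "source": [ |
| + "import numpy as np\n", |
| + "import matplotlib.pyplot as plt\n", |
| + "\n", |
| + "# A minimal sketch: display the first returned mask as a binary image.\n", |
| + "# Assumes `result` holds the masks from the inference above and that each\n", |
| + "# mask converts to a NumPy array; squeeze drops any leading channel dim.\n", |
| + "mask = np.squeeze(np.asarray(result[0]))\n", |
| + "plt.imshow(mask, cmap='gray')\n", |
| + "plt.axis('off')\n", |
| + "plt.show()" |
| + ] |
| + }, |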
1961 | 1999 | {
|
1962 | 2000 | "cell_type": "markdown",
|
1963 | 2001 | "id": "249b9531-77ca-47a9-a94a-304c85a192dc",
|
1964 | 2002 | "metadata": {},
|
1965 | 2003 | "source": [
|
1966 |
| - "Load other variants of SAM. Checkpoint can be downloaded from the [official SAM repo](https://github.com/facebookresearch/segment-anything)." |
| 2004 | + "You can also load other variants of SAM from the [official SAM repo](https://github.com/facebookresearch/segment-anything), or even your own custom model.\n", |
| 2005 | + "\n", |
| 2006 | + "To do so, download the `sam_vit_b` [weights](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth) and the `sam_vit_l` [weights](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth) into your local folder and load them into the constructor as follows." |
1967 | 2007 | ]
|
1968 | 2008 | },
|
1969 | 2009 | {
|
|
2011 | 2051 | {
|
2012 | 2052 | "cell_type": "markdown",
|
2013 | 2053 | "id": "d137da5d-ac81-4af1-9b5a-1b4d7b79464d",
|
2014 |
| - "metadata": {}, |
| 2054 | + "metadata": { |
| 2055 | + "tags": [] |
| 2056 | + }, |
2015 | 2057 | "source": [
|
2016 |
| - "For multiple images and multiple bounding boxes in a DataFrame." |
| 2058 | + "### Inference on a DataFrame of images\n", |
| 2059 | + "\n", |
| 2060 | + "As in the previous examples, you can use the `enrich` method to add masks to your DataFrame of images.\n", |
| 2061 | + "\n", |
| 2062 | + "In the following code snippet, we load the SAM model and specify `input_col='grounding_dino_bboxes'` so that SAM uses the Grounding DINO bounding boxes as input prompts." |
2017 | 2063 | ]
|
2018 | 2064 | },
|
2019 | 2065 | {
|
|
2036 | 2082 | "df = fd.enrich(task='zero-shot-segmentation', model='segment-anything', input_df=df, input_col='grounding_dino_bboxes')"
|
2037 | 2083 | ]
|
2038 | 2084 | },
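| + { |
| + "cell_type": "markdown", |
| + "id": "5b8c1d72-3e44-4f0a-8a19-2c6d7e9f0b43", |
| + "metadata": {}, |
| + "source": [ |
| + "The enrichment adds a `sam_masks` column to the DataFrame. As a quick sanity check, you can preview a few rows of the new column." |
| + ] |
| + }, |
| + { |
| + "cell_type": "code", |
| + "execution_count": null, |
| + "id": "6c9d2e83-4f55-4a1b-9b2a-3d7e8f0a1c54", |
| + "metadata": {}, |
| + "outputs": [], |
| + "source": [ |
| + "# Preview the masks produced by the enrichment step.\n", |
| + "df['sam_masks'].head()" |
| + ] |
| + }, |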
|
| 2085 | + { |
| 2086 | + "cell_type": "markdown", |
| 2087 | + "id": "16cbf4be-29c0-4854-a8e0-025f43d55ec0", |
| 2088 | + "metadata": {}, |
| 2089 | + "source": [ |
| 2090 | + "Next, drop the rows without masks from the DataFrame before visualization." |
| 2091 | + ] |
| 2092 | + }, |
2039 | 2093 | {
|
2040 | 2094 | "cell_type": "code",
|
2041 | 2095 | "execution_count": 33,
|
|
2048 | 2102 | "df.dropna(subset=['sam_masks'], inplace=True)"
|
2049 | 2103 | ]
|
2050 | 2104 | },
|
| 2105 | + { |
| 2106 | + "cell_type": "markdown", |
| 2107 | + "id": "130ba407-ae28-417d-993c-10efd008b431", |
| 2108 | + "metadata": {}, |
| 2109 | + "source": [ |
| 2110 | + "Plot the images with bounding boxes and masks." |
| 2111 | + ] |
| 2112 | + }, |
2051 | 2113 | {
|
2052 | 2114 | "cell_type": "code",
|
2053 | 2115 | "execution_count": 34,
|
|
2076 | 2138 | "id": "0f4b137a-25ea-44ab-b44e-ed734428de86",
|
2077 | 2139 | "metadata": {},
|
2078 | 2140 | "source": [
|
2079 |
| - "## Convert Annotations to COCO Format" |
| 2141 | + "## Convert Annotations to COCO Format\n", |
| 2142 | + "\n", |
| 2143 | + "Once the enrichment is complete, you can conveniently export the DataFrame to the COCO JSON annotation format. For now, only the bounding boxes and labels are exported; masks will be added in a future release." |
2080 | 2144 | ]
|
2081 | 2145 | },
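| + { |
| + "cell_type": "markdown", |
| + "id": "7d0e3f94-5a66-4b2c-8c3b-4e8f9a1b2d65", |
| + "metadata": {}, |
| + "source": [ |
| + "To illustrate the target format, here is a minimal hand-rolled sketch of the conversion, not fastdup's built-in exporter. It assumes hypothetical `filename` and `grounding_dino_labels` columns alongside the `grounding_dino_bboxes` column computed earlier." |
| + ] |
| + }, |
| + { |
| + "cell_type": "code", |
| + "execution_count": null, |
| + "id": "8e1f4a05-6b77-4c3d-9d4c-5f9a0b2c3e76", |
| + "metadata": {}, |
| + "outputs": [], |
| + "source": [ |
| + "import json\n", |
| + "\n", |
| + "# A hand-rolled sketch of the COCO structure (not fastdup's exporter).\n", |
| + "# Assumes hypothetical columns: 'filename', 'grounding_dino_bboxes' (xyxy)\n", |
| + "# and 'grounding_dino_labels' (one label per box).\n", |
| + "images, annotations, categories = [], [], {}\n", |
| + "ann_id = 0\n", |
| + "for img_id, row in enumerate(df.itertuples(index=False)):\n", |
| + "    images.append({\"id\": img_id, \"file_name\": row.filename})\n", |
| + "    for (x1, y1, x2, y2), label in zip(row.grounding_dino_bboxes, row.grounding_dino_labels):\n", |
| + "        cat_id = categories.setdefault(label, len(categories))\n", |
| + "        annotations.append({\n", |
| + "            \"id\": ann_id,\n", |
| + "            \"image_id\": img_id,\n", |
| + "            \"category_id\": cat_id,\n", |
| + "            \"bbox\": [float(x1), float(y1), float(x2 - x1), float(y2 - y1)],  # COCO uses xywh\n", |
| + "            \"area\": float((x2 - x1) * (y2 - y1)),\n", |
| + "            \"iscrowd\": 0,\n", |
| + "        })\n", |
| + "        ann_id += 1\n", |
| + "coco = {\n", |
| + "    \"images\": images,\n", |
| + "    \"annotations\": annotations,\n", |
| + "    \"categories\": [{\"id\": i, \"name\": name} for name, i in categories.items()],\n", |
| + "}\n", |
| + "with open(\"annotations_coco.json\", \"w\") as f:\n", |
| + "    json.dump(coco, f)" |
| + ] |
| + }, |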
|
2082 | 2146 | {
|
|
2098 | 2162 | "metadata": {},
|
2099 | 2163 | "source": [
|
2100 | 2164 | "## Run fastdup\n",
|
2101 |
| - "\n", |
2102 |
| - "Now let's load the embeddings into fastdup and run an analysis to surface dataset issues." |
| 2165 | + "You can optionally analyze the exported annotations in fastdup to evaluate their quality." |
2103 | 2166 | ]
|
2104 | 2167 | },
|
2105 | 2168 | {
|
|
4713 | 4776 | "metadata": {},
|
4714 | 4777 | "source": [
|
4715 | 4778 | "## Wrap Up\n",
|
4716 |
| - "In this tutorial, we showed how you can compute embeddings on your dataset using TIMM and run fastdup on top of it to surface dataset issues.\n", |
| 4779 | + "In this tutorial, we showed how you can run zero-shot models to enrich your dataset with tags, bounding boxes, and masks.\n", |
4717 | 4780 | "\n",
|
4718 | 4781 | "Questions about this tutorial? Reach out to us on our [Slack channel](https://visuallayer.slack.com/)!\n",
|
4719 | 4782 | "\n",
|
|