update zero shot detection writeup

dnth · dnth · commit 88be45a32aec · 2023-10-19T15:21:16.000+08:00
diff --git a/examples/enrichment-ram-groundingdino-sam.ipynb b/examples/enrichment-ram-groundingdino-sam.ipynb
@@ -452,7 +452,7 @@
     }
    ],
    "source": [
-    "NUM_ROWS_TO_ENRICH = 20\n",
+    "NUM_ROWS_TO_ENRICH = 20 # For demonstration, only run on 20 rows. \n",
     "\n",
     "df = fd.enrich(task='zero-shot-classification',\n",
     "               model='recognize-anything-model', \n",
@@ -769,7 +769,10 @@
    "id": "000a9c1d-d499-4200-86d4-16e14eceb679",
    "metadata": {},
    "source": [
-    "## Zero-Shot Detection with Grounding DINO"
+    "## Zero-Shot Detection with Grounding DINO\n",
+    "Apart from classification models, fastdup also supports zero-shot detection models like [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO) (and more to come).\n",
+    "\n",
+    "Grounding DINO is a powerful open-set zero-shot detection model. It accepts an image-text pair as input and outputs bounding boxes for the objects named in the text."
    ]
   },
   {
@@ -812,7 +815,11 @@
    "id": "94a74bdb-78d0-4b95-8f2d-3fd2d2f18e20",
    "metadata": {},
    "source": [
-    "You'll have to import the module and provide it with an image and text prompt. Text prompts must be separated with `\" . \"`."
+    "You'll have to import the module and provide it with an image-text input pair. \n",
+    "\n",
+    "Note: Text prompts must be separated with `\" . \"`.\n",
+    "\n",
+    "By default, fastdup uses the smaller variant of Grounding DINO (Swin-T backbone)."
    ]
   },
   {
@@ -902,6 +909,14 @@
     "results"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "e639eb37-8716-4563-80c0-cc269147f440",
+   "metadata": {},
+   "source": [
+    "Let's plot the image and results using the `annotate_image` convenience function."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 17,
@@ -931,7 +946,9 @@
    "id": "55d02a9a-32c1-430d-8ca8-ca10ac4b8712",
    "metadata": {},
    "source": [
-    "Load another SwinB variant of Grounding DINO. Weights and config can be downloaded from the [official Grounding DINO repo](https://github.com/IDEA-Research/GroundingDINO)."
+    "You can optionally load another variant of Grounding DINO (Swin-B backbone) from the [official Grounding DINO repo](https://github.com/IDEA-Research/GroundingDINO).\n",
+    "\n",
+    "Download the [weights](https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swinb_cogcoor.pth) and [config](https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/config/GroundingDINO_SwinB_cfg.py) to a local directory and pass their paths as arguments to the `GroundingDINO` constructor."
    ]
   },
   {
@@ -982,6 +999,16 @@
     "                    text_threshold=0.25)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "3c8dc27f-e5fc-4285-bc44-11334bfac2a2",
+   "metadata": {},
+   "source": [
+    "Fine-tune the detection output by varying the `box_threshold` and `text_threshold` values.\n",
+    "\n",
+    "The outputs are stored in a Python `dict`. "
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 19,
@@ -1068,7 +1095,9 @@
    "id": "9fb0bdfd-cc1a-420c-bb16-c4e13b75bcda",
    "metadata": {},
    "source": [
-    "Outputs the columns grounding_dino bboxes, scores and labels."
+    "To run the enrichment on a DataFrame, use the `.enrich` method and specify `model='grounding-dino'`. By default, fastdup loads the smaller variant (Swin-T backbone) for enrichment. \n",
+    "\n",
+    "Also specify the DataFrame to run the enrichment on and the name of the column to use as input to the Grounding DINO model. In this example, we take the text prompt from the `ram_tags` column, which we computed earlier."
    ]
   },
   {
@@ -1134,6 +1163,14 @@
     "df = fd.enrich(task='zero-shot-detection', model='grounding-dino', input_df=df, input_col='ram_tags', device=\"cuda\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "59c8e8d0-1c00-403b-84d9-226458b9268a",
+   "metadata": {},
+   "source": [
+    "Once done, you'll notice that three new columns are appended to the DataFrame: `grounding_dino_bboxes`, `grounding_dino_scores`, and `grounding_dino_labels`."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 22,
@@ -1327,6 +1364,14 @@
     "df"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "abada383-b3a2-42ee-8f9d-df6c27e46fb5",
+   "metadata": {},
+   "source": [
+    "Now let's plot the results of the enrichment using the `plot_annotations` function."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 23,
@@ -1356,7 +1401,9 @@
    "id": "8c114568-abeb-4159-bbdb-22ee1d1f4b3e",
    "metadata": {},
    "source": [
-    "### Custom Text Prompt"
+    "### Searching for Specific Objects with Custom Text Prompt\n",
+    "\n",
+    "Suppose you'd like to search for specific objects in your dataset. You can create a column in the DataFrame specifying the objects of interest and run the `.enrich` method."
    ]
   },
   {
@@ -1577,6 +1624,14 @@
     "df_custom_prompt"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "6ebb9a5a-4e65-4c60-83ef-daae93209e5a",
+   "metadata": {},
+   "source": [
+    "Note that we specify `input_col='custom_prompt'` so that the model uses the text from the `custom_prompt` column."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 26,
@@ -1825,7 +1880,7 @@
    "id": "7a979b19-eaef-422b-944b-0285115e24d6",
    "metadata": {},
    "source": [
-    "Remove rows with empty detection."
+    "Not all images contain \"face\", \"eye\" and \"hair\". Let's remove the rows with no detections and plot the rows with detections."
    ]
   },
   {