|
62 | 62 | "\n",
|
63 | 63 | "First, let's install the necessary packages:\n",
|
64 | 64 | "\n",
|
65 |
| - "- fastdup - To analyze issues in the dataset.\n", |
| 65 | + "- [fastdup](https://github.com/visual-layer/fastdup) - To analyze issues in the dataset.\n", |
66 | 66 | "- [TIMM (PyTorch Image Models)](https://github.com/huggingface/pytorch-image-models) - To acquire pre-trained models."
|
67 | 67 | ]
|
68 | 68 | },
|
|
130 | 130 | "metadata": {},
|
131 | 131 | "source": [
|
132 | 132 | "## List TIMM Models\n",
|
133 |
| - "There are over a thousand models on TIMM. Let's list down models that match the keyword `dino`." |
| 133 | + "There are currently 1212 computer vision models on TIMM, any of which can be used to compute the embeddings.\n", |
| 134 | + "\n", |
| 135 | + "Now, pick a model of your choice. For demonstration, we will go with a relatively new model `vit_small_patch14_dinov2.lvd142m` from MetaAI. \n", |
| 136 | + "\n", |
| 137 | + "Let's list down models that match the keyword `dino`." |
134 | 138 | ]
|
135 | 139 | },
|
136 | 140 | {
|
|
171 | 175 | "id": "633dce0c-47eb-4039-8cd4-a36874c49b8a",
|
172 | 176 | "metadata": {},
|
173 | 177 | "source": [
|
174 |
| - "Now, pick a model of your choice. For demonstration, we will go with a relatively new model `vit_small_patch14_dinov2.lvd142m` from MetaAI. \n", |
175 |
| - "\n", |
176 | 178 | "DINOv2 models produce high-performance visual features that can be directly employed with classifiers as simple as linear layers on a variety of computer vision tasks; these visual features are robust and perform well across domains without any requirement for fine-tuning. Read more about DINOv2 [here](https://github.com/facebookresearch/dinov2).\n",
|
177 | 179 | "\n",
|
178 | 180 | "It makes sense for us to use DINOv2 as a model to create an embedding of the dataset."
|
|
288 | 290 | "source": [
|
289 | 291 | "## Run fastdup\n",
|
290 | 292 | "\n",
|
291 |
| - "Now what's left is to load the embeddings into fastdup and run an analysis to surface dataset issues." |
| 293 | + "Now let's load the embeddings into fastdup and run an analysis to surface dataset issues." |
292 | 294 | ]
|
293 | 295 | },
|
294 | 296 | {
|
|
2467 | 2469 | "metadata": {},
|
2468 | 2470 | "source": [
|
2469 | 2471 | "## Wrap Up\n",
|
2470 |
| - "In this tutorial, we showed how you can run fastdup using pre-computed feature vectors. Running over pre-computed feature vectors significantly reduces run time compared to running over raw image files.\n", |
| 2472 | + "In this tutorial, we showed how you can compute embeddings on your dataset using TIMM and run fastdup on top of them to surface dataset issues.\n", |
| 2473 | + "\n", |
| 2474 | + "Questions about this tutorial? Reach out to us on our [Slack channel](https://visuallayer.slack.com/)!\n", |
| 2475 | + "\n", |
| 2476 | + "\n", |
2471 | 2477 | "\n",
|
2472 | 2478 | "Next, feel free to check out other tutorials -\n",
|
2473 | 2479 | "\n",
|
|