diff --git a/learn/search/image/image-retrieval-ebook/bag-of-visual-words/bag-of-visual-words.ipynb b/learn/search/image/image-retrieval-ebook/bag-of-visual-words/bag-of-visual-words.ipynb
index 738bb4cc..5aff8ecd 100644
--- a/learn/search/image/image-retrieval-ebook/bag-of-visual-words/bag-of-visual-words.ipynb
+++ b/learn/search/image/image-retrieval-ebook/bag-of-visual-words/bag-of-visual-words.ipynb
@@ -23,11 +23,11 @@
"\n",
"The model derives from **bag of words** in natural language processing (NLP). Which is where a chunk of text is split into words or sub-words and those components are collated into an unordered list, the so called \"bag of words\" (BoW). \n",
"\n",
- "
\n",
+ " \n",
"\n",
"Similarly, in bag of *visual* words the images are represented by patches, and their unique patterns (or *visual features*) are extracted into an imaginary bag. However, these visual features are not \"visual words\" just yet.\n",
"\n",
- " "
+ " "
]
},
{
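To make the NLP analogy concrete, here is a minimal sketch of building a bag of words from a chunk of text; the sentence and the use of `collections.Counter` are illustrative choices, not taken from the notebook:

```python
from collections import Counter

# Split a chunk of text into words and collate them into an unordered "bag".
text = "a blue car and a red car"
bag_of_words = Counter(text.split())

print(bag_of_words)
# Counter({'a': 2, 'car': 2, 'blue': 1, 'and': 1, 'red': 1})
```
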
@@ -63,7 +63,7 @@
"source": [
"K-means divides the data into $k$ clusters, where $k$ is chosen by us. Once the data is grouped, k-means calculates the mean for each cluster, i.e., a central point between all of the vectors in a group. The central point represents a *centroid*, i.e., a *visual word*. \n",
"\n",
- " \n",
+ " \n",
"\n",
"After finding the centroids, k-means iterates through each data point (visual feature) and checks which centroid (visual word) is nearest. If the nearest centroid has changed, the data point switches grouping, being assigned to the new nearest centroid. Then, we repeat the centroid computation of above.\n",
"\n",
@@ -90,7 +90,7 @@
"source": [
"When we perform the mapping from new visual feature vectors to the nearest centroid (i.e., visual word), we are essentially categorizing visual features into a more limited set of visual words. This process of reducing the number of possible unique values/vectors is called *quantization*. So for us, this is *vector quantization*.\n",
"\n",
- " \n",
+ " \n",
"\n",
"Using a limited set of visual words allows us to compress our image descriptions into a set of visual word IDs. And, more importantly, represent similar features shared across images using a shared set of visual words.\n",
"\n",
@@ -107,7 +107,7 @@
"\n",
"If we consider $2$ images, we can represent the image histograms as follows:\n",
"\n",
- " \n",
+ " \n",
"\n",
"To create these representations, we have converted each image into a sparse vector where each value in the vector represents an item in the codebook (i.e., the x-axis in the histograms). Most of the values in each vector will be *zero* because most images will only contain a small percentage of total number of visual words, this is why we refer to them as *sparse* vectors.\n",
"\n",
@@ -122,7 +122,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- " "
+ " "
]
},
{
@@ -161,7 +161,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- " "
+ " "
]
},
{
@@ -211,7 +211,7 @@
"\n",
"It equals $1$ if the vectors are pointing in the same direction (the angle equals $0$), and $0$ if vectors are orthogonal.\n",
"\n",
- " "
+ " "
]
},
{
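A minimal sketch of comparing two of these sparse vectors with cosine similarity (the vectors are illustrative, not the notebook's data):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Dot product of the vectors divided by the product of their norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0, 2.0, 0.0])
b = np.array([2.0, 0.0, 4.0, 0.0])  # same direction as `a`
c = np.array([0.0, 3.0, 0.0, 0.0])  # orthogonal to `a`

print(cosine_similarity(a, b))  # 1.0
print(cosine_similarity(a, c))  # 0.0
```
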
@@ -225,7 +225,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- " "
+ " "
]
},
{
diff --git a/learn/search/image/image-retrieval-ebook/clip-object-detection/zero-shot-object-detection-clip.ipynb b/learn/search/image/image-retrieval-ebook/clip-object-detection/zero-shot-object-detection-clip.ipynb
index ddd3798e..d61a0b0e 100644
--- a/learn/search/image/image-retrieval-ebook/clip-object-detection/zero-shot-object-detection-clip.ipynb
+++ b/learn/search/image/image-retrieval-ebook/clip-object-detection/zero-shot-object-detection-clip.ipynb
@@ -330,7 +330,7 @@
"source": [
"The first step is done. We are now almost ready to process those patches using CLIP. Before doing it, we might want to work through these patches by grouping them into a 6x6 window.\n",
"\n",
- " "
+ " "
]
},
{
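The notebook's own windowing code appears in a later cell; as a rough, self-contained sketch (the grid shape, patch size, stride, and `patches` array are all assumptions), collecting 6x6 groups of neighbouring patches from a patch grid could look like:

```python
import numpy as np

# Illustrative grid of image patches: rows x cols x (patch height, width, channels).
patches = np.random.rand(24, 16, 32, 32, 3)

window, stride = 6, 1
windows = []
for y in range(0, patches.shape[0] - window + 1, stride):
    for x in range(0, patches.shape[1] - window + 1, stride):
        # A 6x6 block of neighbouring patches, to be merged into one image for CLIP.
        windows.append(patches[y:y + window, x:x + window])

print(len(windows))  # (24 - 6 + 1) * (16 - 6 + 1) = 209
```
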
@@ -1153,7 +1153,7 @@
"The first column represents non-zero row positions, while the second column non-zero column positions. We can visualize the above co-ordinates as follows:\n",
"\n",
"\n",
- " \n",
+ " \n",
"Co-ordinates of patches with similarity score > 0.5."
]
},
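A sketch of how such coordinates can be obtained; the `scores` array is an assumption standing in for the notebook's patch similarity scores, and `np.argwhere` may differ from the exact call used there:

```python
import numpy as np

# Illustrative grid of patch/window similarity scores in [0, 1].
scores = np.array([
    [0.1, 0.2, 0.1],
    [0.2, 0.7, 0.6],
    [0.1, 0.8, 0.3],
])

# Row/column positions of every score above the 0.5 threshold.
coords = np.argwhere(scores > 0.5)
print(coords)
# [[1 1]
#  [1 2]
#  [2 1]]
```
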
@@ -1162,7 +1162,7 @@
"metadata": {},
"source": [
"The bounding box we want to design, will be around the area of interest.\n",
- " \n",
+ " \n",
"Bounding box."
]
},
@@ -1172,7 +1172,7 @@
"source": [
"We can build it by calculating its corners, which correspond to the minimum and maximum $x$ and $y$ co-ordinates, i.e., $(x_{min}, y_{min})$, $(x_{max}, y_{min})$, $(x_{min}, y_{max})$, and $(x_{max}, y_{max})$.\n",
"\n",
- " \n",
+ " \n",
"Bounding box corners' co-ordinates."
]
},
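Continuing with the illustrative `coords` array above (an assumption, not the notebook's data), the corners follow directly from the minimum and maximum along each axis:

```python
import numpy as np

coords = np.array([[1, 1], [1, 2], [2, 1]])  # illustrative (row, column) positions

y_min, x_min = coords.min(axis=0)
y_max, x_max = coords.max(axis=0)

# The four corners of the bounding box.
corners = [(x_min, y_min), (x_max, y_min), (x_min, y_max), (x_max, y_max)]
print(corners)  # [(1, 1), (2, 1), (1, 2), (2, 2)]
```
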
@@ -1261,7 +1261,7 @@
"source": [
"We use `(y_max - y_min)` and `(x_max - x_min)` to calculate the height and width of the bounding box respectively...\n",
"\n",
- " \n",
+ " \n",
"Height and width calculations.\n"
]
},
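A sketch of the height/width calculation and of drawing the box; matplotlib's `Rectangle` and the corner values are assumptions, and the notebook may render the box differently:

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

x_min, y_min, x_max, y_max = 1, 1, 2, 2  # illustrative corner coordinates

width = x_max - x_min
height = y_max - y_min

fig, ax = plt.subplots()
# Rectangle takes the lower-left corner plus width and height.
ax.add_patch(Rectangle((x_min, y_min), width, height, fill=False, edgecolor="red"))
ax.set_xlim(0, 3)
ax.set_ylim(0, 3)
plt.show()
```
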