|
8 | 8 | "source": [
|
9 | 9 | "# CQL2 Filtering\n",
|
10 | 10 | "\n",
|
11 |
| - "This notebook demonstrates the use of pystac-client to use [CQL2 filtering](https://github.com/radiantearth/stac-api-spec/tree/master/fragments/filter). The server needs to support this and advertise conformance as the `https://api.stacspec.org/v1.0.0-rc.1/item-search#filter` class in the `conformsTo` attribute of the root API.\n", |
| 11 | + "This notebook demonstrates using pystac-client to filter STAC items with [CQL2](https://docs.ogc.org/is/21-065r2/21-065r2.html) as described in the [STAC API Filter Extension](https://github.com/stac-api-extensions/filter). \n", |
12 | 12 | "\n",
|
13 |
| - "**This should be considered an experimental feature. This notebook uses the Microsoft Planetary Computer API, as it is currently the only public CQL2 implementation.**" |
| 13 | + "Note: Not all STAC APIs support the Filter Extension. APIs advertise conformance by including `https://api.stacspec.org/v1.0.0/item-search#filter` in the `conformsTo` attribute of the root API." |
14 | 14 | ]
|
15 | 15 | },
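The Filter Extension note above says that APIs advertise support through the root API's `conformsTo` list. The sketch below checks such a list before a `filter` is sent. It assumes you already have the list, for example from the root API's JSON response. `supports_filter` is a hypothetical helper, not part of pystac-client. It matches on the URI prefix and suffix so that both the `v1.0.0` class URI and the older `v1.0.0-rc.1` one are accepted.

```python
from typing import Iterable

# Hypothetical helper (not part of pystac-client): scan an API's `conformsTo`
# list for the STAC API Item Search Filter conformance class.
STAC_API_PREFIX = "https://api.stacspec.org/"
FILTER_CLASS_SUFFIX = "/item-search#filter"


def supports_filter(conforms_to: Iterable[str]) -> bool:
    """Return True if any advertised URI is the Item Search Filter class.

    Matching on prefix + suffix accepts any spec version, e.g. both
    v1.0.0 and v1.0.0-rc.1.
    """
    return any(
        uri.startswith(STAC_API_PREFIX) and uri.endswith(FILTER_CLASS_SUFFIX)
        for uri in conforms_to
    )


if __name__ == "__main__":
    advertised = [
        "https://api.stacspec.org/v1.0.0/core",
        "https://api.stacspec.org/v1.0.0/item-search",
        "https://api.stacspec.org/v1.0.0/item-search#filter",
    ]
    print(supports_filter(advertised))  # True
    print(supports_filter(advertised[:2]))  # False
```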
|
16 | 16 | {
|
|
20 | 20 | "metadata": {},
|
21 | 21 | "outputs": [],
|
22 | 22 | "source": [
|
23 |
| - "# set pystac_client logger to DEBUG to see API calls\n", |
24 |
| - "import logging\n", |
25 |
| - "from copy import deepcopy\n", |
| 23 | + "import json\n", |
26 | 24 | "\n",
|
27 | 25 | "import geopandas as gpd\n",
|
28 | 26 | "import pandas as pd\n",
|
29 |
| - "from shapely.geometry import shape\n", |
30 |
| - "\n", |
31 |
| - "from pystac_client import Client\n", |
32 |
| - "\n", |
33 |
| - "logging.basicConfig()\n", |
34 |
| - "logger = logging.getLogger(\"pystac_client\")\n", |
35 |
| - "logger.setLevel(logging.INFO)\n", |
36 | 27 | "\n",
|
37 |
| - "\n", |
38 |
| - "# convert a list of STAC Items into a GeoDataFrame\n", |
39 |
| - "def items_to_geodataframe(items):\n", |
40 |
| - " _items = []\n", |
41 |
| - " for i in items:\n", |
42 |
| - " _i = deepcopy(i)\n", |
43 |
| - " _i[\"geometry\"] = shape(_i[\"geometry\"])\n", |
44 |
| - " _items.append(_i)\n", |
45 |
| - " gdf = gpd.GeoDataFrame(pd.json_normalize(_items))\n", |
46 |
| - " for field in [\"properties.datetime\", \"properties.created\", \"properties.updated\"]:\n", |
47 |
| - " if field in gdf:\n", |
48 |
| - " gdf[field] = pd.to_datetime(gdf[field])\n", |
49 |
| - " gdf.set_index(\"properties.datetime\", inplace=True)\n", |
50 |
| - " return gdf" |
| 28 | + "from pystac_client import Client" |
| 29 | + ] |
| 30 | + }, |
| 31 | + { |
| 32 | + "cell_type": "markdown", |
| 33 | + "id": "c8ac88bb", |
| 34 | + "metadata": {}, |
| 35 | + "source": [ |
| 36 | + "As always with pystac-client, the first step is opening the catalog:" |
51 | 37 | ]
|
52 | 38 | },
|
53 | 39 | {
|
|
60 | 46 | "# STAC API root URL\n",
|
61 | 47 | "URL = \"https://planetarycomputer.microsoft.com/api/stac/v1\"\n",
|
62 | 48 | "\n",
|
63 |
| - "# custom headers\n", |
64 |
| - "headers = []\n", |
65 |
| - "\n", |
66 |
| - "cat = Client.open(URL, headers=headers)" |
| 49 | + "catalog = Client.open(URL)" |
67 | 50 | ]
|
68 | 51 | },
|
69 | 52 | {
|
|
73 | 56 | "source": [
|
74 | 57 | "## Initial Search Parameters\n",
|
75 | 58 | "\n",
|
76 |
| - "Here we perform a search with the `Client.search` function, providing a geometry (`intersects`) a datetime range (`datetime`), and filtering by Item properties (`filter`) using CQL2-JSON." |
| 59 | + "Here we set up some initial search parameters to use with the `Client.search` function. We are providing a maximum number of items to return (`max_items`), a collection to look within (`collections`), a geometry (`intersects`), and a datetime range (`datetime`)." |
77 | 60 | ]
|
78 | 61 | },
|
79 | 62 | {
|
80 | 63 | "cell_type": "code",
|
81 | 64 | "execution_count": null,
|
82 |
| - "id": "d8af6334", |
| 65 | + "id": "5e961981", |
83 | 66 | "metadata": {},
|
84 | 67 | "outputs": [],
|
85 | 68 | "source": [
|
86 |
| - "import json\n", |
87 |
| - "\n", |
88 |
| - "import hvplot.pandas # noqa: F401\n", |
89 |
| - "\n", |
90 | 69 | "# AOI around Delfzijl, in the north of The Netherlands\n",
|
91 | 70 | "geom = {\n",
|
92 | 71 | " \"type\": \"Polygon\",\n",
|
|
106 | 85 | " \"collections\": \"landsat-8-c2-l2\",\n",
|
107 | 86 | " \"intersects\": geom,\n",
|
108 | 87 | " \"datetime\": \"2018-01-01/2020-12-31\",\n",
|
109 |
| - "}\n", |
| 88 | + "}" |
| 89 | + ] |
| 90 | + }, |
| 91 | + { |
| 92 | + "cell_type": "markdown", |
| 93 | + "id": "d6f1dd5f", |
| 94 | + "metadata": {}, |
| 95 | + "source": [ |
| 96 | + "## Using Filters\n", |
110 | 97 | "\n",
|
| 98 | + "In addition to the parameters described above, the following examples filter by Item properties (`filter`) using CQL2-JSON. Here is a small function that runs the search, builds a `GeoDataFrame` from the results, and then plots `datetime` vs `eo:cloud_cover`.\n", |
111 | 99 | "\n",
|
112 |
| - "# reusable search function\n", |
113 |
| - "def search_fetch_plot(params, filt):\n", |
114 |
| - " # limit sets the # of items per page so we can see multiple pages getting fetched\n", |
115 |
| - " params[\"filter\"] = filt\n", |
116 |
| - " search = cat.search(**params)\n", |
117 |
| - " items = list(search.items_as_dicts()) # safe b/c we set max_items = 100\n", |
118 |
| - " # DataFrame\n", |
119 |
| - " items_df = pd.DataFrame(items_to_geodataframe(items))\n", |
120 |
| - " print(f\"{len(items_df.index)} items found\")\n", |
121 |
| - " field = \"properties.eo:cloud_cover\"\n", |
122 |
| - " return items_df.hvplot(\n", |
123 |
| - " y=field, label=json.dumps(filt), frame_height=500, frame_width=800\n", |
124 |
| - " )" |
| 100 | + "Remember that throughout this notebook we are only looking at STAC metadata; at no point do we read the data itself." |
| 101 | + ] |
| 102 | + }, |
| 103 | + { |
| 104 | + "cell_type": "code", |
| 105 | + "execution_count": null, |
| 106 | + "id": "8b26e89b", |
| 107 | + "metadata": {}, |
| 108 | + "outputs": [], |
| 109 | + "source": [ |
| 110 | + "def search_and_plot(filter):\n", |
| 111 | + " search = catalog.search(**params, filter=filter)\n", |
| 112 | + "\n", |
| 113 | + " gdf = gpd.GeoDataFrame.from_features(search.item_collection_as_dict())\n", |
| 114 | + " gdf[\"datetime\"] = pd.to_datetime(gdf[\"datetime\"])\n", |
| 115 | + " print(f\"Found {len(gdf)} items\")\n", |
| 116 | + "\n", |
| 117 | + " gdf.plot.line(x=\"datetime\", y=\"eo:cloud_cover\", title=json.dumps(filter))" |
| 118 | + ] |
| 119 | + }, |
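`search_and_plot` relies on `item_collection_as_dict()` returning a GeoJSON FeatureCollection whose Item `properties` include `datetime` and `eo:cloud_cover`. The pure-Python sketch below performs the same extraction without geopandas, to make that data shape explicit. `cloud_cover_series` is a hypothetical helper and the sample FeatureCollection is made up for illustration. It sorts by time, so a line plot is drawn in chronological order whatever order the API returns Items in, and it simply returns an empty list when the search matches nothing.

```python
from datetime import datetime


def cloud_cover_series(item_collection: dict) -> list[tuple[datetime, float]]:
    """Return (datetime, eo:cloud_cover) pairs sorted chronologically.

    Hypothetical helper: expects a GeoJSON FeatureCollection of STAC Items,
    like the dict returned by `search.item_collection_as_dict()`.
    """
    rows = []
    for feature in item_collection.get("features", []):
        props = feature.get("properties", {})
        if props.get("datetime") is None or props.get("eo:cloud_cover") is None:
            continue  # skip Items missing either field
        # Before Python 3.11, fromisoformat() rejects a trailing "Z", so swap it for +00:00
        when = datetime.fromisoformat(props["datetime"].replace("Z", "+00:00"))
        rows.append((when, float(props["eo:cloud_cover"])))
    return sorted(rows)


if __name__ == "__main__":
    # Made-up sample data, for illustration only
    sample = {
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature",
             "properties": {"datetime": "2020-06-01T10:30:00Z", "eo:cloud_cover": 55.2}},
            {"type": "Feature",
             "properties": {"datetime": "2019-03-15T10:30:00Z", "eo:cloud_cover": 4.1}},
            {"type": "Feature",
             "properties": {"datetime": "2019-05-02T10:30:00Z"}},  # no cloud cover
        ],
    }
    for when, cloud in cloud_cover_series(sample):
        print(when.date(), cloud)
```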
| 120 | + { |
| 121 | + "cell_type": "markdown", |
| 122 | + "id": "11afcc19", |
| 123 | + "metadata": {}, |
| 124 | + "source": [ |
| 125 | + "We can test the function by passing an empty dict, which applies no filtering at all." |
| 126 | + ] |
| 127 | + }, |
| 128 | + { |
| 129 | + "cell_type": "code", |
| 130 | + "execution_count": null, |
| 131 | + "id": "b6293c11", |
| 132 | + "metadata": {}, |
| 133 | + "outputs": [], |
| 134 | + "source": [ |
| 135 | + "search_and_plot({})" |
125 | 136 | ]
|
126 | 137 | },
|
127 | 138 | {
|
|
131 | 142 | "source": [
|
132 | 143 | "## CQL2 Filters\n",
|
133 | 144 | "\n",
|
134 |
| - "Below are examples of several different CQL2 filters on the `eo:cloud_cover` property. Up to 100 Items are fetched and the eo:cloud_cover values plotted." |
| 145 | + "We will use `eo:cloud_cover` as an example and filter for all the STAC Items where `eo:cloud_cover <= 10%`." |
135 | 146 | ]
|
136 | 147 | },
|
137 | 148 | {
|
|
141 | 152 | "metadata": {},
|
142 | 153 | "outputs": [],
|
143 | 154 | "source": [
|
144 |
| - "filt = {\"op\": \"lte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 10]}\n", |
| 155 | + "filter = {\"op\": \"<=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 10]}\n", |
145 | 156 | "\n",
|
146 |
| - "search_fetch_plot(params, filt)" |
| 157 | + "search_and_plot(filter)" |
| 158 | + ] |
| 159 | + }, |
| 160 | + { |
| 161 | + "cell_type": "markdown", |
| 162 | + "id": "75e835f1", |
| 163 | + "metadata": {}, |
| 164 | + "source": [ |
| 165 | + "Next, let's look for all the STAC Items where `eo:cloud_cover >= 80%`." |
147 | 166 | ]
|
148 | 167 | },
|
149 | 168 | {
|
|
153 | 172 | "metadata": {},
|
154 | 173 | "outputs": [],
|
155 | 174 | "source": [
|
156 |
| - "filt = {\"op\": \"gte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 80]}\n", |
| 175 | + "filter = {\"op\": \">=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 80]}\n", |
157 | 176 | "\n",
|
158 |
| - "search_fetch_plot(params, filt)" |
| 177 | + "search_and_plot(filter)" |
| 178 | + ] |
| 179 | + }, |
| 180 | + { |
| 181 | + "cell_type": "markdown", |
| 182 | + "id": "0ad984bf", |
| 183 | + "metadata": {}, |
| 184 | + "source": [ |
| 185 | + "We can combine multiple CQL2 statements to express more complicated logic:" |
159 | 186 | ]
|
160 | 187 | },
|
161 | 188 | {
|
|
165 | 192 | "metadata": {},
|
166 | 193 | "outputs": [],
|
167 | 194 | "source": [
|
168 |
| - "filt = {\n", |
| 195 | + "filter = {\n", |
| 196 | + " \"op\": \"and\",\n", |
| 197 | + " \"args\": [\n", |
| 198 | + " {\"op\": \"<=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 60]},\n", |
| 199 | + " {\"op\": \">=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 40]},\n", |
| 200 | + " ],\n", |
| 201 | + "}\n", |
| 202 | + "\n", |
| 203 | + "search_and_plot(filter)" |
| 204 | + ] |
| 205 | + }, |
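Nested CQL2-JSON dicts are easy to mistype by hand. A few small builder functions can generate the same structures used in these cells; they are a hypothetical sketch, not part of pystac-client. `cloud_cover_between` is written with the basic `<=` and `>=` comparisons rather than the CQL2 `between` operator, because `between` belongs to CQL2's advanced comparison operators, which a server may not implement.

```python
from typing import Any

# Hypothetical builders for CQL2-JSON expressions (not part of pystac-client)
COMPARISON_OPS = {"=", "<>", "<", "<=", ">", ">="}


def compare(op: str, name: str, value: Any) -> dict:
    """Build a basic comparison: {"op": op, "args": [{"property": name}, value]}."""
    if op not in COMPARISON_OPS:
        raise ValueError(f"unsupported comparison operator: {op!r}")
    return {"op": op, "args": [{"property": name}, value]}


def and_(*exprs: dict) -> dict:
    """Combine expressions so that all of them must hold."""
    return {"op": "and", "args": list(exprs)}


def or_(*exprs: dict) -> dict:
    """Combine expressions so that at least one of them must hold."""
    return {"op": "or", "args": list(exprs)}


def cloud_cover_between(low: float, high: float) -> dict:
    """Inclusive range on eo:cloud_cover, written with <= and >= only."""
    return and_(
        compare("<=", "eo:cloud_cover", high),
        compare(">=", "eo:cloud_cover", low),
    )


if __name__ == "__main__":
    print(cloud_cover_between(40, 60))
    # Very clear or very cloudy scenes
    print(or_(compare("<=", "eo:cloud_cover", 10), compare(">=", "eo:cloud_cover", 80)))
```

`cloud_cover_between(40, 60)` produces the same dict as the hand-written filter in the cell above, so it could be passed to `search_and_plot` unchanged.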
| 206 | + { |
| 207 | + "cell_type": "markdown", |
| 208 | + "id": "617c7416", |
| 209 | + "metadata": {}, |
| 210 | + "source": [ |
| 211 | + "You can see the power of this syntax. Indeed, we can replace the `datetime` and `intersects` parameters from our original search with a more complex CQL2 statement." |
| 212 | + ] |
| 213 | + }, |
| 214 | + { |
| 215 | + "cell_type": "code", |
| 216 | + "execution_count": null, |
| 217 | + "id": "7b0dc965", |
| 218 | + "metadata": {}, |
| 219 | + "outputs": [], |
| 220 | + "source": [ |
| 221 | + "filter = {\n", |
169 | 222 | " \"op\": \"and\",\n",
|
170 | 223 | " \"args\": [\n",
|
171 |
| - " {\"op\": \"lte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 60]},\n", |
172 |
| - " {\"op\": \"gte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 40]},\n", |
| 224 | + " {\"op\": \"s_intersects\", \"args\": [{\"property\": \"geometry\"}, geom]},\n", |
| 225 | + " {\"op\": \">=\", \"args\": [{\"property\": \"datetime\"}, \"2018-01-01\"]},\n", |
| 226 | + " {\"op\": \"<=\", \"args\": [{\"property\": \"datetime\"}, \"2020-12-31\"]},\n", |
| 227 | + " {\"op\": \"<=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 60]},\n", |
| 228 | + " {\"op\": \">=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 40]},\n", |
173 | 229 | " ],\n",
|
174 | 230 | "}\n",
|
| 231 | + "search = catalog.search(max_items=100, collections=\"landsat-8-c2-l2\", filter=filter)\n", |
| 232 | + "\n", |
| 233 | + "print(f\"Found {len(search.item_collection())} items\")" |
| 234 | + ] |
| 235 | + }, |
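The cell above translates the `intersects` and `datetime` search parameters into CQL2 by hand. The sketch below does that translation mechanically, mirroring the notebook's choice of comparing `datetime` against plain strings. `params_to_cql2` is a hypothetical helper. The CQL2 specification also defines typed temporal literals (`{"date": ...}` and `{"timestamp": ...}`), which a stricter server may require instead of plain strings. Open-ended intervals using `..` are rejected to keep the sketch small.

```python
# Hypothetical helper (not part of pystac-client): express the `intersects` and
# `datetime` search parameters as CQL2-JSON, mirroring the notebook's filter.
def params_to_cql2(intersects: dict, datetime_range: str) -> list[dict]:
    start, sep, end = datetime_range.partition("/")
    if not sep or not start or not end or ".." in (start, end):
        raise ValueError("expected a closed interval like '2018-01-01/2020-12-31'")
    return [
        {"op": "s_intersects", "args": [{"property": "geometry"}, intersects]},
        {"op": ">=", "args": [{"property": "datetime"}, start]},
        {"op": "<=", "args": [{"property": "datetime"}, end]},
    ]


if __name__ == "__main__":
    # Small made-up polygon; the notebook uses its Delfzijl AOI (`geom`) here
    aoi = {
        "type": "Polygon",
        "coordinates": [[[6.9, 53.3], [7.0, 53.3], [7.0, 53.4], [6.9, 53.4], [6.9, 53.3]]],
    }
    cloud = [
        {"op": "<=", "args": [{"property": "eo:cloud_cover"}, 60]},
        {"op": ">=", "args": [{"property": "eo:cloud_cover"}, 40]},
    ]
    cql2 = {"op": "and", "args": params_to_cql2(aoi, "2018-01-01/2020-12-31") + cloud}
    print(len(cql2["args"]))  # 5
```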
| 236 | + { |
| 237 | + "cell_type": "markdown", |
| 238 | + "id": "56503c7b", |
| 239 | + "metadata": {}, |
| 240 | + "source": [ |
| 241 | + "### CQL2 Text\n", |
| 242 | + "\n", |
| 243 | + "The examples above all use CQL2-JSON, but pystac-client also supports passing `filter` as CQL2 text.\n", |
175 | 244 | "\n",
|
176 |
| - "search_fetch_plot(params, filt)" |
| 245 | + "NOTE: Currently, if you pass CQL2 text to pystac-client, you need to set the search HTTP method to GET." |
| 246 | + ] |
| 247 | + }, |
| 248 | + { |
| 249 | + "cell_type": "code", |
| 250 | + "execution_count": null, |
| 251 | + "id": "5e8f62f5", |
| 252 | + "metadata": {}, |
| 253 | + "outputs": [], |
| 254 | + "source": [ |
| 255 | + "search = catalog.search(**params, method=\"GET\", filter=\"eo:cloud_cover<=10\")\n", |
| 256 | + "\n", |
| 257 | + "print(f\"Found {len(search.item_collection())} items\")" |
| 258 | + ] |
| 259 | + }, |
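With GET, the STAC API Filter Extension sends the filter as query parameters: `filter` carries the expression and `filter-lang` names its encoding (`cql2-text` here). The sketch below shows roughly what such a request URL looks like, using only the standard library. `build_get_search_url` is a hypothetical helper; the exact URL pystac-client builds may differ, for example in parameter order or extra parameters such as `limit`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


# Hypothetical helper: build a STAC API Item Search GET URL carrying a CQL2 text filter.
def build_get_search_url(root_url: str, collections: list[str], cql2_text: str) -> str:
    query = {
        "collections": ",".join(collections),
        "filter": cql2_text,
        "filter-lang": "cql2-text",
    }
    # urlencode percent-escapes reserved characters such as ":", "<" and "="
    # (spaces become "+"), so the CQL2 text survives the trip in the query string
    return f"{root_url.rstrip('/')}/search?{urlencode(query)}"


if __name__ == "__main__":
    url = build_get_search_url(
        "https://planetarycomputer.microsoft.com/api/stac/v1",
        ["landsat-8-c2-l2"],
        "eo:cloud_cover<=10",
    )
    print(url)
    # Decoding the query string recovers the original filter text
    print(parse_qs(urlsplit(url).query)["filter"])  # ['eo:cloud_cover<=10']
```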
| 260 | + { |
| 261 | + "cell_type": "markdown", |
| 262 | + "id": "9b865c1f", |
| 263 | + "metadata": {}, |
| 264 | + "source": [ |
| 265 | + "Just like CQL2-JSON statements, CQL2 text statements can be combined to express more complex logic:" |
| 266 | + ] |
| 267 | + }, |
| 268 | + { |
| 269 | + "cell_type": "code", |
| 270 | + "execution_count": null, |
| 271 | + "id": "c06f40cf", |
| 272 | + "metadata": {}, |
| 273 | + "outputs": [], |
| 274 | + "source": [ |
| 275 | + "search = catalog.search(\n", |
| 276 | + " **params, method=\"GET\", filter=\"eo:cloud_cover<=60 and eo:cloud_cover>=40\"\n", |
| 277 | + ")\n", |
| 278 | + "\n", |
| 279 | + "print(f\"Found {len(search.item_collection())} items\")" |
| 280 | + ] |
| 281 | + }, |
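CQL2 text and CQL2-JSON are two encodings of the same expressions. To illustrate that correspondence, the hypothetical `to_cql2_text` below renders the simple comparison and `and`/`or` filters used in this notebook as text, in the same compact style as the cells above. It is a sketch, not a full CQL2 serializer: spatial operators such as `s_intersects` (which would need WKT geometries), booleans, and other operators are not handled.

```python
from typing import Any

COMPARISON_OPS = {"=", "<>", "<", "<=", ">", ">="}


def _literal(value: Any) -> str:
    """Render a property reference, number, or string as CQL2 text."""
    if isinstance(value, dict) and "property" in value:
        return value["property"]
    if isinstance(value, bool):  # bool is a subclass of int, so reject it first
        raise NotImplementedError("boolean literals are not handled in this sketch")
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # Character literals use single quotes; an embedded quote is doubled
        return "'" + value.replace("'", "''") + "'"
    raise NotImplementedError(f"unsupported literal: {value!r}")


def to_cql2_text(expr: dict) -> str:
    """Render a CQL2-JSON comparison or and/or tree as CQL2 text (sketch)."""
    op, args = expr["op"], expr["args"]
    if op in COMPARISON_OPS:
        lhs, rhs = args
        return f"{_literal(lhs)}{op}{_literal(rhs)}"
    if op in ("and", "or"):
        parts = []
        for arg in args:
            text = to_cql2_text(arg)
            # Parenthesize nested and/or groups so precedence stays explicit
            parts.append(f"({text})" if arg["op"] in ("and", "or") else text)
        return f" {op} ".join(parts)
    raise NotImplementedError(f"operator not handled in this sketch: {op!r}")


if __name__ == "__main__":
    between_40_60 = {
        "op": "and",
        "args": [
            {"op": "<=", "args": [{"property": "eo:cloud_cover"}, 60]},
            {"op": ">=", "args": [{"property": "eo:cloud_cover"}, 40]},
        ],
    }
    print(to_cql2_text(between_40_60))  # eo:cloud_cover<=60 and eo:cloud_cover>=40
```

The printed string is the same CQL2 text the cell above passes as `filter` with `method="GET"`.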
| 282 | + { |
| 283 | + "cell_type": "markdown", |
| 284 | + "id": "35cbf612", |
| 285 | + "metadata": {}, |
| 286 | + "source": [ |
| 287 | + "## Queryables\n", |
| 288 | + "\n", |
| 289 | + "pystac-client provides a method for retrieving the queryables of a particular collection: all the properties that can be used within CQL2 filters. These are provided as a JSON Schema document, but for readability we are mostly interested in the names of the fields within `properties`.\n", |
| 290 | + "\n", |
| 291 | + "NOTE: When getting the collection, you might notice that we use \"landsat-c2-l2\" as the collection id rather than \"landsat-8-c2-l2\". This is because \"landsat-8-c2-l2\" doesn't exist as a collection; it only appears in some places as a collection id on items, likely a remnant of an earlier setup of the Planetary Computer STAC." |
| 292 | + ] |
| 293 | + }, |
| 294 | + { |
| 295 | + "cell_type": "code", |
| 296 | + "execution_count": null, |
| 297 | + "id": "90f1cc6d", |
| 298 | + "metadata": {}, |
| 299 | + "outputs": [], |
| 300 | + "source": [ |
| 301 | + "collection = catalog.get_collection(\"landsat-c2-l2\")\n", |
| 302 | + "queryables = collection.get_queryables()\n", |
| 303 | + "\n", |
| 304 | + "list(queryables[\"properties\"].keys())" |
| 305 | + ] |
| 306 | + }, |
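Because queryables describe which properties a server accepts in filters, they can also help catch typos before a search is sent. The sketch below, built on the hypothetical helpers `referenced_properties` and `unknown_properties`, walks a CQL2-JSON filter, collects every `{"property": ...}` reference, and reports the names missing from the queryables' `properties`. The sample queryables dict is made up; in the notebook you would pass the result of `collection.get_queryables()` instead. Whether core fields such as `geometry` and `datetime` are listed can vary between APIs, so a reported name is a hint rather than a definite error.

```python
from typing import Any


def referenced_properties(expr: Any) -> set[str]:
    """Collect every property name referenced in a CQL2-JSON expression."""
    found: set[str] = set()
    if isinstance(expr, dict):
        if isinstance(expr.get("property"), str):
            found.add(expr["property"])
        for value in expr.values():  # recurse into args, nested ops, etc.
            found |= referenced_properties(value)
    elif isinstance(expr, list):
        for item in expr:
            found |= referenced_properties(item)
    return found


def unknown_properties(expr: Any, queryables: dict) -> set[str]:
    """Property names used in `expr` that the queryables schema does not list."""
    return referenced_properties(expr) - set(queryables.get("properties", {}))


if __name__ == "__main__":
    # Made-up, trimmed queryables document for illustration only
    queryables = {
        "type": "object",
        "properties": {
            "eo:cloud_cover": {"type": "number"},
            "datetime": {"type": "string", "format": "date-time"},
        },
    }
    typo = {"op": "<=", "args": [{"property": "eo:cloudcover"}, 10]}
    print(unknown_properties(typo, queryables))  # {'eo:cloudcover'}
```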
| 307 | + { |
| 308 | + "cell_type": "markdown", |
| 309 | + "id": "c407ffec", |
| 310 | + "metadata": {}, |
| 311 | + "source": [ |
| 312 | + "## Read More\n", |
| 313 | + "\n", |
| 314 | + "- For more involved CQL2 examples in a STAC context, read the [STAC API Filter Extension Examples](https://github.com/stac-api-extensions/filter?tab=readme-ov-file#examples).\n", |
| 315 | + "\n", |
| 316 | + "- For examples of all the different CQL2 operations take a look at the [playground on the CQL2-rs docs](https://developmentseed.org/cql2-rs/latest/playground/)." |
177 | 317 | ]
|
178 | 318 | }
|
179 | 319 | ],
|
180 | 320 | "metadata": {
|
181 |
| - "interpreter": { |
182 |
| - "hash": "6b6313dbab648ff537330b996f33bf845c0da10ea77ae70864d6ca8e2699c7ea" |
183 |
| - }, |
184 | 321 | "kernelspec": {
|
185 |
| - "display_name": "Python 3.9.11 ('.venv': venv)", |
| 322 | + "display_name": "Python 3 (ipykernel)", |
186 | 323 | "language": "python",
|
187 | 324 | "name": "python3"
|
188 | 325 | },
|
|
196 | 333 | "name": "python",
|
197 | 334 | "nbconvert_exporter": "python",
|
198 | 335 | "pygments_lexer": "ipython3",
|
199 |
| - "version": "3.9.11" |
| 336 | + "version": "3.12.11" |
200 | 337 | }
|
201 | 338 | },
|
202 | 339 | "nbformat": 4,
|
|