Commit 5715e3b

committed
Update CQL2 tutorial
1 parent 3c7b16d commit 5715e3b

1 file changed

+201
-64
lines changed

docs/tutorials/cql2-filter.ipynb

Lines changed: 201 additions & 64 deletions
Original file line number | Diff line number | Diff line change
@@ -8,9 +8,9 @@
88
"source": [
99
"# CQL2 Filtering\n",
1010
"\n",
11-
"This notebook demonstrates the use of pystac-client to use [CQL2 filtering](https://github.com/radiantearth/stac-api-spec/tree/master/fragments/filter). The server needs to support this and advertise conformance as the `https://api.stacspec.org/v1.0.0-rc.1/item-search#filter` class in the `conformsTo` attribute of the root API.\n",
11+
"This notebook demonstrates using pystac-client to filter STAC Items with [CQL2](https://docs.ogc.org/is/21-065r2/21-065r2.html), as described in the [STAC API Filter Extension](https://github.com/stac-api-extensions/filter).\n",
1212
"\n",
13-
"**This should be considered an experimental feature. This notebook uses the Microsoft Planetary Computer API, as it is currently the only public CQL2 implementation.**"
13+
"Note: Not all STAC APIs support the Filter Extension. APIs advertise conformance by including `https://api.stacspec.org/v1.0.0/item-search#filter` in the `conformsTo` attribute of the root API."
1414
]
1515
},
1616
{
@@ -20,34 +20,20 @@
2020
"metadata": {},
2121
"outputs": [],
2222
"source": [
23-
"# set pystac_client logger to DEBUG to see API calls\n",
24-
"import logging\n",
25-
"from copy import deepcopy\n",
23+
"import json\n",
2624
"\n",
2725
"import geopandas as gpd\n",
2826
"import pandas as pd\n",
29-
"from shapely.geometry import shape\n",
30-
"\n",
31-
"from pystac_client import Client\n",
32-
"\n",
33-
"logging.basicConfig()\n",
34-
"logger = logging.getLogger(\"pystac_client\")\n",
35-
"logger.setLevel(logging.INFO)\n",
3627
"\n",
37-
"\n",
38-
"# convert a list of STAC Items into a GeoDataFrame\n",
39-
"def items_to_geodataframe(items):\n",
40-
" _items = []\n",
41-
" for i in items:\n",
42-
" _i = deepcopy(i)\n",
43-
" _i[\"geometry\"] = shape(_i[\"geometry\"])\n",
44-
" _items.append(_i)\n",
45-
" gdf = gpd.GeoDataFrame(pd.json_normalize(_items))\n",
46-
" for field in [\"properties.datetime\", \"properties.created\", \"properties.updated\"]:\n",
47-
" if field in gdf:\n",
48-
" gdf[field] = pd.to_datetime(gdf[field])\n",
49-
" gdf.set_index(\"properties.datetime\", inplace=True)\n",
50-
" return gdf"
28+
"from pystac_client import Client"
29+
]
30+
},
31+
{
32+
"cell_type": "markdown",
33+
"id": "c8ac88bb",
34+
"metadata": {},
35+
"source": [
36+
"As always with pystac-client, the first step is opening the catalog:"
5137
]
5238
},
5339
{
@@ -60,10 +46,7 @@
6046
"# STAC API root URL\n",
6147
"URL = \"https://planetarycomputer.microsoft.com/api/stac/v1\"\n",
6248
"\n",
63-
"# custom headers\n",
64-
"headers = []\n",
65-
"\n",
66-
"cat = Client.open(URL, headers=headers)"
49+
"catalog = Client.open(URL)"
6750
]
6851
},
6952
{
@@ -73,20 +56,16 @@
7356
"source": [
7457
"## Initial Search Parameters\n",
7558
"\n",
76-
"Here we perform a search with the `Client.search` function, providing a geometry (`intersects`) a datetime range (`datetime`), and filtering by Item properties (`filter`) using CQL2-JSON."
59+
"Here we set up some initial search parameters to use with the `Client.search` function. We are providing a maximum number of items to return (`max_items`), a collection to look within (`collections`), a geometry (`intersects`), and a datetime range (`datetime`)."
7760
]
7861
},
7962
{
8063
"cell_type": "code",
8164
"execution_count": null,
82-
"id": "d8af6334",
65+
"id": "5e961981",
8366
"metadata": {},
8467
"outputs": [],
8568
"source": [
86-
"import json\n",
87-
"\n",
88-
"import hvplot.pandas # noqa: F401\n",
89-
"\n",
9069
"# AOI around Delfzijl, in the north of The Netherlands\n",
9170
"geom = {\n",
9271
" \"type\": \"Polygon\",\n",
@@ -106,22 +85,54 @@
10685
" \"collections\": \"landsat-8-c2-l2\",\n",
10786
" \"intersects\": geom,\n",
10887
" \"datetime\": \"2018-01-01/2020-12-31\",\n",
109-
"}\n",
88+
"}"
89+
]
90+
},
91+
{
92+
"cell_type": "markdown",
93+
"id": "d6f1dd5f",
94+
"metadata": {},
95+
"source": [
96+
"## Using Filters\n",
11097
"\n",
98+
"In addition to the parameters described above, the following examples filter by Item properties (`filter`) using CQL2-JSON. Here is a small function that runs the search, constructs a `GeoDataFrame` from the results, and then plots `datetime` against `eo:cloud_cover`.\n",
11199
"\n",
112-
"# reusable search function\n",
113-
"def search_fetch_plot(params, filt):\n",
114-
" # limit sets the # of items per page so we can see multiple pages getting fetched\n",
115-
" params[\"filter\"] = filt\n",
116-
" search = cat.search(**params)\n",
117-
" items = list(search.items_as_dicts()) # safe b/c we set max_items = 100\n",
118-
" # DataFrame\n",
119-
" items_df = pd.DataFrame(items_to_geodataframe(items))\n",
120-
" print(f\"{len(items_df.index)} items found\")\n",
121-
" field = \"properties.eo:cloud_cover\"\n",
122-
" return items_df.hvplot(\n",
123-
" y=field, label=json.dumps(filt), frame_height=500, frame_width=800\n",
124-
" )"
100+
"Remember that throughout this notebook we are only looking at STAC metadata; at no point do we read the data itself."
101+
]
102+
},
103+
{
104+
"cell_type": "code",
105+
"execution_count": null,
106+
"id": "8b26e89b",
107+
"metadata": {},
108+
"outputs": [],
109+
"source": [
110+
"def search_and_plot(filter):\n",
111+
" search = catalog.search(**params, filter=filter)\n",
112+
"\n",
113+
" gdf = gpd.GeoDataFrame.from_features(search.item_collection_as_dict())\n",
114+
" gdf[\"datetime\"] = pd.to_datetime(gdf[\"datetime\"])\n",
115+
" print(f\"Found {len(gdf)} items\")\n",
116+
"\n",
117+
" gdf.plot.line(x=\"datetime\", y=\"eo:cloud_cover\", title=json.dumps(filter))"
118+
]
119+
},
120+
{
121+
"cell_type": "markdown",
122+
"id": "11afcc19",
123+
"metadata": {},
124+
"source": [
125+
"We can test out the function by passing an empty dict to do no filtering at all."
126+
]
127+
},
128+
{
129+
"cell_type": "code",
130+
"execution_count": null,
131+
"id": "b6293c11",
132+
"metadata": {},
133+
"outputs": [],
134+
"source": [
135+
"search_and_plot({})"
125136
]
126137
},
127138
{
@@ -131,7 +142,7 @@
131142
"source": [
132143
"## CQL2 Filters\n",
133144
"\n",
134-
"Below are examples of several different CQL2 filters on the `eo:cloud_cover` property. Up to 100 Items are fetched and the eo:cloud_cover values plotted."
145+
"We will use `eo:cloud_cover` as an example and filter for all the STAC Items where `eo:cloud_cover <= 10%`."
135146
]
136147
},
137148
{
@@ -141,9 +152,17 @@
141152
"metadata": {},
142153
"outputs": [],
143154
"source": [
144-
"filt = {\"op\": \"lte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 10]}\n",
155+
"filter = {\"op\": \"<=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 10]}\n",
145156
"\n",
146-
"search_fetch_plot(params, filt)"
157+
"search_and_plot(filter)"
158+
]
159+
},
160+
{
161+
"cell_type": "markdown",
162+
"id": "75e835f1",
163+
"metadata": {},
164+
"source": [
165+
"Next let's look for all the STAC Items where `eo:cloud_cover >= 80%`."
147166
]
148167
},
149168
{
@@ -153,9 +172,17 @@
153172
"metadata": {},
154173
"outputs": [],
155174
"source": [
156-
"filt = {\"op\": \"gte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 80]}\n",
175+
"filter = {\"op\": \">=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 80]}\n",
157176
"\n",
158-
"search_fetch_plot(params, filt)"
177+
"search_and_plot(filter)"
178+
]
179+
},
180+
{
181+
"cell_type": "markdown",
182+
"id": "0ad984bf",
183+
"metadata": {},
184+
"source": [
185+
"We can combine multiple CQL2 statements to express more complicated logic:"
159186
]
160187
},
161188
{
@@ -165,24 +192,134 @@
165192
"metadata": {},
166193
"outputs": [],
167194
"source": [
168-
"filt = {\n",
195+
"filter = {\n",
196+
" \"op\": \"and\",\n",
197+
" \"args\": [\n",
198+
" {\"op\": \"<=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 60]},\n",
199+
" {\"op\": \">=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 40]},\n",
200+
" ],\n",
201+
"}\n",
202+
"\n",
203+
"search_and_plot(filter)"
204+
]
205+
},
206+
{
207+
"cell_type": "markdown",
208+
"id": "617c7416",
209+
"metadata": {},
210+
"source": [
211+
"You can see the power of this syntax. Indeed, we can replace `datetime` and `intersects` from our original search parameters with a more complex CQL2 statement."
212+
]
213+
},
214+
{
215+
"cell_type": "code",
216+
"execution_count": null,
217+
"id": "7b0dc965",
218+
"metadata": {},
219+
"outputs": [],
220+
"source": [
221+
"filter = {\n",
169222
" \"op\": \"and\",\n",
170223
" \"args\": [\n",
171-
" {\"op\": \"lte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 60]},\n",
172-
" {\"op\": \"gte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 40]},\n",
224+
" {\"op\": \"s_intersects\", \"args\": [{\"property\": \"geometry\"}, geom]},\n",
225+
" {\"op\": \">=\", \"args\": [{\"property\": \"datetime\"}, \"2018-01-01\"]},\n",
226+
" {\"op\": \"<=\", \"args\": [{\"property\": \"datetime\"}, \"2020-12-31\"]},\n",
227+
" {\"op\": \"<=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 60]},\n",
228+
" {\"op\": \">=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 40]},\n",
173229
" ],\n",
174230
"}\n",
231+
"search = catalog.search(max_items=100, collections=\"landsat-8-c2-l2\", filter=filter)\n",
232+
"\n",
233+
"print(f\"Found {len(search.item_collection())} items\")"
234+
]
235+
},
236+
{
237+
"cell_type": "markdown",
238+
"id": "56503c7b",
239+
"metadata": {},
240+
"source": [
241+
"### CQL2 Text\n",
242+
"\n",
243+
"The examples above all use CQL2-JSON, but pystac-client also supports passing `filter` as CQL2 text.\n",
175244
"\n",
176-
"search_fetch_plot(params, filt)"
245+
"NOTE: Currently, if you use CQL2 text with pystac-client, you need to change the search HTTP method to GET."
246+
]
247+
},
248+
{
249+
"cell_type": "code",
250+
"execution_count": null,
251+
"id": "5e8f62f5",
252+
"metadata": {},
253+
"outputs": [],
254+
"source": [
255+
"search = catalog.search(**params, method=\"GET\", filter=\"eo:cloud_cover<=10\")\n",
256+
"\n",
257+
"print(f\"Found {len(search.item_collection())} items\")"
258+
]
259+
},
260+
{
261+
"cell_type": "markdown",
262+
"id": "9b865c1f",
263+
"metadata": {},
264+
"source": [
265+
"Just like CQL2-JSON, CQL2 text statements can be combined to express more complex logic:"
266+
]
267+
},
268+
{
269+
"cell_type": "code",
270+
"execution_count": null,
271+
"id": "c06f40cf",
272+
"metadata": {},
273+
"outputs": [],
274+
"source": [
275+
"search = catalog.search(\n",
276+
" **params, method=\"GET\", filter=\"eo:cloud_cover<=60 and eo:cloud_cover>=40\"\n",
277+
")\n",
278+
"\n",
279+
"print(f\"Found {len(search.item_collection())} items\")"
280+
]
281+
},
282+
{
283+
"cell_type": "markdown",
284+
"id": "35cbf612",
285+
"metadata": {},
286+
"source": [
287+
"## Queryables\n",
288+
"\n",
289+
"pystac-client provides a method for accessing all the arguments that can be used within CQL2 filters for a particular collection. These are provided as a JSON Schema document, but for readability we are mostly interested in the names of the fields within `properties`.\n",
290+
"\n",
291+
"NOTE: When getting the collection, you might notice that we use \"landsat-c2-l2\" as the collection id rather than \"landsat-8-c2-l2\". This is because \"landsat-8-c2-l2\" doesn't actually exist as a collection. It is just used in some places as a collection id on items. This is likely a remnant of some former setup in the Planetary Computer STAC."
292+
]
293+
},
294+
{
295+
"cell_type": "code",
296+
"execution_count": null,
297+
"id": "90f1cc6d",
298+
"metadata": {},
299+
"outputs": [],
300+
"source": [
301+
"collection = catalog.get_collection(\"landsat-c2-l2\")\n",
302+
"queryables = collection.get_queryables()\n",
303+
"\n",
304+
"list(queryables[\"properties\"].keys())"
305+
]
306+
},
307+
{
308+
"cell_type": "markdown",
309+
"id": "c407ffec",
310+
"metadata": {},
311+
"source": [
312+
"## Read More\n",
313+
"\n",
314+
"- For more involved CQL2 examples in a STAC context, read the [STAC API Filter Extension Examples](https://github.com/stac-api-extensions/filter?tab=readme-ov-file#examples).\n",
315+
"\n",
316+
"- For examples of all the different CQL2 operations take a look at the [playground on the CQL2-rs docs](https://developmentseed.org/cql2-rs/latest/playground/)."
177317
]
178318
}
179319
],
180320
"metadata": {
181-
"interpreter": {
182-
"hash": "6b6313dbab648ff537330b996f33bf845c0da10ea77ae70864d6ca8e2699c7ea"
183-
},
184321
"kernelspec": {
185-
"display_name": "Python 3.9.11 ('.venv': venv)",
322+
"display_name": "Python 3 (ipykernel)",
186323
"language": "python",
187324
"name": "python3"
188325
},
@@ -196,7 +333,7 @@
196333
"name": "python",
197334
"nbconvert_exporter": "python",
198335
"pygments_lexer": "ipython3",
199-
"version": "3.9.11"
336+
"version": "3.12.11"
200337
}
201338
},
202339
"nbformat": 4,

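The CQL2-JSON filters added in this commit all share one shape: an `op` string plus an `args` list whose first element names a property. A minimal sketch of how such dicts could be assembled programmatically is shown below. The helper names `compare`, `all_of`, and `between` are hypothetical and are not part of pystac-client or the notebook; the code only builds plain dictionaries and makes no network calls.

```python
import json


def compare(op, prop, value):
    """Build one CQL2-JSON comparison clause (hypothetical helper)."""
    return {"op": op, "args": [{"property": prop}, value]}


def all_of(*clauses):
    """Combine clauses with a logical "and" (hypothetical helper)."""
    return {"op": "and", "args": list(clauses)}


def between(prop, low, high):
    """Express low <= prop <= high as two comparisons joined by "and"."""
    return all_of(compare("<=", prop, high), compare(">=", prop, low))


# Same structure as the combined cloud-cover filter in the notebook.
cloud_filter = between("eo:cloud_cover", 40, 60)
print(json.dumps(cloud_filter, indent=2))
```

Assuming a catalog opened as in the notebook, the resulting dict could be passed as `filter=cloud_filter` to `catalog.search`, just like the hand-written filters in the diff.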
0 commit comments

Comments
 (0)