Update CQL2 tutorial #802

265 changes: 201 additions & 64 deletions docs/tutorials/cql2-filter.ipynb
@@ -8,9 +8,9 @@
"source": [
"# CQL2 Filtering\n",
"\n",
"This notebook demonstrates the use of pystac-client to use [CQL2 filtering](https://github.com/radiantearth/stac-api-spec/tree/master/fragments/filter). The server needs to support this and advertise conformance as the `https://api.stacspec.org/v1.0.0-rc.1/item-search#filter` class in the `conformsTo` attribute of the root API.\n",
"This notebook demonstrates using pystac-client to filter STAC items with [CQL2](https://docs.ogc.org/is/21-065r2/21-065r2.html) as described in the [STAC API Filter Extension](https://github.com/stac-api-extensions/filter). \n",
"\n",
"**This should be considered an experimental feature. This notebook uses the Microsoft Planetary Computer API, as it is currently the only public CQL2 implementation.**"
"Note: Not all STAC APIs support the Filter Extension. APIs advertise conformance by including `https://api.stacspec.org/v1.0.0/item-search#filter` in the `conformsTo` attribute of the root API."
]
},
{
@@ -20,34 +20,20 @@
"metadata": {},
"outputs": [],
"source": [
"# set pystac_client logger to DEBUG to see API calls\n",
"import logging\n",
"from copy import deepcopy\n",
"import json\n",
"\n",
"import geopandas as gpd\n",
"import pandas as pd\n",
"from shapely.geometry import shape\n",
"\n",
"from pystac_client import Client\n",
"\n",
"logging.basicConfig()\n",
"logger = logging.getLogger(\"pystac_client\")\n",
"logger.setLevel(logging.INFO)\n",
"\n",
"\n",
"# convert a list of STAC Items into a GeoDataFrame\n",
"def items_to_geodataframe(items):\n",
" _items = []\n",
" for i in items:\n",
" _i = deepcopy(i)\n",
" _i[\"geometry\"] = shape(_i[\"geometry\"])\n",
" _items.append(_i)\n",
" gdf = gpd.GeoDataFrame(pd.json_normalize(_items))\n",
" for field in [\"properties.datetime\", \"properties.created\", \"properties.updated\"]:\n",
" if field in gdf:\n",
" gdf[field] = pd.to_datetime(gdf[field])\n",
" gdf.set_index(\"properties.datetime\", inplace=True)\n",
" return gdf"
"from pystac_client import Client"
]
},
{
"cell_type": "markdown",
"id": "c8ac88bb",
"metadata": {},
"source": [
"The first step as always with pystac-client is opening the catalog:"
]
},
{
@@ -60,10 +46,7 @@
"# STAC API root URL\n",
"URL = \"https://planetarycomputer.microsoft.com/api/stac/v1\"\n",
"\n",
"# custom headers\n",
"headers = []\n",
"\n",
"cat = Client.open(URL, headers=headers)"
"catalog = Client.open(URL)"
]
},
{
@@ -73,20 +56,16 @@
"source": [
"## Initial Search Parameters\n",
"\n",
"Here we perform a search with the `Client.search` function, providing a geometry (`intersects`) a datetime range (`datetime`), and filtering by Item properties (`filter`) using CQL2-JSON."
"Here we set up some initial search parameters to use with the `Client.search` function. We are providing a maximum number of items to return (`max_items`), a collection to look within (`collections`), a geometry (`intersects`), and a datetime range (`datetime`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d8af6334",
"id": "5e961981",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"import hvplot.pandas # noqa: F401\n",
"\n",
"# AOI around Delfzijl, in the north of The Netherlands\n",
"geom = {\n",
" \"type\": \"Polygon\",\n",
@@ -106,22 +85,54 @@
" \"collections\": \"landsat-8-c2-l2\",\n",
" \"intersects\": geom,\n",
" \"datetime\": \"2018-01-01/2020-12-31\",\n",
"}\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "d6f1dd5f",
"metadata": {},
"source": [
"## Using Filters\n",
"\n",
"In addition to the parameters described above in the following examples we will filter by Item properties (`filter`) using CQL2-JSON. Here is a little function that does the search constructs a `GeoDataFrame` of the results and then plots `datetime` vs `eo:cloud_cover`.\n",
"\n",
"# reusable search function\n",
"def search_fetch_plot(params, filt):\n",
" # limit sets the # of items per page so we can see multiple pages getting fetched\n",
" params[\"filter\"] = filt\n",
" search = cat.search(**params)\n",
" items = list(search.items_as_dicts()) # safe b/c we set max_items = 100\n",
" # DataFrame\n",
" items_df = pd.DataFrame(items_to_geodataframe(items))\n",
" print(f\"{len(items_df.index)} items found\")\n",
" field = \"properties.eo:cloud_cover\"\n",
" return items_df.hvplot(\n",
" y=field, label=json.dumps(filt), frame_height=500, frame_width=800\n",
" )"
"Remember that in this whole notebook we are only looking at STAC metadata, there is no part where we are reading the data itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b26e89b",
"metadata": {},
"outputs": [],
"source": [
"def search_and_plot(filter):\n",
" search = catalog.search(**params, filter=filter)\n",
"\n",
" gdf = gpd.GeoDataFrame.from_features(search.item_collection_as_dict())\n",
" gdf[\"datetime\"] = pd.to_datetime(gdf[\"datetime\"])\n",
" print(f\"Found {len(gdf)} items\")\n",
"\n",
" gdf.plot.line(x=\"datetime\", y=\"eo:cloud_cover\", title=json.dumps(filter))"
]
},
{
"cell_type": "markdown",
"id": "11afcc19",
"metadata": {},
"source": [
"We can test out the function by passing an empty dict to do no filtering at all."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6293c11",
"metadata": {},
"outputs": [],
"source": [
"search_and_plot({})"
]
},
{
@@ -131,7 +142,7 @@
"source": [
"## CQL2 Filters\n",
"\n",
"Below are examples of several different CQL2 filters on the `eo:cloud_cover` property. Up to 100 Items are fetched and the eo:cloud_cover values plotted."
"We will use `eo:cloud_cover` as an example and filter for all the STAC Items where `eo:cloud_cover <= 10%`."
]
},
{
@@ -141,9 +152,17 @@
"metadata": {},
"outputs": [],
"source": [
"filt = {\"op\": \"lte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 10]}\n",
"filter = {\"op\": \"<=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 10]}\n",
"\n",
"search_fetch_plot(params, filt)"
"search_and_plot(filter)"
]
},
{
"cell_type": "markdown",
"id": "75e835f1",
"metadata": {},
"source": [
"Next let's look for all the STAC Items where `eo:cloud_cover >= 80%`."
]
},
{
@@ -153,9 +172,17 @@
"metadata": {},
"outputs": [],
"source": [
"filt = {\"op\": \"gte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 80]}\n",
"filter = {\"op\": \">=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 80]}\n",
"\n",
"search_fetch_plot(params, filt)"
"search_and_plot(filter)"
]
},
{
"cell_type": "markdown",
"id": "0ad984bf",
"metadata": {},
"source": [
"We can combine multiple CQL2 statements to express more complicated logic:"
]
},
{
@@ -165,24 +192,134 @@
"metadata": {},
"outputs": [],
"source": [
"filt = {\n",
"filter = {\n",
" \"op\": \"and\",\n",
" \"args\": [\n",
" {\"op\": \"<=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 60]},\n",
" {\"op\": \">=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 40]},\n",
" ],\n",
"}\n",
"\n",
"search_and_plot(filter)"
]
},
{
"cell_type": "markdown",
"id": "617c7416",
"metadata": {},
"source": [
"You can see the power of this syntax. Indeed we can replace `datetime` and `intersects` from our original search parameters with a more complex CQL2 statement."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7b0dc965",
"metadata": {},
"outputs": [],
"source": [
"filter = {\n",
" \"op\": \"and\",\n",
" \"args\": [\n",
" {\"op\": \"lte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 60]},\n",
" {\"op\": \"gte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 40]},\n",
" {\"op\": \"s_intersects\", \"args\": [{\"property\": \"geometry\"}, geom]},\n",
" {\"op\": \">=\", \"args\": [{\"property\": \"datetime\"}, \"2018-01-01\"]},\n",
" {\"op\": \"<=\", \"args\": [{\"property\": \"datetime\"}, \"2020-12-31\"]},\n",
" {\"op\": \"<=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 60]},\n",
" {\"op\": \">=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 40]},\n",
" ],\n",
"}\n",
"search = catalog.search(max_items=100, collections=\"landsat-8-c2-l2\", filter=filter)\n",
"\n",
"print(f\"Found {len(search.item_collection())} items\")"
]
},
{
"cell_type": "markdown",
"id": "56503c7b",
"metadata": {},
"source": [
"### CQL2 Text\n",
"\n",
"The examples above all use CQL2-json but pystac-client also supports passing `filter` as CQL2 text.\n",
"\n",
"search_fetch_plot(params, filt)"
"NOTE: As of right now in pystac-client if you use CQL2 text you need to change the search HTTP method to GET."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e8f62f5",
"metadata": {},
"outputs": [],
"source": [
"search = catalog.search(**params, method=\"GET\", filter=\"eo:cloud_cover<=10\")\n",
"\n",
"print(f\"Found {len(search.item_collection())} items\")"
]
},
{
"cell_type": "markdown",
"id": "9b865c1f",
"metadata": {},
"source": [
"Just like CQL2 json, CQL2 text statements can be combined to express more complex logic:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c06f40cf",
"metadata": {},
"outputs": [],
"source": [
"search = catalog.search(\n",
" **params, method=\"GET\", filter=\"eo:cloud_cover<=60 and eo:cloud_cover>=40\"\n",
")\n",
"\n",
"print(f\"Found {len(search.item_collection())} items\")"
]
},
{
"cell_type": "markdown",
"id": "35cbf612",
"metadata": {},
"source": [
"## Queryables\n",
"\n",
"pystac-client provides a method for accessing all the arguments that can be used within CQL2 filters for a particular collection. These are provided as a json schema document, but for readability we are mostly interested in the names of the fields within `properties`.\n",
"\n",
"NOTE: When getting the collection, you might notice that we use \"landsat-c2-l2\" as the collection id rather than \"landsat-8-c2-l2\". This is because \"landsat-8-c2-l2\" doesn't actually exist as a collection. It is just used in some places as a collection id on items. This is likely a remnant of some former setup in the Planetary Computer STAC."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90f1cc6d",
"metadata": {},
"outputs": [],
"source": [
"collection = catalog.get_collection(\"landsat-c2-l2\")\n",
"queryables = collection.get_queryables()\n",
"\n",
"list(queryables[\"properties\"].keys())"
]
},
{
"cell_type": "markdown",
"id": "c407ffec",
"metadata": {},
"source": [
"## Read More\n",
"\n",
"- For more involved CQL2 examples in a STAC context read the [STAC API Filter Extension Examples](https://github.com/stac-api-extensions/filter?tab=readme-ov-file#examples)\n",
"\n",
"- For examples of all the different CQL2 operations take a look at the [playground on the CQL2-rs docs](https://developmentseed.org/cql2-rs/latest/playground/)."
]
}
],
"metadata": {
"interpreter": {
"hash": "6b6313dbab648ff537330b996f33bf845c0da10ea77ae70864d6ca8e2699c7ea"
},
"kernelspec": {
"display_name": "Python 3.9.11 ('.venv': venv)",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -196,7 +333,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.11"
"version": "3.12.11"
}
},
"nbformat": 4,