diff --git a/docs/tutorials/cql2-filter.ipynb b/docs/tutorials/cql2-filter.ipynb index 5e05ee21..a517f11a 100644 --- a/docs/tutorials/cql2-filter.ipynb +++ b/docs/tutorials/cql2-filter.ipynb @@ -8,9 +8,9 @@ "source": [ "# CQL2 Filtering\n", "\n", - "This notebook demonstrates the use of pystac-client to use [CQL2 filtering](https://github.com/radiantearth/stac-api-spec/tree/master/fragments/filter). The server needs to support this and advertise conformance as the `https://api.stacspec.org/v1.0.0-rc.1/item-search#filter` class in the `conformsTo` attribute of the root API.\n", + "This notebook demonstrates using pystac-client to filter STAC items with [CQL2](https://docs.ogc.org/is/21-065r2/21-065r2.html) as described in the [STAC API Filter Extension](https://github.com/stac-api-extensions/filter). \n", "\n", - "**This should be considered an experimental feature. This notebook uses the Microsoft Planetary Computer API, as it is currently the only public CQL2 implementation.**" + "Note: Not all STAC APIs support the Filter Extension. APIs advertise conformance by including `https://api.stacspec.org/v1.0.0/item-search#filter` in the `conformsTo` attribute of the root API." 
] }, { @@ -20,34 +20,20 @@ "metadata": {}, "outputs": [], "source": [ - "# set pystac_client logger to DEBUG to see API calls\n", - "import logging\n", - "from copy import deepcopy\n", + "import json\n", "\n", "import geopandas as gpd\n", "import pandas as pd\n", - "from shapely.geometry import shape\n", - "\n", - "from pystac_client import Client\n", - "\n", - "logging.basicConfig()\n", - "logger = logging.getLogger(\"pystac_client\")\n", - "logger.setLevel(logging.INFO)\n", "\n", - "\n", - "# convert a list of STAC Items into a GeoDataFrame\n", - "def items_to_geodataframe(items):\n", - " _items = []\n", - " for i in items:\n", - " _i = deepcopy(i)\n", - " _i[\"geometry\"] = shape(_i[\"geometry\"])\n", - " _items.append(_i)\n", - " gdf = gpd.GeoDataFrame(pd.json_normalize(_items))\n", - " for field in [\"properties.datetime\", \"properties.created\", \"properties.updated\"]:\n", - " if field in gdf:\n", - " gdf[field] = pd.to_datetime(gdf[field])\n", - " gdf.set_index(\"properties.datetime\", inplace=True)\n", - " return gdf" + "from pystac_client import Client" + ] + }, + { + "cell_type": "markdown", + "id": "c8ac88bb", + "metadata": {}, + "source": [ + "The first step, as always with pystac-client, is opening the catalog:" ] }, { @@ -60,10 +46,7 @@ "# STAC API root URL\n", "URL = \"https://planetarycomputer.microsoft.com/api/stac/v1\"\n", "\n", - "# custom headers\n", - "headers = []\n", - "\n", - "cat = Client.open(URL, headers=headers)" + "catalog = Client.open(URL)" ] }, { @@ -73,20 +56,16 @@ "source": [ "## Initial Search Parameters\n", "\n", - "Here we perform a search with the `Client.search` function, providing a geometry (`intersects`) a datetime range (`datetime`), and filtering by Item properties (`filter`) using CQL2-JSON." + "Here we set up some initial search parameters to use with the `Client.search` function. 
We are providing a maximum number of items to return (`max_items`), a collection to look within (`collections`), a geometry (`intersects`), and a datetime range (`datetime`)." ] }, { "cell_type": "code", "execution_count": null, - "id": "d8af6334", + "id": "5e961981", "metadata": {}, "outputs": [], "source": [ - "import json\n", - "\n", - "import hvplot.pandas # noqa: F401\n", - "\n", "# AOI around Delfzijl, in the north of The Netherlands\n", "geom = {\n", " \"type\": \"Polygon\",\n", @@ -106,22 +85,54 @@ " \"collections\": \"landsat-8-c2-l2\",\n", " \"intersects\": geom,\n", " \"datetime\": \"2018-01-01/2020-12-31\",\n", - "}\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "d6f1dd5f", + "metadata": {}, + "source": [ + "## Using Filters\n", "\n", + "In addition to the parameters described above, the following examples filter by Item properties (`filter`) using CQL2-JSON. Here is a small function that runs the search, constructs a `GeoDataFrame` of the results, and then plots `datetime` vs `eo:cloud_cover`.\n", "\n", - "# reusable search function\n", - "def search_fetch_plot(params, filt):\n", - " # limit sets the # of items per page so we can see multiple pages getting fetched\n", - " params[\"filter\"] = filt\n", - " search = cat.search(**params)\n", - " items = list(search.items_as_dicts()) # safe b/c we set max_items = 100\n", - " # DataFrame\n", - " items_df = pd.DataFrame(items_to_geodataframe(items))\n", - " print(f\"{len(items_df.index)} items found\")\n", - " field = \"properties.eo:cloud_cover\"\n", - " return items_df.hvplot(\n", - " y=field, label=json.dumps(filt), frame_height=500, frame_width=800\n", - " )" + "Remember that throughout this notebook we are only looking at STAC metadata; at no point do we read the data itself." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8b26e89b", + "metadata": {}, + "outputs": [], + "source": [ + "def search_and_plot(filter):\n", + " search = catalog.search(**params, filter=filter)\n", + "\n", + " gdf = gpd.GeoDataFrame.from_features(search.item_collection_as_dict())\n", + " gdf[\"datetime\"] = pd.to_datetime(gdf[\"datetime\"])\n", + " print(f\"Found {len(gdf)} items\")\n", + "\n", + " gdf.plot.line(x=\"datetime\", y=\"eo:cloud_cover\", title=json.dumps(filter))" + ] + }, + { + "cell_type": "markdown", + "id": "11afcc19", + "metadata": {}, + "source": [ + "We can test out the function by passing an empty dict to do no filtering at all." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b6293c11", + "metadata": {}, + "outputs": [], + "source": [ + "search_and_plot({})" ] }, { @@ -131,7 +142,7 @@ "source": [ "## CQL2 Filters\n", "\n", - "Below are examples of several different CQL2 filters on the `eo:cloud_cover` property. Up to 100 Items are fetched and the eo:cloud_cover values plotted." + "We will use `eo:cloud_cover` as an example and filter for all the STAC Items where `eo:cloud_cover <= 10%`." ] }, { @@ -141,9 +152,17 @@ "metadata": {}, "outputs": [], "source": [ - "filt = {\"op\": \"lte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 10]}\n", + "filter = {\"op\": \"<=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 10]}\n", "\n", - "search_fetch_plot(params, filt)" + "search_and_plot(filter)" + ] + }, + { + "cell_type": "markdown", + "id": "75e835f1", + "metadata": {}, + "source": [ + "Next let's look for all the STAC Items where `eo:cloud_cover >= 80%`." 
] }, { @@ -153,9 +172,17 @@ "metadata": {}, "outputs": [], "source": [ - "filt = {\"op\": \"gte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 80]}\n", + "filter = {\"op\": \">=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 80]}\n", "\n", - "search_fetch_plot(params, filt)" + "search_and_plot(filter)" + ] + }, + { + "cell_type": "markdown", + "id": "0ad984bf", + "metadata": {}, + "source": [ + "We can combine multiple CQL2 statements to express more complicated logic:" ] }, { @@ -165,24 +192,134 @@ "metadata": {}, "outputs": [], "source": [ - "filt = {\n", + "filter = {\n", + " \"op\": \"and\",\n", + " \"args\": [\n", + " {\"op\": \"<=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 60]},\n", + " {\"op\": \">=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 40]},\n", + " ],\n", + "}\n", + "\n", + "search_and_plot(filter)" + ] + }, + { + "cell_type": "markdown", + "id": "617c7416", + "metadata": {}, + "source": [ + "You can see the power of this syntax. Indeed, we can replace `datetime` and `intersects` from our original search parameters with a more complex CQL2 statement." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7b0dc965", + "metadata": {}, + "outputs": [], + "source": [ + "filter = {\n", " \"op\": \"and\",\n", " \"args\": [\n", - " {\"op\": \"lte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 60]},\n", - " {\"op\": \"gte\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 40]},\n", + " {\"op\": \"s_intersects\", \"args\": [{\"property\": \"geometry\"}, geom]},\n", + " {\"op\": \">=\", \"args\": [{\"property\": \"datetime\"}, \"2018-01-01\"]},\n", + " {\"op\": \"<=\", \"args\": [{\"property\": \"datetime\"}, \"2020-12-31\"]},\n", + " {\"op\": \"<=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 60]},\n", + " {\"op\": \">=\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 40]},\n", " ],\n", "}\n", + "search = catalog.search(max_items=100, collections=\"landsat-8-c2-l2\", filter=filter)\n", + "\n", + "print(f\"Found {len(search.item_collection())} items\")" + ] + }, + { + "cell_type": "markdown", + "id": "56503c7b", + "metadata": {}, + "source": [ + "### CQL2 Text\n", + "\n", + "The examples above all use CQL2-JSON, but pystac-client also supports passing `filter` as CQL2 text.\n", "\n", - "search_fetch_plot(params, filt)" + "NOTE: Currently, if you use CQL2 text with pystac-client, you need to change the search HTTP method to GET." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5e8f62f5", + "metadata": {}, + "outputs": [], + "source": [ + "search = catalog.search(**params, method=\"GET\", filter=\"eo:cloud_cover<=10\")\n", + "\n", + "print(f\"Found {len(search.item_collection())} items\")" + ] + }, + { + "cell_type": "markdown", + "id": "9b865c1f", + "metadata": {}, + "source": [ + "Just like CQL2-JSON, CQL2 text statements can be combined to express more complex logic:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c06f40cf", + "metadata": {}, + "outputs": [], + "source": [ + "search = catalog.search(\n", + " **params, method=\"GET\", filter=\"eo:cloud_cover<=60 and eo:cloud_cover>=40\"\n", + ")\n", + "\n", + "print(f\"Found {len(search.item_collection())} items\")" + ] + }, + { + "cell_type": "markdown", + "id": "35cbf612", + "metadata": {}, + "source": [ + "## Queryables\n", + "\n", + "pystac-client provides a method for accessing all the arguments that can be used within CQL2 filters for a particular collection. These are provided as a JSON Schema document, but for readability we are mostly interested in the names of the fields within `properties`.\n", + "\n", + "NOTE: When getting the collection, you might notice that we use \"landsat-c2-l2\" as the collection id rather than \"landsat-8-c2-l2\". This is because \"landsat-8-c2-l2\" doesn't actually exist as a collection. It is just used in some places as a collection id on items. This is likely a remnant of some former setup in the Planetary Computer STAC." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "90f1cc6d", + "metadata": {}, + "outputs": [], + "source": [ + "collection = catalog.get_collection(\"landsat-c2-l2\")\n", + "queryables = collection.get_queryables()\n", + "\n", + "list(queryables[\"properties\"].keys())" + ] + }, + { + "cell_type": "markdown", + "id": "c407ffec", + "metadata": {}, + "source": [ + "## Read More\n", + "\n", + "- For more involved CQL2 examples in a STAC context, read the [STAC API Filter Extension Examples](https://github.com/stac-api-extensions/filter?tab=readme-ov-file#examples).\n", + "\n", + "- For examples of all the different CQL2 operations, take a look at the [playground on the CQL2-rs docs](https://developmentseed.org/cql2-rs/latest/playground/)." ] } ], "metadata": { - "interpreter": { - "hash": "6b6313dbab648ff537330b996f33bf845c0da10ea77ae70864d6ca8e2699c7ea" - }, "kernelspec": { - "display_name": "Python 3.9.11 ('.venv': venv)", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -196,7 +333,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.11" + "version": "3.12.11" } }, "nbformat": 4,