PySTAC Decoded: A Step-by-Step Tutorial

Hi all! In the previous post I had discussed about how to search satellite imagery and other spatio temporal data using STAC in QGIS. Honestly it's very convenient to find a satellite imagery for a certain time in a certain area from multiple domains using STAC. We only need an URL link and the STAC tool will search for collections of data that available. Seamlessly it will enhancing and fostering the use of spatio temporal data in a GIS application to solve a real world problem.

More than that, the use of STAC to find any related spatio temporal data can be more powerful combining with a programming language like Python. It's not only about searching a specific data but also elaborate other spatial computation and analysis in a row.

The question is: How to implement STAC in Python? In this tutorial we'll get through it. This post will explain step by step approach how to use STAC in Python from searching data, display it and do analysis for the data. Keep reading!

Installing Libraries

We start this tutorial by importing some required libraries. For this tutorial we need some libraries such as pystac-client, odc-stac, and matplotlib. Make sure you have the libraries, if you don't please install it first. For the installation you can use pip or conda depends on your environment.   

1
2
3
from pystac_client import Client
from odc.stac import load
import matplotlib.pyplot as plt

Searching for Images/Assets with PySTAC

After import libraries, let's open a STAC with the URL: https://earth-search.aws.element84.com/v1 with the following code. Then we call all the collections with get_collections() method.

1
2
3
4
5
client = Client.open("https://earth-search.aws.element84.com/v1")
collections=list(client.get_collections())
number_collections=len(collections)
print(f"{number_collections} collections")
print(collections)

If the collections variable is printed, we'll get result like this:

8 collections
[<CollectionClient id=cop-dem-glo-30>, 
<CollectionClient id=naip>, 
<CollectionClient id=sentinel-2-l2a>, 
<CollectionClient id=sentinel-2-l1c>, 
<CollectionClient id=cop-dem-glo-90>, 
<CollectionClient id=landsat-c2-l2>, 
<CollectionClient id=sentinel-1-grd>,
<CollectionClient id=sentinel-2-c1-l2a>]

The collections result return as a list. Then we can simply count the number of collections using the len function.

Now let's take sentinel-2-l2a collection. Define a boundary extent in the bbox variable and define a date range. Using those parameters do searching in the collection using client.search() method. Last two lines in the following code will print the item's id and percentage of cloud coverage.

1
2
3
4
5
6
7
collection = "sentinel-2-l2a"
bbox = [34.2201,31.1394,34.6694,31.6493]# min lon,lat, max lon, lat
date="2023-12-01/2023-12-31"
search = client.search(collections=[collection], bbox=bbox, datetime=date)
print(f"{search.matched()} items found")
for item in search.items():
    print(item.id,":",item.properties['eo:cloud_cover'])

Running the code we'll get the following result.

14 items found
S2B_36RXV_20231231_0_L2A : 1.321937
S2B_36SXA_20231231_0_L2A : 16.668269
S2A_36RXV_20231226_0_L2A : 0.003849
S2A_36SXA_20231226_0_L2A : 0.006606
S2B_36RXV_20231221_0_L2A : 100
S2B_36SXA_20231221_0_L2A : 100
S2A_36RXV_20231216_0_L2A : 48.060358
S2A_36SXA_20231216_0_L2A : 88.949937
S2B_36RXV_20231211_0_L2A : 9.257196
S2B_36SXA_20231211_0_L2A : 4.992877
S2A_36RXV_20231206_0_L2A : 25.865424
S2A_36SXA_20231206_0_L2A : 72.468609
S2B_36RXV_20231201_0_L2A : 1.460705
S2B_36SXA_20231201_0_L2A : 6.149376

From the result we know that the item with id S2A_36RXV_20231226_0_L2A has the lowest cloud coverage 0.003849%. We can use this item to work on. But it will be better if we can filter the result to select the lowest cloud coverage. The next code is used for this purpose.

1
2
selected_item=min(search.items(),key=lambda item: item.properties["eo:cloud_cover"])
print(selected_item.id, selected_item.properties["eo:cloud_cover"])

We have a collection item, now let's find out what assets are available in it. To get all available assets the assets.items() method is used. Then using it's key and and title properties we print the result.

1
2
3
assets=selected_item.assets.items()
for key,title in assets:
    print(key,'|',title.title)

Below is the sample result from the code.

aot | Aerosol optical thickness (AOT)
blue | Blue (band 2) - 10m
coastal | Coastal aerosol (band 1) - 60m
granule_metadata | None
green | Green (band 3) - 10m
nir | NIR 1 (band 8) - 10m
nir08 | NIR 2 (band 8A) - 20m
nir09 | NIR 3 (band 9) - 60m
red | Red (band 4) - 10m
rededge1 | Red edge 1 (band 5) - 20m
rededge2 | Red edge 2 (band 6) - 20m
rededge3 | Red edge 3 (band 7) - 20m

Preview/Plot The Image Using PySTAC

Assets are all available data within an item. We can select an asset and do something on it. But firstly let's see the scene using the thumbnail asset.

1
2
from IPython.display import Image
Image(url=selected_item.assets["thumbnail"].href, width=500)

Image Preview
Figure 1. Image Preview

That's the preview for the whole scene. But how to view to the area of interest within the defined boundary area? For that we need to load the raw data using the load class from odc-stat module. I suggest not to load all the bands cause it will take time, it would be better to load only bands that are required. In this case I loaded four bands as you can see in the selected_bands list variable.

1
2
3
4
selected_bands = ["nir", "red","green","blue"]
data = load([selected_item], bands=selected_bands, bbox=bbox).isel(time=0)
fig, ax = plt.subplots(figsize=(10, 10))
data[["red", "green", "blue"]].to_array().plot.imshow(robust=True, ax=ax)

In the line 3-4, here comes code to display the image for the respected area. I want to view the natural color with bands combination red-green-blue. Figure 2 shows the plot of the image.

Natural Color Image
Figure 2. Natural Color Image

NDVI Calculation Using PySTAC

So far, we have done searching data and view it in a plot. Now let's do a computation by doing a NDVI calculation. For NDVI calculation we need two bands: Red and Near Infrared band. For that we assign for each band with red and nir and perform NDVI calculation using the formula as in line 3 in the following code.  

1
2
3
4
5
6
red = data["red"].astype("float")
nir = data["nir"].astype("float")
ndvi = (nir - red) / (nir + red)

fig, ax = plt.subplots(figsize=(7, 7))
ndvi.plot.imshow(ax=ax, cmap="Greens")

Lastly let's view the result by plotting using the rest of the code.

NDVI Image
Figure 3. NDVI Image

This tutorial covers the essentials of using PySTAC. Throughout this tutorial, we've learned how to connect with a data provider, access its collections, and retrieve available items. We've also explored filtering Sentinel images to identify those with the lowest cloud coverage, plotting the images, and calculating the NDVI.

We hope this tutorial has provided you with a solid starting point for utilizing PySTAC. There's a wealth of additional functionality to explore with PySTAC, so be sure to visit the PySTAC website for further details. Thank you for reading!

Anyway I provide the code in a notebook based on request. If you're interested, please make your request here.


Related Posts

Disqus Comments