Get to Know Marigold (Part 4): Complex Masking, K-Means Clustering Classification

Written by Descartes Labs | Sep 18, 2024

Welcome to Marigold! This is part 4 of our Get to Know Marigold blog series, which includes helpful videos. Previous posts in the series: Part 1: Functionality and Processing Tools | Part 2: Introduction to the Bare Earth Composites | Part 3: Vegetation and Water Masking

In the following video snippets, we walk through how to use K-Means clustering, one of the built-in capabilities of Descartes Labs’ Marigold software, to mask out areas you want to remove from your imagery.

Video: Pre-Masking Vegetation

When you run an unsupervised classification such as K-Means clustering over areas with complex surface cover, the first groups it identifies are often the pixels that are most spectrally and statistically different from the rest of the data. Masking in multiple stages makes it easier to identify and mask more subtle features. In this video, we take data that we previously masked for vegetation in another training video and apply a new shadow mask created through K-Means clustering.

If you need a refresher on creating and applying vegetation masks, watch our Vegetation Masking training video.

K-Means Diagram

K-Means clustering is an unsupervised classification algorithm that segments the input dataset into K clusters, assigning each pixel to the cluster whose mean spectrum is closest to its own. From these clusters, areas that require masking can be identified and then turned into mask layers to apply to datasets.
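To make the idea concrete, here is a minimal sketch of K-Means on a multiband image in Python with NumPy. It assumes the image is an (H, W, B) array of reflectance values and uses plain Lloyd's algorithm (assign each pixel to the nearest centre, then move each centre to the mean of its pixels) with a simple farthest-point initialisation. Marigold's own implementation details are not described in this post, so treat this as an illustration of the technique rather than what the software runs; `kmeans_pixels` is a hypothetical name.

```python
import numpy as np

def kmeans_pixels(image, k, n_iter=50):
    """Group the pixels of an (H, W, B) reflectance array into k spectral clusters.

    Plain Lloyd's algorithm with farthest-point initialisation (an assumption;
    Marigold's own initialisation and stopping rules are not documented here).
    """
    h, w, b = image.shape
    pixels = image.reshape(-1, b).astype(float)   # one row per pixel, one column per band

    # Initialise: start from the first pixel, then repeatedly add the pixel
    # farthest (in spectral space) from all centres chosen so far.
    centres = [pixels[0]]
    for _ in range(1, k):
        d2 = ((pixels[:, None, :] - np.array(centres)[None, :, :]) ** 2).sum(axis=2)
        centres.append(pixels[d2.min(axis=1).argmax()])
    centres = np.array(centres)

    def assign(c):
        # Squared Euclidean distance from every pixel to every centre, shape (N, k)
        d2 = ((pixels[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centres)
        # Move each centre to the mean spectrum of its members (keep it if the cluster is empty)
        new = np.array([pixels[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return assign(centres).reshape(h, w), centres
```

On a full Sentinel-2 scene the (N, k, B) distance array becomes large, so a practical version would process pixels in chunks or train on a sample.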

While the example in this video focuses on shadow, this technique can also be applied to areas of:

No data
Vegetation
Water
Snow or Ice
Shadow
Cloud or cloud shadows
Non-bedrock surficial materials (e.g. dunes)

Video: K-Means Generation

In this example, areas of topographic shadow are clearly visible in the Sentinel-2 Bare Earth Composite. Shadows have very low reflectance values and therefore a spectral response very different from the rest of the data, which makes them an ideal target for unsupervised classification.

To create a K-Means cluster of this image:

Go to the Classification dropdown in the Processing Toolbox and select “K-Means clustering.”
Select your dataset of choice, in this case the Descartes Labs’ Sentinel-2 Bare Earth Composite to which we previously applied a Vegetation mask.
Next, select the wavelength ranges you want to use in this classification. Here, we select all of the bands in the visible-to-near-infrared and shortwave-infrared ranges.
Select an area of interest, which is the area this classification will be calculated over. By default, it is calculated over the current viewport, but you can instead upload or draw a vector.
Next, select the range of K values to test. Each K is a number of distinct groups the algorithm will separate the data into. A range of two to ten is usually sufficient, but you can adjust it until the outputs suit your data. You will be able to choose the number of clusters to display on the next screen.
Click “Train Model.” The algorithm groups the pixels for each number of clusters in your range. By default, the result displayed is the one halfway through the selected range.
Scroll down and adjust the slider to increase or decrease the number of clusters displayed. In this example, the five-cluster output maps the shadows well, shown in purple.
If needed, toggle layer visibility on and off with the eye icon to compare the clusters to the original imagery.
Once satisfied, name your layer and complete your model by clicking “Run Model.”
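The training steps above can be sketched the same way. This hypothetical `train_k_range` function (not a Marigold API) keeps only the chosen bands, crops to an area of interest given as array slices, and clusters once for every K in the range, returning the midpoint K as the default. Rounding the midpoint down for ranges with an odd span is our assumption. It carries a compact copy of a K-Means routine so it runs on its own.

```python
import numpy as np

def kmeans(pixels, k, n_iter=50):
    """Compact Lloyd's k-means on an (N, B) array; returns one cluster label per row."""
    centres = [pixels[0]]
    for _ in range(1, k):                          # farthest-point initialisation
        d2 = ((pixels[:, None, :] - np.array(centres)[None]) ** 2).sum(-1)
        centres.append(pixels[d2.min(1).argmax()])
    centres = np.array(centres)

    def assign(c):
        # Index of the nearest centre for every pixel
        return ((pixels[:, None, :] - c[None]) ** 2).sum(-1).argmin(1)

    for _ in range(n_iter):
        labels = assign(centres)
        new = np.array([pixels[labels == j].mean(0) if (labels == j).any() else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return assign(centres)

def train_k_range(image, band_idx, aoi=(slice(None), slice(None)), k_min=2, k_max=10):
    """Cluster the selected bands over an AOI for every K in [k_min, k_max] (hypothetical helper)."""
    sub = image[aoi][:, :, band_idx]               # crop to the AOI, keep only the chosen bands
    h, w, b = sub.shape
    pixels = sub.reshape(-1, b).astype(float)
    results = {k: kmeans(pixels, k).reshape(h, w) for k in range(k_min, k_max + 1)}
    default_k = (k_min + k_max) // 2               # result shown first: halfway through the range
    return results, default_k
```

Moving the slider then corresponds to picking a different key from `results`; with the default range of two to ten, the first result shown would be K = 6.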

Video: Pixel Inspector

Next, identify the number of the cluster group that you will turn into a mask, in this case the purple shadows.

Toggle on the Pixel Inspector tool in the top navigation. This tool shows data values for individual pixels and layers, and also provides lat/long coordinates for a given location.
Using the Pixel Inspector, click on a purple area of the map classified as shadow. Note the value shown in the Pixel Inspector information box, in this case cluster group four.
Toggle off the Pixel Inspector whenever you’re not using it, to optimize Marigold’s performance.

Tip: Toggle off the Pixel Inspector to optimize Marigold’s performance speed.
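Reading a cluster number with the Pixel Inspector is, in array terms, a single lookup. The sketch below shows that lookup alongside an optional cross-check of our own (not a Marigold tool): since shadows have very low reflectance, ranking clusters by mean brightness will often put the shadow cluster first. Both function names are hypothetical, and the image and label arrays are assumed to share the same pixel grid.

```python
import numpy as np

def inspect_pixel(labels, row, col):
    """Return the cluster number at one pixel, like clicking it with the Pixel Inspector."""
    return int(labels[row, col])

def rank_clusters_by_brightness(image, labels):
    """Return cluster ids sorted from darkest to brightest mean reflectance.

    A heuristic cross-check, not a Marigold feature: other dark surfaces
    (e.g. water) can also produce low-reflectance clusters.
    """
    brightness = image.mean(axis=2)                # mean over bands for each pixel
    ids = np.unique(labels)
    means = np.array([brightness[labels == c].mean() for c in ids])
    return [int(c) for c in ids[np.argsort(means)]]
```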

Video: Creating a Mask

Now you can create a shadow mask:

Go to the “Band algebra” dropdown in the Processing Toolbox and select the “Raster Calculator” tool.
Select your K-Means clustering output layer as your product.
Then select the “classes” band. Under “Operations,” click the equality button (“==”), then enter the number of the cluster group identified earlier, in this example cluster four.
Provide a name for your layer and click “Calculate.” This will generate your mask.
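The Raster Calculator expression built in these steps is `classes == 4`. Here is a sketch of the same comparison in NumPy on a small made-up `classes` array; encoding the result as 1 for shadow and 0 elsewhere is an assumption about the mask format.

```python
import numpy as np

# Made-up K-Means output: the "classes" band as a 2-D array of cluster numbers
classes = np.array([[4, 4, 1],
                    [2, 4, 3],
                    [0, 1, 4]])

SHADOW_CLASS = 4                                   # value read off with the Pixel Inspector
shadow_mask = (classes == SHADOW_CLASS).astype(np.uint8)   # 1 where shadow, 0 elsewhere
```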

Video: Applying a Mask

To apply this mask to a dataset of your choice, select the Apply Mask tool in your Processing Toolbox.

Then, choose the dataset you wish to mask, in this case the Sentinel-2 Bare Earth Composite data that we masked for vegetation in a previous training video.

Tip: Applying multiple masks to the same dataset helps remove several potential sources of false positives.

Next, select the mask you wish to apply, in this case the shadow mask we created. Name the output, then click “Mask Layer.” If you turn off the other layers and change the basemap, you can see where your data has been masked; in this example, the shadows have been removed from the imagery.
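Applying masks can be pictured as blanking every flagged pixel across all bands. This sketch assumes each mask is nonzero where data should be removed (as with a `classes == 4` shadow mask) and uses NaN as the nodata value; `apply_masks` is a hypothetical name, and Marigold's own nodata handling may differ. Passing several masks combines them with a logical OR, mirroring the tip about stacking vegetation and shadow masks.

```python
import numpy as np

def apply_masks(image, *masks):
    """Return a copy of an (H, W, B) image with every pixel flagged by any mask set to NaN."""
    remove = np.zeros(image.shape[:2], dtype=bool)
    for m in masks:
        remove |= m.astype(bool)                   # a pixel is dropped if any mask flags it
    out = image.astype(float).copy()
    out[remove] = np.nan                           # boolean (H, W) index blanks all bands at once
    return out
```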

K-Means Tips

For any additional stages of masking with K-Means, we recommend re-running K-Means clustering on the masked dataset, following the steps above. This identifies more subtle groups for masking, because the most statistically distinct pixels and groups have already been removed from the imagery.
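Re-running K-Means on masked data means clustering only the pixels that survived earlier masks, so removed shadows and vegetation no longer pull any cluster centres toward them. The sketch below assumes masked pixels are stored as NaN and labels them -1 in the output; `recluster_unmasked` is a hypothetical name, and it carries its own compact K-Means routine so it runs on its own.

```python
import numpy as np

def kmeans(pixels, k, n_iter=50):
    """Compact Lloyd's k-means on an (N, B) array; returns one cluster label per row."""
    centres = [pixels[0]]
    for _ in range(1, k):                          # farthest-point initialisation
        d2 = ((pixels[:, None, :] - np.array(centres)[None]) ** 2).sum(-1)
        centres.append(pixels[d2.min(1).argmax()])
    centres = np.array(centres)

    def assign(c):
        # Index of the nearest centre for every pixel
        return ((pixels[:, None, :] - c[None]) ** 2).sum(-1).argmin(1)

    for _ in range(n_iter):
        labels = assign(centres)
        new = np.array([pixels[labels == j].mean(0) if (labels == j).any() else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return assign(centres)

def recluster_unmasked(image, k):
    """Cluster only pixels that are valid (non-NaN) in every band; masked pixels get -1."""
    h, w, b = image.shape
    pixels = image.reshape(-1, b)
    valid = ~np.isnan(pixels).any(axis=1)          # pixels that survived all earlier masks
    labels = np.full(h * w, -1)
    labels[valid] = kmeans(pixels[valid], k)
    return labels.reshape(h, w)
```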

Tip: Re-run your K-Means clustering on the masked data to identify more subtle groups for masking.

You now know how to use an unsupervised classification technique to mask out shadows as well as other surface cover types. Once you have satisfactorily masked your data, you can use Marigold’s processing tools to identify lithologies, alteration, and mineralization with even greater confidence.

In the next Get to Know Marigold post, we'll show you how to create a series of standard multispectral-derived products in Marigold.
