A first-of-its-kind global inventory of photovoltaic solar facilities built using satellite imagery and machine learning.

Access the original article in Nature and explore the full dataset on WRI's Resource Watch.

It has become increasingly clear that the world must transition to a low-carbon economy as quickly as possible to prevent the worst impacts of climate change. While this remains a monumental task, the 2022 IPCC report shows signs of progress: the annual increase in global emissions is slowing [e.g., Reuters], and countries are already starting to act, with more than 80% of new electricity capacity coming from renewables and 91% of that from solar and wind [e.g., IRENA]. Moreover, the movement to decarbonize the world's economy has gone from fringe to mainstream, with 80% of the world's economy now covered by some form of decarbonization commitment [e.g., Oxford report].

While there are no silver bullets, advances in technology play a critical role. Until now, it has been infeasible to track where all of these new solar plants are located. Thanks to advances in analyzing satellite imagery with machine learning, it is now possible to map infrastructure, climate impacts, and trends across the globe, bringing new levels of transparency and actionable information.

A solar genesis

The resulting dataset, the Global Inventory of Solar Energy Installations, is publicly available on Resource Watch.

In 2018, the two of us, Kyle Story and Lucas Kruitwagen, met at the Stanford Natural Capital symposium. We had both been thinking about the same question: what if you could map all of the solar facilities in the world?

But first, why is this important, and why should anyone care? Solar energy will clearly be a key component of the renewable energy systems that replace today's fossil-fuel-intensive sources. Photovoltaic (PV) generating capacity has grown more than 40% per year since 2009 and is projected to increase nearly ten-fold by 2040¹. Building out this much PV capacity raises many questions and tradeoffs: how much capacity has already been built, where new facilities should be sited, what the potential impacts are on biodiversity and land-protection priorities, and what can be done to mitigate climate-change risks. These are inherently spatial questions, and answering them requires knowing the exact locations of solar facilities.
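To put those growth figures in perspective, here is a small compound-growth calculation. The 9-year span (2009 to 2018) and the 22-year horizon (2018 to 2040) are assumptions about the baseline years, not figures taken from the cited reports.

```python
def growth_multiple(annual_rate: float, years: float) -> float:
    """Total multiple reached after compounding `annual_rate` for `years` years."""
    return (1.0 + annual_rate) ** years


def implied_annual_rate(multiple: float, years: float) -> float:
    """Constant annual growth rate that produces `multiple` over `years` years."""
    return multiple ** (1.0 / years) - 1.0


# Assumed span: more than 40% per year from 2009 to 2018 (9 years).
past = growth_multiple(0.40, 9)

# Assumed horizon: a ten-fold increase from 2018 to 2040 (22 years).
future = implied_annual_rate(10.0, 22)

print(f"40%/yr for 9 years -> about {past:.1f}x")
print(f"10x over 22 years -> about {future:.1%} per year")
```

Under these assumptions, nine years at 40% per year multiplies capacity roughly twentyfold, while a ten-fold increase by 2040 corresponds to roughly 11% per year: a lower rate, but applied to a far larger installed base.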

Currently available inventories cannot fully address these needs. Most provide total generating capacity without explicit locations (e.g. IEA, IRENA, or BP), and those that do include facility locations are generally self-reported and therefore incomplete (e.g. WRI’s GPPD, OpenPV US), expensive to update, and often proprietary (e.g. IHS’ Electric Plants). So we asked ourselves: what if you could use satellite imagery and machine learning to simply map all of the solar facilities in the world?

That’s exactly what we set out to do.

Mapping global solar with machine learning

Machine learning to detect solar panels. Each row shows a different example location. The columns are: (1) Sentinel-2 image; (2) Airbus SPOT image; (3) Sentinel-2 model prediction map; (4) SPOT model prediction map; (5) facility footprint polygons in the final database.

Here’s how we developed a machine learning pipeline to map solar facilities in satellite imagery.

The first critical choice was which satellite imagery to use. We chose two sources: Airbus SPOT and Sentinel-2. Airbus SPOT provides high-spatial-resolution 4-band imagery at 1.5 m per pixel, which is sufficient to see the pattern of solar panels laid out in arrays. This high resolution allowed us to map accurate footprints for solar facilities, which matters because generating capacity is directly related to the panels' collecting area.
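Because capacity scales with collecting area, a footprint mask at 1.5 m per pixel converts directly into an area estimate. The sketch below assumes a binary footprint mask and a hypothetical power-density factor (`mw_per_km2`) that would need to be calibrated, for example against facilities with known capacity; it is an illustration of the area arithmetic, not the capacity model used for the paper.

```python
import numpy as np


def footprint_area_m2(mask: np.ndarray, pixel_size_m: float = 1.5) -> float:
    """Area covered by True pixels, assuming square pixels of side `pixel_size_m`."""
    return float(np.count_nonzero(mask)) * pixel_size_m ** 2


def estimate_capacity_mw(area_m2: float, mw_per_km2: float) -> float:
    """Linear capacity estimate; `mw_per_km2` is a hypothetical, calibrated density."""
    return area_m2 / 1e6 * mw_per_km2


# A toy 100 x 100-pixel facility footprint at SPOT resolution.
mask = np.zeros((200, 200), dtype=bool)
mask[50:150, 50:150] = True
area = footprint_area_m2(mask)  # 10,000 pixels * 2.25 m^2 = 22,500 m^2
```

The same 22,500 m² footprint covers only 225 pixels at Sentinel-2's 10 m resolution, which is why the high-resolution branch produces sharper outlines.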

We also used imagery from ESA’s Sentinel-2 satellite (S2). S2 has a 12-band camera and captures a new image of each point on Earth roughly every 5 days. Although its spatial resolution is coarser than SPOT's (10 m per pixel at best), we found that its richer spectral information was still enough to reliably find solar facilities, and the continuously updated imagery meant we could find even the most recently installed facilities, which were often missing from older SPOT imagery. Additionally, once we found a solar facility, we could look back through all of the images S2 had taken of that location to determine when the facility was constructed.
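The look-back idea can be illustrated with a much simpler rule than the models actually used for dating (a second UNet and an RNN run over the S2 archive). This sketch assumes one detection probability per S2 acquisition date for a confirmed facility and takes the installation date to be the first date after which the facility is consistently detected. The `threshold` and the `min_fraction` tolerance for cloudy or bad scenes are hypothetical parameters.

```python
from datetime import date, timedelta
from typing import Optional, Sequence


def estimate_install_date(
    dates: Sequence[date],
    probs: Sequence[float],
    threshold: float = 0.5,
    min_fraction: float = 0.8,
) -> Optional[date]:
    """Earliest date from which at least `min_fraction` of observations
    (that date and all later ones) exceed `threshold`.

    Tolerating a few low scores allows for clouds or bad acquisitions.
    Returns None if the facility is never consistently detected.
    """
    pairs = sorted(zip(dates, probs))  # chronological order
    for i, (d, p) in enumerate(pairs):
        if p < threshold:
            continue
        later = [q for _, q in pairs[i:]]
        hits = sum(q >= threshold for q in later)
        if hits / len(later) >= min_fraction:
            return d
    return None


# Toy series: one acquisition every 5 days, a spurious early hit at index 2,
# and a cloudy scene (0.3) after construction.
dates = [date(2017, 1, 1) + timedelta(days=5 * k) for k in range(10)]
probs = [0.1, 0.2, 0.7, 0.1, 0.1, 0.9, 0.95, 0.3, 0.9, 0.92]
installed = estimate_install_date(dates, probs)
```

Requiring persistence rather than a single detection is what keeps the isolated early hit from being mistaken for the construction date.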

The second critical choice was what type of model to use. Machine learning is the only feasible way we know of to analyze data at this scale. We wanted to map the footprint of each facility, not just detect its location, so we chose a semantic segmentation approach: the model takes in a satellite image and outputs a full prediction map giving the likelihood that each pixel belongs to a solar facility. Modern deep learning models such as convolutional neural networks have consistently shown state-of-the-art performance on image analysis tasks, including mapping solar panels². After some experimentation, we chose UNet architectures.
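Concretely, a segmentation model maps an image of shape (height, width, bands) to a prediction map of shape (height, width) with values in [0, 1]. Such models are commonly trained with a pixel-wise binary cross-entropy loss against a labeled mask. The post does not say which loss was used, so treat this NumPy sketch as a generic illustration of the training objective, not the code behind the paper.

```python
import numpy as np


def pixelwise_bce(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between a probability map and a 0/1 mask.

    pred:   (H, W) predicted probability that each pixel is part of a solar facility
    target: (H, W) ground-truth mask (1 = solar facility, 0 = background)
    """
    if pred.shape != target.shape:
        raise ValueError("prediction map and mask must have the same shape")
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    t = target.astype(float)
    loss = -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    return float(loss.mean())


target = np.array([[1, 1], [0, 0]])
good = np.array([[0.9, 0.8], [0.1, 0.2]])  # agrees with the mask
bad = np.array([[0.2, 0.1], [0.9, 0.8]])   # inverted prediction
print(pixelwise_bce(good, target) < pixelwise_bce(bad, target))  # True
```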

The third critical choice was training data. We used the extensive data available on OpenStreetMap (OSM) as a starting point. The OSM data came primarily from North America and Europe and was sparse in Asia. To ensure we could map solar facilities worldwide, we also hand-labeled a significant number of facilities in China and other Asian countries.
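To train a segmentation model, facility polygons (from OSM or from hand labeling) must be turned into per-pixel masks aligned with the imagery. The minimal NumPy sketch below applies an even-odd (ray-casting) point-in-polygon test at each pixel centre, assuming the polygon is already expressed in the tile's pixel coordinates as (column, row). In a real geospatial workflow a library routine such as rasterio's `features.rasterize` with the tile's geotransform would do this job.

```python
import numpy as np


def rasterize_polygon(vertices, height, width):
    """Boolean mask of pixels whose centres fall inside the polygon.

    vertices: sequence of (x, y) = (column, row) coordinates in pixel space.
    Uses the even-odd (ray-casting) rule, evaluated at pixel centres.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    px = xs + 0.5  # pixel-centre x
    py = ys + 0.5  # pixel-centre y
    inside = np.zeros((height, width), dtype=bool)
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray cast from each centre cross this edge?
        straddles = (y1 > py) != (y2 > py)
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        # Horizontal edges never straddle, so their inf/nan crossings are masked out.
        inside ^= straddles & (px < x_cross)
    return inside


# A square label polygon in an 8 x 8 tile -> a 4 x 4 block of True pixels.
mask = rasterize_polygon([(2, 2), (6, 2), (6, 6), (2, 6)], 8, 8)
```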

We then trained separate UNet models for SPOT and for S2 imagery.

Machine learning pipeline

Real-world applications are rarely accomplished with a single model, and mapping global solar facilities was no exception. An iterative experimentation process led us to develop the multi-step analysis pipeline shown in the figure below. We split the process into two stages: an initial global search, followed by a series of steps to filter for true detections. The pipeline had separate branches for Sentinel-2 and for SPOT imagery, with the final detections merged at the last step. In the search stage, we used the UNet for each imagery source to search the globe³ for solar facility candidates. The resulting set contained many false positives, so we devised a series of machine-learning-based filtering steps to separate out the true detections.
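The search-then-filter structure can be sketched in a few lines. This illustration assumes a UNet probability map as input, groups above-threshold pixels into 4-connected components as candidates, and then keeps only the candidates that a second-stage scorer accepts. `score_fn` is a hypothetical stand-in for the ResNet or RNN filters, and the thresholds are illustrative values, not those used for the paper.

```python
from collections import deque
from typing import Callable, List, Tuple

import numpy as np

Pixel = Tuple[int, int]


def extract_candidates(prob_map: np.ndarray, threshold: float = 0.5,
                       min_pixels: int = 4) -> List[List[Pixel]]:
    """Group above-threshold pixels into 4-connected components (candidates)."""
    mask = prob_map >= threshold
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    candidates = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            queue = deque([(r, c)])
            seen[r, c] = True
            pixels = []
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_pixels:  # drop tiny, likely spurious blobs
                candidates.append(pixels)
    return candidates


def filter_candidates(candidates: List[List[Pixel]],
                      score_fn: Callable[[List[Pixel]], float],
                      keep_threshold: float = 0.5) -> List[List[Pixel]]:
    """Second stage: keep candidates that the (hypothetical) classifier accepts."""
    return [cand for cand in candidates if score_fn(cand) >= keep_threshold]


# Toy prediction map with two blobs and one isolated pixel.
prob = np.zeros((6, 6))
prob[0:2, 0:2] = 0.9  # 4-pixel blob
prob[4:6, 3:6] = 0.8  # 6-pixel blob
prob[3, 0] = 0.9      # isolated pixel, removed by min_pixels=2
cands = extract_candidates(prob, min_pixels=2)
kept = filter_candidates(cands, score_fn=lambda px: 1.0 if len(px) >= 5 else 0.0)
```

Keeping the first stage permissive and the second stage strict mirrors the trade-off described above: the global search may over-detect, because later filters are cheaper to run on a much smaller candidate set.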

The SPOT results were filtered using a trained ResNet image classifier, and the S2 results were filtered using an RNN. In the S2 branch, at the stage labeled “RNN-2” in the pipeline figure, we also estimated the installation date of each candidate by deploying a second UNet and RNN model over the entire back-catalog of S2 imagery. At some point you have to stop iterating on the models, so at this stage the two of us looked through all of the remaining candidates by hand to produce a dataset containing only verified facilities. Was this fun? No. It took us tens of hours of late-night manual inspection. But given that we had just used ML to search the entire globe, in retrospect this was quite manageable. We processed the confirmed detections into polygon footprints and finally merged the datasets from the two pipeline branches into a final master dataset.
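Merging the two branches requires deciding when a SPOT detection and a Sentinel-2 detection are the same facility. One simple approach, sketched here under our own assumptions, matches detections by the intersection-over-union (IoU) of their bounding boxes and keeps the SPOT version whenever both branches found a facility, since its higher resolution gives a more accurate outline. The box representation, the 0.3 IoU cutoff, and the preference rule are illustrative choices; the pipeline merged polygon footprints, and the paper is the place to check the procedure actually used.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_branches(spot: List[Box], s2: List[Box], min_iou: float = 0.3) -> List[Box]:
    """Union of both branches, dropping S2 boxes that duplicate a SPOT box."""
    merged = list(spot)
    for box in s2:
        if all(iou(box, s) < min_iou for s in spot):
            merged.append(box)  # only S2 saw this facility (e.g. recently built)
    return merged


spot = [(0, 0, 10, 10)]
s2 = [(1, 1, 10, 10), (20, 20, 25, 25)]  # a duplicate and a new facility
merged = merge_branches(spot, s2)
```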

We deployed this pipeline on the Descartes Labs geospatial platform, which required a computational feat of strength. The SPOT global search processed ~170 TB of imagery in 4.5 continuous days using a cluster of 175 GPU nodes in the cloud. The Sentinel-2 pipeline processed ~380 TB of data over approximately 2 months in real time, using around 1 million CPU-hours. These are massive computations, no doubt.
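A quick back-of-the-envelope calculation turns those figures into throughput. It assumes decimal units (1 TB = 1000 GB), treats 4.5 continuous days as 108 hours, and approximates "approximately 2 months" as 60 days, so the Sentinel-2 numbers are rough.

```python
def throughput(total_tb: float, hours: float, workers: int = 1) -> float:
    """Average gigabytes processed per worker per hour (1 TB = 1000 GB)."""
    return total_tb * 1000.0 / hours / workers


# SPOT: ~170 TB in 4.5 continuous days on 175 GPU nodes.
spot_gb_per_node_hour = throughput(170, 4.5 * 24, workers=175)

# Sentinel-2: ~380 TB using ~1 million CPU-hours in total.
# CPU-hours already combine workers and time, so divide by the total directly.
s2_gb_per_cpu_hour = throughput(380, 1_000_000)

# Sentinel-2 over "approximately 2 months", assumed here to be 60 days.
s2_tb_per_day = 380 / 60

print(f"SPOT: ~{spot_gb_per_node_hour:.1f} GB per GPU-node-hour")
print(f"S2:   ~{s2_gb_per_cpu_hour:.2f} GB per CPU-hour, ~{s2_tb_per_day:.1f} TB per day")
```

On these assumptions, each GPU node worked through roughly 9 GB of SPOT imagery per hour, while the Sentinel-2 branch averaged about 0.38 GB per CPU-hour and a little over 6 TB per day across the whole run.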

The fact that it is feasible to search the entire globe in a matter of days changes the possibilities - and should change how we approach global climate solutions in the future.

Machine learning pipeline for detecting solar facilities

Results

Our aggregated global solar dataset, color-coded by installation date.

The result was a first-of-its-kind global dataset of solar facilities. We located 68,661 facilities, 432% more than the previously best-available datasets. We enriched this dataset with each facility's installation date, the land-cover class it was installed on, and matches to existing asset-level databases. This dataset is now publicly available on WRI’s Resource Watch.
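Assigning a land-cover class to a facility amounts to looking at the land-cover map underneath its footprint. The sketch below takes the most common class among the footprint's pixels. It assumes the land-cover raster and the footprint mask already share a pixel grid and that the land cover predates construction; the class codes are hypothetical, and the paper describes the land-cover data actually used.

```python
import numpy as np

# Hypothetical land-cover class codes, for illustration only.
CROPLAND, GRASSLAND, BARREN = 1, 2, 3


def majority_class(landcover: np.ndarray, footprint: np.ndarray) -> int:
    """Most frequent land-cover code under a facility footprint.

    landcover: (H, W) integer class codes (pre-construction land cover)
    footprint: (H, W) boolean facility mask on the same grid
    Ties resolve to the smallest code, because np.unique returns sorted
    values and np.argmax picks the first maximum.
    """
    classes = landcover[footprint]
    if classes.size == 0:
        raise ValueError("empty footprint")
    values, counts = np.unique(classes, return_counts=True)
    return int(values[np.argmax(counts)])


landcover = np.array([
    [CROPLAND, CROPLAND, GRASSLAND],
    [CROPLAND, GRASSLAND, BARREN],
    [BARREN, BARREN, BARREN],
])
footprint = np.array([
    [True, True, True],
    [True, True, False],
    [False, False, False],
])
label = majority_class(landcover, footprint)  # 3 cropland vs 2 grassland pixels
```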

Pre-existing land cover for new solar PV installations. (a) Our aggregated global dataset, color-coded by the land-cover class on which the facilities were installed. The lower panel includes: (b) time series of installations; (c) distribution of installation size by land-cover type; (d) local bias and (e) global bias between PV land cover and the local/global land-cover class (see the paper for details).

Parting Thoughts

This dataset is important for managing the role of solar energy in decarbonizing the global economy. Spatial data like this is needed to accelerate every aspect of the energy transition: from mitigating the intermittency of renewables with generation nowcasting and forecasting, to adapting power grids to emerging climate-change risks, to evaluating how effectively policy interventions speed up the deployment of renewables.

Just as important for us is that this is a demonstration of what’s possible - mapping at a global scale. Climate change is a global challenge, and with this technology, we’ve shown that it is feasible to map and monitor infrastructure like this across the entire globe. In the coming years, humankind must and will develop a much more complete understanding of how our economic activities impact and are impacted by changes in the natural world. We are optimistic and excited to see how technology like this will drive forward climate solutions in the future!


Footnotes

1 See, e.g.: International Energy Agency. World Energy Outlook 2018. Tech. Rep., Paris, France (2018). International Renewable Energy Agency. Renewable capacity statistics 2019. Tech. Rep., Abu Dhabi (2019).

2 There are numerous papers mapping solar panels including:

  • Yu, J., Wang, Z., Majumdar, A. & Rajagopal, R. DeepSolar: A machine learning framework to efficiently construct a solar deployment database in the United States. Joule 2, 2605-2617 (2018).
  • Hou, X. et al. SolarNet: A deep learning framework to map solar plants in China from satellite imagery. In Climate Change AI Workshop, ICLR 2020 (ICLR, 2020).
  • Imamoglu, N., Kimura, M., Miyamoto, H., Fujita, A. & Nakamura, R. Solar power plant detection on multi-spectral satellite imagery using weakly-supervised CNN with feedback features and m-PCNN fusion. arXiv preprint arXiv:1704.06410 (2017).
  • Malof, J. M., Bradbury, K., Collins, L. M. & Newell, R. G. Automatic detection of solar photovoltaic arrays in high-resolution aerial imagery. Applied Energy 183, 229-240 (2016).
  • Camilo, J. A., Wang, R., Collins, L. M., Bradbury, K. & Malof, J. M. Application of a semantic segmentation convolutional neural network for accurate automatic detection and mapping of solar photovoltaic arrays in aerial imagery. CoRR abs/1801.04018 (2018).

3 Assuming that installations will be reasonably close to human populations, we defined the search area by dilating a global human population-density map. See the paper for details.
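The search-area rule in this footnote is a morphological dilation: start from a mask of populated cells and grow it by a buffer so that facilities some distance from settlements are still covered. A minimal NumPy sketch follows. The density threshold and buffer radius are hypothetical parameters, and a real global run would also have to handle map projection and varying cell size, which this ignores (`scipy.ndimage.binary_dilation` offers the same operation ready-made).

```python
import numpy as np


def search_mask(pop_density: np.ndarray, min_density: float, radius: int) -> np.ndarray:
    """Cells within `radius` cells (Chebyshev distance) of any populated cell.

    pop_density: (H, W) population-density grid
    min_density: threshold above which a cell counts as populated (hypothetical)
    radius:      dilation radius in grid cells (hypothetical)
    """
    populated = pop_density >= min_density
    h, w = populated.shape
    # Pad with False so shifted windows never wrap around the edges.
    padded = np.pad(populated, radius, mode="constant", constant_values=False)
    out = np.zeros_like(populated)
    # Dilation by a (2r+1) x (2r+1) square = OR of all shifted copies.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


pop = np.zeros((7, 7))
pop[3, 3] = 100.0  # a single populated cell
area = search_mask(pop, min_density=10.0, radius=1)  # 3 x 3 block around it
```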