How Do You Count Every Solar Panel in the U.S.? Machine Learning and a Billion Satellite Images

Solar power installations have been increasing in both the residential and commercial spaces, but it is nearly impossible to know the exact number and location of these panels. As reported by the Department of Energy, installed U.S. solar capacity has grown from 1.2 gigawatts (GW) in 2008 to nearly 30 GW today. Much of this growth is thanks to the falling cost of solar photovoltaic panels, which cost 60 percent less than they did in 2010.

The cost to install solar has dropped by more than 70 percent since 2010, leading the industry to expand into new markets and deploy thousands of systems nationwide. Prices as of Q3 2018 are at or near their lowest historical level across all market segments. An average-sized residential system has dropped from more than $40,000 in 2010 to nearly $17,000 today, before incentives, while recent utility-scale prices range from $28/MWh to $45/MWh, competitive with all other forms of generation. (Source: Solar Energy Industries Association.)

But knowing the total installed capacity of U.S. solar says little about where these panels are and how many have been installed, especially when it comes to individual residential installations.

Knowing which American homes have solar panels on their roofs, and understanding why individual homeowners chose to install them, would give the energy industry extremely useful information for managing a changing U.S. electricity system and its transition toward cleaner, renewable energy. This data would also help identify the barriers that keep homeowners from accessing and using renewable resources, which are a critical component of many solutions for fighting climate change.

Until recently, most working figures for the number of solar installations in the U.S. have been estimates. DeepSolar was designed to improve on these imprecise numbers by counting the rooftop and other photovoltaic installations that can be seen in satellite images.

DeepSolar uses satellite data to locate and count solar panel installations in the U.S. (Image courtesy of DeepSolar.)

The DeepSolar Project, developed by engineers and computer scientists at Stanford University, is a machine learning framework that analyzes a dataset of satellite images in order to identify the size and location of installed solar panels.

To accurately count the panels, the DeepSolar team used a machine learning algorithm to analyze more than a billion high-resolution satellite images. The algorithm identified what the team believes to be almost every solar power installation across the contiguous 48 states.

The DeepSolar analysis counted a total of 1.47 million solar installations in the U.S., a much higher number than either of the two most commonly cited estimates.

“We can use recent advances in machine learning to know where all these assets are, which has been a huge question, and generate insights about where the grid is going and how we can help get it to a more beneficial place,” said Ram Rajagopal, associate professor of civil and environmental engineering, who supervised the project with Arun Majumdar, professor of mechanical engineering.

Finding and Counting Solar Panels Across the Country

The machine learning program used by DeepSolar is a convolutional neural network (CNN) that combines image classification, to identify which images contain solar panels and thereby locate them, with semantic segmentation, to estimate the size of the panels. The team trained the network on a dataset of approximately 370,000 images, each covering about 100 ft by 100 ft. Each image was labeled according to whether or not it contained a solar panel, and DeepSolar’s algorithm used these labels to learn the characteristics that indicate the presence of solar panels, such as color, texture and size.

“We don’t actually tell the machine which visual feature is important,” said Jiafan Yu, a doctoral candidate in electrical engineering who built the system with Zhecheng Wang, a doctoral candidate in civil and environmental engineering. “All of these need to be learned by the machine.”
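To make the idea of learning from labels concrete, here is a toy sketch. It is not DeepSolar's CNN: it swaps in plain logistic regression over two hypothetical, hand-made image features (a "blueness" score and a "grid texture" score, both invented for illustration) and shows how a model given only positive/negative labels can work out on its own how much weight each feature deserves.

```python
import math

# Hypothetical toy data: each sample is ([blueness, grid_texture], label),
# where label 1 means "contains solar panels" and 0 means "does not".
SAMPLES = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.85, 0.7], 1),
    ([0.2, 0.1], 0),
    ([0.1, 0.3], 0),
    ([0.3, 0.2], 0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs: int = 2000, lr: float = 0.5):
    """Fit logistic regression by batch gradient descent on log loss."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in samples:
            pred = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = pred - label  # derivative of log loss w.r.t. the logit
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, features) -> int:
    """Return 1 if the model thinks the image contains panels, else 0."""
    return int(sigmoid(sum(w * x for w, x in zip(weights, features)) + bias) >= 0.5)


# Nobody tells the model which feature matters; the weights come from the labels.
weights, bias = train(SAMPLES)
print("learned weights:", weights, "bias:", bias)
```

A real CNN learns the features themselves from raw pixels rather than weighting two pre-computed scores, but the training signal is the same: only the image-level label.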

This image shows examples of original satellite images, the corresponding Class Activation Maps (CAMs) and the segmentation results. This segmentation method never used ground truth segmentation results for training; it only required ground truth class labels ("positive" or "negative") for minimizing classification errors. It is therefore "semi-supervised," which is useful when ground truth segmentation labels are extremely expensive to acquire. (Image and caption courtesy of DeepSolar.)
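The Class Activation Map idea can be sketched in a few lines. In the common CAM formulation, the map for a class is a weighted sum of the final convolutional layer's feature maps, using the classifier's weights for that class, and thresholding the map yields a rough segmentation with no pixel-level labels involved. The tiny 3×3 feature maps, the weights and the 0.5 relative threshold below are hypothetical, and DeepSolar's actual segmentation pipeline may differ in its details.

```python
def class_activation_map(feature_maps, class_weights):
    """CAM(i, j) = sum_k w_k * F_k(i, j) for one output class."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, w_k in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w_k * fmap[i][j]
    return cam


def segment(cam, rel_threshold=0.5):
    """Mark cells whose activation is at least rel_threshold * peak as 'panel'."""
    peak = max(max(row) for row in cam)
    if peak <= 0:  # nothing activates the class: empty mask
        return [[False] * len(row) for row in cam]
    return [[value >= rel_threshold * peak for value in row] for row in cam]


# Hypothetical feature maps: f1 fires on panel-like texture, f2 on plain roof.
f1 = [[0, 0, 0], [0, 1, 1], [0, 1, 1]]
f2 = [[1, 1, 1], [1, 0, 0], [1, 0, 0]]
weights_for_solar = [2.0, -0.5]  # hypothetical classifier weights, "solar" class

mask = segment(class_activation_map([f1, f2], weights_for_solar))
panel_pixels = sum(cell for row in mask for cell in row)
print(mask, panel_pixels)
```

Multiplying the panel pixel count by the ground area each pixel covers would then give a size estimate for the detected array.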

As DeepSolar learned to identify the characteristics of panels in the images, the system eventually reached the point where, when it flagged an image as containing solar panels, it was correct 93 percent of the time. It missed about 10 percent of the images that actually contained solar installations. According to the authors’ report on the project, DeepSolar outperformed previous models on both counts: correct identifications and missed identifications.
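Read as standard classification metrics, the 93 percent figure corresponds to precision (how often a "solar" verdict is right) and the 10 percent figure to a miss rate of 1 minus recall (the share of panel-containing images that go undetected). The confusion-matrix counts below are hypothetical, chosen only so the arithmetic reproduces those two figures.

```python
def precision(tp: int, fp: int) -> float:
    """Share of images flagged as 'solar' that really contain panels."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of images with panels that the classifier actually flags."""
    return tp / (tp + fn)


# Hypothetical counts for a labeled test set:
# tp = true positives, fp = false positives, fn = false negatives (misses).
tp, fp, fn = 279, 21, 31
p = precision(tp, fp)   # 279 / 300 = 0.93
r = recall(tp, fn)      # 279 / 310 = 0.90
miss_rate = 1 - r       # share of panel images the classifier missed
print(f"precision={p:.2f} recall={r:.2f} miss rate={miss_rate:.2f}")
```

Tracking both numbers matters because a model can trivially push one up at the expense of the other, for example by flagging every image as "solar" to avoid misses.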

Once the training period was completed and the team knew that DeepSolar could identify most solar panels in images, they moved on to the real task: analyzing the billion satellite images of the U.S. to find solar installations. A project of this scope would usually take human reviewers using existing technologies years to complete, but with the efficiencies offered through machine learning, DeepSolar finished the job in one month.
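A quick back-of-the-envelope calculation shows the pace this implies. It assumes exactly one billion images and a 30-day month; the manual baseline (one reviewer spending one second per image, nonstop) is a hypothetical chosen only to show the scale, not a figure from the project.

```python
SECONDS_PER_DAY = 24 * 60 * 60

images = 1_000_000_000   # "more than a billion" images, rounded down
days = 30                # assumption: "one month" taken as 30 days
machine_rate = images / (days * SECONDS_PER_DAY)
print(f"DeepSolar pace: about {machine_rate:.0f} images per second")

# Hypothetical manual baseline: one reviewer at 1 second per image,
# working around the clock with no breaks.
human_seconds_per_image = 1.0
human_years = images * human_seconds_per_image / (365.25 * SECONDS_PER_DAY)
print(f"One reviewer: about {human_years:.1f} years")
```

Even under that generous manual assumption, a single reviewer would need decades, while the automated run averages a few hundred images per second.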

The database produced by DeepSolar contains residential solar installations, panels on the roofs of businesses, and various large, utility-owned solar power plants.

However, since most solar panel installations are in urban areas, the team had DeepSolar skip sparsely populated areas. They reasoned that even if some buildings in these rural areas had solar panels, those panels would be unlikely to be connected to the power grid. Based on their overall data, the research team estimated that these rural areas would contain only around 5 percent of residential and commercial solar installations.
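Skipping sparsely populated areas amounts to a pre-filter applied before any image is classified. The sketch below assumes each image tile carries a population-density value; the `Tile` type, its density field and the 100-people-per-square-mile cutoff are all hypothetical, since the article does not say how DeepSolar defined "sparsely populated."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """Hypothetical image tile tagged with the population density it covers."""
    tile_id: str
    people_per_sq_mile: float


# Hypothetical cutoff; the article does not state the value DeepSolar used.
MIN_DENSITY = 100.0


def tiles_to_scan(tiles, min_density=MIN_DENSITY):
    """Keep only tiles dense enough to be worth running the classifier on."""
    return [t for t in tiles if t.people_per_sq_mile >= min_density]


tiles = [Tile("downtown", 5200.0), Tile("farmland", 12.0), Tile("suburb", 850.0)]
print([t.tile_id for t in tiles_to_scan(tiles)])
```

The trade-off is explicit in the threshold: raising it saves compute but drops more of the small share of installations the team expected to find in rural areas.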

Stanford scientists found and analyzed 1.47 million solar installations in the United States, a much higher figure than generally estimated. (Image courtesy of Stanford.)

So, what are the next steps for the DeepSolar project? The research team plans to expand the solar deployment database to include installations in rural areas. They also hope to analyze solar installations in other countries, essentially anywhere a sufficient dataset of high-resolution satellite images is available.

Also on the docket are additional features that will be able to calculate details such as the angle and orientation of a solar installation. This could help to accurately estimate the power generation of a given installation. Right now, DeepSolar only measures size as a proxy for potential power output.
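A common rule of thumb shows why panel area works as a proxy for output, and why tilt and orientation matter. Under standard test conditions (1,000 W of sunlight per square meter), a module's rated output is roughly its area times its conversion efficiency; annual energy then depends on a capacity factor that folds in local sunshine, tilt and orientation. The 18 percent efficiency and 17 percent capacity factor below are hypothetical round numbers, not DeepSolar values.

```python
STC_IRRADIANCE_KW_PER_M2 = 1.0  # standard test condition irradiance (1,000 W/m^2)
HOURS_PER_YEAR = 8760


def rated_capacity_kw(panel_area_m2: float, efficiency: float = 0.18) -> float:
    """Approximate nameplate capacity from detected panel area (assumed efficiency)."""
    return panel_area_m2 * STC_IRRADIANCE_KW_PER_M2 * efficiency


def annual_energy_kwh(capacity_kw: float, capacity_factor: float) -> float:
    """Yearly output; capacity_factor captures sunshine, tilt and orientation."""
    return capacity_kw * capacity_factor * HOURS_PER_YEAR


cap = rated_capacity_kw(20.0)           # a 20 m^2 rooftop array -> 3.6 kW
energy = annual_energy_kwh(cap, 0.17)   # hypothetical capacity factor
print(f"{cap:.1f} kW, about {energy:,.0f} kWh per year")
```

Area alone pins down the first function; the second is where tilt and orientation enter, which is why estimating them would sharpen the output figures.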

Given the relative ease and speed of the machine learning system, DeepSolar can also be used to regularly update the database of U.S. installations with new batches of satellite images. Having this data readily available, and more importantly, current, would prove valuable to efforts to optimize regional and national energy grids and electricity production.

A Census for Solar

Locating solar panels and estimating their energy production is useful, but the DeepSolar team wanted more than just numbers. By combining U.S. Census data and other socioeconomic data with their solar installation catalog, the researchers were able to identify a number of factors that encouraged or impeded solar power adoption.

The team used publicly available socioeconomic data for U.S. Census tracts, which cover about 1,700 households on average, roughly half the size of a ZIP code or about 4 percent of a typical county.
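Joining the installation catalog to census data comes down to assigning each detected installation to a tract and normalizing by the tract's size. The sketch below assumes each installation already carries a tract identifier (in practice this would come from a point-in-polygon lookup against tract boundaries); the tract IDs, household counts and the per-1,000-households metric are hypothetical illustrations, not DeepSolar's published schema.

```python
from collections import Counter

# Hypothetical detections, each already tagged with the census tract it falls in.
installations = [
    {"id": 1, "tract": "tract-A"},
    {"id": 2, "tract": "tract-A"},
    {"id": 3, "tract": "tract-A"},
    {"id": 4, "tract": "tract-B"},
]
# Hypothetical household counts per tract (tracts average about 1,700 households).
households = {"tract-A": 1700, "tract-B": 1600, "tract-C": 1750}


def installs_per_1000_households(installations, households):
    """Installation density per tract; tracts with no detections get 0."""
    counts = Counter(inst["tract"] for inst in installations)
    return {tract: 1000 * counts[tract] / n for tract, n in households.items()}


density = installs_per_1000_households(installations, households)
for tract, value in sorted(density.items()):
    print(f"{tract}: {value:.2f} installations per 1,000 households")
```

Normalizing by households rather than comparing raw counts keeps large and small tracts comparable, which is what makes tract-level comparisons against income or other socioeconomic variables meaningful.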

Example of map data generated from the DeepSolar database. (Image courtesy of DeepSolar.)

Utility companies, industry regulators and solar panel marketers are a few of the groups that will benefit from the data and insights generated by the DeepSolar project. For example, having access to detailed information about the number of solar panels installed in a given neighborhood can enable local electric utilities to balance the area’s power supply and demand—a key factor for energy reliability.

Utilities can also use this data to determine how many new solar installations would benefit the area’s power system, to support the case for new larger-scale solar generation projects, and to encourage individuals to install solar.

In particular, the solar panel inventory highlighted factors that enabled or impeded the deployment of solar panels at the individual homeowner level. For example, the researchers found that household income was an important factor, but only up to around $150,000 per year. Once a household’s income surpassed that level, income alone quickly stopped playing a role in the decision whether or not to install solar panels.

In comparison, low- and middle-income households often don’t install solar power generation even when they are located in areas with higher long-term profitability, such as regions with abundant sunshine and relatively high electricity rates. In these locations, a household’s utility bill savings would exceed the monthly cost of a solar panel system.
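The savings-versus-cost comparison can be made concrete with the standard loan-amortization formula, payment = L·r / (1 − (1 + r)^−n), for principal L, monthly interest rate r and n monthly payments. All inputs below are hypothetical: the $17,000 system price echoes the pre-incentive residential average quoted earlier in the article, while the 5 percent, 20-year loan, 600 kWh of monthly output and $0.25/kWh electricity rate are illustrative assumptions.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment on a fully amortizing loan."""
    r = annual_rate / 12
    n = years * 12
    return principal * r / (1 - (1 + r) ** -n)


def monthly_savings(kwh_per_month: float, price_per_kwh: float) -> float:
    """Bill savings, assuming every generated kWh offsets a purchased one."""
    return kwh_per_month * price_per_kwh


payment = monthly_payment(17_000, 0.05, 20)  # roughly $112 per month
savings = monthly_savings(600, 0.25)         # $150 per month
print(f"payment ${payment:.2f}, savings ${savings:.2f}, "
      f"net ${savings - payment:.2f} per month")
```

Under these assumptions the system more than covers its own monthly payment, yet the household still has to qualify for the loan or pay the full price up front, which is exactly the kind of barrier that a monthly comparison alone hides.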

This example image of the DeepSolar interactive map shows solar panel distribution by county in the San Francisco Bay Area. (Image courtesy of DeepSolar.)

The data suggests that the impediment here is that low- and middle-income households most likely cannot afford the upfront costs of installation. With this information, solar installers, manufacturers and sellers can aim to develop different financial models that make solar panels more accessible to these households.

The DeepSolar data offered a few other interesting conclusions. For example, once solar penetration in a given neighborhood reaches a certain level, new installations skyrocket. However, neighborhoods with significant income inequality don’t see that surge.

Using geographic data, the researchers also identified a significant threshold for the amount of sunlight an area needs in order to spur solar adoption.

“We found some insights, but it’s just the tip of the iceberg of what we think other researchers, utilities, solar developers and policymakers can further uncover,” Majumdar said. That is why the DeepSolar team has made all the data publicly available on the project’s website. “We are making this public so that others find solar deployment patterns, and build economic and behavioral models.”

