How accurate is satellite data for estimating population where censuses can’t reach? We benchmark satellite-derived models in Colombia and reveal when machine learning falls short and why Bayesian models matter.
In a world where more than a quarter of countries haven’t conducted a census in over 10 years, the need for alternative ways to estimate population is urgent. From conflict zones to remote rainforests, reliable data is often out of reach—but decisions on healthcare, education, disaster relief, and infrastructure still need to be made.
Our new study, published in Population, Space and Place, explores one of the most promising alternatives: using satellite imagery to estimate population counts in data-scarce regions.
Traditional censuses are expensive, logistically complex, and sometimes simply not possible—especially in areas affected by violence, displacement, or lack of infrastructure. In such contexts, satellite imagery has become an increasingly attractive solution. But how accurate is it?
To find out, we used the 2018 Colombian census—one of the most complete and detailed in the world—as a testing ground. We compared different satellite-derived settlement maps and modeling approaches to see how well they predicted actual population counts.
We compared:
Six settlement maps, including building footprints from Google and Microsoft, and pixel-based “built-up area” maps.
Two modeling approaches:
A Bayesian probabilistic model, which can incorporate uncertainty and adjust for bias.
A random forest machine learning model, commonly used for pattern recognition in data-rich settings.
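To make "incorporate uncertainty" concrete, here is a minimal sketch of a Bayesian estimate of people per building. It is not the model from the paper: it assumes a simple Gamma-Poisson setup, and the function names, prior values, and survey numbers are all hypothetical. The point is only that the output is a range of plausible values rather than a single number.

```python
import random

def posterior_rate(counts, buildings, prior_shape=2.0, prior_rate=0.5):
    """Gamma-Poisson update for people per building.

    Assumes count_i ~ Poisson(rate * buildings_i) and rate ~ Gamma(prior_shape, prior_rate).
    The prior values are illustrative assumptions, not taken from the study.
    """
    shape = prior_shape + sum(counts)
    rate = prior_rate + sum(buildings)
    return shape, rate

def predict_expected_population(shape, rate, n_buildings, draws=5000, seed=42):
    """Posterior mean and 95% credible interval for the expected population
    of an area with n_buildings (Poisson sampling noise is not added)."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate
    samples = sorted(
        rng.gammavariate(shape, 1.0 / rate) * n_buildings for _ in range(draws)
    )
    mean = sum(samples) / draws
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws)]
    return mean, lower, upper

# Hypothetical surveyed areas: census counts and detected building footprints
surveyed_counts = [120, 45, 300]
surveyed_buildings = [30, 12, 70]

shape, rate = posterior_rate(surveyed_counts, surveyed_buildings)
mean, lower, upper = predict_expected_population(shape, rate, n_buildings=50)
print(f"Expected population: {mean:.0f} (95% interval {lower:.0f} to {upper:.0f})")
```

With fewer surveyed areas, the interval widens, which is the behaviour that makes a probabilistic model useful when data is sparse.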
Building footprints are best: Maps that show individual buildings (like those from Google and Microsoft) were the most accurate for estimating population, especially in urban areas.
Bayesian models win in tough settings: When data was sparse, biased, or incomplete (as it often is in remote or forested regions), the Bayesian model outperformed the random forest.
Aggregated results are more reliable: Predictions were more accurate at larger spatial scales (e.g., municipalities) than at fine-grained levels like individual neighborhoods.
Remote regions are still hard: Accuracy dropped significantly in regions like the Amazon and Pacific coast, where buildings are harder to detect from above.
Take-home message: In data-scarce settings, we can’t rely on standard algorithms alone: probabilistic models are essential to correct for bias and quantify uncertainty.
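The finding that aggregated results are more reliable can be illustrated with a small simulation. This is a sketch under strong assumptions, not a reproduction of the study: the counts are made up, and each small unit's prediction is assumed to be unbiased with independent noise. Under those assumptions, over- and under-predictions partly cancel when small units are summed into a municipality.

```python
import random

def mean_relative_errors(n_municipalities=50, units_per_municipality=40,
                         noise=0.3, seed=0):
    """Compare mean relative error at the fine unit level and the municipality level.

    All numbers are hypothetical; predictions are true counts times
    independent, unbiased multiplicative noise.
    """
    rng = random.Random(seed)
    fine_errors, coarse_errors = [], []
    for _ in range(n_municipalities):
        true_counts = [rng.randint(50, 500) for _ in range(units_per_municipality)]
        # Clamp at zero so a prediction is never negative
        predicted = [t * max(0.0, rng.gauss(1.0, noise)) for t in true_counts]
        fine_errors.extend(abs(p - t) / t for p, t in zip(predicted, true_counts))
        total_true = sum(true_counts)
        coarse_errors.append(abs(sum(predicted) - total_true) / total_true)
    fine = sum(fine_errors) / len(fine_errors)
    coarse = sum(coarse_errors) / len(coarse_errors)
    return fine, coarse

fine, coarse = mean_relative_errors()
print(f"Mean relative error, small units:    {fine:.1%}")
print(f"Mean relative error, municipalities: {coarse:.1%}")
```

The cancellation only works for independent errors. A systematic bias, such as buildings missed under forest canopy, pushes every unit in the same direction and does not average out, which is consistent with remote regions remaining hard at any scale.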
As more organizations rely on satellite data for planning and policy, our research highlights some critical lessons:
Open, high-resolution building data is essential.
Statistical models need to be adapted for local realities, especially in contexts where conventional data is lacking or biased.
Ground-truthing with even small samples can dramatically improve model performance.
Want to dive deeper? Read the full paper
For attribution, please cite this work as
Darin (2025, Aug. 1). Meet Edith: Can Satellites Replace Censuses? What We Learned from Colombia. Retrieved from https://edarin.github.io/thatsme/posts/2025-08-01-can-satellites-replace-censuses-what-we-learned-from-colombia/
BibTeX citation
@misc{darin2025can,
  author = {Darin, Edith},
  title = {Meet Edith: Can Satellites Replace Censuses? What We Learned from Colombia},
  url = {https://edarin.github.io/thatsme/posts/2025-08-01-can-satellites-replace-censuses-what-we-learned-from-colombia/},
  year = {2025}
}