Static Maps and Spatial Join with GeoPandas

The case study

My first major mapping project at the University of Chicago is a choropleth map of Sri Lankan Army (SLA) officer deaths between 1981 and 1999. The data is collected painstakingly by my colleague, Winston Berg. I am deeply grateful to Professor Paul Staniland, who made the project possible by hiring me. Any mistakes remain my own. You can see a copy of the codes here and the final products on my GitHub repository.

The data

The data table includes the names of army officers who died as well as their place of death, which may be a village or city. In order to assign each place of death a larger geographical category — province, district, or division — we must attach a longitude and a latitude to each place in the dataset. Thankfully, GeoPandas has just the function.

import geopandas as gpd 
import pandas as pd 
df1=gpd.tools.geocode(['Ampara', 'Anuradhapura', 'Badulla', 'Batticaloa', 'Colombo', '...']) 
df1['geometry'] = df1['geometry'].astype(str) df2 = df1['geometry'].str.strip('POINT ( )').str.split(' ', expand=True).rename(columns={0:'Longitude',1:'Latitude'}) 
df3=df1.merge(df2, left_index=True, right_index=True) 
df3.to_csv("out.csv")

“out.csv” describes deaths data in which every place name is matched with a longitude and latitude.

Base Map Shapefile

Go to http://www.diva-gis.org/gdata and download the shapefile for Sri Lanka. Province level, district level, and administrative division levels are available. A later blog will explain how to make shapefiles based on existing ones.

Get mapping!

Now that we have both the data and the shapefile base map, we can start mapping! In this example, I use Sri Lankan districts.

Open a jupyter notebook in your working directory and import the following

%matplotlib inline
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt

Read your shapefile into a geo-dataframe:

geo_df = gpd.read_file("data1/LKA_adm1.shp")
geo_df.set_index(geo_df["ID_1"].astype(int), inplace = True)
geo_df.head()

Plot it:

geo_df.plot()

Make a new column called coordinates, which is based on the “geometry” column:

geo_df['coords'] = geo_df['geometry'].apply(lambda x: x.representative_point().coords[:])
geo_df['coords'] = [coords[0] for coords in geo_df['coords']]

Label the districts:

print(geo_df.crs)

# try 2163 (albers), 3857 (web), 4269 (plate)
ax = geo_df.to_crs(epsg=4269).plot()
ax.set_axis_off()
for idx, row in geo_df.iterrows():
  plt.annotate(s=row['NAME_1'], xy=row['coords'],
  horizontalalignment='center')

Spatial join

To make a heat/choropleth map, we have to count how many deaths there are per district. For this, we use GeoPanda’s spatial join function. The spatial join function matches points in the deaths dataset to the polygons in the geo-dataframe. Here, I use “within”. You can also use “contains ” or  “intersects”.

from shapely.geometry import Point

deaths_df = pd.read_excel("SLA_RoH_Deng.xlsx", usecols = [8, 9])
deaths_df.dropna(inplace = True)

geometry = [Point(xy) for xy in zip(deaths_df.Longitude, deaths_df.Latitude)]
deaths_coords = gpd.GeoDataFrame(deaths_df, crs = geo_df.crs, geometry=geometry)

located_deaths = gpd.sjoin(deaths_coords, geo_df, how = 'left', op = 'within')
located_deaths = located_deaths.dropna(axis=1, how='all')

Plot it:

located_deaths.plot()

Rename the spatially joined table so the column heads make sense. Next, count and sum up the deaths in each district. Save it as an Excel sheet and check for formatting and errors:

located_deaths.rename(columns = {"NAME_1" : "Districts", "index_right" : "deaths_per_district"}, inplace = True)

deaths_geo_count = located_deaths.groupby("Districts").count()[["deaths_per_district"]]
deaths_geo_count.to_excel("deaths_geo_count.xlsx")

Import the Excel sheet and draw a bar graph with Pandas:

deaths_geo_count = pd.read_excel("deaths_geo_count.xlsx")
deaths_geo_count.set_index("Districts")["deaths_per_district"].sort_values(ascending = False).plot(kind = "bar", figsize = (15, 3))

Rename the column in the geo-dataframe so it matches with the deaths dataframe:

geo_df.rename(columns = {"NAME_1" : "Districts"}, inplace = True)

Merge the geo-dataframe with the deaths dataframe:

geo_merge = geo_df.merge(deaths_geo_count, on='Districts', how='left')
geo_merge
col_name =geo_merge.columns[0]
geo_merge.head()

Produce the map. The default for “scheme” is quantiles. You can also use “equal_interval” or “fisher_jenks”. Fisher_jenks sets categories by minimizing in-group variance and maximizing inter-group variance.

ft = "deaths_per_district"

plate = geo_merge.to_crs(epsg=4269)
ax = plate.plot(column = ft, scheme = "quantiles", k = 9, cmap = "Reds", legend = True,
 alpha = 0.7, linewidth = 0.5, figsize = (50, 25))

ax.set_title("Sri Lankan Army Fatalities by District", fontsize = 30)
ax.set_axis_off()
for idx, row in geo_df.iterrows():
    plt.annotate(s=row['Districts'], xy=row['coords'],
    horizontalalignment='center')

Your final product would look something like this:

download

“Alpha” in the codes above changes the shade of colors. Of course, you can also count the deaths by year and create a series of maps like this:

SLA_deaths

Happy mapping!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s