Hot spot analysis of water bodies in Madison County
Problem: You would like to determine where spatial features have unusually high or low concentrations.
At the beginning of October, I will be moving to Huntsville, Alabama, aka “Rocket City.” Lately I have been thinking a lot about the city that is going to be my new home, and I have also been learning a lot about spatial statistics.
The city of Huntsville falls mostly within the county of Madison. The Census Bureau provides county-wide water body layers, as well as the area of water in each census block.

I thought I could use these data layers to help me decide where I should look for a house if I want to live near a lot of water. I am going to do Hot Spot Analysis on both of the layers.
Note: Hot Spot analysis is better suited for use with cultural, biological or economic data than with physical landscape features. But I am working with the GIS files I could find, and I thought it would be interesting nonetheless.
Before proceeding, I need to project my layers out of geographic coordinates. I chose NAD 1983 StatePlane Alabama East because that is used by the City of Huntsville GIS Department.
Let’s start with the water bodies. I am going to get a count of how many water bodies fall within each census block. To do this, first I converted the layer to points through ArcToolbox’s Feature to Point tool. Then, I right-clicked on the census blocks layer to do a spatial join with the water body point locations. This function will add a Count_ field to the attribute table of the output.
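For readers without ArcGIS, the counting step can be sketched in plain Python. This is only an illustration of what a point-in-polygon count does, not how the Spatial Join tool is implemented. The block IDs and coordinates are made up, and the ray-casting test assumes simple polygons without holes.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside the polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points(blocks, points):
    """Count how many points fall in each block; blocks maps an ID to its vertices."""
    return {
        block_id: sum(point_in_polygon(x, y, poly) for x, y in points)
        for block_id, poly in blocks.items()
    }


# Hypothetical data: two square "census blocks" and three water body centerpoints
blocks = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
water_points = [(1, 1), (0.5, 1.5), (3, 1)]
counts = count_points(blocks, water_points)
```

The resulting dictionary plays the role of the Count_ field: one count per block.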

Next I ran hot spot analysis on that Count_ field. In 9.3, this is in ArcToolbox under Spatial Statistics Tools –> Mapping Clusters.

In the result, the red “hot” zones show where blocks with lots of different water bodies are clustered close together. The beige zones are not statistically significant, and the blue “cool” zones show where blocks with few water bodies are clustered together.
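To get a feel for what the tool is calculating, here is a minimal sketch of the Getis-Ord Gi* statistic, which is the statistic behind hot spot analysis. It assumes binary fixed-distance weights (a feature counts as a neighbor if it lies within the band) and uses block centroids as locations. The function name and the toy grid are my own, and ArcGIS offers other ways of conceptualizing spatial relationships that this does not cover.

```python
import math


def gi_star(points, values, band):
    """Gi* z score for each feature, with binary fixed-distance weights."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for xi, yi in points:
        # w_ij = 1 when feature j is within the band of feature i (i itself included)
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0
             for xj, yj in points]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * v for wj, v in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # Guard against constant values or a band that covers every feature
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


# Toy 5 x 5 grid of centroids: high counts in one corner, low counts elsewhere
grid = [(i, j) for i in range(5) for j in range(5)]
counts = [10 if i < 2 and j < 2 else 1 for i, j in grid]
z_scores = gi_star(grid, counts, band=1.0)
```

In this toy grid the corner with the high counts gets a large positive score (a hot spot), and the opposite corner gets a negative one.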

Now I will run the same analysis on the area of water within each census block. If you look at the attribute table, you’ll see that we’re given the land area and the water area. Those numbers aren’t very useful the way they are. I would rather have a percentage of the block area that is water. So I added another field: TotArea, that is the sum of the land area and water area. Then I added a Ratio field, that is the water area divided by the total area.
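The two field calculations amount to the following arithmetic. The dictionary keys here are hypothetical stand-ins for whatever the land and water area fields are called in your attribute table, and the sample areas are made up. I also assume a block with zero total area should get a ratio of 0 rather than an error.

```python
def water_ratio(land_area, water_area):
    """Fraction of a block's total area that is water (0.0 to 1.0)."""
    total_area = land_area + water_area  # the TotArea field
    if total_area == 0:
        return 0.0  # assumption: treat empty blocks as having no water
    return water_area / total_area  # the Ratio field


# Hypothetical blocks; areas are in the same units as each other
blocks = [
    {"land_area": 75000, "water_area": 25000},
    {"land_area": 120000, "water_area": 0},
]
for block in blocks:
    block["Ratio"] = water_ratio(block["land_area"], block["water_area"])
```

Multiply by 100 if you would rather store a percentage than a ratio; the hot spot results are the same either way, since the z scores do not depend on the scale of the values.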

Hot spot analysis of this ratio field produces a map that shows where blocks with a high percentage of area covered by water are clustered close together.

Not surprisingly, the hottest areas are near the Tennessee River. A bit of a different story than the first map, which highlighted areas with lots of little lakes.
In these first runs, I let ESRI take the defaults for my parameters. I can improve my results if I pick the search distance on my own. The Spatial Statistics Tools –> Analyzing Patterns toolset has two tools that I used for this purpose.
The first, Spatial Autocorrelation (Morans I), can help find an appropriate scale for analysis by identifying the distance at which the factors promoting clustering are most pronounced. The tool takes a distance value as input and returns a z score as output. The absolute value of this z score indicates how strong the clustering or dispersion is. We want to find the distance that produces the z score farthest away from zero (either positive or negative).
I ran the tool a bunch of times, using distances from 0.25 – 1 mile in increments of 1/4 mile.
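The repeated runs can be sketched as a loop. The code below is a rough stand-in for the tool, not a copy of it: it computes Global Moran's I with binary fixed-distance weights and derives the z score from the textbook normality-assumption variance, so the numbers it gives may not match what ArcGIS reports. The function names are my own, and distances are converted from miles to feet to match the State Plane units.

```python
import math

FEET_PER_MILE = 5280


def morans_i_z(points, values, band):
    """Return (Moran's I, z score) using binary fixed-distance weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # w[i][j] = 1 when j is a different feature within `band` of feature i
    w = [[1.0 if i != j and math.hypot(points[i][0] - points[j][0],
                                       points[i][1] - points[j][1]) <= band
          else 0.0 for j in range(n)] for i in range(n)]
    s0 = sum(map(sum, w))
    m2 = sum(d * d for d in dev)
    if s0 == 0 or m2 == 0:
        raise ValueError("no neighbors within the band, or all values equal")
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    i_obs = (n / s0) * (cross / m2)
    expected = -1.0 / (n - 1)
    # Variance of I under the normality assumption
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(row[i] for row in w)) ** 2 for i in range(n))
    var = ((n * n * s1 - n * s2 + 3 * s0 * s0)
           / ((n * n - 1) * s0 * s0)) - expected ** 2
    if var <= 0:
        raise ValueError("degenerate weights: every feature neighbors every other")
    return i_obs, (i_obs - expected) / math.sqrt(var)


def scan_bands(points, values, miles):
    """Run the test at each distance in miles; return (distance in feet, z) pairs."""
    return [(m * FEET_PER_MILE, morans_i_z(points, values, m * FEET_PER_MILE)[1])
            for m in miles]
```

Calling `scan_bands(centroids, ratios, [0.25, 0.5, 0.75, 1.0])` reproduces the quarter-mile increments used here, with `centroids` and `ratios` being your own lists of block centroids (in feet) and values.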

I found that 0.75 miles (3960 feet) produced the largest z score for the water area layer.
Note: A z score between -1.96 and +1.96 means we cannot reject the null hypothesis (at the 95% confidence level) that the features are randomly distributed. Hot spot analysis is not much use in this case, because there is no significant pattern to analyze. Any score outside that range indicates an unusually clustered or dispersed distribution of features that is unlikely to be the result of random chance alone.
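As a quick sketch, the rule of thumb for reading a Moran's I z score looks like this. The 1.96 cutoff corresponds to the 95% confidence level; the function name and the labels are my own.

```python
def interpret_z(z, critical=1.96):
    """Classify a Moran's I z score at the given critical value."""
    if z > critical:
        return "clustered"
    if z < -critical:
        return "dispersed"
    return "not significant"
```

For example, `interpret_z(0.5)` gives "not significant", so there would be no point in running hot spot analysis at that distance.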

I knew to stop incrementing the distance when my z scores peaked and then started going back down. When I tried the Spatial Autocorrelation tool on the water count layer, this didn’t happen. The z score just kept going up. So I switched to the Multi-Distance Spatial Cluster Analysis (Ripleys K) tool.
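The stopping rule can be written down as a small helper. This is a sketch of the "peaked and then started going back down" check only; it returns None when the scores never turn down, which is the situation described for the water count layer. The z scores in the usage lines are made-up placeholders, not results from this analysis. For negative scores you would pass absolute values.

```python
def first_peak(distances, z_scores):
    """Return the distance at the first local maximum of z, or None if there is none."""
    for k in range(1, len(z_scores) - 1):
        if z_scores[k - 1] < z_scores[k] > z_scores[k + 1]:
            return distances[k]
    return None


# Made-up z scores for illustration only
distances_ft = [1320, 2640, 3960, 5280]
peaked = first_peak(distances_ft, [2.0, 3.0, 4.0, 3.5])    # rises, then falls
no_peak = first_peak(distances_ft, [2.0, 3.0, 4.0, 5.0])   # keeps rising
```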
I tried a bunch of different settings on this tool, and got the clearest results with the ones below. Note I used the water body centerpoints as my input and employed a boundary correction method to reduce errors at the edge of the study area.
The tool produces a Diff K value. Like the z score, it indicates the amount of clustering, and we are looking for the largest number.
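Here is a rough sketch of where a Diff K style number comes from. I am assuming the common L(d) transformation of Ripley's K, where the expected value under a random pattern equals the distance itself, so the difference is observed minus distance. Unlike the run described above, this sketch applies no boundary correction, and the function names and sample points are hypothetical.

```python
import math


def l_function(points, distance, area):
    """Observed L(d): transformed Ripley's K, with no edge correction."""
    n = len(points)
    # Count ordered pairs of distinct points that lie within `distance` of each other
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.hypot(points[i][0] - points[j][0],
                                 points[i][1] - points[j][1]) <= distance
    )
    return math.sqrt(area * pairs / (math.pi * n * (n - 1)))


def diff_k(points, distance, area):
    """Observed minus expected; positive values suggest clustering at this distance."""
    return l_function(points, distance, area) - distance


# Hypothetical example: four points bunched together in a 100 x 100 study area
cluster = [(0, 0), (1, 0), (0, 1), (1, 1)]
difference = diff_k(cluster, distance=2, area=100 * 100)
```

Because all four points sit within two units of each other in a large study area, the difference comes out strongly positive.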

My best distance would be 7920 feet (1.5 miles) where Diff = 1763.49. Using these distances in the “Distance Band or Threshold Distance” box in the Hot Spot analysis tool produces maps that are more statistically meaningful.
This map is a merge of the hot and cold spots I generated for both the water body count and percent water area, using the best threshold distances in each case. It is telling me to move downtown if I want to avoid mosquitoes and to the outskirts if I want natural beauty.

