Spatial Statistics: Hot spot analysis
Hot spot analysis is a powerful tool that allows us to pinpoint where clustering and dispersion occur in our data. This is especially helpful when we are dealing with large numbers of incidents, such as crime data over time, where many incidents overlap one another, making it difficult to visually determine exactly where the “hot” and “cold” spots are. It is also useful for temporal analysis, helping us detect seasonal shifts in the locations of the phenomena being examined.
For this workshop, we will be using crime data downloaded from the Los Angeles Data Portal:
For the purposes of the workshop, the data has been cleaned up, divided into separate layers per year, and converted into a geodatabase. Download the following class data to your local drive.
[TBS_ALERT color="info" heading="Is your data projected?"]
Geoprocessing should always be conducted with projected data. If your data is not projected, i.e., it is in a geographic coordinate system (with coordinates in decimal degrees), make sure to project your data first. The data in this tutorial was originally downloaded from the LA Open Data Portal, with crime incidents recorded in decimal degrees (latitude and longitude). This data was then projected to UTM Zone 11N, a preferred projection for data in Los Angeles. If you are using your own data, make sure to project it to a coordinate system appropriate for your region.
[/TBS_ALERT]
Setting up your project
Through the process of this workshop, you will be creating many new data layers. It is always good practice to designate a path to the geodatabase that you will use to store the layers.
Go to File, Map Document Properties…
Change the default geodatabase by finding the path to the workshop geodatabase you just downloaded.
Also click on the checkbox next to Store relative pathnames to data sources
We will also be performing various geoprocessing tasks. In order to make it easy for us to interpret our results in real time, let’s “disable” background processing of our geoprocessing tasks.
Click on Geoprocessing from the menu, and go to Geoprocessing options
Make sure that Background Processing is unchecked
Is there clustering?
In order to begin hot spot analysis, we must first determine whether, statistically speaking, there is clustering evident in our data. One approach is to run our data through the Spatial Autocorrelation (Global Moran’s I) tool. This tool helps us determine whether or not our data is randomly distributed. In other words, what are the chances that the incidents in your data ended up where they are by chance alone? Or are certain incidents located closer to other incidents than randomness would predict? And what may explain this clustering? Let’s find out if the data we will use in this class (crime data in Los Angeles) shows evidence of clustering or dispersion.
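To build intuition for what Global Moran’s I measures, here is a minimal pure-Python sketch of the statistic for a one-dimensional row of cells, where adjacent cells count as neighbors with weight 1. This is illustrative only, not the ArcGIS implementation (which supports several neighbor conceptualizations and also reports a z-score and p-value):

```python
# Toy Global Moran's I: I = (n / W) * sum_ij(w_ij * dev_i * dev_j) / sum_i(dev_i^2)
# where dev_i is each value's deviation from the mean and W is the sum of weights.

def morans_i(values):
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0    # sum of w_ij * dev_i * dev_j over all neighbor pairs
    w_sum = 0.0  # total weight W
    for i in range(n):
        for j in range(n):
            if abs(i - j) == 1:        # adjacent cells are neighbors (weight 1)
                num += dev[i] * dev[j]
                w_sum += 1
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

clustered = [1, 1, 1, 0, 0, 0]   # like values grouped together
dispersed = [1, 0, 1, 0, 1, 0]   # alternating values
print(morans_i(clustered))   # positive -> clustering
print(morans_i(dispersed))   # negative -> dispersion
```

A value near +1 suggests clustering, near -1 suggests dispersion, and near 0 suggests spatial randomness; the tool’s z-score and p-value then tell you whether the observed value is significantly different from random.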
Choose a neighborhood
We could perform our hot spot analysis on the entire dataset, but two reasons prevent us from doing so. First, the dataset is very large (hundreds of thousands of records), and performing statistical analysis on this much data would be very time consuming. Second, the scale is too coarse, meaning that we would not see much variation at the local level. For the purposes of this tutorial, we will work at the neighborhood level, which allows us to see hot spots within individual neighborhoods. Let’s begin by loading the following layers onto our map:
lacity_neighborhoods
la_crime_2010
Once loaded, turn off the crime data for better efficiency and visibility (it’s a huge dataset, so only turn it on when necessary). Next, select a neighborhood to analyze. For example, to choose Downtown:
Click on the select tool
Click on the Downtown polygon on the map
Select all the crime incidents that occurred within the downtown boundaries.
Go to Selection, Select by Location, and enter the following information:
Turn on the crime layer. You should see the crime incidents inside the downtown polygon selected. In the table of contents, right click on la_crime_2010 and Export Data:
Export the data into your geodatabase:
Click “yes” to add the data to the map
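Under the hood, Select by Location runs a spatial containment test on every point. Conceptually (this is a simplified sketch, not ArcGIS’s actual algorithm), a point-in-polygon test can be written with the classic ray-casting method:

```python
# Ray-casting point-in-polygon test: count how many polygon edges a
# horizontal ray from the point crosses; an odd count means "inside".

def point_in_polygon(point, polygon):
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                        # edge spans the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                             # crossing lies to the right
                inside = not inside
    return inside

# Hypothetical square "neighborhood" boundary in projected coordinates
downtown = [(0, 0), (4, 0), (4, 4), (0, 4)]
incidents = [(2, 2), (5, 5), (1, 3)]
selected = [p for p in incidents if point_in_polygon(p, downtown)]
print(selected)   # only the points inside the square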
Count overlapping incidents
The LAPD records each arrest location at the intersection closest to where the incident occurred. This means that many incidents that happen close by are visually stacked on top of one another, appearing as a single point. To aggregate these overlapping points, let’s run the Collect Events tool.
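Conceptually, Collect Events groups points that share the exact same coordinates and attaches a count (the ICOUNT field) to each unique location. A minimal sketch of that aggregation, assuming simple (x, y) tuples:

```python
from collections import Counter

# Collapse stacked points into unique locations with a count per location,
# mimicking the ICOUNT field that Collect Events produces.
incidents = [(100, 200), (100, 200), (100, 200), (300, 400), (300, 400)]
counts = Counter(incidents)
for location, count in counts.items():
    print(location, count)
# (100, 200) occurs 3 times, (300, 400) occurs 2 times
```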
Determine whether there is clustering
With the aggregated points in hand, run the Spatial Autocorrelation (Global Moran’s I) tool on the result. When it finishes, open the geoprocessing Results window, expand Current Session, Spatial Autocorrelation, and double click on the Report File
[TBS_ALERT color="info" heading="What do the results tell us?"]
[/TBS_ALERT]
Hot Spot Analysis
Now that we have determined that there is, indeed, statistically significant spatial clustering in our data, let’s find out where there are hot spots and cold spots in our data. Hot spots are areas that show statistically higher tendencies to cluster spatially. This is determined by looking at each incident within the context of neighboring features. In other words, a single point with high values isn’t necessarily a hot spot. It becomes a hot spot only when its neighbors also have high values.
Let’s run the hot spot analysis tool on our downtown crime data. In your Spatial Statistics Tools, expand Mapping Clusters, and double click on Optimized Hot Spot Analysis.
In the pop up window, select downtown_crime_2010, and make sure that COUNT_INCIDENTS_WITHIN_FISHNET_POLYGONS is selected. This creates a fishnet grid of cells over the areas where crime incidents are present.
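The fishnet aggregation can be pictured as snapping each point into a square cell and counting points per cell. A rough sketch, assuming projected coordinates in meters and a hypothetical 100 m cell size (the Optimized tool chooses the cell size for you):

```python
from collections import Counter

CELL_SIZE = 100  # hypothetical cell size in meters

def fishnet_counts(points, cell_size=CELL_SIZE):
    """Bin points into square grid cells; return incident counts per cell."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)

points = [(10, 10), (50, 80), (120, 30), (110, 40)]
print(fishnet_counts(points))
# two points fall in cell (0, 0) and two in cell (1, 0)
```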
Nice! We have now converted our overlapping incidents into color coded grid cells.
Notice the legend for our results.
Also open the attribute table. Right click on downtown_crime_2010_HotSpot, and Open Attribute Table
Each row in the table represents one of the displayed grid cells.
JOIN_COUNT tells us how many incidents fall within a cell
GiZScore tells us the Z score (positive values indicate that the cell’s value is proportionally above the average for the entire dataset)
GiPValue gives us the P value
NNeighbors is the number of neighboring features taken into account when comparing each cell to the dataset as a whole
Gi_Bin gives us a number that is associated to the confidence level displayed on the map
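To see where the GiZScore comes from, here is a toy version of the Getis-Ord Gi* statistic for a single cell, using binary weights (1 for the target cell and its neighbors, 0 otherwise). This is an illustrative sketch of the published formula, not ArcGIS’s implementation:

```python
import math

def gi_star(values, weights):
    """Getis-Ord Gi* z-score for one target cell.
    values  -- attribute value (e.g. incident count) for every cell
    weights -- weight of each cell relative to the target (self included)
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    w_sum = sum(weights)
    w_sq_sum = sum(w * w for w in weights)
    numerator = sum(w * v for w, v in zip(weights, values)) - mean * w_sum
    denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

counts = [10, 9, 8, 1, 2, 1, 0, 2, 1]    # incident counts for 9 cells
hot  = [1, 1, 1, 0, 0, 0, 0, 0, 0]       # cell 0 plus its two neighbors
cold = [0, 0, 0, 0, 0, 0, 1, 1, 1]       # cell 8 plus its two neighbors
print(gi_star(counts, hot))    # large positive z -> hot spot
print(gi_star(counts, cold))   # negative z -> cooler than average
```

A z-score above +1.96 flags a statistically significant hot spot at the 95% confidence level, and below -1.96 a cold spot; Gi_Bin encodes these confidence bands as the categories you see on the map.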
Finally, let’s label the grid cells with the JOIN_COUNT to give us an idea of why certain areas are hot, and others are cold.
Right click on downtown_crime_2010_HotSpot, click on Properties, and click on the Labels tab.
Check the box to label the features, and choose JOIN_COUNT for the label field
[TBS_ALERT color="info" heading="What do these numbers tell us?"]
[/TBS_ALERT]
Resources
Data Sources