Spatializing Data with R
While WYSIWYG software packages like ArcGIS and QGIS remain the standard for cartographic work, open source scripting solutions have always had a strong cadre of support, especially for advanced spatial research. With a development history that goes back to the 1990s, R has been championed as a free and open source programming tool for statisticians the world over. As it has grown and moved into data visualization, more and more researchers are adopting R for their spatial needs because of its scripting capability and strong statistical integration.
This workshop presumes that you have a basic knowledge of spatial research, and follows up on the Getting Started Spatial Research workshop. Data can be obtained from the LA Data Portal following this tutorial.
You can also use any of the data here:
- Los Angeles Arrests in September 2019
- Stolen Bikes in Los Angeles 2019 (source)
- Child Abuse 2019 (source)
R + RStudio
R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio. Follow these instructions to ensure that you have the proper environment to get started with your R-based research.
Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on .exe file and select “Run as administrator” instead of double-clicking). Otherwise problems may occur later, for example when installing R packages.
Launch RStudio to get started. Use the console to follow the commands throughout this workshop. Start with the installation of the “tmap” package. Note that this may take up to 5 minutes to install (yes, there are lots of dependencies here):
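The installation step can be sketched as follows. The `requireNamespace()` guard is an addition for convenience and not part of the original instructions; it simply skips the download if tmap is already in your library.

```r
# Install tmap (and its dependencies) only if it is not already available.
# The guard is an assumption for convenience; plain install.packages("tmap")
# typed in the console does the same job.
if (!requireNamespace("tmap", quietly = TRUE)) {
  install.packages("tmap")
}
```

You only need to install a package once per R installation, not once per session.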
Now that the tmap package is installed, load the following libraries into your current session. Note that “sf” stands for “Simple Features,” the package that lets R work with spatial data.
Load the data. Data can be obtained from the LA Data Portal following this tutorial, or you can download sample datasets from the links provided in the Data section above. Once downloaded, rename the data file to “arrests.csv,” and take note of its path on your computer. Enter the following command to read your data into a variable named “arrests,” changing the path accordingly.
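A minimal version of this step, assuming both packages installed without errors (sf is normally installed along with tmap as one of its dependencies):

```r
# Attach the packages to the current session.
library(tmap)  # thematic mapping functions such as qtm()
library(sf)    # simple features: spatial data classes and file readers
```

Unlike installation, `library()` has to be run again every time you start a new R session.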
arrests <- read.csv("~/Downloads/gis/arrests.csv")
Windows example (use double backslashes):
arrests <- read.csv("C:\\Users\\yohman\\Downloads\\arrests.csv")
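If the command fails with a "cannot open file" error, the path is usually the culprit. The sketch below checks the path before reading; `csv_path` and `found` are hypothetical names, and the location is only an example. Forward slashes also work in R paths on Windows.

```r
# Hypothetical location; change it to wherever you saved arrests.csv.
csv_path <- "~/Downloads/gis/arrests.csv"

# file.exists() returns TRUE only if R can find a file at that path.
found <- file.exists(csv_path)

if (found) {
  arrests <- read.csv(csv_path)
} else {
  # Show the full path R tried, which helps spot typos.
  message("File not found: ", normalizePath(csv_path, mustWork = FALSE))
}
```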
Check the Environment tab to see your data, and make sure the “Lat” and “Lon” columns exist. Column names in R are case-sensitive, so they must match the names used in the next command exactly:
Convert “arrests” into a “simple feature,” a spatial geometry format that tmap can read, by identifying the longitude and latitude columns, and store the result in a new variable “arrests_sf.” Also, set the coordinate reference system (crs) to 4326, the EPSG code for WGS84, a geographic coordinate system that uses decimal degrees worldwide:
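The same check can be done from the console. This sketch uses a small made-up data frame, `arrests_demo`, as a stand-in for the real file, and it assumes the coordinate columns are spelled "Lat" and "Lon"; substitute `arrests` and your own column names in practice.

```r
# Hypothetical stand-in for the data frame returned by read.csv();
# the values are made up for illustration.
arrests_demo <- data.frame(
  Report.ID = c(1, 2, 3),
  Lat = c(34.05, 34.10, 33.99),
  Lon = c(-118.24, -118.30, -118.28)
)

# List the column names; "Lat" and "lat" are different names in R.
names(arrests_demo)

# Confirm that both coordinate columns are present and numeric.
coord_cols <- c("Lat", "Lon")
has_coords <- all(coord_cols %in% names(arrests_demo))
is_numeric <- has_coords && all(sapply(arrests_demo[coord_cols], is.numeric))
has_coords && is_numeric
```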
arrests_sf <- st_as_sf(arrests, coords = c("Lon", "Lat"), crs = 4326)
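To confirm that the conversion picked up the coordinate system, you can query the result. The sketch below runs on two made-up points (`pts` is a hypothetical stand-in for the arrests table). Note that `coords` expects the x column (longitude) first and the y column (latitude) second.

```r
library(sf)

# Two made-up points near Los Angeles, for illustration only.
pts <- data.frame(
  Lat = c(34.05, 34.10),
  Lon = c(-118.24, -118.30)
)

# Longitude (x) comes first, latitude (y) second.
pts_sf <- st_as_sf(pts, coords = c("Lon", "Lat"), crs = 4326)

st_crs(pts_sf)$epsg    # EPSG code stored with the layer
st_is_longlat(pts_sf)  # TRUE when coordinates are in degrees
st_bbox(pts_sf)        # extent; longitudes should be negative for LA
```

If the bounding box shows latitudes in the x positions, the two column names were given in the wrong order.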
Visualize the tabular data and notice the new “geometry” column:
Now the tmap magic starts. With just one command, “qtm,” which stands for “quick thematic map plot,” you can instantly output a map of your data.
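One way to look at the table is `head()`, shown here on a made-up two-row example (`demo` and `demo_sf` are hypothetical names; use `arrests_sf` in the workshop). By default `st_as_sf()` drops the original coordinate columns and stores the points in "geometry" instead.

```r
library(sf)

# Made-up rows standing in for the arrests table.
demo <- data.frame(
  Charge = c("A", "B"),
  Lat = c(34.05, 34.10),
  Lon = c(-118.24, -118.30)
)
demo_sf <- st_as_sf(demo, coords = c("Lon", "Lat"), crs = 4326)

# Print the first rows; the header reports geometry type, extent, and CRS.
head(demo_sf)

# The coordinate columns are replaced by a single geometry column.
names(demo_sf)
class(demo_sf)
```

In RStudio, `View(arrests_sf)` opens the same table in a spreadsheet-style tab.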
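A minimal sketch of the quick map, using a few made-up points (`demo_sf` is a hypothetical stand-in) so that it runs on its own. In the workshop session the equivalent command is `qtm(arrests_sf)`.

```r
library(sf)
library(tmap)

# Made-up points for illustration only.
demo_sf <- st_as_sf(
  data.frame(Lat = c(34.05, 34.10, 33.99),
             Lon = c(-118.24, -118.30, -118.28)),
  coords = c("Lon", "Lat"),
  crs = 4326
)

# qtm() builds a map object; printing it draws the map in the Plots pane.
quick_map <- qtm(demo_sf)
quick_map
```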
Let’s change the map into an interactive leaflet map.
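Switching modes is a single call to `tmap_mode()`. The mode applies to maps drawn afterwards, so the map command has to be run again to see the change.

```r
library(tmap)

# Switch to interactive (leaflet) mode; tmap prints a confirmation message.
tmap_mode("view")

# Maps drawn from now on are interactive. Re-run your map command,
# for example: qtm(arrests_sf)
```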
Change it back to the original view:
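The same function restores the static view:

```r
library(tmap)

# Switch back to static plotting mode.
tmap_mode("plot")

# ttm() is a shortcut that toggles between "plot" and "view".
# Re-run your map command afterwards, for example: qtm(arrests_sf)
```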
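To give our data spatial bearing, let’s bring in a layer for Los Angeles Council Districts. You can download it in SHP (shapefile) format from the LA Times data portal. A shapefile is a set of files (.shp, .shx, .dbf, and usually .prj), so keep them together in the same folder. Save the files and read the .shp file into a variable.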
la_council_districts <- read_sf("C:\\Users\\yohman\\Downloads\\l.a. city council district (2012).shp")
Visualize the council districts.
qtm(la_council_districts, borders = 'gray', fill = NULL)
Visualize arrests and districts together.
qtm(arrests_sf) + qtm(la_council_districts, borders = 'gray', fill = NULL)
Color dots by charge description.
qtm(arrests_sf, dots.col = "Charge.Group.Description", dots.size = 0.1) + qtm(la_council_districts, borders = 'darkgray', fill = NULL)
One map per charge description.
qtm(arrests_sf, dots.col = "Charge.Group.Description", dots.size = 0.1, by = "Charge.Group.Description") + qtm(la_council_districts, borders = 'darkgray', fill = NULL)