Here at Sharp Sight, we make a lot of maps.
There are a few reasons for this.
First, good maps are typically ‘information dense.’ You can get a lot of information at a glance from a good map. They are good visualization tools for finding and communicating insights.
Second, it’s extremely easy to get data that you can use to make a map. From a variety of sources, you’ll find data about cities, states, counties, and countries. If you know how to retrieve this data and wrangle it into shape, you will rarely be short of material for a map.
Finally, map making is just good practice. To create a map like the one we’re about to make, you’ll typically need to use a variety of data wrangling and data visualization tools. Maps make for excellent practice for intermediate data scientists who have already mastered some of the basics.
With that in mind, this week we’ll make a map of “world cities.” This set of cities has been identified by the Globalization and World Cities (GaWC) Research Network as being highly connected and influential in the world economy.
We’re going to initially create a very basic map, but we’ll also create a small multiple version of the map (broken out by GaWC ranking).
Let’s get started.
First, we’ll load the packages that we’ll need.
#==============
# LOAD PACKAGES
#==============
library(tidyverse)
library(ggmap)
library(forcats)
Next, we’ll input the cities by hard coding them as data frames. To be clear, there is more than one way to do this (e.g., we could scrape the data), but there isn’t that much data here, so doing this manually is acceptable.
#===================
# INPUT ALPHA CITIES
#===================
df_alpha_plus_plus
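The data-entry code is cut off above, so here is a minimal sketch of what hard coding a small data frame can look like. It assumes a single column named `city` and uses `tibble::tribble()`; the `_example` names are hypothetical, and the rows are a few cities picked for illustration, not the full GaWC list.

```r
library(tibble)

# Illustrative only: short, hand-typed data frames with one column, `city`.
# The column name and the rows are assumptions, not the author's full input.
df_alpha_plus_plus_example <- tribble(
  ~city,
  "London",
  "New York"
)

df_alpha_plus_example <- tribble(
  ~city,
  "Tokyo",
  "Paris",
  "Singapore"
)
```

Typing one value per line keeps the input easy to proofread against the source list.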
Now, we’ll create a new variable called rating. This will contain the global city rating.
Notice that this is a very straightforward use of dplyr::mutate(), one of the tidyverse functions you should definitely master.
#=======================
# ADD GLOBAL CITY RATING
#=======================
df_alpha_plus_plus <- df_alpha_plus_plus %>% mutate(rating = 'alpha++')
df_alpha_plus <- df_alpha_plus %>% mutate(rating = 'alpha+')
df_alpha <- df_alpha %>% mutate(rating = 'alpha')
df_alpha_minus <- df_alpha_minus %>% mutate(rating = 'alpha-')
Next, we’ll combine the different data frames into one.
#======================================
# COMBINE DATAFRAMES INTO ONE DATAFRAME
#======================================
alpha_cities
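The combining code is truncated above. As a hedged sketch of one way to stack data frames that share the same columns, the example below uses `dplyr::bind_rows()`; the data frames `df_a` and `df_b` are hypothetical stand-ins, and this is not necessarily the function the original code used.

```r
library(dplyr)

# Hypothetical stand-ins for the per-rating data frames built earlier.
df_a <- data.frame(city = c("London", "New York"), rating = "alpha++")
df_b <- data.frame(city = c("Tokyo", "Paris"), rating = "alpha+")

# bind_rows() stacks data frames, matching columns by name.
cities_example <- bind_rows(df_a, df_b)

# Base R alternative when the columns match exactly:
# cities_example <- rbind(df_a, df_b)
```

Either approach yields one row per city, with the rating carried along from each input.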
Now that the data are combined into a single data frame, we’ll get the longitude and latitude using the geocode() function from ggmap.
#========
# GEOCODE
#========
latlong
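The geocoding call itself is cut off above. The sketch below shows how a call to `ggmap::geocode()` might look, under the assumption that a Google API key has already been registered with `register_google()` (recent ggmap versions require one). The vector `cities` and the name `latlong_example` are hypothetical.

```r
library(ggmap)

# geocode() calls a web service, so it needs network access and,
# in recent ggmap versions, a registered Google API key.
cities <- c("London", "New York")

if (has_google_key()) {
  # Returns a data frame with `lon` and `lat` columns, one row per input.
  latlong_example <- geocode(cities)
} else {
  message("No Google API key registered; skipping the geocoding call.")
}
```

Because the results come from an external service, it can be worth saving them to disk once retrieved instead of re-running the lookup each time.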
Once we have the longitude and latitude data, we need to combine it with the original data in the alpha_cities data frame. To do this, we will use cbind().
#============================
# BIND LAT/LONG TO CITY NAMES
#============================
alpha_cities <- cbind(alpha_cities, latlong) %>%
  rename(long = lon)

alpha_cities
#names(alpha_cities)
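To see the binding step in isolation, here is a small sketch with hand-entered, approximate coordinates in place of real geocoding output. The `_example` names are hypothetical. Note that `cbind()` pairs rows by position, so both data frames must list the cities in the same order.

```r
library(dplyr)

# Hypothetical city data and approximate coordinates, in matching row order.
cities_example <- data.frame(city = c("London", "New York"), rating = "alpha++")
latlong_example <- data.frame(lon = c(-0.13, -74.01), lat = c(51.51, 40.71))

# cbind() attaches the coordinate columns; rename() changes `lon` to `long`
# so the column name matches the world map data used for plotting.
cities_example <- cbind(cities_example, latlong_example) %>%
  rename(long = lon)
```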
Now we have the data that we need, but we’ll need to clean things up a little.
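In the visualization we’ll make, we will need to use the faceting technique from ggplot2.
To reorder the factor levels of rating so that the panels run from alpha++ down to alpha-, we’ll use forcats::fct_relevel().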
#================================================
# REORDER LEVELS OF GLOBAL CITY RATING
# - the global city ratings should be ordered
#   i.e., alpha++, then alpha+ ....
# - to do this, we'll use forcats::fct_relevel()
#================================================
alpha_cities
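Since the releveling code is truncated above, the following is a minimal sketch of `forcats::fct_relevel()` on a toy data frame. The data frame `cities_example` and its placeholder city names are hypothetical; only the four rating labels come from the tutorial.

```r
library(dplyr)
library(forcats)

# Toy data with the ratings deliberately out of order.
cities_example <- data.frame(
  city = c("A", "B", "C", "D"),
  rating = c("alpha-", "alpha", "alpha++", "alpha+")
)

# Convert to a factor, then move the levels into the desired order.
cities_example <- cities_example %>%
  mutate(rating = fct_relevel(factor(rating), "alpha++", "alpha+", "alpha", "alpha-"))

levels(cities_example$rating)
```

Without this step, a factor built from strings uses a default sort order for its levels, which is not guaranteed to match the ranking order you want in the facets.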
Because we will be building a map, we’ll need to retrieve a map of the world.
#==============
# GET WORLD MAP
#==============
map_world
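The assignment to map_world is cut off above. One common way to get world polygons in a ggplot2-friendly shape is `ggplot2::map_data("world")`, which needs the maps package installed. This is an assumption about how such a data frame could be built, not a confirmed copy of the original line, and `map_world_example` is a hypothetical name.

```r
library(ggplot2)

# map_data() converts map outlines from the maps package into a data frame.
map_world_example <- map_data("world")

# The columns used for plotting polygons are long, lat, and group.
head(map_world_example)
```

The `group` column matters: it tells `geom_polygon()` which points belong to the same outline, so separate landmasses are not joined by stray lines.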
Ok. We basically have everything we need. Now we will make a simple first draft.
#================
# FIRST DRAFT MAP
#================
ggplot() +
  geom_polygon(data = map_world, aes(x = long, y = lat, group = group)) +
  geom_point(data = alpha_cities, aes(x = long, y = lat), color = 'red')
… and now we’ll use the faceting technique to break out our plot using the rating variable.
#==========================
# CREATE SMALL MULTIPLE MAP
#==========================
ggplot() +
  geom_polygon(data = map_world, aes(x = long, y = lat, group = group)) +
  geom_point(data = alpha_cities, aes(x = long, y = lat), color = 'red') +
  #facet_grid(. ~ rating)
  #facet_grid(rating ~ .)
  facet_wrap(~ rating)
Once again, this is a good example of an intermediate-level project that you could do to practice your data wrangling and data visualization skills.
Having said that, before you attempt to do something like this yourself, I highly recommend that you first master the individual tools that we used here (i.e., the tools from the tidyverse).
To master data science, you need to master the essential tools.
And to make rapid progress, you need to know what to learn, what not to learn, and you need to know how to practice what you learn.
Sharp Sight is dedicated to teaching you how to master the tools of data science as quickly as possible.