Posts by kjytay
Author: kjytay
A short note on the startsWith function
Feed: R-bloggers. Author: kjytay. The startsWith function comes with base R and determines whether entries of an input start with a given prefix. (The endsWith function does the same thing but for suffixes.) The following code checks if each of “ant”, “banana” and “balloon” starts with “a”:
startsWith(c("ant", "banana", "balloon"), "a")
# [1] TRUE FALSE FALSE
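For comparison, here is a small sketch of endsWith on the same input, together with the substring() expression that the base R help page says startsWith is equivalent to (though startsWith is much faster). The variable name x is only for illustration.

```r
x <- c("ant", "banana", "balloon")

# endsWith() checks suffixes instead of prefixes
endsWith(x, "n")
# [1] FALSE FALSE TRUE

# A slower equivalent of startsWith(x, "a"): compare the first nchar("a")
# characters of each string with the prefix
substring(x, 1, nchar("a")) == "a"
# [1] TRUE FALSE FALSE
```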
The second argument (the prefix to check) can also be a vector. The code below checks if “ant” starts with “a” and if “ant” starts with “b”:
startsWith("ant", c("a", "b"))
# [1] TRUE FALSE
Where things might get a bit unintuitive is when both ...
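The excerpt stops before the details, so what follows is only a sketch of the documented behavior, not necessarily the point the post goes on to make. Per the base R help page, when both arguments are vectors the shorter one is recycled to the length of the longer one, so the comparison is element-wise rather than "every string against every prefix". If all combinations are wanted, outer() is one option. The names x, prefixes and m are illustrative.

```r
x <- c("ant", "banana", "balloon")
prefixes <- c("a", "b")

# Element-wise with recycling: "ant"/"a", "banana"/"b", "balloon"/"a"
startsWith(x, prefixes)
# [1] TRUE TRUE FALSE

# All combinations: rows are the strings, columns are the prefixes
m <- outer(x, prefixes, startsWith)
dimnames(m) <- list(x, prefixes)
m   # "ant" starts with "a"; "banana" and "balloon" start with "b"
```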
What is a horizon chart?
Feed: R-bloggers. Author: kjytay. A horizon chart is a compact version of an area chart. In the words of Jonathan Schwabish (Reference 1, page 164), it is “… an area chart that is sliced into equal horizontal intervals and collapsed down into single bands, which makes the graph more compact and similar to a heatmap…” What are horizon charts good for? Here is Schwabish again: “Horizon charts are especially useful when you are visualizing time series data that are so close in value so that the data marks in, for example, a line chart, would lie atop each other. …” ...
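To make the “slice and collapse” idea concrete, here is a minimal base R sketch. It assumes a non-negative series (full horizon charts usually also mirror negative values in a second colour), and the function names horizon_bands and plot_horizon are hypothetical, not taken from the post or from any package.

```r
# Hypothetical helper: slice a non-negative series y into n_bands bands of
# equal height h. Column k holds the part of y lying between (k-1)*h and k*h,
# shifted down so every band sits on the same baseline (the "collapse" step).
horizon_bands <- function(y, n_bands = 3) {
  stopifnot(all(y >= 0), n_bands >= 1)
  h <- max(y) / n_bands
  bands <- vapply(seq_len(n_bands),
                  function(k) pmin(pmax(y - (k - 1) * h, 0), h),
                  numeric(length(y)))
  matrix(bands, nrow = length(y))
}

# Hypothetical helper: draw the bands on top of each other, using darker
# colours for higher bands, so the chart is only h units tall.
plot_horizon <- function(x, y, n_bands = 3) {
  bands <- horizon_bands(y, n_bands)
  h <- max(y) / n_bands
  cols <- colorRampPalette(c("lightblue", "darkblue"))(n_bands)
  plot(range(x), c(0, h), type = "n", xlab = "", ylab = "", yaxt = "n")
  for (k in seq_len(n_bands)) {
    polygon(c(x[1], x, x[length(x)]), c(0, bands[, k], 0),
            col = cols[k], border = NA)
  }
}

# Toy example: a smooth, strictly positive series
t_grid <- 1:100
plot_horizon(t_grid, 5 + 4 * sin(t_grid / 8), n_bands = 3)
```

Because each band is zero wherever the series is below that band's floor, drawing the darker bands last means the colour at any point encodes which band the value reached.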
Something to note when using the merge function in R
Feed: R-bloggers. Author: kjytay. Base R has a merge function that performs join operations on data frames. As the documentation says, the function “[merges] two data frames by common columns or row names, or do other versions of database join operations.” One thing I realized, which may not be obvious, is that merge can behave somewhat unexpectedly with regard to the ordering of rows in the result. Let’s see an example with the mtcars dataset:
data(mtcars)
mtcars$ID <- 1:nrow(mtcars)
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb ID
# Mazda RX4 21.0 6 160 110 3.90 ...
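The excerpt cuts off before the example output, so here is a hedged sketch of the general behavior rather than the post's own example. With the default sort = TRUE, merge() sorts the result on the by columns, so rows generally do not come back in the order of the first data frame (and with sort = FALSE the documentation describes the order as unspecified). An ID column like the one added above makes it easy to restore the original order. The lookup table cyl_info is made up for illustration.

```r
data(mtcars)
mtcars$ID <- 1:nrow(mtcars)

# Hypothetical lookup table keyed on the number of cylinders
cyl_info <- data.frame(cyl = c(4, 6, 8), cyl_label = c("four", "six", "eight"))

merged <- merge(mtcars, cyl_info, by = "cyl")
head(merged$ID)   # rows follow the sorted 'cyl' values, not the original order

# Restore the original row order using the ID column
merged <- merged[order(merged$ID), ]
rownames(merged) <- NULL
```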
Changing the column names for model.matrix output
Feed: R-bloggers. Author: kjytay. In this previous post, I showed how you can include a dummy variable for the baseline level in the output of the model.matrix function. In this post, I show how you can change the column names of model.matrix’s output to make downstream parsing a little easier. Let’s use the iris dataset again:
data(iris)
str(iris)
# 'data.frame': 150 obs. of 5 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ ...
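By default, model.matrix glues the factor name directly onto the level (for example “Speciesversicolor”), which is awkward to split apart later. Here is one possible sketch, not necessarily the approach in the post: it uses the "assign" attribute of the model matrix to find which term each column came from, and inserts a separator for columns that come from a factor variable. The function name model_matrix_sep and the "_" separator are my own choices.

```r
data(iris)

# Hypothetical helper: build a model matrix and insert `sep` between a factor
# variable's name and its level, e.g. "Speciesversicolor" -> "Species_versicolor".
model_matrix_sep <- function(formula, data, sep = "_") {
  mm <- model.matrix(formula, data = data)
  labels <- attr(terms(formula, data = data), "term.labels")
  term_idx <- attr(mm, "assign")   # term index of each column (0 = intercept)
  nms <- colnames(mm)
  for (j in seq_along(nms)) {
    if (term_idx[j] == 0) next
    term <- labels[term_idx[j]]
    # Only rename columns that come from a plain factor variable in `data`;
    # interaction terms such as "a:b" are not columns of data, so they are skipped
    if (is.factor(data[[term]])) {
      nms[j] <- paste0(term, sep, substring(nms[j], nchar(term) + 1))
    }
  }
  colnames(mm) <- nms
  mm
}

mm <- model_matrix_sep(~ Sepal.Length + Species, iris)
colnames(mm)
# "(Intercept)" "Sepal.Length" "Species_versicolor" "Species_virginica"
```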
How to include all levels of a factor variable in a model matrix in R
Feed: R-bloggers. Author: kjytay. In R, the model.matrix function is used to create the design matrix for regression. In particular, it is used to expand factor variables into dummy variables (also known as “one-hot encoding”). Let’s see this in action on the iris dataset:
data(iris)
str(iris)
# 'data.frame': 150 obs. of 5 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# ...
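By default, the first (baseline) level of a factor gets no column. Two common ways to get a column for every level are sketched below; they are well-known idioms, not necessarily the method in the post. Dropping the intercept gives the full set of dummies only for the first factor in the formula, while turning off contrasts via contrasts.arg works for every factor. The variable names factor_cols, mm1 and mm2 are illustrative.

```r
data(iris)

# Option 1: drop the intercept. The first factor in the formula then gets
# one indicator column per level (later factors would still drop a level).
mm1 <- model.matrix(~ Species - 1, data = iris)
colnames(mm1)
# "Speciessetosa" "Speciesversicolor" "Speciesvirginica"

# Option 2: keep the intercept but switch off contrasts for every factor
# column, so each factor expands into one indicator column per level.
factor_cols <- names(iris)[vapply(iris, is.factor, logical(1))]
mm2 <- model.matrix(
  ~ ., data = iris,
  contrasts.arg = lapply(iris[factor_cols], contrasts, contrasts = FALSE)
)
colnames(mm2)
```

Note that in Option 2 the indicator columns of each factor sum to the intercept column, so the matrix is rank-deficient; this is mainly useful when the downstream method (for example, a penalized regression) can cope with that.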
Switching testthat editions and how it affects testing functions and formulas
Feed: R-bloggers. Author: kjytay. testthat is a popular R package used for unit testing. Starting with v3.0.0, testthat introduced the idea of “editions”. This is testthat’s way of maintaining backward compatibility. At the time of writing, the 3rd edition is the latest and incorporates the package developer’s latest recommendations, some of which could be backward incompatible. If the user wants testthat’s old behavior, they can switch to an earlier edition. Use edition_get() to find out which edition of testthat is currently active, and use local_edition() to change the active edition:
library(testthat)
edition_get() # your return value may be different
...
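A minimal sketch of how local_edition() scopes the change: it sets the edition for the calling frame and restores the previous edition when that frame exits, so the switch does not leak into other code. This assumes testthat 3.0.0 or later is installed; the wrapper name active_edition_under is hypothetical.

```r
library(testthat)

# Hypothetical helper: report the active edition while edition x is in force
active_edition_under <- function(x) {
  local_edition(x)   # applies until this function returns
  edition_get()
}

before <- edition_get()
active_edition_under(2)   # returns 2
active_edition_under(3)   # returns 3
edition_get() == before   # the change did not outlive the function call
```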
Comparing the Bradley Terry model to betting odds
Feed: R-bloggers. Author: kjytay. In this previous post, I described the Bradley-Terry model and showed how we could use it to predict game outcomes in the NBA 2018-19 regular season. After fitting the Bradley-Terry model on the first half of the regular season (with and without home advantage), I used the model to predict win probabilities for the second half of the season. The models gave test Brier scores of 0.213 and 0.219, which I said were not much better than random 50-50 guessing (which has a Brier score of 0.25). In private correspondence, Justin Dyer pointed out to me that ...
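For reference, the Brier score is the mean squared difference between predicted probabilities and 0/1 outcomes, which is why constant 50-50 guessing always scores exactly 0.25. Comparing against betting odds also requires turning odds into probabilities. One simple, common approach (an assumption here, not necessarily what the post does) is to invert the decimal odds and rescale so the two sides sum to 1, which removes the bookmaker's margin. The outcomes, odds and helper names below are made up.

```r
# Brier score: mean squared difference between predicted win probabilities
# and observed outcomes (1 = win, 0 = loss). Lower is better.
brier_score <- function(p, outcome) mean((p - outcome)^2)

# Toy outcomes, made up for illustration
outcome <- c(1, 0, 1, 1, 0)
brier_score(rep(0.5, length(outcome)), outcome)    # 0.25, since (0.5 - y)^2 = 0.25
brier_score(c(0.8, 0.3, 0.6, 0.9, 0.2), outcome)   # 0.068

# Hypothetical helper: convert decimal odds for the two sides of a game into
# win probabilities, removing the bookmaker's margin by proportional scaling
implied_prob <- function(odds_home, odds_away) {
  raw <- c(home = 1 / odds_home, away = 1 / odds_away)
  raw / sum(raw)
}
implied_prob(1.5, 2.6)
```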
What is the Bradley-Terry model?
Feed: R-bloggers. Author: kjytay. The Bradley-Terry model, named after R. A. Bradley and M. E. Terry, is a probability model for predicting the outcome of a paired comparison. Imagine that we have $n$ teams competing against each other. The model assigns team $i$ a score $p_i > 0$, with higher scores corresponding to better teams. Given two teams $i$ and $j$, the model asserts that
$$P(i \text{ beats } j) = \frac{p_i}{p_i + p_j}.$$
(Notice that the model implies that either $i$ beats $j$ or $j$ beats $i$: there is no room for ties.) If we parameterize the scores by $p_i = e^{\beta_i}$, then the model above is equivalent to
$$\operatorname{logit} P(i \text{ beats } j) = \log \frac{P(i \text{ beats } j)}{1 - P(i \text{ beats } j)} = \beta_i - \beta_j.$$
Thus, estimating the ...
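Since $\operatorname{logit} P(i \text{ beats } j) = \beta_i - \beta_j$ is a logistic regression without an intercept, one way to fit the model (a sketch under stated assumptions, not necessarily the approach in the post) is glm() with a design matrix that has +1 in team $i$'s column and -1 in team $j$'s column. Only differences of the $\beta$'s are identifiable, so one team is fixed as the reference with $\beta_1 = 0$. The teams, abilities and number of games are simulated and made up.

```r
set.seed(1)
n_teams <- 4
beta <- c(0, 0.5, 1, 1.5)     # made-up true abilities; team 1 is the reference

# Every pair of teams plays 50 games; column 1 = team i, column 2 = team j
pairs <- t(combn(n_teams, 2))
games <- pairs[rep(seq_len(nrow(pairs)), each = 50), ]

# Bradley-Terry: P(i beats j) = exp(beta_i) / (exp(beta_i) + exp(beta_j))
p_i_wins <- exp(beta[games[, 1]]) /
  (exp(beta[games[, 1]]) + exp(beta[games[, 2]]))
y <- rbinom(nrow(games), size = 1, prob = p_i_wins)   # 1 if team i won

# Design matrix: +1 for team i, -1 for team j
X <- matrix(0, nrow = nrow(games), ncol = n_teams)
X[cbind(seq_len(nrow(games)), games[, 1])] <- 1
X[cbind(seq_len(nrow(games)), games[, 2])] <- -1
X <- X[, -1, drop = FALSE]                 # drop team 1 so that beta_1 = 0
colnames(X) <- paste0("team", 2:n_teams)

fit <- glm(y ~ X - 1, family = binomial)   # no intercept
coef(fit)                                  # estimates of beta_2, beta_3, beta_4
```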
Playing Wordle in R
Feed: R-bloggers. Author: kjytay. The game Wordle has taken the world (or at least my Facebook feed) by storm. It’s a really simple word game that’s a lot like the classic Mastermind. Here are the rules from the Wordle website: The logic behind the game is pretty simple, so I thought I’d code up an R version so that those of you who can’t get enough of it can play it on your own! The full code is available here. In my version, I allow the user to set 3 parameters: dictionary: A vector of possible words that the computer ...
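The heart of any Wordle implementation is scoring a guess against the hidden word. Here is a minimal sketch of one way to do it; it is not the post's code, and the function name and the G/Y/- encoding are my own choices. Repeated letters are the subtle part: exact matches are marked first, and a letter is marked as present elsewhere only while unmatched copies of it remain in the target.

```r
# Hypothetical helper: score a guess against the target word.
# "G" = right letter, right spot; "Y" = letter elsewhere in the word; "-" = not used.
wordle_feedback <- function(guess, target) {
  g <- strsplit(toupper(guess), "")[[1]]
  w <- strsplit(toupper(target), "")[[1]]
  stopifnot(length(g) == length(w))

  res <- rep("-", length(g))
  green <- g == w
  res[green] <- "G"

  # Count the target letters that were not already matched exactly
  remaining <- table(w[!green])
  for (i in which(!green)) {
    letter <- g[i]
    if (!is.na(remaining[letter]) && remaining[letter] > 0) {
      res[i] <- "Y"
      remaining[letter] <- remaining[letter] - 1
    }
  }
  paste(res, collapse = "")
}

wordle_feedback("speed", "abide")   # "--Y-Y": only one of the two E's is marked
wordle_feedback("llama", "hello")   # "YY---"
```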
Simulating dice bingo
Feed: R-bloggers. Author: kjytay. Note: This post was inspired by the “Classroom Bingo” probability puzzle in the Royal Statistical Society’s Significance magazine (Dec 2021 edition).
Set-up: Imagine that we are playing bingo, except that the numbers are generated by the roll of two 6-sided dice with faces 1, 2, …, 6. Each round, the two dice are rolled. If the sum of the two dice appears on your bingo card, you can strike it off. The game ends when you strike off all the numbers on your card. For this simple game, you are asked to write just two numbers ...
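Here is a small simulation sketch of this set-up. Because the excerpt is cut off before the full rules, it assumes the card holds two distinct numbers between 2 and 12 and that each roll strikes off at most the one matching number. If the two numbers come up with per-round probabilities $p$ and $q$, the expected number of rounds to strike off both is $1/p + 1/q - 1/(p+q)$: the finishing time is the waiting time for the first number plus that for the second, minus the time until either appears (a geometric wait with success probability $p + q$, since one sum cannot match both). The simulation can be checked against that formula. The helper names are hypothetical.

```r
# Probability that the sum of two fair six-sided dice equals s
sum_prob <- function(s) sum(outer(1:6, 1:6, "+") == s) / 36

# Expected rounds to strike off two distinct numbers a and b:
# E[max(T_a, T_b)] = E[T_a] + E[T_b] - E[min(T_a, T_b)]
expected_rounds <- function(a, b) {
  p <- sum_prob(a)
  q <- sum_prob(b)
  1 / p + 1 / q - 1 / (p + q)
}

# Simulate the game for a card of distinct numbers; returns rounds per game
simulate_rounds <- function(card, n_sims = 10000) {
  replicate(n_sims, {
    remaining <- card
    rounds <- 0
    while (length(remaining) > 0) {
      rounds <- rounds + 1
      roll_sum <- sum(sample(1:6, 2, replace = TRUE))
      remaining <- remaining[remaining != roll_sum]   # strike off a match, if any
    }
    rounds
  })
}

expected_rounds(6, 8)              # 10.8
set.seed(42)
mean(simulate_rounds(c(6, 8)))     # should be close to 10.8
```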