## Posts by John Mount

# Author: John Mount

#### Piping is Method Chaining

Feed: R-bloggers. Author: John Mount. What R users now call piping, popularized by Stefan Milton Bache and Hadley Wickham, is inline function application (this is notationally similar to, but distinct from, the powerful interprocess communication and concurrency tool introduced to Unix by Douglas McIlroy in 1973). In object-oriented languages this sort of notation for function application has been called “method chaining” since the days of Smalltalk (~1972). Let’s take a look at method chaining in Python, in terms of pipe notation. Let’s work an example using Python’s Pandas package (and classes). `import pandas as pd` `data = [['alpha', 'a', ...`
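As a rough illustration of the point (not the code from the post, which is truncated above), the sketch below writes the same three steps once as ordinary method chaining and once in a pipe-like notation. The `Pipe` wrapper class and its use of `>>` are hypothetical names invented for this example; only built-in `str` methods are used.

```python
class Pipe:
    """Hypothetical wrapper: `Pipe(x) >> f` applies f to the wrapped value."""

    def __init__(self, value):
        self.value = value

    def __rshift__(self, f):
        # Inline function application: feed the current value to f and re-wrap.
        return Pipe(f(self.value))


text = "  Piping Is Method Chaining  "

# Method chaining: each method returns an object, so calls read left to right.
chained = text.strip().lower().split()

# The same steps written as a pipeline of functions applied in sequence.
piped = (Pipe(text) >> str.strip >> str.lower >> str.split).value

print(chained == piped)
```

Both forms apply the same functions in the same order; the only difference is the notation used to thread the intermediate value through.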

#### R Photo

A good friend is now a professor at the University of Auckland and knew to photograph and send us this. Thanks!!!

#### Practical Data Science with R Book Update

A good friend shared with us a great picture of Practical Data Science with R, 1st Edition hanging out in Cambridge at the MIT Press Bookstore. This is as good an excuse as any to share a book update. Nina Zumel and I (John Mount) are busy revising chapters 10 and 11 of Practical Data Science with R, 2nd Edition. We expect the second edition to be able to go into production (itself a serious phase) in a couple of months. Right now signing up for the 2nd edition Manning Early Access Program (MEAP) gets ...

#### Not Always C++’s Fault

From the recent developer.r-project.org “Staged Install” article: “Incidentally, there were just two distinct (very long) lists of methods in the warnings across all installed packages in my run, but repeated for many packages. It turned out that they were lists of exported methods from dplyr and rlang packages. These two packages take very long to install due to C++ code compilation.” Technical point: while dplyr indeed uses C++ (via Rcpp), rlang appears to currently be a C-package. So any problems associated with rlang are probably not due to C++ or Rcpp. Similarly other tidyverse packages ...

#### Why RcppDynProg is Written in C++

The (matter of opinion) claim: “When the use of C++ is very limited and easy to avoid, perhaps it is the best option to do that […]” (source discussed here) got me thinking: does our own RcppDynProg package actually use C++ in a significant way? Could/should I port it to C? Am I informed enough to use something as complicated as C++ correctly? RcppDynProg implements a nifty concise dynamic programming solution to a segmentation problem. It can automatically partition graphs such as the following: [figure omitted] into the following: [figure omitted] (details found here). But is the package really ...

#### What are the Popular R Packages?

“R is its packages”, so to know R we should know its popular packages (CRAN). Or to put it another way: as R is a typical “the reference implementation is the specification” programming environment, there is no true “de jure” R, only a de facto R. To look at popular R packages I defined “popular” as used (Depends/Imports/LinkingTo) by other packages on CRAN. One could use other definitions (e.g. GitHub stars), but this is the one I used for this particular study. My “quick look” (sure to anger everyone) is a couple of diagrams such as ...
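The definition of “popular” above amounts to counting reverse dependencies. A minimal sketch of that count follows, assuming the dependency fields have already been parsed into a mapping; the package names and the `uses` table are made-up toy data, not CRAN results, and this is not the code used for the study.

```python
from collections import Counter

# Hypothetical toy metadata: package -> packages it names in
# Depends/Imports/LinkingTo. Not real CRAN data.
uses = {
    "pkgA": {"pkgCore", "pkgUtil"},
    "pkgB": {"pkgCore"},
    "pkgC": {"pkgCore", "pkgA"},
    "pkgCore": set(),
    "pkgUtil": set(),
}


def reverse_dependency_counts(uses):
    """Count, for each package, how many other packages use it directly."""
    counts = Counter({pkg: 0 for pkg in uses})
    for pkg, deps in uses.items():
        for dep in deps:
            counts[dep] += 1
    return counts


counts = reverse_dependency_counts(uses)
# Most-used first; ties broken alphabetically for a stable ordering.
ranking = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)
```

This counts only direct uses. Counting indirect (transitive) uses would be a different definition of popularity and would give a different ranking.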

#### C++ is Often Used in R Packages

The recent r-project article “Use of C++ in Packages” stated as its own summary of recommendation: don’t use C++ to interface with R. A careful reading of the article exposes at least two possible meanings of this:

- Don’t use C++ to directly call R or directly manipulate R structures. A technical point directly argued (for right or wrong) in the article.
- Don’t use C++/Rcpp to write R packages. A point implicit in the article.

C++ and Rcpp (a package designed to allow the use of C++ from R) are not the same thing, but both ...

#### Standard Evaluation Versus Non-Standard Evaluation in R

There is a lot of unnecessary worry over “Non Standard Evaluation” (NSE) in R versus “Standard Evaluation” (SE, or standard “variable names refer to values” evaluation). This very author is guilty of over-discussing the issue. But let’s give this yet another try. The entire difference between NSE and regular evaluation can be summed up in the following simple table (which should be clear after we work some examples). Standard Evaluation: in standard (or value-oriented) evaluation, code you type in is taken to be variable names, function names, operators, and even numeric literal values. String ...

#### Operator Notation for Data Transforms

As of cdata version 1.0.8 cdata implements an operator notation for data transform. The idea is simple, yet powerful. First let’s start with some data: `d <- wrapr::build_frame("id", "measure", "value" | 1, "AUC", 0.7 | 1, "R2", 0.4 | 2, "AUC", 0.8 | 2, "R2", 0.5)`, displayed with `knitr::kable(d)`:

| id | measure | value |
|---|---|---|
| 1 | AUC | 0.7 |
| 1 | R2 | 0.4 |
| 2 | AUC | 0.8 |
| 2 | R2 | 0.5 |

In the above data we have two measurements each for two individuals (individuals identified by the "id" column). Using cdata’s `new_record_spec()` method we can capture a description ...

#### How cdata Control Table Data Transforms Work

With all of the excitement surrounding cdata style control table based data transforms (the cdata ideas being named as the “replacements” for tidyr’s current methodology, by the tidyr authors themselves!) I thought I would take a moment to describe how they work. cdata defines two primary data manipulation operators: `rowrecs_to_blocks()` and `blocks_to_rowrecs()`. These are the fundamental transforms that convert between data representations. The two representations it converts between are:

- A world where all facts about an instance or record are in a single row (“rowrecs”).
- A world where all facts about an instance or record ...
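To make the two representations concrete, here is a small Python sketch of the round trip. It is an illustration only and not cdata’s implementation: cdata drives these transforms from a control table, while this sketch assumes the simplest case of one name column and one value column. The helper functions reuse cdata’s names purely for readability, their `key`/`name_col`/`value_col` parameters are hypothetical, and the data is illustrative.

```python
def blocks_to_rowrecs(rows, key, name_col, value_col):
    """Gather each record's block of rows into a single row (simplified sketch)."""
    out = {}
    for r in rows:
        rec = out.setdefault(r[key], {key: r[key]})
        rec[r[name_col]] = r[value_col]
    return list(out.values())


def rowrecs_to_blocks(recs, key, name_col, value_col):
    """Spread each single-row record into one row per measurement."""
    return [
        {key: rec[key], name_col: name, value_col: value}
        for rec in recs
        for name, value in rec.items()
        if name != key
    ]


# Block form: several rows per record (one row per id/measure pair).
blocks = [
    {"id": 1, "measure": "AUC", "value": 0.7},
    {"id": 1, "measure": "R2", "value": 0.4},
    {"id": 2, "measure": "AUC", "value": 0.8},
    {"id": 2, "measure": "R2", "value": 0.5},
]

# Row-record form: all facts about a record in a single row.
rowrecs = blocks_to_rowrecs(blocks, "id", "measure", "value")
round_trip = rowrecs_to_blocks(rowrecs, "id", "measure", "value")
print(rowrecs)
```

In this simple case the two functions are inverses of each other, which is the sense in which they convert between representations without losing information.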
