## Posts by arthur charpentier

# Author: arthur charpentier

#### Testing for Covid-19 in the U.S.

Feed: R-bloggers. Author: arthur charpentier. For almost a month now, on a daily basis, we have been working with colleagues (Romuald, Chi and Mathieu) on modeling the dynamics of the recent pandemic. I learn a lot of things discussing with them, but we keep struggling with the tests. Paul, in Montréal, helped me a little bit, but I think we will still have to do more to get a better understanding. To be honest, we struggle with two very simple questions: how many people are tested on a daily basis? Recently, I discovered Modelling COVID-19 exit strategies for policy makers in the ...

#### On the “correlation” between a continuous and a categorical variable

Let us get back to the Titanic dataset,

```r
loc_fichier = "http://freakonometrics.free.fr/titanic.RData"
# mode = "wb" keeps the binary .RData file intact on Windows
download.file(loc_fichier, "titanic.RData", mode = "wb")
load("titanic.RData")
# keep only passengers whose age is known
base = base[!is.na(base$Age),]
```

We consider two variables, the age $x$ (the continuous one) and the survival indicator $y$ (the qualitative one),

```r
X = base$Age
Y = base$Survived
```

It looks like the age might be a valid explanatory variable in the logistic regression,

```r
summary(glm(Survived~Age,data=base,family=binomial))
```

```
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.05672    0.17358  -0.327   0.7438
Age         -0.01096    0.00533  -2.057   0.0397
```

...
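The excerpt stops before saying which measure the post settles on. One standard option for a continuous $x$ and a binary $y$ is the point-biserial correlation, which is simply Pearson's correlation with $y$ coded 0/1. The sketch below uses simulated data (the names `age` and `survived` and the simulation parameters are hypothetical, not the Titanic data) and checks that `cor()` agrees with the closed form $(\bar x_1-\bar x_0)\sqrt{p(1-p)}/s$, where $p$ is the proportion of ones and $s$ the standard deviation of $x$ computed with denominator $n$.

```r
# Point-biserial correlation: Pearson correlation between a continuous
# variable and a 0/1 variable. Simulated data (hypothetical, not Titanic).
set.seed(1)
n <- 300
age <- runif(n, min = 1, max = 80)
survived <- rbinom(n, size = 1, prob = plogis(0.5 - 0.02 * age))

# Built-in Pearson correlation with the binary variable coded 0/1
r_cor <- cor(age, survived)

# Closed form: difference of group means, scaled by the sd (denominator n)
m1 <- mean(age[survived == 1])
m0 <- mean(age[survived == 0])
p1 <- mean(survived)
s_n <- sqrt(mean((age - mean(age))^2))
r_formula <- (m1 - m0) * sqrt(p1 * (1 - p1)) / s_n

c(cor = r_cor, formula = r_formula)
```

The two numbers coincide because the $n-1$ factors in `cor()` cancel between the covariance and the two standard deviations.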

#### Modeling Pandemics (3)

In Statistical Inference in a Stochastic Epidemic SEIR Model with Control Intervention, a more complex model than the one we saw yesterday is considered: the SEIR model. Consider a population of size $N$, and assume that $S$ is the number of susceptible, $E$ the number of exposed, $I$ the number of infectious, and $R$ the number of recovered (or immune) individuals,

$$\begin{aligned}\frac{dS}{dt}&=-\beta\frac{I}{N}S\\[8pt]\frac{dE}{dt}&=\beta\frac{I}{N}S-aE\\[8pt]\frac{dI}{dt}&=aE-bI\\[8pt]\frac{dR}{dt}&=bI\end{aligned}$$

Between $S$ and $E$, the transition rate is $\beta I/N$, where $\beta$ is the average number of contacts per person per ...
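A minimal sketch of the SEIR system above, integrated with a fixed-step fourth-order Runge-Kutta scheme in base R (no ODE package assumed). The helper names and the values of `N`, `E0`, `beta`, `a`, `b`, `dt` are hypothetical; with time in days, `a = 1/5` and `b = 1/7` correspond to mean latent and infectious periods of 5 and 7 days.

```r
# SEIR right-hand side; state = c(S, E, I, R)
seir_rhs <- function(state, beta, a, b, N) {
  S <- state[1]; E <- state[2]; I <- state[3]
  new_exposed <- beta * I * S / N          # flow from S to E
  c(-new_exposed,                          # dS/dt
    new_exposed - a * E,                   # dE/dt: E leaves to I at rate a
    a * E - b * I,                         # dI/dt: I leaves to R at rate b
    b * I)                                 # dR/dt
}

# One classical fourth-order Runge-Kutta step; extra arguments go to seir_rhs
rk4_step <- function(state, dt, ...) {
  k1 <- seir_rhs(state, ...)
  k2 <- seir_rhs(state + dt / 2 * k1, ...)
  k3 <- seir_rhs(state + dt / 2 * k2, ...)
  k4 <- seir_rhs(state + dt * k3, ...)
  state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
}

simulate_seir <- function(N = 1e4, E0 = 10, beta = 0.6, a = 1 / 5, b = 1 / 7,
                          dt = 0.5, t_max = 300) {
  steps <- round(t_max / dt)
  out <- matrix(NA_real_, nrow = steps + 1, ncol = 4,
                dimnames = list(NULL, c("S", "E", "I", "R")))
  out[1, ] <- c(N - E0, E0, 0, 0)          # start with a few exposed individuals
  for (k in seq_len(steps)) {
    out[k + 1, ] <- rk4_step(out[k, ], dt, beta = beta, a = a, b = b, N = N)
  }
  cbind(time = (0:steps) * dt, out)
}

sim <- simulate_seir()
tail(sim, 1)
```

Since the four derivatives sum to zero, $S+E+I+R$ stays equal to $N$ along the trajectory, which is a quick sanity check on the implementation.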

#### Modeling pandemics (1)

The most popular model of epidemics is the so-called SIR model, or Kermack-McKendrick model. Consider a population of size $N$, and assume that $S$ is the number of susceptible, $I$ the number of infectious, and $R$ the number of recovered (or immune) individuals,

$$\begin{aligned}&\frac{dS}{dt}=-\frac{\beta IS}{N},\\[6pt]&\frac{dI}{dt}=\frac{\beta IS}{N}-\gamma I,\\[6pt]&\frac{dR}{dt}=\gamma I,\end{aligned}$$

so that

$$\frac{dS}{dt}+\frac{dI}{dt}+\frac{dR}{dt}=0$$

which implies that $S+I+R$ is constant over time, equal to $N$. In order to be more realistic, consider some (constant) birth rate $\mu$, so that the model becomes

$$\begin{aligned}&\frac{dS}{dt}=\mu(N-S)-\frac{\beta IS}{N},\\[6pt]&\frac{dI}{dt}=\frac{\beta IS}{N}-(\gamma+\mu) I,\\[6pt]&\frac{dR}{dt}=\gamma I-\mu R.\end{aligned}$$

Note, in this model, that people ...
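A minimal sketch of the model with births, integrated with a plain explicit Euler scheme in base R (no ODE package assumed). The function name, step size `dt`, horizon and parameter values are hypothetical. Because the three derivatives add up to $\mu(N-S-I-R)$, the total stays at $N$ when it starts there.

```r
simulate_sir_births <- function(N = 1000, I0 = 1, beta = 0.5, gamma = 0.1,
                                mu = 0.01, dt = 0.1, t_max = 200) {
  steps <- round(t_max / dt)
  out <- matrix(NA_real_, nrow = steps + 1, ncol = 3,
                dimnames = list(NULL, c("S", "I", "R")))
  out[1, ] <- c(N - I0, I0, 0)
  for (k in seq_len(steps)) {
    S <- out[k, "S"]; I <- out[k, "I"]; R <- out[k, "R"]
    infection <- beta * I * S / N          # new infections per unit of time
    dS <- mu * (N - S) - infection         # births, minus deaths and infections
    dI <- infection - (gamma + mu) * I     # infections, minus recoveries and deaths
    dR <- gamma * I - mu * R               # recoveries, minus deaths
    out[k + 1, ] <- c(S + dt * dS, I + dt * dI, R + dt * dR)   # Euler step
  }
  cbind(time = (0:steps) * dt, out)
}

sim_sir <- simulate_sir_births()
tail(sim_sir, 1)
```

An explicit Euler step keeps this short; a smaller `dt` (or a Runge-Kutta step) gives a more accurate trajectory at the cost of more iterations.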

#### Modeling pandemics (2)

When introducing the SIR model in our initial post, we obtained a system of ordinary differential equations, but we did not really discuss stability or periodicity. This has to do with the Jacobian matrix of the system. But first of all, we had three equations for three functions, but actually

$$\frac{dS}{dt}+\frac{dI}{dt}+\frac{dR}{dt}=0$$

so our problem here is simply two-dimensional. Hence

$$\begin{aligned}&X=\frac{dS}{dt}=\mu(N-S)-\frac{\beta IS}{N},\\[6pt]&Y=\frac{dI}{dt}=\frac{\beta IS}{N}-(\mu+\gamma)I\end{aligned}$$

and therefore, the Jacobian of the system is

$$\begin{pmatrix}\displaystyle\frac{\partial X}{\partial S}&\displaystyle\frac{\partial X}{\partial I}\\[9pt]\displaystyle\frac{\partial Y}{\partial S}&\displaystyle\frac{\partial Y}{\partial I}\end{pmatrix}=\begin{pmatrix}\displaystyle-\mu-\beta\frac{I}{N}&\displaystyle-\beta\frac{S}{N}\\[9pt]\displaystyle\beta\frac{I}{N}&\displaystyle\beta\frac{S}{N}-(\mu+\gamma)\end{pmatrix}$$

We should evaluate the Jacobian at the equilibrium, i.e. (working with proportions, so that $N=1$, and with $R_0=\beta/(\gamma+\mu)$)

$$S^\star=\frac{\gamma+\mu}{\beta}=\frac{1}{R_0}\quad\text{and}\quad I^\star=\frac{\mu(R_0-1)}{\beta}$$

We should then ...
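A small numerical check of the stability argument, assuming $N=1$ and hypothetical values of $\beta$, $\gamma$ and $\mu$ with $R_0>1$. The code verifies that $(S^\star,I^\star)$ cancels both derivatives, builds the Jacobian above at that point, and looks at its eigenvalues: negative real parts mean local stability, and non-zero imaginary parts mean the trajectory spirals in (damped oscillations).

```r
# Hypothetical parameters, population normalised to N = 1
beta <- 0.5; gamma <- 0.1; mu <- 0.01
R0 <- beta / (gamma + mu)

# Endemic equilibrium from the formulas in the text (requires R0 > 1)
S_star <- 1 / R0
I_star <- mu * (R0 - 1) / beta

# Both derivatives X and Y should vanish at the equilibrium
X_rhs <- mu * (1 - S_star) - beta * I_star * S_star
Y_rhs <- beta * I_star * S_star - (mu + gamma) * I_star

# Jacobian of (X, Y) with respect to (S, I), evaluated at (S_star, I_star)
J <- matrix(c(-mu - beta * I_star, -beta * S_star,
              beta * I_star,       beta * S_star - (mu + gamma)),
            nrow = 2, byrow = TRUE)

ev <- eigen(J)$values
ev
```

At the equilibrium $\beta S^\star=\mu+\gamma$, so the bottom-right entry vanishes; the trace is $-\mu-\beta I^\star<0$ and the determinant is $(\mu+\gamma)\beta I^\star>0$, which is why both eigenvalues have negative real parts whenever $R_0>1$.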

#### Function basis and regression

In the first part of the course on linear models, we’ve seen how to construct a linear model when the vector of covariates $\boldsymbol{x}$ is given, so that $\mathbb{E}(Y|\boldsymbol{X}=\boldsymbol{x})$ is either simply $\boldsymbol{x}^\top\boldsymbol{\beta}$ (for standard linear models) or a function of $\boldsymbol{x}^\top\boldsymbol{\beta}$ (in GLMs). But more generally, we can consider transformations of the covariates, so that a linear model can still be used. In a very general setting, consider

$$\sum_{j=1}^m\beta_j h_j(\boldsymbol{x})$$

with $h_j:\mathbb{R}^p\rightarrow\mathbb{R}$. The standard linear model is obtained when $m=p$ and $h_j(\boldsymbol{x})=x_j$, but of course, much more general models can be obtained, for instance with ...
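To make $\sum_j\beta_j h_j(\boldsymbol{x})$ concrete, here is a sketch with one covariate and a hypothetical basis of $m=5$ functions: a constant, $x$ itself, and three hinge functions $(x-k)_+$ (a linear spline). The knots and the simulated data are assumptions; the excerpt does not say which basis the post uses. The point is that once the design matrix holds $h_j(x_i)$, the fit is ordinary least squares.

```r
set.seed(123)
n <- 200
x <- sort(runif(n, 0, 10))
y <- sin(x) + rnorm(n, sd = 0.3)

# Hypothetical basis: h_1(x) = 1, h_2(x) = x, h_{2+j}(x) = (x - k_j)_+
knots <- c(2.5, 5, 7.5)
H <- cbind(1, x, sapply(knots, function(k) pmax(x - k, 0)))

# Ordinary least squares on the transformed covariates
fit_basis  <- lm.fit(H, y)
# Standard linear model (m = p = 1, plus intercept) for comparison
fit_linear <- lm.fit(cbind(1, x), y)

rss_basis  <- sum(fit_basis$residuals^2)
rss_linear <- sum(fit_linear$residuals^2)
c(linear = rss_linear, basis = rss_basis)
```

Since the linear model is nested in the spline model (same first two basis functions), its residual sum of squares can only be larger or equal.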

#### Testing for a causal effect (with 2 time series)

A few days ago, I came back to a sentence I found in a French newspaper, where someone was claiming that “… an old variable explains 85% of the change in a new variable. So we can talk about causality” and I tried to explain that it was just stupid: if we regress the temperature on day $t+1$ on the number of cyclists on day $t$, the $R^2$ exceeds 80%… but it is hard to claim that the number of cyclists on a specific day will actually cause the temperature on the ...
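The excerpt stops before the test itself, so we do not know which procedure the post uses. A common choice with two series is a Granger-type F-test: does adding lagged values of $x$ to an autoregression of $y$ significantly improve the fit? It measures predictive content, not causation, which is in line with the warning above. The function `granger_p_value`, the lag order `p = 1` and the simulated series are hypothetical; only base R (`lm`, `anova`, `arima.sim`) is used.

```r
# Granger-type F-test (sketch): compare y_t ~ own lags with y_t ~ own lags + lags of x
granger_p_value <- function(x, y, p = 1) {
  n <- length(y)
  idx <- (p + 1):n
  y_now  <- y[idx]
  y_lags <- sapply(seq_len(p), function(j) y[idx - j])
  x_lags <- sapply(seq_len(p), function(j) x[idx - j])
  restricted   <- lm(y_now ~ y_lags)
  unrestricted <- lm(y_now ~ y_lags + x_lags)
  anova(restricted, unrestricted)[2, "Pr(>F)"]   # p-value of the nested F-test
}

# Simulated example where past x drives y, but not the other way around
set.seed(2020)
n <- 500
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
y <- numeric(n)
for (i in 2:n) y[i] <- 0.3 * y[i - 1] + 0.8 * x[i - 1] + rnorm(1)

p_x_to_y <- granger_p_value(x, y)
p_y_to_x <- granger_p_value(y, x)
c(x_to_y = p_x_to_y, y_to_x = p_y_to_x)
```

A tiny p-value for `x_to_y` only says that past $x$ helps predict $y$; a common driver (like the season, for cyclists and temperature) can produce the same pattern.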

#### Quantile Regression (home made, part 2)

A few months ago, I posted a note with some home-made code for quantile regression… there was something odd in the output, but it was because there was a (small) mathematical problem in my equation. Since I have to teach this tomorrow, let me fix it.

##### Median

Consider a sample $\{y_1,\cdots,y_n\}$. To compute the median, solve

$$\min_\mu\left\lbrace\sum_{i=1}^n|y_i-\mu|\right\rbrace$$

which can be solved using linear programming techniques. More precisely, this problem is equivalent to

$$\min_{\mu,\mathbf{a},\mathbf{b}}\left\lbrace\sum_{i=1}^n(a_i+b_i)\right\rbrace$$

with $a_i,b_i\geq 0$ and $y_i-\mu=a_i-b_i$, $\forall i=1,\cdots,n$. Heuristically, the idea is to write $y_i=\mu+\varepsilon_i$, and then define $a_i$'s and $b_i$'s so that $\varepsilon_i=a_i-b_i$ and ...
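A quick numerical check of the two claims above, in base R only: minimizing $\sum_i|y_i-\mu|$ (here with the one-dimensional optimizer `optimize()`, not with a linear program) recovers the sample median, and at the solution the choice $a_i=(y_i-\mu)_+$, $b_i=(\mu-y_i)_+$ satisfies the constraints with $\sum_i(a_i+b_i)$ equal to the original objective. The simulated sample is hypothetical; an odd sample size keeps the median unique.

```r
set.seed(123)
y <- rexp(51)   # odd sample size, so the minimizer is unique

# Objective from the text: sum of absolute deviations
sad <- function(mu) sum(abs(y - mu))
mu_hat <- optimize(sad, interval = range(y))$minimum

# LP variables at the solution: positive and negative parts of y_i - mu
a <- pmax(y - mu_hat, 0)
b <- pmax(mu_hat - y, 0)

c(optimize = mu_hat, median = median(y))
```

`optimize()` only returns the minimizer up to its default tolerance, so the agreement with `median(y)` is approximate.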

#### Lasso Regression (home made)

To compute the Lasso regression, i.e. to minimize

$$\frac{1}{2}\|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|_{\ell_2}^2+\lambda\|\boldsymbol{\beta}\|_{\ell_1}$$

define the soft-thresholding function

$$S(z,\gamma)=\text{sign}(z)\cdot(|z|-\gamma)_+=\begin{cases}z-\gamma&\text{if }z>\gamma\\ z+\gamma&\text{if }z<-\gamma\\ 0&\text{if }|z|\leq\gamma\end{cases}$$

...
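A sketch of how the soft-thresholding function is typically used: cyclic coordinate descent, where each coordinate update solves the one-dimensional Lasso problem exactly, $\beta_j\leftarrow S(\mathbf{x}_j^\top\mathbf{r}_j,\lambda)/\|\mathbf{x}_j\|^2$ with $\mathbf{r}_j$ the partial residual. This is an assumption about the approach, not necessarily the post's code; names, `n_iter` and `tol` are hypothetical. With an orthonormal design the Lasso solution is known in closed form, $S(\mathbf{X}^\top\mathbf{y},\lambda)$, which gives a check.

```r
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Coordinate descent for (1/2)||y - X beta||^2 + lambda ||beta||_1
lasso_cd <- function(X, y, lambda, n_iter = 1000, tol = 1e-10) {
  p <- ncol(X)
  beta <- rep(0, p)
  col_sq <- colSums(X^2)
  for (it in seq_len(n_iter)) {
    beta_old <- beta
    for (j in seq_len(p)) {
      r_j <- y - X[, -j, drop = FALSE] %*% beta[-j]            # partial residual
      beta[j] <- soft_threshold(sum(X[, j] * r_j), lambda) / col_sq[j]
    }
    if (max(abs(beta - beta_old)) < tol) break                   # converged
  }
  beta
}

# Check on an orthonormal design (X'X = I), simulated data
set.seed(1)
n <- 100; p <- 4
X_orth <- qr.Q(qr(matrix(rnorm(n * p), n, p)))
y <- drop(X_orth %*% c(3, -2, 0.2, 0)) + rnorm(n, sd = 0.1)
lambda <- 0.5

beta_cd     <- lasso_cd(X_orth, y, lambda)
beta_closed <- soft_threshold(drop(crossprod(X_orth, y)), lambda)
rbind(beta_cd, beta_closed)
```

Small coefficients (here the third and fourth) are set exactly to zero, which is the variable-selection effect of the $\ell_1$ penalty.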

#### On Cochran Theorem (and Orthogonal Projections)

Cochran's theorem, from The distribution of quadratic forms in a normal system, with applications to the analysis of covariance (published in 1934), is probably the most important one in a regression course. It is an application of a nice result on quadratic forms of Gaussian vectors. More precisely, we can prove that if $\boldsymbol{Y}\sim\mathcal{N}(\boldsymbol{0},\mathbb{I}_d)$ is a random vector with $d$ independent $\mathcal{N}(0,1)$ components, then (i) if $A$ is a symmetric idempotent matrix, $\boldsymbol{Y}^\top A\boldsymbol{Y}\sim\chi^2_r$ where $r$ is the rank of matrix $A$, and (ii) conversely, if $\boldsymbol{Y}^\top A\boldsymbol{Y}\sim\chi^2_r$ then $A$ is an idempotent matrix ...
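A Monte Carlo illustration of part (i), under hypothetical choices $d=5$ and $r=2$: the orthogonal projection $A=X(X^\top X)^{-1}X^\top$ onto the column space of a random $d\times r$ matrix is symmetric, idempotent and of rank $r$, so $\boldsymbol{Y}^\top A\boldsymbol{Y}$ should behave like a $\chi^2_r$ variable, with mean $r$ and variance $2r$.

```r
set.seed(42)
d <- 5; r <- 2

# Orthogonal projection onto the column space of a random d x r matrix
X <- matrix(rnorm(d * r), nrow = d, ncol = r)
A <- X %*% solve(crossprod(X)) %*% t(X)

# Monte Carlo draws of the quadratic form Y'AY with Y ~ N(0, I_d)
Q <- replicate(10000, {
  Y <- rnorm(d)
  sum(Y * (A %*% Y))
})

c(mean = mean(Q), var = var(Q))   # chi^2_r has mean r and variance 2r
```

A histogram of `Q` against `dchisq(q, df = r)` is an easy visual complement; this is exactly the projection structure behind the residual and fitted sums of squares in a linear regression.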
