ISQS 5349, Spring 2016

Course Syllabus
Old midterms and finals
Class recordings


Supplemental books: 
Practical Regression and Anova using R, by Julian Faraway.
Probabilistic Modeling in Computer Science, by Norm S. Matloff of UC Davis, a free book licensed under creative commons.

Helpful R materials:
Winston Chang’s Cookbook for R, a free book licensed under creative commons.
From UCLA’s Statistical Consulting Group: http://www.ats.ucla.edu/stat/r/seminars/intro.htm
A start for this class showing how to access data and do basic things
An overview of R for statistics – everything you want, in a nutshell

A list of useful R functions

http://www.cyclismo.org/tutorial/R/
http://www.rstudio.com/ide/docs/
http://ww2.coastal.edu/kingw/statistics/R-tutorials/
http://www.stat.auckland.ac.nz/~ihaka/120/Notes/ch03.pdf (graphics, from a founder of R, Ross Ihaka)

                       

Class Topics (Rough schedule – this will change depending on student presentations. Dr. Westfall will update the class regularly on the schedule changes)

Preparation – Read and study everything in this column. There will be a quiz at the beginning of class on the day listed.  Refer back to these documents repeatedly. Links within the links are recommended and may aid understanding but are not required.

R code, homework, etc.

1. 1/21 Smoothing: scatterplots, LOESS smoothers, the classical regression model and its assumptions.

Nature favors continuity over discontinuity: Review Example 7.5 of Chapter 7 of Understanding Advanced Statistical Methods, by Westfall and Henning, to refresh your memory as to what a conditional distribution is, and as to why we do not use the word “population” in statistical modeling.

 

Read this discussion of the “Model produces data” concept as it relates to the regression model, by Dr. Westfall. This provides further explanation of conditional distributions and of the flawed “population” terminology in statistical models.

 

Also read this discussion of the “Model produces data” concept as it relates to the regression model, by Dr. Westfall. Hopefully, the points made in ISQS 5347 will be abundantly refreshed by now!

 

Read this introduction to LOESS:  LOESS curves are estimates of g(x), the “center” of the distribution of Y for each X=x. These curves are used to diagnose functional form of the regression model.  In particular, they are excellent tools to diagnose the adequacy of the assumption that g(x) is a linear function.
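For concreteness, here is a minimal R sketch (simulated data with a known curved mean function, not one of the class data sets) showing how a LOESS smooth estimates g(x) and how it can reveal that a straight-line fit is inadequate.

# Simulate data from a known curved mean function, then estimate g(x) with LOESS
set.seed(123)
x = runif(100, 0, 10)
y = 5 + 2*sqrt(x) + rnorm(100, sd = 1)           # true g(x) = 5 + 2*sqrt(x)
plot(x, y)
fit.loess = loess(y ~ x)                         # default span = 0.75
xg = seq(min(x), max(x), length = 200)
lines(xg, predict(fit.loess, newdata = data.frame(x = xg)), col = "blue", lwd = 2)
abline(lm(y ~ x), col = "red", lty = 2)          # straight-line fit, for comparison
curve(5 + 2*sqrt(x), add = TRUE, col = "gray")   # the true mean function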

 

HW 1, due Thursday 1/22

 

Initial regression analyses using R; introduction to some data sets we’ll be using.

 

Models produce data.  How different do the data look when they are produced by the same model?  How different do the data look when they are produced by different models?

 

Estimating curvature using LOESS smoothing: Toluca, Peak Energy, Car Sales, and Product complexity examples.

 

Understanding how to interpret LOESS smooths by simulating data where the true mean function is known.

 

 

 

 

 

Why is probability needed in the regression model?

2. 1/26  Maximum likelihood and least squares. The Gauss-Markov theorem.

 

Today’s quiz covers two days’ worth of readings, those for 1/21 (in the box above) and for 1/26 (in this box), and it counts double.

 

Read this summary of the assumptions of the regression model, by Dr. Westfall.

 

Review this material from ISQS 5347.

 

Read the Wikipedia entry (all of it) about the R² statistic (also called the coefficient of determination).

 

Read the Wikipedia entry about the Gauss-Markov theorem, up to the words “The theorem now states that the OLS estimator is a BLUE.” Focus your attention on what the theorem is saying. In particular, focus on the idea of what a “linear estimator” is, and on how OLS is best among all linear unbiased estimators.

 

Why do assumptions matter?

Illustrating the Gauss-Markov property, both good and bad.
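A minimal simulation sketch of the Gauss-Markov idea (simulated data, not the class file): the OLS slope and another linear unbiased estimator both average out to the true slope, but OLS has the smaller variance.

# Compare OLS to the "grouping" estimator that connects the means of the low-x and high-x halves;
# both are linear in Y and unbiased under the classical model, but OLS is BLUE.
set.seed(1)
x = runif(50, 0, 10); beta0 = 2; beta1 = 0.5
hi = x > median(x)
nsim = 5000
b.ols = b.grp = numeric(nsim)
for (i in 1:nsim) {
  y = beta0 + beta1*x + rnorm(50, sd = 2)        # the model produces the data
  b.ols[i] = coef(lm(y ~ x))[2]
  b.grp[i] = (mean(y[hi]) - mean(y[!hi])) / (mean(x[hi]) - mean(x[!hi]))
}
c(mean(b.ols), mean(b.grp))    # both near 0.5: unbiased
c(var(b.ols), var(b.grp))      # OLS variance is smaller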

 

 

 

 

Why is probability needed in the regression model?

3. 1/28 Exact Inferences in the classical parametric regression model: p-values, confidence and prediction intervals.

Read this discussion of a confidence interval for the slope, from Doug Stirling, Massey University in Palmerston North, New Zealand.

 

Read this document on interpreting p-values, by Dr. Westfall.

 

Read this document on why you should never say “accept H0,” written by Dr. Westfall.

 

Read this discussion of confidence intervals for E(Y|X=x) versus prediction intervals for Y|X=x, from “Musings on Using and Misusing Statistics,” by Martha K. Smith, retired UT professor.

 

Read this document on “Prediction and Generalization,” written by Dr. Westfall.

 

Read this document on “Confidence intervals and significance tests as predictions,” written by Dr. Westfall.

Confidence intervals for slope and intercepts.

 

Understanding the standard error: a simulation study.

 

Constructing frequentist confidence and prediction intervals; GPA/SAT example and Toluca example; Constructing the corresponding Bayesian intervals using the same examples.
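For reference, a minimal R sketch (simulated data, not the GPA/SAT or Toluca data) of the frequentist intervals: the confidence interval for E(Y|X=x0) versus the much wider prediction interval for a new Y at X=x0.

set.seed(7)
d = data.frame(x = runif(40, 0, 10))
d$y = 10 + 3*d$x + rnorm(40, sd = 4)
fit = lm(y ~ x, data = d)
x0 = data.frame(x = 5)
predict(fit, newdata = x0, interval = "confidence")   # narrower: for the conditional mean E(Y|X=5)
predict(fit, newdata = x0, interval = "prediction")   # wider: for an individual new Y at X=5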

Bayesian analysis using a transformation to solve an obvious problem with the nonnormality of the GPA distribution.

 

 


Why is probability needed in the regression model?

4. 2/2 Checking the assumptions of the classical model

Read the document, “How to check assumptions using data,” written by Dr. Westfall, and run the R code therein.

 

Read and run the code in the document, “Why do assumptions matter?” written by Dr. Westfall

 

Read the document, “Comments on Transformations,” written by Dr. Westfall.

HW2 due 2/3

 

Testing for curvature using quadratic regression: Toluca, Peak Energy, Car Sales, and Product Complexity examples

Statistical versus practical significance: A demonstration of the difference.

Estimating the relationship between mean absolute residual and predictor variable using LOESS smoothing, as well as testing for existence of heteroscedasticity: Toluca and Peak Energy examples.

Understanding how to interpret LOESS smooths of absolute residuals by simulating data where the true variance function is known.

Evaluating the normality assumption using q-q plots and Shapiro-Wilk hypothesis test: Toluca, Peak Energy, and Product Complexity examples.

Understanding how to interpret q-q plots by simulating data where the true error distribution is known.
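A minimal R sketch along these lines (simulated errors, not the class examples): compare the q-q plot and Shapiro-Wilk test for residuals when the errors really are normal versus when they are skewed.

set.seed(42)
x = runif(100, 0, 10)
y.norm = 1 + 2*x + rnorm(100)           # normally distributed errors
y.skew = 1 + 2*x + (rexp(100) - 1)      # skewed errors with mean zero
r1 = resid(lm(y.norm ~ x)); r2 = resid(lm(y.skew ~ x))
par(mfrow = c(1, 2))
qqnorm(r1, main = "Normal errors"); qqline(r1)
qqnorm(r2, main = "Skewed errors"); qqline(r2)
shapiro.test(r1)    # large p-value expected
shapiro.test(r2)    # small p-value expected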

 

Why is probability needed in the regression model?

5. 2/4  Using transformations to achieve a more reasonable model. The multiple regression model.

Read slides 1-9 and 11-14 of these presentation slides, by William G. Jacoby, Department of Political Science, Michigan State.

 

Read this introduction to multiple regression analysis

 

HW 3, due 2/17

 

Lance Car Sales example: Analysis of the model using the x⁻¹ (reciprocal) transformation.

Peak Energy Use example: Analysis of model using ln(y) transformation.

Box-Cox transformations
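A minimal Box-Cox sketch in R using MASS::boxcox (simulated data, not the class examples): the profile log-likelihood suggests which power transformation of Y is most consistent with the data.

library(MASS)
set.seed(11)
x = runif(60, 1, 10)
y = exp(0.2 + 0.3*x + rnorm(60, sd = 0.3))   # data generated on the log scale
bc = boxcox(lm(y ~ x))                       # plots the profile log-likelihood over lambda
bc$x[which.max(bc$y)]                        # ML choice of lambda; a value near 0 suggests ln(y)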


Why is probability needed in the regression model?

6. 2/9 The multiple regression model, added variable plots

 

Note on presentations: Good example PowerPoint student presentations can be found on last Fall’s ISQS 6348 page (multivariate analysis), in the left column. DO NOT use last year’s ISQS 5349 presentations as examples – they were trying to present as if at a conference, rather than teaching the material (which was my fault because I did not give adequate guidance.)

Read about added variable plots (also called partial regression plots)
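A minimal sketch of the construction (simulated data; variable names are hypothetical): the added variable plot for x1 graphs the residuals of y regressed on the other predictors against the residuals of x1 regressed on the other predictors, and its slope equals the multiple regression coefficient of x1.

set.seed(20)
x1 = rnorm(100); x2 = 0.6*x1 + rnorm(100)
y = 2 + 1.5*x1 + 1.0*x2 + rnorm(100)
e.y  = resid(lm(y ~ x2))           # part of y not explained by x2
e.x1 = resid(lm(x1 ~ x2))          # part of x1 not explained by x2
plot(e.x1, e.y); abline(lm(e.y ~ e.x1))
coef(lm(e.y ~ e.x1))[2]            # equals the x1 coefficient below
coef(lm(y ~ x1 + x2))["x1"]
# The car package draws these directly: library(car); avPlots(lm(y ~ x1 + x2))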

 

Read Sections 1-7 of this matrix algebra preparation material, courtesy A. Colin Cameron, UC Davis (whoo hoo! My alma mater).

 

(Not required, but if you find yourself needing additional self-study on matrix algebra, see the SOS math tutorials, matrix0, matrix1, matrix2, and Introduction to matrix algebra using R.  See also Matrix and linear algebra tutorials; See also the MIT Open courseware (a free online course with separate lectures for separate linear and matrix algebra topics).)

Multiple regression analysis of how computer time to run a job relates to RAM and processor speed.

R code for Sales vs. Int rate and Gas Price example: How curvature can be explained by a third variable.

Visualizing the multiple regression model: 3-D and partial plots using Excel.

Simulation study showing that simple (Xj,Y) scatterplots and other diagnostics are not completely adequate to judge the adequacy of the multiple regression model.

Why is probability needed in the regression model?

7. 2/11  Matrix form of model, estimates, standard errors, t intervals and tests.

Read these presentation slides by Carlos Carvalho, UT Austin. Notes: (i) Slide 7: it is more common that the first subscript denote the row and the second the column; he has it reversed. (ii) Slide 9: the term “standard error” here is different from the “standard error” of the beta estimate. It refers to the estimated conditional standard deviation, and is called “residual standard error” in R. (iii) Slide 11: the “Y-hat” inside the summation expression needs an “i” subscript, as shown correctly on the next line down. (iv) Slide 15: the last distribution should be the conditional distribution given the X data, because the Sb covariance matrix is a function of the specific observed X data ((X’X)⁻¹ in particular). (The unconditional distribution would involve the average of the (X’X)⁻¹ matrix over the distribution of X.)

 

Read this document on the matrix form of the regression model, from François Nielsen of UNC-Chapel Hill, sections 1, 2, 3, 4, 5.1 – 5.4, 5.6.1, 5.6.2.  Notes: (i) A slight quibble: He states in 5.6.1 that the standard error is the standard deviation of the sampling distribution of the estimator.  This is not quite true, because the true standard deviation involves the unknown conditional standard deviation of Y|X=x, which we call σ. The standard error is computed by plugging an estimate of this σ into the formula for the standard deviation of the sampling distribution of the estimator. In addition, as in the first document, these distributions should be labeled as conditional upon the X data. (ii) On page 7, bottom of the page, it should read “If this relation only holds when all the λ’s are zero, then the columns are linearly independent.”

 

Read the document “Prediction as association and prediction as causation,” written by Dr. Westfall. The document shows that you cannot infer causation using the regression model.

 

The multivariate normal distribution (from Wikipedia)

Information on covariance matrices, from Wikipedia

The various matrices and regression calculations shown in R code.

You can also see many of the matrices using the "xpx" and "i" options in the "MODEL" statement of PROC REG.

Why is probability needed in the regression model?

8. 2/16  Causality

Read Linear Statistical Models for Causation: A Critical Review, by the late David Freedman.  The document shows that you can infer causation from the “response schedule” regression model, also called the Neyman-Rubin causal model. But the assumptions cannot be verified.

 

Note: You can skip the material from p. 4, starting “If we are trying to find laws of nature that are stable under intervention, standardizing may be a bad idea,” through the end of the section on “Hooke’s Law” on p. 5.  It is interesting material, but somewhat distracts from the main points about causality.

 

Also, be careful about taking words out of context.  When Freedman writes “Thus, DOut has a causal influence on DPaup, …” it should be clear from the thrust of his article and the context of the sentence that he means something like “Thus, the naïve social scientist may wish to claim that DOut has a causal influence on DPaup, …” 

 

The computer speed example will be good for discussing causality.

Also, the Yule data is here.

 

Why is probability needed in the regression model?

9. 2/18 Midterm 1

Why is probability needed in the regression model?

11. 2/23 Multicollinearity; the ANOVA table, the F test, and the R-squared statistic

Read slides 1-6 of these presentation slides, by Alicia Carriquiry of Iowa State.


Notes: (i) There is a mistake on Slide 4. It should read “…where R²j is the coefficient of determination obtained by regressing the jth predictor on all the other predictors.” (In particular, the VIF has nothing to do with the Y data.)
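To make that corrected statement concrete, here is a small R sketch (simulated collinear predictors, not the class data) computing a VIF directly from the definition; note that the Y data never enter the calculation.

set.seed(3)
x1 = rnorm(100)
x2 = x1 + rnorm(100, sd = 0.3)                 # nearly collinear with x1
x3 = rnorm(100)
r2.1 = summary(lm(x1 ~ x2 + x3))$r.squared     # R²_1: regress predictor 1 on the other predictors
1 / (1 - r2.1)                                 # VIF for x1; a large value signals multicollinearity
# The same numbers come from car::vif applied to a fitted regression with any response y.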

 

Read this document on multicollinearity, by Dr. Westfall

 

File to illustrate problems with multicollinearity

 R code for diagnosis and interpretation of multicollinear variables; also indicates one of many potential solutions to the problem.

Full model - reduced model F test.

Why is probability needed in the regression model?

11. 2/25 Interactions; the inclusion principle

Read this document about the inclusion principle, compiled by Timothy Hansen, U. South Carolina.

 

Read this document about interactions in regression by Kristopher J. Preacher of Vanderbilt University.
Notes:
(i) The phrase “(if Z is dichotomous, these values correspond to the only two possible values of Z)” is nearly always wrong.  Ignore it.
(ii) The phrase “Centering reduces multicollinearity among predictor variables” is false in general.  Centering usually has no effect whatsoever on multicollinearity. The only time that centering reduces multicollinearity is when the model contains terms like x and x² (a quadratic model), or terms like x1, x2, and x1x2 (an interaction model).
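A quick R demonstration of that point (simulated data): centering removes the correlation between x and x², but centering two distinct predictors leaves their correlation unchanged.

set.seed(5)
x = runif(200, 5, 15)            # all-positive x, as in typical data
cor(x, x^2)                      # very high
xc = x - mean(x)
cor(xc, xc^2)                    # much closer to zero after centering
z = 2*x + rnorm(200)             # a second, distinct predictor
cor(x, z); cor(x - mean(x), z - mean(z))    # identical: centering changes nothing here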
 

HW 3 notes.

3-d graphs of interaction and non-interaction surfaces

Examining interactions – an R demo

Moderator example, from Karl Wuensch's web page http://core.ecu.edu/psyc/wuenschk/. The publication is here.

A "Hand-drawn" graph using Excel of the moderating effect.

File to illustrate problems with violating the inclusion principle

 

Why is probability needed in the regression model?

12. 3/1 Dummy variables, ANOVA, ANCOVA, ANCOVA with interactions, graphical summaries

 

 

Read this document through slide 24. This is another one by Carlos Carvalho, UT Austin.

Note: On Slide 10, top equation, he is missing the “i” subscript on the “Exp” variable.
In general, you should either subscript all the Y, X, and epsilon variables with “i” (which indicates that you are referring to the model for row “i” in your data frame) or you should subscript none of the variables (which means you are modelling your process more generically.)  

ANOVA/ANCOVA, first file – comparing GPAs of Male and Female students (two-level ANOVA/ANCOVA).

ANOVA/ANCOVA, second file – comparing GPAs of students in different degree plans.
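For reference, a minimal ANCOVA sketch in R (simulated data, not the GPA files): R builds the dummy variable automatically when the grouping variable is a factor, and the full-versus-reduced F test checks whether the slopes differ.

set.seed(9)
grp = factor(rep(c("A", "B"), each = 40))
x = runif(80, 0, 10)
y = 5 + 2*x + 3*(grp == "B") + rnorm(80, sd = 2)
d = data.frame(y, x, grp)
fit.main = lm(y ~ x + grp, data = d)     # parallel lines: common slope, different intercepts
fit.int  = lm(y ~ x*grp, data = d)       # separate slopes and intercepts
anova(fit.main, fit.int)                 # F test for whether the slopes differ
summary(fit.main)$coefficients           # "grpB" is the estimated intercept shift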

Why is probability needed in the regression model?

13. 3/3 Variable and model selection

Read about the variance/bias trade-off here.

Comments: (1) Right below the first graph: characterized → characterize.   (2) Two sentences later, g(x) is called an “estimator” of f(x), and this persists through the article.  This is not the usual usage of the term “estimator,” because an estimator is a function of random data.  If you put a “^” on top of g(x) you could call it an estimator. Better to call it a “candidate mean function” or “supposed model” or something like that. Later on, though, he refers to g(x) as a function of data, and this is indeed an estimator.

 


Read summary comments on variable selection, data snooping, and a strategy for variable selection, by Dr. Westfall



The Law of Total Variance

A simulation example: Predicting Hans’ graduate GPA from GRE scores and Undergrad GPA.

R code to demonstrate the danger of overfitting.

An R simulation to illustrate that including extraneous variables does not cause bias, but does inflate the variance.

An R simulation file to illustrate the variance/bias tradeoff, and show why you might prefer biased estimates in terms of estimating the mean value.

An R simulation to illustrate the variance/bias trade-off in terms of parameter estimation.  Fitting the wrong (reduced) model results in biased parameter estimates, but they are sometimes more accurate than the unbiased estimates obtained from fitting the correct model.

R file for producing and comparing n-fold cross-validation statistics for different models
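A minimal hand-rolled sketch of leave-one-out (“n-fold”) cross-validation (simulated data, not the class file): each observation is predicted from a model fit to the other n-1 observations, and the resulting prediction error sums are compared across candidate models.

set.seed(2)
n = 60
d = data.frame(x = runif(n, 0, 10))
d$y = 4 + 1.5*d$x + rnorm(n, sd = 3)               # the true model is linear
press = function(form) {
  e = numeric(n)
  for (i in 1:n) {                                  # leave observation i out, then predict it
    fit = lm(form, data = d[-i, ])
    e[i] = d$y[i] - predict(fit, newdata = d[i, ])
  }
  sum(e^2)                                          # PRESS statistic: smaller is better
}
press(y ~ x)
press(y ~ x + I(x^2))                               # the extra term usually does not help here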

All subsets model selection for predicting doctors per capita using R.

 

Why is probability needed in the regression model?

14. 3/8 Variable and model selection.


Read this document.  Realize that there are R-counterparts to all the SAS-specific procedures. So you are responsible for understanding all content that is software-independent, but you are not responsible for knowing which SAS procedure does what.
Comments:  This is not the best paper in the universe, but it does make some good points that you should know. Here are some issues; all else other than what I have written below is ok in the paper, as far as things that I want you to know.
1. P. 1, point 7, “collinearity problems are exacerbated” is confusing.  With fewer X variables there are obviously fewer correlations among the X variables. I have an idea what they mean here, but just ignore the comment.

2. P. 2. The supposed problem that “models are too complex” when you perform stepwise methods is strange. How can a model with fewer X variables be more complex than a model with more X variables? I would simply ignore that point.
3. P. 2. They are sloppy in their statement of the linearity assumption.  It should state “linearity of the relationship between the IVs and the conditional means of the DV.” But you all caught that, right?

4. P. 2, “White noise” means “iid normal.”
5. P. 3, top of page, “Usually, when one does a regression, at least one of the independent variables is really related to the dependent variable, but there are others that are not related.” → “Usually, when one does a regression, at least one of the independent variables is really related to the dependent variable, but there are others that have little relationship.”

6. P. 4. The comment about partial least squares really only applies to cases where you have more than one Y variable; this is covered in ISQS 6348. Also, their X’Y’ formula should be X’Y, and to be consistent with their model on page 1, they should either have made this Y lower case or the one on page 1 upper case. My suggestion would be the latter.
7. P. 4. Their formula for beta-hat(lasso) is wrong.  What they are minimizing is just the ordinary sum of squared errors used in least squares, and they have completely botched the formula. The right parenthesis should be between the “2” superscript at the end, and “j” subscript on the last beta. Also the “0” in the middle should be in the subscript of that beta (after you have moved the parentheses as I just indicated) so that you have “beta_0”, i.e., the intercept. They have the constraint (the “subject to”) part correct. BTW, “argmin” means “arguments (or values of the parameters) that minimize.”

8. In their simulation analyses they seem to be stating that lars and lasso are good for protecting against the effects of outliers.  That is not what they are designed to do, although the fact that they "shrink" estimates towards zero means that they will not give really crazy results with large coefficients just because of an outlier. Still, they are not methods that will protect you against all kinds of problems caused by outliers.

The Law of Total Variance

R code to demonstrate the danger of overfitting.

An R simulation to illustrate that including extraneous variables does not cause bias, but does inflate the variance.

An R simulation file to illustrate the variance/bias tradeoff, and show why you might prefer biased estimates in terms of estimating the mean value.

An R simulation to illustrate the variance/bias trade-off in terms of parameter estimation.  Fitting the wrong (reduced) model results in biased parameter estimates, but they are sometimes more accurate than the unbiased estimates obtained from fitting the correct model.

R file for producing and comparing n-fold cross-validation statistics for different models

All subsets model selection for predicting doctors per capita using R

Why is probability needed in the regression model?

15. 3/10 Heteroscedasticity: WLS, ML estimation, robust standard errors

Update: Modified readings. I really like Cosma Shalizi of Carnegie-Mellon.  Expect to see more from him. Read pages 1-9 from this document. 

 

Comments:
1. A variable with an arrow on top refers to a vector (list) of values. Not sure why he wants to use it for some vectors and not others.  For example, on p. 1, “beta” is also a vector but there is no arrow on top.

2. The assumed functional form of the heteroscedasticity, 1 + x^2/2, is somewhat unusual in that it makes the variance decrease then increase as x ranges from its min to its max. This might make sense in some cases where the X variable has zero in its range, as it does in Cosma’s simulation, but for typical cases where the X and Y data are positive, the heteroscedasticity function is either monotonically increasing or monotonically decreasing, with no “down then up” behavior.

3. “The oracle of regression” is what I sometimes refer to as “Hans.”

 

 

Optional: Read this critique of robust standard errors. The prof summarizes:  If you think you have a problem with heteroscedasticity, you probably have more serious problems and should not be using OLS anyway. Using robust standard errors with OLS is like putting a bright red bow on the head of an ugly pig lying in the stinking pig slop, and then saying that the pig now looks pretty.

HW 4 (last) due April 9.

 

First file to illustrate benefit of Weighted Least Squares – shows imprecise predictions of OLS in the presence of heteroscedasticity

Comparison of prediction limits: Homoscedastic vs. Heteroscedastic models – shows OLS prediction limits are incorrect in the presence of heteroscedasticity

Estimating the heteroscedastic variance of GE returns as a function of trading volume via maximum likelihood using R. (Try different variance functions and compare log likelihoods to see which fit better.)

Obtaining heteroscedasticity-consistent standard errors using R
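A minimal sketch of robust standard errors in R (simulated heteroscedastic data, not the class example), assuming the sandwich and lmtest packages are installed:

library(sandwich); library(lmtest)
set.seed(8)
x = runif(100, 1, 10)
y = 2 + 3*x + rnorm(100, sd = x)                   # error standard deviation grows with x
fit = lm(y ~ x)
coeftest(fit)                                       # usual OLS standard errors
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))     # heteroscedasticity-consistent standard errors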

Why is probability needed in the regression model?

16. 3/22 Bootstrap

 

 

Read this most excellent document, written by Cosma Shalizi of Carnegie-Mellon. This document provides an excellent summary of the main points about probability models that I have emphasized in this course and in the ISQS 5347 course.

Helpful tips:
1. When you see the word “functional” you can usually substitute the term "beta" or “sigma” in your mind.
2. “Asymptotics” is shorthand for “large sample theory”. The Law of Large Numbers and the Central Limit Theorem are examples of “asymptotics.”
3. The “Gaussian distribution” is the same as the normal distribution.
4. Equation (13) is the bootstrap confidence interval that I gave you in Section 19.5 of my book (used in ISQS 5347).
5.  The term "size of a test” means “True Type I error rate.”  When you use the p<.05 rule, you are aiming for .05 as the size of your test, but due to failures of assumptions, the true size will be different from .05.
6.  The “Oracle” is “Hans.”
7. The “quantile method” for simulating random numbers is given in my ISQS 5347 book in Section 3.4.
8. Section 3: Simulation error goes away with more simulations. That’s why we want NSIM = infinity, although it takes too long.
9. The “smoothed bootstrap” uses the same trick as “jitter.”
10. In general, the R code relies on functions defined in the author’s earlier chapters, so it will not always run. (You might try to find the functions on the author’s webpage if interested.)
11.  The comment about dependent observations at the end of Section 5 is important: Bootstrapping involves simulation, which by default produces independent observations (because that’s how random number generators work). You have to work a little harder to simulate non-iid data (as in the case of time series, repeated measures, panel data, longitudinal data, multilevel data, clustered data, …)

Some bootstrap examples
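A minimal case-resampling bootstrap sketch for the regression slope (simulated data; a hand-rolled version, not necessarily the same as the class examples):

set.seed(4)
d = data.frame(x = runif(50, 0, 10))
d$y = 1 + 0.8*d$x + rnorm(50, sd = 2)
B = 2000
b.star = numeric(B)
for (b in 1:B) {
  idx = sample(1:50, replace = TRUE)        # resample rows (cases) with replacement
  b.star[b] = coef(lm(y ~ x, data = d[idx, ]))[2]
}
sd(b.star)                                  # bootstrap standard error of the slope
quantile(b.star, c(0.025, 0.975))           # percentile bootstrap confidence interval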

 

Why is probability needed in the regression model?

17. 3/24  Quantile regression and Winsorizing

Read quantile regression

 

 

Data on weekly salaries from the BLS, from 2002 to 2014. Note that the 0.10 and 0.90 quantiles have different slopes.

EXCEL spreadsheet to explain the quantile estimation method

EXCEL spreadsheet to explain the quantile estimation method in the regression case – The CAPM regression model

The CAPM model via quantile regression using PROC QUANTREG
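For an R counterpart to PROC QUANTREG, the quantreg package fits the same kinds of models; here is a minimal sketch with simulated heteroscedastic data (not the BLS or CAPM data), where the 0.10 and 0.90 quantile lines have different slopes:

library(quantreg)
set.seed(16)
x = runif(200, 0, 10)
y = 10 + 2*x + rnorm(200, sd = 1 + 0.5*x)     # spread grows with x
fit.q = rq(y ~ x, tau = c(0.10, 0.50, 0.90))  # 10th, 50th, and 90th percentile regressions
coef(fit.q)                                   # slopes differ across quantiles
plot(x, y)
for (j in 1:3) abline(coef = coef(fit.q)[, j], lty = j)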

Why is probability needed in the regression model?

18. 3/29  Generalized Least Squares, Correlated errors, Time series

Read this document from John Fox

 

 

John Fox code; also some code to show how the AR and MA data-generating processes work.

Simulation studies of inefficiency and Type I error rates when using OLS and ML/GLS:  An example with auto-correlated data.
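A minimal GLS sketch with AR(1) errors using nlme::gls (simulated autocorrelated data, not the John Fox example):

library(nlme)
set.seed(6)
n = 100
t = 1:n
e = as.numeric(arima.sim(list(ar = 0.7), n = n))   # AR(1) errors
d = data.frame(y = 3 + 0.05*t + e, t = t)
fit.ols = gls(y ~ t, data = d)                                   # ignores the autocorrelation
fit.ar1 = gls(y ~ t, data = d, correlation = corAR1(form = ~ t))
summary(fit.ar1)$tTable                                          # standard errors account for AR(1)
anova(fit.ols, fit.ar1)                                          # AIC/likelihood comparison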

Why is probability needed in the regression model?

19. 3/31 Intro to Mixed Effects Models.

Read this tutorial on lmer by Bodo Winter.

 

Some code from Bodo Winter’s tutorial

A data analysis indicating the problem with correlated observations: Standard errors are clearly wrong.
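A minimal random-intercept sketch with lme4::lmer (simulated clustered data, not the data from the tutorial): the mixed model gives each cluster its own intercept, while the ordinary lm that ignores the clustering reports standard errors that are too small.

library(lme4)
set.seed(10)
g = factor(rep(1:20, each = 5))                 # 20 clusters of 5 observations each
u = rnorm(20, sd = 2)[as.integer(g)]            # cluster-level random effect
x = runif(100, 0, 10)
d = data.frame(y = 1 + 0.5*x + u + rnorm(100, sd = 1), x = x, g = g)
fit.lmer = lmer(y ~ x + (1 | g), data = d)      # random intercept for each cluster
summary(fit.lmer)                               # variance components and fixed effects
fit.lm = lm(y ~ x, data = d)                    # ignores clustering; compare the standard errors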

 

Why is probability needed in the regression model?

20. 4/5  Repeated measures and random effects,  Bayesian-style “shrinkage” estimates of random effects.

Read this article by Simon Jackman explaining why random effects are preferable to fixed effects, and also explaining the Bayesian formula for “shrinkage estimates”.

 

Read also from this article by Simon Jackman, Proposition 7.1 on p. 206-207, for a more direct representation of the Bayesian shrinkage estimate.

Simulation code to help understand the shrinkage estimates shown in Jackman’s papers.

Ranking of teaching in various majors at TTU using Bayesian-style random effects (“shrinkage”) estimates, with comparison to simple OLS fixed-effects estimates.

Why is probability needed in the regression model?

21. 4/7 Multilevel analysis

Required: Read the Preface and Sections 1, 2, 3 and 7 from this document.  Here is the R code from Section 7. There are a couple mistakes in their syntax, but I was able to replicate their results precisely once the code was cleaned up.


Notes:
1. Model (2) requires an additional subscript “k” on both “y” and “e”.
2. The last index of the mu terms in H0 on page 5 should be J not j. Also, the alternative should include “for at least one pair of means”
3. They use the word “population” a lot. Make the appropriate substitution of “data generating process.”
4. Note that their model (5) on page 8 is exactly the same model that I used in the previous class to estimate the shrinkage estimates for MAJOR.
5. What they call “full maximum likelihood” (FML) is just ordinary ML as you have learned it.  In particular the “full” in FML has nothing to do with the “full model/restricted model” comparison that you have learned.
6. The authors are confused about how to calculate AIC and BIC statistics in R when using the restricted maximum likelihood (REML) estimates. See my comments at the end of my R code from Section 7.

 

Optional: The Bliese tutorial paper.  You will find a lot of this paper very useful as well, but it’s not required reading. But Bliese wrote the R code and gives tutorials on the topic, so he is actually a better source than the simpler reading that I assigned.

Why is probability needed in the regression model?

“Variance between means”

A multilevel regression of trends (random coefficients regression modelling) in the TTU grad data.

R code from Section 7 of the reading.

22. 4/12 Finishing up mixed effects analyses - The Hausman “test” for fixed versus random effects



Read this critique of the over-used, misunderstood, and trained parrot-ish Hausman test for fixed versus random effects.

 

The most important sentence in this document appears on p. 11: "The Hausman test does not aid in evaluating this tradeoff."

 

Just found this paper (optional reading), which gives a simple fix to the problem of random effects being correlated with the X data. This makes the Hausman test even more useless than is indicated by the main reading.

 

Note also (optional) a similar solution from Andrew Gelman.

 

Take home message: Just like all tests for model assumptions, such as tests for normality, homoscedasticity, etc., the Hausman test for fixed versus random effects is not very useful.

How to perform the Hausman test using R.

Code showing the simulation method from the reading.

 

Why is probability needed in the regression model?

23. 4/14 Binary regression models

Read the Wikipedia page on logistic regression – it’s pretty good.  Read up to, but not including the section “As a two-way latent variable model” (even though, believe it or not, Daniel McFadden won a Nobel prize in 2000 for the material in that section).

Need some % change effects – Bayesian


Also read this R tutorial.
Mistake: About the Wald test in R, they wrote, “Sigma supplies the variance covariance matrix of the error terms…” which should instead be “Sigma supplies the variance covariance matrix of the parameter estimates...”

Some logistic regression curves

 

Code from the UCLA tutorial.

 

Code for in-class project: Bayesian logistic regression of “Trashball” – a variation of basketball.

 

Code for finding the p-value for the likelihood ratio test:

pval = 1 - pchisq(anova(t1, t2)$Deviance[2], anova(t1, t2)$Df[2])   # row 2 of the anova table holds the change in deviance and its df

Why is probability needed in the regression model?

24. 4/19 Ordinal response regression models

Read this document by Paul Johnson at KU. It’s kind of funny, as well as informative, and thankfully, mostly correct. Some issues:

1. p. 2 "The function f is a probability density function" (PDF), which represents the probability that ei is equal to some particular value."

 

This is the wrong interpretation of a continuous density.  It is correct for a discrete distribution, but recall that "the probability that ei is equal to some particular value" is actually zero when ei is continuously distributed. 

 

2. p. 5, right before equation (11), should read, “More succinctly, for k+1 categories, we would write…”

HW4 notes (much courtesy of one student group in this semester’s class). BTW, all the stuff about probability (expected values, simulation, parameter interpretation, “by chance alone”) that you may be struggling with will form the bulk of the final. So please pay more attention to these concepts.

R code for ordinal logistic (or probit) regression:

# A small made-up data set: an ordinal rating (1 = low, 3 = high) and a salary predictor
rating = c(1, 1, 2, 2, 1, 3, 3, 1, 1, 2, 3, 2, 1, 2)
salary = c(15, 30, 20, 30, 40, 45, 49, 16, 20, 40, 55, 56, 24, 31)
ordd = data.frame(rating, salary)
library(MASS)                    # provides polr() for proportional-odds (cumulative link) models
ord.logit = polr(as.factor(rating) ~ salary, data = ordd)                      # logit link (default)
ord.probit = polr(as.factor(rating) ~ salary, data = ordd, method = "probit")  # probit link
cbind(ord.logit$fitted.values, ordd)    # fitted category probabilities next to the data
cbind(ord.probit$fitted.values, ordd)

Some graphical presentations using Excel.

Comparing probit and logit distribution functions

 

Why is probability needed in the regression model?

25. 4/21 Poisson, negative binomial and other count data regression models.

Read about Poisson regression, through Section 3.2.

 

Read about Negative binomial regression, too.

 

Supplemental reading on count data models in R (not required but excellent)

More on the latent variable formulation

Simulating and analyzing data from Poisson and Negative binomial regression models.
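A generic sketch of that idea (simulated data, not the class file): generate counts from Poisson and negative binomial models with a log link, then see how the Poisson fit understates the standard errors when the data are overdispersed.

library(MASS)                                   # provides glm.nb()
set.seed(13)
x = runif(300, 0, 2)
mu = exp(0.5 + 1.2*x)                           # log link: ln(mu) = 0.5 + 1.2*x
y.pois = rpois(300, mu)                         # Poisson: variance equals the mean
y.nb   = rnbinom(300, mu = mu, size = 1.5)      # negative binomial: overdispersed
summary(glm(y.pois ~ x, family = poisson))$coefficients   # Poisson model on Poisson data
summary(glm(y.nb  ~ x, family = poisson))$coefficients    # Poisson model on overdispersed data: SEs too small
fit.nb = glm.nb(y.nb ~ x)                       # negative binomial fit
summary(fit.nb)$coefficients
fit.nb$theta                                    # estimated dispersion parameter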

Analysis of experimental data on wine sales using count data models.

Why is probability needed in the regression model?

26. 4/26 GLMMs with repeated or hierarchical data structures

Read this article from the UCLA Institute for Digital Research and Education

Summary of Generalized Linear Mixed Models, from the UCLA Institute for Digital Research and Education

 

Why is probability needed in the regression model?

27. 4/28 Nominal response regression models

 

 

Read this article from the UCLA Institute for Digital Research and Education  (Thank you UCLA/IDRE for supplying such nice materials!  You guys rock!)

 

Optional reading: “Applying discrete choice models to predict Academy Award winners,” J. R. Statist. Soc. A (2008), 375-394, by Pardoe and Simonton.

 

Multinomial logistic regression using R. Summaries using EXCEL. 
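A minimal multinomial logit sketch using nnet::multinom (simulated three-category response, not the UCLA/IDRE data):

library(nnet)
set.seed(14)
x = runif(300, 0, 10)
lin2 = -2 + 0.4*x; lin3 = -4 + 0.7*x            # log-odds of categories 2 and 3 versus baseline 1
den = 1 + exp(lin2) + exp(lin3)
pr = cbind(1/den, exp(lin2)/den, exp(lin3)/den) # true category probabilities
yy = apply(pr, 1, function(p) sample(1:3, 1, prob = p))
fit = multinom(factor(yy) ~ x)
summary(fit)                                    # one coefficient set per non-baseline category
head(fitted(fit))                               # estimated category probabilities for each row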

Excel file showing ML Estimation for multinomial logistic regression.

 

Why is probability needed in the regression model?

28. 5/3 Tobit and censored regression models.

Update, 4/17.

Read about censored regression here.

 

Optional: Read sections 1,2,3 and 5 of  “Tobit Models: A Survey,” By Takeshi Amemiya, Journal of Econometrics, Volume 24, 1984.  Here is the link.

A graph illustrating the Tobit model.

 

ML estimation for TOBIT model using EXCEL.

R code for the above, with more



Programmer success case; time to completion.  Follow-up analysis using EXCEL, with info on the lognormal distribution.

Excel spreadsheet showing ML estimation for upper censored data, using both normal and lognormal distributions.

Why is probability needed in the regression model?

29. 5/5 Survival analysis regression models; Cox proportional hazards model.

 

 

Read this summary by Maarten Buis of Vrije Universiteit Amsterdam.   Section 6, “Unobserved Heterogeneity,” is optional (not required) reading.

Proportional hazards regression follow-up using Excel, with comparison to lognormal regression.
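A minimal Cox proportional hazards sketch using the survival package (simulated right-censored times, not the class data):

library(survival)
set.seed(15)
x = rnorm(200)
time.true = rexp(200, rate = exp(0.7*x))        # hazard increases with x
cens = rexp(200, rate = 0.2)                    # independent censoring times
time = pmin(time.true, cens)
event = as.numeric(time.true <= cens)           # 1 = event observed, 0 = censored
fit = coxph(Surv(time, event) ~ x)
summary(fit)                                    # exp(coef) is the hazard ratio per unit increase in x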

 

Why is probability needed in the regression model?

30. 5/10

 

 

Sample selection bias (Heckman model and method)

 

Future semesters – more time on endogeneity issues, instrumental variable regression

Why is probability needed in the regression model?

Final Exam 

 

Old finals and solutions are available in the old courses link. But every semester is different.

 

Why is probability needed in the regression model?