ISQS 5349, Regression Analysis, Spring 2016
Course Syllabus
Old midterms and finals
Class recordings
Supplemental books:
Practical Regression and Anova using R, by Julian Faraway.
Probabilistic Modeling in Computer Science, by Norm S. Matloff of UC Davis, a free book licensed under Creative Commons. See Ch. 22 in particular on regression. (I took my first course in regression analysis from Dr. Matloff around 1979!)
Helpful R materials:
Winston Chang's Cookbook for R, a free book licensed under Creative Commons.
From UCLA's Statistical Consulting Group: http://www.ats.ucla.edu/stat/r/seminars/intro.htm
swirl teaches you R programming and data science interactively, at your own pace, and right in the R console!
A start for this class showing how to access data and do basic things
An overview of R for statistics: everything you want, in a nutshell
A list of useful R functions
http://www.cyclismo.org/tutorial/R/
http://www.rstudio.com/ide/docs/
http://ww2.coastal.edu/kingw/statistics/Rtutorials/
http://www.stat.auckland.ac.nz/~ihaka/120/Notes/ch03.pdf (graphics, from a founder of R, Ross Ihaka)
Class Topics (Rough schedule: this will change depending on student presentations. Dr. Westfall will update the class regularly on schedule changes.)
Each session below lists two things:
Preparation: Read and study everything listed under "Preparation." There will be a quiz at the beginning of class on the day listed. Refer back to these documents repeatedly. Links within the links are recommended and may aid understanding, but are not required.
R code: R code, homework, etc.
1. 1/21 Smoothing; scatterplots, LOESS smoothers, the classical regression model and its assumptions.
Preparation (readings and videos; the quiz will be on 1/26 and will count double):
Approximating functions (regression functions, for us) by linear terms
The regression function as a conditional expectation (Section 1 only, but the rest is good too)
Approximating regression functions using LOWESS in R
Regression, populations, and processes
Read this summary of the assumptions of the regression model
Read the Wikipedia entry (all of it) about the R^{2} statistic
Read the Wikipedia entry about the Gauss-Markov theorem, up to the words "The theorem now states that the OLS estimator is a BLUE."
R code: Initial regression analyses using R; introduction to some data sets we'll be using. Models produce data: how different do the data look when they are produced by the same model? How different do they look when they are produced by different models?
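A minimal sketch of the LOESS/LOWESS idea, using base R's lowess() and the built-in cars data (a stand-in data set of my own choosing, not one of the class data sets):

```r
# Scatterplot smoothing with lowess(): estimate the regression function
# without assuming a functional form. The cars data are built into R.
data(cars)
sm <- lowess(cars$speed, cars$dist, f = 2/3)   # f is the smoothing span
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
lines(sm, col = "red")   # the smooth estimate of the regression function
```

Smaller values of f follow the data more closely; larger values give a smoother curve.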
2. 1/26 Maximum likelihood and least squares. The Gauss-Markov theorem.
Preparation: Today's quiz covers all the readings for 1/21 and 1/26, and counts double.
R code: Illustrating the Gauss-Markov property, both good and bad.
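A simulation sketch of the Gauss-Markov property (my own construction, not the class demo): under homoscedastic, uncorrelated errors, the OLS slope estimator has smaller variance than any other linear unbiased estimator, such as the "difference of group means" slope below.

```r
# Compare the sampling variance of the OLS slope with that of another
# linear unbiased slope estimator (half-means slope) via simulation.
set.seed(1)
x <- seq(1, 10, length.out = 20)
ols <- alt <- numeric(2000)
for (i in 1:2000) {
  y <- 2 + 0.5 * x + rnorm(20)                  # true slope is 0.5
  ols[i] <- coef(lm(y ~ x))[2]
  # Alternative unbiased slope: difference of half-means over x-distance
  alt[i] <- (mean(y[11:20]) - mean(y[1:10])) /
            (mean(x[11:20]) - mean(x[1:10]))
}
c(var_ols = var(ols), var_alt = var(alt))        # OLS variance is smaller
```

Both estimators average to the true slope (0.5); the Gauss-Markov theorem says the OLS one must have the smaller variance.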
3. 1/28 Exact inferences in the classical parametric regression model: p-values, confidence and prediction intervals.
Preparation:
Read this discussion of a confidence interval for the slope, from Doug Stirling, Massey University in Palmerston North, New Zealand.
Read this document on interpreting p-values, by Dr. Westfall.
Read this document on why you should never say "Accept H0," written by Dr. Westfall.
Read this discussion of confidence intervals for E(Y|X=x) versus prediction intervals for Y|X=x, from "Musings on Using and Misusing Statistics," by Martha K. Smith, retired UT professor.
Read this document on "Prediction and Generalization," written by Dr. Westfall.
Read this document on "Confidence intervals and significance tests as predictions," written by Dr. Westfall.
R code: Confidence intervals for slopes and intercepts. Understanding the standard error: a simulation study. Constructing frequentist confidence and prediction intervals (GPA/SAT and Toluca examples); constructing the corresponding Bayesian intervals using the same examples.
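A minimal sketch of the frequentist intervals in R (using the built-in cars data as a stand-in for the GPA/SAT and Toluca examples):

```r
# Confidence intervals for the coefficients, then a confidence interval
# for E(Y|X=x) and a prediction interval for a new Y at the same x.
fit <- lm(dist ~ speed, data = cars)
confint(fit)                                       # CIs for intercept, slope
new <- data.frame(speed = 15)
ci   <- predict(fit, new, interval = "confidence") # CI for the mean response
pred <- predict(fit, new, interval = "prediction") # PI for a new observation
rbind(confidence = ci, prediction = pred)
```

The prediction interval is always wider than the confidence interval at the same x, because it must also account for the error term of the new observation.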
4. 2/2 Checking the assumptions of the classical model
Preparation:
Read the document "How to check assumptions using data," written by Dr. Westfall, and run the R code therein.
Read and run the code in the document "Why do assumptions matter?" written by Dr. Westfall.
Read the document "Comments on Transformations," written by Dr. Westfall.
R code: HW 3, due 2/11. Write a report in which you show how to replicate all the analyses (all statistics, model equations, tables, and graphs) in this paper using R rather than Minitab. Include all code and output, as well as surrounding words, in a professional, clean, publication-quality document. Use a "tutorial" style of presentation, aimed at teaching a person (say, someone not in this class) how to do everything. (Note: it is not necessary to make the graphs appear in the "panel" display as in the paper, where all four are shown together. Instead, you can show them separately. Just be sure to label them clearly in your report, e.g., "Upper right graph of Figure 6.") Testing for curvature using quadratic regression: Toluca, Peak Energy, Car Sales, and Product Complexity examples.
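The curvature check can be sketched in a few lines (simulated data of my own, not the Toluca or Car Sales data):

```r
# Test for curvature by adding a quadratic term and checking its p-value.
set.seed(2)
x <- runif(100, 0, 10)
y <- 1 + 2 * x + 0.3 * x^2 + rnorm(100, sd = 2)  # data generated WITH curvature
fit <- lm(y ~ x + I(x^2))
summary(fit)$coefficients["I(x^2)", ]   # small p-value indicates curvature
```

If the quadratic coefficient is clearly nonzero, the straight-line model is inadequate; a transformation or a richer model is then worth considering.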
5. 2/4 Using transformations to achieve a more reasonable model. The multiple regression model.
Preparation: Read this introduction to multiple regression analysis.
R code: Lance Car Sales example: analysis of the model using the x^{-1} transformation.
6. 2/9 The multiple regression model; added variable plots.
Note on presentations: Good example PowerPoint student presentations can be found on last fall's ISQS 6348 page (multivariate analysis), in the left column. DO NOT use last year's ISQS 5349 presentations as examples; they were trying to present as if at a conference, rather than teaching the material (which was my fault, because I did not give adequate guidance).
Preparation:
Read about added variable plots (also called partial regression plots).
Read Sections 1-7 of this matrix algebra preparation material, courtesy of A. Colin Cameron, UC Davis (woo hoo! My alma mater).
(Not required, but if you find yourself needing additional self-study on matrix algebra, see the SOS math tutorials matrix0, matrix1, and matrix2, and Introduction to matrix algebra using R. See also Matrix and linear algebra tutorials, and the MIT OpenCourseWare materials, a free online course with separate lectures for separate linear and matrix algebra topics.)
R code: Multiple regression analysis of how computer time to run a job relates to RAM and processor speed. R code for the Sales vs. Interest Rate and Gas Price example: how curvature can be explained by a third variable. Visualizing the multiple regression model: 3D and partial plots using Excel.
7. 2/11 Matrix form of the model: estimates, standard errors, t intervals and tests.
Preparation:
Read these presentation slides by Carlos Carvalho, UT Austin.
Read the document "Prediction as association and prediction as causation," written by Dr. Westfall. The document shows that you cannot infer causation using the regression model.
R code: The multivariate normal distribution (from Wikipedia). Information on covariance matrices, from Wikipedia. The various matrices and regression calculations shown in R code. You can also see many of the matrices using the "xpx" and "i" options in the "MODEL" statement of PROC REG.
8. 2/16 Causality
Preparation: Read "Causal Inference using Regression on the Treatment Variable," by Andrew Gelman.
R code: The computer speed example will be good for discussing causality.
9. 2/18 Midterm 1 

10. 2/23 Multicollinearity. The ANOVA table, the F test, and the R-squared statistic.
Preparation:
Read these presentation slides (slides 1-6) by Alicia Carriquiry of Iowa State.
Read this document on multicollinearity, by Dr. Westfall.
R code: File to illustrate problems with multicollinearity. R code for diagnosis and interpretation of multicollinear variables; also indicates one of many potential solutions to the problem.
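A common diagnostic is the variance inflation factor, which can be computed by hand in base R. A minimal sketch on simulated collinear predictors (my own data, not the class file):

```r
# VIF for predictor j is 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all the other predictors.
set.seed(3)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(100)                  # unrelated to x1 and x2
vif <- function(x, others) 1 / (1 - summary(lm(x ~ others))$r.squared)
c(VIF_x1 = vif(x1, cbind(x2, x3)),
  VIF_x3 = vif(x3, cbind(x1, x2)))
```

VIF_x1 is huge (its variance as an estimated coefficient is badly inflated by x2), while VIF_x3 is near 1. Rules of thumb flag VIFs above roughly 5 or 10.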
11. 2/25 Interactions; the inclusion principle
Preparation:
Read this document about the inclusion principle, compiled by Timothy Hansen, U. South Carolina.
Read this document about interactions in regression, by Kristopher J. Preacher of Vanderbilt University.
R code: 3D graphs of interaction and non-interaction surfaces. Examining interactions: an R demo. Moderator example, from Karl Wuensch's web page, http://core.ecu.edu/psyc/wuenschk/ (the publication is here), with a "hand-drawn" Excel graph of the moderating effect. File to illustrate problems with violating the inclusion principle.
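A minimal interaction sketch on simulated data (my own construction; variable names x, m, y are hypothetical):

```r
# Fit a moderator (interaction) model. The formula y ~ x * m expands to
# x + m + x:m, so the main effects are included along with the product
# term, as the inclusion principle requires.
set.seed(4)
n <- 200
x <- rnorm(n); m <- rnorm(n)
y <- 1 + 0.5 * x + 0.3 * m + 0.8 * x * m + rnorm(n)  # true interaction present
fit <- lm(y ~ x * m)
summary(fit)$coefficients["x:m", ]   # test of the interaction term
```

A significant x:m coefficient means the slope of y on x changes with m; the fitted surface is twisted rather than planar.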
12. 3/1 Dummy variables, ANOVA, ANCOVA, ANCOVA with interactions, graphical summaries
Preparation: Read this document through slide 24. This is another one by Carlos Carvalho, UT Austin. Note: on slide 10, top equation, he is missing the "i" subscript on the "Exp" variable.
R code: ANOVA/ANCOVA, first file: comparing GPAs of male and female students (two-level ANOVA/ANCOVA). ANOVA/ANCOVA, second file: comparing GPAs of students in different degree plans.
13. 3/3 Variable and model selection
Preparation: Read about the variance/bias tradeoff here. Comments: (1) Right below the first graph, "characterized" should be "characterize." (2) Two sentences later, g(x) is called an "estimator" of f(x), and this usage persists through the article. This is not the usual usage of the term "estimator," because an estimator is a function of random data. If you put a "^" on top of g(x), you could call it an estimator; better to call it a "candidate mean function," a "supposed model," or something like that. Later on, though, he refers to g(x) as a function of data, and that is indeed an estimator.

R code: A simulation example: predicting Hans' graduate GPA from GRE scores and undergraduate GPA. R code to demonstrate the danger of overfitting. R file for producing and comparing n-fold cross-validation statistics for different models. All-subsets model selection for predicting doctors per capita using R.
14. 3/8 Variable and model selection, continued.
Preparation: Read this document. Realize that there are R counterparts to all the SAS-specific procedures, so you are responsible for understanding all content that is software-independent, but you are not responsible for knowing which SAS procedure does what. Comments:
2. P. 2: The supposed problem that "models are too complex" when you perform stepwise methods is strange. How can a model with fewer X variables be more complex than a model with more X variables? I would simply ignore that point.
4. P. 2: "White noise" means "iid normal." Also, the sentence ending "...there are others that are not related" should read: "Usually, when one does a regression, at least one of the independent variables is really related to the dependent variable, but there are others that have little relationship."
6. P. 4: The comment about partial least squares really only applies to cases where you have more than one Y variable; this is covered in ISQS 6348. Also, their X'Y' formula should be X'Y, and to be consistent with their model on page 1, they should either have made this Y lower case or the one on page 1 upper case. My suggestion would be the latter.
8. In their simulation analyses they seem to be stating that lars and lasso are good for protecting against the effects of outliers. That is not what they are designed to do, although the fact that they "shrink" estimates toward zero means that they will not give really crazy results with large coefficients just because of an outlier. Still, they are not methods that will protect you against all kinds of problems caused by outliers.
R code: R code to demonstrate the danger of overfitting. R file for producing and comparing n-fold cross-validation statistics for different models. All-subsets model selection for predicting doctors per capita using R.
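The overfitting danger and the n-fold cross-validation comparison can be sketched together (simulated data of my own; the class files use other data sets):

```r
# Leave-one-out (n-fold) cross-validation SSE for two candidate models:
# the true linear model versus an overfit degree-8 polynomial.
set.seed(5)
n <- 60
x <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n)             # truth is a straight line
d <- data.frame(x, y)
cv_sse <- function(formula, data) {
  sum(sapply(seq_len(nrow(data)), function(i) {
    fit <- lm(formula, data = data[-i, ])      # fit without observation i
    (data$y[i] - predict(fit, data[i, ]))^2    # squared prediction error
  }))
}
c(linear = cv_sse(y ~ x, d), poly8 = cv_sse(y ~ poly(x, 8), d))
```

The polynomial fits the training data better but predicts held-out points worse, which is exactly what the cross-validation SSE reveals.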
15. 3/10 Heteroscedasticity: WLS, ML estimation, robust standard errors
Preparation (update: modified readings. I really like Cosma Shalizi of Carnegie Mellon; expect to see more from him): Read pages 1-9 of this document. Comments:
2. The assumed functional form of the heteroscedasticity, 1 + x^2/2, is somewhat unusual in that it makes the variance decrease and then increase as x ranges from its minimum to its maximum. This might make sense in some cases where the X variable has zero in its range, as it does in Cosma's simulation, but in typical cases where the X and Y data are positive, the heteroscedasticity function is either monotonically increasing or monotonically decreasing, with no "down then up" behavior.
3. "The oracle of regression" is what I sometimes refer to as "Hans."
Optional: Read this critique of robust standard errors. The professor summarizes: if you think you have a problem with heteroscedasticity, you probably have more serious problems and should not be using OLS anyway. Using robust standard errors with OLS is like putting a bright red bow on the head of an ugly pig lying in the stinking pig slop, and then saying that the pig now looks pretty.
R code: First file to illustrate the benefit of weighted least squares: shows imprecise predictions of OLS in the presence of heteroscedasticity. Comparison of prediction limits, homoscedastic vs. heteroscedastic models: shows that OLS prediction limits are incorrect in the presence of heteroscedasticity. Estimating the heteroscedastic variance of GE returns as a function of trading volume via maximum likelihood using R. (Try different variance functions and compare log likelihoods to see which fit better.)
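A minimal WLS sketch on simulated data (my own; the assumed variance function, variance proportional to x^2, is an assumption of this example, not of the class files):

```r
# Weighted least squares when the error standard deviation grows with x.
# With weights proportional to 1/variance, WLS is the efficient estimator.
set.seed(6)
x <- runif(200, 1, 10)
y <- 2 + 3 * x + rnorm(200, sd = x)    # error sd proportional to x
ols <- lm(y ~ x)
wls <- lm(y ~ x, weights = 1 / x^2)    # weight_i = 1/variance_i (up to a constant)
rbind(OLS = coef(ols), WLS = coef(wls))
```

Both estimators are unbiased here; the payoff of WLS is smaller sampling variability and standard errors that are actually valid under the heteroscedasticity.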
16. 3/22 Bootstrap
Preparation: Read this most excellent document, written by Cosma Shalizi of Carnegie Mellon. It provides an excellent summary of the main points about probability models that I have emphasized in this course and in the ISQS 5347 course. Helpful tips:

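A basic case-resampling bootstrap for the slope can be sketched in base R, without any extra packages (cars data as a stand-in example of my own):

```r
# Nonparametric bootstrap: resample (x, y) cases with replacement,
# refit the regression each time, and use the percentiles of the
# resampled slopes as an interval estimate.
set.seed(7)
B <- 2000
slopes <- replicate(B, {
  idx <- sample(nrow(cars), replace = TRUE)         # resample rows
  coef(lm(dist ~ speed, data = cars[idx, ]))[2]     # refit, keep slope
})
quantile(slopes, c(0.025, 0.975))   # percentile bootstrap CI for the slope
```

Case resampling makes no assumption about the error distribution or homoscedasticity, which is why it pairs naturally with the probability-model cautions in the reading.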
17. 3/24 Quantile regression and Winsorizing
Preparation: Read about quantile regression.
R code: Excel spreadsheet to explain the quantile estimation method.
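The quantile estimation method can also be sketched in base R by minimizing the check ("pinball") loss directly with optim(), with no extra packages (simulated heavy-tailed data of my own):

```r
# Median (tau = 0.5) regression by direct minimization of the check loss.
set.seed(10)
x <- runif(100, 0, 10)
y <- 1 + 2 * x + rt(100, df = 2)        # heavy-tailed errors
check_loss <- function(b, tau = 0.5) {
  r <- y - b[1] - b[2] * x
  sum(r * (tau - (r < 0)))              # the quantile-regression loss
}
b_start <- coef(lm(y ~ x))              # start from the OLS fit
fit <- optim(b_start, check_loss)
fit$par                                 # median-regression intercept and slope
```

Changing tau targets other conditional quantiles; in practice one would use the quantreg package's rq(), but the optim() version shows that nothing more than a different loss function is involved.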
18. 3/29 Generalized least squares, correlated errors, time series
Preparation: Read this document from John Fox.
R code: John Fox's code; also some code to show how the AR and MA data-generating processes work. Simulation studies of inefficiency and Type I error rates when using OLS and ML/GLS: an example with autocorrelated data.
19. 3/31 Intro to mixed effects models
Preparation: Read this tutorial on lmer by Bodo Winter.
R code: Some code from Bodo Winter's tutorial.
20. 4/5 Repeated measures and random effects; Bayesian-style "shrinkage" estimates of random effects.
Preparation: Read also, from this article by Simon Jackman, Proposition 7.1 on pp. 206-207, for a more direct representation of the Bayesian shrinkage estimate.
R code: Simulation code to help understand the shrinkage estimates shown in Jackman's article.
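The shrinkage idea can be sketched with a small base R simulation (my own construction with known variance components, not Jackman's code): the shrinkage estimate pulls each raw group mean toward the grand mean, and beats the raw means on average.

```r
# Shrinkage estimates of group means when the variance components are known.
set.seed(11)
g <- 30; n_per <- 5
mu <- 50; tau2 <- 16; sigma2 <- 100      # between- and within-group variances
truemeans <- rnorm(g, mu, sqrt(tau2))
y <- matrix(rnorm(g * n_per, rep(truemeans, each = n_per), sqrt(sigma2)),
            nrow = g, byrow = TRUE)      # one row of data per group
ybar <- rowMeans(y)                      # raw (fixed-effects) estimates
w <- tau2 / (tau2 + sigma2 / n_per)      # shrinkage weight on the data
shrunk <- w * ybar + (1 - w) * mu        # shrink toward the grand mean
c(mse_raw = mean((ybar - truemeans)^2),
  mse_shrunk = mean((shrunk - truemeans)^2))
```

With small groups and large within-group noise, w is small and the shrinkage is heavy; this is the same weighted average that lmer's random-effect predictions (BLUPs) produce, with the variance components estimated rather than known.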
21. 4/7 Multilevel analysis
Preparation: Required: read the Preface and Sections 1, 2, 3, and 7 from this document. Here is the R code from Section 7. There are a couple of mistakes in their syntax, but I was able to replicate their results precisely once the code was cleaned up. Optional: the Bliese tutorial paper. You will find a lot of this paper very useful as well, but it is not required reading. Bliese wrote the R code and gives tutorials on the topic, so he is actually a better source than the simpler reading that I assigned.
R code: Why is probability needed in the regression model? A multilevel regression of trends (random coefficients regression modeling) in the TTU grad data. R code from Section 7 of the reading.
22. 4/12 Finishing up mixed effects analyses; the Hausman "test" for fixed versus random effects
Preparation: The most important sentence in this document appears on p. 11: "The Hausman test does not aid in evaluating this tradeoff." I just found this paper (optional reading), which gives a simple fix to the problem of random effects being correlated with the X data; this makes the Hausman test even more useless than is indicated by the main reading. Note also (optional) a similar solution from Andrew Gelman. Take-home message: just like all tests for model assumptions, such as tests for normality, homoscedasticity, etc., the Hausman test for fixed versus random effects is not very useful.
R code: How to perform the Hausman test using R. Code showing the simulation method from the reading.
23. 4/14 Binary regression models
Preparation: Read the Wikipedia page on logistic regression; it's pretty good. Read up to, but not including, the section "As a two-way latent variable model" (even though, believe it or not, Daniel McFadden won a Nobel prize in 2000 for the material in that section).
R code: Some logistic regression curves. Code for the in-class project: Bayesian logistic regression of "Trashball," a variation of basketball. Code for finding the p-value for the likelihood ratio test.
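The likelihood ratio test p-value can be sketched in a few lines of base R (simulated data and coefficients of my own invention, not the Trashball data):

```r
# Logistic regression, with the likelihood ratio test computed from the
# drop in deviance between the null and full models.
set.seed(8)
n <- 300
x <- rnorm(n)
p <- plogis(-0.5 + 1.2 * x)              # true success probabilities
ybin <- rbinom(n, 1, p)
full <- glm(ybin ~ x, family = binomial)
null <- glm(ybin ~ 1, family = binomial)
lr <- null$deviance - full$deviance      # likelihood ratio statistic
pval <- 1 - pchisq(lr, df = 1)           # p-value for H0: slope = 0
c(LR = lr, p = pval)
```

The same statistic and p-value come out of anova(null, full, test = "Chisq"); the manual version just makes the chi-squared reference distribution explicit.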
24. 4/19 Ordinal response regression models
Preparation: Read this document by Paul Johnson at KU. It's kind of funny, as well as informative, and, thankfully, mostly correct. Some issues:
1. His interpretation of a continuous density is wrong. It is correct for a discrete distribution, but recall that "the probability that ei is equal to some particular value" is actually zero when ei is continuously distributed.
2. P. 5, right before equation (11), should read, "More succinctly, for k+1 categories, we would write…"
R code: HW4 notes (much courtesy of one student group in this semester's class). BTW, all the material on probability (expected values, simulation, parameter interpretation, "by chance alone") that you may be struggling with will form the bulk of the final, so please pay attention to these concepts. R code for ordinal logistic (or probit) regression:
rating <- c(1, 1, 2, 2, 1, 3, 3, 1, 1, 2, 3, 2, 1, 2)
salary <- c(15, 30, 20, 30, 40, 45, 49, 16, 20, 40, 55, 56, 24, 31)
ordd <- data.frame(rating, salary)
library(MASS)
ord.logit <- polr(as.factor(rating) ~ salary, data = ordd)
ord.probit <- polr(as.factor(rating) ~ salary, data = ordd, method = "probit")
cbind(ord.logit$fitted.values, ordd)   # fitted category probabilities
Some graphical presentations using Excel. Comparing the probit and logit distribution functions.
25. 4/21 Poisson, negative binomial, and other count-data regression models.
Preparation: Read about Poisson regression, through Section 3.2. Read about negative binomial regression, too. Supplemental reading on count-data models in R (not required, but excellent).
R code: More on the latent variable formulation. Simulating and analyzing data from Poisson and negative binomial regression models. Analysis of experimental data on wine sales using count-data models.
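Both count models can be sketched on simulated overdispersed data (my own simulation, not the wine sales data), using glm() and MASS::glm.nb():

```r
# Poisson vs. negative binomial regression on overdispersed counts.
library(MASS)   # for glm.nb()
set.seed(9)
x <- runif(300, 0, 2)
mu <- exp(0.5 + 0.8 * x)                 # log link: log(mu) = 0.5 + 0.8 x
ycnt <- rnbinom(300, size = 2, mu = mu)  # counts with extra-Poisson variation
pois <- glm(ycnt ~ x, family = poisson)
nb <- glm.nb(ycnt ~ x)
c(AIC_pois = AIC(pois), AIC_nb = AIC(nb))  # NB should fit better here
```

Because the data were generated with variance exceeding the mean, the negative binomial model wins on AIC; with genuinely Poisson data the two would be close and the simpler Poisson model would be preferred.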
26. 4/26 GLMMs with repeated or hierarchical data structures
Preparation: Read this article from the UCLA Institute for Digital Research and Education.
R code: Summary of generalized linear mixed models, from the UCLA Institute for Digital Research and Education.
27. 4/28 Nominal response regression models
Preparation: Read this article from the UCLA Institute for Digital Research and Education. (Thank you, UCLA/IDRE, for supplying such nice materials! You guys rock!) Optional reading: "Applying discrete choice models to predict Academy Award winners," J. R. Statist. Soc. A (2008), 375-394, by Pardoe and Simonton.
R code: Multinomial logistic regression using R. Summaries using Excel.
28. 5/3 Tobit and censored regression models.
Preparation (update, 4/17): Read about censored regression here. Optional: read Sections 1, 2, 3, and 5 of "Tobit Models: A Survey," by Takeshi Amemiya, Journal of Econometrics, Volume 24, 1984. Here is the link.
R code: A graph illustrating the Tobit model. R code for the above, with more.
29. 5/5 Survival analysis regression models; Cox proportional hazards model.
Preparation: Read this summary by Maarten Buis of Vrije Universiteit Amsterdam. Section 6, "Unobserved Heterogeneity," is optional (not required) reading.
R code: Proportional hazards regression follow-up using Excel, with a comparison to lognormal regression.
30. 5/10 Sample selection bias (Heckman model and method). Future semesters: more time on endogeneity issues and instrumental variables regression.

Final Exam: 5/12, 4:30-7:30 PM
Old finals and solutions are available in the old courses link. But every semester is different.