ISQS 5349, Spring 2016
Course Syllabus
Old midterms and finals
Class recordings
Supplemental books:
Practical Regression and ANOVA using R, by Julian Faraway.
Probabilistic Modeling in Computer Science, by Norm S. Matloff of UC Davis, a free book licensed under Creative Commons.
Helpful R materials:
From UCLA's Statistical Consulting Group: http://www.ats.ucla.edu/stat/r/seminars/intro.htm
A start for this class showing how to access data and do basic things
An overview of R for statistics – everything you want, in a nutshell
A list of useful R functions
http://www.cyclismo.org/tutorial/R/
http://www.rstudio.com/ide/docs/
http://ww2.coastal.edu/kingw/statistics/Rtutorials/
http://www.stat.auckland.ac.nz/~ihaka/120/Notes/ch03.pdf (graphics, from a founder of R, Ross Ihaka)
Class Topics (Rough schedule – this will change depending on student presentations. Dr. Westfall will update the class regularly on schedule changes.)
Preparation – Read and study everything in this column. There will be a quiz at the beginning of class on the day listed. Refer back to these documents repeatedly. Links within the links are recommended and may aid understanding, but are not required.
R code, homework, etc.
1. 1/21 Smoothing: scatterplots, LOESS smoothers, the classical regression model and its assumptions.
Nature favors continuity over discontinuity. Review Example 7.5 of Chapter 7 of Understanding Advanced Statistical Methods, by Westfall and Henning, to refresh your memory as to what a conditional distribution is, and as to why we do not use the word "population" in statistical modeling. Read this discussion of the "Model produces data" concept as it relates to the regression model, by Dr. Westfall. It provides further explanation of conditional distributions and of the flawed "population" terminology in statistical models. Also read this discussion of the "Model produces data" concept as it relates to the regression model, by Dr. Westfall. Hopefully, the points made in ISQS 5347 will be abundantly refreshed by now! Read this introduction to LOESS: LOESS curves are estimates of g(x), the "center" of the distribution of Y for each X=x. These curves are used to diagnose the functional form of the regression model. In particular, they are excellent tools for diagnosing the adequacy of the assumption that g(x) is a linear function.
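To make the LOESS idea concrete, here is a minimal sketch in base R (hypothetical simulated data, not one of the course data sets) that overlays a LOESS estimate of g(x) on a scatterplot and compares it with the straight-line fit:

```r
# Hypothetical data: Y = g(X) + noise, with a mildly nonlinear g(x)
set.seed(12345)
x <- runif(200, 0, 10)
y <- 3 + 0.5 * x + 0.05 * x^2 + rnorm(200, sd = 1)

plot(x, y, main = "Scatterplot with LOESS estimate of g(x)")
fit <- loess(y ~ x)                    # local regression estimate of g(x)
ord <- order(x)
lines(x[ord], fit$fitted[ord], lwd = 2)
abline(lm(y ~ x), lty = 2)             # compare with the straight-line fit
```

When the LOESS curve tracks the dashed line closely, the linearity assumption for g(x) looks adequate; systematic departures suggest a different functional form.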
HW 1, due Thursday 1/22: Initial regression analyses using R; introduction to some data sets we'll be using. Models produce data. How different do the data look when they are produced by the same model? How different do the data look when they are produced by different models?
2. 1/26 Maximum likelihood and least squares; the Gauss-Markov theorem. (Today's quiz covers two days' worth of material – the readings for 1/21, in the box above, and for today, in this box – and counts double.)
Read this summary of the assumptions of the regression model, by Dr. Westfall. Review this material from ISQS 5347. Read the Wikipedia entry (all of it) about the R^2 statistic (also called the coefficient of determination). Read the Wikipedia entry about the Gauss-Markov theorem, up to the words "The theorem now states that the OLS estimator is a BLUE." Focus your attention on what the theorem is saying. In particular, focus on the idea of what a "linear estimator" is, and on how OLS is best among all linear unbiased estimators.
Illustrating the Gauss-Markov property, both good and bad.
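The following small simulation (my own sketch, not a course file; hypothetical data) illustrates the Gauss-Markov property: the OLS slope has smaller variance than another linear unbiased estimator, here a "grouping" estimator that compares the mean Y in the upper half of the x's to the mean Y in the lower half.

```r
set.seed(1)
x <- seq(1, 10, length.out = 50)       # fixed X data
lo <- x <= median(x); hi <- !lo
nsim <- 2000
b.ols <- b.grp <- numeric(nsim)
for (s in 1:nsim) {
  y <- 2 + 0.7 * x + rnorm(50, sd = 2) # the model produces the data
  b.ols[s] <- coef(lm(y ~ x))[2]
  b.grp[s] <- (mean(y[hi]) - mean(y[lo])) / (mean(x[hi]) - mean(x[lo]))
}
c(mean(b.ols), mean(b.grp))            # both near 0.7: both estimators are unbiased
c(sd(b.ols), sd(b.grp))                # OLS sd is smaller: "best" among linear unbiased
```

Both estimators are linear in the y's and unbiased, but the theorem guarantees OLS has the smaller variance, which the simulated standard deviations confirm.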
3. 1/28 Exact inferences in the classical parametric regression model: p-values, confidence and prediction intervals.
Read this discussion of a confidence interval for the slope, from Doug Stirling, Massey University in Palmerston North, New Zealand. Read this document on interpreting p-values, by Dr. Westfall. Read this document on "Why you should never say 'Accept H0,'" written by Dr. Westfall. Read this discussion of confidence intervals for E(Y|X=x) versus prediction intervals for Y|X=x, from "Musings on Using and Misusing Statistics," by Martha K. Smith, retired UT professor. Read this document on "Prediction and Generalization," written by Dr. Westfall. Read this document on "Confidence intervals and significance tests as predictions," written by Dr. Westfall.
Confidence intervals for slopes and intercepts. Understanding the standard error: a simulation study. Constructing frequentist confidence and prediction intervals; GPA/SAT example and Toluca example. Constructing the corresponding Bayesian intervals using the same examples.
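A minimal sketch (hypothetical data standing in for the GPA/SAT example, base R only) contrasting a confidence interval for E(Y|X=x) with a prediction interval for Y|X=x:

```r
set.seed(7)
sat <- runif(100, 900, 1500)
gpa <- 0.5 + 0.002 * sat + rnorm(100, sd = 0.3)   # hypothetical GPA/SAT-style data
fit <- lm(gpa ~ sat)
new <- data.frame(sat = 1200)
predict(fit, new, interval = "confidence")  # interval for the conditional mean E(Y|X=1200)
predict(fit, new, interval = "prediction")  # wider: interval for a single new Y at X=1200
```

The prediction interval is always wider, because it accounts for the conditional variability of an individual Y around its mean, in addition to the estimation error in the fitted mean.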
4. 2/2 Checking the assumptions of the classical model
Read the document, "How to check assumptions using data," written by Dr. Westfall, and run the R code therein. Read and run the code in the document, "Why do assumptions matter?" written by Dr. Westfall. Read the document, "Comments on Transformations," written by Dr. Westfall.
Testing for curvature using quadratic regression: Toluca, Peak Energy, Car Sales, and Product Complexity examples.
5. 2/4 Using transformations to achieve a more reasonable model; the multiple regression model.
Read slides 1–9 and 11–14 of these presentation slides, by William G. Jacoby, Department of Political Science, Michigan State. Read this introduction to multiple regression analysis.
Lance Car Sales example: analysis of a model using the x^{-1} transformation.
6. 2/9 The multiple regression model; added variable plots. Note on presentations: Good example PowerPoint student presentations can be found on last Fall's ISQS 6348 page (multivariate analysis), in the left column. DO NOT use last year's ISQS 5349 presentations as examples – they were trying to present as if at a conference, rather than teaching the material (which was my fault, because I did not give adequate guidance).
Read about added variable plots (also called partial regression plots). Read Sections 1–7 of this matrix algebra preparation material, courtesy of A. Colin Cameron, UC Davis (whoo hoo! My alma mater). (Not required, but if you find yourself needing additional self-study on matrix algebra, see the SOS math tutorials – matrix0, matrix1, matrix2 – and Introduction to matrix algebra using R. See also Matrix and linear algebra tutorials, and the MIT OpenCourseWare material, a free online course with separate lectures for separate linear and matrix algebra topics.)
Multiple regression analysis of how computer time to run a job relates to RAM and processor speed. R code for the Sales vs. Interest Rate and Gas Price example: how curvature can be explained by a third variable. Visualizing the multiple regression model: 3-D and partial plots using Excel.
7. 2/11 Matrix form of the model, estimates, standard errors, t intervals and tests.
Read these presentation slides by Carlos Carvalho, UT Austin. Notes: (i) Slide 7: it is more common that the first subscript denote row and the second denote column; he has it reversed. (ii) Slide 9: The term "standard error" here is different from the "standard error" of the beta estimate. It refers to the estimated conditional standard deviation, and is called "residual standard error" in R. (iii) Slide 11: The "Y-hat" inside the summation expression needs an "i" subscript, as shown correctly on the next line down. (iv) Slide 15: the last distribution should be the conditional distribution given the X data, because the S_b covariance matrix is a function of the specific observed X data ((X'X)^{-1} in particular). (The unconditional distribution would involve the average of the (X'X)^{-1} matrix over the distribution of X.) Read this document on the matrix form of the regression model, from François Nielsen of UNC-Chapel Hill, Sections 1, 2, 3, 4, 5.1–5.4, 5.6.1, 5.6.2. Notes: (i) A slight quibble: He states in 5.6.1 that the standard error is the standard deviation of the sampling distribution of the estimator. This is not quite true, because the true standard deviation involves the unknown conditional standard deviation of Y|X=x, which we call σ. The standard error is computed by plugging an estimate of this σ into the formula for the standard deviation of the sampling distribution of the estimator. In addition, as in the first document, these distributions should be labeled as conditional upon the X data. (ii) On page 7, bottom of the page, it should read "If this relation only holds when all the λ's are zero then the columns are linearly independent." Read the document "Prediction as association and prediction as causation," written by Dr. Westfall. The document shows that you cannot infer causation using the regression model.
The multivariate normal distribution (from Wikipedia). Information on covariance matrices, from Wikipedia. The various matrices and regression calculations shown in R code. You can also see many of the matrices using the "xpx" and "i" options in the "MODEL" statement of PROC REG.
8. 2/16 Causality
Read Linear Statistical Models for Causation: A Critical Review, by the late David Freedman. The document shows that you can infer causation from the "response schedule" regression model, also called the Neyman-Rubin causal model – but the assumptions cannot be verified. Note: You can skip the material from p. 4, starting "If we are trying to find laws of nature that are stable under intervention, standardizing may be a bad idea," through the end of the section on "Hooke's Law" on p. 5. It is interesting material, but somewhat distracts from the main points about causality. Also, be careful about taking words out of context. When Freedman writes "Thus, DOut has a causal influence on DPaup, …" it should be clear from the thrust of his article and the context of the sentence that he means something like "Thus, the naïve social scientist may wish to claim that DOut has a causal influence on DPaup, …"
The computer speed example will be good for discussing causality. Also, the Yule data is here.
9. 2/18 Midterm 1 

10. 2/23 Multicollinearity; the ANOVA table, the F test, and the R-squared statistic.
Read these presentation slides, slides 1–6, by Alicia Carriquiry of Iowa State. Read this document on multicollinearity, by Dr. Westfall.
File to illustrate problems with multicollinearity. R code for diagnosis and interpretation of multicollinear variables; also indicates one of many potential solutions to the problem.
11. 2/25 Interactions; the inclusion principle.
Read this document about the inclusion principle, compiled by Timothy Hanson, U. South Carolina. Read this document about interactions in regression, by Kristopher J. Preacher of Vanderbilt University.
3-D graphs of interaction and non-interaction surfaces. Examining interactions – an R demo. Moderator example, from Karl Wuensch's web page http://core.ecu.edu/psyc/wuenschk/. The publication is here. A hand-drawn graph using Excel of the moderating effect. File to illustrate problems with violating the inclusion principle.
12. 3/1 Dummy variables, ANOVA, ANCOVA, ANCOVA with interactions, graphical summaries.
Read this document through slide 24. This is another one by Carlos Carvalho, UT Austin. Note: On slide 10, top equation, he is missing the "i" subscript on the "Exp" variable.
ANOVA/ANCOVA, first file – comparing GPAs of male and female students (two-level ANOVA/ANCOVA). ANOVA/ANCOVA, second file – comparing GPAs of students in different degree plans.
13. 3/3 Variable and model selection.
Read about the variance/bias tradeoff here. Comments: (1) Right below the first graph, "characterized" should be "characterize". (2) Two sentences later, g(x) is called an "estimator" of f(x), and this persists through the article. This is not the usual usage of the term "estimator," because an estimator is a function of random data. If you put a "^" on top of g(x) you could call it an estimator. Better to call it a "candidate mean function" or "supposed model" or something like that. Later on, though, he refers to g(x) as a function of data, and this is indeed an estimator.
A simulation example: predicting Hans' graduate GPA from GRE scores and undergrad GPA. R code to demonstrate the danger of overfitting. R file for producing and comparing n-fold cross-validation statistics for different models. All-subsets model selection for predicting doctors per capita using R.
14. 3/8 Variable and model selection, continued.
Read this document. Realize that there are R counterparts to all the SAS-specific procedures. So you are responsible for understanding all content that is software-independent, but you are not responsible for knowing which SAS procedure does what. 2. P. 2: The supposed problem that "models are too complex" when you perform stepwise methods is strange. How can a model with fewer X variables be more complex than a model with more X variables? I would simply ignore that point. 4. P. 2: "White noise" means "iid normal." Also, the sentence ending "…there are others that are not related" would read better as "Usually, when one does a regression, at least one of the independent variables is really related to the dependent variable, but there are others that have little relationship." 6. P. 4: The comment about partial least squares really only applies to cases where you have more than one Y variable; this is covered in ISQS 6348. Also, their X'Y' formula should be X'Y, and to be consistent with their model on page 1, they should either have made this Y lower case or the one on page 1 upper case. My suggestion would be the latter. 8. In their simulation analyses they seem to be stating that lars and lasso are good for protecting against the effects of outliers. That is not what they are designed to do, although the fact that they "shrink" estimates towards zero means that they will not give really crazy results with large coefficients just because of an outlier. Still, they are not methods that will protect you against all kinds of problems caused by outliers.
R code to demonstrate the danger of overfitting. R file for producing and comparing n-fold cross-validation statistics for different models. All-subsets model selection for predicting doctors per capita using R.
15. 3/10 Heteroscedasticity: WLS, ML estimation, robust standard errors.
Update: Modified readings. I really like Cosma Shalizi of Carnegie Mellon. Expect to see more from him. Read pages 1–9 of this document. Comments: 2. The assumed functional form of the heteroscedasticity, 1 + x^2/2, is somewhat unusual in that it makes the variance decrease then increase as x ranges from its min to its max. This might make sense in some cases where the X variable has zero in its range, as it does in Cosma's simulation, but for typical cases where the X and Y data are positive, the heteroscedasticity function is either monotonically increasing or monotonically decreasing, with no "down then up" behavior. 3. "The oracle of regression" is what I sometimes refer to as "Hans." Optional: Read this critique of robust standard errors. The prof summarizes: If you think you have a problem with heteroscedasticity, you probably have more serious problems and should not be using OLS anyway. Using robust standard errors with OLS is like putting a bright red bow on the head of an ugly pig lying in the stinking pig slop, and then saying that the pig now looks pretty.
First file to illustrate the benefit of Weighted Least Squares – shows imprecise predictions of OLS in the presence of heteroscedasticity. Comparison of prediction limits: homoscedastic vs. heteroscedastic models – shows OLS prediction limits are incorrect in the presence of heteroscedasticity. Estimating the heteroscedastic variance of GE returns as a function of trading volume via maximum likelihood using R. (Try different variance functions and compare log likelihoods to see which fit better.)
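A minimal WLS sketch (hypothetical heteroscedastic data, base R): the error standard deviation is assumed proportional to x, so observations are weighted by 1/x^2, i.e., inversely proportional to their variance.

```r
set.seed(3)
x <- runif(200, 1, 10)
y <- 2 + 0.5 * x + rnorm(200, sd = 0.3 * x)  # error sd grows with x
fit.ols <- lm(y ~ x)
fit.wls <- lm(y ~ x, weights = 1 / x^2)      # weight = 1/variance, up to a constant
coef(fit.ols); coef(fit.wls)                 # similar point estimates...
summary(fit.wls)$coefficients                # ...but these standard errors are the trustworthy ones
```

OLS remains unbiased here, but its usual standard errors and prediction limits are wrong; WLS uses the variance structure and is the efficient estimator.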
16. 3/22 Bootstrap
Read this most excellent document, written by Cosma Shalizi of Carnegie Mellon. It summarizes the main points about probability models that I have emphasized in this course and in ISQS 5347. Helpful tips:
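As one concrete illustration (my own minimal sketch, hypothetical data), here is the case-resampling bootstrap for a regression slope: resample (x, y) pairs with replacement, refit, and look at the spread of the refitted slopes.

```r
set.seed(4)
x <- runif(50, 0, 10)
y <- 1 + 0.5 * x + rnorm(50)
B <- 2000
boot.slopes <- replicate(B, {
  i <- sample(50, replace = TRUE)        # resample rows (cases) with replacement
  coef(lm(y[i] ~ x[i]))[2]
})
sd(boot.slopes)                          # bootstrap standard error of the slope
quantile(boot.slopes, c(0.025, 0.975))   # percentile bootstrap interval
```

Resampling cases, rather than residuals, does not assume homoscedasticity, which is one reason it is a useful default.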

17. 3/24 Quantile regression and Winsorizing
Read this material on quantile regression.
Excel spreadsheet to explain the quantile estimation method.
18. 3/29 Generalized Least Squares, correlated errors, time series
Read this document from John Fox.
John Fox code; also some code to show how the AR and MA data-generating processes work. Simulation studies of inefficiency and Type I error rates when using OLS and ML/GLS: an example with autocorrelated data.
19. 3/31 Intro to mixed effects models.
Read this tutorial on lmer by Bodo Winter.
Some code from Bodo Winter's tutorial.
20. 4/5 Repeated measures and random effects; Bayesian-style "shrinkage" estimates of random effects.
Read also, from this article by Simon Jackman, Proposition 7.1 on pp. 206–207, for a more direct representation of the Bayesian shrinkage estimate.
Simulation code to help understand the shrinkage estimates shown in Jackman's article.
21. 4/7 Multilevel analysis
Required: Read the Preface and Sections 1, 2, 3 and 7 from this document. Here is the R code from Section 7. There are a couple of mistakes in their syntax, but I was able to replicate their results precisely once the code was cleaned up. Optional: The Bliese tutorial paper. You will find a lot of this paper very useful as well, but it's not required reading. Bliese wrote the R code and gives tutorials on the topic, so he is actually a better source than the simpler reading that I assigned.
Why is probability needed in the regression model? A multilevel regression of trends (random coefficients regression modelling) in the TTU grad data. R code from Section 7 of the reading.
22. 4/12 Finishing up mixed effects analyses – the Hausman "test" for fixed versus random effects
The most important sentence in this document appears on p. 11: "The Hausman test does not aid in evaluating this tradeoff." I just found this paper (optional reading), which gives a simple fix to the problem of random effects being correlated with the X data. This makes the Hausman test even more useless than is indicated by the main reading. Note also (optional) a similar solution from Andrew Gelman. Take-home message: Just like all tests for model assumptions, such as tests for normality, homoscedasticity, etc., the Hausman test for fixed versus random effects is not very useful.
How to perform the Hausman test using R. Code showing the simulation method from the reading.
23. 4/14 Binary regression models
Read the Wikipedia page on logistic regression – it's pretty good. Read up to, but not including, the section "As a two-way latent variable model" (even though, believe it or not, Daniel McFadden won a Nobel prize in 2000 for the material in that section). Need some % change effects – Bayesian
Some logistic regression curves. Code for in-class project: Bayesian logistic regression of "Trashball" – a variation of basketball. Code for finding the p-value for the likelihood ratio test:
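A minimal logistic regression sketch in base R (hypothetical made/missed-shot data, loosely in the spirit of the "Trashball" project, not the course file), including the likelihood ratio test p-value computed from the fitted deviances:

```r
set.seed(5)
dist <- runif(200, 1, 15)                      # hypothetical shot distances
p <- 1 / (1 + exp(-(2 - 0.4 * dist)))          # logistic model for P(make | distance)
make <- rbinom(200, 1, p)
fit <- glm(make ~ dist, family = binomial)
summary(fit)
# Likelihood ratio test of H0: distance has no effect
lr <- fit$null.deviance - fit$deviance         # chi-squared statistic, 1 df
pval <- 1 - pchisq(lr, df = 1)
pval
```

The LR statistic is the drop in deviance from the null (intercept-only) model to the fitted model, referred to a chi-squared distribution with degrees of freedom equal to the number of added parameters.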
24. 4/19 Ordinal response regression models 
Read this
document by Paul Johnson at KU. It’s kind funny, as well as informative,
and thankfully, mostly correct. Some issues: This is the wrong interpretation of a continuous
density. It is correct for a discrete
distribution, but recall that "the probability that ei
is equal to some particular value" is actually zero when ei is continuously distributed. 2. p. 5, right before equation (11), should read,
“More
succinctly, for k+1 categories, we would write…” 
HW4 notes (much courtesy of one student group in this semester's class). BTW, all the stuff about probability (expected values, simulation, parameter interpretation, "by chance alone") that you may be struggling with will form the bulk of the final. So please pay more attention to these concepts. R code for ordinal logistic (or probit) regression:

rating = c(1, 1, 2, 2, 1, 3, 3, 1, 1, 2, 3, 2, 1, 2)
salary = c(15, 30, 20, 30, 40, 45, 49, 16, 20, 40, 55, 56, 24, 31)
ordd = data.frame(rating, salary)
library(MASS)  # for polr()
ord.logit = polr(as.factor(rating) ~ salary, data = ordd)
ord.probit = polr(as.factor(rating) ~ salary, data = ordd, method = "probit")
cbind(ord.logit$fitted.values, ordd)  # fitted category probabilities alongside the data

Some graphical presentations using Excel. Comparing probit and logit distribution functions.
25. 4/21 Poisson, negative binomial, and other count data regression models.
Read about Poisson regression, through Section 3.2. Read about negative binomial regression, too. Supplemental reading on count data models in R (not required, but excellent).
More on the latent variable formulation. Simulating and analyzing data from Poisson and negative binomial regression models. Analysis of experimental data on wine sales using count data models.
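A minimal simulate-and-fit sketch for count data (my own hypothetical data, not the wine sales file): Poisson regression via glm, with MASS::glm.nb as the negative binomial analogue.

```r
library(MASS)                           # for glm.nb()
set.seed(6)
x <- runif(300, 0, 2)
lambda <- exp(0.5 + 0.8 * x)            # log link: log E(Y|X=x) = b0 + b1*x
ycount <- rpois(300, lambda)            # the Poisson model produces the data
fit.pois <- glm(ycount ~ x, family = poisson)
coef(fit.pois)                          # estimates near the true (0.5, 0.8)
fit.nb <- glm.nb(ycount ~ x)            # negative binomial fit of the same data
coef(fit.nb)                            # similar here, since the data have no overdispersion
```

With genuinely Poisson data the two fits agree; when the counts are overdispersed, the negative binomial model gives more honest standard errors.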
26. 4/26 GLMMs with repeated or hierarchical data structures
Read this article from the UCLA Institute for Digital Research and Education.
Summary of Generalized Linear Mixed Models, from the UCLA Institute for Digital Research and Education.
27. 4/28 Nominal response regression models
Read this article from the UCLA Institute for Digital Research and Education. (Thank you, UCLA/IDRE, for supplying such nice materials! You guys rock!) Optional reading: "Applying discrete choice models to predict Academy Award winners," J. R. Statist. Soc. A (2008), 375–394, by Pardoe and Simonton.
Multinomial logistic regression using R. Summaries using Excel.
28. 5/3 Tobit and censored regression models. 
Update, 4/17. Read about censored regression here. Optional: Read sections 1,2,3
and 5 of “Tobit Models: A Survey,” By
Takeshi Amemiya, Journal of Econometrics, Volume 24, 1984. Here
is the link. 
A graph illustrating the Tobit model. R code for
the above, with more

29. 5/5 Survival analysis regression models; Cox proportional hazards model.
Read this summary by Maarten Buis of Vrije Universiteit Amsterdam. Section 6, "Unobserved Heterogeneity," is optional (not required) reading.
Proportional hazards regression follow-up using Excel, with comparison to lognormal regression.
30. 5/10 Sample selection bias (Heckman model and method). Future semesters – more time on endogeneity issues, instrumental variables regression.
Final Exam
Old finals and solutions are available in the old courses link. But every semester is different.