4 credit hours (CRN: 42787)
Semester: Spring 2024 (January 8 - May 3; Exam week: April 29 - May 3)
Class Time: Mondays 2:00-5:00pm
Classroom: Blackboard Zoom
Lab room: ETMSW 2027 (1040 W. Harrison St.)
Professor: George Karabatsos (home page)
Phone: 312-413-1816
E-mail: georgek@uic.edu (most reachable)
Office Hours: Monday 10am-12pm, or email georgek@uic.edu to schedule an appointment.
Classroom: Online asynchronous course. A new video lecture is presented and recorded through Zoom Cloud (in Blackboard) every Monday 2:00-5:00pm during the semester. Students may either attend these lectures live and ask questions, or watch the lecture recordings, which will be made available in Blackboard by Monday 9pm.
Course Description:
This course introduces students to Hierarchical Linear Models, which
are mixture models that are widely applied in education, psychology,
medicine, and other fields. Sometimes they are referred to as
random-effects models. More generally, a hierarchical model not
only treats the dependent variable observations as random, but also
treats model parameters (e.g., regression coefficients, the error
variance parameter) as random variables that follow some distribution
(e.g., normal or gamma). In a Bayesian statistical framework, all model
parameters are treated as random variables, arising from a prior
distribution.
A Hierarchical Linear Model (HLM) can be viewed as having a nested structure, in that the model allows regression
coefficients to vary from one context to another. For example, in educational
research, an HLM is often used to analyze data on student
math achievement. Here, students are nested within schools, and the model permits
investigation of the relationship between student socioeconomic status and
math achievement within each school, as well as of the school-level factors
that affect this relationship. As another example, for longitudinal data
analysis, an HLM provides an approach to learning the growth curve of each individual
subject, and also provides a way to identify the subject-level predictor variables
that significantly predict changes in the subjects' growth curves.
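To make the nested-data idea concrete, here is a minimal sketch of a two-level model in which both the intercept and the socioeconomic-status (SES) slope vary by school. It uses Python's statsmodels package rather than the course's HLM or R software, and all data and variable names (math, ses, school) are simulated and illustrative:

```python
# Sketch: students nested in schools, with school-varying intercepts
# and school-varying SES slopes. All data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_schools, n_students = 30, 40
school = np.repeat(np.arange(n_schools), n_students)
ses = rng.normal(size=n_schools * n_students)

# True model: each school has its own intercept and SES slope.
b0 = rng.normal(50, 5, size=n_schools)[school]  # school intercepts
b1 = rng.normal(3, 1, size=n_schools)[school]   # school SES slopes
math = b0 + b1 * ses + rng.normal(0, 4, size=n_schools * n_students)

df = pd.DataFrame({"math": math, "ses": ses, "school": school})

# Level 1: math ~ ses within each school.
# Level 2: intercept and slope treated as random across schools.
model = smf.mixedlm("math ~ ses", df, groups="school", re_formula="~ses")
fit = model.fit()
print(fit.summary())
```

The fitted fixed effects estimate the average intercept and average SES slope across schools, while the estimated random-effects covariance describes how much schools vary around those averages.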
The HLM provides a single flexible framework for statistical
modeling that applies to many important tasks of data analysis, including:
(1) analysis of variance (ANOVA) and analysis of covariance (ANCOVA);
(2) random-coefficients regression analysis;
(3) categorical data analysis (rating data, or classification involving unordered categories);
(4) longitudinal (repeated-measures) analysis;
(5) meta-analysis;
(6) causal inference in nonrandomized studies;
(7) spatial regression;
(8) analysis of censored data (e.g., for survival analysis);
(9) psychometric analysis with predictor variables (e.g., Rasch models, random-item IRT models).
This course will present various Hierarchical Linear Models from both a Bayesian
and a frequentist perspective of statistical inference.
Moreover, this course will also present nonparametric Hierarchical Linear Models,
which provide a way to relax the parametric assumptions of "off-the-shelf"
versions of Hierarchical Linear Models. These assumptions include the normality of the error
distribution, the normality of the random effects, and an (inverse-) link function
defined by the standard logistic distribution.
All Hierarchical Linear Models discussed in the course will be illustrated
on data sets arising from education, psychology, medicine, and other fields.
Through in-class exercises (which count as credit towards three open-notes
exams), students will learn how to perform data analysis with Hierarchical Linear
Models using the HLM software, and also using the free R software with packages nlme and lme4.
Students can use other software, such as SAS and Stata.
Also, I provide free, menu-driven Bayesian Regression software that I developed,
which can be used to perform data analysis with Bayesian hierarchical
models. These include 2-level and 3-level mixture models, with the mixing
distribution modeled either as a normal distribution, or nonparametrically as
an infinite mixture of distributions.
Go to the Bayesian Regression software webpage to download and install the software package on your personal computer.
The Bayesian Regression software and SPSS can be used for data management, and for the imputation of missing data.
Also, any Windows/PC software can be run on a Mac computer using the Parallels software,
which UIC students can purchase at the U of I webstore.
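The course's in-class exercises use the HLM software or R's nlme/lme4 packages. As a language-neutral illustration of the simplest model those tools fit, the one-way random-effects ANOVA and its intraclass correlation (ICC), here is a minimal sketch in Python's statsmodels (a substitute for the course software, chosen only for illustration; the data and names are simulated):

```python
# Sketch: one-way random-effects ANOVA (the "null" HLM), and the
# intraclass correlation (ICC) -- the share of outcome variance
# lying between groups. All data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_groups, n_per = 50, 20
group = np.repeat(np.arange(n_groups), n_per)
u = rng.normal(0, 2, size=n_groups)[group]          # between-group sd = 2
y = 10 + u + rng.normal(0, 4, size=n_groups * n_per)  # within-group sd = 4

df = pd.DataFrame({"y": y, "group": group})
fit = smf.mixedlm("y ~ 1", df, groups="group").fit()

tau2 = float(fit.cov_re.iloc[0, 0])  # between-group variance estimate
sigma2 = float(fit.scale)            # within-group (residual) variance
icc = tau2 / (tau2 + sigma2)         # true ICC here is 4/(4+16) = 0.20
print(f"estimated ICC = {icc:.2f}")
```

In R, the equivalent fit would be `lmer(y ~ 1 + (1 | group))` with lme4; the ICC is computed from the variance components in the same way.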
Course Prerequisites: A previous course covering multiple regression, or equivalent.
Textbook:
Raudenbush, S., & Bryk, A.S. (2002). Hierarchical Linear Models: Applications
and Data Analysis Methods. Thousand Oaks, CA: Sage. ISBN 0-7619-1904-X.
Suggested (optional) Readings:
Optional Article readings:
De Iorio, M., Mueller, P., Rosner, G.L., and MacEachern, S.N. (2004). An ANOVA Model for Dependent Random Measures. Journal of the American Statistical Association, 99, 205-215.
Imbens, G.W., and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142, 615-635.
Karabatsos, G. (2020). Fast search and estimation of Bayesian nonparametric mixture models using a Classification Annealing EM Algorithm. Journal of Computational and Graphical Statistics.
Karabatsos, G., and Walker, S.G. (2012). Adaptive-Modal Bayesian Nonparametric Regression. Electronic Journal of Statistics, 6, 2038-2068.
Karabatsos, G., and Walker, S.G. (2012). A Bayesian nonparametric causal model. Journal of Statistical Planning and Inference, 142, 925-934.
Karabatsos, G., and Walker, S.G. (2012). Bayesian nonparametric mixed random utility
models. Computational Statistics & Data Analysis, 56, 1714-1722.
Karabatsos, G., Talbott, E., and Walker, S.G. (2015). A Bayesian nonparametric meta-analysis model. Research Synthesis Methods, 6, 28-44.
Karabatsos, G. (2017, in press). Marginal Maximum Likelihood Estimation Methods for the Tuning Parameters of Ridge, Power Ridge, and Generalized Ridge Regression. Communications in Statistics: Simulation and Computation.
Kleinman, K., & Ibrahim, J.G. (1998). A semi-parametric Bayesian approach to the random effects model. Biometrics, 54, 921-938.
Kleinman, K.P. & Ibrahim, J.G. (1998). A semiparametric Bayesian approach
to generalized linear mixed models. Statistics In Medicine, 17,
2579-2596.
Robins, J.M., Hernan, M.A., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11, 550-560.
Savitz, N.V., & Raudenbush, S.W. (2009). Exploiting spatial
dependence to improve measurement of neighborhood social processes. Sociological Methodology, 39, 151-183.
Stuart, E.A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25, 1-21.
Optional book readings:
Cooper, H., Hedges, L.V., & Valentine, J.C. (2009). The Handbook of Research Synthesis and Meta Analysis. New York: Russell Sage.
Fox, J. (2008). A Mathematical Primer for Social Statistics. Thousand Oaks, CA: Sage.
Gelman, A., & Hill, J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.
Raudenbush, S., Bryk, A., Cheong, Y.-F., & Congdon, R. (2004). HLM 6:
Hierarchical Linear and Nonlinear modeling. Lincolnwood, IL: Scientific
Software International. Available as pdf in the course.
Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.), The Handbook of Research Synthesis (pp. 231-244). New York: Russell Sage.
Ward, M.D. & Gleditsch, K.S. (2008). Spatial Regression Models.
Los Angeles: Sage.
Also, I will provide notes on various related topics.
Journal Publications of students who have taken this HLM course in the past:
Arenson, E., and Karabatsos, G. (2018). A Bayesian beta-mixture model for nonparametric IRT (BBM-IRT). Journal of Modern Applied Statistical Methods, 1.
Brow, M. (2018). Significant
predictors of mathematical literacy for top‐tiered countries/economies,
Canada, and the United States on PISA 2012: Case for the sparse
regression model. British Journal of Educational Psychology.
Ding, D., and Karabatsos, G. (2021). Dirichlet Process Mixture Models with Shrinkage Prior. Stat, 10: e371.
Fujimoto, K., and Karabatsos, G. (2014). Dependent Dirichlet Process Rating Model (DDP-RM). Applied Psychological Measurement, 38, 217-228.
Kaminski-Ozturk, N., and Karabatsos, G. (2017). A Bayesian robust IRT outlier detection model. Applied Psychological Measurement, 41, 195-208.
Muckle, T., and Karabatsos, G. (2009). Hierarchical generalized linear models
for the analysis of judge ratings. Journal of Educational Measurement, 46, 198-219.
Talbott, E., Zurheide, J. L., Karabatsos, G., & Kumm, S. (2020). Similarity in Teacher Ratings of the Externalizing Behavior of Twins: A Meta-Analysis. Behavioral Disorders.
Tang, X., Karabatsos, G., & Chen, H. (2020). Detecting local dependence: A Threshold-Autoregressive Item Response Theory (TAR-IRT) approach for polytomous items. Applied Measurement in Education.
Assignments to earn a grade:
-- Three take-home exams (including the mid-term and final exam), which
involve applying HLMs to analyze various data sets.
The take-home exams may be completed in a collaborative fashion, with other students in the class.
-- One in-class presentation or final paper (your choice), occurring towards the end of the semester.
The presentation or paper will describe an application of one or more HLM models, for data analysis.
A suggested outline for the presentation or paper is provided below, under the course schedule.
Key Dates of the Semester:
January 8, M, Instruction begins.
January 15, M, Martin Luther King Jr. Day.
March 18-22, M–F, Spring vacation. No classes.
April 26, F, Instruction ends.
April 29-May 3, M–F, Final exam week.
COURSE SCHEDULE
Week | Topic | Read
1 | Assignments/tasks. Introduction and motivation for Hierarchical Linear Models. |
2 | Hierarchical Linear Models: HLMs for continuous outcomes. | Ch. 1-2
3 | HLM for one-way random-effects ANOVA, ANCOVA, ordinary linear regression, random-coefficients regression, the means-as-outcomes model, the model for nonrandomly varying slopes, and the full HLM. | Ch. 3-5
4 | Hierarchical Linear Models: Maximum likelihood estimation and model fit assessment. | Ch. 9, 13-14
5 | HLM for repeated-measures and longitudinal analysis. 3-level HLM. | Ch. 6, 8
6 | Bayesian Hierarchical Linear Models: HLMs for continuous outcomes. Bayesian estimation. Exam #1 is due by Saturday at the end of this week. | Ch. 13
7 | Imputation of missing data. Power analysis. |
8 | Generalized HLM for binary outcomes. | Ch. 10-12
9 | Generalized HLM for binomial and Poisson count outcomes, ordinal outcomes, and categorical (unordered) outcomes. | Ch. 10
10 | Bayesian HLM (parametric and nonparametric models). | Ch. 7
11 | Spring Break. No class this week. |
12 | HLM for meta-analysis. |
13 | HLM analysis of censored data. Exam #2 (with a statement of the data set for the Final Presentation or Paper) is due by Saturday at the end of this week. |
14 | HLM for causal analysis in nonrandomized studies: propensity score and regression discontinuity methods. |
15 | HLM and semiparametric HLM for psychometric analysis. Spatial regression. |
16 | Student presentations (also, I will discuss other topics). |
Exam Week | Exam #3 is due by Wednesday, May 1, 11:59pm. The Final PowerPoint Presentation document or Final Paper document is also due by Wednesday 11:59pm (depending on whether you chose a presentation or a paper). |
Grading Policy:
The three take-home exams, together, are worth 70% of the final grade, and the Final Presentation or Final Paper (your choice)
is worth 30% of the final grade.
Each exam involves addressing data analysis tasks using HLM methods described in the course.
The second exam will require a written one-paragraph description of the
data set that will be used for the Final Presentation or Paper.
Final grades will be given out according to the following scale:
A | 90% - 100%
B | 79% - 89%
C | 68% - 78%
D | 57% - 67%
F | 56% or lower
Borderline grades will be decided on the basis of class participation.
Each assignment submitted late receives a 20% grade reduction for each week past the original due date.
Students will spend substantial amounts of time reading, and on the computer.
It is assumed that students will exert individual initiative in solving computing/analysis
problems as they arise.
(Standard policies: There are no exceptions to the above grading scale, and no
extra credit work will be accepted.
Incomplete grades will be considered only for
students with extenuating circumstances.
Poor performance on assignments will not be considered grounds for an incomplete.)
Data Analyses Presentation or Paper:
For the final assignment, you have the option to give a 25-minute live presentation
(about 15 PowerPoint slides, at most) or to write an 8-page double-spaced paper,
discussing an application of an HLM to the analysis of one or more real data sets.
The presentation or paper should use an outline that contains (at least) the following items:
INTRODUCTION
Describe the substantive research questions or open problems that your study will address (10 points).
METHODS
-- Describe sample characteristics. (5 points)
-- Fully describe the HLM model you will use to answer your research questions
(using words and mathematical notation), and include a discussion of the assumptions
of your model. (10 points)
-- Describe the parameters you will interpret to answer your research questions.
(10 points)
-- Use the appropriate coding for all the predictor variables in your model. (10
points)
-- Use the model that is appropriate for the type of dependent variable you will
analyze. (15 points)
RESULTS - Accurately describe all the relevant results of your HLM model, including all significant and
non-significant effects at Level 1 and Level 2 of the model. (25 points)
DISCUSSION - What are the implications of the results of your study, and potential
directions for future research?
(10 points).
Students are encouraged to present or write a paper about an analysis of their own data set.
If you need to find a data set, rich data sets are available online, including worldwide assessments of educational progress.
These include the PIRLS data set (literacy assessment), the TIMSS data set (math assessment),
and the PISA data sets (math, science, or literacy assessments), among others:
PIRLS or TIMSS data: http://timssandpirls.bc.edu/
PISA data: http://www.oecd.org/pisa/pisaproducts/
Open Psychology Data https://openpsychometrics.org/_rawdata/
Illinois School Report Card data https://www.isbe.net/ilreportcarddata
EPA data https://www.epa.gov/outdoor-air-quality-data
Machine Learning Data https://archive.ics.uci.edu/ml/datasets.html
Machine Learning Data http://mldata.org/
Disability Services:
UIC strives to ensure the accessibility of programs, classes, and services to
students with disabilities. Reasonable accommodations can be arranged for students
with various types of disabilities, such as documented learning disabilities,
vision or hearing impairments, and emotional or physical disabilities. If you
need accommodations for this class, please let your instructor know your needs
and he/she will help you obtain the assistance you need in conjunction with
the Office of Disability Services (1190 SSB, 413-2183).