EPSY 546 - Educational Measurement

4 Credit Hours
CRN: 10855

Professor: George Karabatsos
E-mail: georgek@uic.edu
Phone: 312-413-1816

Semester: Fall 2015
Class Time: Thursday 5:00-8:00pm
Room: 2SH 220
Computer lab: Room 2027 EPASW
Office Hours: Thursday 2-4 (EPASW 1034)


Course Description:
This course teaches psychometrics: the practice of constructing scales for measuring psychological traits (e.g., ability on an examination, or attitudes) from data consisting of examinees' scored responses to a set of test items, or of judges' ratings of individual examinees on task performance. The course will cover classical and contemporary psychometric models and methods for the analysis of dichotomously scored (e.g., correct/incorrect), rating-scale, and multiple-choice test items.

Models and methods include: classical test theory (CTT) and test reliability analysis; parametric and semiparametric Rasch models; Item Response Theory (IRT) models; exploratory and confirmatory factor analysis; Hierarchical Linear Modeling (HLM) approaches to psychometrics; extended reliability analysis with generalizability theory (i.e., variance-components modeling); methods for equating examinee scores from different tests (given a score on Test X, what is the equivalent score on Test Y?); methods for analyzing person fit, to identify respondents who give aberrant item responses due to cheating, lucky guessing, carelessness, etc.; and methods for analyzing item fit, to identify items that attract surprising responses because of poor wording, irrelevance to what the test intends to measure, etc.

This course illustrates all of these psychometric methods and models, through the analysis of test data arising from various fields such as education, psychology, and health care.
The data applications will involve appropriate psychometric software, such as SPSS, WINSTEPS, FACETS, PARSCALE, Amos, and the DPpackage for R, among others.
I will also provide freely available, menu-driven software that I have developed, which can be used to perform psychometric modeling of data using the CTT model and Bayesian IRT models.
These IRT models include 2-level and 3-level mixture models, with the mixing distribution modeled either as normal or nonparametrically as an infinite mixture of random parameters.
The software, along with the user's manual, can be downloaded from: http://tigger.uic.edu/~georgek/HomePage/BayesSoftware.html

While this course focuses primarily on practical applications of psychometric methods, this focus will not come at the sacrifice of rigor. In particular, students who take this course will also learn the basic foundations of reliability and test validity, the key properties and characteristics of various psychometric models, and the maximum likelihood and Bayesian approaches to estimating the parameters of such models.
Still, the course does not require an extensive mathematical background.

For the final grade, students are expected to complete two take-home exams and give a short paper presentation toward the end of the semester,
involving applications of educational measurement (e.g., via the CTT model, IRT models, equating, etc.) to the analysis of real test data.
If you need a data set for the paper presentation, several data sets from worldwide assessments of educational progress can be found online.
These include the PIRLS data set (literacy assessment), the TIMSS data set (math assessment), and the PISA data sets (math, science, or literacy assessments).
Website to obtain either a PIRLS or TIMSS data set: http://timssandpirls.bc.edu/
Website to obtain a PISA data set: http://www.oecd.org/pisa/pisaproducts/

Prerequisite: Any introductory statistics course, an equivalent, or consent of the instructor.

Readings: Suggested readings are listed below, as "Relevant References" within the COURSE SCHEDULE. The article readings will be provided by the instructor.


COURSE SCHEDULE

Date Topic
Aug 27 The four scales of measurement (nominal, ordinal, interval, and ratio scales).
Test reliability, test validity.

Classical Test Theory -- foundations.
Sep 3 Classical Test Theory (continued)
Relevant References:

Borsboom, D., & Mellenbergh, G.J. (2004). The concept of validity. Psychological Review, 111, 1061-1071.
Kline, P. (1993). Reliability of tests: Practical issues. In Ch 1, The Handbook of Psychological Testing, 5-15.
Messick, S. (1995). Validity of Psychological Assessment. American Psychologist, 50, 741-749.
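To make the reliability ideas concrete, here is a minimal sketch of Cronbach's alpha, a standard CTT reliability coefficient, computed from a small persons-by-items score matrix. The function name and the toy data are made up for illustration; in the course, such analyses will be carried out with SPSS or the other software listed above.

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores),
# where k is the number of items.

def cronbach_alpha(scores):
    """scores: one row per person, each row holding k item scores."""
    k = len(scores[0])

    def pvariance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Four examinees, three dichotomously scored items (1 = correct, 0 = incorrect):
data = [[1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0]]
print(round(cronbach_alpha(data), 3))  # 0.75
```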
Sep 10 Rasch models for binary item scores.
-- The item response function (IRF), the item-step response function (ISRF), and the item category-response function.
-- The three properties of all psychometric models (unidimensionality, local independence, monotonicity of the IRF/ISRF).
-- Invariant item ordering.
-- The definition of the Rasch model (for dichotomous item scores).
-- The specific objectivity property of the Rasch model.
Bond, T., & Fox, C.M. (2007). Applying the Rasch Model: Fundamental Measurement in the Human Sciences, Second Edition. Lawrence Erlbaum.
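A note on the IRF above: under the dichotomous Rasch model, the probability of a correct response is P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b)), where theta is person ability and b is item difficulty. A minimal illustrative sketch (the function name is mine, not from any package):

```python
import math

def rasch_irf(theta, b):
    """Rasch probability of a correct response: exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the probability is exactly 0.5:
print(rasch_irf(0.0, 0.0))  # 0.5
# Monotonicity of the IRF: higher ability means a higher success probability:
print(rasch_irf(1.0, 0.0) > rasch_irf(0.0, 0.0))  # True
```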
Sep 17 Rasch models for binary item scores.
-- Estimating the person ability and item difficulty parameters (maximum likelihood method; marginal maximum likelihood method).
-- Investigating item fit and person fit.
-- Analyzing the reliability of the test.
EXAM 1 IS DUE.
Sep 24 Rasch models for the analysis of rating-scale items, and the analysis of judge ratings.
-- Rasch rating scale model, Rasch partial credit model, and the (FACETS) Rasch model for judge ratings.
-- Estimating the person ability and item difficulty parameters (maximum likelihood method; marginal maximum likelihood method).
-- Investigating item fit and person fit.
-- Analyzing the reliability of the test.
Oct 1 Item Response Theory Models.
Dichotomous item scores: 2-parameter logistic model, 3-parameter logistic model, Rasch model with guessing parameter.
Polytomous item scores: graded response models, generalized partial credit models.
Embretson, S., & Reise, S.P. (2000). Item Response Theory for Psychologists. Lawrence Erlbaum.
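For reference, the 3-parameter logistic (3PL) model listed above has the form P = c + (1 - c) / (1 + exp(-a(theta - b))), with discrimination a, difficulty b, and lower asymptote ("guessing") c; setting a = 1 and c = 0 recovers the Rasch form. A toy sketch with invented parameter values:

```python
import math

def irt_3pl(theta, a, b, c):
    """3PL probability of a correct response: c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# With a = 1 and c = 0, the 3PL reduces to the Rasch (1PL) form:
print(irt_3pl(0.0, 1.0, 0.0, 0.0))  # 0.5
# With c = 0.25, even a very low-ability examinee succeeds about 25% of the time:
print(round(irt_3pl(-10.0, 1.8, 0.0, 0.25), 3))  # 0.25
```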
Oct 8 Hierarchical Linear Models
-- Rasch model as a (special) Hierarchical Linear Model.
-- (Rasch) analysis of test items, rating scales, and judge ratings
-- Investigating Item Bias (Differential Item Functioning),
-- Comparing test performance across different groups of respondents.
-- Multidimensional Rasch modeling.
-- Incorporating additional predictor variables in psychometric analysis.
-- Bayesian semiparametric inference of Rasch and IRT models.
-- Illustrative applications of the model on real data.
Raudenbush, S.W., & Bryk, A.S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods (2nd ed.). Newbury Park, CA: Sage. (especially Chapters 10 and 11, pp. 365-371).
Raudenbush, S.W., Bryk, A.S., Cheong, Y.F., & Congdon, R.T. (2004). HLM 6: Hierarchical Linear and Nonlinear Modeling. Lincolnwood, IL: Scientific Software International.
Karabatsos, G. (2015, to appear). Bayesian nonparametric IRT. Chapter 19 in W.J. van der Linden & R. Hambleton (Eds.), Handbook of Item Response Theory: Models, Statistical Tools, and Applications, Volume 1. New York: Taylor & Francis. arXiv preprint. Survey data set (CSV format, suitable for the Bayesian Regression software).
Kleinman, K.P., & Ibrahim, J.G. (1998b). A semi-parametric Bayesian approach to generalized linear mixed models. Statistics in Medicine, 17, 2579-2596.
Oct 15 Exploratory Factor analysis of test items.
Kline, P. (1993). An easy guide to factor analysis. Routledge.

Oct 22 Confirmatory Factor analysis of test items.
Kline, P. (1993). An easy guide to factor analysis. Routledge.
EXAM 2 IS DUE.

Oct 29 Generalizability Theory: A comprehensive approach to reliability analysis.
Brennan, R. (2001). Generalizability Theory. New York: Springer.
Shavelson, R., & Webb, N. (1991). Generalizability Theory: A Primer. Sage Publications.
Nov 5 Equating Test Scores: Given a score on Test X, what is the equivalent score on Test Y?
-- Equating designs.
-- Methods of score equating under various designs.
-- Rasch item equating.
Livingston, S.A. (2004). Equating Test Scores (without IRT). Princeton: Educational Testing Service.
Karabatsos, G., & Walker, S.G. (2009). A Bayesian nonparametric approach to test equating. Psychometrika, 74(2), 211-232.
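To make the equipercentile idea concrete: a Test X score is mapped to the Test Y score with (approximately) the same percentile rank. The sketch below uses invented score distributions and ignores the smoothing and interpolation used in operational equating (see Livingston, 2004):

```python
def percentile_rank(score, scores):
    """Mid-percentile rank: fraction of scores below, plus half the fraction at, the score."""
    below = sum(s < score for s in scores)
    at = sum(s == score for s in scores)
    return (below + 0.5 * at) / len(scores)

def equipercentile_equate(x, scores_x, scores_y):
    """Return the Test Y score whose percentile rank best matches that of x on Test X."""
    target = percentile_rank(x, scores_x)
    return min(set(scores_y), key=lambda y: abs(percentile_rank(y, scores_y) - target))

# Toy score distributions for Test X and Test Y (one observation per examinee):
scores_x = [10, 12, 14, 16, 18, 20]
scores_y = [50, 55, 60, 65, 70, 75]
print(equipercentile_equate(14, scores_x, scores_y))  # 60
```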
Nov 12 Computer adaptive testing (CAT), Item banking, and Standard Setting.
Cizek, G.J. (1996). Setting passing scores. Educational Measurement: Issues and Practice, 15, 20-31.
Cizek, G. J., Bunch, M. B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31-50.
Meijer, R.R., & Nering, M.L. (1999). Computerized adaptive testing: Overview and introduction. Applied Psychological Measurement, 23(3), 187-194 (special issue on computerized adaptive testing).
Ward, A.W., & Murray-Ward, M. (1994). Guidelines for the development of item banks. An NCME instructional module. Educational Measurement: Issues and Practice, 13 (1), 34-39.
Nov 19 Student Presentations of final paper
Nov 26 Thanksgiving
Dec 3 Student Presentations of final paper
Dec 10 FINAL PAPER DUE (Exam week)
Please leave paper in my mailbox in Room 3233, or under my office door at Room 1034.



Grading Policy:
The final grade is based on the performance on the two exams (total 40% of final grade), a data analysis presentation and paper (50% of final grade),
and class participation (10% of final grade; includes attendance and contributions to in-class discussion).
Final grades will be given out according to the following grading scale:

A: 90% - 100%
B: 79% - 89%
C: 68% - 78%
D: 57% - 67%
F: 56% or lower
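As a sketch of how the weighted final percentage implied by the policy above could be computed (assuming each component is scored on a 0-100 scale, which the syllabus does not state; the component names are mine):

```python
# Weights from the grading policy: two exams (20% each), presentation (25%),
# paper (25%), and class participation (10%).
WEIGHTS = {"exam1": 0.20, "exam2": 0.20, "presentation": 0.25,
           "paper": 0.25, "participation": 0.10}

def final_grade(scores):
    """scores: dict of component percentages (0-100). Returns (percent, letter)."""
    pct = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    for letter, cutoff in [("A", 90), ("B", 79), ("C", 68), ("D", 57)]:
        if pct >= cutoff:
            return pct, letter
    return pct, "F"

print(final_grade({"exam1": 85, "exam2": 90, "presentation": 92,
                   "paper": 88, "participation": 100}))  # (90.0, 'A')
```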

Students will spend substantial amounts of time reading and working on the computer. It is assumed that students will exert individual initiative in solving computing/analysis problems as they arise.
I can only accept hard copies of the completed exams and the completed paper.


ASSIGNMENTS:
A) Two Take-Home Exams (40% of total grade; 20% each)
B) Data Analysis Presentation (25% of total grade)
C) Data Analysis Paper (25% of total grade)

A. Computer-Based Take-Home Exams (40% total; 20% each):
You will be tested on your ability to perform psychometric analyses of real data sets, and answer questions concerning the interpretation of these analyses.

B,C. Data Analysis Presentation and Paper
-- The data analyses and paper will consist of the relevant output from the software programs and a complete report of the results.
-- You may supply your own data, or you may solicit faculty (in education or other fields) for data.
-- The paper must be 10-15 double-spaced pages with 1-inch margins, in APA format (computer-generated data-analysis output must be placed in the Appendix and does not count toward the 10-15 page limit).
-- The presentation has a limit of 25 minutes (about 15 PowerPoint slides).

Both the presentation and paper must include:

Introduction -
Describe in detail the substantive problem you will be solving in this research study,
and describe the rationale/theory underpinning the data you will analyze (5 points).

Methods - (not necessarily in the following order).
-- Describe sample characteristics (5 points).
-- Describe the items on your test(s) (including their number and scoring format) (5 points).
-- Describe the unidimensional variable(s) you intend to measure with the test(s) (5 points).
-- For data analysis, use one or more psychometric models. (5 points)
-- Please fully describe the model(s) you are using (15 points).
-- Please fully describe the methods you will use to investigate the unidimensionality,
reliability, validity, and (possibly) item bias of each of your test(s) (15 points).
-- Also, if you intend to equate test scores, please fully describe the equating methods you will implement
(equipercentile equating, Rasch item equating, or both).

Results - (not necessarily in the following order).
-- Discuss the amount of evidence for the unidimensionality (10 points), reliability (10 points), and validity (10 points) of your test(s),
and justify any modifications you make to your test (removing items, removing persons, etc.).

Discussion - (not necessarily in the following order).
-- What modifications (if any) would improve the instrument? (3 points)
-- What are the implications of your study, with respect to the measurement and applications in the field of interest? (3 points)

Please provide appropriate handouts and develop meaningful overheads for your presentation.

Disability Services:
UIC strives to ensure the accessibility of programs, classes, and services to students with disabilities. Reasonable accommodations can be arranged for students with various types of disabilities, such as documented learning disabilities, vision, or hearing impairments, and emotional or physical disabilities. If you need accommodations for this class, please let your instructor know your needs and he/she will help you obtain the assistance you need in conjunction with the Office of Disability Services (1190 SSB, 413-2183).