Spring 2019 - 60400 - PA 397C - Advanced Empirical Methods for Policy Analysis

Statistical Analysis and Learning

Large datasets are increasingly available across many sectors, such as healthcare, energy, and online markets. This course focuses on methods that allow “learning” from such datasets to uncover underlying relationships and patterns in the data, with an emphasis on the predictive performance of models built to represent the underlying function generating the data. The course starts with a review of basic statistical concepts and linear regression, but it will focus mostly on classification and clustering based on non-regression techniques such as tree-based approaches, support vector machines, and unsupervised learning (e.g., hierarchical clustering). In the problem sets, tutorials, and class projects, we will examine applications in healthcare, energy, transportation, education, crime, and online markets. This course is intended for first- and second-year master's students. Ph.D. students with an interest in non-regression-based quantitative methods may also find this course useful.

Topics to be covered: Linear Regression, Classification, Resampling Methods, Linear Model Selection and Regularization, Tree-Based Methods, Support Vector Machines, Unsupervised Learning.

In covering the material from the assigned textbook (see below), this course will emphasize both formulaic and conceptual understanding of the methods discussed. As necessary, the instructor will draw on material from outside the textbook to improve conceptual clarity.