Spring 2019 - 60380 - PA 397C - Advanced Empirical Methods for Policy Analysis

Data Visualization, Statistics, and Econometrics for Policy Analysis, Using the Python Data Science Platform

This will be a fast-paced class, requiring a substantial amount of time outside of class to master basic coding and data analysis skills through simple but progressively more demanding exercises. The class will introduce the application of data visualization, statistical, econometric, and machine learning methods to real-world data by creating demonstrations and simulations using the Python software platform and its data science-oriented visualization and analysis packages. The learning goal is for each student to have a working knowledge of both the key concepts and the basic software tools needed to apply advanced visualization and analysis methods to real-world data, to better inform real-world policymaking.

The Python software tools in which all students will gain some proficiency are components of the most widely used non-proprietary, open data science software platform. They give ready access to excellent visualization, statistical, and econometric analysis tools capable of handling even the largest public datasets. The same software platform can also be integrated with, and run, R and Stata statistical analysis code.

The intention is to make this class doubly valuable to a student interested in public policy. First, the class will introduce you to cutting-edge software tools that can be applied to real data for practical policy purposes (and hopefully both give you some advantages in post-graduation job markets and facilitate future acquisition of even more advanced skills over the rest of your career). Second, the class is designed to motivate further learning of statistical, econometric, and machine learning concepts by showing that they can be simply and practically applied to real-world data, and to give you some first-hand experience in doing this.

Much of the learning will be structured as completion of data analysis exercises. In addition to these exercises, every student will complete an individual midterm project and join a small-group analysis project, with in-class presentation, discussion, and critique. Every student will also submit an individual final empirical data visualization and analysis project, in the form of a Jupyter (interactive Python) notebook.

The Python data science software platform is increasingly being used by organizations and businesses, as well as researchers, to undertake policy-relevant analysis. For examples of some interesting and useful Jupyter notebooks documenting policy-relevant data analysis reported by online journalists, see https://github.com/BuzzFeedNews . For examples of research economists doing relatively advanced analyses in Jupyter notebooks, see https://quantecon.org/notebooks . For an interesting curated collection of Jupyter notebooks, see https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks . For a useful collection of Jupyter notebooks focused on introductory Python programming, see https://github.com/leriomaggio/python-in-a-notebook . The Jupyter notebooks in these archives can also give you valuable insights on how to do useful things when analyzing and visualizing large-scale data.

The class assumes you have previously taken an introductory statistics course as a prerequisite. If you have gone well beyond an introductory course, you will be encouraged to assist those of your peers who have not. Lectures will be based on interactive Python notebooks (aka Jupyter notebooks). Students will follow along with class lectures using open source data science software installed on a personal laptop computer (Windows, Mac, or Linux). All students must complete all assigned reading, since it will be assumed as background to all the Jupyter notebook content on statistical, econometric, and machine learning concepts we go through in class. There are no computer programming prerequisites, but you will need to bring to class a personal computer with the Anaconda distribution of Python installed (more specific instructions will be distributed prior to the first class).