Spring 2022 - 59525 - PA 397C - Advanced Empirical Methods for Policy Analysis

Data Visualization and Econometrics for Policy Analysis, Using the Python Data Science Platform-WB

This class will introduce the application of data visualization and econometric modelling methods to real-world data, using the Python software platform and its data science-oriented visualization and analysis packages. The learning goal is for each student to gain a working knowledge of both the key concepts and the basic software needed to apply advanced data visualization and analysis tools to real-world data, to better inform real-world policymaking.

The Python software tools we will use are components of the most widely used non-proprietary, open data science software platform. They support excellent visualization, as well as statistical and econometric analysis, of even the largest datasets. The same software platform can also be integrated with, and run, R and Stata statistical analysis code.

The intention is to make this class doubly valuable to a student interested in public policy. First, the class will introduce you to cutting-edge data science tools that can be applied to real data for practical policy purposes. This should, hopefully, both give you some advantages in post-graduation job markets and facilitate the acquisition of even more advanced skills over the rest of your career. Second, the class is designed to motivate further learning of data visualization and econometric modelling concepts by showing their usefulness in simple, practical applications to real-world data and by giving you first-hand experience in doing this.

Much of the learning will be structured as completion of data analysis exercises. In addition to these exercises, every student will complete an individual midterm project and take part in a small-group analysis project, with in-class presentation, discussion, and critique. Every student will also submit an individual final empirical data visualization and analysis project, in the form of a Jupyter (interactive Python) notebook. Students will be asked to complete 12 hours of individual, introductory, asynchronous online instruction covering basic Python software skills before the third week of class. Examples worked in class and solutions posted online will use the Python data science software platform.

The prerequisite for this class is an introductory statistics course (equivalent to the LBJ School’s Introductory Empirical Methods). No computer programming experience is assumed or needed. Time will be reserved for in-class laptop “empirical analysis clinics,” to provide you with real-time feedback on conceptual or data analysis issues you may encounter.

In 2021, students in this class formed a team that won second place in the Microsoft-ODI Open Education Data Challenge (an international competition organized on the XPrize platform) with a presentation on “Closing the Digital Divide: Lessons Learned from Covid-19.” That year, projects and exercises were organized around open-source datasets relevant to the competition, to facilitate participation in this unique opportunity. This year, projects and exercises will be more diverse and tailored to individual student interests.
