Stat 107: Data Science Discovery featured in UC Berkeley Data Science Newsletter | Department of Statistics

Data Science Education at University of Illinois Urbana-Champaign

June 4, 2020

The May 2020 issue of UC Berkeley's Data Science Newsletter featured Statistics 107: Data Science Discovery in its School Spotlight. The course was first led by Karle Flanagan and Wade Fagen-Ulmschneider when it launched in the Spring 2019 term. The School Spotlight showcases the latest schools that UC Berkeley has worked with on implementing data science programs. Stat 107 will be taught by Ha Khanh Nguyen for the Fall 2020 term.

With permission from UC Berkeley's Data Science Education Support, the following is the School Spotlight feature on Stat 107 as it was originally published:

Data Science Education at University of Illinois Urbana-Champaign

An Example in Using GitHub instead of JupyterHub

Stat 107: Data Science Discovery is the University of Illinois Urbana-Champaign’s (UIUC) foundational data science course. Led by Professor Wade Fagen-Ulmschneider and Professor Karle Flanagan, Stat 107 has no prerequisites, so that any student at UIUC can gain a comprehensive introduction to the “next BIG thing at Illinois.”

Development of the course officially began in Fall 2018, shortly after UIUC attended the National Workshop in Data Science Education in Summer 2018 and learned about Data 8: Foundations of Data Science at Berkeley. UIUC shared Berkeley’s enthusiasm for offering an introductory data science course that is accessible at a large scale. The pilot offering of Stat 107 in Spring 2019 enrolled 20 students from 20 different majors. Enrollment grew to 300 students in Fall 2019, coinciding with the course being offered as a general education requirement.

While Stat 107 is modeled on Data 8 in terms of curriculum, its infrastructure rests on a different philosophy. Python programming in the class is based on the pandas package rather than Berkeley’s datascience package, and local deployment of notebooks through GitHub is favored over JupyterHub as a means of distributing assignments. Traditionally, classes that use the datascience package use it throughout the course to flatten the perceived steepness of the programming learning curve, a concern that arises because there are no formal prerequisites aside from high school mathematics. UIUC, however, considers introducing students to pandas from the start a more suitable way to bring industry-relevant experience to the class. In practice, students struggle with the learning curve for only the first two weeks, during which the course staff provide close attention and support.
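To give a sense of the kind of pandas code a beginner might meet early on, here is a minimal sketch. It is not taken from the Stat 107 course materials; the table, its column names, and its values are all hypothetical and invented purely for illustration.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
# (not from any Stat 107 assignment).
df = pd.DataFrame({
    "major": ["Statistics", "Music", "Chemistry", "Music"],
    "year": [1, 2, 1, 3],
    "score": [88, 92, 75, 84],
})

# Select rows with a boolean mask, then average one column.
music = df[df["major"] == "Music"]
music_mean = music["score"].mean()

# Group rows by a column and compute the mean score per group.
mean_by_major = df.groupby("major")["score"].mean()

print(music_mean)
print(mean_by_major)
```

Row selection with a boolean mask and `groupby` aggregation are standard pandas idioms, which is the sort of industry-relevant exposure the paragraph above describes.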

The emphasis on industry-relevant tools and skills also shows in the use of GitHub: the instructors give out starter code for pulling Jupyter notebooks from the course repository, which students use throughout the semester, and explain the theory behind that code in the second half of the semester. Furthermore, all exams are open-book, open-Google, and open-resource in general, mirroring the workflow techniques and collaboration found in most of industry.

UIUC is currently working towards a fully established data science program. Existing related programs include the B.S. in Statistics & CS degree and the CS+X degree, which allows students to specialize in one of 10+ concentrations in diverse fields such as advertising, chemistry, or music. UIUC hopes to kick off its data science-specific programs by offering a minor in the near future. Over the next 3-5 years, UIUC also plans to expand its four connector courses, which weave together core concepts and approaches from Data 8 with complementary ideas or areas like psychology, cognitive science, and business. These courses have Stat 100: Statistics, an introductory statistics course, as a prerequisite. UIUC plans to include Stat 107 and Stat 100 in its data science degree curriculum, which will also include courses in statistics, computer science, mathematics, data ethics, and information.