The Department of Statistics at the University of Illinois recently partnered with Sandia National Laboratories to present the first Illinois-Sandia Data Challenge. The challenge invited teams to analyze brainwave data from participants in a language processing study and develop models to predict whether future participants would be bilingual. The primary task was to predict Spanish bilingualism, with bonus opportunities for exploring bilingualism in other languages.

Over 180 students from various disciplines, including statistics, computer science, information science, and psychology, formed teams of two to four to tackle the challenge. The virtual data challenge launched on October 31, with Sandia revealing the problem statement and providing the dataset. Participants worked tirelessly over the first weekend in November, decoding the data and building reliable models to submit to the Sandia judges by the deadline. Sandia statisticians Lyndsay Shand, Marie Tuft, and Adah Zhang, as well as Sandia cognitive scientists Daniel Dickson and Mallory Stites, oversaw the challenge and provided guidance throughout the weekend. By November 4, the teams had submitted their 5-to-7-minute video presentations, showcasing impressive results for the judges to review before declaring the first Illinois-Sandia Data Challenge winner.

The team 'In It to Win It' made an impressive return to competition, claiming victory with their presentation. Consisting of Anusha Chandraju (information management, MS), Tanmoy Debnath (information management, MS), and Shengzhu Yin (physics, PhD), the team had been runners-up in the Synchrony Datathon held earlier in the year, which asked participants to develop a viable business solution to enhance the Synchrony interactive voice response system.

Image: Sandia representative Mallory Stites talking to Ziyu Liu (left) of the 'Datanauts' and Shengzhu Yin, Anusha Chandraju, and Tanmoy Debnath of 'In It to Win It'.

When asked about their initial approach, the team emphasized the importance of building a solid foundation. "We first studied the problem statement to fully understand what was being asked and identify any additional questions that could be explored. Then we conducted initial feature engineering and built a crude classifier as a baseline," they shared. This methodical approach allowed them to refine their model progressively, ensuring an efficient and well-informed solution.

Teamwork played a crucial role in their success. The trio highlighted their collaborative strategies, which included breaking down the problem into manageable tasks, setting deadlines, and using whiteboards for daily progress tracking. Working together in person proved invaluable: "Being in the same room allowed us to bounce ideas off each other instantly, tackle problems as a team, and stay fully engaged. This hands-on collaboration kept our momentum high and energy focused throughout the challenge," they noted.

Drawing on lessons from past competitions like the Synchrony Datathon, the team leveraged their varied academic backgrounds in information science and physics. This combination allowed them to create an integrated machine-learning model enriched with interpretable features and strong visualization elements. They also stressed the importance of making complex data accessible: "Creating compelling plots that enable audiences to grasp important concepts quickly was essential," they said.

According to the team, the most challenging part of the contest was translating technical results into meaningful insights that resonated with the audience. They refined their interpretations over the four days, ensuring their final presentation was informative and engaging.

Their advice for future competitors is clear: "Allocate enough time to craft a presentation that effectively walks through your approach. Many teams produce outstanding technical work, but without dedicating sufficient time and effort to the presentation, that work can go underappreciated."

'In It to Win It' exemplified how strategic planning, collaborative teamwork, and effective communication can lead to remarkable results in data challenges. Their win showcased their technical prowess and highlighted the importance of storytelling in data science.

'In It to Win It' was not the only team recognized for their efforts. While only one team could be crowned champion, two other teams, 'Datanauts' and 'JACKAlantern', received honorable mentions from the Sandia judges for their outstanding work. 'Datanauts' consisted of Ziyu Li (junior, mathematics) and Aditi Shukla (information management, MS), and 'JACKAlantern' consisted of Jack Anderson (mathematics, PhD), Jungsoo (Ben) Park (mathematics, PhD), Wilmer Smilde (mathematics, PhD), and Jie Yeo (mathematics, PhD).