Early last year, a committee was assembled with faculty members from all three Arizona public universities to help plan the first statewide American Statistical Association (ASA) DataFest competition to be hosted by the School of Mathematical and Statistical Sciences at Arizona State University.
At past DataFest events, only students from the host, Arizona State University, were able to participate, but the committee was looking forward to inviting students from the University of Arizona and Northern Arizona University for the first time.
In March 2020, the spread of COVID-19 forced a cancellation of not only the Arizona DataFest, but essentially the entire national and international rollout of competitions.
After a year’s worth of experience with teleworking and virtual classes, the ASA DataFest planning committee felt more confident in their understanding of what needed to be done to virtualize this year's event.
“I think a year ago I might have thought it wouldn’t be much harder and maybe easier to do a virtual DataFest,” said Rodney Jee, ASA DataFest co-organizer and senior credit risk analyst at USAA. “But now after having gone through one and seeing how much work there was, I see that I didn’t have a clue.”
Fortunately, the planning committee had the right mix of people to implement essentially everything the event needed.
Yi Zheng, ASA DataFest co-organizer and associate professor of mathematics at ASU, and Derek Sonderegger, associate professor of statistics at Northern Arizona University, set up almost all of the Zoom and Slack capabilities needed to virtualize the event, modeling them on how the in-person version is run.
“With significant planning and hard work, I think we were able to conduct a very successful ASA DataFest and give the students an experience that reflected the world’s way of working at the time,” Jee said.
Because the event was virtual, there were likely some benefits that would not be seen in a live, in-person event. For example, the committee was able to get a few mentors from more distant geographic locations, including one from San Antonio, where one of the sponsors is headquartered. Robert Gould, founder of DataFest and vice-chair of undergraduate studies and the director of the Center for Teaching of Statistics at UCLA, made an appearance and gave a heart-lifting talk to the students during opening night via Zoom. Gulhan Bourget, professor of statistics at California State University, Fullerton, served as the head judge, something she would likely not have done if the event were not virtual.
ASA DataFest is a 48-hour data hackathon in which teams of undergraduates work to discover and share meaning in a large, rich and complex dataset. It is a nationally coordinated, weekend-long data analysis competition that challenges students to find their own story to tell with the data, one that is meaningful to the data donor. DataFest competitions are held every weekend from mid-March through mid-May at college campuses across the country. The ASU virtual event took place March 19–21.
The data science community is brought together during ASA DataFest. The undergraduate teams do the work, but are guided by roving mentors — faculty members, graduate students and industry professionals. Those working in industry find DataFest to be a great recruiting opportunity, where they get to watch the student teams work under pressure and evaluate their problem-solving abilities.
After two days of intense data wrangling and analysis, each team is allowed only five minutes and two slides to impress a panel of judges who are experienced data scientists. Students can also see how their peers approach the same dataset during the presentation time.
When Dan Petty participated in ASA DataFest in 2019, his team won Best Use of External Data. He now works as a data analyst for Tempe Fire Medical Rescue, and returned to DataFest this year as a mentor to guide student teams throughout the weekend competition.
“DataFest is an excellent opportunity to experience what working with data in the real world is like. It's a great way to practice ‘hard skills’ like coding and visualization and ‘soft skills’ like teamwork and presentation,” Petty said.
“Taking a big messy dataset, finding something interesting, and then explaining it to strangers — all under a tight deadline — was a big confidence builder for me as a participant. It was fun to help this year's teams navigate this process as a mentor.”
Will Dong graduated from ASU in 2019 with concurrent degrees in mathematics and economics. He currently works as an artificial intelligence and machine learning engineer at Wingman.ai, a startup in the Bay Area, and served as a virtual mentor.
“I think that when students take extra time outside of their studies to pursue events such as DataFest, it demonstrates their dedication and passion to grow their skills and knowledge beyond the norm amongst their peers,” Dong said. “Additionally, it provides students with an excellent data point on their resume as well as a project that they can speak to in-depth with employers to signal their abilities.”
“I would strongly recommend for students to participate in DataFest. If data is related to their major or somehow to what they do, then this is a great opportunity. Some experience is better than no experience,” said Mirjeta Pasha, postdoctoral associate at ASU and ASA DataFest mentor. “Preparing themselves to give presentations in public, to a broad interest audience will be very helpful for their future professional development.”
One of the key components that sets ASA DataFest apart from other competitions is its use of a real-world, large dataset provided by a real company or organization. This is coordinated at the national level by the American Statistical Association, and each local competition uses the same mystery dataset. All participants are sworn to secrecy until mid-May after the last DataFest competition finishes.
“I think DataFest is a unique and invaluable opportunity for undergraduate students to sharpen their essential skills for the data science profession, such as developing research angles, unpacking messy data, data visualization and modeling, teamwork, and presentation,” Zheng said. “And the mystery dataset is one of the most valuable and coolest parts.”
Indeed, this year’s dataset was relevant to the news headlines of today, but neither the identity of the donating company nor the content of the dataset can yet be revealed.
The student competitors were enthusiastic about their experiences at ASA DataFest 2021.
“The DataFest experience is like entering a war room in the time of a national crisis – there is a problem to be solved, the deadline was yesterday and people are depending on you,” said Andre Williams, who will graduate from ASU in May with a bachelor's degree in mathematics, and previously worked as a process engineer at Intel for 14 years. “I would definitely recommend DataFest to others, especially those pursuing degrees in engineering and mathematics. DataFest was absolutely real-world so participating can help students determine whether their career choice is right for them.”
Cindy Luna Miranda was Williams' teammate on Team Exploratory, which earned Best Use of External Data recognition. She is studying mathematics at ASU, and was able to participate this year because it was held virtually.
“I learned that there is a multitude of information that can be interpreted and morphed differently by the analyzer. I was not expecting so much data, but it did allow for variations of discovery,” Luna Miranda said.
Tyrus Nelson, who will graduate from the University of Arizona this semester with concurrent degrees in sociology and statistics/data science, described DataFest as “a stressful time culminating into a beautiful grand finale.”
When his team, the Outliers, was announced as winner of the Best Visualization award, Nelson said he “lost his mind” with excitement. “Competing in DataFest definitely reaffirmed my dreams of becoming a computational sociologist and expanding my field with the help of data science.”
Nathan Nguyen and his team, Surprise Pikachu, were recognized by the judges for Best Insight.
“I’d recommend DataFest because it gives you practical experience with real data as opposed to textbook datasets. It validates your knowledge in practice,” said Nguyen, an ASU mathematics major graduating in May.
“It’s also an opportunity to see how other people approach a similar problem — opening your mind. For example, I thought that Team Lasagna (natural language processing approach) performed a very unique analysis compared to the rest of us. I might not have ever thought of that.”
This year’s ASA DataFest was sponsored by Discover and USAA, and each company had representatives serve as judges and mentors. When asked if they thought the students’ experience may have been different in the virtual environment, head judge Bourget replied, “During in-person DataFest, some students could be intimidated by judges sitting in the front row. I would say these students could be more at-ease in the virtual environment.
“I was very impressed by all student groups who observed their five minutes presentation time. It is actually quite difficult to give an overview of the project in five minutes.”
In addition to the sponsor companies, mentors represented a variety of companies, including Wells Fargo, Electronic Arts and Zillow, among others.
“For students who are seeking career opportunities, DataFest is a unique stage to showcase their skills to and connect with industry employers. For beginning students, it is also a great opportunity to learn from advanced students and experience the real world of data scientists,” Zheng said. “It’s definitely an exciting event to look forward to every year.”