“To be a good data scientist, you have to be able to communicate your results.”
Particle physicist Abhigyan Dasgupta says there are many reasons he left academia after earning his PhD: He wanted to avoid a nomadic life spent following elusive opportunities. He wanted a good work-life balance.
“I realized what I was enjoying about my day-to-day life was analyzing physics data,” he says. “But I realized I could do it with other kinds of data as well.”
Physics and astronomy PhDs who stay on the academic track find themselves chasing a limited number of positions pursued by a large number of extremely talented candidates. Despite this, many graduate programs in physics and astronomy do not introduce students to careers outside of research institutions, and so it’s up to the students to figure out what to do next.
As he neared the end of his doctoral program, Dasgupta started considering his options. He eventually came across the Insight Fellows Program, which trains academic-track scientists for careers in data science. The founder of the program, Jake Klamka, is a physicist who conducted research at both the US Department of Energy’s Fermi National Accelerator Laboratory and CERN.
Dasgupta applied, was accepted, and then started the seven-week program right after finishing his PhD in 2019.
A plethora of skills transfer from physics to data science, Dasgupta says. Physicists know how to take enormous amounts of raw data and use it to address a question—often approaching it from multiple angles before finding the answer.
“My job still reminds me of physics in many ways,” says Dasgupta, who now works as a data scientist for the video game company Activision Blizzard. “It's just that instead of electrons and muons as my individual data, it's users or revenue or something else.”
Data scientists have many roles, but in the broadest sense, they “collect and analyze data and present the results to business subject-matter experts so they can make data-driven decisions,” says Aga Leyko.
A particle-physicist-turned-data-scientist, Leyko works at a leading professional services company focusing on the healthcare industry. As she explains, data science is a broad career trajectory that uses skills such as data analysis, simulation and visualization. Leyko used these same skills for her PhD thesis, in which she measured the properties of elementary particles using a multi-terabyte dataset from particle interactions at the Large Hadron Collider at CERN.
Data science also uses non-technical skills, such as problem-solving, she says. “What makes physicists really good data scientists is their ability to see through complex issues, their attention to details, and their focus on finding tangible solutions.”
At a computer-gaming studio, data scientists study player behavior and how it interacts with the company’s revenue stream. At a large technology company, data scientists answer questions about sales tactics. Data science projects often require multiple phases and multiple tools, and they can take from weeks to years to complete.
The first step in a new data science project is figuring out what the problem is—translating a business question into a data-science project. The next step is acquiring and preparing the raw data.
The importance of this second step is not always obvious to those without a background in physics or astronomy, says Chaoyun Bao, a managing strategy consultant in data science at IBM who came to the field after a postdoctoral position in astrophysics.
“When I was doing my PhD, I was analyzing a lot of sensor data,” Bao says. That work meant dealing with distractions ranging from radio noise to faulty electronics. “So I knew that real-world data is going to have a lot of noise, it's going to [involve] a lot of digging around,” she adds. “You know data is not going to be perfect, and you're not going to make decisions based on perfect information.”
Along the same lines, Leyko recalls of her time working in particle physics at CERN: “You would interrogate every single data point before you came to any conclusion.”
Leyko began her PhD work in 2010, when the LHC started back up after a faulty magnet took it out of commission. Verifying that everything was functioning properly was of especially great importance. “I never assume that everything in the data is correct,” Leyko says.
Leyko’s extra level of caution with data has been incredibly helpful to her career in healthcare consulting, she says. At one point she noticed a dataset she was working with just looked wrong, so she checked it out. A simple distribution plot confirmed her suspicions. It turned out a program had automatically changed any missing birthdates in the dataset to January 1, 1900—and as a result, the ages of clients seemed to peak at a value over 100 years old.
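The kind of check Leyko describes can be sketched in a few lines of Python. This is only an illustration, not her actual code: the toy birthdates, the reference date, the function names and the 25 percent threshold are all assumptions made for the example. The idea is simply to count how often each birthdate appears and flag any single date that accounts for a suspiciously large share of the records.

```python
from collections import Counter
from datetime import date


def find_placeholder_dates(birthdates, min_share=0.05):
    """Return {date: count} for dates making up at least `min_share` of records.

    A single birthdate shared by a large fraction of clients is a hint
    that a default value was filled in for missing data.
    """
    counts = Counter(birthdates)
    total = len(birthdates)
    return {d: n for d, n in counts.items() if n / total >= min_share}


def age_in_years(birthdate, today):
    """Whole years between `birthdate` and `today`."""
    # Subtract one year if this year's birthday has not happened yet.
    not_yet = (today.month, today.day) < (birthdate.month, birthdate.day)
    return today.year - birthdate.year - int(not_yet)


# Hypothetical toy dataset: three records carry the January 1, 1900 default.
birthdates = [
    date(1985, 3, 14), date(1990, 7, 2), date(1900, 1, 1),
    date(1972, 11, 30), date(1900, 1, 1), date(1900, 1, 1),
    date(2001, 5, 9), date(1964, 8, 21),
]

reference_day = date(2024, 1, 1)  # assumed "today" so the result is reproducible
ages = [age_in_years(d, reference_day) for d in birthdates]
suspects = find_placeholder_dates(birthdates, min_share=0.25)

# Crude text version of a distribution plot: one row per decade of age.
for decade, n in sorted(Counter(a // 10 * 10 for a in ages).items()):
    print(f"{decade:>3}s {'#' * n}")
print("Suspicious dates:", suspects)
```

In this toy data the cluster of ages above 100 stands out immediately, and the flagged date points to the likely cause. On a real dataset the threshold would need tuning, since legitimate values can also repeat.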
Once a data scientist is confident in their data, they can transform it into meaningful information. This is where writing code, making plots, using predictive models, investigating a subset of data, and using other analytical tools learned throughout a physics education are incredibly useful. It’s also where the problem-solving comes in: Perhaps even more important than knowing how to use the tools is knowing which tool to use when.
“You can teach someone how to run specific commands,” says Dasgupta, “but it takes longer to teach someone the intuition behind ‘I have this data, how can I get something useful out of that?’”
And then there’s one more step beyond the analysis: “To be a good data scientist, you have to be able to communicate your results,” Bao says.
It’s an inversion of the first step, translating a business question into a data problem; this time, the data scientists must translate their coding and analysis into interesting insights and business actions, she says.
Dasgupta agrees that this is an essential part of the job. “It’s being able to explain and tell the story of data really well.”