Particle physicist Abhigyan Dasgupta says there are many reasons
he left academia after earning his PhD: He wanted to avoid a nomadic
life spent following elusive opportunities. He wanted a good work-life balance.
“I realized what I was enjoying about my day-to-day life was
analyzing physics data,” he says. “But I realized I could do it with
other kinds of data as well.”
Physics and astronomy PhDs who stay on the academic track find
themselves chasing a limited number of positions pursued by a large
number of extremely talented candidates. Despite this, many graduate
programs in physics and astronomy do not introduce students to careers
outside of research institutions, and so it’s up to the students to
figure out what to do next.
As he neared the end of his doctoral program, Dasgupta started
considering his options. He eventually came across the Insight Fellows
Program, which trains academic-track scientists for careers in data
science. The founder of the program, Jake Klamka, is a physicist who
conducted research at both the US Department of Energy’s Fermi National
Accelerator Laboratory and CERN.
Dasgupta applied, was accepted, and then started the seven-week program right after finishing his PhD in 2019.
A plethora of skills transfer from physics to data science, Dasgupta
says. Physicists know how to take enormous amounts of raw data and use
it to address a question—often approaching it from multiple angles
before finding the answer.
“My job still reminds me of physics in many ways,” says Dasgupta, who
now works as a data scientist for the video game company Activision
Blizzard. “It's just that instead of electrons and muons as my
individual data, it's users or revenue or something else.”
Data scientists have many roles, but in the broadest sense, they
“collect and analyze data and present the results to business
subject-matter experts so they can make data-driven decisions,” says Aga Leyko.
A particle-physicist-turned-data-scientist, Leyko works at a leading
professional services company focusing on the healthcare industry. As
she explains, data science is a broad career trajectory that uses skills
such as data analysis, simulation and visualization. Leyko used these
same skills for her PhD thesis, which measured the properties of
elementary particles using a multi-terabyte dataset from particle
interactions at the Large Hadron Collider at CERN.
Data science also uses non-technical skills, such as problem-solving,
she says. “What makes physicists really good data scientists is their
ability to see through complex issues, their attention to details, and
their focus on finding tangible solutions.”
At a computer-gaming studio, data scientists study player behavior
and how it interacts with the company’s revenue stream. At a large
technology company, data scientists answer questions about sales
tactics. Data science projects often require multiple phases and
multiple tools, and they can take from weeks to years to complete.
The first step in a new data science project is figuring out what
the problem is—translating a business question into a data-science
project. The next step is acquiring and preparing the raw data.
The importance of this second step is not always obvious to those
without a background in physics or astronomy, says Chaoyun Bao, a
managing strategy consultant in data science at IBM who came to the
field after a postdoctoral position in astrophysics.
“When I was doing my PhD, I was analyzing a lot of sensor data,” Bao
says. That kind of analysis involves dealing with distractions ranging
from radio noise to
faulty electronics. “So I knew that real-world data is going to have a
lot of noise, it's going to [involve] a lot of digging around,” she
says. “You know data is not going to be perfect, and you're not going to
make decisions based on perfect information.”
Along the same lines, Leyko recalls of her time working in particle
physics at CERN: “You would interrogate every single data point before
you came to any conclusion.”
Leyko began her PhD work in 2010, when the LHC started back up after a
faulty magnet took it out of commission. Verifying that everything was
functioning properly was especially important. “I never assume
that everything in the data is correct,” Leyko says.
Leyko’s extra level of caution with data has been incredibly helpful
to her career in healthcare consulting, she says. At one point she
noticed a dataset she was working with just looked wrong, so
she checked it out. A simple distribution plot confirmed her suspicions.
It turned out a program had automatically changed any missing
birthdates in the dataset to January 1, 1900—and as a result, the ages
of clients seemed to peak at a value over 100 years old.
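A check like the one Leyko describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not her actual workflow: the sample records are invented, and the function names (`suspicious_spikes`, `clean_ages`, `age_in_years`) and the `min_share` threshold are hypothetical. The idea is to look for a single date that accounts for an implausibly large share of the records, then treat that date as missing rather than as a real birthday.

```python
from collections import Counter
from datetime import date

# Placeholder date taken from the anecdote above: missing birthdates
# had been rewritten to January 1, 1900.
SENTINEL = date(1900, 1, 1)


def age_in_years(birthdate, today):
    """Return the number of completed years between birthdate and today."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)


def suspicious_spikes(birthdates, min_share=0.05):
    """Return dates that make up at least min_share of all records.

    The threshold is an assumption; a sensible value depends on the dataset.
    """
    counts = Counter(birthdates)
    total = len(birthdates)
    return {d: n for d, n in counts.items() if n / total >= min_share}


def clean_ages(birthdates, today, sentinel=SENTINEL):
    """Compute ages, mapping the sentinel date to None (missing)."""
    return [None if d == sentinel else age_in_years(d, today) for d in birthdates]


# Invented sample records, for illustration only.
records = [
    date(1985, 6, 14), date(1900, 1, 1), date(1972, 11, 2),
    date(1900, 1, 1), date(1990, 3, 30), date(1900, 1, 1),
    date(1964, 8, 21), date(2001, 12, 5),
]
today = date(2020, 1, 1)

print(suspicious_spikes(records, min_share=0.25))  # dates that dominate the data
print(clean_ages(records, today))                  # ages with placeholders as None
```

Left uncorrected, the placeholder rows in this sample would each report an age of 120, which is the kind of peak a distribution plot makes obvious.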
Once a data scientist is confident in their data, they can transform
it into meaningful information. This is where writing code, making
plots, using predictive models, investigating a subset of data, and
using other analytical tools learned throughout a physics education are
incredibly useful. It’s also where the problem-solving comes in: Perhaps
even more important than knowing how to use the tools is knowing which tool to use when.
“You can teach someone how to run specific commands,” says Dasgupta,
“but it takes longer to teach someone the intuition behind ‘I have this
data, how can I get something useful out of that?’”
And then there’s one more step beyond the analysis: “To be a good
data scientist, you have to be able to communicate your results,” Bao says.
It’s an inversion of the first step, translating a business question
into a data problem; this time, the data scientists must translate their
coding and analysis into interesting insights and business actions, she says.
Dasgupta agrees that this is an essential part of the job. “It’s being able to explain and tell the story of data really well.”