As big data starts to mean big business opportunities for companies around the globe, the demand for professionals who can sift through the goldmines of data is growing in kind. According to a January report from Indeed, postings for data science jobs grew 31% year-over-year in December 2018 – and show a massive increase of 256% compared to five years prior.
(Image source: hiringlab.org)
However, even as demand (and media buzz) rises, there’s still much confusion surrounding precisely what it is that data scientists do. In particular, the terms “data science” and “machine learning” seem to blur together in a lot of popular discourse – or at least amongst those who aren’t always as careful as they should be with their terminology.
So, let’s clear things up once and for all – what’s the difference between data science and machine learning, and indeed, what’s the difference between a data scientist and a machine learning engineer?
Scientists Vs. Engineers
Part of the confusion undoubtedly comes from the fact that machine learning is a part of data science. In fact, data science is something of an umbrella term that encompasses data analytics, data analysis, data mining, machine learning, and several other related disciplines.
However, while machine learning forms a major component of data science – and is an important skill for data scientists to have – it is only one of many. As such, it is simply wrong to use the two terms interchangeably.
To better understand the distinction, it’s useful to think about the differences between scientists and engineers. Scientists are subject experts. They systematically gather and use research and evidence to form hypotheses, which they then put to the test in order to gain understanding and knowledge. Engineers, on the other hand, build things. Machine learning engineers, specifically, build and maintain systems that use scalable machine learning algorithms to process datasets without human intervention. In practice, both data science and machine learning roles work with data – but they require different (though complementary) skill sets.
So, what does a data scientist do that a machine learning engineer does not?
What Is Data Science?
Data science covers the whole spectrum of data processing – not just the algorithmic aspects. It involves data extraction, data cleansing, data integration, data analysis, data visualization, machine learning, and – the ultimate purpose of it all – the generation of actionable insights.
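To make those stages a little more concrete, here is a minimal sketch of a few of them (cleansing, analysis, and insight) in plain Python. The order records, field names, and helper functions are all hypothetical and invented for illustration; a real project would typically pull data from databases or APIs and lean on dedicated libraries.

```python
from collections import defaultdict

# Hypothetical raw export (assumed for illustration): inconsistent region
# names, revenue stored as text, and one row with a missing value.
raw_orders = [
    {"region": " north ", "revenue": "120.0"},
    {"region": "South", "revenue": "80.5"},
    {"region": "north", "revenue": ""},
    {"region": "SOUTH", "revenue": "19.5"},
    {"region": "North", "revenue": "30.0"},
]


def clean(records):
    """Cleansing: normalise region names and drop rows without revenue."""
    cleaned = []
    for row in records:
        revenue = row["revenue"].strip()
        if not revenue:
            continue  # skip rows where revenue is missing
        cleaned.append(
            {"region": row["region"].strip().lower(), "revenue": float(revenue)}
        )
    return cleaned


def revenue_by_region(records):
    """Analysis: total revenue for each region."""
    totals = defaultdict(float)
    for row in records:
        totals[row["region"]] += row["revenue"]
    return dict(totals)


def top_region(totals):
    """Insight: the region that brought in the most revenue."""
    return max(totals, key=totals.get)


orders = clean(raw_orders)
totals = revenue_by_region(orders)
best = top_region(totals)
print(f"Revenue by region: {totals}")
print(f"Strongest region: {best}")
```

Even in a toy example like this, most of the code deals with tidying the data rather than with any algorithm, which mirrors the point that data science is about far more than modelling.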
When a business has a problem to solve, it turns to the data scientist to gather, process, and derive valuable insights from data in order to find an answer or solution. Data scientists understand data from a business perspective, and are tasked with providing accurate predictions and insights that can be used to power critical business decisions. Essentially, the goal of data science is to discover hidden patterns in raw data to help businesses improve and increase their profits.
The field of data science draws on various disciplines, including mathematics and statistics, as well as the study of where data originates, what it represents, and how it can be transformed into a valuable resource for the business. To that end, it incorporates a range of techniques – including machine learning.
What Is Machine Learning?
Machine learning is the practice of building machines with the ability to learn from data and progressively improve performance on a specific task. This often takes the form of building a model based on past cases with known outcomes, and applying the model to make predictions for future cases. Machine learning engineers are responsible for developing the algorithms that can perform these tasks.
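The idea of learning from past cases with known outcomes and then predicting future ones can be sketched in a few lines. The example below fits a straight line by ordinary least squares, one of the simplest possible models, and is only an illustration: the advertising and sales figures are made up, and the `fit_line` and `predict` helpers are hypothetical names rather than part of any library.

```python
def fit_line(xs, ys):
    """Learn a slope and intercept from past cases using least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned model to a new, unseen case."""
    slope, intercept = model
    return slope * x + intercept


# Past cases with known outcomes (invented numbers, in arbitrary units).
past_ad_spend = [1.0, 2.0, 3.0, 4.0]
past_sales = [3.0, 5.0, 7.0, 9.0]

model = fit_line(past_ad_spend, past_sales)  # training step
forecast = predict(model, 5.0)               # prediction for a future case
print(f"Predicted sales at ad spend 5.0: {forecast}")
```

Production systems use far richer models and much more data, but the shape is the same: a training step that looks only at historical cases, followed by a prediction step that can be run on new data as often as needed.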
Machine learning overlaps with data science simply because it’s one of the best tools in the data scientist’s arsenal. With big data, for example, data is generated in such massive volumes that it becomes practically impossible for a data scientist to work through it manually. This is where machine learning comes into play. A key strength of machine learning is that it can process huge volumes of data without human intervention. Once a model has been trained on existing data, it can work on its own, processing far more new data than a human being could, and in a fraction of the time.
Understanding the Overlap – And the Distinction
Machine learning engineers and data scientists fill two separate roles, but they are both part of the same team. It comes down to the split between scientist and engineer. A data scientist is responsible for translating a business problem into a technical problem that can be solved by analyzing data. The data scientist decides what data needs to be collected for the purpose, then sets to work finding sources of that data, creating pipelines for it, and designing dashboards that make sense of it. There will come a point, however, when the data scientist needs a machine learning model to process the data. The data scientist may sketch out a prototype model, but it is the machine learning engineer who is responsible for building the production version.
In a recent interview for Springboard, Mansha Mahtani, a Data Scientist at Instagram, gave her take on the distinction between the two roles: “Given both professions are relatively new, there tends to be a little bit of fluidity in how you define what a machine learning engineer is and what a data scientist is. My experience has been that machine learning engineers tend to write production-level code. For example, if you were a machine learning engineer creating a product to give recommendations to the user, you’d be actually writing live code that would eventually reach your user. The data scientist would probably be a part of that process – maybe helping the machine learning engineer determine what are the features that go into that model – but usually data scientists tend to be a little bit more ad hoc to drive a business decision as opposed to writing production-level code.”
Machine learning and data science have a lot to do with one another, but they are not the same thing. The key thing to remember is that data science is a broad, overarching category that encompasses many different disciplines concerning how organizations manage data – from collecting it and cleaning it to refining it and putting it to use in the form of business insights. The definition of machine learning, on the other hand, is much narrower. Machine learning is about building machines that can put data through algorithms in order to discover patterns within it. Data scientists use machine learning, but theirs is a far more multidisciplinary role than that of a machine learning engineer.