An Intuitive Explanation of Data Science Concepts. Part I: DS, AI, ML, and DL in Simple Words
Data science is a sexy trend of the 21st century. It is evolving at an enormous speed and keeps reaching new heights. It comes as no surprise that more and more people are getting involved in this field. Data scientist, data engineer, machine learning engineer, and business intelligence developer are only a few of the jobs in this sphere.
Unfortunately, many of those who are at the start of their data science path encounter difficulties dealing with certain concepts, terms, and algorithms. It is totally understandable – very few of us can do everything on the first try. The good news is that we live in the age of the Internet, where information is freely available online. You can find literally everything pretty quickly without even going out of your house.
This series of articles is aimed at collecting some of the essential information in one place. We will try to explain data science concepts, methodologies, metrics, etc., in simple words to make the lives of newcomers easier. We will concentrate on one topic at a time, starting with the most basic information and going deeper into complex methods and algorithms.
In this very first article, we will discuss the definition of data science and related fields. We will also explain the difference between data science and artificial intelligence.
Let us start with artificial intelligence. Artificial intelligence (AI) is a field of computer science concerned with building intelligent machines, i.e., machines capable of performing tasks that typically require human intelligence. AI has no standard definition among AI researchers; it is an expansive scientific field that includes various subfields. AI aims to simulate the human way of thinking and to enable machines to reproduce our behavior.
What distinguishes AI from conventional programming? In conventional programming, explicitly written algorithms define what a machine can do and how it does it. The program cannot break these rules or act beyond them, so if anything unusual happens, it will not be able to react appropriately. Consider a simple robot designed specifically for moving heavy objects. It may distinguish boxes from bags and apply different techniques to move them, but it will not be able to open the boxes and sort the items they contain.
On the other hand, a machine powered by AI is not explicitly programmed for a specific task. It learns from a massive amount of data and can, therefore, act appropriately in different situations. Unlike rule-based algorithms, AI enables computers to perceive their environment and make decisions based on what they observe. Such a robot may learn how to move boxes, open them, sort the objects inside, and much more.
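To make the contrast concrete, here is a toy sketch in Python. The box-handling scenario, the 20 kg cut-off, and the labeled weights are all hypothetical. The first function hard-codes its rule, as in conventional programming; the second picks its threshold from labeled examples, which is the idea of learning from data in its very simplest form (a one-feature decision rule, not a full AI system).

```python
# Conventional programming: a human fixes the rule in advance.
def rule_based_is_heavy(weight_kg: float) -> bool:
    return weight_kg > 20.0  # hypothetical hard-coded cut-off


# Learning from data (in miniature): the cut-off is chosen from labeled examples.
def learn_threshold(examples: list[tuple[float, bool]]) -> float:
    """Return the candidate threshold that misclassifies the fewest examples."""
    best_threshold, best_errors = examples[0][0], len(examples) + 1
    for threshold in sorted(weight for weight, _ in examples):
        # Count examples where "weight > threshold" disagrees with the label.
        errors = sum((weight > threshold) != label for weight, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


# Hypothetical labeled data: (weight in kg, did the object need heavy handling?)
examples = [(5.0, False), (12.0, False), (15.0, False), (18.0, True), (25.0, True), (30.0, True)]
learned_threshold = learn_threshold(examples)


def learned_is_heavy(weight_kg: float) -> bool:
    return weight_kg > learned_threshold


print(rule_based_is_heavy(18.0), learned_is_heavy(18.0))  # False True
```

The hard-coded rule stays wrong for an 18 kg object no matter how much evidence accumulates, while the learned threshold (15 kg here) adapts to whatever the examples show.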
Machine learning (ML) is a part of artificial intelligence. It is a discipline that gives computers the ability to learn from data and improve with training. Basically, machine learning is a set of techniques and algorithms that enable computers to perform tasks without being explicitly programmed for them. With enough data, you can build a model that analyzes it and discovers hidden patterns or insights. The model can also be used for prediction and decision-making.
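As a minimal illustration of "build a model, then use it for prediction", the sketch below fits a straight line to a handful of hypothetical data points with ordinary least squares and then predicts an unseen value. The data and the function name are invented for illustration; real projects would typically rely on a library rather than hand-written formulas.

```python
# Hypothetical training data: advertising spend (x) and resulting sales (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")  # y = 1.97 * x + 1.09
print(round(slope * 6.0 + intercept, 2))          # prediction for x = 6: 12.91
```

Nobody wrote "multiply by 1.97" into the program: the slope and intercept are the pattern the model extracted from the data.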
A typical machine learning example is a spam filter. It learns from a large collection of emails, both normal ones and spam, how to distinguish the two. Your mail provider can then use the model to filter out unwanted messages.
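A spam filter of this kind can be sketched with a naive Bayes classifier, one common classical choice for the task (the article does not say which algorithm mail providers actually use). The sketch assumes a tiny hypothetical training set, whitespace tokenization, and add-one smoothing; production filters use far more data and richer features.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def train(emails: list[tuple[str, str]]):
    """emails: (text, label) pairs with label "spam" or "ham" (normal mail)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in emails:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(text: str, model) -> str:
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        score = math.log(doc_counts[label] / total_docs)  # prior probability of the class
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing out the probability.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training emails.
emails = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("cheap pills free shipping", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch with the team tomorrow", "ham"),
    ("please review the project report", "ham"),
]
model = train(emails)
print(classify("free prize money", model))           # spam
print(classify("review the meeting report", model))  # ham
```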
While AI is the high-level concept of machines that think like humans, machine learning is so far the most successful way of implementing it. ML has its own subfields, each with its own tasks and algorithms.
Depending on the problem you are trying to solve, you may encounter different types of machine learning. Classification and regression, the two most popular machine learning tasks, require the data to have labeled answers. The algorithm compares the model's predictions with these answers and tunes the model until its outputs match the labels as closely as possible. This type of machine learning is called supervised. Another type, unsupervised learning, uses unlabeled data. Typical unsupervised learning tasks include dimensionality reduction, clustering, anomaly detection, etc.
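The sketch below illustrates unsupervised learning with k-means clustering: the points carry no labels, and the algorithm groups them on its own. The points are hypothetical, and initializing the centroids with the first k points is a simplifying assumption (common implementations use random or k-means++ initialization).

```python
import math


def kmeans(points: list[tuple[float, float]], k: int, iterations: int = 10):
    centroids = list(points[:k])  # simplifying assumption: first k points as starting centroids
    clusters: list[list[tuple[float, float]]] = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled points forming two visible groups.
points = [(1, 1), (8, 8), (1.5, 2), (8.5, 9), (2, 1.5), (9, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)  # [(1.5, 1.5), (8.5, 8.5)]
```

No point was ever told which group it belongs to; the two clusters emerge from the distances alone.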
The third type of machine learning is called reinforcement learning. It resembles the way humans learn through trial and error. The core idea is that an agent receives rewards or penalties for its actions and learns to behave in a way that maximizes its total reward. Reinforcement learning is used when there is no fixed dataset to learn from; instead, the agent interacts with an environment that keeps changing in response to its actions. The agent has to observe the conditions around it and make appropriate decisions.
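Trial-and-error learning can be sketched with tabular Q-learning, one classic reinforcement learning algorithm. In this hypothetical environment, an agent walks along a corridor of five cells and is rewarded only when it reaches the last one. No dataset is given in advance: the agent collects experience by acting. The learning rate, discount factor, exploration rate, and episode count are illustrative assumptions.

```python
import random

N_STATES = 5        # corridor cells 0..4
GOAL = 4            # reaching cell 4 gives reward 1 and ends the episode
ACTIONS = (-1, +1)  # move left or right


def step(state: int, action: int) -> tuple[int, float, bool]:
    next_state = min(max(state + action, 0), GOAL)  # walls at both ends
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def greedy(state: int) -> int:
        best = max(q[(state, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(state, a)] == best])  # random tie-break

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Explore with probability epsilon, otherwise exploit current knowledge.
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(state)
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

Because the reward appears only at the goal, its value propagates backward through the table over many episodes, and the learned policy ends up moving right from every cell.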
There is one more field of machine learning: deep learning (DL). Deep learning concerns itself with one kind of ML algorithm, deep neural networks.
A neural network is a computing system that loosely replicates the human brain's approach to analyzing data: it was inspired by the biological neural networks in our brains.
Simple neural networks consist of two layers of connected neurons: an input layer and an output layer. A neural network combines the inputs, applies mathematical transformations, and produces an output. It may also have additional layers between them, called hidden layers. Networks with hidden layers, usually several of them, are called deep neural networks, hence the name deep learning. The hidden layers perform additional computations, which lets the network capture more complex relationships and thereby improve its accuracy. There may be any number of hidden layers, from one to thousands.
A typical neural network with one hidden layer
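The computation described above, where each neuron combines its inputs and applies a transformation, can be sketched as a forward pass through a network with one hidden layer. The weights are hypothetical numbers chosen by hand; in a real network they are learned during training. The sigmoid activation is one common choice among several.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron takes a weighted sum of all inputs plus a bias."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hypothetical weights: 3 inputs -> 4 hidden neurons -> 1 output neuron.
W_hidden = [[0.2, -0.5, 0.1], [0.7, 0.3, -0.2], [-0.4, 0.6, 0.5], [0.1, 0.1, 0.1]]
b_hidden = [0.0, 0.1, -0.1, 0.2]
W_out = [[0.5, -0.3, 0.8, 0.2]]
b_out = [0.05]

x = [1.0, 0.5, -1.0]                                  # input layer
hidden = dense(x, W_hidden, b_hidden, sigmoid)        # hidden layer
output = dense(hidden, W_out, b_out, sigmoid)         # output layer
print(output)  # a single value between 0 and 1
```

Adding more hidden layers simply means chaining more `dense` calls, each feeding its outputs to the next.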
Deep learning is, so far, one of the most powerful approaches in machine learning. Deep neural networks allow computers to find patterns that are too complex for simpler machine learning algorithms. Neural networks also work well with unstructured data such as texts, images, and sensor data. This makes them even more valuable, as traditional machine learning techniques require a lot of preprocessing before they can use such data.
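A classic illustration of why hidden layers matter is XOR, a pattern that a single linear unit cannot represent. The sketch below searches a grid of weights for a single threshold unit that reproduces XOR and finds none, then shows a network with one hidden layer that does. The hidden-layer weights are set by hand for clarity rather than learned, so this demonstrates what the architecture can express, not how training finds it.

```python
import itertools

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def step(z: float) -> int:
    return 1 if z > 0 else 0


def linear_unit(x, w1, w2, b) -> int:
    return step(w1 * x[0] + w2 * x[1] + b)


# Try every single linear unit on a grid of weights from -3.0 to 3.0 in steps of 0.5.
grid = [i / 2 for i in range(-6, 7)]
linear_solutions = [
    (w1, w2, b)
    for w1, w2, b in itertools.product(grid, repeat=3)
    if all(linear_unit(x, w1, w2, b) == y for x, y in XOR.items())
]
print(len(linear_solutions))  # 0: XOR is not linearly separable


def xor_net(x) -> int:
    h_or = step(x[0] + x[1] - 0.5)   # hidden neuron 1 acts like OR
    h_and = step(x[0] + x[1] - 1.5)  # hidden neuron 2 acts like AND
    return step(h_or - h_and - 0.5)  # output: OR but not AND, i.e. XOR


print(all(xor_net(x) == y for x, y in XOR.items()))  # True
```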
Finally, data science is an interdisciplinary field that uses various scientific methods to extract insights from data and put them to practical use. Simply put, data science applies machine learning techniques to real-world problems, often business ones.
While machine learning is mainly an academic discipline that deals with mathematics, statistics, and computer science, data science also takes care of the entire data processing pipeline: from data collection to its use in real-life applications.
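A toy version of such a pipeline, from raw data to a usable prediction, might look like the sketch below. The CSV export, the column names, and the "model" (a plain average price per square meter) are hypothetical; the point is that most of the code deals with collecting and cleaning the data rather than with the model itself.

```python
import csv
import io

# 1. Collection: a hypothetical raw export with one missing and one corrupt value.
RAW = """size_m2,price_k
50,150
70,205
,180
90,265
abc,300
110,330
"""


def collect(raw_text: str) -> list[dict[str, str]]:
    return list(csv.DictReader(io.StringIO(raw_text)))


# 2. Cleaning: keep only rows where both fields parse as numbers.
def clean(rows: list[dict[str, str]]) -> list[tuple[float, float]]:
    cleaned = []
    for row in rows:
        try:
            cleaned.append((float(row["size_m2"]), float(row["price_k"])))
        except ValueError:
            continue  # float("") and float("abc") both raise ValueError
    return cleaned


# 3. Modeling: a deliberately trivial model, the average price per square meter.
def fit(pairs: list[tuple[float, float]]) -> float:
    return sum(price for _, price in pairs) / sum(size for size, _ in pairs)


# 4. Use: turn the model into a prediction someone can act on.
def use(price_per_m2: float, size_m2: float) -> float:
    return price_per_m2 * size_m2


model = fit(clean(collect(RAW)))
print(use(model, 100))  # 296.875
```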
Data scientists do not always need to be mathematicians. In fact, they often use existing libraries and frameworks that implement most machine learning algorithms. However, data scientists need strong domain knowledge: they must understand the nature of the data and the underlying tasks to form sound hypotheses and choose the right techniques.
The relationship between AI, ML, DL, and DS is represented visually in the diagram below. Deep learning is a subset of machine learning, which, in turn, is a subset of artificial intelligence, whereas data science is a separate field that only partially overlaps with these disciplines. This is because data science deals with many problems apart from model training. Still, the connection is strong, as data science makes wide use of AI, ML, and DL and puts them to practical use.
The relationship between data science, artificial intelligence, machine learning, and deep learning
To sum up, artificial intelligence is a broad discipline that focuses on enabling machines to replicate human thought processes and behavior. Machine learning is a subset of AI that deals with techniques that enable computers to learn from historical data. Neural networks are one such technique. Deep learning is a field that works with deep neural networks, i.e., networks with hidden layers.
Data science combines all of these and applies them to real-life problems. Apart from the usual academic disciplines, data science requires strong domain knowledge. And while the definitions of these terms are distinct, the fields are closely related and develop together.
The following article will take a look at several basic mathematical concepts used for developing machine learning algorithms. So, stay tuned!
How is Big Data related to data science?
Big data is a term for the strategies and technologies needed to gather, organize, transform, and process large amounts of data, so large that traditional data processing techniques cannot handle them. Big data is more of an engineering concept, while data science focuses on extracting insights from data; when datasets become that large, data scientists rely on big data tools to work with them.
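One core big data strategy is to split the data into chunks, summarize each chunk independently, and then combine the partial results, the idea behind MapReduce. The sketch below simulates this on a single machine with a lazily generated stand-in dataset; real systems such as Hadoop or Spark distribute the chunks across many machines. The data source and chunk size are assumptions for illustration.

```python
from itertools import islice


def read_records(n: int):
    """Hypothetical stand-in for a source too large to load at once: yields values lazily."""
    for i in range(n):
        yield i % 100


def chunks(iterable, size: int):
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def partial_stats(batch: list[int]) -> tuple[int, int]:
    # "Map" step: each chunk is summarized on its own (could run on a separate machine).
    return sum(batch), len(batch)


def combine(stats) -> float:
    # "Reduce" step: merge the per-chunk summaries into one result.
    total, count = 0, 0
    for s, c in stats:
        total += s
        count += c
    return total / count


mean = combine(partial_stats(b) for b in chunks(read_records(100_000), 1_000))
print(mean)  # 49.5
```

At no point does the whole dataset sit in memory: only one chunk and two running totals exist at a time.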
What are the types of machine learning?
Machine learning is usually divided into supervised, unsupervised, and reinforcement learning. Supervised learning uses data that already has labeled answers. Unsupervised learning deals with data without any labels. Reinforcement learning trains an agent that interacts with an environment and learns, by trial and error, to act according to the situation. Finally, neural networks and deep learning can be applied within all three types.
What are natural language processing and computer vision?
Natural language processing (NLP) and computer vision (CV) are two subfields of AI. NLP allows computers to understand human language, while CV enables computers to understand visual data such as photos and videos. Both NLP and CV are powerful technologies that are widely used in various industries.
Why are neural networks so popular?
Neural networks are one of the most powerful machine learning techniques for several reasons. First of all, neural networks can find complex patterns in data: hidden layers make a deep neural network particularly flexible and allow it to model complicated, nonlinear relationships. Another reason is that neural networks work well with unstructured data without much preprocessing.