Data preparation

What is data preparation?

Data preparation (or data preprocessing) is the process of transforming raw data into a form that can be analyzed and used for visualization and model building. It deals with issues such as missing and incomplete records, outliers, non-standardized values, and incorrectly formatted values. To address these problems, data preparation includes preprocessing, profiling, cleansing, validation, transformation, and enrichment.
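As a rough illustration of the issues listed above, the following minimal sketch cleans a handful of records using only the Python standard library. The records, the alias table, the 120-year age limit, and the helper names (parse_age, prepare) are all invented for this example and do not come from any particular dataset or library. It treats malformed and implausible ages as missing, fills them with the median of the valid ages, and maps differently spelled country names to one code. Median imputation is only one of several possible choices here.

```python
from statistics import median

# Hypothetical raw records, invented purely for illustration.
raw_records = [
    {"name": "Alice", "age": "34", "country": "usa"},
    {"name": "Bob", "age": "", "country": "USA "},        # missing age
    {"name": "Carol", "age": " 29", "country": "U.S.A."},  # stray whitespace
    {"name": "Dan", "age": "410", "country": "Germany"},   # implausible outlier
    {"name": "Eve", "age": "41", "country": "germany"},
]

# Assumed lookup table that maps spelling variants to one standard code.
COUNTRY_ALIASES = {"usa": "US", "u.s.a.": "US", "germany": "DE"}

# Assumed domain rule: ages above this limit are treated as data errors.
MAX_PLAUSIBLE_AGE = 120


def parse_age(value):
    """Return the age as an int, or None if missing, malformed, or implausible."""
    text = value.strip()
    if not text.isdigit():
        return None
    age = int(text)
    return age if age <= MAX_PLAUSIBLE_AGE else None


def prepare(records):
    """Standardize formats, then fill unusable ages with the median valid age."""
    cleaned = [
        {
            "name": r["name"],
            "age": parse_age(r["age"]),
            # Unknown spellings map to None rather than being guessed.
            "country": COUNTRY_ALIASES.get(r["country"].strip().lower()),
        }
        for r in records
    ]
    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    fill_value = median(known_ages)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill_value
    return cleaned


prepared = prepare(raw_records)
for row in prepared:
    print(row)
```

In a real project the same steps would more likely be carried out with a dataframe library, and the decision to impute, drop, or flag a bad value depends on the analysis that follows.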

Why is it important?

Data preprocessing is an essential part of any data analysis project. Raw data rarely arrives in a form suitable for analysis, so it must be prepared first. Many machine learning algorithms accept data only in a specific format, so data in any other format cannot be used without preprocessing. In addition, missing and incorrect values can significantly reduce the accuracy of a model. Data preparation addresses these issues and helps produce more accurate models.