Cross-validation

What is cross-validation?

Cross-validation is a technique for evaluating the performance of a machine learning model when only a limited amount of data is available. It is more reliable than a single validation set because every sample is used for validation exactly once, which reduces the selection bias that arises when one held-out set happens not to be representative.

In its most common form, k-fold cross-validation, the dataset is divided into k parts of roughly equal size, called folds (k is a parameter chosen by the user; 5 and 10 are common choices). One fold is held out as the validation set and the model is trained on the other k-1 folds. The trained model is then scored on the held-out fold, for example by its accuracy. The process repeats k times so that each fold serves as the validation set exactly once. The average score over all k iterations is the cross-validated estimate of the model’s performance.
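The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names (k_fold_indices, cross_validate, fit_majority, accuracy) and the toy dataset are assumptions made for the example, and the "model" is simply a classifier that always predicts the most frequent training label.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]

def cross_validate(xs, ys, k, fit, score, seed=0):
    """Return the per-fold scores and their average."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except fold i, validate on fold i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        scores.append(score(model, [xs[j] for j in val_idx], [ys[j] for j in val_idx]))
    return scores, mean(scores)

# Hypothetical toy model: always predict the most frequent training label.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)

def accuracy(model, xs, ys):
    return sum(1 for y in ys if y == model) / len(ys)

# Made-up data: 20 samples, 14 of class 0 and 6 of class 1.
xs = list(range(20))
ys = [0] * 14 + [1] * 6

fold_scores, avg_score = cross_validate(xs, ys, k=5, fit=fit_majority, score=accuracy)
print("per-fold accuracy:", fold_scores)
print("cross-validated accuracy:", avg_score)
```

The individual fold scores depend on how the shuffle distributes the two classes, which is exactly the variation that averaging over folds is meant to smooth out.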

Why is it important?

Cross-validation gives a more trustworthy estimate of how the model will perform on unseen data, helps detect overfitting (it reveals it rather than preventing it), and provides a basis for choosing hyperparameters. It is especially useful when the dataset is small and setting aside a single validation sample large enough to be representative is not feasible.
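As an illustration of hyperparameter selection, the sketch below compares several candidate values by their cross-validated error and keeps the one with the lowest error. Everything in it is an assumption made for the example: the model is a tiny nearest-neighbour regressor written from scratch, the hyperparameter is its number of neighbours (called n_neighbors to avoid confusion with the number of folds k), and the data is synthetic.

```python
import random
from statistics import mean

def knn_predict(train, x, n_neighbors):
    """Average the targets of the n_neighbors training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    return mean(y for _, y in nearest)

def cv_error(data, n_neighbors, k=5):
    """Mean squared error of the regressor, averaged over k folds."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i, val in enumerate(folds):
        # Train on every fold except fold i, validate on fold i.
        train = [p for f, fold in enumerate(folds) if f != i for p in fold]
        errors.append(mean((knn_predict(train, x, n_neighbors) - y) ** 2 for x, y in val))
    return mean(errors)

# Synthetic data: a noisy quadratic, shuffled so that the folds are random.
rng = random.Random(0)
data = [(x / 10, (x / 10) ** 2 + rng.gauss(0, 0.5)) for x in range(40)]
rng.shuffle(data)

# Score each candidate with the same folds and keep the lowest error.
candidates = [1, 3, 5, 9, 15]
cv_errors = {n: cv_error(data, n) for n in candidates}
best_n = min(cv_errors, key=cv_errors.get)
print("cross-validated MSE per candidate:", cv_errors)
print("selected n_neighbors:", best_n)
```

Note that the score of the selected value tends to be optimistic, because the same folds were used to pick it; a separate test set (or nested cross-validation) is normally kept for the final performance estimate.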