Top 5 Challenges of the Model Training Phase in an ML Project: Train Hard, Fight Easy

A machine learning project is a cool thing. A computer will do fast, high-quality work in place of a human. Sounds great. The only thing left is to make the project happen 🙂

Tricks, mistakes, and surprises await data scientists at every stage of the process. In this article, we will focus on the challenges you may encounter during the Model Training phase. If you are curious about the challenges at other stages of a Machine Learning project, you can find them on our blog.

At this point, the Machine Learning problem has been defined, a suitable dataset has been selected, cleaned, and prepared for further work, and the ML model’s design is ready. All that remains is to train and tune the model before deploying it to a production environment.

Let’s move on to an overview of the most common pitfalls that get in the way of building a good model.

Poor Documentation and Record-Keeping

Throughout the project, everyone involved should document their work: the sequence of actions, the reasoning behind them, the results obtained, and the tools used. Keep the records structured and in chronological order.

This general recommendation applies to most steps, but it is especially important at the stage of training the ML model. Here you should keep a record of all experiments and note significant details. 

Without such documentation, work has to be repeated just to recover the details. This matters most in a team: with an always up-to-date source of information, colleagues do not need to retell their progress every time, and nobody runs the same experiment twice. Sloppy or overly brief documentation can also confuse new team members, who will need a long time to build a complete understanding of the work done and the current situation.

In addition to successful results and improvements, keep records of failed experiments. Without them, new colleagues can end up on a path that has already been traveled, and time is wasted again. Experiments that bring no progress are still part of improving skills, and the records make a qualitative analysis of the work possible at the end of the project.

Model version names should be clear and concise. Believe me, names like temp_01, temp_02, and temp_02+ are not a good idea.
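To make the record-keeping advice concrete, here is a minimal sketch of an experiment log. The naming scheme, the field names, and the functions make_run_name and log_experiment are our own illustrative assumptions, not a standard; adapt them to whatever tracking tool your team uses.

```python
import json
from datetime import datetime, timezone


def make_run_name(model, dataset, params, when):
    """Build a descriptive, sortable run name from what defines the run."""
    # Sort the parameters so the same configuration always gives the same name.
    param_part = "_".join(f"{key}-{value}" for key, value in sorted(params.items()))
    return f"{when:%Y%m%d-%H%M}_{model}_{dataset}_{param_part}"


def log_experiment(path, model, dataset, params, metrics, notes, when=None):
    """Append one experiment, successful or failed, to a JSON Lines file."""
    when = when or datetime.now(timezone.utc)
    record = {
        "run": make_run_name(model, dataset, params, when),
        "timestamp": when.isoformat(),
        "model": model,
        "dataset": dataset,
        "params": params,
        "metrics": metrics,
        "notes": notes,  # the reasoning, including why a run failed
    }
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
    return record
```

A call such as log_experiment("experiments.jsonl", "rf", "churn-v2", {"max_depth": 8}, {"f1": 0.41}, "worse than baseline, tree too shallow") leaves one searchable line per run, failed runs included.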

Inaccuracies in Classification 

Pay particular attention to the training results when the Machine Learning model solves a classification problem. You might think you have built a great model when you have not. Most often, this happens with imbalanced classes. Let’s take a closer look.

Imagine you are dealing with binary classification, and 95% of the dataset belongs to one class. A classifier that assigns every observation to that majority class never labels the second class correctly, yet its accuracy is 95%, which looks “good enough”. In reality, the model is useless for the minority class. Accuracy alone is a poor metric for classification, and even more so when classes are imbalanced. Such cases should be monitored closely.

It is better to evaluate a trained Machine Learning model using other metrics. Precision and recall are good options, and the F-score combines the two (the F1 score is their harmonic mean). These metrics take into account the predictions within each class. To inspect binary classification results in detail, use the confusion matrix, which counts correct and incorrect predictions per class. Another tool for assessing a two-class classifier is ROC AUC: the area under the ROC curve shows how well the model separates the classes across all decision thresholds.
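The 95% example can be checked by hand. The sketch below uses plain Python, treats the rare class as the positive class (label 1), and computes the confusion-matrix counts together with accuracy, precision, recall, and F1. The helper names are ours, and returning 0.0 when a metric is undefined (no positive predictions) is a convention we assume here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def binary_report(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 95 samples of the majority class (0), 5 of the minority class (1).
y_true = [0] * 95 + [1] * 5
# A "classifier" that always predicts the majority class.
y_pred = [0] * 100

report = binary_report(y_true, y_pred)
print(report)
```

Accuracy comes out at 0.95, while precision, recall, and F1 for the minority class are all 0.0, which is exactly the gap that accuracy hides.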

Let’s say you have already figured out that the model gives poor results because of a large class imbalance. How do you train the model to take this into account? One option is to prioritize individual classes, for example by assigning them higher weights, so that the model pays more attention to samples from the problem class. This step is not a panacea, but it often helps to improve the classification.
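One common way to prioritize a class is to weight it inversely to its frequency. The sketch below computes such weights in plain Python using the formula n_samples / (n_classes * class_count), which is the same heuristic scikit-learn applies for class_weight="balanced". The function name is our own, and how the weights are passed to a model depends on the library you use.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to how often it occurs."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Same imbalance as in the example above: 95% vs. 5%.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)  # the rare class gets a much larger weight than the common one
```

With these weights, an error on a minority sample costs the model about 19 times more than an error on a majority sample.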

But bad results don’t always happen due to model training. Sometimes this is a signal to return to the stage of data preparation or model selection.

Selection of Hyperparameters

Even when a data scientist has the perfect dataset and is confident in the choice of algorithm, a great result is not guaranteed. Even the most efficient algorithm will perform poorly if its hyperparameters are chosen badly, because they strongly affect the model’s performance. The best set of values is found experimentally, often through manual trial and error. But manual tuning is neither a reliable nor a sustainable way to achieve the best result.

No universal set of hyperparameters fits all datasets, because the datasets differ. Automated services and tools simplify tuning: they try different hyperparameter combinations, measure the performance of each, and report the combination that scores best. One example is GridSearchCV from the scikit-learn library for Python. Similar cloud-based tools from Azure, Google, and AWS are also available. Keep manual hyperparameter tuning to a minimum to maximize productivity.
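The idea behind such tools fits in a few lines. The sketch below is a plain-Python illustration of exhaustive grid search, not the GridSearchCV implementation (which additionally cross-validates every candidate). The toy_score function is a hypothetical stand-in for "train the model with these hyperparameters and return its validation score", and the parameter names are only examples.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and keep the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        results.append((params, score))  # keep every result for the records
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, results


def toy_score(params):
    # Hypothetical stand-in for training plus validation.
    # By construction, it peaks at learning_rate=0.1 and max_depth=6.
    return -abs(params["learning_rate"] - 0.1) - 0.01 * abs(params["max_depth"] - 6)


param_grid = {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [3, 6, 9]}
best_params, best_score, results = grid_search(param_grid, toy_score)
print(best_params)
```

Note that the number of combinations grows multiplicatively with every hyperparameter added to the grid (here 3 x 3 = 9 runs), so the grid size has to match the available compute.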

If you do tune manually, avoid another mistake: changing several hyperparameters at the same time. When several values change at once, you cannot tell which change caused a shift in the model’s behavior, and you can miss important effects. Sometimes decreasing a parameter by as little as one hundredth noticeably changes the model’s performance. Therefore, change the hyperparameters one at a time and monitor the results.

Reusing the Same Test Set 

Traditionally, a test dataset is used to validate a trained model. Sometimes the accuracy on both the training and the test set is good enough, but as soon as the model meets new data, performance drops. There may be several reasons; let’s discuss the one related to model training.

When the same test set is used again and again to tune hyperparameters and other settings of the algorithm, the results become tailored to that specific data. To see the real picture of the model’s performance, it is important to vary the data used for evaluation. If the training and test sets can be extended over time, do so: the model will generalize better and be more flexible.

When data is limited, you can use other methods, such as cross-validation. It evaluates the model on several different test subsets drawn from the same dataset. It is also good practice to reshuffle the data before splitting it into train and test sets for each new experiment. This gives you many different test sets for control.
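As an illustration of cross-validation, here is a minimal sketch of k-fold splitting in plain Python. It shuffles the sample indices once and then lets each fold serve as the test set exactly once. The function name and the seed parameter are our own choices; libraries such as scikit-learn ship ready-made equivalents.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle before splitting
    folds = [indices[i::k] for i in range(k)]  # k roughly equal parts
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in splits:
    print("train:", sorted(train_idx), "test:", sorted(test_idx))
```

Every sample ends up in a test fold exactly once, so the averaged score does not depend on one lucky or unlucky split. Changing the seed gives a different shuffle for the next experiment.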

Insufficient Infrastructure Capacity

The existing infrastructure can be a big challenge when training a machine learning model. Consider the capacity of the system in advance, before starting experiments.

There are two challenges. The first is the size and complexity of the dataset. If it is very large and the infrastructure is weak, model training can take hours or even days. This slows down experimentation, and unforeseen lengthy training can shift the project timeline. The second is a poorly chosen algorithm. Some models process large datasets much more slowly than others. Therefore, consider the speed of the model at the earlier step of choosing an algorithm, so that time does not become an issue at the training stage.

It is a good habit to keep track of the model training time. You then have processing time as an additional criterion for evaluating the algorithm, alongside its quality metrics.
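Tracking training time needs very little code. The sketch below wraps any training call in a timer based on time.perf_counter from the standard library. Here toy_train is a hypothetical stand-in for a real training function, and timed is our own helper name.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def toy_train(n_steps):
    # Hypothetical stand-in for model training: just burns some CPU time.
    total = 0
    for step in range(n_steps):
        total += step * step
    return total


model, seconds = timed(toy_train, 100_000)
print(f"training took {seconds:.4f} s")
```

Storing the elapsed time next to the quality metrics of each experiment makes it easy to compare algorithms on both at once.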

Still, if the model is chosen and the dataset is very large, how do you avoid extended training times? You can try training on only part of the dataset. The subset must be representative: it should preserve the proportions of the full dataset, for example the share of each class.
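For a classification dataset, one way to keep a subset representative is stratified sampling: draw the same fraction from every class. The sketch below does this in plain Python. It preserves class proportions only (other properties of the data may need their own checks), and the function name and the choice to keep at least one sample per class are our assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Return sorted indices of a subset that keeps the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        # Assumption: keep at least one sample even from very rare classes.
        n_keep = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, n_keep))
    return sorted(chosen)


# 950 samples of class 0 and 50 of class 1, reduced to 10% of the data.
labels = [0] * 950 + [1] * 50
subset = stratified_sample(labels, fraction=0.1)
minority_share = sum(labels[i] for i in subset) / len(subset)
print(len(subset), minority_share)
```

The subset has 100 samples and the minority class still makes up 5% of it, just as in the full dataset.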


We looked at a few challenges that data scientists often come across in the training phase of a machine learning model. These are not the only problems you can encounter, but they are the most common ones. If you follow the recommendations above, your machine learning project will become more effective and less resource-intensive. Feel free to experiment during the training phase, but consider your capabilities and resources. Keep taking notes throughout the project, even about failed experiments, and always consider the specifics of your dataset. Step by step, you will end up with a good project.

The next big step is the Operationalization Phase. You can follow the link for interesting details of this stage.