0% found this document useful (0 votes)

48 views

HyperparameterTuninginMachineLearningAComprehensiveReview

The document is a comprehensive review of hyperparameter tuning in machine learning, emphasizing its critical role in optimizing model performance and generalization. It discusses various tuning techniques such as grid search, random search, and Bayesian optimization, while highlighting the importance of hyperparameters like learning rate and batch size. The review concludes with challenges and future directions for enhancing the effectiveness of machine learning models.

Uploaded by

ABOUMAJD AYOUB

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

48 views

HyperparameterTuninginMachineLearningAComprehensiveReview

Uploaded by

ABOUMAJD AYOUB

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/381255284

Hyperparameter Tuning in Machine Learning: A Comprehensive Review

Article in Journal of Engineering Research and Reports · June 2024

DOI: 10.9734/jerr/2024/v26i61188

CITATIONS READS

10 2,631

12 authors, including:

Justus Akinlolu Ilemobayo Olamide Isaac Durodola

Auburn University Auburn University
7 PUBLICATIONS 11 CITATIONS 22 PUBLICATIONS 41 CITATIONS

SEE PROFILE SEE PROFILE

Temitope Olanrewaju Adewumi Olumide Falana

University of Florida Auburn University
4 PUBLICATIONS 104 CITATIONS 30 PUBLICATIONS 57 CITATIONS

SEE PROFILE SEE PROFILE

All content following this page was uploaded by Olamide Isaac Durodola on 08 June 2024.

The user has requested enhancement of the downloaded file.

Journal of Engineering Research and Reports

Volume 26, Issue 6, Page 388-395, 2024; Article no.JERR.118312

ISSN: 2582-2926

Hyperparameter Tuning in Machine

Learning: A Comprehensive Review
Justus A Ilemobayo a, Olamide Durodola a*,
Oreoluwa Alade b, Opeyemi J Awotunde a,
Adewumi T Olanrewaju c, Olumide Falana a,
Adedolapo Ogungbire d, Abraham Osinuga e,
Dabira Ogunbiyi f, Ark Ifeanyi g, Ikenna E Odezuligbo h
and Oluwagbotemi E Edu i
a Samuel
Ginn College of Engineering, Auburn University, USA.
b Department
of Physics, North Dakota State University, USA.
c University of Florida, USA.
d Department of Civil Engineering, University of Arkansas, USA.
e Department of Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, USA.
f Department of Biosystems and Agricultural Engineering, Oklahoma State, University, Stillwater, OK.

USA
g Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee,

USA.
h Creighton University, Nebraska, USA.
I Agricultural and Environmental Engineering Department, Obafemi Awolowo University, Nigeria.

Authors’ contributions

This work was carried out in collaboration among all authors. All authors read and approved the final
manuscript.

Article Information
DOI: https://doi.org/10.9734/jerr/2024/v26i61188

Open Peer Review History:

This journal follows the Advanced Open Peer Review policy. Identity of the Reviewers, Editor(s) and additional Reviewers,
peer review comments, different versions of the manuscript, comments of the editors, etc are available here:
https://www.sdiarticle5.com/review-history/118312

Received: 29/03/2024
Review Article Accepted: 03/06/2024
Published: 07/06/2024

_____________________________________________________________________________________________________

*Corresponding author: Email: [email protected];

Cite as: A Ilemobayo, Justus, Olamide Durodola, Oreoluwa Alade, Opeyemi J Awotunde, Adewumi T Olanrewaju, Olumide
Falana, Adedolapo Ogungbire, Abraham Osinuga, Dabira Ogunbiyi, Ark Ifeanyi, Ikenna E Odezuligbo, and Oluwagbotemi E
Edu. 2024. “Hyperparameter Tuning in Machine Learning: A Comprehensive Review”. Journal of Engineering Research and
Reports 26 (6):388-95. https://doi.org/10.9734/jerr/2024/v26i61188.
Ilemobayo et al.; J. Eng. Res. Rep., vol. 26, no. 6, pp. 388-395, 2024; Article no.JERR.118312

ABSTRACT

Hyperparameter tuning is essential for optimizing the performance and generalization of machine
learning (ML) models. This review explores the critical role of hyperparameter tuning in ML,
detailing its importance, applications, and various optimization techniques. Key factors influencing
ML performance, such as data quality, algorithm selection, and model complexity, are discussed,
along with the impact of hyperparameters like learning rate and batch size on model training.
Various tuning methods are examined, including grid search, random search, Bayesian
optimization, and meta-learning. Special focus is given to the learning rate in deep learning,
highlighting strategies for its optimization. Trade-offs in hyperparameter tuning, such as balancing
computational cost and performance gain, are also addressed. Concluding with challenges and
future directions, this review provides a comprehensive resource for improving the effectiveness
and efficiency of ML models.

Keywords: Hyperparameter tuning; learning rate; batch size; grid search; random search; Bayesian
optimization; meta-learning; neural networks.

1. INTRODUCTION analyze vast amounts of genetic data to identify

markers associated with diseases [5].
Machine learning (ML) is a subset of artificial
intelligence (AI) that enables systems to learn Performance in ML is a measure of how well a
and make decisions from data without being model generalizes from the training data to
explicitly programmed. This field has witnessed unseen data. It is crucial because a model that
explosive growth and application across various performs well on training data but poorly on new
industries, driven by the increasing availability of data is not useful in real-world applications.
data and advancements in computing power. Performance is typically evaluated using
From healthcare and finance to autonomous metrics such as accuracy, precision, recall, F1-
systems and natural language processing, ML score, and the area under the receiver
models are transforming how tasks are operating characteristic (ROC) curve,
performed, offering improved efficiency, depending on the specific task and data
accuracy, and insights. characteristics.

In healthcare, ML models assist in diagnosing High performance is essential for several

diseases, predicting patient outcomes, and reasons:
personalizing treatment plans. For instance,
models trained on medical imaging data can 1. Accuracy and Reliability: In critical
identify anomalies with a level of precision applications like medical diagnosis and
comparable to that of human experts, enabling autonomous driving, high-performing
early detection and intervention [1]. In finance, models are necessary to ensure accurate
ML algorithms are used for risk assessment, and reliable predictions. Errors in such
fraud detection, and automated trading, helping contexts can have severe consequences,
institutions manage risks and optimize returns including loss of life and financial loss.
[2]. In agriculture, ML has also been useful in 2. Efficiency: High-performance models can
agricultural task categorization (pre-harvesting, process data more efficiently, reducing
harvesting, and post-harvesting) [3]. Similarly, in computational costs and time. This
autonomous systems, ML is integral in enabling efficiency is particularly important in real-
vehicles to perceive their environment, make time applications where decisions need to
decisions, and navigate safely [4]. be made quickly [6].
3. User Trust and Adoption: Models that
The importance of ML lies not only in its ability to consistently perform well build trust among
automate complex tasks but also in its potential users and stakeholders, facilitating broader
to uncover patterns and insights from data that adoption and integration into operational
would be difficult, if not impossible, for humans to workflows. This trust is critical in sectors
detect. This capability is particularly valuable in where decisions based on ML predictions
fields like genomics, where ML models can have significant impacts [7].

389
Ilemobayo et al.; J. Eng. Res. Rep., vol. 26, no. 6, pp. 388-395, 2024; Article no.JERR.118312

Hyperparameters are the parameters that govern 1.1 Factors Influencing Performance of
the training process and structure of machine Machine Learning Models
learning models. Unlike model parameters, which
are learned during training, hyperparameters are Several factors influence the performance of
set before the training process begins. They play machine learning models. High-quality data that
a critical role in determining the performance of accurately represents the problem domain is
the model. Examples of hyperparameters include crucial. Data preprocessing steps, such as
the learning rate in neural networks, the number cleaning, normalization, and feature engineering,
of trees in a random forest, the depth of a enhance data quality. Additionally, having a large
decision tree, the penalty term in support vector dataset provides more information, enabling the
machines, momentum, learning rate decay, a model to learn better and generalize well [11].
gradual reduction in the learning rate over The choice of algorithm is also critical, as
time to speed up learning and regularization different algorithms have different strengths and
constant. are suitable for different types of problems.
Selecting an appropriate algorithm that aligns
The relationship between hyperparameters and with the problem's nature and data
performance is complex. Properly tuned characteristics is essential for achieving high
hyperparameters can lead to significant performance [12].
improvements in model performance, while
poorly chosen hyperparameters can result in Hyperparameter tuning is another significant
suboptimal models. For instance, in neural factor, as hyperparameters control the behavior
networks, the learning rate controls how quickly and complexity of the model. Properly tuned
the model updates its weights during training. A hyperparameters can lead to significant
learning rate that is too high can cause improvements in performance. For example, in
the model to converge too quickly to a deep learning, hyperparameters such as learning
suboptimal solution, while a learning rate that is rate, batch size, and the number of layers and
too low can make the training process units in the network can greatly affect the
unnecessarily slow [8]. In momentum, it gives the convergence and accuracy of the model [13].
direction of the next step with respect to the Model complexity, defined by its architecture and
previous step. the number of parameters, also affects
performance. A model that is too simple may
The importance of achieving high performance in underfit the data, failing to capture underlying
ML cannot be overstated. High-performing patterns, while a model that is too complex may
models are needed for: overfit, capturing noise and spurious correlations
[14].
1. Operational Efficiency: High performance
translates to better decision-making and Regularization techniques, such as L1 and L2
operational efficiency. In industrial regularization, dropout, and early stopping, help
applications, this means optimized prevent overfitting by adding constraints to the
processes, reduced downtime, and model. These techniques maintain a balance
increased productivity. between bias and variance, leading to better
2. Competitive Advantage: Businesses generalization [15,16]. Finally, the choice of
leveraging high-performing ML models can evaluation methods, such as cross-validation and
gain a competitive edge by offering better bootstrapping, influences the assessment of
products and services. For instance, model performance. Proper evaluation ensures
recommendation systems used by that performance metrics are reliable and not
companies like Amazon and Netflix rely on biased by the specificities of the training and test
high-performing models to provide datasets [17].
personalized experiences that keep
customers engaged [9]. 2. HYPERPARAMETER TUNING IN
3. Advancement of Research: In scientific MACHINE LEARNING
research, high-performing models enable
the discovery of new knowledge and Hyperparameter tuning is the process of finding
insights. For example, in drug discovery, the optimal set of hyperparameters that yield the
ML models can predict the efficacy of new best performance for a machine learning model.
compounds, accelerating the development This process is critical because hyperparameters
of new treatments [10]. control the learning process and the structure of

390
Ilemobayo et al.; J. Eng. Res. Rep., vol. 26, no. 6, pp. 388-395, 2024; Article no.JERR.118312

the model such as learning rate, the number of Hyperparameters in various machine learning
neurons in a neural network, or kernel size in algorithms include learning rate, batch size,
support vector machine, directly impacting its number of layers in neural networks,
performance. Unlike model parameters, which regularization parameters, number of tree and
are learned from the data, hyperparameters are depth of trees. The learning rate in gradient-
set before training and require careful based optimization algorithms determines the
selection. Hyperparameter tuning can step size during each iteration of the optimization
improve the performance and generalization of process. A suitable learning rate is crucial for
the model. ensuring that the model converges to a good
solution without overshooting or slow
The importance of hyperparameter tuning in convergence. In stochastic gradient descent
machine learning cannot be overstated. Proper (SGD), the batch size defines the number of
hyperparameter tuning can significantly enhance samples used to compute the gradient at each
model performance. For example, selecting the step. Smaller batch sizes can provide more
right learning rate in neural networks can speed accurate gradient estimates but may require
up convergence and improve accuracy [13]. more iterations to converge.
Additionally, hyperparameter tuning helps
achieve a balance between bias and variance, The architecture of neural networks, including the
thereby improving the model's ability to number of layers and the number of units per
generalize to unseen data. This is crucial for the layer, determines the model's capacity to learn
model's robustness and reliability in real-world complex representations. These
applications [14]. Moreover, by identifying hyperparameters must be carefully selected to
optimal hyperparameters, computational balance model capacity and computational
resources are used more efficiently, reducing efficiency. Regularization parameters, such as L1
training time and costs. This efficiency is and L2, control the penalty applied to the model's
particularly important for large-scale models and parameters, helping to prevent overfitting. L1
datasets [18]. regularization promotes sparsity, while L2
regularization discourages large parameter
Hyperparameters play a crucial role in the values.
performance of various machine learning
models. In neural networks, hyperparameters For support vector machines (SVM), kernel
such as learning rate, batch size, number of parameters such as gamma in the radial basis
layers, and number of units per layer significantly function (RBF) kernel influence the model's
influence the model's performance. Proper tuning ability to handle non-linearly separable data.
of these hyperparameters can lead to faster Proper tuning of these parameters is essential for
convergence and higher accuracy [8,19]. For achieving good classification performance. In
support vector machines, the penalty parameter random forests, the number of trees and the
(C) and the kernel parameters, such as gamma maximum depth of each tree determine the
in the RBF kernel, are critical in determining the model's complexity and its ability to capture
decision boundary and margin. Tuning these interactions between features. Proper tuning of
parameters enhances the model's ability to these hyperparameters can improve both
handle non-linearly separable data [20]. accuracy and generalization.

In decision trees and random forests, In NLP applications, determining the optimal size
hyperparameters such as the depth of the tree, of the word embeddings could impact the
the minimum samples per leaf, and the number accuracy of predictions. Proper tuning of
of trees in a random forest influence the model's hyperparameters like size of the context window
complexity and performance. Proper tuning of and dimension of the embeddings can help strike
these hyperparameters can prevent overfitting a balance between computational efficiency and
and improve generalization [21]. Similarly, in model performance [23].
gradient boosting machines like XGBoost, and
LightGBM, hyperparameters like learning rate, 2.1 Techniques Used for Hyperparameter
number of estimators, and maximum depth of Tuning
trees are essential for capturing complex
patterns. Tuning these parameters can Several techniques have been developed to
significantly enhance performance in predictive automate and optimize the hyperparameter
tasks [22]. tuning process. Grid search is a brute-force

391
Ilemobayo et al.; J. Eng. Res. Rep., vol. 26, no. 6, pp. 388-395, 2024; Article no.JERR.118312

technique that exhaustively searches over a neural architecture search, to create robust and
predefined set of hyperparameters. Although efficient models with minimal human intervention
straightforward and easy to implement, it can be [30].
computationally expensive, especially for large
hyperparameter spaces [24]. Random search 2.2 Learning Rate as a Hyperparameter
offers a more efficient alternative, sampling in Deep Learning
hyperparameters randomly from a distribution.
This method has proven to be more effective in The learning rate is one of the most critical
finding optimal hyperparameters as it explores a hyperparameters in deep learning, governing
larger and more diverse set of combinations [24]. how much to change the model in response to
the estimated error each time the model weights
Bayesian optimization is a probabilistic model- are updated. It directly influences the
based approach that builds a surrogate model to convergence rate and final performance of neural
approximate the objective function. It iteratively networks. Selecting an appropriate learning rate
selects the most promising hyperparameters to is crucial for training neural networks efficiently.
evaluate, balancing exploration and exploitation, There are several strategies to optimize the
making it particularly useful for optimizing learning rate.
expensive functions [18]. Genetic algorithms,
inspired by the process of natural selection, use Using learning rate schedules can help adjust the
a population-based approach to search for learning rate during training. Common schedules
optimal hyperparameters. They apply genetic include step decay, where the learning rate is
operators such as mutation, crossover, and reduced by a factor after a fixed number of
selection to evolve the population towards better epochs; exponential decay, where the learning
solutions, effectively exploring complex and large rate decreases exponentially; and cosine
hyperparameter spaces [25]. annealing, which uses a cosine function to
decrease the learning rate. Adaptive learning
Early stopping is a regularization technique that rates are another effective strategy. Algorithms
monitors the model's performance on a validation such as Adaptive Gradient Algorithm (AdaGrad),
set and halts training when performance starts to Root Mean Square Propagation (RMSProp), and
degrade, preventing overfitting and saving Adaptive Moment Estimation (Adam) adjust the
computational resources [26]. Hyperband is an learning rate based on the gradients. These
adaptive resource allocation and early-stopping adaptive methods help improve convergence by
strategy for hyperparameter optimization. It scaling the learning rate according to the
evaluates a large number of hyperparameter historical gradient information.
configurations and allocates more resources to
promising ones, effectively balancing exploration Cyclical learning rates involve periodically
and exploitation [27,19]. varying the learning rate between a lower and
upper bound. This approach can help escape
Meta-learning, or learning to learn, leverages local minima and saddle points, potentially
past experiences to accelerate the leading to better solutions. Another useful
hyperparameter tuning process. It uses technique is learning rate warm-up, where the
knowledge from previously optimized models to learning rate is gradually increased at the
inform the search for optimal hyperparameters in beginning of training. This method can stabilize
new tasks [28]. Multi-fidelity optimization training and prevent divergence, which is
techniques employ approximations of the especially useful when training large models or
objective function at different levels of fidelity to using large batch sizes [31].
speed up the hyperparameter tuning process. By
evaluating cheaper approximations first and 3. TRADE-OFFS TO CONSIDER WHEN
refining promising configurations with more PERFORMING HYPERPARAMETER
expensive evaluations, these methods
TUNING
significantly reduce computational costs [29].
Hyperparameter tuning involves several trade-
Automated Machine Learning (AutoML) aims to
offs that need to be considered. Balancing
automate the entire machine learning pipeline,
exploration and exploitation is crucial, as it
including hyperparameter tuning. AutoML
involves searching a wide range of
combines various optimization techniques, such
hyperparameters while also focusing on
as Bayesian optimization, meta-learning, and
promising regions. Techniques like Bayesian

392
Ilemobayo et al.; J. Eng. Res. Rep., vol. 26, no. 6, pp. 388-395, 2024; Article no.JERR.118312

optimization and Hyperband are designed to hyperparameter tuning is critical, especially

balance these two aspects effectively [19]. in environments with limited resources.
Another important trade-off is between Developing adaptive resource allocation
computational cost and performance gain. strategies that balance exploration and
Evaluating hyperparameters can be exploitation can enhance the efficiency of
computationally expensive, so it's essential to the tuning process [27].
consider the trade-off between the cost of tuning 5. Robustness: Ensuring the robustness of
and the potential performance gain. Efficient hyperparameter tuning methods against
methods like early stopping and multi-fidelity noisy evaluations and varying data
optimization can help mitigate these costs. distributions is essential for reliable model
performance. Robust optimization
Ensuring that the selected hyperparameters techniques that account for these
generalize well to unseen data is vital, uncertainties are necessary [35,36].
addressing the generalization versus overfitting
trade-off. Techniques like cross-validation and 4. CONCLUSION
regularization are effective in mitigating the risk
of overfitting during hyperparameter tuning. Hyperparameter tuning is a critical aspect of
Additionally, there is a trade-off between machine learning that significantly impacts model
complexity and interpretability. More complex performance. Proper tuning can lead to
models and tuning strategies can yield better substantial improvements in accuracy, efficiency,
performance but may be harder to interpret. and generalization. Various techniques, from
Balancing model complexity with interpretability simple grid search to advanced Bayesian
is essential, especially in domains where model optimization and meta-learning, offer different
transparency is critical. trade-offs in terms of computational cost and
performance gain. As machine learning models
3.1 Challenges and Future Directions and datasets continue to grow in complexity and
size, the development of efficient, scalable, and
Despite the advances in hyperparameter tuning, robust hyperparameter tuning methods remains
several challenges remain: an important area of research. Future
advancements promise to further enhance the
1. Scalability: As machine learning models performance and applicability of machine
become more complex and datasets grow learning across diverse domains.
larger, the scalability of hyperparameter
tuning methods is a significant concern. COMPETING INTERESTS
Developing scalable optimization
techniques that can handle large Authors have declared that no competing
hyperparameter spaces and massive interests exist.
datasets is crucial [32].
2. Interpretability: The interpretability of REFERENCES
hyperparameter tuning processes and their
outcomes is essential for understanding 1. Esteva A, Kuprel B, Novoa RA, Ko J,
model behavior and improving trust in Swetter SM, Blau HM, Thrun S.
machine learning systems. Methods that Dermatologist-level classification of skin
provide insights into the impact of cancer with deep neural networks. Nature.
hyperparameters on model performance 2017; 542(7639):115-118.
are needed [33]. 2. Heaton J, Polson NG, Witte JH. Deep
3. Integration with Neural Architecture learning in finance. arXiv preprint arXiv.
Search: Neural architecture search (NAS) 2017;1602:06561.
involves automatically designing neural 3. Meshram V, Patil K, Meshram V, Hanchate
network architectures. Integrating D, Ramkteke SD. Machine learning in
hyperparameter tuning with NAS can lead agriculture domain: A state-of-art survey.
to more efficient and effective model Artificial Intelligence in the Life Sciences.
development, but this integration poses 2021;1:100010.
significant computational and Available:https://doi.org/10.1016/j.ailsci.20
methodological challenges [34]. 21.100010
4. Resource Allocation: Efficiently allocating 4. Bojarski M, et al. End to end learning for
computational resources during self-driving cars [Preprint]; 2016.

393
Ilemobayo et al.; J. Eng. Res. Rep., vol. 26, no. 6, pp. 388-395, 2024; Article no.JERR.118312

Available:https://doi.org/10.48550/arXiv.16 16. Hutter F, Hoos HH, Leyton-Brown K.

04.07316 Sequential model-based optimization for
5. Libbrecht MW, Noble WS. Machine general algorithm configuration. In
learning applications in genetics and International Conference on Learning and
Genomics’, Nature Reviews Genetics. Intelligent Optimization (pp. 507-523).
2015;16(6):321–332. Springer, Berlin, Heidelberg; 2011.
DOI: 10.1038/nrg3920 17. Yang M, Lim MK, Qu Y, Li X, Ni D. Deep
6. Dean J, et al. Large scale distributed deep neural networks with L1 and L2
networks, advances in neural information regularization for high dimensional
processing systems; 2012. corporate credit risk prediction. Expert
Available:https://papers.nips.cc/paper_files Systems with Applications. 2023;213:
/paper/2012/hash/6aca97005c68f1206823 118873.
815f66102863-Abstract.html (Accessed: Available:https://doi.org/10.1016/j.eswa.20
May 2024) 22.118873
7. Rudin C. Stop explaining black box 18. Bates S, Hastie T, Tibshirani R. Cross-
machine learning models for high stakes validation: What does it estimate and how
decisions and use interpretable models well does it do it? Journal of the American
instead’, Nature Machine Intelligence. Statistical Association. 2023;1–12.
2019;1(5):206–215.
Available:https://doi.org/10.1080/01621459
DOI: 10.1038/s42256-019-0048-x
.2023.2197686
8. Goodfellow I, Bengio Y, Courville A. Deep
learning. Cambridge, MA: The MIT Press; 19. Snoek J, Larochelle H, Adams RP.
2017. Practical bayesian optimization of machine
9. Gomez-Uribe CA, Hunt N. The netflix learning algorithms. In Advances in neural
recommender system, ACM Transactions information processing systems.
on Management Information Systems. 2012;2951-2959.
2015;6(4):1–19. 20. NP, Sugave S. Optimizing machine
DOI: 10.1145/2843948 learning models: An adaptive
10. Vamathevan J, et al. Applications of hyperparameter tuning approach.
machine learning in drug discovery and International Journal of Intelligent Systems
development’, Nature Reviews Drug and Applications in Engineering.
Discovery. 2019;18(6):463–477. 2023;11(4):344–354.
DOI: 10.1038/s41573-019-0024-5 Available:https://ijisae.org/index.php/IJISA
11. Kotsiantis SB, Zaharakis I, Pintelas P. E/article/view/3532
Supervised machine learning: A review of 21. Hsu CW, Chang C-C, Lin C-J. A practical
classification techniques. Emerging guide to support vector classification;
Artificial Intelligence Applications in 2016.
Computer Engineering. 2007;160(1):3-24. 22. Breiman L. Machine learning. 2001;
12. Shalev-Shwartz S, Ben-David S. 45(1):5–32.
Understanding machine learning: From
DOI: 10.1023/a:1010933404324
theory to algorithms. Cambridge University
Press; 2014. 23. N P, Sugave S. Ensemble approach with
13. Bengio Y. Practical recommendations for hyperparameter tuning for credit
gradient-based training of deep worthiness prediction. 2022 IEEE 3rd
architectures. In Neural networks: Tricks of Global Conference for Advancement in
the trade (pp. 437-478). Springer, Berlin, Technology (GCAT); 2022.
Heidelberg; 2012. Available:https://doi.org/10.1109/gcat5536
14. Hastie T, Tibshirani R, Friedman J. The 7.2022.9971879
elements of statistical learning: Data 24. Shetty P, Udhayakumar R, Patil A, Manwal
mining, inference, and prediction. Springer M, Vadar PS, NP. Application of natural
Science & Business Media; 2009. language processing (NLP) in machine
15. Salehin I, Kang D-K. A review on dropout learning. 2023 3rd International
regularization approaches for deep neural Conference on Advancement in
networks within the scholarly domain. Electronics & Communication
Electronics. 2023;12(14):3106. Engineering (AECE); 2023.
Available:https://doi.org/10.3390/electronic Available:https://doi.org/10.1109/aece5961
s12143106 4.2023.10428345’

394
Ilemobayo et al.; J. Eng. Res. Rep., vol. 26, no. 6, pp. 388-395, 2024; Article no.JERR.118312

25. Bergstra J, Bengio Y. Random search for 31. Feurer M, Klein A, Eggensperger K,
hyper-parameter optimization. Journal of Springenberg JT, Blum M, Hutter F.
Machine Learning Research. 2012;13(2). Efficient and robust automated
26. Santos EC, Monteiro GL, Moura-Pires F. A machine learning. In Advances in neural
genetic algorithm for neural network information processing systems. 2015;
hyperparameter optimization. In 2962-2970.
Proceedings of the International 32. Goyal P. et al. Accurate, large minibatch
Conference on Artificial Neural Networks. SGD: Training ImageNet in 1 Hour;
2010;217-225. 2018.
27. Prechelt L. Early stopping-but when? In 33. Molnar C, Casalicchio G, Bischl B.
Neural Networks: Tricks of the trade (pp. Interpretable machine learning – A brief
55-69). Springer, Berlin, Heidelberg; 1998. history, state-of-the-art and challenges.
28. Li L, Jamieson KG, DeSalvo G, ECML PKDD 2020 Workshops. 2020;417–
Rostamizadeh A, Talwalkar A. Hyperband: 431.
A novel bandit-based approach to Available:https://doi.org/10.1007/978-3-
hyperparameter optimization. Journal of 030-65965-3_28
Machine Learning Research. 34. Elsken T, Metzen JH, Hutter F. Neural
2017;18(1):6765-6816. architecture search: A survey. Journal of
29. Andrychowicz M, Denil M, Gomez S, Machine Learning Research. 2019;20(55):
Hoffman MW, Pfau D, Schaul T, de Freitas 1-21.
N. Learning to learn by gradient descent by 35. Henderson P, Islam R, Bachman P,
gradient descent. In Advances in neural Pineau J, Precup D, Meger D. Deep
information processing systems. 2016; reinforcement learning that matters. In
3981-3989. Proceedings of the AAAI Conference on
30. Kandasamy K, Dasarathy G, Oliva JB, Artificial Intelligence. 2018;32(1).
Schneider J, Póczos B. Multi-fidelity 36. Friedman JH. Greedy function
bayesian optimisation with continuous approximation: A gradient boosting
approximations. In Proceedings of the 34th machine.’, The Annals of Statistics.
International Conference on Machine 2001;29(5).
Learning. 2016;48:1799-1808. DOI: 10.1214/aos/1013203451

© Copyright (2024): Author(s). The licensee is the journal publisher. This is an Open Access article distributed under the terms
of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly cited.

Peer-review history:
The peer review history for this paper can be accessed here:
https://www.sdiarticle5.com/review-history/118312

395