MTH 522 – 11/08/2023
Decision trees are supervised machine learning models used for regression and classification tasks. Their capacity to handle both numerical and categorical data, together with their interpretability and ease of visualization, makes them useful for predictive modeling and decision-making. Decision trees are built by recursive partitioning, a process in which the dataset is split successively according to particular features or criteria.
At each internal node, the algorithm finds the best feature in the dataset and splits the data on it. The goal is to identify the feature that minimizes error or impurity and best divides the data into homogeneous groups. The splitting criterion determines the best way to divide the data according to the selected feature. Gini impurity and entropy are common criteria for classification tasks, while mean squared error (MSE) is frequently used for regression. The cycle of feature selection, splitting, and child node formation repeats until a stopping condition is met. This condition could be a maximum tree depth, a minimum number of samples required in a node, or a maximum number of leaf nodes.
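To make the splitting criteria and the stopping conditions concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not how any particular library builds trees: the function names (gini, entropy, mse, best_split, build_tree, predict), the parameter names max_depth and min_samples_split, and the tiny dataset are all made up for this example. It handles numerical features only, tries every observed value as a threshold, and leaves out the maximum-leaf-nodes rule.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits over the class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mse(values):
    """Mean squared error around the mean (a regression criterion)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def partition(rows, labels, feature, threshold):
    """Split rows and labels into the <= threshold side and the > threshold side."""
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] > threshold]
    return left, right


def best_split(rows, labels, criterion=gini):
    """Return (feature, threshold, weighted impurity) of the best split, or None."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({r[feature] for r in rows}):
            left, right = partition(rows, labels, feature, threshold)
            if not left or not right:
                continue  # a split must send samples to both children
            score = sum(
                len(side) * criterion([y for _, y in side]) for side in (left, right)
            ) / len(rows)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3, min_samples_split=2):
    """Recursively partition the data until a stopping condition is met."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping conditions (assumed for this sketch): depth limit, too few
    # samples to split, or a node that is already pure.
    if depth >= max_depth or len(rows) < min_samples_split or len(set(labels)) == 1:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:  # no threshold separates the samples
        return {"leaf": majority}
    feature, threshold, _ = split
    children = []
    for side in partition(rows, labels, feature, threshold):
        side_rows = [r for r, _ in side]
        side_labels = [y for _, y in side]
        children.append(
            build_tree(side_rows, side_labels, depth + 1, max_depth, min_samples_split)
        )
    return {"feature": feature, "threshold": threshold,
            "left": children[0], "right": children[1]}


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Toy data, invented for illustration: one numerical feature, two classes.
rows = [[1.0], [2.0], [3.0], [4.0]]
labels = ["a", "a", "b", "b"]
tree = build_tree(rows, labels)
print(predict(tree, [1.5]))  # a
print(predict(tree, [3.5]))  # b
```

On this toy data the only split with zero weighted Gini impurity is at threshold 2.0, so the tree stops after one split because both children are pure. Passing criterion=entropy to best_split swaps the classification criterion, and a regression version would use mse on numeric targets and predict the mean of a leaf instead of the majority label.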