Original Article
19(4); 14-24
doi: 10.25259/OJS_8970

Comparison of statistical and machine learning methods for survival prediction of diabetic foot ulcers

Department of Community Medicine, Dow International Medical College, Dow University of Health Sciences, Karachi, Pakistan.
Department of Statistics, University of Karachi, Karachi, Pakistan.

*Corresponding author: Syed Muhammad Adnan, Department of Community Medicine, Dow International Medical College, Dow University of Health Sciences, Karachi, Pakistan. syed.adnan@duhs.edu.pk

Licence
This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-Share Alike 4.0 License, which allows others to remix, transform, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms.

How to cite this article: Adnan SM, Fatima S. Comparison of statistical and machine learning methods for survival prediction of diabetic foot ulcers. Int J Health Sci (Qassim). 2025;19:14-24. doi: 10.25259/OJS_8970

Abstract

Objectives:

This prospective cohort study aims to provide the first survival prediction for diabetic foot ulcers (DFUs) in patients with type 2 diabetes mellitus using machine learning (ML) and statistical prediction methods.

Methods:

A total of 319 DFU patients who visited the Diabetic Foot Care Clinic at Dow University Hospital in Karachi, Pakistan, were selected for the study. The selected participants were followed from October 2024 to January 2025 to monitor the occurrence of diabetic foot amputations (DFAs). An ML method (artificial neural network) and a statistical method (survival analysis) were employed to predict the likelihood of DFA occurring within 5 years of a DFU diagnosis. The accuracy of both models was compared using the receiver operating characteristic (ROC) curve and performance matrix.

Results:

The data of 319 DFU patients, consisting of 220 (68.96%) males and 99 (31.04%) females, were analyzed. During the follow-up period, 31 (9.72%) patients experienced DFAs, while 288 (90.28%) remained unaffected by foot amputation. Gender, age, body mass index, systolic blood pressure, total cholesterol, high-density lipoprotein, low-density lipoprotein, triglyceride, hemoglobin A1c, creatinine, neuropathy, nephropathy, retinopathy, and cardiovascular disease were statistically significant at the 5% level of significance. The accuracy of the ML model was 0.9417 on the training data and 0.9271 on the testing data. The analysis shows that the area under the curve (AUC) of the ML model (AUC = 0.850) was higher than that of the statistical model (AUC = 0.782).

Conclusion:

The ML model provides better performance for predicting the DFA as compared to the traditional statistical model of survival analysis.

Keywords

Artificial neural network
Diabetic foot amputations
Diabetic foot ulcers
Dow University Hospital
Machine learning

INTRODUCTION

Diabetes mellitus (DM) is a disease that occurs when the body does not produce enough insulin or does not use it properly. The prevalence of diabetes and its complications is currently increasing rapidly worldwide.[1] Global figures show a rapidly increasing burden of diabetes on individuals, families, and countries. The International Diabetes Federation (IDF) Diabetes Atlas (2021) reveals that around 10.5% of the adult population (20-79 years) is affected by diabetes; around half of these individuals are unaware of their condition. In Pakistan, from 2011 to 2021, the estimated prevalence of diabetes increased to 30.8%. Urbanization, older age, physical inactivity, and overweight are key risk factors for diabetes mellitus type 2 (T2DM).[2] Diabetic foot ulcer (DFU) is a significant complication of diabetes that leads to infection, foot amputation, and mortality. The growing number of diabetes cases with DFUs is directly related to increases in foot amputations and deaths. However, the complex pathogenesis of DFUs plays a major role at different stages of the disease. The prevalence of DFUs has been rising rapidly worldwide. In addition, the medical costs for diabetic patients and their families rise when they are admitted to the hospital.[3] Annually, approximately 18.6 million adults are diagnosed with DFU; among them, approximately 80% experience lower extremity amputation, which significantly elevates the risk of mortality.[4] In Pakistan, the pooled prevalence of DFUs increased from 12.16% to 19.54% between 2011 and 2022 due to physical inactivity, obesity, and other environmental factors.[5] The lifetime likelihood of developing DFUs associated with DM lies between 15% and 25%.
These patients usually need DFAs after 4 years of the initial diagnosis of DFU.[6] The probability of foot amputation is higher in diabetes patients compared to non-diabetic individuals. Globally, every 30 seconds, a limb amputation occurs, which is mostly associated with lower limb peripheral artery disease (PAD). The DFA not only influences the quality of life (QoL) of diabetic patients but also increases the rate of mortality. According to estimates, the rate of mortality in DFU patients is almost double compared to patients diagnosed with diabetes without DFUs. The survival rate has decreased to 56% in DFU patients compared to those without DFU.[7] DFU patients are further affected by various types of comorbidities, such as PAD, cardiovascular disease, nephropathy, neuropathy, and retinopathy. DFU not only influences the QoL but also rapidly increases the costs of medication due to continuous treatment. Hence, prolonged cases of DFUs can result in minor or major foot amputations and further increase the risk of mortality.[8]

In recent years, with the rapid development of technology, artificial intelligence (AI) and machine learning (ML) models have gained significant popularity. These techniques are being applied across various disciplines, particularly in the field of medicine, to solve classification and control problems. Several studies have compared the accuracy of statistical models with artificial neural network (ANN) models.[9,10] Although many studies have focused on the outcomes of DFUs,[11-15] studies on predicting the survival of DFUs by comparing the accuracy of statistical and ML methods remain limited. ML models concentrate on prediction, while statistical models not only predict but also provide insight into the relationships between the variables.[16] It is important to assess the performance of well-designed statistical models alongside ML in the context of their predictive accuracy for clinical outcomes, such as survival of patients and response to treatment.[17]

The core objective of the study is to examine the causes of DFAs and predict survival rate using a statistical model for the selected data. The predictive performance of the survival rate of DFA is also compared with ML models.

MATERIALS AND METHODS

Study design

An observational study was conducted at Dow University Hospital, Karachi, Pakistan. A total of 319 T2DM patients with DFUs were involved in this study from October 2024 to January 2025. A neurothesiometer device and Wagner’s classification of DFU were used to diagnose and classify DFUs, respectively. The inclusion criterion was T2DM associated with DFUs. Patients with DFA, pregnant women, and patients with any other type of endocrine disorder were excluded from the study. A comprehensive database structure was built jointly by a trained diabetic foot head nurse and expert, trained physicians.

Data acquisition

The study included several continuous variables: age, hemoglobin A1c (HbA1c) (%), creatinine (Cr) (mg/dL), body mass index (BMI) (kg/m²), high-density lipoprotein (HDL) (mg/dL), low-density lipoprotein (LDL) (mg/dL), total cholesterol (TC) (mg/dL), triglyceride (TG) (mg/dL), and blood pressure (BP) (mmHg).

The categorical variables included gender, obesity, presence of neuropathy, nephropathy, and retinopathy, and classification of DFUs and DFAs.

Outcome Definition

The outcome variable was a binary variable, i.e., prediction of DFAs after 5 years of suffering from DFUs.

Preprocessing

Continuous variables were normalized to the range from 0 to 1. Records with missing values (missing laboratory investigations) were removed from the data set, which reduced it from 321 to 319 patients.
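The two preprocessing steps can be illustrated with a minimal base R sketch. It assumes min-max scaling as the normalization method, which the text implies but does not name, and it uses hypothetical toy records; the column names and values are illustrative and are not taken from the study data.

```r
# Hypothetical toy records; column names and values are illustrative only.
patients <- data.frame(
  Age   = c(52, 61, 70, NA, 48),
  HbA1c = c(7.9, 11.2, 9.4, 8.8, 6.5)
)

# Drop every record that has at least one missing value (complete-case analysis).
complete <- patients[complete.cases(patients), ]

# Min-max scaling (assumed method): maps the smallest value to 0 and the largest to 1.
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

scaled <- as.data.frame(lapply(complete, min_max))
print(scaled)
```

In a train/test workflow, the minimum and maximum would usually be taken from the training set only and reused for the test set, so that no information leaks from the test data.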

Statistical model

Survival analysis

Survival analysis is a statistical method used to examine data where the outcome of interest is the time until an event occurs. It often involves dealing with censored data and provides insights into the probability of survival or failure over time. Survival analysis focuses on data that measure the duration from a starting point to the occurrence of a specific event (e.g., death, failure, or relapse). Common applications include studying survival rates in cancer patients, evaluating treatment effectiveness, and analyzing disease progression. A key characteristic of survival data is that not all subjects experience the event of interest within the study period, which results in censored data, where the exact time of the event is unknown for some subjects. The survival function, S(t), is defined as:

S(t) = P(T > t) (1)

where S(t) gives the probability that an individual will survive (or the event will not occur) beyond time t, “T” is a random variable representing the time until the event of interest, and “P” represents the probability of an event occurring.[18] A hazard ratio (HR) compares the instantaneous risk of an event occurring in one group versus another, and is often used in survival analysis to assess the effect of an intervention or risk factor over time “t.” The HR can be calculated as:

HR = Hazard Rate in Treatment Group/Hazard Rate in Control Group

  • An HR of 1 indicates no difference in the hazard rate between the two groups

  • An HR >1 suggests a higher hazard rate (and therefore, a higher risk) in the group in the numerator compared to the group in the denominator.

  • An HR <1 suggests a lower hazard rate (and therefore, a lower risk) in the group in the numerator compared to the group in the denominator.[19]

The Cox proportional hazards model is a common statistical method used to estimate HRs and assess the relationship between a predictor variable and the time of an event. The Cox regression model can be written as:

h(t|x) = h0(t) × exp(β1x1 + β2x2 + … + βpxp) (2)

where h(t|x) is the hazard rate at the time “t” for an individual with covariates “x”, “h0(t)” is the baseline hazard rate, and “β” is the regression coefficient.[20]
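As a worked step under the model in equation (2), the HR for a single covariate follows directly: holding all other covariates fixed, a one-unit increase in x_j multiplies the hazard by exp(β_j), because the baseline hazard and all other terms cancel. The index j is introduced here only for illustration.

```latex
\[
\mathrm{HR}_j
= \frac{h(t \mid x_1,\dots,x_j+1,\dots,x_p)}{h(t \mid x_1,\dots,x_j,\dots,x_p)}
= \frac{h_0(t)\,\exp\bigl(\beta_1 x_1+\dots+\beta_j (x_j+1)+\dots+\beta_p x_p\bigr)}
       {h_0(t)\,\exp\bigl(\beta_1 x_1+\dots+\beta_j x_j+\dots+\beta_p x_p\bigr)}
= \exp(\beta_j)
\]
```

The ratio does not depend on t, which is the proportional hazards assumption of the Cox model.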

A Kaplan-Meier curve, also known as a survival curve, is a graphical representation of the probability of surviving (or experiencing an event) over time, often used in medical research to analyze time-to-event data, particularly when dealing with censored data.[21]
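The Kaplan-Meier (product-limit) estimate can be sketched in base R as follows. The follow-up times and event indicators are hypothetical, and the variable names are illustrative; in practice the `survfit` function of the `survival` package would normally be used instead of a hand-written estimator.

```r
# Hypothetical follow-up data: time in months; event = 1 (amputation), 0 (censored).
followup <- c(6, 12, 12, 18, 24, 30, 36, 48)
event    <- c(1,  0,  1,  1,  0,  1,  0,  0)

# Product-limit estimate: at each distinct event time t_i,
#   S(t_i) = S(t_{i-1}) * (1 - d_i / n_i),
# where d_i is the number of events at t_i and n_i is the number of
# subjects still at risk just before t_i.
event_times <- sort(unique(followup[event == 1]))
n_at_risk   <- sapply(event_times, function(tt) sum(followup >= tt))
n_events    <- sapply(event_times, function(tt) sum(followup == tt & event == 1))
survival    <- cumprod(1 - n_events / n_at_risk)

km <- data.frame(time = event_times, at_risk = n_at_risk,
                 events = n_events, survival = survival)
print(km)
```

Censored subjects contribute to the at-risk count up to their censoring time but never trigger a drop in the curve, which is how the estimator uses incomplete follow-up.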

ML model

The use of AI systems for medical diagnosis, especially ANN and computer-aided diagnosis, is a rapidly developing field in medicine and is anticipated to become more prevalent in biomedical systems. ANN and other ML techniques are widely used by healthcare organizations to enhance care, support diagnosis and prediction, and lower costs. An ANN is a set of algorithms designed to recognize patterns in a way that mimics the human brain.[22]

An ANN can have one or more layers and is made up of processing units (nodes or neurons) coupled by a set of movable weights that permit signals to pass through the network both sequentially and in parallel. Three layers of neurons make up an ANN in general: Input (which receives information), hidden (which extracts patterns and handles the majority of internal processing), and output (which generates and displays the network’s final outputs).[23]

The architecture of the ANN model was based on three layers: The input layer consists of the existing patient data; the hidden layer uses the back-propagation method to optimize the weights (w1, w2, w3, …) of the input variables (x1, x2, x3, …) to increase the predictive power of the model; and the output layer gives the prediction of DFA, with a sigmoid activation function used in the model, i.e.,

g(x) = 1/(1 + e^(−x)) (3)

where g(x) is the sigmoid function of the input variable “x” and “e” is Euler’s number, the base of the natural logarithm.

The output of a sigmoid neuron with inputs x1, x2..., weights w1, w2,…, and bias “b” can be written as:

ƒw,b(x) = 1/(1 + e^(−(w·x + b))) (4)
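The sigmoid activation and the output of a single sigmoid neuron can be written as a short base R sketch. The input values, weights, and bias below are hypothetical and are not taken from the fitted model; the function names are illustrative.

```r
# Sigmoid activation, as in equation (3): g(z) = 1 / (1 + exp(-z)).
sigmoid <- function(z) 1 / (1 + exp(-z))

# Output of one sigmoid neuron, as in equation (4): f(x) = g(w . x + b).
neuron_output <- function(x, w, b) sigmoid(sum(w * x) + b)

# Hypothetical normalized inputs, weights, and bias (not from the fitted model).
x <- c(0.8, 0.6, 0.3)
w <- c(1.5, 2.0, -1.0)
b <- -1.0

print(sigmoid(0))              # midpoint of the curve
print(neuron_output(x, w, b))  # a value strictly between 0 and 1
```

Because the output always lies between 0 and 1, it can be read as a predicted probability of the positive class, which is why the sigmoid suits a binary outcome such as DFA.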

The ANN model was constructed using R software; the caret, neuralnet, tidyverse, and ROCR libraries were used for data classification and to build the performance evaluation matrix. The ANN model can be written as:

model <- neuralnet(Foot_Amputation ~ Age + HbA1c + HDL + Cr + BMI, data = training_data, hidden = 5, err.fct = "ce", linear.output = FALSE, lifesign = "full", rep = 2, algorithm = "rprop", stepmax = 100000) (5)

The ML model was developed to predict whether DFU patients will progress to DFA within 5 years after diagnosis of DFU and to determine the performance of the ML model using the receiver operating characteristic (ROC) curve, confusion matrix, and misclassification error.

Variable importance

In R software, ML models can be implemented, and the importance of each predictor can be identified using the varImp function, as shown in Figure 1. The performance of the model is then assessed through an ROC curve.

Figure 1: Variable importance for 5-year model prediction. LDL: Low-density lipoprotein, CVD: Cardiovascular disease, HDL: High-density lipoprotein, BMI: Body mass index, HbA1c: Hemoglobin A1c.

RESULTS

The data consist of 319 DFU patients after removing missing values, of whom 220 (68.96%) were male and 99 (31.04%) were female. During the follow-up period, 31 (9.72%) patients were affected by DFAs, while 288 (90.28%) patients were unaffected by any minor or major foot amputation. The mean age of DFA patients (71.23 ± 1.21 years) was higher than that of patients without DFA (55.02 ± 1.98 years) (P = 0.03). Similarly, the HbA1c of DFA patients (11.65 ± 1.78) was higher than that of patients without DFA (8.16 ± 3.65) (P < 0.001). An independent sample t-test was used after verifying the normality assumption with the Shapiro-Wilk test of normality [Table 1].

Table 1: Characteristics of study population.
| Factors | DFA present (n=31) | DFA not present (n=288) | P-value of Shapiro-Wilk test of normality | P-value of independent sample t-test and Chi-squared test |
|---|---|---|---|---|
| Gender (Male: Female) | 26:5 | 194:94 | -- | 0.04 |
| Age (years) | 71.23±1.21 | 55.02±1.98 | 0.01 | 0.03 |
| BMI (kg/m²) | 29.32±2.12 | 24.10±1.32 | 0.04 | <0.001 |
| BP Systolic (mmHg) | 160.02±1.21 | 140.2±0.92 | <0.001 | 0.00 |
| BP Diastolic (mmHg) | 90.93±1.78 | 80.98±1.98 | 0.04 | 0.54 |
| TC (mg/dL) | 309.12±3.12 | 219.23±2.98 | 0.04 | 0.00 |
| HDL (mg/dL) | 32.78±0.32 | 50.65±2.33 | <0.001 | <0.001 |
| LDL (mg/dL) | 190.56±1.65 | 135.32±1.52 | 0.03 | <0.001 |
| TG (mg/dL) | 325.65±1.69 | 225.25±0.56 | 0.00 | 0.04 |
| HbA1c (%) | 11.65±1.78 | 8.16±3.65 | <0.001 | <0.001 |
| Cr (mg/dL) | 1.80±0.65 | 1.10±1.45 | 0.01 | 0.01 |
| Active smokers (Yes: No) | 23:8 | 40:248 | -- | <0.001 |
| Obesity (Yes: No) | 28:3 | 86:202 | -- | <0.001 |
| Neuropathy: Yes | 22 (70.97%) | 24 (8.33%) | -- | <0.001 |
| Neuropathy: No | 9 (29.03%) | 264 (91.67%) | | |
| Nephropathy: Yes | 18 (58.06%) | 12 (4.17%) | -- | <0.001 |
| Nephropathy: No | 13 (41.94%) | 276 (95.83%) | | |
| Retinopathy: Yes | 15 (48.39%) | 19 (68.05%) | -- | 0.05 |
| Retinopathy: No | 16 (51.61%) | 269 (31.94%) | | |
| CVD: Yes | 18 (58.06%) | 2 (0.69%) | -- | <0.001 |
| CVD: No | 13 (41.94%) | 286 (99.31%) | | |

Authors’ calculation: Statistics presented: mean±SD; n(%). P-values are statistically significant at the 5% level of significance. BMI: Body mass index, BP: Blood pressure, TC: Total cholesterol, HDL: High-density lipoprotein, LDL: Low-density lipoprotein, TG: Triglyceride, HbA1c: Hemoglobin A1c, Cr: Creatinine, CVD: Cardiovascular disease

The selected variables, namely gender, age, BMI, systolic BP, TC, HDL, LDL, TG, HbA1c, Cr, neuropathy, nephropathy, retinopathy, and cardiovascular disease (CVD), are statistically significant predictors, as their P-values are less than 0.05. The correlation matrix shows that no pairwise correlation between the continuous variables reaches 0.9 [Figure 2].

Figure 2: Pearson correlation matrix between the selected continuous variables. Shades and circles represent the strength and direction of correlations between variables. HDL: High-density lipoprotein, BMI: Body mass index, HbA1c: Hemoglobin A1c, Cr: Creatinine.

Figure 3 represents the Kaplan-Meier curves for DFAs and the probabilities of overall DFAs with respect to gender, smoking status, obesity, hypertension, neuropathy, nephropathy, and retinopathy. The overall amputation risk is found to be 9.72%, with a higher risk in males (12.3%) compared to females (7.1%).

Figure 3: Kaplan-Meier survival curves of diabetic foot amputation with associated risk factors. Blue line represents survival probabilities. Green line represents risk factors.

Hypertensive patients show a higher risk of DFAs (43.9%) compared to non-hypertensive patients (2.3%).

Smoking status is also significantly linked with the risk of DFAs, with the risk being significantly higher in active smokers (36.5%) compared to non-smokers (3.1%).

The probability of amputation is higher in nephropathy patients (12.1%) compared to those without nephropathy (6.3%).

The probability of amputation is also greater in patients with neuropathy (24.1%) than in those without neuropathy (4.3%).

The hazard of amputation is also higher in retinopathy patients (29.2%) than in those without retinopathy (4.0%).

Similarly, the risk of amputation is higher in obese cases (24.6%) than in non-obese patients (1.5%).

Table 2 represents the HR with a 95% confidence interval of the predictors of DFAs. All predictors are statistically significant except diastolic BP. The high HR suggests that age, BMI, HbA1c, and Cr are the most significant risk factors for DFAs.

Table 2: HR of factors associated with diabetic foot amputation.
| Factors | P-value (Wald Chi-square) | HR | P-value (HR) | 95% CI for HR |
|---|---|---|---|---|
| Age | 0.97 | 5.32 | <0.001 | 2.32-5.63 |
| BMI | 0.11 | 4.02 | <0.001 | 1.89-4.26 |
| BP Systolic | 0.66 | 1.98 | 0.03 | 1.01-2.03 |
| BP Diastolic | 0.79 | 1.28 | 0.53 | 0.75-1.30 |
| TC | 0.96 | 2.32 | 0.01 | 1.63-2.52 |
| HDL | 0.46 | 1.09 | 0.00 | 0.79-1.16 |
| LDL | 0.92 | 1.56 | 0.01 | 1.12-1.71 |
| TG | 0.78 | 1.98 | 0.04 | 1.03-2.06 |
| HbA1c | 0.46 | 8.56 | <0.001 | 4.23-9.02 |
| Cr | 0.99 | 5.69 | <0.001 | 2.65-5.97 |

Authors’ calculation: BMI: Body mass index, BP: Blood pressure, TC: Total cholesterol, HDL: High-density lipoprotein, LDL: Low-density lipoprotein, TG: Triglyceride, HbA1c: Hemoglobin A1c, Cr: Creatinine, HR: Hazard ratio, CI: Confidence interval. P-values are statistically significant at the 5% level of significance.

Figure 4 shows the cases of DFUs and DFAs seen at Dow University Hospital.

Figure 4: Patients with diabetic foot ulceration and amputation at Dow University Hospital, Karachi, Pakistan.

The ROC curve shows that the AUC of the statistical model is 0.782 [Figure 5].

Figure 5: Receiver operating characteristic (ROC) curve for the statistical model.
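For readers unfamiliar with the AUC, a minimal base R sketch of its rank-based (Mann-Whitney) computation is given below. The scores and outcomes are hypothetical, and `auc_rank` is an illustrative helper; the study itself reports using the ROCR library, which is not reproduced here.

```r
# Hypothetical predicted risks and observed outcomes (1 = amputation, 0 = none).
score   <- c(0.91, 0.80, 0.62, 0.55, 0.47, 0.33, 0.21, 0.10)
outcome <- c(1,    1,    0,    1,    0,    0,    1,    0)

# The AUC equals the probability that a randomly chosen positive case receives
# a higher score than a randomly chosen negative case (Mann-Whitney statistic).
auc_rank <- function(score, outcome) {
  r     <- rank(score)              # average ranks handle tied scores
  n_pos <- sum(outcome == 1)
  n_neg <- sum(outcome == 0)
  (sum(r[outcome == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

print(auc_rank(score, outcome))
```

An AUC of 0.5 corresponds to ranking cases no better than chance, and 1.0 to ranking every positive case above every negative case.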

The overall accuracy of the statistical model is 89.34% (95% confidence interval [CI] 0.8536-0.9213; P = 0.019) [Table 3].

Table 3: The performance matrix of the 5-year foot amputation prediction statistical model.
| Predicted values | Actual WDFA | Actual DFA |
|---|---|---|
| WDFA | 260 | 5 |
| DFA | 29 | 25 |

| Metric | Value |
|---|---|
| Accuracy (95% CI, P-value) | 0.8934 (0.8536-0.9213, 0.019) |
| Sensitivity (95% CI) | 0.8996 (0.8412-0.9213) |
| Specificity (95% CI) | 0.8333 (0.8089-0.8512) |
| PPV (95% CI) | 0.9811 (0.9315-0.9985) |
| PNV (95% CI) | 0.4630 (0.4256-0.4821) |
| F1 score (95% CI) | 0.9386 (0.9132-0.9578) |
| AUC | 0.782 |

Authors’ calculation: PPV: Predictive positive value, PNV: Predictive negative value, DFA: Diabetic foot amputation, WDFA: Without diabetic foot amputation, AUC: Area under the curve, CI: Confidence interval. P-values statistically significant at 5% level of significance
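The point estimates in Table 3 can be recomputed from its four cell counts with a short base R sketch. It assumes that rows are predicted values, columns are actual values, and WDFA is treated as the positive class; that convention is inferred here because it reproduces the reported sensitivity, specificity, PPV, and PNV. The confidence intervals are not recomputed.

```r
# Confusion matrix from Table 3 (assumed layout: rows = predicted, columns = actual).
# WDFA (without diabetic foot amputation) is taken as the positive class.
cm <- matrix(c(260, 29, 5, 25), nrow = 2,
             dimnames = list(predicted = c("WDFA", "DFA"),
                             actual    = c("WDFA", "DFA")))

tp <- cm["WDFA", "WDFA"]  # predicted WDFA, actually WDFA
fn <- cm["DFA",  "WDFA"]  # predicted DFA,  actually WDFA
fp <- cm["WDFA", "DFA"]   # predicted WDFA, actually DFA
tn <- cm["DFA",  "DFA"]   # predicted DFA,  actually DFA

metrics <- c(
  accuracy    = (tp + tn) / sum(cm),
  sensitivity = tp / (tp + fn),
  specificity = tn / (tn + fp),
  ppv         = tp / (tp + fp),
  npv         = tn / (tn + fn)
)
# F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
metrics["f1"] <- 2 * metrics["ppv"] * metrics["sensitivity"] /
  (metrics["ppv"] + metrics["sensitivity"])

print(round(metrics, 4))
```

The results agree with the values reported in Table 3 to within rounding in the fourth decimal place. The low negative predictive value reflects the class imbalance: only a small share of patients underwent amputation.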

ML approach

The dataset was divided into a training set of 223 (70%) patients and a test set of 96 (30%) patients [Table 4].
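A 70/30 split of this kind can be sketched in base R as follows. The sketch assumes a simple random split and a hypothetical seed, neither of which is reported in the source; only the resulting set sizes (223 and 96 of 319) are taken from the text.

```r
set.seed(123)  # hypothetical seed; the source does not report one

n_total <- 319
n_train <- floor(0.70 * n_total)  # 70% of the patients, rounded down

# Assumed simple random split: draw the training indices without replacement
# and assign every remaining index to the test set.
train_idx <- sample(seq_len(n_total), size = n_train)
test_idx  <- setdiff(seq_len(n_total), train_idx)

print(c(train = length(train_idx), test = length(test_idx)))
```

With an outcome as rare as DFA here, a stratified split that preserves the proportion of amputations in both sets is a common alternative.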

Table 4: The performance matrix of the 5-year foot amputation prediction ML model.
| Predicted values | Training set: actual WDFA | Training set: actual DFA | Test set: actual WDFA | Test set: actual DFA |
|---|---|---|---|---|
| WDFA | 196 | 5 | 81 | 3 |
| DFA | 8 | 14 | 4 | 8 |

| Metric | Training set | Test set |
|---|---|---|
| Accuracy (95% CI, P-value) | 0.9417 (0.8912-0.9685, <0.001) | 0.9271 (0.8639-0.9478, 0.026) |
| Sensitivity (95% CI) | 0.9608 (0.9056-0.9809) | 0.9529 (0.9056-0.9809) |
| Specificity (95% CI) | 0.7368 (0.7025-0.7621) | 0.7273 (0.6956-0.7545) |
| PPV (95% CI) | 0.9751 (0.9160-0.9912) | 0.9643 (0.9287-0.9789) |
| PNV (95% CI) | 0.6364 (0.6056-0.6512) | 0.6667 (0.6178-0.6765) |
| F1 score (95% CI) | 0.9679 (0.9396-0.9865) | 0.9586 (0.9256-0.9732) |
| AUC | 0.847 | 0.850 |

Authors’ calculation: ML: Machine learning, PPV: Predictive positive value, PNV: Predictive negative value, DFA: Diabetic foot amputation, WDFA: Without diabetic foot amputation, AUC: Area under the curve, CI: Confidence interval. P-values statistically significant at 5% level of significance

After selecting significant predictors from the statistical model, we developed the ML model to predict DFA. The ML model has a single hidden layer with five neurons. The black lines represent the network connections with weights (w1, w2, w3, …). The weights are calculated using the backpropagation algorithm. The blue lines show the bias (b) term (analogous to the constant in a regression equation) [Figures 6 and 7].

Figure 6: Training plot. HDL: High-density lipoprotein, BMI: Body mass index, HbA1c: Hemoglobin A1c, Cr: Creatinine.
Figure 7: Test plot. HDL: High-density lipoprotein, BMI: Body mass index, HbA1c: Hemoglobin A1c, Cr: Creatinine.

The ROC curve for the training and test sets is also visualized in Figure 8.

Figure 8: Receiver operating characteristic (ROC) curve for the training and test sets.

Comparison between statistical and ML methods

The overall results suggest that the ML model was a better predictive model than the statistical model, i.e., the AUC of the ML model (AUC = 0.850) is higher than that of the traditional statistical model (AUC = 0.782) [Tables 3 and 4]. The results were based on the statistically significant predictors age, HbA1c, HDL, Cr, and BMI. All predictors increased the chances of the development of DFA statistically and clinically. The statistical model requires certain assumptions to determine the relationship between the variables, while ML makes predictions by identifying patterns in the data without strong assumptions about the underlying relationship. ML methods have some limitations, such as data dependency, risk of bias, data complexity, and limited interpretability.

DISCUSSION

The present study proposes a survival prediction in foot ulceration patients associated with a long duration of diabetes. To the best of our knowledge from past studies, this is the first study in Pakistan to develop an ANN model to predict DFA in DFU cases and to compare the accuracy of outcomes with the survival statistical model. The results indicate that it is possible to predict with high accuracy whether DFU patients will progress to foot amputation within 5 years after being diagnosed with DFU, using non-invasive and low-cost clinical and biological predictors.

Initially, non-invasive biological predictors related to the prediction model of 5-year outcomes of DFA, such as age, gender, HbA1c, HDL, LDL, TG, Cr, BMI, BP (systolic and diastolic), smoking status, classification of obesity, neuropathy, nephropathy, retinopathy, and CVD, were selected. These predictors were also used in the scientific literature regarding foot amputation and mortality after diagnosis of foot ulceration in diabetes patients.[24-27] In addition, the current study investigated the intra-correlation between the continuous predictors before implementing the AI/ML approach. No strong intra-correlation (r > 0.9) was found between the continuous variables.[7] The Cox regression HRs identify the significant predictors empirically. The model suggests that, among the continuous variables, age, BMI, HbA1c, and Cr are stronger predictors than the other indicators. Furthermore, Kaplan-Meier survival curves were developed to assess overall survival probabilities as well as survival probabilities by gender, hypertension, smoking status, obesity, neuropathy, nephropathy, and retinopathy. All these factors contributed to DFUs leading to foot amputation and an increased risk of mortality. The accuracy of the 5-year prediction model was 0.9417 for the training set and 0.9271 for the test set, while the accuracy of the survival statistical model was 0.8934 (AUC = 0.782), which is lower than the training and test set accuracy of the AI model. The AUC, sensitivity, specificity, predictive positive value, predictive negative value, and F1 score were all found to be satisfactory.[7]

The present study suggests that older age, male gender, obesity, systolic BP, abnormal lipid profiles, poor glycemic control, abnormal Cr levels, active smoking, and the presence of neuropathy, nephropathy, retinopathy, and CVD are statistically significant predictors of DFA. Similar literature also concluded that the same predictors increase the likelihood of foot amputation and mortality in diabetic patients.[8] The findings of the current study provide multiple strategies for the management of DFU. These strategies can bring about positive changes in the treatment of DFUs and improve overall health.[28] A study based on 1,428 diabetic patients found that DFUs are a major cause of comorbidities and mortality in diabetic patients. A diagnosis of DFU significantly increases the absolute risk of foot amputation, with mortality rising from 107.7/1,000 person-years to 33.7/1,000 person-years.[29] CVD and DFUs have a positive relationship, with prevalence increasing to 60% of cases.[30] A recent meta-analysis reported a 26% pooled prevalence of CVD among patients with DFUs.[31] Furthermore, CVDs increase the chances of death 2.22-fold in patients diagnosed with DFU (relative risk = 2.22, 95% CI 1.09-4.53). CVDs increase the chances of foot amputation and mortality in DFU patients compared to those without DFU.[32] The DFA and mortality related to CVDs are 40% higher in DFU patients at 5 years.[33] A study conducted with 655 DFU patients revealed that proper care of foot ulceration decreased the chances of CVD-related deaths from 26.8% to 48.0%.[15] A previous study based on survival analysis reported a 5-year survival rate of 49.7% (95% CI 44.8-54.6%) following DFU infection.[34] Another study also suggested that DFU increases the chances of DFA and mortality in diabetes patients, which is similar to the findings of the current study.[27]

The comparison between survival analysis and ML methods has become increasingly important in the field of medical and healthcare applications.[35] Data collected through cohort studies are often censored, heterogeneous, or missing, posing significant challenges for traditional statistical methods used in survival analysis. While the Cox proportional hazards model is specifically designed to handle censored data, it is not well-suited for analyzing large-scale or high-dimensional datasets. In contrast, ML models have shown greater predictive accuracy in time-to-event data and offer a robust alternative for handling censored, large-scale, and heterogeneous datasets.[36] Several studies have demonstrated the effectiveness of ANNs and other ML approaches in survival analysis within health sciences.[37-39]

In this paper, the ML approach (such as an ANN) plays an important role in predicting DFAs, more so than the traditional statistical model. The ML model constructs a complex, non-linear prediction model, providing a more reliable technique to predict survival time than traditional statistical methods. Moreover, randomization is a key tool in the ML approach that controls the effect of confounders. At present, the ML model is based on randomization, and during the learning period, the randomization technique controls the confounding effects.[40]

A previous study using the ML approach identified age, gender, BMI, duration of diabetes, smoking habits, and diabetes complications (retinopathy and nephropathy) as key predictors for DFAs. The overall accuracy of that model was 96%, and its strong predictors are similar to those of the current study’s 5-year prediction model for DFAs.[41] In medical applications such as medical imaging, ML models play an important role in the early detection of DFUs and reduce the risk of DFAs and mortality in DFU patients.[42,43]

In the present study, the accuracy of the statistical survival model is compared with that of the neural network ML model. The accuracy of the statistical model is lower than that of the neural network ML model. The ML model provides a more flexible approach to predicting survival time compared to the statistical model. Moreover, the ML approach is better at controlling confounding variables as compared to the statistical model.

Strengths, limitations, and future direction

This was the first prospective study in Pakistan to predict 5-year DFAs using survival analysis and ML methods. The accuracy of the ML method was higher than that of the traditional statistical method. The findings of the current study provide empirical evidence for predicting DFAs in patients diagnosed with DFUs. The study also offers future perspectives for following DFU patients and developing a 10-year ML model to predict DFAs and mortality in DFU patients.

This was a single-center study conducted at Dow University Hospital, Karachi, Pakistan. Therefore, selection bias may have occurred. In the future, the study will be conducted in collaboration with various major hospitals in Karachi. R programming software played a significant role in the development of the ML model and in addressing the unequal distribution of results. In R software, the Synthetic Minority Over-Sampling Technique (SMOTE) can be used to control the effect of unequal distribution of the results.

One of the major limitations of this study was the small sample size, as there were no external funds available for the study. Increasing the sample size would make the results more reliable and improve the accuracy of the prediction model. The ML model heavily depends on the quality and quantity of the input data. Unfortunately, the data for DFUs contained missing and inconsistent values, which introduced bias in the development of the ML model.

CONCLUSION

The complications of DFUs have been increasing rapidly worldwide, particularly in low- to middle-income countries such as Pakistan, India, and Bangladesh. The ML approach presents a significant opportunity for the early detection of DFUs and the prediction of DFAs.

A well-trained ML model can assist clinicians and surgeons in making precise diagnostic and therapeutic recommendations without requiring extensive datasets. Moreover, ML-based solutions can provide valuable support to medical specialists, particularly in resource-limited settings, such as underdeveloped countries.

The current findings suggest that the ML model offers more accurate prediction than a traditional statistical model. It is possible to develop predictive models for 5-year foot amputation or mortality with high accuracy using noninvasive and cost-effective clinical and biological predictors.

This study identifies three key strategies for effective foot infection management: local foot examinations, identification of risk factors, and timely assessment of comorbidities. These strategies can be efficiently implemented within a multidisciplinary team framework.

Acknowledgments:

The authors extend special thanks to all diabetic patients who agreed to participate in the current study and to all others who helped complete the research.

Authors’ contributions:

SF: Conceived the idea and designed the project; SMA: Collected the data; SF and SMA: Performed the data analysis; SMA: Wrote the main manuscript text and prepared the tables and figures; SF: Reviewed and provided feedback on the manuscript. The current study was a part of the PhD research thesis of SMA, and SF supervised the overall research thesis. Both authors have read and approved the final manuscript.

Ethical approval:

The study was approved by the Dow University of Health Sciences Institutional Review Board (IRB-3672/DUHS/Approval/2024/335), dated 24th October 2024.

Declaration of patient consent:

Patient consent is not required as patients’ identities are not disclosed or compromised.

Financial support and sponsorship:

Nil.

Conflicts of interest:

There are no conflicts of interest.

Availability of data and material:

Data and supplementary materials will be available on request.

References

  1. Epidemiology of diabetes mellitus in Pakistan: A systematic review protocol. BMJ Open. 2024;14:e079513.
  2. Prevalence of gestational diabetes mellitus in Pakistan: A systematic review and meta-analysis. BMC Pregnancy Childbirth. 2024;24:108.
  3. Diabetic foot ulcers: Classification, risk factors and management. World J Diabetes. 2022;13:1049-65.
  4. Diabetic foot ulcers: A review. JAMA. 2023;330:62-75.
  5. The prevalence of foot ulcers in diabetic patients in Pakistan: A systematic review and meta-analysis. Front Public Health. 2022;10:1017201.
  6. The diabetic foot as a proxy for cardiovascular events and mortality review. Curr Atheroscler Rep. 2017;19:44.
  7. Survival prediction in diabetic foot ulcers: A machine learning approach. J Clin Med. 2023;12:5816.
  8. Patients with healed diabetic foot ulcer represent a cohort at highest risk for future fatal events. Sci Rep. 2019;9:10325.
  9. Comparative analysis of statistical and machine learning methods for predicting faulty modules. Appl Soft Comput. 2014;21:286-97.
  10. Predicting multimorbidity using Saudi health indicators (Sharik) nationwide data: Statistical and machine learning approach. Healthcare (Basel). 2023;11:2176.
  11. Association of diabetic foot ulcer and death in a population-based cohort from the United Kingdom. Diabet Med. 2016;33:1493-8.
  12. History of foot ulcer increases mortality among individuals with diabetes: Ten-year follow-up of the Nord-Trondelag health study, Norway. Diabetes Care. 2009;32:2193-9.
  13. Long-term prognosis of diabetic foot patients and their limbs: Amputation and death over the course of a decade. Diabetes Care. 2012;35:2021-7.
  14. Evaluation of the impact of chiropodist care in the secondary prevention of foot ulcerations in diabetic subjects. Diabetes Care. 2003;26:1691-5.
  15. Improved survival of diabetic foot ulcer patients 1995-2008: Possible impact of aggressive cardiovascular risk management. Diabetes Care. 2008;31:2143-7.
  16. Machine learning in clinical and epidemiological research: Isn't it time for biostatisticians to work on it? Epidemiol Biostat Public Health. 2019;16:e13245-1.
  17. Comparison of conventional statistical methods with machine learning in medicine: Diagnosis, drug development, and treatment. Medicina (Kaunas). 2020;56:455.
  18. Survival analysis part I: Basic concepts and first analyses. Br J Cancer. 2003;89:232-8.
  19. How to clearly and accurately report odds ratio and hazard ratio in diagnostic research studies? Korean J Radiol. 2022;23:777-84.
  20. Survival analysis-part 2: Cox proportional hazards model. Indian J Thorac Cardiovasc Surg. 2021;37:229-33.
  21. Methods to analyse time-to-event data: The Kaplan-Meier survival curve. Oxid Med Cell Longev. 2021;2021:2290120.
  22. Application of neural networks in the medical field. Appl Neural Netw Field. 2023;14:69-81.
  23. Application of artificial neural networks in detection and diagnosis of gastrointestinal and liver tumors. World J Clin Cases. 2020;8:3971-7.
  24. Predictors of mortality among patients with type 2 diabetes in Jordan. BMC Endocr Disord. 2021;21:200.
  25. Predicting 5-and 10-year mortality risk in older adults with diabetes. Diabetes Care. 2020;43:1724-31.
  26. Mortality in patients with diabetic foot ulcers: Causes, risk factors, and their association with evolution and severity of ulcer. J Clin Med. 2020;9:3009.
  27. Evaluating classification systems of diabetic foot ulcer severity: A 12-year retrospective study on factors impacting survival. Healthcare (Basel). 2023;11:2077.
  28. The social implications of covid-19 for nursing students. J High Educ Theory Pract. 2022;22:100-15.
  29. Diabetic foot disease and the risk of major clinical outcomes. Diabetes Res Clin Pract. 2023;202:110778.
  30. Diabetic peripheral neuropathy as a predictor of asymptomatic myocardial ischemia in type 2 diabetes mellitus: A cross-sectional study. Adv Ther. 2016;33:1840-7.
  31. Ischemic heart disease and its risk factors in patients with diabetic foot ulcers: A systematic review and meta-analysis. Diabetes Metab Syndr. 2022;16:102414.
  32. The Association of ulceration of the foot with cardiovascular and all-cause mortality in patients with diabetes: A meta-analysis. Diabetologia. 2012;55:2906-12.
  33. The impact of foot ulceration and amputation on mortality in diabetic patients. I: From ulceration to death, a systematic review. Int Wound J. 2016;13:892-903.
  34. Major amputation profoundly increases mortality in patients with diabetic foot infection. Front Surg. 2021;8:655902.
  35. Interpretable machine learning for time-to-event prediction in medicine and healthcare. Artif Intell Med. 2025;159:103026.
  36. A comparison of machine learning methods for survival analysis of high-dimensional clinical data for dementia prediction. Sci Rep. 2020;10:20410.
  37. Diagnosis of cognitive impairment using multiple data modalities [Doctoral Dissertation]. Australia: University of New South Wales; 2022. Available from:
  38. A wide and deep neural network for survival analysis from anatomical shape and tabular clinical data. In: Machine learning and knowledge discovery in databases: International workshops of ECML PKDD 201. Proceedings, Part I. Würzburg, Germany: Springer International Publishing. p. 453-64.
  39. Prediction of conversion to Alzheimer's disease using deep survival analysis of MRI images. Brain Commun. 2020;2:fcaa057.
  40. Diabetic foot ulcers risk prediction in patients with type 2 diabetes using classifier based on associations rule mining. Sci Rep. 2024;14:635.
  41. Artificial intelligence and machine learning in clinical medicine, 2023. N Engl J Med. 2023;388:1201-8.
  42. Artificial intelligence and machine learning in respiratory medicine. Expert Rev Respir Med. 2020;14:559-64.
  43. Machine learning and artificial intelligence in pharmaceutical research and development: A review. AAPS J. 2022;24:19.