0081/2025 - Dynamic Evaluation of a COVID-19 Death Prediction Model Using Extreme Gradient Boosting Predictive Model
Author:
• José Carlos Prado Junior - Prado Junior, J.C - <jcpradojr@gmail.com> ORCID: orcid.org/0000-0002-8438-0527
Co-author(s):
• Alexandre Evsukoff - Evsukoff, A - <evsukoff@gmail.com> ORCID: orcid.org/0000-0002-7828-0124
• Roberto de Andrade Medronho - Medronho, R.A - <medronho@medicina.ufrj.br> ORCID: https://orcid.org/0000-0003-4073-3930
Abstract:
Background: Coronavirus disease 2019 (COVID-19), which emerged in December 2019, has become a significant global public health issue. The COVID-19 pandemic has evolved dynamically with the emergence of new variants and increasing vaccination coverage. Given the high fatality rate of severe COVID-19, disease severity prediction models must incorporate these temporal variations. In this study, we aimed to develop a model to predict COVID-19 mortality in hospitalized patients.
Methods: An Extreme Gradient Boosting (XGBoost) model was used to predict COVID-19 mortality upon hospital admission based on laboratory test results, vaccination status, comorbidities, and clinical signs and symptoms at the time of admission. Clinical data from electronic medical records, vaccination databases, and severe acute respiratory syndrome notifications were used.
Results: The XGBoost model performed best, with an area under the curve (AUC) of 96.4% at epidemiological week 53 of 2020. The most significant variables for the model were body temperature, blood pressure, respiratory rate, heart rate, and urea, magnesium, sodium, and C-reactive protein levels.
Conclusions: Our study identified key clinical and laboratory variables for predicting COVID-19 mortality. Additionally, we demonstrated how the performance of the models varied throughout the pandemic.
Keywords:
Predictive modeling, COVID-19, Mortality, Machine learning, XGBoost algorithm
The COVID-19 pandemic caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) began in Wuhan, Hubei Province, China, in December 2019 [1], and quickly spread worldwide [2]. The World Health Organization (WHO) declared COVID-19 a public health emergency of international concern on January 30, 2020, and a pandemic on March 11, 2020 [2].
In Latin America, the first case was reported in Brazil on February 25, 2020 [3]. As of December 2022, Brazil had recorded 35,531,716 COVID-19 cases, resulting in 690,677 deaths, with a mortality rate of 324.93 per 100,000 inhabitants [4].
During the pandemic, scientists worldwide sought to understand the epidemiological, diagnostic, and prognostic aspects of this disease. In this regard, artificial intelligence (AI) methodologies and other technologies were widely used [5,6], including computational tools and models. These methods can estimate the number of infections and the severity of symptoms, potentially aiding future disease management and preventing significant losses of human life. The introduction of predictive models in health research using machine learning techniques is a rapidly growing field in the scientific community [7-10].
AI has been employed in developing computational models for the diagnosis of COVID-19, including screening [11,12], laboratory tests [13], genomic sequencing [14], and imaging [15–17]. Additionally, several studies have been published on COVID-19 prognosis [18–20].
Predicting unfavorable outcomes can be clinically useful in optimizing diagnostic and prognostic timelines, as well as in organizing health services to optimize hospital resources. Despite the global rollout of vaccines, new strains of the virus continue to emerge, albeit with lower lethality than that observed earlier in the pandemic. However, it remains crucial to identify prognostic factors and develop predictive models to estimate survival probabilities and better tailor treatments, particularly for patients in critical clinical condition.
Demographic and clinical characteristics have demonstrated predictive power [21,22]. Several studies have highlighted the significance of certain laboratory markers, such as lactate dehydrogenase (LDH), C-reactive protein (CRP), aspartate aminotransferase (AST), urea, creatinine, and D-dimer, in distinguishing between patients who survived and those who died from COVID-19 [23,24].
COVID-19 disease progression is dynamic. Static predictions based on single-time-point observations may not provide sufficient information on the dynamic behavior of individual patient risk over time [25].
Given the challenges posed by the COVID-19 pandemic, such as unfamiliarity with clinical and epidemiological aspects, the emergence of new variants, and the need for widespread vaccine deployment, predictive models must be adaptable to capture these evolving features over time.
In this study, we aimed to train a model to predict COVID-19 mortality in hospitalized patients. The findings of this study will aid in health resource planning and enable clinicians to predict COVID-19 severity based on hospital admission data.
Materials and Methods
Study design
A cross-sectional study was carried out using machine learning models to predict the mortality of patients admitted due to COVID-19 at the Ronaldo Gazolla Municipal Hospital in the Municipality of Rio de Janeiro during the pandemic period from August 6, 2020, to November 6, 2021. The models were based on socioeconomic and clinical variables from emergency admission data and from follow-up data recorded during hospitalization.
Data
Anonymized longitudinal clinical data from 10,384 hospital admissions that occurred between August 25, 2020, and November 6, 2021, at the Ronaldo Gazolla Municipal Hospital (HMRG) in Rio de Janeiro were used. The HMRG emerged as the country's largest COVID-19 treatment center during the pandemic, with 400 beds dedicated solely to treating COVID-19, comprising 160 ward beds and 240 intensive care unit (ICU) beds.
The dependent variable was the hospitalization outcome (death or recovery).
The independent variables included demographic characteristics, comorbidities, vaccination status, clinical data upon admission, progression of vital signs, and the evolution of test results.
Data manipulation was conducted using the R statistical software, version 4.2.1 (The R Foundation for Statistical Computing) [26], and included file import, database merging, data visualization, variable selection and encoding, and categorization of numerical variables [27]. The anonymized data and R code will be available for review.
Clinical Data
Clinical admission data, including initial signs and symptoms and disease severity, were collected upon hospital admission. Data were accessed on 13/12/2023 for research purposes.
Laboratory test results, including routine blood test results, inflammatory biomarker data, liver function test results, kidney function test results and coagulation test results, were added to the database. These data were collected upon admission and throughout hospitalization.
Vital signs at the time of hospital admission were extracted from electronic health records and included heart rate (normal range: 50-100 beats/minute), respiratory rate (normal range: 12-20 breaths/minute), diastolic blood pressure (optimal: <80 mmHg, normal: 80-84 mmHg, prehypertension: 85-89 mmHg, stage 1 hypertension: 90-99 mmHg, stage 2 hypertension: 100-109 mmHg, stage 3 hypertension: ≥110 mmHg), systolic blood pressure (optimal: <120 mmHg, normal: 120-129 mmHg, prehypertension: 130-139 mmHg, stage 1 hypertension: 140-159 mmHg, stage 2 hypertension: 160-179 mmHg, stage 3 hypertension: ≥180 mmHg), and body temperature (normal: <37.8 °C).
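The cut-offs above translate directly into a categorization rule. The following is a minimal Python sketch, not the authors' R code; all names are hypothetical. It maps blood pressure readings to the listed bands and flags heart rate, respiratory rate, and temperature as normal or altered, returning "missing" for nulls. Treating the range endpoints as normal is an assumption, and the highest band on both blood pressure scales is labeled stage 3 here.

```python
from bisect import bisect_right

# Band labels shared by both blood pressure scales (lowest to highest).
BP_LABELS = [
    "optimal", "normal", "prehypertension",
    "stage 1 hypertension", "stage 2 hypertension", "stage 3 hypertension",
]
# Lower bounds (mmHg) of every band after "optimal", taken from the text.
SYSTOLIC_CUTS = [120, 130, 140, 160, 180]
DIASTOLIC_CUTS = [80, 85, 90, 100, 110]

# Inclusive normal ranges: heart rate (beats/min), respiratory rate (breaths/min).
NORMAL_RANGES = {"heart_rate": (50, 100), "respiratory_rate": (12, 20)}


def bp_category(value, cuts):
    """Return the blood pressure band for a reading, or "missing" if null."""
    if value is None:
        return "missing"
    # bisect_right counts the cut-offs <= value, which is the band index.
    return BP_LABELS[bisect_right(cuts, value)]


def vital_status(name, value):
    """Flag a vital sign as "normal", "altered", or "missing"."""
    if value is None:
        return "missing"
    if name == "temperature":
        # Normal body temperature is below 37.8 degrees Celsius.
        return "normal" if value < 37.8 else "altered"
    low, high = NORMAL_RANGES[name]
    return "normal" if low <= value <= high else "altered"


print(bp_category(145, SYSTOLIC_CUTS))    # stage 1 hypertension
print(bp_category(86, DIASTOLIC_CUTS))    # prehypertension
print(vital_status("heart_rate", 112))    # altered
print(vital_status("temperature", None))  # missing
```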
Epidemiological Data
Record linkage was performed between this clinical database and the Severe Acute Respiratory Syndrome surveillance database (SIVEP-Gripe) of the Ministry of Health [28]. The demographic variables and comorbidities from the SIVEP-Gripe notification form were also collected. Data were accessed on 06/12/2023.
The demographic variables included age (<20 years, 20-39 years, 40-59 years, 60-79 years, 80 years or more), sex (male or female), education level (elementary, high school, secondary, higher education), and race/ethnicity (white, Asian, indigenous, brown, and black).
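The age banding can be sketched in the same way. This minimal Python example assumes ages in whole years and maps a missing age to the "unknown" category used for null values during preprocessing; the names are illustrative.

```python
from bisect import bisect_right

AGE_LABELS = ["<20", "20-39", "40-59", "60-79", "80+"]
AGE_CUTS = [20, 40, 60, 80]  # lower bounds of every band after "<20"


def age_group(age):
    """Map an age in whole years to the study's age band."""
    if age is None:
        return "unknown"
    return AGE_LABELS[bisect_right(AGE_CUTS, age)]


print([age_group(a) for a in (18, 20, 59, 60, 85, None)])
# ['<20', '20-39', '40-59', '60-79', '80+', 'unknown']
```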
The health conditions were categorized as yes/no and included pregnancy, heart disease, diabetes, Down syndrome, liver disease, asthma, neurological disease, respiratory disease, immune disease, kidney disease, and obesity.
Symptoms at the time of hospital admission were obtained from the SIVEP-Gripe notification form and included fever (yes/no), cough (yes/no), sore throat (yes/no), dyspnea (yes/no), diarrhea (yes/no), vomiting (yes/no), loss of taste (yes/no), loss of smell (yes/no), myalgia (yes/no), fatigue (yes/no), abdominal pain (yes/no), asthenia (yes/no), and headache (yes/no).
Vaccination Data
To link vaccination data, the database of the National Immunization Program Information System (SIPNI) of the Health Surveillance Secretariat of the Ministry of Health was used [29]. Data were accessed on 06/12/2023. Vaccination dates were cross-referenced with the outcome date (death or recovery) to classify patients according to whether they had received two or more vaccine doses (yes/no) or three or more vaccine doses (yes/no) more than 15 days before the date of care.
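The dose-counting rule can be sketched as follows. This Python example assumes that each patient's vaccination record is a list of dose dates and that "more than 15 days" means a strictly greater gap between the dose and the date of care (the "date of service"). `vaccination_flags` is a hypothetical helper and is not part of SIPNI or the authors' code.

```python
from datetime import date, timedelta


def vaccination_flags(dose_dates, care_date, min_gap_days=15):
    """Return (two_or_more, three_or_more), counting only doses given
    more than `min_gap_days` days before `care_date`."""
    cutoff = care_date - timedelta(days=min_gap_days)
    # A dose counts only if the gap to the date of care is strictly > 15 days.
    counted = sum(1 for d in dose_dates if d < cutoff)
    return counted >= 2, counted >= 3


doses = [date(2021, 3, 1), date(2021, 5, 20), date(2021, 10, 1)]
print(vaccination_flags(doses, date(2021, 10, 10)))  # (True, False)
```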
Statistical analysis
Descriptive statistics for all variables were obtained using the R statistical software version 4.2.1 (The R Foundation for Statistical Computing) [26]. For comparisons between patients who died and those who recovered, continuous variables were summarized as means (± standard deviation) and compared using t tests or Mann-Whitney U tests, depending on the data distribution. Categorical variables are presented as frequencies (n) and proportions (%) and were compared using the chi-square test. The "funModeling" package in R [30] was used to analyze missing values. Multivariate analysis was also conducted using logistic regression. A p value ≤ 0.05 was considered to indicate statistical significance.
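For 2×2 comparisons such as death versus recovery by the presence of a comorbidity, the chi-square statistic can be computed directly. The sketch below uses only the Python standard library and omits the continuity correction. R's `chisq.test()` applies Yates's correction to 2×2 tables by default, so its statistic will be slightly smaller. With one degree of freedom, the p value equals `erfc(sqrt(stat / 2))`.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = exposed/unexposed, columns = died/recovered."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = chi_square_2x2(20, 10, 10, 20)
print(round(stat, 3), round(p, 4))  # 6.667 0.0098
```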
Preprocessing
Preprocessing encompasses techniques for adding, removing, or transforming variables; the latter is also known as "feature engineering". This is a crucial stage that can directly affect model performance [31]. The "recipes" package in R [32] was used for preprocessing: the preprocessing steps were defined in a recipe, the recipe was estimated with the "prep()" function, and the transformations were then applied with the "bake()" function.
Normalization and Treatment of Asymmetries
Numerical variables were converted into categorical variables (normal/altered). Thus, there was no need for variable normalization or transformation to address skewness.
Outlier Treatment
The numerical variables were categorized as normal, altered, or null ("missing"). Extreme outlier values were classified as null. These outliers were identified using the interquartile range (IQR), calculated as the difference between the 75th and 25th percentiles. The suggested lower limit was set at $\mathrm{lim}_{\mathrm{low}} = p_{25} - 1.5 \times (p_{75} - p_{25})$, and the upper limit at $\mathrm{lim}_{\mathrm{up}} = p_{75} + 1.5 \times (p_{75} - p_{25})$. The limits for each variable were manually reviewed, and values outside these ranges were recoded as null ("missing").
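The following Python sketch shows the IQR rule. It uses the same estimate-then-apply pattern as `recipes`: limits are learned from the training data and then applied to any data set. It assumes nulls are encoded as `None` and uses `statistics.quantiles(..., method="inclusive")`, which matches R's default (type 7) quantile definition. The manual review of limits described above is not modeled.

```python
import statistics


def iqr_limits(train_values, k=1.5):
    """Estimate the fences p25 - k*IQR and p75 + k*IQR from training data
    (the "prep" step); None marks a null value and is ignored."""
    observed = [v for v in train_values if v is not None]
    p25, _, p75 = statistics.quantiles(observed, n=4, method="inclusive")
    iqr = p75 - p25
    return p25 - k * iqr, p75 + k * iqr


def recode_outliers(values, limits):
    """Apply previously estimated limits (the "bake" step): values outside
    [low, high] become None, i.e. "missing"."""
    low, high = limits
    return [v if v is not None and low <= v <= high else None for v in values]


train_values = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # illustrative numbers only
limits = iqr_limits(train_values)
print(limits)                                   # (-3.0, 13.0)
print(recode_outliers([4, 100, None], limits))  # [4, None, None]
```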
Handling of Null (Missing) Values
Null values were handled as follows: variables with more than 70% null values were excluded, and for categorical variables with less than 70% null values, null values were assigned to an "unknown" category. The frequency of missing values for each variable is shown in the Supplementary Table.
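The two-part rule for null values can be sketched as follows. This Python example assumes a column-oriented table (a dict of lists) with `None` for nulls. Variables whose null share exceeds 0.70 are dropped, and the remaining nulls become "unknown". The text does not say how a share of exactly 70% was handled; this sketch keeps such variables. The variable names in the example are illustrative.

```python
def handle_missing(table, max_null_share=0.70, fill="unknown"):
    """Drop variables whose share of nulls exceeds `max_null_share`;
    replace the remaining nulls with `fill`.

    `table` maps each variable name to a list of values, with None for nulls."""
    cleaned = {}
    for name, values in table.items():
        null_share = sum(v is None for v in values) / len(values)
        if null_share > max_null_share:
            continue  # variable excluded from the analysis
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


table = {
    "ddimer": [None, None, None, "altered"],  # 75% null -> dropped
    "sodium": ["normal", None, "altered", "normal"],
}
print(handle_missing(table))
# {'sodium': ['normal', 'unknown', 'altered', 'normal']}
```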
Class Balancing
The class distribution was 61.3% survivors and 38.7% deaths. No class-balancing technique was applied.
Additional transformations (Computing)
Further transformations were conducted, such as converting categorical variables into factor-type variables. Additionally, for each categorical variable, categories with a low frequency were reclassified as "other". Finally, for categorical variables, dummy variables (yes/no) were created, enhancing the interpretability of the model.
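These last two steps can be sketched in Python as follows. The text gives no frequency threshold, so the 5% default here is an assumption. `dummy_encode` creates one 0/1 column per level. By default, `recipes::step_dummy()` drops one reference level (full one-hot coding requires `one_hot = TRUE`), so the column count in the authors' pipeline may differ.

```python
from collections import Counter


def lump_rare(values, threshold=0.05, other="other"):
    """Replace levels whose relative frequency is below `threshold` by `other`."""
    counts = Counter(values)
    n = len(values)
    return [v if counts[v] / n >= threshold else other for v in values]


def dummy_encode(name, values):
    """One 0/1 indicator column per observed level, named <variable>_<level>."""
    return {
        f"{name}_{level}": [int(v == level) for v in values]
        for level in sorted(set(values))
    }


race = ["white"] * 10 + ["brown"] * 9 + ["asian"]
lumped = lump_rare(race, threshold=0.10)
print(sorted(set(lumped)))             # ['brown', 'other', 'white']
print(dummy_encode("race", lumped[-3:]))
# {'race_brown': [1, 1, 0], 'race_other': [0, 0, 1]}
```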
Population Training and Testing Division and Cross-Validation
The "rsample" package in R [33] was used to split the population into a training set comprising two-thirds of the sample and a test set comprising the remaining third. Cross-validation was conducted on the training sample using five-fold resampling without replacement.
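The split and resampling scheme can be sketched in Python as follows. This is an unstratified, seeded illustration, and the seed is arbitrary. `rsample`'s `initial_split()` defaults to a 3/4 proportion, so a 2/3 split must be requested explicitly; the function can also stratify by outcome. Here the five folds partition the training indices without replacement, so every training row is held out exactly once.

```python
import random


def train_test_split(n_rows, train_share=2 / 3, seed=2020):
    """Shuffle row indices and keep `train_share` of them for training."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = round(n_rows * train_share)
    return idx[:cut], idx[cut:]


def k_fold(indices, k=5):
    """Partition `indices` into k disjoint folds (no replacement) and yield
    (analysis, assessment) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i, assessment in enumerate(folds):
        analysis = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield analysis, assessment


train, test = train_test_split(30)
print(len(train), len(test))  # 20 10
for analysis, assessment in k_fold(train):
    print(len(analysis), len(assessment))  # 16 4 on each of the 5 folds
```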
Machine Learning Models
Several machine learning models were evaluated: logistic regression, support vector machines, random forests, decision trees, extreme gradient boosting (XGBoost), Bayesian additive regression trees (BART), and a multilayer perceptron (MLP). The XGBoost model demonstrated the highest accuracy (Supplementary Table).
XGBoost Model
The "gradient boosting" model, a decision tree-based machine learning model, was first proposed by Friedman [34] in 2001 and has since been enhanced by various authors. More recently, an optimized technique for supervised problems, "extreme gradient boosting" (XGBoost), was developed by Tianqi Chen and Carlos Guestrin [35]. In supervised learning, the prediction $\hat{y}_i$ is typically derived from the input predictor variables $x_i$ through a mathematical model, as exemplified by the linear expression $\hat{y}_i = \sum_j \theta_j x_{ij}$. The parameter $\theta$ represents the unknown portion that must be learned from the data.
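For reference, XGBoost replaces the linear form above with an additive ensemble of $K$ regression trees and learns it by minimizing a regularized objective [35]. The notation below is ours. The logistic (log) loss is an assumption suited to the binary death/recovery outcome, since the text does not state which loss was used.

```latex
\begin{aligned}
\hat{y}_i &= \sum_{k=1}^{K} f_k(\mathbf{x}_i), \qquad f_k \in \mathcal{F},\\
\mathcal{L} &= \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2},\\
l\left(y_i, \hat{y}_i\right) &= -\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right],
\qquad p_i = \frac{1}{1 + e^{-\hat{y}_i}}.
\end{aligned}
```

Here $\mathcal{F}$ is the space of regression trees, $T$ is the number of leaves in a tree, $w$ is its vector of leaf weights, and $\gamma$ and $\lambda$ are regularization parameters. Trees are added one at a time, $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)$, and each new tree is chosen to reduce $\mathcal{L}$.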
Sliding Window Predictive Model
To evaluate the model's accuracy over time, an epidemiological week variable was derived from the admission date. A loop was then created in which the training and test samples were defined by sliding a window along the vector of epidemiological weeks. Within each window, variables were selected dynamically by eliminating those with more than 70% null observations. In each analysis, a two-week training period followed by a one-week test period was used, so that approximately two-thirds of the sample was used for training. The accuracy and the area under the receiver operating characteristic (ROC) curve (AUC) were calculated and tabulated for each window.
Subsequently, the historical series of reported COVID-19 cases and deaths in the city of Rio de Janeiro was added to this table for visual comparison of the epidemiological progression of COVID-19 with the evolution of model performance (Supplementary Figure 1). Baseline data and laboratory test results obtained during hospitalization were used as inputs, and the sliding window allowed the set of included variables to change dynamically over the course of the COVID-19 pandemic.
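The sliding-window evaluation described above has two parts: generating (training weeks, test week) pairs and scoring each test week. This Python sketch assumes the window advances one week at a time, because the step size is not stated. It computes the AUC as the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half; this equals the area under the ROC curve. Refitting XGBoost and filtering variables within each window are omitted, and the week labels are hypothetical.

```python
def sliding_windows(weeks, train_len=2, test_len=1):
    """Yield (training weeks, test weeks), advancing one week per step."""
    for start in range(len(weeks) - train_len - test_len + 1):
        split = start + train_len
        yield weeks[start:split], weeks[split:split + test_len]


def roc_auc(labels, scores):
    """AUC as P(score of a random death > score of a random survivor),
    counting ties as 1/2; labels are 1 = death, 0 = recovery."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


weeks = ["2020-51", "2020-52", "2020-53", "2021-01"]
for train_weeks, test_weeks in sliding_windows(weeks):
    print(train_weeks, "->", test_weeks)
# ['2020-51', '2020-52'] -> ['2020-53']
# ['2020-52', '2020-53'] -> ['2021-01']

print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]))  # 0.75
```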
Results
Population characteristics
The mean age of the 10,384 patients was 65.6 (±15.7) years, and 46.7% were male. The mean duration of hospitalization was 11.5 (±9.4) days, and 4,032 (38.7%) patients died in the hospital. A comparison of demographic data, health conditions, symptoms at admission, and vaccination status between deceased and surviving patients is presented in Table 1.
The demographic information, health conditions, signs and symptoms at hospital admission, vaccine status, baseline clinical laboratory test results and mortality outcomes were collected from medical records in the discovery dataset. BP: blood pressure, CK-MB: creatine phosphokinase-MB, INR: international normalized ratio, PAT: partial activated thromboplastin, LDL: low-density lipoprotein, GGT: gamma-glutamyl transpeptidase, WBC: white blood cell
The multivariate analysis of the variables is shown in Table 2. According to our multivariate analysis, death from COVID-19 was associated with urea level (OR 1.02, 95% CI 1.00-1.04).
The following variables were collected from medical records in the discovery dataset: demographic information, health conditions, signs and symptoms at hospital admission, vaccine status, baseline clinical laboratory test results and mortality outcomes.
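The odds ratios reported for the multivariate logistic regression follow from OR = exp(β), with a Wald 95% interval of exp(β ± 1.96·SE). For example, an OR of 1.02 corresponds to β = ln(1.02) ≈ 0.0198 per one-unit change in the urea variable as coded in the model. This Python sketch is illustrative only. Profile-likelihood intervals, such as those from R's `confint()` on a `glm` fit, can differ slightly from Wald intervals.

```python
import math


def odds_ratio_ci(beta, se, z=1.959964):
    """Odds ratio and Wald 95% confidence interval from a logistic-regression
    coefficient `beta` (log-odds scale) and its standard error `se`."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


beta = math.log(1.02)  # log-odds change implied by an OR of 1.02
print(round(beta, 4))  # 0.0198
```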
Sliding Window Predictive Model
The performance of the sliding-window predictive model across epidemiological weeks is shown in Figure 1 and in the supplementary table "Timeline of Performance Metrics for XGBoost Algorithm for the test set according to the Epidemiologic Week". These results refer to the test dataset. This time-series predictive model was constructed using the XGBoost model based on patient cases from Ronaldo Gazolla Hospital in Rio de Janeiro, Brazil, and its performance was evaluated using the area under the curve (AUC). The figure also allows visual comparison with the evolution of the number of cases, hospital admissions, and deaths in the city of Rio de Janeiro. The model's performance declines as the epidemic subsides, coinciding with an increase in the percentage of the vaccinated population.
The highest accuracy (89.8%) and AUC (96.8%) were observed at epidemiological week 53 of 2020 (Figure 2). The sensitivity was 85.1%, specificity 92.4%, positive predictive value 86.3%, negative predictive value 91.7%, F1-score 85.7%, Brier score 11.7%, precision 86.3%, recall 85.1%, and kappa 77.7%.
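All of these threshold-based metrics derive from the test-set confusion matrix, and the Brier score also uses the predicted probabilities. The Python sketch below uses hypothetical names and counts. Precision is the same quantity as positive predictive value, and recall is the same as sensitivity, which is why those pairs coincide above. As a consistency check, F1 = 2 × 0.863 × 0.851 / (0.863 + 0.851) ≈ 0.857, which matches the reported F1-score.

```python
def classification_metrics(tp, fp, fn, tn):
    """Threshold-based metrics from a binary confusion matrix (death = positive)."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # also called precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / n
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - chance) / (1 - chance)
    return {
        "accuracy": accuracy, "sensitivity": sensitivity,
        "specificity": specificity, "ppv": ppv, "npv": npv,
        "f1": f1, "kappa": kappa,
    }


def brier_score(labels, probabilities):
    """Mean squared difference between predicted death probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)


# Hypothetical counts, not the study's confusion matrix.
print(classification_metrics(tp=40, fp=10, fn=10, tn=40))
print(brier_score([1, 0, 1], [0.9, 0.2, 0.6]))
```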
The significance of the associations of the top ten variables with death, calculated by the XGBoost method in the current epidemiological week, is illustrated in Figure 3.
The most significant variables for the model were body temperature, blood pressure, respiratory rate, heart rate, and urea, magnesium, sodium, and C-reactive protein levels.
Discussion
Main Findings
Our study makes two main contributions. The first is the COVID-19 mortality prediction model based on hospital admission data, which include comorbidities, signs and symptoms at admission, vaccination status at admission, and laboratory results. Laboratory tests for COVID-19 patients during hospitalization are often repeated at irregular intervals, and clinicians are interested in predicting future outcomes from the information available. The tool proposed in this study predicts in-hospital death from these variables with an AUC of up to 96.8%.
In our study, we employed the robust XGBoost model, which is increasingly used in the health field. The discriminative performance of our predictive model was satisfactory (AUC of 96.8% in the final validation during epidemiological week 53/2020). The superior performance of the XGBoost model, which can capture more complex relationships among variables, is evident when compared with that of the logistic regression model (AUC of 84.8%) [35].
The study's second contribution lies in assessing the model's performance throughout the epidemic's progression, correlating it with the incidence rate, mortality rate, and population's vaccination coverage. The sliding window approach revealed that the model's accuracy fluctuated throughout the COVID-19 epidemic. The model's performance declines as the epidemic subsides. This effect may be a result of the fixed two-week window. With fewer cases, there are fewer training records, which may impact the performance of the model. Additionally, the model's decrease in accuracy coincides with the increase in the proportion of the population vaccinated with two or more doses.
Comparison with other studies
Several recent studies have used dynamic analysis to predict the severity of COVID-19. Chen et al. [36] used a model association technique and historical regression, with biomarkers used to predict death. As in our study, the most important variables in their model included altered lymphocytes and urea. In our study, we incorporated not only biomarkers but also clinical signs and symptoms, vaccination data, and comorbidities. Furthermore, using the sliding window approach, we compared the model performance throughout the epidemic period.
In this study, elevated temperature was the most significant parameter for predicting COVID-19 mortality, which was consistent with the findings of other studies [37-38].
In 2020, Wu et al. conducted a study in which death was shown to be associated with advanced age (OR 7.32, 95% CI 5.42-9.89), elevated temperature (OR 1.28, 95% CI 1.09-1.49), and lymphocytopenia (OR 1.26, 95% CI 1.06-1.50) [37].
Advanced age was a significant variable in predicting COVID-19 mortality, aligning with the findings of a meta-analysis conducted by Du et al. using traditional logistic regression [39]. In another study using decision tree analysis, the authors also found an association with advanced age [40].
Numerous studies have established correlations between laboratory biomarkers and clinical variables and the prognosis of COVID-19 [36,40,41]. However, there is limited research on the dynamic prediction of these variables [42,43].
Highlights and Limitations
This study has several strengths. First, a large dataset was used, incorporating data from 10,493 COVID-19 patients admitted to Ronaldo Gazolla Hospital, one of the largest COVID-19 treatment facilities in Brazil.
Second, we employed an advanced machine learning technique, XGBoost, to identify significant biomarkers. This method makes full use of longitudinal biomarkers and, like the random forest algorithm, is an ensemble of decision trees, which supports stable and accurate prediction.
This study has several limitations. First, a number of patients had missing laboratory test data. Second, the mechanism of time-variant dynamic effects requires further clinical investigation. Third, the predictive model in this study was trained and validated using a population from a public hospital in the city of Rio de Janeiro, Brazil. Hence, caution should be exercised when generalizing these findings to other populations. Furthermore, the database included historical data up to November 2021. Since then, new variants have emerged and become dominant, along with an increase in the vaccination rate in the Brazilian population during the second half of 2021 and the beginning of 2022.
The fixed length of the sliding window is a further limitation. It may have affected the model's performance, especially as the epidemic progressed and case numbers fluctuated. A comparison of model performance across different sliding window lengths would allow a more direct assessment of how window length influences the model's accuracy and reliability, and would give a more comprehensive view of the behavior and limitations of the sliding window approach.
Our study focuses on data up to November 2021, and the impact of different viral variants, particularly those that became dominant after this period, is an important factor that requires further discussion. Although our study does not directly analyze data from later waves, new variants such as Delta and Omicron significantly altered the trajectory of the pandemic, especially in terms of transmission rates, severity, and vaccine effectiveness.
In future research, it would be crucial to expand on how these variants influenced COVID-19 outcomes in the population we studied. Variations in viral characteristics, such as increased transmissibility or altered pathogenicity, could have directly affected the predictive accuracy of our model. Moreover, as the pandemic progressed, the dynamics of the virus shifted, with the rise in vaccination rates and the emergence of new strains influencing the hospital admissions and mortality outcomes.
Implications and future research
Given the variation in the COVID-19 pandemic with emerging new variants and the progression of vaccination coverage, further studies are needed to understand the clinical and epidemiological behavior of the COVID-19 pandemic.
Conclusion
In conclusion, our study illustrates how time-varying clinical patterns, biomarkers, vaccination status, and comorbidities can dynamically predict individual outcomes and inform timely and accurate treatment recommendations. Our final predictive model achieved satisfactory AUC results. Our findings could help physicians adapt treatments in a timely manner by monitoring shifts in readily available biomarkers.
Abbreviations
Area under the receiver operating characteristic curve (ROC AUC), aspartate aminotransferase (AST), artificial intelligence (AI), confidence interval (CI), Coronavirus Disease 2019 (COVID-19), C-reactive protein (CRP), extreme gradient boosting (XGBoost), intensive care unit (ICU), interquartile range (IQR), lactate dehydrogenase (LDH), Matthews correlation coefficient (MCC), National Immunization Program Information System of the Health Surveillance Secretariat of the Ministry of Health (SIPNI), National Notification Information System (SINAN), odds ratio (OR), Ronaldo Gazolla Municipal Hospital (HMRG), severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), Severe Acute Respiratory Syndrome surveillance database (SIVEP-Gripe), and World Health Organization (WHO).
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request. SIVEP-Gripe data are publicly available at https://opendatasus.saude.gov.br/dataset/srag-2021-a-2024. Our analysis code and XCOVID-BR are available at https://github.com/iujo78/xcovidrio.
Acknowledgments
Not applicable.
References
[1] Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet 2020;395:1054-62. doi: 10.1016/S0140-6736(20)30566-3.
[2] WHO. Situation Report - 209 - Coronavirus disease 2019. 2020. doi: 10.1001/jama.2020.2633.
[3] Rodriguez-Morales AJ, Gallego V, Escalera-Antezana JP, Méndez CA, Zambrano LI, Franco-Paredes C. COVID-19 in Latin America: The implications of the first confirmed case in Brazil. Travel Med Infect Dis 2020;35. doi: 10.1016/j.tmaid.2020.101613.
[4] Johns Hopkins Coronavirus Resource Center. Mortality Analyses. Johns Hopkins Coronavirus Resource Center 2022. https://coronavirus.jhu.edu/data/mortality (accessed Dec 09, 2022).
[5] Albalawi U, Mustafa M. Current Artificial Intelligence (AI) Techniques, Challenges, and Approaches in Controlling and Fighting COVID-19: A Review. Int J Environ Res Public Health 2022;19:5901. doi: 10.3390/ijerph19105901.
[6] Mhlanga D. The Role of Artificial Intelligence and Machine Learning Amid the COVID-19 Pandemic: What Lessons Are We Learning on 4IR and the Sustainable Development Goals. Int J Environ Res Public Health 2022;19:1879. doi: 10.3390/ijerph19031879.
[7] Rasheed J, Jamil A, Hameed AA, Al-Turjman F, Rasheed A. COVID-19 in the Age of Artificial Intelligence: A Comprehensive Review. Interdiscip Sci 2021;13:153-75. doi: 10.1007/s12539-021-00431-w.
[8] Krass M, Henderson P, Mello MM, Studdert DM, Ho DE. How US law will evaluate artificial intelligence for covid-19. BMJ 2021:n234. doi: 10.1136/bmj.n234.
[9] Rodríguez-Rodríguez I, Rodríguez J-V, Shirvanizadeh N, Ortiz A, Pardo-Quiles D-J. Applications of Artificial Intelligence, Machine Learning, Big Data and the Internet of Things to the COVID-19 Pandemic: A Scientometric Review Using Text Mining. Int J Environ Res Public Health 2021;18:8578. doi: 10.3390/ijerph18168578.
[10] Yu M, Tang A, Brown K, Bouchakri R, St-Onge P, Wu S. Integrating artificial intelligence in bedside care for covid-19 and future pandemics. BMJ 2021:e068197. doi: 10.1136/bmj-2021-068197.
[11] Wu X, Hui H, Niu M, Li L, Wang L, He B. Deep learning-based multiview fusion model for screening 2019 novel coronavirus pneumonia: A multicenter study. Eur J Radiol 2020;128. doi: 10.1016/j.ejrad.2020.109041.
[12] Kang H, Xia L, Yan F, Wan Z, Shi F, Yuan H. Diagnosis of Coronavirus Disease 2019 (COVID-19) with Structured Latent Multi-View Representation Learning. IEEE Trans Med Imaging 2020;39:2606-14. doi: 10.1109/TMI.2020.2992546.
[13] Mei X, Lee HC, Diao K yue, Huang M, Lin B, Liu C. Artificial intelligence-enabled rapid diagnosis of patients with COVID-19. Nat Med 2020;26:1224-8. doi: 10.1038/s41591-020-0931-3.
[14] Abubaker Bagabir S, Ibrahim NK, Abubaker Bagabir H, Hashem Ateeq R. Covid-19 and Artificial Intelligence: Genome sequencing, drug development and vaccine discovery. J Infect Public Health 2022;15:289-96. doi: 10.1016/j.jiph.2022.01.011.
[15] van Ginneken B. The Potential of Artificial Intelligence to Analyze Chest Radiographs for Signs of COVID-19 Pneumonia. Radiology 2021;299:E214-5. doi: 10.1148/radiol.2020204238.
[16] Wang R, Jiao Z, Yang L, Choi JW, Xiong Z, Halsey K. Artificial intelligence for prediction of COVID-19 progression using CT imaging and clinical data. Eur Radiol 2022;32:205-12. doi: 10.1007/s00330-021-08049-8.
[17] Cai S, Chen Y, Zhao S, He D, Li Y, Xiong N. Dynamic 3D radiomics analysis using artificial intelligence to assess the stage of COVID-19 on CT images. Eur Radiol 2022;32:4760-70. doi: 10.1007/s00330-021-08533-1.
[18] Lupei MI, Li D, Ingraham NE, Baum KD, Benson B, Puskarich M. A 12-hospital prospective evaluation of a clinical decision support prognostic algorithm based on logistic regression as a form of machine learning to facilitate decision making for patients with suspected COVID-19. PLoS One 2022;17:e0262193. doi: 10.1371/journal.pone.0262193.
[19] Elhazmi A, Al-Omari A, Sallam H, Mufti HN, Rabie AA, Alshahrani M. Machine learning decision tree algorithm role for predicting mortality in critically ill adult COVID-19 patients admitted to the ICU. J Infect Public Health 2022;15:826-34. doi: 10.1016/j.jiph.2022.06.008.
[20] He F, Page JH, Weinberg KR, Mishra A. The Development and Validation of Simplified Machine Learning Algorithms to Predict Prognosis of Hospitalized Patients With COVID-19: Multicenter, Retrospective Study. J Med Internet Res 2022;24:e31549. doi: 10.2196/31549.
[21] Lami F, Elfadul M, Rashak H, Al Nsour M, Akhtar H, Khader Y. Risk Factors of COVID-19 Critical Outcomes in the Eastern Mediterranean Region: Multicountry Retrospective Study. JMIR Public Health Surveill 2022;8:e32831. doi: 10.2196/32831.
[22] Chen U-I, Xu H, Krause TM, Greenberg R, Dong X, Jiang X. Factors Associated With COVID-19 Death in the United States: Cohort Study. JMIR Public Health Surveill 2022;8:e29343. doi: 10.2196/29343.
[23] Qiu H, Wu J, Hong L, Luo Y, Song Q, Chen D. Clinical and epidemiological features of 36 children with coronavirus disease 2019 (COVID-19) in Zhejiang, China: an observational cohort study. Lancet Infect Dis 2020;20:689-96. doi: 10.1016/S1473-3099(20)30198-5.
[24] Gopalan N, Senthil S, Prabakar NL, Senguttuvan T, Bhaskar A, Jagannathan M. Predictors of mortality among hospitalized COVID-19 patients and risk score formulation for prioritizing tertiary care-An experience from South India. PLoS One 2022;17:e0263471. doi: 10.1371/journal.pone.0263471.
[25] Chen X, Gao W, Li J, You D, Yu Z, Zhang M. A predictive paradigm for COVID-19 prognosis based on the longitudinal measure of biomarkers. Brief Bioinform 2021;22. doi: 10.1093/bib/bbab206.
[26] R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2022.
[27] Wickham H, Grolemund G. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 1st ed. Sebastopol: O'Reilly Media, Inc; 2016.
[28] Brazil. Ministry of Health. Health Surveillance Secretariat. OpenDataSUS. SARS 2021 and 2022: Severe Acute Respiratory Syndrome Database; 2022. https://opendatasus.saude.gov.br/dataset/srag-2021-e-2022 (accessed July 24, 2022).
[29] Brazil. Ministry of Health. Health Surveillance Secretariat. SI-PNI; 2022. http://sipni.datasus.gov.br/si-pni-web/faces/apresentacaoSite.jsf.
[30] Casas P. Data Science Live Book: An intuitive and practical approach to data analysis, data preparation and machine learning, suitable for all ages. Pablo Adrian Casas; 2020.
[31] Kuhn M, Johnson K. Applied Predictive Modeling. New York, NY: Springer New York; 2013. doi: 10.1007/978-1-4614-6849-3.
[32] Kuhn M, Wickham H. recipes: Preprocessing and Feature Engineering Steps for Modeling. R package; 2022.
[33] Silge J, Chow F, Kuhn M, Wickham H. rsample: General Resampling Infrastructure. R package; 2022.
[34] Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Stat 2001;29:1189-232. doi: 10.1214/aos/1013203451.
[35] Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016 Aug 13-17. p. 785-94. doi: 10.1145/2939672.2939785.
[36] Chen X, Gao W, Li J, You D, Yu Z, Zhang M. A predictive paradigm for COVID-19 prognosis based on the longitudinal measure of biomarkers. Brief Bioinform 2021;22. doi: 10.1093/bib/bbab206.
[37] Wu R, Ai S, Cai J, Zhang S, Qian Z, Zhang Y. Predictive Model and Risk Factors for Case Fatality of COVID-19: A Cohort of 21,392 Cases in Hubei, China. The Innovation 2020;1:100022. doi: 10.1016/j.xinn.2020.100022.
[38] Wang JM, Liu W, Chen X, McRae MP, McDevitt JT, Fenyö D. Predictive modeling of morbidity and mortality in COVID-19 hospitalized patients and its clinical implications. J Med Internet Res 2021;23(7):e29514. doi: 10.2196/29514.
[39] Du P, Li D, Wang A, Shen S, Ma Z, Li X. A Systematic Review and Meta-Analysis of Risk Factors Associated with Severity and Death in COVID-19 Patients. Canadian Journal of Infectious Diseases and Medical Microbiology 2021;2021:1-12. doi: 10.1155/2021/6660930.
[40] Elhazmi A, Al-Omari A, Sallam H, Mufti HN, Rabie AA, Alshahrani M, et al. Machine learning decision tree algorithm role for predicting mortality in critically ill adult COVID-19 patients admitted to the ICU. J Infect Public Health 2022;15:826-34. doi: 10.1016/j.jiph.2022.06.008.
[41] Ko H, Chung H, Kang WS, Park C, Kim DW, Kim SE. An Artificial Intelligence Model to Predict the Mortality of COVID-19 Patients at Hospital Admission Time Using Routine Blood Samples: Development and Validation of an Ensemble Model. J Med Internet Res 2020;22:e25442. doi: 10.2196/25442.
[42] Hu C-A, Chen C-M, Fang Y-C, Liang S-J, Wang H-C, Fang W-F. Using a machine learning approach to predict mortality in critically ill influenza patients: a cross-sectional retrospective multicenter study in Taiwan. BMJ Open 2020;10:e033898. doi: 10.1136/bmjopen-2019-033898.
[43] Prajapati S, Swaraj A, Lalwani R, Narwal A, Verma K, Singh G. Comparison of Traditional and Hybrid Time Series Models for Forecasting COVID-19 Cases. 2019.