Aortic dissection (AD) is a life-threatening cardiovascular disease. Its main causes are hypertension, Marfan syndrome and aortic atherosclerosis (1-3). Blood in the aortic lumen enters the arterial wall through a tear in the intima, separating the intima from the media and forming a hematoma. Driven by high blood pressure, the hematoma spreads along the long axis of the artery (4). AD is associated with high morbidity and mortality (5,6), and once it occurs it can quickly lead to death. The mortality rate is about 1% to 2% within 24 hours after onset, 50% within 48 hours, and 60% to 70% within one week. The five-year natural survival rate is only 10% to 15% (7-9). Because tears differ in location and extent, patients’ symptoms and signs are complex and diverse, and the rate of misdiagnosis and missed diagnosis reaches 30% to 40% (10,11). Many patients miss the best treatment window as a result. It has been reported that 10.6% of patients with AD are misdiagnosed as having acute coronary syndrome (ACS) at first diagnosis (12). About 1% to 2% of patients with AD may develop acute myocardial infarction (AMI); however, AD and AMI are treated in completely different ways (13). Treatments intended for AMI, such as antithrombotic therapy, thrombolysis, or emergency coronary angiography (CAG) and percutaneous coronary intervention (PCI), are associated with a poor prognosis and an increased risk of death when given to patients with AD. A patient who is misdiagnosed, or whose diagnosis is missed, is therefore unlikely to receive an accurate diagnosis or timely and appropriate treatment. In China, the overall level of AD treatment is still low; the rates of missed diagnosis, hospitalization and preoperative mortality are comparatively high, and patients are becoming younger (14). Moreover, medical resources and technology are unevenly developed across China. 
In primary and underdeveloped hospitals, the lack of medical facilities and experienced doctors leads to higher rates of misdiagnosis and missed diagnosis, as well as higher mortality. A simple and effective early screening method is therefore needed.
With the development of information technology and the spread of electronic medical records, machine learning is increasingly applied to medical diagnosis to improve diagnostic accuracy, provide early prediction, relieve pressure on doctors, and reduce inspection costs. For example, Dwivedi (15) used six machine learning algorithms to assist in the diagnosis of ischemic heart disease; the best performer in that study was logistic regression, with accuracy, sensitivity and specificity of 85%, 89% and 81%, respectively. Gatuha and Jiang (16) used machine learning algorithms to diagnose breast cancer, and their best result had an accuracy of 97%. Liu et al. (17) established machine learning models to predict embryonic development. All of these methods achieved better results than traditional methods. Machine learning methods for diagnosing AD have also attracted attention. Huo et al. (18) used Bayesian network, Naive Bayes, J48 decision tree and SVM algorithms to classify emergency patients with suspected AD. Their study contained 492 samples, including 330 patients with AD and 162 patients misdiagnosed as having AD, which corresponds to a misdiagnosis rate of 33% and a ratio of AD patients to non-AD patients close to 2:1. However, the purpose of their research was to find the misdiagnosed patients among those diagnosed with AD rather than to screen out high-risk groups, and the sample size was small. Liu et al. (19) analyzed the performance of several machine learning models in AD screening, among which SmoteBagging performed best, with a sensitivity of 78.1%. Wu et al. (20) used a Random Forest model to investigate the risk of in-hospital rupture in type A AD.
The purpose of our study is to explore whether patients’ routine examination data can be used to establish a rapid early screening model that advises doctors or patients on whether further examination is required.
We present the following article in accordance with the TRIPOD reporting checklist (available at http://dx.doi.org/10.21037/atm-20-1475).
The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Ethics Board of Xiangya Hospital, Central South University (201502042). Because this is a retrospective study and all data were desensitized data from the hospital’s electronic medical records, which did not contain patient identification information, the requirement for consent was waived.
The registry data of 53,213 inpatients in the Cardiovascular Department of Xiangya Hospital from January 2008 to December 2016 were analyzed. There were 802 AD patients in this database; the rest were hospitalized with other cardiovascular diseases, including viral myocarditis, myocardial infarction and coronary heart disease. The diagnosis of AD was based mainly on medical imaging: (I) Computed Tomography (CT) images showed one or more intimal tears of the aorta, with both true and false lumens visible in the aorta; complications of AD leakage or rupture, such as pericardial, mediastinal and pleural effusions, blood accumulation or aortic valve regurgitation, may also have been seen. (II) Magnetic Resonance Imaging (MRI) showed distinguishable high-signal true and false lumens; in Field Echo (FE) sequence images, both lumens may have appeared as high signals, with the low-signal intimal flap visible between them. (III) Computed Tomography Angiography (CTA) showed contrast agent overflowing or ejecting from the aortic tear; contrast agent could be seen dividing into the two lumens with the blood flow, and a low-density linear intimal flap could be seen between the true and false lumens. (IV) The tear and distinguishable lumens were directly observed in the intima or media of the aorta during aortic surgery or postmortem examination.
AD is a relatively rare cardiovascular disease, so the proportion of AD patients to non-AD patients is low, and the distribution of the two types of samples is extremely imbalanced. The purpose of the study is to screen for patients with AD from a large number of possible patients. A major challenge for machine learning in this work is therefore how to deal with the problems caused by imbalance. Traditional prediction methods focus on global accuracy, which is defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{P + N}$$

where TP is the number of true positives; TN is the number of true negatives; P is the number of positive samples; and N is the number of negative samples, with N = TN + FP, where FP is the number of false positives.
According to the formula, when P>>N or N>>P, the accuracy can still be high even if none of the minority-class samples is predicted correctly. The imbalance causes the predictor to ignore the minority class: it tends to assign ambiguous samples to the majority class.
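The effect can be checked with simple arithmetic. The sketch below evaluates a hypothetical predictor that labels every sample as non-AD, using the test-fold sizes reported in the cross-validation section (115 AD and 7,487 non-AD patients); the function name is illustrative only.

```python
def accuracy(tp, tn, p, n):
    """Global accuracy: correctly classified samples over all samples."""
    return (tp + tn) / (p + n)

# Test-fold sizes reported in the cross-validation section.
P, N = 115, 7487

# Hypothetical predictor that labels every sample as non-AD:
# no positive is found (TP = 0) and every negative is correct (TN = N).
acc = accuracy(tp=0, tn=N, p=P, n=N)
sensitivity = 0 / P

print(f"accuracy = {acc:.4f}, sensitivity = {sensitivity:.4f}")
```

Such a predictor scores above 98% accuracy while detecting no AD patient at all, which is why accuracy alone is a poor criterion here.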
In view of the shortcomings of traditional algorithms, our study proposes an oversampling ensemble algorithm named Extreme Gradient Boosting Forest (XGBF). XGBF is an ensemble learning model which is composed of several XGBoost classifiers (21). The specific structure of the XGBF model is shown in Figure 1.
Since XGBoost converges quickly, the training speed of XGBF can be guaranteed. The training data fed to each XGBoost classifier consisted of undersampled majority data and oversampled minority data. The oversampling operations included duplication and SMOTE (22), so that the learning model could obtain more information from the minority samples. The undersampling operation drew a fixed number of samples from the majority class set without replacement each time, so that the numbers of positive and negative samples were similar. Finally, the weak classifiers were combined by an ensemble method to achieve better predictions.
The XGBF algorithm combines the advantages of the oversampling, undersampling and ensemble methods so that the predictor can fully learn the characteristics of the samples and produce better results.
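The mixed-sampling idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the base learner is a toy nearest-centroid scorer standing in for XGBoost, the half-and-half split between duplication and SMOTE-style interpolation is assumed, members are combined by simple score averaging, and all function names and the synthetic data are hypothetical.

```python
import random

def smote_like(minority, n_new, k, rng):
    """Synthesize n_new points by interpolating between a minority sample
    and one of its k nearest minority neighbours (the SMOTE idea)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (s for s in minority if s is not base),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )[:k]
        other = rng.choice(neighbours)  # needs at least two minority samples
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

def balanced_subset(majority, minority, rng):
    """One training set: undersampled majority plus oversampled minority."""
    n_dup = len(minority) // 2           # assumed split between duplication
    n_smote = len(minority) - n_dup      # and synthetic samples
    positives = (list(minority)
                 + [rng.choice(minority) for _ in range(n_dup)]
                 + smote_like(minority, n_smote, k=5, rng=rng))
    negatives = rng.sample(majority, len(positives))  # without replacement
    return negatives + positives, [0] * len(negatives) + [1] * len(positives)

def fit_centroid(X, y):
    """Toy stand-in for XGBoost: score by relative distance to class centroids."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    def score(x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0)) ** 0.5
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1)) ** 0.5
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)
    return score

def train_ensemble(majority, minority, n_members, rng, fit=fit_centroid):
    """Train each member on its own freshly resampled, balanced subset."""
    return [fit(*balanced_subset(majority, minority, rng)) for _ in range(n_members)]

def ensemble_predict(models, x, threshold=0.5):
    """Average the members' positive-class scores and apply a threshold."""
    return int(sum(m(x) for m in models) / len(models) >= threshold)

rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(650)]  # synthetic non-AD
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]   # synthetic AD
models = train_ensemble(majority, minority, n_members=5, rng=rng)
print(ensemble_predict(models, (3.0, 3.0)), ensemble_predict(models, (0.0, 0.0)))
```

Because every member sees a different undersampled slice of the majority class, the ensemble as a whole uses far more of the majority data than any single balanced subset could.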
Cross-validation of predictors
We used seven-fold cross-validation to verify the stability of the classifier. Before training and validation, we randomly divided the AD patient set and the non-AD patient set each into seven disjoint subsets of equal size. One AD subset and one non-AD subset were then merged, giving seven new subsets. The training and testing procedures were repeated seven times: each time, one of the seven subsets was used as the test set, and the other six were merged into the training set. Each training set thus included 687 AD patients and 44,924 non-AD patients, and each test set included 115 AD patients and 7,487 non-AD patients. To reduce the influence of chance, this study reports the average of ten experiments. The training and test sets were randomly re-split in each experiment, so they differed between experiments.
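The splitting procedure can be sketched as follows. This is an illustrative reconstruction that works on sample indices only; the function name and the seed are assumptions. Since 802 and 52,411 are not exact multiples of seven, the fold sizes produced here differ by at most one and only approximate the reported 115 and 7,487.

```python
import random

def stratified_folds(n_pos, n_neg, k, rng):
    """Split positive and negative indices into k disjoint folds each,
    then pair fold i of each class to form the i-th test set."""
    pos = list(range(n_pos))
    neg = list(range(n_pos, n_pos + n_neg))
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]

N_AD, N_TOTAL = 802, 53213
rng = random.Random(0)  # a new seed would give the re-split of another experiment
folds = stratified_folds(n_pos=N_AD, n_neg=N_TOTAL - N_AD, k=7, rng=rng)

for i, (test_pos, test_neg) in enumerate(folds):
    # Everything outside the test fold forms the training set.
    train_pos = N_AD - len(test_pos)
    train_neg = (N_TOTAL - N_AD) - len(test_neg)
    print(i, len(test_pos), len(test_neg), train_pos, train_neg)
```

Splitting each class separately keeps the AD to non-AD ratio nearly identical in every fold, which matters when the minority class has only about 115 samples per fold.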
Comparison methods and evaluation parameters
The experimental study compared XGBF with four ensemble learning algorithms: AdaBoost, XGBoost, SmoteBagging and EasyEnsemble. These algorithms are described in Supplementary file 1. The computer configuration for these experiments was a 64-bit Windows 10 OS, Python 3.6, 16 GB of RAM, and an i5-6500 CPU.
A confusion matrix, as shown in Table 1, was used to present the basic evaluation indicators. Each sample falls into one of four groups, true positive (TP), false positive (FP), true negative (TN) or false negative (FN), according to the combination of its real category and predicted category. In our case, positive samples are the minority class of AD patients, and negative samples are the majority class of non-AD patients.
As mentioned above, the proportions of positive and negative samples were extremely imbalanced, so traditional evaluation indicators such as accuracy and positive predictive value (PPV) were no longer suitable. For example, a predictor that assigns all samples to the majority class still has high accuracy, but such a predictor is meaningless in this study. The situation with PPV is similar. Even a predictor that correctly identifies every minority sample and has a low false positive rate can have a low PPV, because majority samples far outnumber minority samples, so the number of FPs can exceed the number of TPs. Therefore, the evaluation indicators used in this study were sensitivity and specificity. After all, the purpose of screening is to find as many members of the high-risk group as possible, and a false negative carries a higher risk than a false positive.
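The PPV argument can be made concrete with a small calculation. The sketch below uses the standard confusion-matrix definitions and a hypothetical predictor (the 5% false positive rate is an assumed figure chosen for illustration) on a test fold of 115 AD and 7,487 non-AD patients.

```python
def screening_metrics(tp, fn, fp, tn):
    """Standard indicators derived from the four confusion-matrix cells.
    Assumes at least one predicted positive, so that PPV is defined."""
    return {
        "sensitivity": tp / (tp + fn),  # share of AD patients found
        "specificity": tn / (tn + fp),  # share of non-AD patients cleared
        "ppv": tp / (tp + fp),          # share of alarms that are real
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Hypothetical predictor: it finds every AD patient and mislabels
# only 5% of the non-AD patients in the fold.
n_ad, n_non_ad = 115, 7487
fp = round(0.05 * n_non_ad)
m = screening_metrics(tp=n_ad, fn=0, fp=fp, tn=n_non_ad - fp)

for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Even with perfect sensitivity and about 95% specificity, fewer than one in four positive predictions is a true AD case, simply because non-AD patients are so much more numerous.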
Python 3.6 was used for statistical analysis of the data in this study. Measurement data are expressed as mean ± standard deviation. Count data are expressed as ratios or percentages. Differences in count data between the two groups were compared using the chi-squared test, and differences in measurement data were compared using the independent two-sample t test. P<0.05 was considered statistically significant. The diagnostic performance of the classifiers was described using sensitivity and specificity.
Our AD dataset was obtained from Xiangya Hospital of Central South University. All sample data were extracted from electronic medical records (EMR), including patients’ information documents, hospitalization records and laboratory medical records. The information contained in the documents included patients’ symptoms, habits, medical history, examination results, and diagnostic results. We recruited six undergraduates and one master’s student specializing in the cardiovascular field to annotate and extract data from the text, which yielded a structured dataset (23). However, the dataset still contained missing values. Features with a missing rate of more than 30% were deleted. We then used a hierarchical mean filling method to fill in the remaining missing values and obtained the final dataset.
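The two cleaning steps might look like the sketch below. The 30% threshold comes from the text; the imputation shown is only one plausible reading of "hierarchical mean filling" (the mean within a patient group first, the overall mean as a fallback), since the grouping variable is not specified here. The function names, the `groups` labels and the toy rows are all hypothetical.

```python
def drop_sparse_features(rows, max_missing=0.30):
    """Keep only columns whose share of missing values (None) is at most max_missing."""
    n = len(rows)
    keep = [j for j in range(len(rows[0]))
            if sum(r[j] is None for r in rows) / n <= max_missing]
    return [[r[j] for j in keep] for r in rows], keep

def group_mean_fill(rows, groups):
    """Fill None with the column mean of the row's group, else the overall column mean."""
    def mean(vals):
        vals = [v for v in vals if v is not None]
        return sum(vals) / len(vals) if vals else None

    n_cols = len(rows[0])
    overall = [mean(r[j] for r in rows) for j in range(n_cols)]
    by_group = {
        g: [mean(r[j] for r, gr in zip(rows, groups) if gr == g) for j in range(n_cols)]
        for g in set(groups)
    }
    filled = []
    for r, g in zip(rows, groups):
        filled.append([
            v if v is not None
            else (by_group[g][j] if by_group[g][j] is not None else overall[j])
            for j, v in enumerate(r)
        ])
    return filled

# Toy data: four patients, three features, None marks a missing value.
rows = [[1.0, None, 5.0],
        [3.0, None, None],
        [None, 2.0, 7.0],
        [5.0, None, 9.0]]
groups = ["a", "a", "b", "b"]  # assumed grouping variable

kept_rows, kept_cols = drop_sparse_features(rows)
print(kept_cols)
print(group_mean_fill(kept_rows, groups))
```

Filling within a group before falling back to the overall mean keeps imputed values closer to what is typical for similar patients, at the cost of needing a sensible grouping.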
In the dataset, each sample contains 62 features, which came from the patients’ routine blood examinations, complete biochemical examinations, routine blood coagulation examinations, living habits and family genetic history. Some of these features, such as D-dimer and serum potassium, have been found to be highly correlated with AD (24,25). Using the t test, we chose the 42 features with a P value less than 0.005, as shown in Table 2.
Data from a total of 53,213 patients were collected from 2008 to 2016. Of these, 802 patients were diagnosed with AD, giving an imbalance ratio of about 1:65. In addition, in this dataset, the ratio of men to women among AD patients was about 2:1; the average age of patients with AD was 56 years; patients usually had hypotension or hypertension; and 33% of the patients suffered from chest pain or abdominal pain.
The prediction performance of predictors
The experimental results shown in Tables 3-8 are the averages of ten experiments. The test data set differed between experiments but had the same size each time: 115 AD patients and 7,487 non-AD patients. After the predictors were trained, the 42 features of each patient in the test set were input into each predictor to determine which patients had AD. Tables 3-7 show the confusion matrices of the prediction results for AdaBoost (26), XGBoost, SmoteBagging (27), EasyEnsemble (28) and XGBF.
Comparing Tables 3-7, it can be seen that the results obtained by AdaBoost and XGBoost are very close, while the latter three, SmoteBagging, EasyEnsemble and XGBF, differ markedly from the first two. For example, in Table 3, AdaBoost correctly identified 18 AD patients, but 97 AD patients were predicted as non-AD patients; 15 non-AD patients were predicted as AD patients, and 7,472 non-AD patients were predicted correctly. The first two algorithms were highly accurate in recognizing non-AD patients, but they performed poorly on AD patients and failed to achieve the purpose of screening. The latter three correctly identified more AD patients. Although they also predicted more non-AD patients as AD patients, the false positive rate was still low considering the large number of negative samples. Such classifiers are clearly more meaningful in disease screening, because they greatly reduce the missed diagnosis rate. Among the latter three classifiers, XGBF had the best results, correctly identifying the largest number of AD patients. Table 8 shows the evaluation results for each algorithm.
The effectiveness of the improved AD screening algorithm XGBF is shown in Tables 3-8. Compared with the traditional ensemble methods AdaBoost and XGBoost, our method greatly improved sensitivity. AdaBoost and XGBoost cannot deal with imbalanced data: their specificity was higher than 99%, but their sensitivity was only about 15% or 16%, which means they tended to classify almost all samples as non-AD patients. In other words, they barely discriminated between the two classes. Compared with the two imbalanced-data classification methods, SmoteBagging and EasyEnsemble, the specificity of XGBF was not obviously dominant, but its sensitivity was still higher. In fact, the sensitivity of XGBF was the highest of all the algorithms. The SmoteBagging model adds extra training data, so its training time became very long. Considering time and sensitivity, EasyEnsemble was better than SmoteBagging. XGBoost was the fastest model, but its sensitivity was poor, and AdaBoost was similar to XGBoost. Although XGBF was not the fastest, its time consumption was acceptable. Considering all the factors, it achieved the best results.
The aim of this study was to develop a machine learning model to screen for early AD from routine medical examination data. The application of machine learning technology in the medical field has received substantial attention. Wu et al. (20) investigated the risk of in-hospital rupture in type A AD patients with a Random Forest model. They used 16 features, including some extracted from CT images. However, there have been no studies specifically developing a screening model from routine examination data. In our study, patients’ routine blood tests, biochemical tests and routine blood coagulation tests were chosen as the candidate features; all of these are basic inspections that can be performed in any hospital, including most rural hospitals with weak facilities. The cost of these inspections is relatively low, and the inspection time is relatively short. In addition, based on doctors’ experience, data on patients’ living habits and family history of genetic diseases were also selected. We used these features to build a machine learning model to predict patients’ medical condition. It can achieve higher sensitivity and will help detect AD in a basic, cheap, and fast way.
In our study, an ensemble learning model, XGBF, was proposed to obtain better prediction results. In contrast to AdaBoost, XGBoost, SmoteBagging and EasyEnsemble, XGBF combines undersampling, oversampling and an ensemble method, and it obtained the best results. The average sensitivity of the XGBF algorithm was 80.5%, and the specificity was 79.5%. The results show that the misdiagnosis rate of the XGBF algorithm is lower than that of the other four algorithms. The screening results of XGBF were also better than the best results obtained with SmoteBagging in the literature (19), and better than the clinical misdiagnosis rate (29,30). In particular, the improved algorithm XGBF brought the missed diagnosis rate below 20%, which is lower than the missed diagnosis rates of 21.9% (19), 35.5% (29) and 39.69% (30).
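The comparison can be read off directly if the missed diagnosis rate is taken to be the false negative rate, that is, the complement of sensitivity. This is an interpretation rather than a stated definition, but it is consistent with the 21.9% quoted for reference (19), whose reported sensitivity was 78.1%:

```latex
\text{missed diagnosis rate} = \frac{FN}{TP + FN} = 1 - \text{sensitivity},
\qquad 1 - 0.805 = 0.195 < 0.20,
\qquad 1 - 0.781 = 0.219
```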
This study has some limitations: (I) It is a retrospective study, so there may be some biases. (II) There are some missing values in the data set. We filled and preprocessed the data manually, which may have introduced some biases. (III) The parameters of the predictors affect the prediction results, but most of the parameters in the experiment were adjusted based on experience or experiment. Given the limited number of experiments, the result reported here is the best we have obtained so far, and better results may be possible in the future.
This study has proposed a machine learning model, XGBF, to predict AD from routine medical examination data. This predictor performs better on the imbalanced AD data set than the other ensemble algorithms. Therefore, XGBF has practical application value for AD screening.
Funding: We gratefully acknowledge funding support from the National Science and Technology Major Project Foundation of China (#2013FY110800) and the Strategic Emerging Industry Technology Research and Major Technology Achievement Transformation Project (2019GK4013).
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at http://dx.doi.org/10.21037/atm-20-1475
Data Sharing Statement: Available at http://dx.doi.org/10.21037/atm-20-1475
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at http://dx.doi.org/10.21037/atm-20-1475). Dr. Liu and Dr. Tan report that they have a pending patent, “An imbalanced data classification method based on mixed sampling and machine learning”. The other authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the ethics board of Xiangya Hospital, Central South University (201502042). Because this is a retrospective study and all data were desensitized data from the hospital’s electronic medical records, which did not contain patient identification information, the requirement for consent was waived.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
1. Golledge J, Eagle KA. Acute aortic dissection. Lancet 2008;372:55-66.
2. Bossone E, LaBounty TM, Eagle KA. Acute aortic syndromes: Diagnosis and management, an update. Eur Heart J 2018;39:739-49d.
3. Baguet JP, Chavanon O, Sessa C, et al. European Society of Hypertension scientific newsletter: hypertension and aortic diseases. J Hypertens 2012;30:440-3.
4. Ince H, Nienaber CA. Management of acute aortic syndromes. Rev Esp Cardiol 2007;60:526-41.
5. Pape LA, Awais M, Woznicki EM, et al. Presentation, diagnosis, and outcomes of acute aortic dissection: 17-year trends from the International Registry of Acute Aortic Dissection. J Am Coll Cardiol 2015;66:350-8.
6. Kurz SD, Falk V, Kempfert J, et al. Insight into the incidence of acute aortic dissection in the German region of Berlin and Brandenburg. Int J Cardiol 2017;241:326-9.
7. Hagan PG, Nienaber CA, Isselbacher EM, et al. The International Registry of Acute Aortic Dissection (IRAD): new insights into an old disease. JAMA 2000;283:897-903.
8. De León Ayala IA, Chen YF. Acute aortic dissection: an update. Kaohsiung J Med Sci 2012;28:299-305.
9. Kurabayashi M, Miwa N, Ueshima D, et al. Factors leading to failure to diagnose acute aortic dissection in the emergency room. J Cardiol 2011;58:287-93.
10. Hansen MS, Nogareda GJ, Hutchison SJ. Frequency of and inappropriate treatment of misdiagnosis of acute aortic dissection. Am J Cardiol 2007;99:852-6.
11. Asouhidou I, Asteri T. Acute aortic dissection: be aware of misdiagnosis. BMC Res Notes 2009;2:25.
12. Wen W, Zhang XC. A Study on Misdiagnosis Literature of Single Disease of Chinese Misdiagnosed Disease Database: Aortic Dissection. Clinical Misdiagnosis and Mistherapy 2015;28:1-4.
13. Chenkin J. Diagnosis of Aortic Dissection Presenting as ST-Elevation Myocardial Infarction using Point-Of-Care Ultrasound. J Emerg Med 2017;53:880-4.
14. Li Y, Yang N, Duan W, et al. Acute aortic dissection in China. Am J Cardiol 2012;110:1056-61.
15. Dwivedi AK. Performance evaluation of different machine learning techniques for prediction of heart disease. Neural Comput Appl 2018;29:685-93.
16. Gatuha G, Jiang T. Evaluating Diagnostic Performance of Machine Learning Algorithms on Breast Cancer. International Conference on Intelligent Science and Big Data Engineering, 2015:258-66.
17. Liu L, Jiao Y, Li X, et al. Machine learning algorithms to predict early pregnancy loss after in vitro fertilization-embryo transfer with fetal heart rate as a strong predictor. Comput Methods Programs Biomed 2020;196:105624.
18. Huo D, Kou B, Zhou Z, et al. A machine learning model to classify aortic dissection patients in the early diagnosis phase. Sci Rep 2019;9:2701.
19. Liu L, Zhang C, Zhang G, et al. A study of aortic dissection screening method based on multiple machine learning models. J Thorac Dis 2020;12:605-14.
20. Wu J, Qiu J, Xie E, et al. Predicting in-hospital rupture of type A aortic dissection using Random Forest. J Thorac Dis 2019;11:4634-46.
21. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016:785-94.
22. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res 2002;16:321-57.
23. Gao Y, Wang Y, Wang P, et al. Medical Named Entity Extraction from Chinese Resident Admit Notes Using Character and Word Attention-Enhanced Neural Network. Int J Environ Res Public Health 2020;17:1614.
24. Marill KA. Serum D-dimer is a sensitive test for the detection of acute aortic dissection: a pooled meta-analysis. J Emerg Med 2008;34:367-76.
25. Chen Z, Huang B, Lu H, et al. The effect of admission serum potassium levels on in-hospital and long-term mortality in type A acute aortic dissection. Clin Biochem 2017;50:843-50.
26. Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 1997;55:119-39.
27. Ali A, Shamsuddin SM, Ralescu A. Classification with class imbalance problem: a review. Int J Adv Soft Comput 2015;7:176-204.
28. Liu XY, Wu J, Zhou ZH. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern B Cybern 2009;39:539-50.
29. Chen XF, Li XM, Chen XB, et al. Analysis of Emergency Misdiagnosis of 22 Cases of Aortic Dissection. Clinical Misdiagnosis and Mistherapy 2016;29:30-1.
30. Teng Y, Gao Y, Feng S, et al. Analysis of Emergency Misdiagnosis of 131 Cases of Aortic Dissection. Chinese Journal of Misdiagnostics 2012;8:1873.