Development and validation of a nonalcoholic fatty liver disease-based self-diagnosis tool for diabetes
Original Article

Hye Jun Kim1,2,3#^, Yohwan Lim1,2,3#, Sung Soo Yoon1,2,3, Sang Jun Lee1,2,3, Myeong Hoon Lee1,2,3, Hyewon Park1,2,3, Sun Jae Park4, Seogsong Jeong1,2,3, Hyun Wook Han1,2,3,5

1Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam, Republic of Korea; 2Institute of Basic Medical Sciences, School of Medicine, CHA University, Seongnam, Republic of Korea; 3Institute for Biomedical Informatics, School of Medicine, CHA University, Seongnam, Republic of Korea; 4Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea; 5Healthcare Big-Data Center, Bundang CHA Hospital, Seongnam, Republic of Korea

Contributions: (I) Conception and design: HJ Kim, Y Lim, S Jeong, HW Han; (II) Administrative support: All authors; (III) Provision of study materials or patients: All authors; (IV) Collection and assembly of data: HJ Kim, Y Lim, S Jeong, HW Han; (V) Data analysis and interpretation: HJ Kim, Y Lim, S Jeong, HW Han; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

^ORCID: 0000-0002-5062-1222.

Correspondence to: Hyun Wook Han; Seogsong Jeong. Department of Biomedical Informatics, CHA University School of Medicine, CHA University, 335 Pangyo-ro, Seongnam 13448, Republic of Korea.

Background: Prediction of type 2 diabetes mellitus (DM) has been studied widely. However, previous prediction models required a hospital visit to evaluate DM. This study was conducted to develop and validate a hospital visit-free self-diagnosis tool for DM.

Methods: Participants who underwent health screening between 2017–2018 (n=7,519; training cohort) and 2019–2020 (n=7,564; validation cohort) were extracted from the Korea National Health and Nutrition Examination Survey (KNHANES). DM was defined as doctor-diagnosed DM in a questionnaire. Logistic regression was used to determine independent predictors for DM, and a multivariable logistic regression-based nomogram was developed for the prediction of DM, which was validated in a cohort consisting of an independent population. The presence of nonalcoholic fatty liver disease (NAFLD) was operationally defined using the KNHANES-NAFLD score.

Results: Age, sex, waist circumference, systolic blood pressure, total cholesterol, triglyceride, aspartate aminotransferase, blood urea nitrogen, urinary protein, urinary glucose, and NAFLD were identified as independent predictors for DM. After excluding variables that require laboratory tests, a simplified multivariable model was constructed from hospital visit-free variables, including age, sex, waist circumference, systolic blood pressure, and NAFLD. The full and simplified prediction models for DM were presented as nomograms. In the independent validation cohort, the full and simplified DM prediction models were validated with area under the curve values of 0.903 and 0.824 from the receiver operating characteristic curves, respectively.

Conclusions: Involvement of NAFLD has allowed satisfactory prediction of DM without laboratory tests that require a hospital visit. The developed model may be promising for early diagnosis of DM among individuals without hospital visits and may reduce the socioeconomic burden of DM in the real world, which awaits confirmation in future prospective trials.

Keywords: Nonalcoholic fatty liver disease (NAFLD); type 2 diabetes mellitus; diagnostic test; self-diagnosis; nomogram

Submitted Apr 26, 2022. Accepted for publication Sep 22, 2022.

doi: 10.21037/atm-22-2195


Type 2 diabetes mellitus (DM) is one of the most prevalent human diseases globally and can lead to a number of health issues in terms of morbidity, death, and economic loss (1). DM is recognized as a serious, global public health concern and will continue to pose a major challenge for healthcare systems since its prevalence is expected to nearly double by 2030 (2). DM has been associated with unhealthy lifestyles, such as sedentary behavior and physical inactivity, and with other concomitant chronic conditions, such as obesity and liver diseases (3,4).

In general, three standard tests are used in asymptomatic patients to diagnose impaired glucose metabolism or DM: the hemoglobin A1c (HbA1c) test, the fasting plasma glucose (FPG) test, and the 2-hour 75 g oral glucose tolerance test (OGTT) (5). While the sensitivities and specificities of these tests vary, the OGTT is considered the gold standard for the diagnosis of DM. The OGTT detects DM more effectively than FPG because it recognizes altered postprandial metabolism (6). Nonetheless, all of these diagnostic tests require a hospital visit.

Nonalcoholic fatty liver disease (NAFLD) has gained greater prominence as one of the major factors contributing to the pathogenesis of DM (7,8). NAFLD is a condition characterized by excess fat in the liver caused by nonalcoholic reasons (9) and is regarded as the hepatic component of metabolic syndrome due to its similarities to metabolic disorders such as obesity, inflammation, insulin resistance, and DM (10,11).

Generally, NAFLD is diagnosed using clinical history, lab data, radiographic data, and histologic information (12). Liver biopsy is regarded as trustworthy and, in certain cases, essential for diagnosing NAFLD by grading various histological features, such as steatosis, hepatocellular ballooning, lobular inflammation, and fibrosis. However, due to the invasiveness and high cost of liver biopsy, it is not commonly performed for all individuals at risk of NAFLD (13). Therefore, experts have recommended its selective use in NAFLD patients with a higher likelihood of progression to nonalcoholic steatohepatitis (NASH). Hence, there has been a growing interest in non-invasive diagnostic strategies for NAFLD, such as transient elastography, abdominal ultrasound, computed tomography, or magnetic resonance imaging. Despite their noninvasive nature, these methods have lower sensitivity than a liver biopsy and are associated with radiation hazards or contrast-related risks, which limit their availability (12). As a result, further simplified methods without imaging approaches are becoming increasingly necessary. Following the initial development of the NAFLD liver fat score by Kotronen et al. [2009], there have been ongoing efforts to develop a simplified NAFLD score that is applicable to the general population and health examination datasets (13,14). Jeong et al. [2020] have suggested the K-NAFLD (Korea National Health and Nutrition Examination Survey-derived nonalcoholic fatty liver disease) score, which detects the presence of NAFLD based on sex, waist circumference (WC), systolic blood pressure (SBP), fasting serum glucose (FSG), triglyceride (TG), and alanine aminotransferase (ALT) (13). This score is unique in that it excludes variables such as fasting serum insulin, which hamper availability and cost-effectiveness, thereby enhancing its diagnostic applicability. The K-NAFLD was developed with an area under the curve (AUC) value of 0.929 in the Korea National Health and Nutrition Examination Survey (KNHANES) database, which is a Korea-representative cohort, and the positive and negative predictive values were 0.990 and 0.860, respectively (13).

In particular, many studies have demonstrated that NAFLD is associated with a roughly doubled incidence of DM, regardless of obesity or other common risk factors (15). For example, a meta-analysis of 20 observational studies including over 115,000 nondiabetic people found that NAFLD was associated with a 1.6- to 2.0-fold elevated risk of DM over a median follow-up of 5 years (16). Another large-scale prospective study, which included more than 130,000 Asian nondiabetic people, showed that NAFLD was significantly associated with a 2.0-fold increase in incident DM risk independently of confounding factors (17).

Diabetes is a debilitating chronic epidemic, so early detection, diagnosis, and treatment are important. Identifying those at high risk for diabetes is key to preventive strategies (18). Several studies have shown that early detection and treatment of diabetes can delay the progression of the disease and prevent complications (18-20). Therefore, it is essential to develop a feasible and accurate screening tool to identify those at high risk of diabetes onset, which will aid in diabetes prevention programs. Risk prediction models can contribute to the clinical management of a patient. Models can screen individuals to identify those at an increased risk of having an undiagnosed condition, allowing for earlier diagnosis, management, and treatment, and thereby improving patient outcomes (19). Yet, most current prediction models were developed for Western populations (18), and there are few diabetes risk nomograms for the Korean population. Also, the presence of NAFLD has largely been neglected in earlier prediction models. Moreover, the existing diabetes risk prediction models incorporate a large number of laboratory variables, which makes them inconvenient to apply (18).

Therefore, in this cross-sectional study, we explored the predictive impact of NAFLD on DM, tested a NAFLD-based prediction model for DM, and simplified the NAFLD-based prediction model by excluding laboratory variables that require a hospital visit. Considering that most Korean adults are aware of the presence of NAFLD through the biennial health screening of the Korean National Health Insurance Service, NAFLD-based prediction of DM may be a novel approach for the development of a self-diagnosis tool for DM. In addition, we evaluated the difference in predictive performance for the diagnosis of DM between the full and simplified models. We present the following article in accordance with the TRIPOD reporting checklist (21).


Study population

The study population was derived from the KNHANES dataset, which is a nationally representative cross-sectional surveillance system that has been evaluating the health and nutritional status of noninstitutionalized Korean civilians residing in Korea since 1998 (22). The training cohort consisted of participants aged 20 or older, drawn from those who underwent health examinations between 2017 and 2018 (n=16,119). We excluded participants who had missing information for medical doctor-diagnosed DM (n=791), were younger than 20 years (n=3,025), had missing information for potential covariates (n=807), or consumed alcohol at least 2 times per week (n=3,977). Finally, a total of 7,519 participants were included in the training cohort (Figure 1). The validation cohort included 7,564 participants from KNHANES who underwent examination between 2019 and 2020, selected using the same inclusion criteria. All participants provided informed consent to the KNHANES before participation. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Institutional Review Board of Bundang CHA Hospital (IRB No. CHAMC 2022-04-041).

Figure 1 Flow diagram for the inclusion of the study population. Training (A) and validation (B) cohorts consisted of participants who underwent health examinations in 2017–2018 and 2019–2020, respectively. DM, diabetes mellitus.

Outcome variable

DM was defined as a previous physician diagnosis of diabetes based on self-reports in the KNHANES cohort. KNHANES is a national surveillance system in Korea that has been assessing the health and nutritional status of the noninstitutionalized Korean population since 1998. It consists of three surveys: a health interview survey, a health examination survey, and a nutrition survey. The survey is conducted annually by the Korea Centers for Disease Control and Prevention (22). DM was defined as answering “Yes” to the survey question “Have you ever been diagnosed with diabetes by a physician?”. In South Korea, the following criteria are used to diagnose DM: an HbA1c level ≥6.5%; an FPG ≥126 mg/dL after fasting for at least 8 hours; a 2-hour plasma glucose ≥200 mg/dL during a 75 g OGTT; or classic symptoms of hyperglycemia (polyuria, polydipsia, unexplained weight loss) with a random plasma glucose ≥200 mg/dL (5). In addition, the doctor-diagnosed DM in the KNHANES database has been used in a number of studies (23,24).
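The diagnostic criteria cited above can be read as a simple "any one of four" rule. The following is a minimal illustrative sketch of that rule only; the study itself defined DM from the questionnaire response, not from these measurements. The function name and parameter names are hypothetical, and the rule omits clinical details such as repeat testing.

```python
from typing import Optional


def meets_dm_criteria(
    hba1c_pct: Optional[float] = None,
    fpg_mg_dl: Optional[float] = None,
    ogtt_2h_mg_dl: Optional[float] = None,
    random_glucose_mg_dl: Optional[float] = None,
    classic_symptoms: bool = False,
) -> bool:
    """Return True if any one of the four listed criteria is satisfied.

    Missing measurements (None) are treated as "criterion not met".
    """
    checks = [
        # HbA1c of 6.5% or higher
        hba1c_pct is not None and hba1c_pct >= 6.5,
        # Fasting plasma glucose of 126 mg/dL or higher
        fpg_mg_dl is not None and fpg_mg_dl >= 126,
        # 2-hour plasma glucose of 200 mg/dL or higher during a 75 g OGTT
        ogtt_2h_mg_dl is not None and ogtt_2h_mg_dl >= 200,
        # Classic symptoms together with random plasma glucose of 200 mg/dL or higher
        classic_symptoms
        and random_glucose_mg_dl is not None
        and random_glucose_mg_dl >= 200,
    ]
    return any(checks)
```

Note that the random plasma glucose criterion applies only together with classic symptoms, so a high random value alone does not satisfy the rule in this sketch.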

Key variables

The presence of NAFLD was operationally defined as K-NAFLD >0.884, as defined previously (13). The K-NAFLD score is an estimation tool for the diagnosis of NAFLD, which is calculated as follows: K-NAFLD = 0.913 × sex (2 if female; 1 if male) + 0.089 × WC + 0.032 × (SBP + FSG) + 0.007 × TG + 0.105 × ALT − 20.929. The following variables were evaluated as covariates for the development of the prediction model: age (continuous; years), sex (categorical; male and female), body mass index (BMI) (continuous; kg/m2), WC (continuous; cm), SBP (continuous; mmHg), diastolic blood pressure (DBP) (continuous; mmHg), total cholesterol (continuous; mg/dL), TG (continuous; mg/dL), aspartate aminotransferase (AST) (continuous; IU/L), ALT (continuous; IU/L), blood urea nitrogen (BUN) (continuous; mg/dL), creatinine (continuous; mg/dL), urinary protein (categorical; −, ±, +, ++, +++, ++++), urinary glucose (categorical; −, ±, +, ++, +++, ++++), urinary pH (continuous), and NAFLD (categorical; yes and no).
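As an illustration of the K-NAFLD calculation described above, the score and the >0.884 cut-off can be sketched as follows. This is a minimal sketch, assuming that the 0.032 coefficient multiplies the sum of SBP and FSG and that the units are those listed for the covariates (cm, mmHg, mg/dL, IU/L). The function names are hypothetical, and the input values in the usage note are invented for illustration only.

```python
K_NAFLD_CUTOFF = 0.884  # NAFLD is operationally present when the score exceeds this value


def k_nafld_score(sex: str, wc_cm: float, sbp_mmhg: float,
                  fsg_mg_dl: float, tg_mg_dl: float, alt_iu_l: float) -> float:
    """Compute the K-NAFLD score from the coefficients quoted in the text."""
    sex_code = {"male": 1, "female": 2}[sex]  # coding given in the text
    return (
        0.913 * sex_code
        + 0.089 * wc_cm
        + 0.032 * (sbp_mmhg + fsg_mg_dl)  # assumed: coefficient applies to the sum
        + 0.007 * tg_mg_dl
        + 0.105 * alt_iu_l
        - 20.929
    )


def has_nafld(score: float) -> bool:
    """Apply the operational definition: K-NAFLD > 0.884."""
    return score > K_NAFLD_CUTOFF
```

For example, `has_nafld(k_nafld_score("male", 95, 130, 110, 200, 40))` evaluates the rule for a hypothetical man with a waist circumference of 95 cm, SBP of 130 mmHg, FSG of 110 mg/dL, TG of 200 mg/dL, and ALT of 40 IU/L.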

Statistical analysis

Data are presented as mean [standard deviation (SD)] for continuous variables and number (%) for categorical variables. The logistic regression model was used for univariable and multivariable analyses, and the results were provided as odds ratios (ORs) with 95% confidence intervals (CIs). Variables that were significant in the univariable analysis were considered candidate predictors for DM and were entered into the multivariable analysis. Significant variables in the multivariable analysis were considered independent predictors for DM, which formed the prediction model for DM. As for overlapping or correlated variables, such as SBP and DBP, only the variable with the higher C-index was retained if both were statistically significant. The full prediction model consisted of all independent predictors for DM, whereas a simplified model was developed using hospital visit-free variables.

To improve usability, nomograms for both the full model and the simplified model were presented. The total point derived from the nomogram was defined as the nomogram score. The performance of the nomograms was independently validated using receiver operating characteristic (ROC) curves with AUC values and calibration curves. Furthermore, we developed the prediction model based on the NAFLD Ridge Score (NRS)-defined NAFLD as a sensitivity analysis. The NRS has been found to be an adequate diagnostic strategy, with an AUC value of 0.87 for NAFLD. The cut-off values for NAFLD and no NAFLD were NRS >0.44 and <0.24, respectively, as described in the original study (25). Statistical significance was defined as a two-sided P value of <0.05. SAS version 9.4 (SAS Institute Inc.) and R Project for Statistical Computing were used for all statistical analyses.
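To make the AUC metric concrete, the sketch below computes it directly from its rank-based definition: the probability that a randomly chosen participant with DM receives a higher predicted score than a randomly chosen participant without DM, with ties counted as one half. This is an illustrative implementation only and not the authors' SAS or R code; the function name is hypothetical, and the pairwise loop is quadratic in the number of participants, so it is suited to small examples rather than full cohorts.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predicted risk (or nomogram score) per participant
    labels: 1 for DM, 0 for no DM
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("Both outcome classes are required to compute the AUC.")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of participants with and without DM.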


Participant characteristics

Detailed descriptive characteristics of the participants included in the training cohort are shown in Table 1. Briefly, there was a total of 7,519 participants with a mean age of 50.4 years and a female distribution of 59.4% (n=4,464). In addition, 12.4% (n=929) of the participants had NAFLD.

Table 1

Baseline characteristics of the participants in the training cohort

Characteristics                              Participants (n=7,519)
Age, years                                   50.4 (16.3)
Sex, female, n (%)                           4,464 (59.4)
Body mass index, kg/m2                       23.9 (3.6)
Waist circumference, cm                      81.5 (10.3)
Systolic blood pressure, mmHg                117.2 (16.6)
Diastolic blood pressure, mmHg               74.9 (9.9)
Total cholesterol, mg/dL                     192.8 (37.8)
Triglyceride, mg/dL                          125.9 (91.6)
Aspartate aminotransferase, IU/L             22.5 (9.9)
Alanine aminotransferase, IU/L               22.1 (17.2)
Blood urea nitrogen, mg/dL                   14.7 (4.6)
Creatinine, mg/dL                            0.8 (0.2)
Urinary protein, n (%)
   Negative                                  6,390 (85.0)
   Positive                                  1,129 (15.0)
Urinary glucose, n (%)
   Negative                                  7,213 (96.2)
   Positive                                  288 (3.8)
Urine pH                                     5.9 (0.8)
Nonalcoholic fatty liver disease, n (%)
   Yes                                       929 (12.4)
   No                                        6,590 (87.6)

Data are mean (standard deviation) unless indicated otherwise.

For the validation cohort, there were 7,564 participants with a mean age of 50.6 years and a female distribution of 58.2% (n=4,405). NAFLD was present in 14.4% (n=1,086) participants.

Identification of independent prognostic factors

In the univariable analyses, age, sex, BMI, WC, SBP, DBP, FSG, total cholesterol, TG, AST, ALT, BUN, creatinine, urinary protein, urinary glucose, urinary pH, and NAFLD were identified as significant predictors for DM (Table S1). In the multivariable analysis, age, sex, WC, SBP, total cholesterol, TG, AST, BUN, urinary protein, urinary glucose, and NAFLD were identified as independent predictors for DM (Table S2).

Derivation of the full and simplified prediction models for DM

Results of the multivariable analysis of all independent predictors for DM are shown in Table S3. Age (OR, 1.072; 95% CI: 1.062 to 1.081; P<0.001), female (OR, 1.350; 95% CI: 1.089 to 1.674; P=0.006), WC (OR, 1.026; 95% CI: 1.014 to 1.039; P<0.001), SBP (OR, 0.992; 95% CI: 0.986 to 0.998; P=0.013), total cholesterol (OR, 0.976; 95% CI: 0.973 to 0.979; P<0.001), TG (OR, 1.002; 95% CI: 1.001 to 1.003; P=0.002), AST (OR, 0.986; 95% CI: 0.975 to 0.997; P=0.017), BUN (OR, 1.028; 95% CI: 1.007 to 1.048; P=0.007), urinary protein (OR, 1.238; 95% CI: 1.051 to 1.458; P=0.011), urinary glucose (OR, 2.478; 95% CI: 2.215 to 2.772; P<0.001), and NAFLD (OR, 3.063; 95% CI: 2.210 to 4.244; P<0.001) constituted the full prediction model for DM.

Among the independent predictors for DM, age, sex, WC, SBP, and NAFLD were considered self-evaluable, and these composed the simplified prediction model for self-diagnosis of DM (Table 2).

Table 2

Multivariable analysis of hospital visit-free independent predictors for diabetes

Covariate                         Estimate   OR (95% CI)            P value
Intercept                         −8.222     NA                     <0.001
Age, years                        0.077      1.080 (1.072–1.088)    <0.001
Sex, female (vs. male)            −0.140     0.869 (0.723–1.045)    0.136
Waist circumference, cm           0.027      1.027 (1.016–1.038)    <0.001
Systolic blood pressure, mmHg     −0.009     0.991 (0.985–0.996)    0.001
NAFLD, yes (vs. no)               1.430      4.178 (3.318–5.260)    <0.001

OR calculated using logistic regression. All variables are continuous, unless indicated otherwise. OR, odds ratio; CI, confidence interval; NAFLD, nonalcoholic fatty liver disease.
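To show how the estimates in Table 2 translate into a predicted probability, the sketch below applies the standard logistic transformation to the tabulated coefficients. It is illustrative only and is not a validated clinical tool. It assumes that sex is coded 1 for female and 0 for male and NAFLD is coded 1 for yes and 0 for no (following the "vs." reference categories in the table), that continuous variables enter the model untransformed, and that the three-decimal rounded estimates are sufficient. The function and variable names are hypothetical.

```python
import math

# Estimates copied from Table 2 (rounded to three decimals in the source)
COEF = {
    "intercept": -8.222,
    "age": 0.077,       # per year
    "female": -0.140,   # assumed coding: 1 = female, 0 = male
    "wc": 0.027,        # per cm of waist circumference
    "sbp": -0.009,      # per mmHg of systolic blood pressure
    "nafld": 1.430,     # assumed coding: 1 = NAFLD present, 0 = absent
}


def odds_ratio(beta: float) -> float:
    """An odds ratio is the exponential of a logistic regression coefficient."""
    return math.exp(beta)


def simplified_dm_probability(age_years: float, is_female: bool, wc_cm: float,
                              sbp_mmhg: float, has_nafld: bool) -> float:
    """Predicted probability of DM from the simplified (hospital visit-free) model."""
    logit = (
        COEF["intercept"]
        + COEF["age"] * age_years
        + COEF["female"] * (1 if is_female else 0)
        + COEF["wc"] * wc_cm
        + COEF["sbp"] * sbp_mmhg
        + COEF["nafld"] * (1 if has_nafld else 0)
    )
    return 1.0 / (1.0 + math.exp(-logit))  # inverse logit
```

Exponentiating a rounded estimate reproduces the tabulated OR only approximately; for instance, `odds_ratio(COEF["nafld"])` is close to, but not exactly, the 4.178 reported in Table 2. The nomogram in Figure 2B is a graphical form of the same linear predictor.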

A calibration plot of the predicted and actual probabilities for DM and the ROC curve (AUC, 0.902) indicated that the full prediction model performed excellently in the training cohort (Figure S1). A nomogram involving variables from laboratory tests is shown in Figure 2A. A calibration plot and ROC curve of the simplified model are shown in Figure S2. The AUC was 0.830, which is 0.072 lower than that of the full prediction model. The simplified nomogram is provided in Figure 2B.

Figure 2 Nomograms to predict DM. Nomograms were generated to provide an easy way to apply the prediction models. Drawing lines from the top for each variable and summation of the values directly allows estimation of the risk probability for DM. (A) A full prediction model for DM involving laboratory variables that require hospital visits. (B) A hospital visit-free simplified prediction model for DM. DM, diabetes mellitus; NAFLD, nonalcoholic fatty liver disease.

Validation of the full and simplified prediction model for DM

The calibration plot (Figure 3A) and ROC curve (AUC, 0.903; Figure 3B) of the full prediction model in the validation cohort showed excellent performance of the nomogram. In addition, the simplified model was found to be well-calibrated, with a slope close to 1 in the calibration plot (Figure 3C), and to predict DM satisfactorily in the ROC analysis (AUC, 0.824; Figure 3D).

Figure 3 Performance of the full and simplified prediction models for DM in the validation cohort. (A) Calibration plot comparing predicted and actual probability of DM in the full prediction model. (B) Receiver operating characteristic curve for DM of the full prediction model. (C) Calibration plot comparing predicted and actual probability of DM in the simplified prediction model. (D) Receiver operating characteristic curve for DM of the simplified prediction model. DM, diabetes mellitus; AUC, area under the curve.

Sensitivity analysis of the operational definition for NAFLD using the NRS

The full prediction model based on the NRS-defined NAFLD (Table S4) performed excellently in the training cohort, with an AUC value of 0.896. Also, the AUC value of the ROC curve for the simplified model (Table S5) was 0.811, indicating satisfactory performance. In the validation cohort, the calibration plot and ROC curve (AUC, 0.897; Figure S3) of the full prediction model showed excellent performance. In addition, the simplified model was found to be well-calibrated, with a slope close to 1 in the calibration plot, and to predict DM satisfactorily in the ROC analysis (AUC, 0.810; Figure S4).

Consequently, we were able to confirm that the performance of both the full and simplified models was comparable to that of models developed using the K-NAFLD score. Further detailed information regarding the sensitivity analysis based on the NRS index is provided in the Supplementary section.


This study has developed and validated a well-performing DM prediction model based on the following variables: age, sex, WC, SBP, total cholesterol, TG, AST, BUN, urinary protein, urinary glucose, and NAFLD. This study also highlights a unique and accurate simplified model that supports self-diagnosis of DM, which was developed based on the hospital visit-free data (i.e., age, sex, WC, SBP) and NAFLD. Given the growing population and high incidence of DM, as well as the clinical and socioeconomic burden that it entails, the availability of self-diagnosis would be useful for early detection and management of DM. Both the full and simplified models were validated to be well-calibrated and accurate. We further provided a nomogram, an easily adopted tool that allows direct estimation of DM risk without the need for complex calculation. The derived nomogram may be promising in promoting early diagnosis and reducing the socioeconomic burden of DM.

Although numerous DM risk prediction models have been established, the vast majority are developed in non-Asian populations and rely heavily on laboratory variables (18). Besides, the presence of NAFLD was not accounted for. However, other risk predictors included in our models, such as age, sex, BMI (or WC), SBP, total cholesterol, TG, and AST, have been widely recognized as attributes related to DM (19,20) and were thus commonly included in previous DM prediction models (1,19). Furthermore, after validation, our model demonstrated comparable performance to earlier models (1), with an AUC value greater than 0.80.

DM has long been recognized as a complicated and multifactorial metabolic disease characterized by hyperglycemia caused by defects in insulin production, insulin action, or both (26). Abnormalities in carbohydrate, fat, and protein metabolism in DM are due to deficient action of insulin on target tissues (26), resulting in elevated levels of glucose and lipids within the blood (5). Chronic exposure to a high level of glucose and lipids triggers a number of pathways that result in impaired insulin secretion from the pancreatic β-cells, insulin resistance and decreased glucose utilization in peripheral tissues, and aberrant hepatic glucose production (27). Growing lines of evidence suggest that reactive oxygen species and oxidative stress are among the primary causal factors responsible for these pathogenic processes (28). The predictive effects of the predictors for DM identified in this study have been evaluated in previous studies (1,29). Older age, for example, is a risk factor for the onset of DM since aging causes a decline of glucose sensitivity and impaired insulin secretion in pancreatic β-cells (18). In addition, obesity, as measured by higher WC or BMI, has been found to increase the liver and pancreatic fat content, damaging pancreatic β-cells and potentially leading to insulin resistance via metabolic derangements (18,30).

As high serum glucose levels, insulin resistance, and damaged islet cell function characterize DM, patients with NAFLD may be more vulnerable to the risk of incident DM (4,9,15). This is owing to the disruption of key physiological functions of the liver, including glucose and lipid metabolism, in the setting of NAFLD, which may be accompanied by systemic inflammation triggered in part by liver-secreted cytokines and molecules (4,31). According to the lipotoxic hypothesis, the influx of free fatty acids from excessive adipose tissue to peripheral tissues would induce insulin resistance (10,32,33). Excessive intrahepatic fat content may impair muscle insulin sensitivity, leading to peripheral insulin resistance (31,32). Also, the accumulation of lipid intermediates in the liver, such as diacylglycerol and ceramides, causes hepatic insulin resistance, increased hepatic gluconeogenesis, and exhaustion of pancreatic β-cells (34). Hence, NAFLD exacerbates hepatic/peripheral insulin resistance, predisposes to atherogenic dyslipidemia, and releases inflammatory mediators, such as interleukin-6, tumor necrosis factor-α, and fetuin-A, all of which can be provoked by increased oxidative stress (34). In addition, a variety of procoagulant, thrombogenic, and profibrogenic factors may together engage in the onset of DM (31). Therefore, we attempted to apply the strong association between NAFLD and DM in the development of a NAFLD-based hospital visit-free self-diagnostic prediction model for DM, since most individuals are aware of their own status regarding the presence of NAFLD.

To the best of our knowledge, this is the first study to develop and validate a NAFLD-based hospital visit-free self-diagnosis tool for DM. Unlike the majority of current predictive models, which necessitate a hospital visit for the evaluation of DM, we could confirm that non-laboratory variables could also predict the risk of DM well. The derived simplified model for DM may allow early diagnosis of DM, which is vital in reducing DM complications and economic burden. Especially in the pandemic era, in which clinical services have experienced a significant decline worldwide, digitalized health care such as telehealth has been advocated as one of the major solutions to overcome the lack of elective care (35). Thus, the use of a self-diagnosis tool to predict the risk of DM would relieve the burden of hospital visits and could promote better management of the disease, regardless of the socioeconomic status of individuals or public health issues.

However, there are some limitations that need to be considered. First, because it is a cross-sectional study, we could not identify causal relationships. Second, NAFLD was operationally defined using the K-NAFLD score. Therefore, we have adopted the NRS as a sensitivity analysis for the operational definition of NAFLD. However, future studies with imaging or liver biopsy-based diagnosis of NAFLD would still be helpful to support our results. In addition, since the presence of NAFLD was operationally defined using the K-NAFLD score after excluding those with alcohol consumption, some participants with NAFLD may have had fatty liver. Further studies with information regarding medication use and history of diseases are required to confirm whether NAFLD or fatty liver is the better predictor of DM. Third, the derived prediction models lack external validation. Therefore, we have stratified the cohort into training and validation cohorts with independent participants to evaluate the performance of the prediction models. Fourth, individuals without a history of hospital visits may not be able to use the simplified prediction model for DM since it is based on the presence of NAFLD, which requires prior health screening that involved imaging modalities, such as ultrasound and computed tomography. Fifth, prognostic variables such as WC may vary depending on the location or method used to measure the waist or on whether participants had eaten a meal. Therefore, variables such as BMI could be substituted in order to reduce bias and improve the diagnostic tool’s accuracy.

This study found that age, sex, WC, SBP, total cholesterol, TG, AST, BUN, urinary protein, urinary glucose, and NAFLD were all good predictors of DM risk. The simplified model, which only included variables that did not necessitate hospital visits, had a marginally lower AUC value after validation, but its performance was still satisfactory. Consideration of the presence of NAFLD may have allowed for reliable prediction of DM in the simplified model. We believe that our model could alleviate the clinical and economic burden of DM by promoting its prevention and effective management. Future external validation is warranted for this self-diagnosis tool to be widely used in real-world settings.


Funding: This work was supported by the Bio Industry Technology Development Program (No. 20015086); funded by the Ministry of Trade, Industry, & Energy (MOTIE, Korea); supported by Basic Science Research Program through the National Research Foundation of Korea (NRF); funded by the Ministry of Education (No. 2020R1F1A1068423); and funded by the Promotion of Innovative Businesses for Regulation-Free Special Zones funded by the Ministry of SMEs and Startups (MSS, Korea; No. P0011221).


Reporting Checklist: The authors have completed the TRIPOD reporting checklist.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form. The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All participants provided informed consent to the KNHANES before participation. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Institutional Review Board of Bundang CHA Hospital (IRB No. CHAMC 2022-04-041).

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license).


  1. Joshi RD, Dhakal CK. Predicting Type 2 Diabetes Using Logistic Regression and Machine Learning Approaches. Int J Environ Res Public Health 2021;18:7346.
  2. Williams R, Karuranga S, Malanda B, et al. Global and regional estimates and projections of diabetes-related health expenditure: Results from the International Diabetes Federation Diabetes Atlas, 9th edition. Diabetes Res Clin Pract 2020;162:108072.
  3. Katsimpris A, Brahim A, Rathmann W, et al. Prediction of type 2 diabetes mellitus based on nutrition data. J Nutr Sci 2021;10:e46.
  4. Tanase DM, Gosav EM, Costea CF, et al. The Intricate Relationship between Type 2 Diabetes Mellitus (T2DM), Insulin Resistance (IR), and Nonalcoholic Fatty Liver Disease (NAFLD). J Diabetes Res 2020;2020:3920196.
  5. Ko SH. 2021 Clinical Practice Guidelines for Diabetes Mellitus in Korea. J Korean Diabetes 2021;22:244-9.
  6. Sitasuwan T, Lertwattanarak R. Prediction of type 2 diabetes mellitus using fasting plasma glucose and HbA1c levels among individuals with impaired fasting plasma glucose: a cross-sectional study in Thailand. BMJ Open 2020;10:e041269.
  7. Leite NC, Salles GF, Araujo AL, et al. Prevalence and associated factors of non-alcoholic fatty liver disease in patients with type-2 diabetes mellitus. Liver Int 2009;29:113-9.
  8. Sung KC, Jeong WS, Wild SH, et al. Combined influence of insulin resistance, overweight/obesity, and fatty liver as risk factors for type 2 diabetes. Diabetes Care 2012;35:717-22.
  9. Yki-Järvinen H. Non-alcoholic fatty liver disease as a cause and a consequence of metabolic syndrome. Lancet Diabetes Endocrinol 2014;2:901-10.
  10. Kitade H, Chen G, Ni Y, et al. Nonalcoholic Fatty Liver Disease and Insulin Resistance: New Insights and Potential New Treatments. Nutrients 2017;9:387.
  11. Chalasani N, Younossi Z, Lavine JE, et al. The diagnosis and management of nonalcoholic fatty liver disease: Practice guidance from the American Association for the Study of Liver Diseases. Hepatology 2018;67:328-57.
  12. Ahmed A, Wong RJ, Harrison SA. Nonalcoholic Fatty Liver Disease Review: Diagnosis, Treatment, and Outcomes. Clin Gastroenterol Hepatol 2015;13:2062-70.
  13. Jeong S, Kim K, Chang J, et al. Development of a simple nonalcoholic fatty liver disease scoring system indicative of metabolic risks and insulin resistance. Ann Transl Med 2020;8:1414.
  14. Kotronen A, Peltonen M, Hakkarainen A, et al. Prediction of non-alcoholic fatty liver disease and liver fat using metabolic and genetic factors. Gastroenterology 2009;137:865-72.
  15. Lonardo A, Nascimbeni F, Mantovani A, et al. Hypertension, diabetes, atherosclerosis and NASH: Cause or consequence? J Hepatol 2018;68:335-52.
  16. Ballestri S, Zona S, Targher G, et al. Nonalcoholic fatty liver disease is associated with an almost twofold increased risk of incident type 2 diabetes and metabolic syndrome. Evidence from a systematic review and meta-analysis. J Gastroenterol Hepatol 2016;31:936-44.
  17. Chen SC, Tsai SP, Jhao JY, et al. Liver Fat, Hepatic Enzymes, Alkaline Phosphatase and the Risk of Incident Type 2 Diabetes: A Prospective Study of 132,377 Adults. Sci Rep 2017;7:4649.
  18. Wu Y, Hu H, Cai J, et al. A prediction nomogram for the 3-year risk of incident diabetes among Chinese adults. Sci Rep 2020;10:21716.
  19. Collins GS, Mallett S, Omar O, et al. Developing risk prediction models for type 2 diabetes: a systematic review of methodology and reporting. BMC Med 2011;9:103.
  20. Tuso P. Prediabetes and lifestyle modification: time to prevent a preventable disease. Perm J 2014;18:88-93.
  21. Collins GS, Reitsma JB, Altman DG, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015;162:55-63.
  22. Kweon S, Kim Y, Jang MJ, et al. Data resource profile: the Korea National Health and Nutrition Examination Survey (KNHANES). Int J Epidemiol 2014;43:69-77.
  23. Hong YS, Kim H, Zhao D, et al. Chronic Kidney Disease on Health-Related Quality of Life in Patients with Diabetes Mellitus: A National Representative Study. J Clin Med 2021;10:4639.
  24. Kwan BS, Kim SC, Jo HC, et al. The Association between Menstrual Irregularities and the Risk of Diabetes in Premenopausal and Postmenopausal Women: A Cross-Sectional Study of a Nationally Representative Sample. Healthcare (Basel) 2022;10:649.
  25. Yip TC, Ma AJ, Wong VW, et al. Laboratory parameter-based machine learning model for excluding non-alcoholic fatty liver disease (NAFLD) in the general population. Aliment Pharmacol Ther 2017;46:447-56.
  26. American Diabetes Association. Diagnosis and classification of diabetes mellitus. Diabetes Care 2009;32:S62-7.
  27. Rehman K, Akash MSH. Mechanism of Generation of Oxidative Stress and Pathophysiology of Type 2 Diabetes Mellitus: How Are They Interlinked? J Cell Biochem 2017;118:3577-85.
  28. Rehman K, Akash MS. Mechanisms of inflammatory responses and development of insulin resistance: how are they interlinked? J Biomed Sci 2016;23:87.
  29. Lindström J, Tuomilehto J. The diabetes risk score: a practical tool to predict type 2 diabetes risk. Diabetes Care 2003;26:725-31.
  30. Barazzoni R, Gortan Cappellari G, Ragni M, et al. Insulin resistance in obesity: an overview of fundamental alterations. Eat Weight Disord 2018;23:149-57.
  31. Adams LA, Anstee QM, Tilg H, et al. Non-alcoholic fatty liver disease and its relationship with cardiovascular disease and other extrahepatic diseases. Gut 2017;66:1138-53.
  32. Park SK, Seo MH, Shin HC, et al. Clinical availability of nonalcoholic fatty liver disease as an early predictor of type 2 diabetes mellitus in Korean men: 5-year prospective cohort study. Hepatology 2013;57:1378-83.
  33. Mendez-Sanchez N, Cruz-Ramon VC, Ramirez-Perez OL, et al. New Aspects of Lipotoxicity in Nonalcoholic Steatohepatitis. Int J Mol Sci 2018;19:2034.
  34. Hamed AE, Elsahar M, Elwan NM, et al. Managing diabetes and liver disease association. Arab J Gastroenterol 2018;19:166-79.
  35. Sighinolfi MC, Zoeir A, Eissa A, et al. Review of nomograms to counsel patients after oncologic surgery: a support for telemedicine to stratify the risk of relapse and customize the follow-up scheduling. Minerva Urol Nephrol 2021;73:402-4.
Cite this article as: Kim HJ, Lim Y, Yoon SS, Lee SJ, Lee MH, Park H, Park SJ, Jeong S, Han HW. Development and validation of a nonalcoholic fatty liver disease-based self-diagnosis tool for diabetes. Ann Transl Med 2022;10(21):1158. doi: 10.21037/atm-22-2195