Deep learning based classification of ultrasound images for thyroid nodules: a large scale of pilot study

Qing Guan, Yunjun Wang, Jiajun Du, Yu Qin, Hongtao Lu, Jun Xiang, Fen Wang


Background: To explore the ability of the deep learning network Inception-v3 to differentiate between papillary thyroid carcinomas (PTCs) and benign nodules in ultrasound images.
Methods: A total of 2,836 thyroid ultrasound images from 2,235 patients were divided into a training dataset and a test dataset. The nodule regions were cropped from the images with a surrounding margin, and Inception-v3 was trained and tested to provide a differential diagnosis. The sizes and sonographic features of the nodules were further analysed to identify factors that may influence diagnostic efficiency. Statistical analyses included χ² tests, Fisher’s exact tests, and univariate and multivariate analyses.
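The abstract states only that nodule images were cropped with a margin (50 pixels worked best) and fed to the network at 384×384. It does not describe how the crop was computed. The following minimal Python sketch assumes a hypothetical rectangular bounding-box annotation `(x0, y0, x1, y1)` for each nodule. It expands the box by the margin on every side, clips it to the image, and resizes the patch with a simple nearest-neighbour rule. The function names and the resize method are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def crop_with_margin(image, box, margin=50):
    """Crop a nodule bounding box expanded by `margin` pixels on every side.

    `box` is a hypothetical (x0, y0, x1, y1) annotation in pixel coordinates,
    with x1 and y1 exclusive. The expanded box is clipped to the image borders.
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    x0, y0 = max(0, x0 - margin), max(0, y0 - margin)
    x1, y1 = min(w, x1 + margin), min(h, y1 + margin)
    return image[y0:y1, x0:x1]


def resize_nearest(image, size=384):
    """Nearest-neighbour resize to size x size (a stand-in for a library resize).

    Works for grayscale (H, W) and multi-channel (H, W, C) arrays.
    """
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return image[rows[:, None], cols[None, :]]


# Usage on a synthetic 600x800 grayscale frame with a hypothetical nodule box.
frame = np.zeros((600, 800), dtype=np.uint8)
patch = crop_with_margin(frame, (100, 120, 300, 260), margin=50)
model_input = resize_nearest(patch, size=384)
print(patch.shape, model_input.shape)  # (240, 300) (384, 384)
```

In practice an image library's interpolating resize would likely replace `resize_nearest`. The nearest-neighbour version is used here only to keep the sketch dependent on NumPy alone.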
Results: The training group contained 1,275 PTCs and 1,162 benign nodules, and the test group contained 209 PTCs and 190 benign nodules. A margin size of 50 pixels and an input size of 384×384 gave the best outcome during training, so these parameters were used for the test group. In the test group, the sensitivity and specificity of Inception-v3 were 93.3% (195/209) and 87.4% (166/190), respectively. Inception-v3 was most accurate for nodules 0.5–1.0 cm in size. Accuracy differed according to the margin description (P=0.024). Nodules with a taller shape were diagnosed more accurately than those with a wider shape (P=0.015). Microcalcification [odds ratio (OR) =0.254, 95% confidence interval (CI): 0.076–0.847, P=0.026] and taller shape (OR =0.243, 95% CI: 0.073–0.810, P=0.021) were negatively associated with the misdiagnosis rate.
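The reported sensitivity and specificity follow directly from the test-group counts, and the odds-ratio intervals can be checked for internal consistency. The sketch below recomputes both. Overall accuracy is derived here from the same counts and is not a figure reported in the abstract. The consistency check assumes the 95% CIs are Wald intervals on the log-odds scale, CI = exp(ln OR ± 1.96·SE). This is typical for logistic regression but is not stated in the abstract. Under that assumption the OR should be close to the geometric mean of the interval limits, and the implied two-sided P value should be close to the reported one. The helper names are hypothetical.

```python
import math


def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # correctly flagged PTCs / all PTCs
    specificity = tn / (tn + fp)  # correctly cleared benign nodules / all benign
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def wald_consistency(odds_ratio, lower, upper, z_crit=1.96):
    """Back-compute SE, implied OR and two-sided P from a 95% CI.

    Assumes (not stated in the source) a Wald interval on the log-odds scale:
    CI = exp(ln OR +/- z_crit * SE).
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    se = (log_hi - log_lo) / (2 * z_crit)
    implied_or = math.exp((log_lo + log_hi) / 2)  # geometric mean of the limits
    z = math.log(odds_ratio) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return implied_or, se, p_two_sided


# Test-group counts from the abstract: 195/209 PTCs and 166/190 benign nodules correct.
sens, spec, acc = diagnostic_metrics(tp=195, fn=209 - 195, tn=166, fp=190 - 166)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}, accuracy={acc:.1%}")
# prints: sensitivity=93.3%, specificity=87.4%, accuracy=90.5%

for name, or_, lo, hi in [("microcalcification", 0.254, 0.076, 0.847),
                          ("taller shape", 0.243, 0.073, 0.810)]:
    implied_or, se, p = wald_consistency(or_, lo, hi)
    print(f"{name}: implied OR={implied_or:.3f}, SE(ln OR)={se:.3f}, P={p:.3f}")
```

Both ORs are recovered as the geometric mean of their interval limits (about 0.254 and 0.243), and the implied P values (about 0.026 and 0.021) match the reported ones. This agreement is consistent with, but does not prove, the Wald assumption.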
Conclusions: Inception-v3 can achieve excellent diagnostic efficiency. Nodules that are 0.5–1.0 cm in size, contain microcalcification, and have a taller shape can be diagnosed more accurately by Inception-v3.