Lung cancer is the second most common cancer and the leading cause of cancer mortality in the United States (US) (1). An estimated 234,030 new cases of lung cancer and 154,050 deaths from it were expected in the US in 2018 (1). About 85% of lung cancer cases are non-small cell lung cancer (NSCLC). Although the 5-year survival rate was only 15%, survival for NSCLC has reportedly improved owing to advances in treatment and medical care (2). As survival improves, studies on second primary malignancy (SPM) in NSCLC survivors are becoming urgent. A single-center study of 569 patients showed that the incidence of SPM in resected stage I NSCLC was 15% (3). Another study reported an observed/expected (O/E) ratio of 2.04 for SPM in patients with stage IA NSCLC in the US (4). In patients with stage III NSCLC successfully treated with chemo-radiotherapy, the O/E ratio was 2.8 and increased with time (5). Hu et al. reported that 1,412 (3.33%) survivors of initial primary lung cancer developed metachronous second primary lung cancer (6). Moreover, research on SPM has mainly focused on second primary lung cancer, yet SPMs are not limited to the lung; those arising in other organs account for a substantial proportion (7). Although several studies have examined SPM in patients with a particular stage of NSCLC (3-5), a comprehensive evaluation of SPM across all stages of NSCLC is still warranted. Given the increasing number of NSCLC survivors, a risk prediction model for SPM in NSCLC patients is particularly needed to guide screening.
The present study was designed to evaluate the risk of SPM in NSCLC patients using the Surveillance, Epidemiology, and End Results (SEER) database and to identify the clinical and demographic factors associated with SPM risk. We developed a competing risk nomogram to predict SPM in NSCLC patients based on subdistribution hazard methods, and assessed its risk-stratification ability and clinical utility.
The National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) database is a population-based database that assembles data on the demographics, cancer incidence and survival of cancer patients in the US. The SEER-13 registries comprise patients from Atlanta, Connecticut, Detroit, Hawaii, Iowa, New Mexico, San Francisco-Oakland, Seattle-Puget Sound, Utah, Los Angeles, San Jose-Monterey, Rural Georgia and the Alaska Native Tumor Registry, and together cover approximately 13.4% of the US population. We extracted data from the SEER-13 registries database (https://seer.cancer.gov/) using SEER*Stat software version 8.3.5 (accession number: 13693-Nov2015) to estimate the risk of SPM in patients whose initial primary cancer (IPC) was NSCLC (8,9).
We retrieved all records of patients diagnosed with NSCLC as IPC between January 2004 and December 2010. The study cohort comprised patients with the following International Classification of Diseases for Oncology, Third Edition (ICD-O-3), morphology codes: 8012/3, 8046/3, 8070/3, 8140/3, 8240/3, 8250/3, 8560/3 and 9053/3; and the site codes: C33.9, C34.0, C34.1, C34.2, C34.3, C34.8 and C34.9 (10). The exclusion criteria were as follows: (I) age at diagnosis younger than 18 years; (II) incomplete survival data or follow-up information; and (III) autopsy-only or death-certificate-only records. The year 2004 was selected as the first year of the study because several of the covariates used were introduced in SEER in 2004 [American Joint Committee on Cancer: AJCC Staging Manual (6th edition), http://www.cancerstaging.org] (11). The year 2014 was set as the follow-up cutoff date, allowing at least 4 years of potential follow-up for all included cases. Demographic and clinicopathological data extracted from the SEER database included age, gender, race, marital status, IPC tumor site, tumor size, histologic type, histologic grade, summary stage, TNM stage, surgery history, follow-up information and SPM.
Definition of SPM
SPM of NSCLC was defined as a second primary cancer with a latency of more than 6 months between IPC and SPM, based on the Warren and Gates criteria (12). In addition, as proposed by Martini and Melamed, a second malignancy in the lung specifically was considered an SPM if it met any of the following three criteria: (I) the histology of the SPM differs from that of the IPC; (II) the latency between SPM and IPC is more than 2 years; (III) the two tumors are located in different lobes, with no record of positive intervening lymph nodes and no evidence of metastasis (13). The selection criteria are summarized in Figure S1.
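The definition above can be read as a simple decision rule. The following is a minimal sketch in Python, not the authors' selection code: the record layout and field names are hypothetical, and it assumes that a second lung tumor must satisfy both the 6-month latency rule and at least one of the three lung-specific criteria.

```python
from dataclasses import dataclass


@dataclass
class SecondTumor:
    """Hypothetical record for a second malignancy; field names are illustrative."""
    latency_months: float            # time from IPC diagnosis to the second tumor
    in_lung: bool                    # True if the second tumor arises in the lung
    different_histology: bool = False
    different_lobe: bool = False
    positive_intervening_nodes: bool = False
    metastasis: bool = False


def is_spm(t: SecondTumor) -> bool:
    """Apply the latency rule, then the lung-specific criteria (assumed combination)."""
    # Warren and Gates: more than 6 months between IPC and the second tumor.
    if t.latency_months <= 6:
        return False
    if not t.in_lung:
        return True
    # Martini and Melamed: any one of the three criteria suffices for a lung tumor.
    separate_lobe = (t.different_lobe
                     and not t.positive_intervening_nodes
                     and not t.metastasis)
    return t.different_histology or t.latency_months > 24 or separate_lobe
```

For example, a second lung tumor found 12 months after the IPC with the same histology, in the same lobe, would not count as an SPM under this reading, whereas the same tumor found at 30 months would.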
The crude incidence of SPM was defined as the proportion of all patients who developed SPM. To calculate the standardized incidence ratio (SIR) and its 95% confidence interval (CI), the observed incidence of SPM was compared with the expected incidence computed from age-specific rates in a reference population. SIRs were calculated with the SEER*Stat MP-SIR session. Because squamous cell carcinoma (SCC) (23.1%) and adenocarcinoma (ADC) (44.3%) differ in incidence and treatment outcomes, we analyzed SPM in NSCLC patients overall and in each histological group separately. We also analyzed how SPM is influenced by demographic and clinical factors, namely site of SPM, age at diagnosis of IPC, gender, race, summary stage of IPC, latency and attained calendar year of IPC.
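As an illustration of the SIR calculation, the sketch below computes the expected number of cases from stratum-specific person-years and reference rates, then the SIR with an approximate CI. It is not the SEER*Stat MP-SIR implementation: the input layout is hypothetical, and the interval uses Byar's approximation to the exact Poisson limits, which is an assumption made here to keep the example dependency-free.

```python
from math import sqrt
from statistics import NormalDist


def expected_cases(strata):
    """Sum person-years x reference rate over strata.

    `strata` is a list of (person_years, rate_per_100k) pairs; this layout
    is hypothetical and chosen only for illustration.
    """
    return sum(py * rate / 100_000 for py, rate in strata)


def sir_with_ci(observed, expected, level=0.95):
    """Return (SIR, lower, upper) using Byar's approximation to the Poisson CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    sir = observed / expected
    if observed == 0:
        lower = 0.0
    else:
        lower = (observed
                 * (1 - 1 / (9 * observed) - z / (3 * sqrt(observed))) ** 3
                 / expected)
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * sqrt(o1))) ** 3 / expected
    return sir, lower, upper


# Made-up numbers: two age strata with the same reference rate.
expected = expected_cases([(40_000, 50.0), (60_000, 50.0)])
sir, lower, upper = sir_with_ci(100, expected)
```

An SIR whose lower confidence limit exceeds 1 indicates more SPM cases than expected from the reference population.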
We used proportional subdistribution hazards regression to obtain unbiased estimates of SPM risk in the presence of competing risks and to identify the clinical and demographic factors associated with that risk (14). SPM and death from any cause were treated as the two competing events. The cumulative incidence function (CIF) was used to show the probability of each event, and differences in CIF between groups were assessed with Gray’s test. Variables for the final prediction model were selected using the Bayesian information criterion (15). Nomograms have been suggested as reliable tools for quantifying risk (16-19). We therefore built a competing-risk nomogram based on Fine and Gray’s model to predict the 3-, 5- and 10-year probability of SPM. Model performance was evaluated in terms of discrimination and calibration using bootstrap cross-validation with 100 bootstrap resamples. Discrimination was measured by the concordance index (c-index), the probability of concordance between predicted probability and response, and calibration was assessed with a calibration plot. Because the c-index has attracted some criticism (20-22), we also examined the risk stratification ability of the model. Estimated risks are useful for identifying high-risk individuals for prevention, so we divided patients into 10 groups by deciles of estimated risk, estimated the CIF for each group using the Gray method and compared the CIFs across deciles (23). To assess the potential clinical utility of the nomogram, we applied decision curve analysis and calculated its net benefit (24,25). A model is considered clinically useful if applying it yields a larger net benefit than the alternative strategies (24).
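To make the notion of a CIF under competing risks concrete, the sketch below implements the standard nonparametric cumulative incidence estimator in plain Python. It is only an illustration, not the R code used in this study, and it does not fit the Fine and Gray regression model; the event coding (0 = censored, 1 = SPM, 2 = death before SPM) is an assumption made for the example.

```python
def cumulative_incidence(times, events, cause=1):
    """Nonparametric cumulative incidence of `cause` under competing risks.

    events: 0 = censored, 1 = SPM, 2 = death before SPM (assumed coding).
    Returns (time, CIF) pairs at each distinct time with at least one event.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    event_free = 1.0   # probability of no event of any type just before t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j = i
        d_cause = d_any = 0
        # Count events of interest and events of any type tied at time t.
        while j < len(data) and data[j][0] == t:
            if data[j][1] == cause:
                d_cause += 1
            if data[j][1] != 0:
                d_any += 1
            j += 1
        if d_any:
            cif += event_free * d_cause / n_at_risk
            event_free *= 1 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= j - i   # drop subjects with an event or censoring at t
        i = j
    return curve
```

Unlike one minus the Kaplan-Meier estimate with deaths treated as censored, this estimator does not assume that patients who die could still develop SPM later, which is why it is preferred when death is a competing event.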
All statistical analyses were performed using R version 3.4.2 software (Institute for Statistics and Mathematics, Vienna, Austria; www.r-project.org). Statistical significance was set at two-sided P<0.05.
Population characteristics and SPM incidence
We identified 78,175 eligible NSCLC patients diagnosed between 2004 and 2010. Characteristics of the study population are listed in Table 1. The median follow-up was 10 months from the diagnosis of IPC (interquartile range, 3–32 months). A total of 3,161 patients (4.04%) developed SPM during follow-up. The crude incidence of SPM was higher in patients aged 60–74 years (4.99%), female patients (4.09%), white patients (4.28%) and married patients (4.54%), and in those with IPC in the middle lobe (4.93%), small tumors (1–2 cm, 9.05%), bronchioloalveolar histology (10.02%), well-differentiated tumors (grade I–II, 7.78%), localized disease (11.18%), early-stage disease (IA 13.48%, IB 10.65%) or a history of surgery (11.85%). The vast majority of SPMs were solid tumors. Notably, SPMs were not limited to the lung (1,425/3,161, 45.1%) but also arose in other organs (1,736/3,161, 54.9%) (Table S1).
The SIRs of SPM among NSCLC patients are listed in Table 2. The SIRs in NSCLC patients overall, the SCC group and the ADC group were 1.62, 1.73 and 1.64, respectively. The SIR of second primary lung cancer (4.87) was higher than that of any other site, followed by larynx (3.18) and thyroid (2.14) cancers. We also analyzed the SIR in subgroups defined by clinical or demographic factors. The SIR was higher in patients whose NSCLC was diagnosed at 18–65 years of age than in those diagnosed at 65+ years (2.08 vs. 1.42; Table S2). A higher SIR was observed in female patients (1.71; Table S3). The SIR was higher in black patients than in white patients or patients of other races (American Indian/AK Native, Asian/Pacific Islander) (1.80 vs. 1.61 vs. 1.53; Table S4). Patients with localized IPC had the highest SIR (1.84; Table S5). The SIRs for latencies of 6–11 months and 12–23 months were similar and relatively low, and the SIR changed as latency lengthened (Table S6); the highest SIR was observed in patients with a latency of 60–119 months. SPM incidence increased with time (Table S7). All of these trends were also observed in both the SCC and ADC groups. Detailed subgroup SIRs are provided in the Supplementary Materials (Tables S2-S7).
The 10-year cumulative incidence of SPM for NSCLC was 5.05% (95% CI: 4.87–5.25%). The 10-year estimates of the cumulative incidence of SPM by age, sex, race, marital status, IPC tumor site, tumor size, histologic type, histologic grade, TNM stage and surgery history are summarized in Table 1, and the corresponding CIF curves are plotted in Figure 1. Younger, white and married patients had a higher cumulative incidence of SPM. Small tumor size, earlier TNM stage, well-differentiated histological grade, localized disease, large cell carcinoma and a history of surgery were also associated with higher SPM risk. The CIF for SPM did not differ significantly between sexes.
Factors associated with SPM
According to the competing risk model (Table 3), age at IPC diagnosis, sex, race, marital status, IPC tumor site, tumor size, TNM stage, extent of disease and surgery history were strong predictors of SPM risk. TNM stage, extent of disease and surgery history were the most important factors. Patients with advanced TNM stage were less likely to develop SPM, with a subdistribution hazard ratio (sdHR) of 0.45 (95% CI: 0.34–0.61) for stage IV compared with stage IA. Regional and distant extension of IPC were associated with a progressively lower risk of SPM (sdHR: regional 0.83; distant 0.43). Patients who underwent surgery had a higher SPM incidence than those who did not, with an sdHR of 2.95 (95% CI: 2.64–3.29). In addition, SPM risk among survivors of IPC decreased progressively and significantly as tumor size increased.
Competing risk nomogram
The nomogram based on Fine and Gray’s model is shown in Figure 2. It can be used to predict the 3-, 5- and 10-year probability of SPM for NSCLC survivors by summing the points that correspond to each of the patient’s characteristics.
Evaluation of the model
Our nomogram showed good discrimination, with a c-index of 0.80. As shown in Figure S2, the calibration curve agreed closely with the 45° diagonal line, indicating that the nomogram was well calibrated.
The CIF of SPM was estimated for each decile of risk predicted by our model (Figure 3). The lowest CIF was observed in the first decile (1.04%) and the highest in the tenth decile (16.70%). The wide gap in CIF between the first and tenth deciles was statistically significant (P<0.05). These results indicate that NSCLC patients can be stratified into different risk groups according to their estimated SPM risk, and that the model may perform well in identifying high-risk patients among survivors of IPC.
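The decile-based stratification can be sketched as follows. This is an illustrative Python fragment rather than the analysis code of this study: the function names are hypothetical, and the per-group summary is a crude proportion that ignores censoring and competing deaths, so it is only a simplified stand-in for the group-wise CIF.

```python
def decile_groups(risks):
    """Assign each predicted risk to a decile group (1 = lowest, 10 = highest)."""
    n = len(risks)
    order = sorted(range(n), key=lambda i: risks[i])
    groups = [0] * n
    for rank, idx in enumerate(order):
        groups[idx] = rank * 10 // n + 1
    return groups


def crude_incidence_by_group(groups, had_spm):
    """Crude proportion with SPM per group (ignores censoring and competing deaths)."""
    totals, cases = {}, {}
    for g, y in zip(groups, had_spm):
        totals[g] = totals.get(g, 0) + 1
        cases[g] = cases.get(g, 0) + y
    return {g: cases[g] / totals[g] for g in sorted(totals)}
```

A model that stratifies well should show observed incidence rising steadily from the first to the tenth group.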
We compared the clinical net benefit of the risk model with that of two alternative strategies: screening all patients and screening none. As shown in Figure 4, the net benefit of our risk model was larger than that of either alternative across a wide range of threshold probabilities (1% to 20%). Under the model-based strategy, screening is recommended when an individual’s predicted risk exceeds the chosen threshold (selected from 1% to 20%); the resulting net benefit (the proportion of true positives minus the proportion of false positives weighted by the odds of the threshold probability) is larger for the prediction model than for the screen-all or screen-none strategies.
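The net benefit underlying a decision curve can be illustrated with a short Python sketch. It follows the usual definition for a binary outcome, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability; treating SPM as a simple binary outcome without censoring or competing risks is a simplifying assumption of this example, and the function names are hypothetical.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of screening everyone whose predicted risk >= threshold."""
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    weight = threshold / (1 - threshold)   # odds of the threshold probability
    return tp / n - fp / n * weight


def net_benefit_screen_all(outcomes, threshold):
    """Net benefit when every patient is screened regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# The screen-none strategy has a net benefit of 0 at every threshold.
# Made-up example: four patients, one of whom develops SPM.
risks = [0.30, 0.25, 0.05, 0.02]
outcomes = [1, 0, 0, 0]
model_nb = net_benefit(risks, outcomes, threshold=0.20)
all_nb = net_benefit_screen_all(outcomes, threshold=0.20)
```

Evaluating both functions over a grid of thresholds (for example 1% to 20%) and plotting the results against the zero line for screening none gives the decision curve.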
Among the 3,161 SPM cases in the present study, 1,736 (54.92%) did not arise in the lung or bronchus, and 2,329 (73.68%) occurred within a latency of 6 to 59 months. The SIR of SPM after initial primary NSCLC was 1.62. An SIR greater than 1 indicates that patients diagnosed with initial primary NSCLC are more likely to develop another primary cancer than the reference population and therefore need surveillance.
Given the elevated risk of developing SPM after NSCLC, a risk prediction model is needed to guide screening. Han et al. have developed a model to predict second primary lung cancer in patients who survived more than 5 years after the diagnosis of initial primary lung cancer (26). Although this model is valuable, we consider it insufficient. First, it focuses on second primary lung cancer, whereas predicting SPM at other sites is clearly necessary because more than half of SPMs did not involve the lung. Second, it only predicts SPM developing 5 years or more after the diagnosis of IPC, whereas we found that most cases occurred within 5 years. It is therefore reasonable to predict SPM from 6 months after the diagnosis of IPC onward, so as to cover more of the population at risk.
Traditional risk prediction models, such as those predicting the overall survival of cancer patients, are well documented. These models are mainly based on the Kaplan-Meier method and Cox proportional hazards regression, which handle only one outcome and may produce biased results in the presence of competing risks. In the study of SPM, however, competing risks are especially relevant, because a substantial proportion of NSCLC survivors die of other causes, such as heart disease, before developing SPM (10). Competing risk methods based on the subdistribution hazard function are therefore recommended over the conventional ones (26,27). To our knowledge, however, no such risk prediction model for SPM in NSCLC patients has been presented as a nomogram.
In this study, we built a risk prediction model and presented it as an intuitive nomogram. Our results indicate that patients whose IPC was diagnosed at 60–74 years of age, who were black or white, or who had a smaller IPC tumor, surgery, early TNM stage or localized/regional extension had a higher sdHR for SPM. Overall, patients with factors indicating a better prognosis had a higher sdHR, which is consistent with a previous report (28). A possible explanation is that these patients survive longer and therefore have more time to develop SPM. NSCLC survivors with the risk factors mentioned above, such as IPC diagnosed at 60–74 years of age, are therefore in particular need of intensive surveillance for SPM.
Our model performed well on evaluation, showing good accuracy and calibration and a strong ability to separate high-risk from low-risk individuals. In recent years, the c-index has been criticized as merely a measure of the accuracy of risk prediction models that lacks clinical relevance (24). Reporting the c-index alone neglects the harms of false-positive and false-negative results. For example, when a false-negative result is considered more harmful, a prediction model with higher sensitivity is preferable, which the c-index does not reveal. We therefore also performed decision curve analysis, which assesses the clinical usefulness of the model. The calculation of net benefit accounts for both the benefit of screening individuals who turn out to be true positives and the loss caused by screening individuals who are false positives, with the threshold probability reflecting how doctors and patients weigh that benefit against that loss. Net benefit is then presented as a function of threshold probability. In this way, doctors and patients can consult the net benefit of our model at their own threshold probability when deciding whether to use it.
Our study is based on a large, population-based cohort from the SEER database. Sampling error was therefore smaller than in single-institution studies, and the estimated SIR and the distribution of demographic and clinical factors were closer to those of the general population. Moreover, SEER registries are distributed throughout the US and collect data according to consistent criteria, which helps ensure data quality.
However, our study has some limitations. Although we trust the quality of the records in the SEER database, some cases of relapse, intrapulmonary metastasis or distant metastasis may have been misclassified as SPM because they are technically difficult to distinguish. To address this concern, we excluded 46 SPM records according to the definition of second primary lung cancer proposed by Martini and Melamed, as detailed in the Materials and Methods section (13). Moreover, the extrapulmonary SPM sites with the highest SIR were the larynx and thyroid, which are not common sites of distant metastasis from lung cancer. This suggests that the SPM records in the SEER database are not distant metastases and are trustworthy. The SEER database does not provide detailed information on smoking history, working environment, family history, complications, existing diseases, gene mutations or therapy, all of which may be risk factors for SPM (29-32). For example, there is evidence that, in patients with early breast cancer, the hazard ratio (HR) of SPM at radiotherapy-associated sites was higher in those who received radiotherapy (33). Had these factors been available, our model might have performed better. The prediction model was built on data collected in the US, so applying it in other countries may introduce unknown bias. Nevertheless, this study helps identify potential risk factors for SPM after NSCLC in different national settings, and patients in other countries can readily obtain their estimated probability of SPM using our nomogram. Notably, the risk predicted by our nomogram is only a reference, not an exact prediction. Although the model performed satisfactorily in internal validation with a bootstrap approach, external validation in independent patient cohorts is still needed and will be a major part of our future research.
In the present study, we first provided a systematic estimation of SPM in NSCLC patients using a large population-based cohort from the SEER database. We then developed what is, to our knowledge, the first competing risk nomogram to predict the risk of SPM, which might be a convenient predictive tool for individualized SPM screening. Further external validation is warranted.
The authors acknowledge the efforts of the Surveillance, Epidemiology, and End Results (SEER) Program tumor registries in providing high quality open resources for researchers. The authors would like to thank the editors and the anonymous reviewer for their valuable comments and suggestions to improve the quality of the paper.
Funding: This work was supported by the National Key R&D Program of China (Grant No. 2016YFC0905500, 2016YFC0905503); Science and Technology Program of Guangdong (Grant No. 2017B020227001); Science and Technology Program of Guangzhou (Grant No. 201704020072).
Conflicts of Interest: The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. Institutional review board approval was waived for this study because the SEER database is a public, anonymized database. The author Zhou obtained access to the SEER database (accession number: 13693-Nov2015).
- Siegel RL, Miller KD, Jemal A. Cancer statistics, 2018. CA Cancer J Clin 2018;68:7-30.
- Xia W, Yu X, Mao Q, et al. Improvement of survival for non-small cell lung cancer over time. Onco Targets Ther 2017;10:4295-303.
- Rice D, Kim HW, Sabichi A, et al. The risk of second primary tumors after resection of stage I nonsmall cell lung cancer. Ann Thorac Surg 2003;76:1001-7; discussion 1007-8.
- Khanal A, Uprety D, Basnet B, et al. Risk of second primary malignancy in patient with stage IA non-small cell lung cancer in US. J Clin Oncol 2016;34:e20027.
- Kawaguchi T, Matsumura A, Iuchi K, et al. Second primary cancers in patients with stage III non-small cell lung cancer successfully treated with chemo-radiotherapy. Jpn J Clin Oncol 2006;36:7-11.
- Hu ZG, Li WX, Ruan YS, et al. Incidence trends and risk prediction nomogram of metachronous second primary lung cancer in lung cancer survivors. PLoS One 2018;13:e0209002.
- Duchateau CS, Stokkel MP. Second primary tumors involving non-small cell lung cancer: prevalence and its influence on survival. Chest 2005;127:1152-8.
- National Cancer Institute. Surveillance, Epidemiology, and End Results Program. SEER*Stat software version 8.3.5, 2018. Available online: http://www.seer.cancer.gov/seerstat. Accessed May 08, 2018.
- National Cancer Institute. Surveillance, Epidemiology, and End Results (SEER) Program SEER*Stat Database: Incidence - SEER 18 Regs Research Data + Hurricane Katrina Impacted Louisiana Cases, Nov 2016 Sub (1973-2014 varying) - Linked To County Attributes - Total U.S., 1969-2015 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, Surveillance Systems Branch, released April 2017, based on the November 2016 submission. Available online: www.seer.cancer.gov. Accessed May 08, 2018.
- Zhou H, Zhang Y, Qiu Z, et al. Nomogram to Predict Cause-Specific Mortality in Patients With Surgically Resected Stage I Non-Small-Cell Lung Cancer: A Competing Risk Analysis. Clin Lung Cancer 2018;19:e195-203.
- Aizer AA, Chen MH, McCarthy EP, et al. Marital status and survival in patients with cancer. J Clin Oncol 2013;31:3869-76.
- Warren S, Gates O. Multiple primary malignant tumors. A survey of the literature and a statistical study. Am J Cancer 1932;16:1358-414.
- Martini N, Melamed MR. Multiple primary lung cancers. J Thorac Cardiovasc Surg 1975;70:606-12.
- Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc 1999;94:496-509.
- Kuk D, Varadhan R. Model selection in competing risks regression. Stat Med 2013;32:3077-88.
- She Y, Zhao L, Dai C, et al. Development and validation of a nomogram to estimate the pretest probability of cancer in Chinese patients with solid solitary pulmonary nodules: A multi-institutional study. J Surg Oncol 2017;116:756-62.
- Han DS, Suh YS, Kong SH, et al. Nomogram predicting long-term survival after d2 gastrectomy for gastric cancer. J Clin Oncol 2012;30:3834-40.
- Valentini V, van Stiphout RG, Lammering G, et al. Nomograms for predicting local recurrence, distant metastases, and overall survival for patients with locally advanced rectal cancer on the basis of European randomized clinical trials. J Clin Oncol 2011;29:3163-72.
- Liang W, Zhang L, Jiang G, et al. Development and validation of a nomogram for predicting survival in patients with resected non-small-cell lung cancer. J Clin Oncol 2015;33:861-9.
- Lobo JM, Jiménez-Valverde A, Real R. AUC: a misleading measure of the performance of predictive distribution models. Glob Ecol Biogeogr 2008;17:145-51.
- Cook NR, Ridker PM. Advances in measuring the effect of individual predictors of cardiovascular risk: the role of reclassification measures. Ann Intern Med 2009;150:795-802.
- Chatterjee N, Shi J, García-Closas M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat Rev Genet 2016;17:392-406.
- Gray RJ. A class of K-sample tests for comparing the cumulative incidence of a competing risk. Ann Stat 1988;16:1141-54.
- Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 2006;26:565-74.
- Fitzgerald M, Saville BR, Lewis RJ. Decision curve analysis. JAMA 2015;313:409-10.
- Han SS, Rivera GA, Tammemägi MC, et al. Risk Stratification for Second Primary Lung Cancer. J Clin Oncol 2017;35:2893-9.
- Andersen PK, Geskus RB, de Witte T, et al. Competing risks in epidemiology: possibilities and pitfalls. Int J Epidemiol 2012;41:861-70.
- Grundmann RT, Meyer F. Second primary malignancy among cancer survivors - epidemiology, prognosis and clinical relevance. Zentralbl Chir 2012;137:565-74.
- Griffiths AJ, Gelbart WM, Miller JH, et al. Mutation and cancer. New York: W. H. Freeman, 2000.
- Sasco AJ, Secretan MB, Straif K. Tobacco smoking and cancer: a brief review of recent epidemiological evidence. Lung Cancer 2004;45 Suppl 2:S3-9.
- Turati F, Edefonti V, Bosetti C, et al. Family history of cancer and the risk of cancer: a network of case–control studies. Ann Oncol 2013;24:2651-6.
- Liang F, Zhang S, Xue H, et al. Risk of second primary cancers in cancer patients treated with cisplatin: a systematic review and meta-analysis of randomized studies. BMC Cancer 2017;17:871.
- Grantzau T, Mellemkjær L, Overgaard J. Second primary cancers after adjuvant radiotherapy in early breast cancer patients: a national population based study under the Danish Breast Cancer Cooperative Group (DBCG). Radiother Oncol 2013;106:42-9.