Query for gene upregulation in cBioPortal

Hi there! Could anyone help me with a biostatistical question about using cBioPortal?

We are looking for cell lines with upregulation of certain genes in cBioPortal. My supervisor is teaching me to use the website because I have no prior experience. She uses EXP >= 0.5 to define upregulation. Although I have no bioinformatics background, my view is that 0.5 SD above the mean is not strong enough to count as upregulation, and I think EXP >= 1.5 or EXP >= 2 would be better. I have discussed this with my supervisor, but she insists on using EXP >= 0.5.

Does anyone have experience with this issue? I am still not convinced by what I was told. Thanks a lot!

Well, I have worked with expression patterns. Unfortunately, there are no clear-cut or magic numbers for cutoffs. I agree with you that 0.5 is not stringent. However, to explore data one usually starts with a lower cutoff and later raises the bar to see what happens.
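To make the "raise the bar and see what happens" advice concrete, here is a minimal sketch that counts how many samples pass each cutoff. The expression values are invented for illustration, and the z-scores are computed against the mean and SD of the same ten samples, which is a simplifying assumption; the reference population cBioPortal uses for its z-scores may differ.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize expression values against their own mean and SD."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]

def count_upregulated(z, cutoff):
    """Number of samples whose z-score is at or above the cutoff."""
    return sum(1 for s in z if s >= cutoff)

# Hypothetical expression values for one gene across ten cell lines.
expression = [5.1, 5.3, 4.9, 5.0, 5.2, 6.0, 5.1, 4.8, 7.5, 5.0]
z = zscores(expression)

# Compare the lenient cutoff with the stricter ones discussed above.
for cutoff in (0.5, 1.5, 2.0):
    print(f"EXP >= {cutoff}: {count_upregulated(z, cutoff)} of {len(z)} cell lines")
```

Running the same comparison on real data shows how sensitive the hit list is to the cutoff, which can be a useful basis for the discussion with a supervisor.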

Integration and Analysis of CPTAC Proteomics Data in the Context of Cancer Genomics in the cBioPortal

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced extensive mass spectrometry-based proteomics data for selected breast, colon, and ovarian tumors from The Cancer Genome Atlas (TCGA). We have incorporated the CPTAC proteomics data into the cBioPortal to support easy exploration and integrative analysis of these proteomic datasets in the context of the clinical and genomics data from the same tumors. cBioPortal is an open source platform for exploring, visualizing, and analyzing multidimensional cancer genomics and clinical data. The public instance of the cBioPortal hosts more than 200 cancer genomics studies, including all of the data from TCGA. Its biologist-friendly interface provides many rich analysis features, including a graphical summary of gene-level data across multiple platforms, correlation analysis between genes or other data types, survival analysis, and per-patient data visualization. Here, we present the integration of the CPTAC mass spectrometry-based proteomics data into the cBioPortal, consisting of 77 breast, 95 colorectal, and 174 ovarian tumors that have already been profiled by TCGA for mutations, copy number alterations, gene expression, and DNA methylation. As a result, the CPTAC data can now be easily explored and analyzed in the cBioPortal in the context of clinical and genomics data. Integrating CPTAC data into cBioPortal overcomes limitations of the TCGA proteomics array data and provides a user-friendly web interface, a web API, and an R client to query the mass spectrometry data together with genomic, epigenomic, and clinical data.

Keywords: Cancer Biology; Cancer Biomarker(s); Mass Spectrometry; Phosphoproteome; Proteogenomics.

Identification of the potential therapeutic target gene UBE2C in human hepatocellular carcinoma: An investigation based on GEO and TCGA databases

Copyright: © Wei et al. This is an open access article distributed under the terms of Creative Commons Attribution License.

Introduction

Hepatocellular carcinoma (HCC) is the main pathological type of liver cancer. The incidence of HCC in developed countries has increased significantly in the past few decades (1). Due to its complexity, heterogeneity and high recurrence following surgical resection, HCC ranks as the second or third leading cause of cancer-associated mortality worldwide (2,3). The diagnosis of HCC depends primarily on serologic alterations that occur in advanced disease, which restricts the therapeutic options available (4). Thus, the identification of specific molecular targets for early diagnosis and timely treatment of HCC is imperative (5).

With the extensive application of gene chip technologies, the abundance of expression profile information and the screening of differentially expressed genes (DEGs), biomarkers in tumor tissues can be detected efficiently by integrating publicly available datasets (3,6,7). Several studies based on the integrated analysis of microarray data have provided valuable insights into the underlying molecular mechanisms of diseases (7–10). For instance, following a meta-analysis and integrated analysis of microarray data, Gan et al (10) reported that microRNA (miR)-145-5p was associated with lymph node metastasis in non-small cell lung cancer, with significantly lower miR-145-5p expression levels in these tissues compared with healthy tissues. Guo et al (7) performed an integrated bioinformatics analysis of four colorectal cancer (CRC) expression profiles in Gene Expression Omnibus (GEO) and identified 31 key candidate genes in CRC. Similarly, Liang et al (3) and Zhang et al (11) analyzed publicly available HCC data in GEO and The Cancer Genome Atlas (TCGA) database, and explored the diagnostic role of miR-133a-3p and miR-224-5p in HCC, respectively. These findings provide novel and valuable directions for further research into various cancer types, including HCC. It was proposed that a thorough re-analysis using integrated bioinformatics methods combined with the newest datasets would be innovative and may provide additional insights into the underlying molecular mechanisms of HCC.

The GEO database is a public database of microarray profiles established by the National Center for Biotechnology Information, and provides access to high-throughput screening data on abnormally expressed genes in cancerous tissues (7). Numerous studies have explored microarray data profiling (8,9). However, although reliable molecular biomarkers have been clinically observed in patients with HCC, the underlying molecular mechanisms of HCC remain to be clarified.

The present study identified a number of DEGs in HCC tissues by integrated analysis of the two newest datasets (GSE76427 and GSE84402) from the GEO database and HCC samples in TCGA (12,13). Functional enrichment was conducted on key candidate genes selected with DEG thresholds of |log2(fold change)| ≥1.5 in the GEO data or fold change ≥10 in the TCGA data, and P≤0.001. GSE14520 and GSE3500 (14,15) and the TCGA dataset, with expression profiles of cancer and adjacent non-cancerous tissues, were used for validation of candidate genes. In addition, overall survival (OS) analysis of candidate genes was performed to assess their potential as prognostic biomarkers of HCC. The present study may provide novel insights into the molecular mechanism of HCC and may serve as a reference for clinical studies in HCC.

Materials and methods

Microarray data information

The gene expression profiles of GSE76427, GSE84402, GSE14520 and GSE3500 were downloaded from the GEO database, and gene expression data for HCC and adjacent non-cancerous tissue were obtained. The newest datasets available (GSE76427 and GSE84402) were based on the platforms GPL10558 (Illumina HumanHT-12 v4.0 Expression BeadChip; Illumina, Inc., San Diego, CA, USA) and GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array; Affymetrix; Thermo Fisher Scientific, Inc., Waltham, MA, USA), respectively. GSE76427 included 115 HCC primary tumor tissues and 52 adjacent non-tumor tissues (submission date, 30th December 2015) (12). GSE84402 included 14 HCC tissues and 14 adjacent non-tumor tissues (submission date, 14th July 2016) (13). GSE14520 was based on the GPL571 (Affymetrix Human Genome U133A 2.0 Array; Affymetrix; Thermo Fisher Scientific, Inc.) and GPL3921 (Affymetrix HT Human Genome U133A Array; Affymetrix; Thermo Fisher Scientific, Inc.) platforms. GSE14520 included 66 tumor and paired non-tumor samples (submission date, 22nd January 2009) (16). GSE3500 was based on 13 platforms (GPL2648, GPL2649, GPL2831, GPL2868, GPL2906, GPL2935, GPL2938, GPL2948, GPL3007, GPL3008, GPL3009, GPL3010 and GPL3011), and included 102 primary HCC tissues (from 82 patients), 74 non-tumor tissues (from 72 patients), seven benign tumor tissues (three adenoma and four focal nodular hyperplasia), 10 metastatic cancer tissues, and 10 HCC cell lines (14).

Another expression profile of liver cancer was obtained from the TCGA database via the online tool Cancer RNA-Seq Nexus (CRN), using the University of California, Santa Cruz RefSeq gene array; it consisted of 84 HCC tissues and 42 adjacent non-tumor tissues. GSE76427, GSE84402 and TCGA data were used for the identification of DEGs. The GSE14520 and GSE3500 datasets and TCGA data were used for the validation of expression profiles. The details and patient information of the datasets in GEO and TCGA are listed in Table I.

Table I.

Detailed information of the datasets in GEO and TCGA patients.

| GEO ID | GSE14520 | GSE3500 | GSE76427 | GSE84402 | TCGA |
|---|---|---|---|---|---|
| Total no. of patients | 22 | 82 | Not mentioned | 14 | 371 |
| Total no. of samples | 132 | 193 | 167 | 14 | 373 |
| No. of non-tumor samples | 66 (paired non-tumor samples) | 74 (non-tumor samples) | 52 (adjacent non-tumor tissues) | 14 (>5 cm laterally from the edge of the cancerous region) | 42 (healthy tissues) |
| No. of primary HCC samples | 66 (tumor samples) | 102 (tumor samples from HBV+ patients) | 115 (primary tumor tissues) | 14 (percentage of tumor cells >70%) | 84 (tumor samples) |
| Tumor types | Hepatocellular carcinoma | Hepatocellular carcinoma | Hepatocellular carcinoma | Hepatocellular carcinoma | Hepatocellular carcinoma |
| Grading of tumors (TNM stage) | No subdivision | No subdivision | No subdivision | No subdivision | Stage II |
| Pathological grade | No subdivision | No subdivision | No subdivision | No subdivision | No subdivision |
| Comment | N.A. | N.A. | Percentages of HCC patients with HBV infection and cirrhosis were 46% and 54%. | N.A. | Analyzed using CRN (Cancer RNA-Seq Nexus tool). |

GEO, Gene Expression Omnibus; TCGA, The Cancer Genome Atlas; HCC, hepatocellular carcinoma; TNM, tumor node metastasis; HBV, hepatitis B virus; N.A., not available.

Identification of overexpressed DEGs in HCC tissues

The raw data from the downloaded datasets GSE76427 and GSE84402, as well as the TCGA data, were analyzed following integrated transformation and correlation analysis using FunRich (version 3.3) and Morpheus software. Data processing was conducted by two professional analysts. DEGs between human HCC tissues and paired non-tumor or healthy tissues were defined using the Student's t-test with cut-off criteria of P<0.001 and |log2(fold change)| ≥1.5 in GEO data or fold change ≥10 in TCGA data. By overlapping the DEG lists in FunRich, candidate genes that were highly overexpressed in HCC tissues in all three datasets were identified.
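The GEO cut-off criteria described above can be sketched as a simple filter. This is an illustration only, not the authors' pipeline: the gene names and statistics are invented, the log2 fold change helper assumes linear-scale (not log-transformed) expression values, and the P-values are taken as already computed, for example by a two-sample t-test from a statistics library.

```python
import math
from statistics import mean

def log2_fold_change(tumor, normal):
    """log2 ratio of mean tumor to mean non-tumor expression (linear-scale input)."""
    return math.log2(mean(tumor) / mean(normal))

def select_degs(stats, min_abs_log2fc=1.5, max_p=0.001):
    """Keep genes passing both the fold-change and the P-value cut-off."""
    return [
        gene
        for gene, (log2fc, p_value) in stats.items()
        if abs(log2fc) >= min_abs_log2fc and p_value < max_p
    ]

# Hypothetical per-gene statistics: gene -> (log2 fold change, t-test P-value).
stats = {
    "GENE_A": (2.3, 1e-6),   # passes both cut-offs
    "GENE_B": (1.8, 0.02),   # large change, but P too high
    "GENE_C": (0.4, 1e-8),   # significant, but change too small
    "GENE_D": (-1.9, 5e-5),  # downregulated, still passes |log2FC|
}

print(select_degs(stats))
print(log2_fold_change([8.0, 12.0, 10.0], [2.0, 3.0, 2.5]))
```

Because the criterion uses the absolute value, downregulated genes such as the hypothetical GENE_D also pass; restricting the result to overexpressed genes would require an additional check that the log2 fold change is positive.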

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment of candidates

Functional enrichment analysis of GO and KEGG pathways was performed using the online tool Database for Annotation, Visualization and Integrated Discovery (DAVID), which contains integrated gene visualization. FunRich software was also utilized to identify functional enrichment under a threshold of P<0.05.
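The idea behind an enrichment P-value can be illustrated with a plain hypergeometric tail probability. This is a sketch under stated assumptions and not a reproduction of the calculations performed by the tools named above, which may apply their own modified tests and corrections; the gene counts in the example are invented.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, term_genes, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn from total_genes,
    of which term_genes are annotated to the term of interest."""
    upper = min(term_genes, list_size)
    tail = sum(
        comb(term_genes, k) * comb(total_genes - term_genes, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, list_size)

# Hypothetical numbers: 20,000 background genes, 600 annotated to a term,
# and 3 of the 5 candidate genes carrying that annotation.
p = hypergeom_enrichment_p(20000, 600, 5, 3)
print(f"P = {p:.2e}", "(enriched)" if p < 0.05 else "(not enriched)")
```

With only five input genes, any enrichment result rests on very few observations, so the P<0.05 threshold is best read as exploratory.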

Construction of protein-protein interaction (PPI) network and heatmap analysis

Data on the proteins encoded by the DEGs and PPI network information were obtained using the Search Tool for the Retrieval of Interacting Genes (STRING) database. Cytoscape software (version 3.7.0) was used for visualization of the interactions among the candidate DEGs. A heatmap with a color gradient from red to blue, generated from the TCGA dataset, was used to show the relative expression levels of key candidate genes in HCC.

Validation of the aberrant expression of UBE2C in HCC based on GEO and TCGA datasets

The expression profiles of candidate genes among the DEGs were validated using two additional GEO datasets (GSE14520 and GSE3500) with the online tool Oncomine, which functions as a profiler for intuitive display of expression data from the GEO database. An independent-sample Student's t-test was used to compare the DEG levels between HCC and non-cancerous tissues. The online tools Cancer Cell Line Encyclopedia and FireBrowse were employed to evaluate the relative transcription level of candidate genes among human malignancies and different human HCC cell lines according to the data obtained from the GEO and TCGA databases.

Survival association of candidate genes with clinicopathological parameters of patients with HCC

Survival analysis of patients with high expression of candidate DEGs in TCGA HCC data was performed using Kaplan-Meier estimator survival curves with several clinicopathological characteristics taken into consideration, including sex and pathological stage. Data were obtained from cBioPortal, and Kaplan-Meier Plotter was used for analysis. P<0.05 was used as the cut-off criterion.
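For readers unfamiliar with the Kaplan-Meier estimator, the following minimal sketch shows how a survival curve is built from follow-up times and censoring flags. It is a generic textbook implementation with invented patient data and does not reproduce the output of the online tools used in the present study.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an observed death.

    times  -- follow-up time per patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= leaving
    return curve

# Hypothetical follow-up (days) for six patients; 0 marks a censored patient.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"day {t}: S(t) = {s:.3f}")
```

Comparing two such curves (for example, high versus low expression groups) would additionally require a log-rank test, which is not shown here.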


Results

Identification of DEGs and overexpressed candidate genes in human HCC tissues

To date, the GEO database contains the most comprehensive public microarray and gene expression data resources, while TCGA hosts the largest quantity of cancer gene data. The gene expression profiles of HCC and adjacent non-cancerous tissues were downloaded from GEO (GSE76427 and GSE84402) and TCGA. Under the inclusion criteria, a total of 1,650 and 1,960 DEGs were identified from GSE76427, GSE84402 and TCGA data, respectively. Using the FunRich software, five upregulated genes that overlapped across the three datasets were identified [ubiquitin-conjugating enzyme 2C (UBE2C), topoisomerase II α (TOP2A), pituitary tumor transforming gene 1 (PTTG1), glypican-3 (GPC3) and polycomb-repressive complex 1 (PRC1)], and these were considered candidate genes for the selection of HCC biomarkers (Fig. 1A). The OncoPrint created from 373 HCC tissues in the TCGA database using available data from the cBioPortal website indicated that 27% (100/373) of HCC cases exhibited upregulation of these genes (Fig. 1B). A heatmap of these five DEGs in the TCGA data was visualized using STRING and Morpheus software (Fig. 1C), which partly revealed the marked difference between human HCC and healthy tissues.
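The overlap step amounts to a set intersection across the three DEG lists. The sketch below is illustrative only: the lists are heavily truncated, the GENE_X entries are invented placeholders, and only the five shared gene symbols and the 100/373 proportion are taken from the text above.

```python
def shared_genes(*gene_lists):
    """Return the genes present in every list, sorted for stable output."""
    return sorted(set.intersection(*(set(genes) for genes in gene_lists)))

# Hypothetical, truncated DEG lists for the three datasets.
gse76427 = ["UBE2C", "TOP2A", "PTTG1", "GPC3", "PRC1", "GENE_X1"]
gse84402 = ["PRC1", "GPC3", "GENE_X2", "PTTG1", "TOP2A", "UBE2C"]
tcga = ["GPC3", "UBE2C", "GENE_X1", "PRC1", "TOP2A", "PTTG1"]

candidates = shared_genes(gse76427, gse84402, tcga)
print(candidates)

# Proportion of TCGA samples reported as upregulated in the OncoPrint.
altered_fraction = 100 / 373
print(f"{altered_fraction:.0%}")
```

A gene that appears in only two of the three lists, such as the placeholder GENE_X1, is excluded from the candidate set.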

Figure 1.

DEG expression profiles in the GEO and TCGA datasets. (A) Venn diagram of significantly overexpressed genes in the GSE76427, GSE84402 and TCGA datasets. Five significantly overexpressed genes overlapped in the three datasets: ubiquitin-conjugating enzyme 2C, topoisomerase II α, pituitary tumor transforming gene 1, glypican-3 and polycomb-repressive complex 1. (B) Genetic alteration of these five genes in patients with HCC in TCGA datasets. Pink samples represent upregulation. cBioPortal was used to obtain the transcription level by messenger RNA sequencing analysis of 373 samples from 371 patients with hepatocellular carcinoma. The TCGA OncoPrint portal was used. Z-score ±2.0 was used as a threshold for aberrant expression in TCGA RNA-Seq V2 data. (C) Heatmap of DEGs in TCGA datasets. Hot spots ranged from red (high expression) to blue (low expression). Search Tool for the Retrieval of Interacting Genes was used to obtain TCGA data, while Morpheus was used to process and visualize the data. DEGs were defined with P<0.001 and |log2(fold change)| ≥1.5 in GEO data or fold change >10 in TCGA data. GEO, Gene Expression Omnibus; TCGA, The Cancer Genome Atlas; DEGs, differentially expressed genes.

GO and KEGG pathway enrichment analysis for DEGs

The five overlapping upregulated candidate genes were subjected to functional enrichment analysis via GO. It was observed that they were primarily enriched in: i) cellular components, including nucleoplasm, nuclear chromosome, kinetochore and DNA topoisomerase complex (ATP-hydrolyzing) (Fig. 2A); and ii) biological processes, including cell communication, signal transduction and cell cycle (Fig. 2B). Candidate genes were associated with various functions, including apoptosis regulation, stabilization of p53, CDK-mediated phosphorylation and the removal of cell division cycle 6 (Fig. 2C). These results suggested that these five candidate genes serve important biological roles during HCC development.

Figure 2.

Functional enrichment analysis of DEGs. Charts of DEGs enriched for (A) cellular components, (B) biological processes and (C) molecular functions in Gene Ontology analysis using the online tool Database for Annotation, Visualization and Integrated Discovery and FunRich software. P<0.05 was considered to indicate a statistically significant difference. DEGs, differentially expressed genes.

KEGG pathway enrichment analysis revealed that PTTG1 was enriched in two pathways, namely ‘Oocyte meiosis’ and ‘Cell cycle’ (P<0.05) (data not shown).

Construction of PPI network

The five DEGs and their interactions were included in a PPI network comprising 105 nodes (proteins) and 1,036 edges (interactions) (Fig. 3A). In the integrated PPI network, a module containing UBE2C and PTTG1 was identified, which included 11 nodes and 50 edges (Fig. 3B). Based on the expression levels of these five genes, it was observed that UBE2C exhibited relatively higher expression levels compared with the other genes in HCC tissues from TCGA. Thus, UBE2C was considered a candidate biomarker for HCC.

Figure 3.

Protein-protein interaction network for selected DEGs. (A) Integrated network for the five candidate genes (UBE2C, topoisomerase II α, pituitary tumor transforming gene 1, glypican-3 and polycomb-repressive complex 1) and their interactions using Search Tool for the Retrieval of Interacting Genes. (B) The module including UBE2C. Nodes indicate proteins, and edges, represented as lines between DEGs, indicate interactions. Red nodes indicate the DEGs discussed in the present manuscript and green nodes indicate other genes. DEGs, differentially expressed genes; UBE2C, ubiquitin-conjugating enzyme 2C.

Validation of UBE2C overexpression in HCC tissues and cell lines

To verify the overexpression of UBE2C in HCC tissues, the GSE14520 and GSE3500 data were downloaded, and the expression profile of UBE2C was analyzed. As expected, the significant overexpression of UBE2C in HCC tissues compared with adjacent non-cancerous tissues was validated in the datasets GSE14520 and GSE3500 (Fig. 4A and B), as well as in various other human malignant tumors compared with healthy or adjacent non-cancerous tissues in data from the GEO and TCGA databases (Fig. 4C and D). High expression levels of UBE2C in different human liver cancer cell lines were further demonstrated (Fig. 4E). These results suggested that UBE2C serves important roles in the development of HCC and other cancer types, and may function as an oncogene in tumorigenesis.

Figure 4.

Validation of UBE2C overexpression in HCC tissues, cell lines and other cancer tissues in data from GEO and TCGA databases. Validation of UBE2C overexpression in (A) GSE14520 and (B) GSE3500 data. Comparisons between HCC tissues (T) and adjacent non-cancerous tissues (N) were acquired using Oncomine. UBE2C exhibited significantly higher expression according to the Student's t-test (P=1.17×10^-65 and P=1.06×10^-25). (C) UBE2C is overexpressed in numerous human malignant tumors in the TCGA database according to FireBrowse. (D) UBE2C is overexpressed in multiple human malignant tumors in the GEO database using CCLE. (E) A relatively high transcriptional level of UBE2C was observed in liver cancer cell lines. Data were obtained through analysis of the GEO database using the CCLE website. GEO, Gene Expression Omnibus; TCGA, The Cancer Genome Atlas; HCC, hepatocellular carcinoma; UBE2C, ubiquitin-conjugating enzyme 2C; CCLE, Cancer Cell Line Encyclopedia.

Kaplan-Meier analysis of DEGs and UBE2C in HCC tissues

Kaplan-Meier analysis was conducted for five DEGs (UBE2C, TOP2A, PTTG1, GPC3 and PRC1). Survival analysis indicated that patients with overexpression of these five DEGs exhibited significantly shorter survival times (P=0.0127) compared with patients with relatively low expression levels (Fig. 5A). Analysis of UBE2C in cohorts with different clinical characteristics revealed that patients with higher UBE2C expression exhibited shorter survival times [hazard ratio (HR)=1.44; confidence interval (CI)=1.02–2.04; log-rank P=0.037] (Fig. 5B), and patients with stage III HCC and high UBE2C expression had relatively short survival times compared with patients with stage III HCC and low UBE2C expression (HR=1.84; CI=1.01–3.34; log-rank P=0.043) (Fig. 5C-F). Stage IV was not analyzed due to the low sample size (n<10). Overexpression of UBE2C may serve as a novel indicator of reduced survival time in patients diagnosed with HCC. Thus, UBE2C may be used as a prognostic biomarker for HCC.

Figure 5.

Kaplan-Meier estimator analysis of candidate genes in patients with HCC. (A) Kaplan-Meier plots of patients with HCC with overexpression of five differentially expressed genes (UBE2C, topoisomerase II α, pituitary tumor transforming gene 1, glypican-3 and polycomb-repressive complex 1), with data obtained from cBioPortal (P=0.0127). The blue line represents HCC patients with relatively low UBE2C, TOP2A, PTTG1, GPC3 and PRC1 mRNA expression, and the red line represents HCC patients with high expression of the aforementioned genes. Log-rank analysis revealed that patients with HCC and high mRNA expression exhibited significantly shorter survival times. (B-G) Kaplan-Meier plots of different clinicopathological features associated with UBE2C overexpression. The black line represents UBE2C low expression and the red line represents UBE2C overexpression. The X and Y axes indicate overall survival time (days) and survival rate, respectively. (B) All patients. (C) Patients in clinical stage I. (D) Patients in clinical stage II. (E) Patients in clinical stage III. (F) Male and (G) female patients with HCC. Log-rank analysis revealed that patients with HCC in clinical stage III with high UBE2C mRNA expression exhibited significantly shorter survival times compared with patients with low UBE2C mRNA expression. Log-rank P-values and 95% confidence intervals of the hazard ratio were used for statistical analysis. HCC, hepatocellular carcinoma; mRNA, messenger RNA; UBE2C, ubiquitin-conjugating enzyme 2C.


Discussion

The overall survival of patients with HCC is markedly short, and morbidity is increasing in developed and developing countries. In the USA, 22,000 cases are newly diagnosed annually, and 18,000 patients succumb to HCC (4). Approximately 1,000,000 patients in the world are diagnosed with HCC each year (17). Numerous studies and clinical trials have attempted to uncover the molecular mechanisms of HCC (18–20). The present study integrated and thoroughly re-analyzed three datasets of HCC. UBE2C and four other overexpressed DEGs (TOP2A, PTTG1, GPC3 and PRC1) were identified to be associated with HCC. Enrichment analysis revealed that these genes were important for HCC development via different signaling pathways. Expression validation analysis using GEO and TCGA data demonstrated that these genes were associated with HCC pathogenesis, and survival analysis revealed that the overexpression of these five genes, particularly UBE2C, was associated with the survival of patients with HCC.

Formerly known as UBCH10, UBE2C is essential for ubiquitination and inactivation of protein activity (21). Normal expression of UBE2C guarantees normal physiological functions, such as cell cycle progression and programmed cell death (22,23). However, high expression levels of UBE2C frequently lead to the destruction of essential proteins, thus disrupting the function of mitotic cyclins, spindle checkpoint control and the euploidy status of cells (24,25). Numerous studies have identified the overexpression of UBE2C in several types of cancer, such as human cervical carcinoma, bladder cancer and breast cancer (21,26,27). It has been previously reported that UBE2C has potential as a cancer biomarker (28). Inhibition of UBE2C, by contrast, suppressed tumor cell proliferation, inhibited tumorigenesis and sensitized cancer cells to radiation (21,27,28). Zhang et al (29) reported that the inhibition of oncogenic miR-17/20a suppressed gastric cancer cell proliferation by downregulating UBE2C. These studies, as well as the finding in the present study that UBE2C was upregulated in HCC tissues compared with controls, suggest that the overexpression of UBE2C may be associated with the tumorigenesis of HCC.

A previous study revealed the molecular structure of anaphase-promoting complex/cyclosome (APC/C)-coactivator complexes under cryo-electron microscopy (30). The union of APC/C complexes with UBE2C promotes ubiquitination, whereas early mitotic inhibitor-1 (Emi1) constrains this effect. Deubiquitination enzymes are activated as soon as UBE2C is separated from APC/C, but the process is slow, with unclear molecular patterns (31). In the present study, GO analysis revealed that the upregulated UBE2C, TOP2A, PTTG1, GPC3 and PRC1 genes were primarily involved in the cell cycle, cell communication and protein metabolism biological processes. Paclitaxel (PTX) is a microtubule-poisoning drug widely used in anti-neoplastic strategies, which triggers cell death by activating the spindle assembly checkpoint (SAC) (32). Cancer cells sensitive to PTX do not achieve SAC, and undergo mitosis by degradation of APC/C and cyclin B, which is called mitotic catastrophe, thus leading to mitotic slippage or cell apoptosis (33). High expression of UBE2C has been demonstrated to override SAC (24). Since opposite characteristics are observed for UBE2C and Emi1 when interacting with APC/C, the exact role of Emi1, as well as its expression levels and significance in PTX-treated HCC, require further study.

In summary, the present study confirmed that high expression levels of UBE2C and four other genes (TOP2A, PTTG1, GPC3 and PRC1) were associated with poor overall survival of patients with HCC. Overexpression of UBE2C may serve as a novel indicator of short survival in patients with HCC. Further studies should focus on the potential use of UBE2C as an indicator of poor prognosis in HCC. Whether UBE2C overexpression serves a specific role in the growth of HCC, and whether its inhibition may block the growth of HCC in vivo or in vitro, remains to be determined.



Funding

The present study was supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (grant no. KYCX18_1462), National Natural Science Foundation of China (grant no. 81700392), Nanjing Health Youth Talent Training Project in 13th Five-Year (grant no. QRX17113) and Natural Science Foundation of Jiangsu Province Youth Fund Project (grant no. BK20150106).

Availability of data and materials

The datasets analyzed in the present study are all available from NCBI GEO.

Authors' contributions

ZW, YL, XS and BX designed the study. ZW and SQ wrote the manuscript. XL, QL, JH, JZ, ZHW, SQ and AS performed the bioinformatics analysis. All authors read and approved the final manuscript.

Ethics approval and consent to participate

Patient consent for publication

Competing interests

The authors declare that they have no competing interests.


References

Neumann H, Longo D, Fauci A, Kasper D, Hauser S, Jameson J and Loscalzo J: Harrison's Principles of Internal Medicine. 2011.

Jemal A, Bray F, Center MM, Ferlay J, Ward E and Forman D: Global cancer statistics. CA Cancer J Clin. 61:69–90. 2011.

Liang HW, Yang X, Wen DY, Gao L, Zhang XY, Ye ZH, Luo J, Li ZY, He Y, Pang YY and Chen G: Utility of miR-133a-3p as a diagnostic indicator for hepatocellular carcinoma: An investigation combined with GEO, TCGA, meta-analysis and bioinformatics. Mol Med Rep. 17:1469. 2018.

Yang N, Ekanem NR, Sakyi CA and Ray SD: Hepatocellular carcinoma and microRNA: New perspectives on therapeutics and diagnostics. Adv Drug Deliv Rev. 81:62–74. 2015.

Kulasingam V and Diamandis EP: Strategies for discovering novel cancer biomarkers through utilization of emerging technologies. Nat Clin Pract Oncol. 5:588–599. 2008.

Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA Jr and Kinzler KW: Cancer genome landscapes. Science. 339:1546–1558. 2013.

Guo Y, Bao Y, Ma M and Yang W: Identification of key candidate genes and pathways in colorectal cancer by integrated bioinformatical analysis. Int J Mol Sci. 18:E722. 2017.

Liang L, Gao L, Zou XP, Huang ML, Chen G, Li JJ and Cai XY: Diagnostic significance and potential function of miR-338-5p in hepatocellular carcinoma: A bioinformatics study with microarray and RNA sequencing data. Mol Med Rep. 17:2297–2312. 2018.

Zeng T, Wang D, Chen J, Chen K, Yu G, Chen Q, Liu Y, Yan S, Zhu L, Zhou H, et al: AF119895 regulates NXF3 expression to promote migration and invasion of hepatocellular carcinoma through an interaction with miR-6508-3p. Exp Cell Res. 363:129–139. 2018.

Gan TQ, Xie ZC, Tang RX, Zhang TT, Li DY, Li ZY and Chen G: Clinical value of miR-145-5p in NSCLC and potential molecular mechanism exploration: A retrospective study based on GEO, qRT-PCR, and TCGA data. Tumour Biol. 39:1010428317691683. 2017.

Zhang L, Huang L, Liang H, Zhang R, Chen G, Pang Y and Feng Z: Clinical value and potential targets of miR-224-5p in hepatocellular carcinoma validated by a TCGA- and GEO-based study. Int J Clin Exp Pathol. 10:9970–9989. 2017.

Grinchuk OV, Yenamandra SP, Iyer R, Singh M, Lee HK, Lim KH, Chow PK and Kuznetsov VA: Tumor-adjacent tissue co-expression profile analysis reveals pro-oncogenic ribosomal gene signature for prognosis of resectable hepatocellular carcinoma. Mol Oncol. 12:89–113. 2018.

Wang H, Huo X, Yang XR, He J, Cheng L, Wang N, Deng X, Jin H, Wang N, Wang C, et al: STAT3-mediated upregulation of lncRNA HOXD-AS1 as a ceRNA facilitates liver cancer metastasis by regulating SOX4. Mol Cancer. 16:136. 2017.

Chen X, Cheung ST, So S, Fan ST, Barry C, Higgins J, Lai KM, Ji J, Dudoit S, Ng IO, et al: Gene expression patterns in human liver cancers. Mol Biol Cell. 13:1929–1939. 2002.

Roessler S, Long EL, Budhu A, Chen Y, Zhao X, Ji J, Walker R, Jia HL, Ye QH, Qin LX, et al: Integrative genomic identification of genes on 8p associated with hepatocellular carcinoma progression and patient survival. Gastroenterology. 142:957–966. 2012.

Roessler S, Jia HL, Budhu A, Forgues M, Ye QH, Lee JS, Thorgeirsson SS, Sun Z, Tang ZY, Qin LX and Wang XW: A unique metastasis gene signature enables prediction of tumor relapse in early-stage hepatocellular carcinoma patients. Cancer Res. 70:10202–10212. 2010.

Dhanasekaran R, Limaye A and Cabrera R: Hepatocellular carcinoma: Current trends in worldwide epidemiology, risk factors, diagnosis, and therapeutics. Hepat Med. 4:19–37. 2012.

Guo X, Wang Z, Zhang J, Xu Q, Hou G, Yang Y, Dong C, Liu G, Liang C, Liu L, et al: Upregulated KPNA2 promotes hepatocellular carcinoma progression and indicates prognostic significance across human cancer types. Acta Biochim Biophys Sin (Shanghai). 51:285–292. 2019.

Tong H, Liu X, Li T, Qiu W, Peng C, Shen B and Zhu Z: INTS8 accelerates the epithelial-to-mesenchymal transition in hepatocellular carcinoma by upregulating the TGF-beta signaling pathway. Cancer Manag Res. 11:1869–1879. 2019.

Aziz K, Limzerwala JF, Sturmlechner I, Hurley E, Zhang C, Jeganathan KB, Nelson G, Bronk S, Velasco RF, van Deursen EJ, et al: Ccne1 overexpression causes chromosome instability in liver cells and liver tumor development in mice. Gastroenterology. 2019.

Bose MV, Gopisetty G, Selvaluxmy G and Rajkumar T: Dominant negative Ubiquitin-conjugating enzyme E2C sensitizes cervical cancer cells to radiation. Int J Radiat Biol. 88:629–634. 2012.

Liu YC: Ubiquitin ligases and the immune response. Annu Rev Immunol. 22:81–127. 2004.

Williamson A, Wickliffe KE, Mellone BG, Song L, Karpen GH and Rape M: Identification of a physiological E2 module for the human anaphase-promoting complex. Proc Natl Acad Sci USA. 106:18213–11821. 2009.

Reddy SK, Rape M, Margansky WA and Kirschner MW: Ubiquitination by the anaphase-promoting complex drives spindle checkpoint inactivation. Nature. 446:921–926. 2007.

van Ree JH, Jeganathan KB, Malureanu L and van Deursen JM: Overexpression of the E2 ubiquitin-conjugating enzyme UbcH10 causes chromosome missegregation and tumor formation. J Cell Biol. 188:83. 2010.

Morikawa T, Kawai T, Abe H, Kume H, Homma Y and Fukayama M: UBE2C is a marker of unfavorable prognosis in bladder cancer after radical cystectomy. Int J Clin Exp Pathol. 6:1367–1374. 2013.

Rawat A, Gopal G, Selvaluxmy G and Rajkumar T: Inhibition of ubiquitin conjugating enzyme UBE2C reduces proliferation and sensitizes breast cancer cells to radiation, doxorubicin, tamoxifen and letrozole. Cell Oncol (Dordr). 36:459–467. 2013.

Hao Z, Zhang H and Cowell J: Ubiquitin-conjugating enzyme UBE2C: Molecular biology, role in tumorigenesis, and potential as a biomarker. Tumor Biol. 33:723–730. 2012.

Zhang Y, Han T, Wei G and Wang Y: Inhibition of microRNA-17/20a suppresses cell proliferation in gastric cancer by modulating UBE2C expression. Oncol Rep. 33:2529–2536. 2015.

Chang L, Zhang Z, Yang J, McLaughlin SH and Barford D: Atomic structure of the APC/C and its mechanism of protein ubiquitination. Nature. 522:450–454. 2015.

Xie C, Powell C, Yao M, Wu J and Dong Q: Molecules in focus: Ubiquitin-conjugating enzyme E2C: A potential cancer biomarker. Int J Biochem Cell Biol. 47:113–118. 2014. View Article : Google Scholar : PubMed/NCBI

Rowinsky EK and Donehower RC: Paclitaxel (taxol). N Engl J Med. 332:1004–1014. 1995. View Article : Google Scholar : PubMed/NCBI

Gascoigne KE and Taylor SS: Article: Cancer cells display profound intra- and interline variation following prolonged exposure to antimitotic drugs. Cancer Cell. 14:111–123. 2008. View Article : Google Scholar : PubMed/NCBI

Number of enrolled learners: 25,000

This course distills for you expert knowledge and skills mastered by professionals in Health Big Data Science and Bioinformatics. You will learn exciting facts about human biology and chemistry, genetics, and medicine, intertwined with the science of Big Data and the skills to harness the avalanche of openly available data that we are just starting to make sense of. We’ll investigate the different steps required to master Big Data analytics on real datasets, including Next Generation Sequencing data, in a healthcare and biological context, from preparing data for analysis to completing the analysis, interpreting the results, visualizing them, and sharing the results. When you master these high-demand skills, you will be well positioned to apply for or move to positions in biomedical data analytics and bioinformatics. Whatever your skill level in biomedical or technical areas, you will gain highly valuable new or sharpened skills that will make you stand out as a professional and want to dive even deeper into biomedical Big Data. It is my hope that this course will spark your interest in the vast possibilities offered by publicly available Big Data to better understand, prevent, and treat diseases.

Skills you will gain

Bioinformatics, Data Clustering Algorithms, Big Data, R Programming


A very informative course. I learned so many things from this course, and it has very good coverage of data and its analysis. Thank you so much for providing this course.

This course is truly amazing, as it combines biology and computational skills. If you are looking for an introduction to bioinformatics, this is undoubtedly the course to take.

After this module, you will be able to:
1. Locate and download files for data analysis involving genes and medicine.
2. Open files and preprocess data using the R language.
3. Write R scripts to replace missing values, normalize data, discretize data, and sample data.
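The four preprocessing steps named in the last objective can be illustrated with a minimal sketch. The course itself uses R; Python is used here only for illustration, and every function name, the mean-imputation rule, the min-max scaling, and the equal-width binning are assumptions rather than the course's actual material.

```python
import random
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, n_bins=3):
    """Map values in [0, 1] to equal-width bins numbered from 0."""
    return [min(int(v * n_bins), n_bins - 1) for v in values]


def sample_rows(rows, k, seed=0):
    """Draw k rows without replacement, reproducibly."""
    return random.Random(seed).sample(rows, k)


raw = [2.0, None, 4.0, 6.0]          # toy measurements, not real data
filled = fill_missing(raw)
scaled = min_max_normalize(filled)
bins = discretize(scaled)
subset = sample_rows(list(range(10)), 3)
```

Each step takes a plain list and returns a new one, so the steps can be chained or swapped for other strategies (for example, median imputation) without touching the rest.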

Science Signaling

Vol 6, Issue 269
02 April 2013

By Jianjiong Gao, Bülent Arman Aksoy, Ugur Dogrusoz, Gideon Dresdner, Benjamin Gross, S. Onur Sumer, Yichao Sun, Anders Jacobsen, Rileen Sinha, Erik Larsson, Ethan Cerami, Chris Sander, Nikolaus Schultz

Science Signaling 02 Apr 2013 : pl1

The cBioPortal enables integration, visualization, and analysis of multidimensional cancer genomic and clinical data.


Sample acquisition, pre-processing, and information

All sequencing data were obtained from The Cancer Genome Atlas (TCGA). Details on sample acquisition, DNA extraction and quality control, sequencing, and other aspects of data generation are described elsewhere [32]. In addition to 268 WGS tumor and normal pairs and matched RNA-seq data, adjacent normal RNA-seq samples from 114 BRCA, 57 COAD/READ, 55 PRAD, and 59 THCA cases were used as controls. For BRCA, COAD/READ, and THCA samples, all WGS data had high (> 30×) coverage. For PRAD, only 20 high-coverage genomes were available; thus, we also included an additional one hundred low-coverage (6–8×) samples (some samples sequenced early in the project were done at low coverage). The power to detect SVs from the low-coverage data is low; however, we were still able to obtain a large subset of the SVs.

Characterizing genomic variants

For SVs, we used two algorithms, Meerkat [12] and BreakDancer [13]. For Meerkat, we required at least six discordant read pairs and/or split reads for high-coverage genomes and at least two discordant read pairs and one split read for low-coverage genomes. Variants detected in a tumor sample were filtered by the variants from all normal samples to remove germline events. When both breakpoints of an event fell into simple repeats or satellite repeats, the event was filtered out. A split read had to be aligned uniquely to the predicted breakpoint by BLAT, or the mate of the split read had to be mapped to a position adjacent to the predicted breakpoint. For BreakDancer, the SVs from each tumor sample were filtered by those from its matched normal. The called variants from Meerkat and BreakDancer were combined to increase detection sensitivity. RNA-seq reads were aligned to hg19 using MapSplice [46], and expression values were quantified using RSEM [41]. For SNVs, Mutation Annotation Format (MAF) files were downloaded from the Broad Institute TCGA Genome Data Analysis Center.

Chimeric mRNA detection from RNA-seq data

To find the chimeric mRNAs, the results from ChimeraScan (v0.4.5) [47] and deFuse (v0.6.2) [48] were merged. ChimeraScan uses Bowtie to align paired-end reads to a merged genome-transcriptome reference; deFuse clusters discordant paired-end alignments and predicts fusion boundaries with split-read analysis. Combining the results from the two callers increased detection sensitivity for paired-end data. For single-end RNA-seq data, FusionMap (2015-3-31 version) was used [49]. This algorithm uses a dynamically created pseudo fusion transcript library to accurately map junction-spanning reads. All findings involving known or putative chimeras were curated manually after visualizing the reads.

Computational screening of gene-gene UIB fusion-producing chimeric mRNA

When a gene-gene fusion and a gene-gene UIB (chimera-producing gene-intergenic) fusion produce the same chimeric mRNA, the gene-gene fusion was assumed to be the source of the chimeric mRNA. We screened for the split reads supporting the chimeric mRNAs fused at the predicted exon-exon junctions using an in-house computational method. Briefly, split reads encompassing the exon-exon junctions inferred from WGS-based fusion analysis were extracted from each RNA-seq bam file. Based on the signal-to-noise ratio, we required at least three reads that support the chimera. Each case was visualized and manually verified at the read level [49].

Computational screening of recurrent gene-gene and recurrent gene UIB/DIB fusion

For each fusion call, intergenic breakpoints were annotated with flanking genes, including their location and direction. A gene-gene fusion was considered recurrent when both breakpoints were located within genes and at least two samples shared the gene pair. We investigated recurrent intergenic fusions with two parameter settings. In the initial analysis, a fusion was considered recurrent when at least two samples had an upstream-intergenic-breakpoint between the same target gene and the nearest upstream gene, with consistent mRNA up- or downregulation. Among the 44 genes that were significantly upregulated, there was enrichment for cancer-associated genes as defined by a curated cancer gene list (p = 0.00008, Fisher’s exact test; see the last section).
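The recurrence rule for gene-gene fusions (both breakpoints within genes, at least two samples sharing the gene pair) can be sketched as follows. This is an illustrative sketch, not the authors' in-house code: the function name, the input layout, and the choice to treat a gene pair as unordered are assumptions, and the gene and sample names are placeholders.

```python
from collections import defaultdict


def recurrent_gene_pairs(fusion_calls, min_samples=2):
    """Return gene pairs seen in at least `min_samples` distinct samples.

    fusion_calls: iterable of (sample_id, gene_a, gene_b) tuples, one per
    fusion call whose two breakpoints both fall within genes.
    """
    samples_by_pair = defaultdict(set)
    for sample_id, gene_a, gene_b in fusion_calls:
        # Assumption: A-B and B-A count as the same pair.
        pair = tuple(sorted((gene_a, gene_b)))
        samples_by_pair[pair].add(sample_id)
    return {
        pair: sorted(samples)
        for pair, samples in samples_by_pair.items()
        if len(samples) >= min_samples
    }


calls = [  # placeholder calls, not data from the study
    ("S1", "GENE_A", "GENE_B"),
    ("S2", "GENE_B", "GENE_A"),
    ("S2", "GENE_C", "GENE_D"),
    ("S2", "GENE_D", "GENE_C"),  # same sample twice: counted once
]
recurrent = recurrent_gene_pairs(calls)
```

Collecting sample IDs in a set ensures that several calls of the same pair in one sample do not make the fusion look recurrent.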

In subsequent analysis, we required at least four samples with the fusion and considered breakpoints within 4 Mb upstream of the gene (UIB) and downstream of the gene (DIB). Because breakpoints further away are less likely to be functional and may introduce noise instead, we identified the fusion cases that are most likely to be functional in the following manner. To search for recurrent gene UIB cases, suppose breakpoints are ordered S1, S2, …, Sn in the 4-Mb upstream intra-chromosomal region according to the distance from the gene, with S1 being the closest. We compared the expression values of all possible groupings S1, …, Si vs Si+1, …, Sn using the Wilcoxon test and picked the corresponding sample group with the lowest p value. If the target gene had > 4-fold upregulation in each of the fusion cases in the group and the mean expression was also > 5-fold greater than in the rest of the samples, we considered this fusion to be functional. To minimize the impact of copy number amplification, cases with high-level amplification (the score “2” from putative copy number alterations from GISTIC) in the target gene were excluded. In Fig. 5, when a fusion occurs once but produces a previously reported oncogenic chimeric mRNA, we considered it to be part of the “recurrent” set.
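The split search over S1, …, Si vs Si+1, …, Sn can be sketched as below. This is a minimal illustration under stated assumptions: the rank-sum p value uses the normal approximation without tie or continuity correction (the paper does not say which variant of the Wilcoxon test was used), the function names are hypothetical, and the expression values are invented.

```python
import math


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p value (normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):       # tied values share the average rank
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def best_split(expression_by_distance):
    """Try every split S1..Si vs Si+1..Sn; return (i, p) with the lowest p.

    expression_by_distance: target-gene expression per sample, ordered by
    breakpoint distance from the gene (closest first).
    """
    n = len(expression_by_distance)
    candidates = [
        (rank_sum_p(expression_by_distance[:i], expression_by_distance[i:]), i)
        for i in range(1, n)
    ]
    p, i = min(candidates)
    return i, p


expr = [50, 40, 45, 2, 3, 1, 2.5]   # invented values, closest breakpoint first
split_index, split_p = best_split(expr)
```

With so few samples the normal approximation is crude; an exact test would be preferable in practice, and the fold-change criteria described above would still need to be applied to the selected group.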

Identifying pathways perturbed in the IGF2BP3 UIB fusion cases

To determine the impact of the IGF2BP3 UIB fusion, we identified the genes whose expression was correlated with that of IGF2BP3 across the 47 thyroid samples. For the 444 genes with a q value < 0.05 (|r| > 0.627, Pearson correlation), an over-representation analysis (ORA) based on Fisher’s exact test was performed using ConsensusPathDB, which contained curated interaction networks from genetic, metabolic, gene regulatory, and other interactions [41]. Pathways with q values < 0.1 were selected; similar pathways were combined based on the pathway ontology and the gene overlap (overlapping ratio > 0.9). We also performed a Gene Set Enrichment Analysis (GSEA) [50], and the top 20 pathways ranked by normalized enrichment score (NES) were compared with the ORA results. We performed a similar analysis for the target genes of other fusions, using 50 cases with the highest expression of the target gene and 50 cases with the lowest expression. Visualization was performed using an R package, ComplexHeatmap [51].

Finding the promoter region of IGF2BP3

To investigate the promoter region of IGF2BP3, six candidate regions were selected around the 5′ UTR. The six regions were (− 1891, + 99), (− 1891, − 1000), (− 1000, + 99), (− 382, + 99), (− 283, + 99), and (− 186, + 99) around the translation start site (ATG). The primers are listed in Additional file 1: Fig. S6a. Among the six, two candidates (− 1891, + 99) and (− 283, + 99) were selected based on their activity in HeLa and FTC238 (Additional file 1: Fig. S6b).

Luciferase reporter assay for finding enhancers interacting with the IGF2BP3 promoter

The IGF2BP3 promoter region was PCR-amplified using Nthy-ori 3-1 (thyroid cell line) genomic DNA as a template and cloned into a pGL3-Basic vector (Asp718/BglII) (Promega). From the H3K27ac profiles obtained from CEEHRC and ENCODE, the H3K27ac peaks in all available cell lines, including thyroid cell lines, were examined. Then, PCR-amplified DNA sequences of H3K27ac peak regions in THADA (Fig. 4e and Additional file 1: Fig. S6c) were cloned into the designated enhancer cloning position (SalI/BamH1). The primers are listed in Additional file 1: Fig. S6c. FTC238 and FTC133 cells were transfected using Microporator (Digital Bio). Cells were harvested after 48 h, and luciferase activity was measured with the Dual-Luciferase Reporter Assay System (Promega) using a microplate luminometer (Berthold). To normalize for transfection efficiency, the pRL-TK plasmid vector (Promega), which carries a Renilla luciferase reporter gene, was co-transfected with the reporter construct as described above. All assays were measured in triplicate.
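The normalization step can be illustrated with a short sketch: each firefly reading is divided by the Renilla reading from the same well, and the triplicate ratios are averaged. Expressing the result as a fold change over a promoter-only construct is an assumption added here for illustration, as are the function names and all readings.

```python
from statistics import mean


def normalized_activity(firefly, renilla):
    """Mean firefly/Renilla ratio across replicate wells."""
    return mean(f / r for f, r in zip(firefly, renilla))


def fold_over_control(construct_activity, control_activity):
    """Activity of a construct relative to a control construct."""
    return construct_activity / control_activity


# Invented triplicate readings (arbitrary luminescence units).
promoter_only = normalized_activity([50, 50, 50], [10, 10, 10])
with_enhancer = normalized_activity([100, 200, 300], [10, 20, 30])
fold = fold_over_control(with_enhancer, promoter_only)
```

Taking the ratio per well before averaging keeps a single poorly transfected well from dominating the result.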

Cancer gene enrichment analysis and other statistical analyses

To evaluate cancer gene enrichment in target genes dysregulated by gene UIB fusion, we utilized the cancer gene list (2862 genes) curated by the Bushman Laboratory. After duplicated and non-human genes were removed, 2102 genes remained. For up/downregulated target gene sets, Fisher’s exact test was performed. To test mutual exclusivity or co-occurrence between gene UIB fusions and tumor subtypes in breast cancer, the CoMet R package was used [52]. For testing the difference in IGF2BP3 expression between deceased and live patients, the Wilcoxon test was used. Expression differences between the IGF2BP3 UIB fusion-positive group and the negative group were tested using the two-sample t test. A p value < 0.05 was considered significant.
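An enrichment test of this kind can be sketched with the hypergeometric tail that underlies Fisher's exact test. The sketch assumes a one-sided test for over-representation (the text does not state whether the test was one- or two-sided); the function name and the toy counts are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def enrichment_p(hits, set_size, listed_genes, background_size):
    """One-sided Fisher's exact p value for over-representation.

    hits:            genes in the target set that are on the cancer gene list
    set_size:        number of genes in the target set
    listed_genes:    cancer-list genes present in the background
    background_size: total number of genes in the background
    """
    max_hits = min(set_size, listed_genes)
    total = comb(background_size, set_size)
    tail = sum(
        comb(listed_genes, k) * comb(background_size - listed_genes, set_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 on the list, and a target set of
# 3 genes that are all on the list.
p_toy = enrichment_p(hits=3, set_size=3, listed_genes=4, background_size=10)
```

The background size matters as much as the list size, so the same background (for example, all genes with expression data) should be used for both counts.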


Apoptosis is a continuous programmed cell death event that is responsible for periodic control of damaged cells during normal development, maintaining a balance of organismal homeostasis and preventing pathological autoimmunity and tumorigenesis. 6 Unfortunately, apoptosis inhibition or escape in tumour cells leads to abnormal survival and accumulation of dysfunctional cells. 8 Apoptosis is controlled by the BCL2 protein family, which includes pro-apoptotic and pro-survival members. Some members of the family, such as BCL2, are overexpressed in tumour cells, thus breaking the balance between life and death and resulting in unlimited proliferation 8, 19 (Figure 6I). Furthermore, the regulatory relationships among these proteins and DNA amplification, mutation, methylation, overall survival and chromatin accessibility have not been comprehensively examined in gynaecologic cancers.

Here, we evaluated the molecular characteristics of BCL2 family genes under the spatial pattern of chromatin in gynaecologic cancers. As noted, most prior studies have identified initiators, effectors and guardians of the BCL2 family within a unified dynamic model in a variety of cancers, leading to anti-apoptosis and drug resistance in tumour cells 21 (Figure 6I). Recent systematic analyses have uncovered the dynamics of BCL2 family regulation, particularly for anti-apoptotic members, as the most frequent somatic copy-number alteration in human cancer. 9, 15, 21 We found that BCL2L1 and MCL1 show higher expression levels than other BCL2 members, which is consistent with the genetic alteration results. Our observation that anti-apoptotic members are not widely expressed in gynaecologic cancers agrees with previous studies, and low expression levels and deep deletions were also observed in some patient samples for anti-apoptotic members. 39, 40 In terms of distal non-coding chromatin accessibility, both BCL2 and BCL2L1 have long-range promoter interactions that affect gene expression. Conversely, BCL2, BCL2L2, and MCL1 show downregulation across 2-3 different cancer types, despite distal BCL2 regulation. 41 These phenomena may be due to the enforced expression or activation of miRNAs that normally suppress BCL2 family expression, such as miR-15a or miR-16.1 targeting of BCL2; miR-29, miR-125 and miR-193 targeting of MCL1; or let-7 targeting of BCL2L2. 41-44 Of the remaining overexpressed members, BAX, NOXA, BIK, BID and BAK are pro-apoptotic. Enhancer signatures can be detected, and the mRNA expression levels of these pro-apoptotic genes may be affected by these distal signatures. However, active effectors (BAX and BAK) or initiators (PMAIP1, BIK and BID) cannot lead to apoptosis in tumours, and these pro-apoptotic proteins may be sequestered by guardians (BCL2 and BCL2L1) that have more potent enhancer activity to propel the anti-apoptotic mechanisms.

In our study of overall survival, BOK in OV, BAD and BIK in UCEC, BAK1 in CESC, and HRK in BRCA show noteworthy associations with overall survival. BAD and BIK regulate tumour growth in many cancers. 45-48 Furthermore, worse overall survival may be triggered by activated guardians (the anti-apoptotic proteins, e.g. BCL2L1), which can bind and neutralize BAD and BIK to mediate the inhibition of apoptosis. 15, 21, 38 BAD phosphorylation has recently been reported to promote tumour cell survival, and post-translational modification might therefore contribute to impaired pro-apoptotic proteins. 49 Overexpression of BAK1 has previously been associated with a favourable prognosis in breast cancer, 50 while BOK, BAK1 and HRK as effectors are inserted into the mitochondrial outer membrane, resulting in MOMP without sequestration by the guardians. 15, 21, 38 Overall, our results imply that BAD, BIK and BAK1 may be prognostic genes for clinical effects in gynaecologic cancer.

It is noteworthy that deep deletion and mutation were not observed widely in gynaecologic cancers, especially for pro-apoptotic genes, which differs from previous studies. 6, 8 For example, in multiple human tumours, mutations and deletions occur at higher frequency in BAX and BAK1, 14, 51-53 but mRNA upregulation serves as the main mechanism of BAX and BAK1 activation in gynaecologic cancer. The alteration of pro-apoptotic genes indicates that dysregulated mechanisms may be influenced by epigenetic or distal enhancer-promoter contacts to control gene expression in gynaecologic cancer. Our results suggest that anti-apoptotic genes and pro-apoptotic genes display different mechanisms of dysregulation in gynaecologic cancer.

DNA methylation in promoter regions can be highly heterogeneous during tumorigenesis and progression, 54 and thus we explored the relationship between DNA methylation and BCL2 family expression level in gynaecologic cancers. In general, almost all methylation of BCL2 family genes obviously influenced gene expression. The data suggest that BCL2L2, BCL2L11 and BBC3 might be affected by DNA methylation and gene amplification. Furthermore, BCL2, BCL2L1, BAK1, BAD, BOK, BIK, BMF and HRK might be more influenced by gene amplification and distal regulatory elements, with far less exposure than other members to methylation modification.

Beyond a limited number of genetic differences, the network from cBioPortal analysis was highly structurally similar for different gynaecologic cancers. Our discovery of co-mutation signatures for TP53 and PIK3CA with BCL2L1 is novel and was not revealed in a previous study of gynaecologic cancer. 55, 56 Co-mutation of TP53 and PIK3CA is a primary mediator of anti-apoptotic protein inhibition of MOMP that leads to a more aggressive phenotype with a worse prognosis in breast cancer. 56 These proteins are connected by co-mutation signatures. MCL1 and BCL2L1 transcriptional activity is consistent with chromatin accessibility of the region surrounding the gene loci, with more than one predicted distal enhancer. Therefore, we hypothesize that the co-mutation signature of TP53 and PIK3CA acts to stimulate the formation of chromatin loops at the BCL2L1 gene locus to strengthen its transcriptional activity. BCL2L1, as the only upregulated anti-apoptotic member, prevents the activation and oligomerization of pro-apoptotic members that act as guardians against the effectors or activators on the mitochondrial membrane, which maintains the balance of mitochondrial membrane potential and thus prevents the release of cytochrome c to suppress caspases. 39, 57-59 Using BCL2L1, TP53/TP63 and PIK3CA as potential prognostic markers may therefore be an effective approach for diagnosis and treatment of gynaecologic cancer.

The resulting network models showed specific differences in YWHAZ amplification in BRCA, PIK3CB amplification in CESC, PIK3R1 mutation in UCEC, and downregulation of XRCC6, NMT1 and CASP3 in OV, which suggests that the BCL2 family protein network can be used to identify different types of gynaecologic cancer. YWHAZ has been shown to stimulate lung cancer cell proliferation and metastasis and promote the invasion of breast cancer cells, 60 suggesting that it might serve as a therapeutic target of breast cancer. 61 Mutation of PIK3CB, as the catalytic subunit in the PI3K signalling pathway, drives tumour cell growth and migration. 62 PIK3CB has been reported as a selective survival factor in glioblastoma. 63 Furthermore, co-mutation of PIK3R1 and PIK3CA is associated with oncogenesis and hyperactivity of the PI3K signal pathway in breast cancer, supporting an oncogenic role of the co-mutation pair. 64 Loss of PIK3R1 is an effective therapeutic mechanism for PIK3CA-positive breast cancers. 65 On the other hand, activation of CASP3 is involved in the initiation of cell apoptosis, 66 inhibition of NMT1 regulates breast cancer oncogenesis by the JNK pathway, 67 and inactive XRCC6 fails to protect genomic integrity. 68 Therefore, our findings further validate previous studies demonstrating that downregulation of XRCC6, NMT1 and CASP3 is significantly associated with tumorigenesis.

Long-range enhancer-promoter gene expression is facilitated and constrained by the 3D architecture of mammalian genomes, which plays a key role in disease. 69 We demonstrated that the significant differential expression of the BCL2 family shows a signature of chromatin accessibility. We systematically identified spatiotemporal patterns of gene expression of the BCL2 family orchestrated by distal chromatin accessibility. The chromatin accessibility profile had a similar distribution in different tumour samples, which is likely dictated by the folding of chromatin loops within the 3D topography of the genome to bring enhancers in close spatial proximity with promoters and accelerate RNA polymerase recruitment. 69-72 These findings are significant for designing new medicines based on the molecular characteristics of high tumour heterogeneity surrounding the BCL2L1 gene loci, including those that target specific topological features.

In conclusion, as the first systematic analysis of the molecular features of the BCL2 family under the spatial pattern of chromatin in gynaecologic cancer, our study broadens the therapeutic scope of the BCL2 family to the distal non-coding region. We demonstrated that differential expression of BCL2 family members occurs at different frequencies. Furthermore, we identified the relationship between overall survival, enhancer signature, gene amplification and DNA methylation. Our results also establish a shared protein regulatory network in which the co-mutation signatures of TP53 and PIK3CA interact with BCL2L1, which provides a new strategy for biomarker identification in oncotherapy.

Chapter Five - Cancer type-specific alterations in actin genes: Worth a closer look?

Actins form a strongly conserved family of proteins that are central to the functioning of the actin cytoskeleton partaking in natural processes such as cell division, adhesion, contraction and migration. These processes, however, also occur during the various phases of cancer progression. Yet, surprisingly, alterations in the six human actin genes in cancer studies have received little attention and the focus was mostly on deregulated expression levels of actins and even more so of actin-binding or regulatory proteins. Starting from the early mutation work in the 1980s, we propose based on reviewing literature and data from patient cancer genomes that alterations in actin genes are different in distinct cancer subtypes, suggesting some specificity. These actin gene alterations include (missense) mutations, gene fusions and copy number alterations (deletions and amplifications) and we illustrate their occurrence for a limited number of examples including actin mutations in lymphoid cancers and nonmelanoma skin cancer and actin gene copy number alterations for breast, prostate and liver cancers. A challenge in the future will be to further sort out the specificity per actin gene, alteration type and cancer subtype. Even more challenging is (experimentally) distinguishing between cause and consequence: which alterations are passengers and which are involved in tumor progression of particular cancer subtypes?


DNA hypermethylation and upregulated gene expression is a robust association pattern

To initially limit the number of methylation sites, we first analysed the dataset from Absher (Table 1), which focuses on promoter regions. Of the 27,578 DNA methylation probes analysed in Absher, we identified 6110 probes that gained methylation and 2916 that lost methylation when PCa samples were compared to normal samples. We then assessed the robustness of these differences by comparing them with corresponding PCa-to-normal changes in the datasets from Kirby and TCGA (Table 1). Among 11,375 corresponding probes with data in all three methylation datasets, 4557 were significantly hypermethylated in PCa compared to normal tissue samples, while 1786 were significantly hypomethylated (p < 0.05) in all three cohorts (Fig. 2a). These probes were associated with 3326 and 1502 genes, respectively (Fig. 2b). A few genes were recurrent among the top 5 most significantly hypermethylated. Genes SOSTDC1 and FLT4 are shared between the Absher and Kirby datasets, while the gene CYBA is shared between the Absher and TCGA datasets.

Number of genes and probes in the three DNA methylation datasets Absher, Kirby and TCGA with different DNA methylation and gene expression statuses in PCa compared to normal tissue samples. The resemblance between the datasets is high in terms of: (a) probes with gain and loss of methylation, (b) genes with gain and loss of methylation and (c) genes classified in the groups UPUP, UPDOWN, DOWNUP and DOWNDOWN based on correspondence between methylation and gene expression. Red indicates the fraction of overlapping probes/genes, while grey indicates non-overlapping probes/genes.

DNA methylation results were then combined with a dataset of previously identified robust gene expression changes in PCa [35] to distinguish four groups of regulation patterns (UPUP: methylation gain, expression upregulation; UPDOWN: methylation gain, expression downregulation; DOWNUP: methylation loss, expression upregulation; DOWNDOWN: methylation loss, expression downregulation). As expected, most genes (1476 overlapping genes in Absher, Kirby and TCGA, p < 0.05) followed the canonical pattern where hypermethylated promoters lead to downregulated expression (UPDOWN group, Fig. 2c). However, a large number of hypermethylated genes (713, p < 0.05) were associated with increased expression (UPUP group, Fig. 2c). These observations were similarly robust for UPUP and UPDOWN groups: on average 89% of the UPDOWN and 80% of the UPUP methylation changes were present in all three datasets (Additional file 1: Table S1). Genes from the UPDOWN group displayed on average higher methylation fold changes than genes from the UPUP group, and a higher negative impact on gene expression for a subset of genes (Fig. 3), supporting the UPDOWN pattern as the most important mode of regulation. However, UPUP genes also showed a comparably strong positive association between DNA methylation and gene expression (Fig. 3), supporting the additional relevance of the UPUP pattern. Methylation changes are weaker and less abundant for the genes in the DOWNUP and DOWNDOWN groups compared to the two groups with hypermethylation. Only 70% of methylation changes were present in all three datasets, with a noticeably poorer overlap in the TCGA dataset (Fig. 2, Additional file 1: Table S1). Average fold changes are also smaller for DOWNUP and DOWNDOWN genes (Fig. 3).
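The four-group labelling can be expressed as a small function. This is an illustrative sketch only: it assumes each gene has already passed the significance filters and is summarised by one signed methylation change and one signed expression change (positive meaning higher in PCa); the function name and input values are hypothetical.

```python
def regulation_group(methylation_change, expression_change):
    """Label a gene UPUP, UPDOWN, DOWNUP or DOWNDOWN.

    The first half of the label is the direction of the methylation change,
    the second half the direction of the expression change. Returns None
    when either change is zero, since no direction can be assigned.
    """
    if methylation_change == 0 or expression_change == 0:
        return None
    methylation = "UP" if methylation_change > 0 else "DOWN"
    expression = "UP" if expression_change > 0 else "DOWN"
    return methylation + expression


# Invented (methylation change, expression change) pairs.
examples = {
    "GENE_A": (0.8, 1.2),
    "GENE_B": (0.5, -0.9),
    "GENE_C": (-0.4, 0.7),
    "GENE_D": (-0.3, -0.6),
}
groups = {gene: regulation_group(m, e) for gene, (m, e) in examples.items()}
```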

UPDOWN genes displayed higher methylation fold changes than UPUP genes. However, UPUP genes also demonstrate a strong positive association between methylation and gene expression, supporting the additional relevance of the UPUP pattern in gene regulation. One hundred fifty genes from the UPUP, UPDOWN, DOWNUP and DOWNDOWN groups with the highest DNA methylation fold changes were selected, and their gene expression and average DNA methylation fold changes were visualized, where each data point represents one gene. Average methylation fold changes were calculated from all corresponding probes in the Absher, Kirby and TCGA datasets.

In the three DNA methylation datasets the majority of genes showed consistent association with either hypermethylated or hypomethylated probes (Additional file 1: Table S2). For genes associated with multiple probes (on average 266 and 697 genes in the UPUP and UPDOWN groups, respectively) less than 2% showed association with both hypermethylated and hypomethylated probes. Some of these genes, such as GNAS and PEG10, showed the same inconsistent associations in all three datasets. Inconsistency was higher in the DOWNUP and DOWNDOWN groups, with 2.21 and 6.81% of genes on average in three datasets with both hyper- and hypomethylated probes (Additional file 1: Table S2).

Since the UPUP genes formed the non-canonical group with the most consistent methylation/expression pattern, we decided to focus on this group in the remaining part of this study. However, we also made a parallel analysis of UPDOWN genes to see how the UPUP group compared to the classical UPDOWN pattern in terms of robustness of the observed patterns.

UPUP gene methylation patterns are robust when expanding the number of probe-gene associations with HM450 data

The Absher DNA methylation dataset has a significantly smaller number of probes, compared to the HM450 BeadChip used in Kirby and TCGA. To investigate further the methylation pattern around UPUP genes we extended our analysis to all gene-probe associations in the HM450 reference and compared UPUP and UPDOWN genes in this extended setting. This substantially increased the number of gene-probe associations (125,704 associations in total with an average of 16.4 probes per gene, compared to 11,375, with an average of 1.5 for the 27 k).

The initial set of UPUP genes was filtered according to the methylation patterns of their associated HM450 probes. All UPUP genes with at least one significantly (p < 0.05) downregulated methylation probe were removed, reducing the number of UPUP genes from 713 to 105. This UPUP-only group thus consists of genes which either have only upregulated methylation probes or a combination of upregulated and non-differentially expressed probes, but no downregulated methylation probes. The same strategy was applied to create an UPDOWN-only group of genes, reducing the number of UPDOWN genes from 1476 to 192. Genes in the UPUP-only and UPDOWN-only groups have on average 9.5 and 9.8 hypermethylated probes per gene, respectively. Moreover, 78.10% of all UPUP-only genes have more than 50% of the associated probes hypermethylated, while the corresponding number for UPDOWN-only genes is 46.35%. In addition, 11.43% of UPUP-only genes have all associated probes consistently hypermethylated, compared to 7.29% of the UPDOWN-only genes. Thus, when increasing the number of methylation probes using HM450 data, we still observe comparable robustness of gene-probe associations in the UPUP-only and UPDOWN-only groups of genes. This strengthens the indication that the observed UPUP pattern constitutes a biologically relevant epigenetic layer of gene regulation. The two refined groups of genes (UPUP-only and UPDOWN-only) with unambiguous methylation patterns (no associated probes with methylation loss) were analysed further.
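The "-only" filter can be sketched as follows: a gene is kept when none of its probes shows significant methylation loss and at least one shows significant gain. The status labels ("hyper", "hypo", "ns"), the function name, and the example genes are assumptions for illustration, not the authors' code or data.

```python
def only_group(probe_status_by_gene):
    """Keep genes with no hypomethylated probe and one or more hypermethylated.

    probe_status_by_gene: dict mapping gene -> list of probe statuses, each
    "hyper" (significant gain), "hypo" (significant loss) or "ns".
    Returns a dict mapping each kept gene to its hypermethylated fraction.
    """
    kept = {}
    for gene, statuses in probe_status_by_gene.items():
        if "hypo" in statuses or "hyper" not in statuses:
            continue
        kept[gene] = statuses.count("hyper") / len(statuses)
    return kept


probes = {  # placeholder genes and probe calls
    "GENE_A": ["hyper", "hyper", "ns"],
    "GENE_B": ["hyper", "hypo"],        # removed: one probe lost methylation
    "GENE_C": ["hyper", "hyper"],
    "GENE_D": ["ns", "ns"],             # removed: no hypermethylated probe
}
kept = only_group(probes)
share_majority = sum(f > 0.5 for f in kept.values()) / len(kept)
share_all = sum(f == 1.0 for f in kept.values()) / len(kept)
```

The two summary shares correspond to the kinds of percentages reported above (genes with more than 50% of probes hypermethylated, and genes with all probes hypermethylated).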

Probes associated with UPUP-only genes demonstrate a distinct correlation pattern between DNA methylation and gene expression compared to probes associated with UPDOWN-only genes

The TCGA cohort contains gene expression and DNA methylation measured on the exact same samples (in this text defined as the TCGA combined dataset). This means that expression and methylation profiles are directly comparable, with minimal confounding by varying tumour content and tissue composition. We used the TCGA combined dataset to compare the strength of gene-probe associations for the UPUP-only and UPDOWN-only gene groups (105 and 192 genes, respectively) by calculating the Pearson correlation between TCGA combined methylation and expression profiles across all samples. The probes were assigned to different correlation groups, based on the strength and the sign of their correlation values (very strong negative to very strong positive correlation) (Additional file 1: Table S3).
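
A minimal sketch of the per-probe correlation and grouping step is given below. The cut points between correlation groups are assumptions chosen for illustration (the actual boundaries are defined in Additional file 1: Table S3, which is not reproduced here), and the function names and toy profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Assumed cut points on |r|; the paper's own boundaries may differ.
BINS = [(0.8, "very strong"), (0.6, "strong"), (0.4, "intermediate"), (0.2, "weak")]


def correlation_group(r):
    """Assign a correlation value to a named group by strength and sign."""
    sign = "positive" if r > 0 else "negative"
    for cutoff, name in BINS:
        if abs(r) >= cutoff:
            return f"{name} {sign}"
    return "negligible"


# Toy methylation (beta values) and expression profiles across five samples.
methylation = [0.10, 0.20, 0.30, 0.40, 0.50]
expression = [5.0, 4.1, 3.2, 1.9, 1.0]

r = pearson(methylation, expression)
group = correlation_group(r)
print(f"r = {r:.3f} -> {group}")
```

In this toy case expression falls as methylation rises, which is the canonical UPDOWN behaviour; a UPUP-like probe would instead land in one of the positive groups.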

As expected, probes for the UPDOWN-only genes generally display a negative correlation, with most probes in the intermediate correlation group (27.14%) (Fig. 4, Additional file 1: Table S3), and only a small number of UPDOWN-only probes show a positive correlation. Correspondingly, most of the UPUP-only probes (15.17%) have intermediate positive correlation. However, genes in the UPUP-only group are also somewhat associated with weak and intermediate negatively correlated probes. Nevertheless, the differences observed in Fig. 4 demonstrate that the UPUP-only probes follow a distinct correlation pattern compared to UPDOWN-only probes, though the overall positive association between methylation and gene expression for UPUP-only probes is weaker than the corresponding anticorrelation for UPDOWN-only group of probes.

The UPUP-only group shows a weaker correspondence to the UPUP pattern than the UPDOWN-only group shows to the UPDOWN pattern, although a clear difference between the two groups can be seen. Most of the UPDOWN-only DNA methylation probes are negatively correlated with the expression of the corresponding genes, while a few are positively correlated. The UPUP-only pattern includes some negatively correlated probes, but the larger fraction still shows positive correlation. Overall, the UPUP and UPDOWN patterns are clearly distinct.

UPUP-only probes are more closely associated with TSSs of the associated genes compared to UPDOWN-only probes

We calculated the distance between each hypermethylated probe and TSSs of the associated genes in the UPUP-only and UPDOWN-only groups, hypothesizing that sites closer to TSS might have a higher impact on the expression level than sites further away from TSS. When comparing calculated distances and average methylation fold change of the genes in each of the two groups, it is clear that there are more UPDOWN-only than UPUP-only probes with a higher fold change closer to the TSS, and that this is consistent across a region of at least +/− 400 bp around the TSS (Fig. 5, Additional file 1: Table S4). On the other hand, a far larger fraction of UPUP-only genes (57.14%) are enriched for hypermethylated probes most proximal to the TSS (+/− 50 bp), compared to UPDOWN-only genes (26.04%) (Additional file 1: Figure S1). The distribution of probes with a smaller fold change does not show any clear differences between the two groups (Fig. 5). More than 80% of all probes (both high and low fold changes) are located in the window of − 1500 to 1500 bp from the TSSs of the associated genes and all genes have at least one hypermethylated probe located in this region (Additional file 1: Figure S1). Somewhat fewer probes from both groups are located upstream from the TSS (46.01% of all UPUP-only probes and 47.42% of UPDOWN-only probes).
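
The distance calculation can be illustrated with a short, strand-aware sketch. The coordinates, function names, and the simplification of counting probes (rather than genes) inside each window are assumptions made for this example and do not reproduce the analysis behind Fig. 5.

```python
def signed_distance_to_tss(probe_pos, tss_pos, strand):
    """Distance in bp from TSS to probe; negative means upstream of the TSS.

    On the minus strand, larger genomic coordinates lie upstream, so the sign flips.
    """
    delta = probe_pos - tss_pos
    return delta if strand == "+" else -delta


def fraction_within(distances, window):
    """Fraction of probes whose absolute distance to the TSS is at most `window` bp."""
    if not distances:
        return 0.0
    return sum(1 for d in distances if abs(d) <= window) / len(distances)


# Toy probes: (probe position, TSS position, gene strand); coordinates are made up.
probes = [
    (1030, 1000, "+"),  # 30 bp downstream
    (900, 1000, "+"),   # 100 bp upstream
    (5040, 5000, "-"),  # 40 bp upstream on the minus strand
    (3000, 5000, "-"),  # 2000 bp downstream on the minus strand
]
distances = [signed_distance_to_tss(p, t, s) for p, t, s in probes]

near_tss = fraction_within(distances, 50)       # the +/- 50 bp window from the text
broad_window = fraction_within(distances, 1500)  # the -1500 to 1500 bp window
upstream = sum(1 for d in distances if d < 0) / len(distances)
print(distances, near_tss, broad_window, upstream)
```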

UPUP-only and UPDOWN-only genes show different distribution close to TSS for methylation probes with high fold changes. The distribution of probes with lower fold change is similar in the two groups (light blue for UPUP-only probes, light red for UPDOWN-only probes)

In addition, we checked the location of hypermethylated probes in the genome regions of particular importance to regulation of gene expression — CGIs, their shores or shelves. For UPUP-only group, 81.90% have significantly hypermethylated probes located in one of the three analysed genomic region types. This is higher than 73.44% for UPDOWN-only group. However, similar fractions of genes in both groups have hypermethylated probes in at least one of the three region types (Additional file 1: Table S5). Thus, if these three types of regions are indicative of regulatory potential, the observed similarities should indicate a comparable regulatory potential for both groups of genes. Taken together, the analysis of probe distances to TSS and genomic locations of the probes implies a robust regulatory relationship between DNA methylation and gene expression for both UPUP-only and UPDOWN-only groups of genes.

We also counted the fractions of methylation sites for genes in UPUP-only and UPDOWN-only groups found in 5’UTR, 3’UTR, exons, coding exons, introns, first exons, first coding exons and first introns (Table 2). The highest fractions of hypermethylated and non-differentially methylated UPUP-only and UPDOWN-only probes are located in exons and introns, but more than half of the probes are specifically located in the first exon and intron. UPDOWN-only genes have slightly more hypermethylated probes in the first intron and exon than UPUP-only genes. However, the fraction of hypermethylated probes in the first coding exons is similar for UPDOWN-only and UPUP-only genes, suggesting that the main difference for UPDOWN-only hypermethylation is in 5’UTR exons. UPDOWN-only genes also have a higher fraction of non-differentially methylated probes in the 3’UTR regions, supporting the slight bias of hypermethylated UPDOWN-only probes towards the beginning of the gene compared to UPUP-only probes. Apart from this, there are only subtle differences between UPDOWN-only and UPUP-only genes in their methylation site association to the gene region categories.

Genes in UPUP-only and UPDOWN-only groups are associated with the same regulatory mechanism, but affect genes in different cellular compartments

Genes from the UPUP-only and UPDOWN-only groups are associated with the same regulatory mechanism. Gene set enrichment analysis in Enrichr showed that both groups of genes were significantly associated with the transcription factor SUZ12 in two categories, ‘ENCODE TF ChIP-seq 2015’ and ‘ENCODE and ChEA Consensus TFs from ChIP-X’ (p < 0.001), indicating possible involvement in this cellular regulatory network. The combined enrichment score from the ‘Consensus’ category was lower for UPUP-only genes than for UPDOWN-only genes (38.68 and 53.42, respectively). Results from ‘ENCODE Histone Modifications 2015’ reinforce the link to regulatory functions: both gene groups were linked to the H3K27me3 histone modification, which is known to interact with (or be modulated by) the Polycomb complex, which also includes SUZ12.

Interestingly, the clearest difference between UPUP-only and UPDOWN-only genes was observed in the ‘Jensen COMPARTMENTS’ category. In this category the UPUP-only genes showed statistically significant association (p < 0.001) with terms related to nuclear chromatin, nucleosomes, DNA packaging and protein-DNA complexes. The combined enrichment scores of the top 5 most significant hits vary from 43.24 for ‘Nuclear_chromosome’ to 52.99 for ‘Nuclear_chromatin’. In comparison, UPDOWN-only genes showed association with terms related to extracellular features, including extracellular exosome, vesicle, organelle, membrane-bounded vesicle and a cytoskeletal component (type III intermediate filament). The combined enrichment scores for the top 5 hits were also considerably higher for UPDOWN-only genes, ranging from 207.15 for ‘Extracellular_organelle’ to 325.07 for ‘Extracellular_region’.

Distribution of hypermethylated probes along UPUP-only genes is more complex compared to UPDOWN-only genes

We selected the top 10 most significantly hypermethylated UPUP-only and UPDOWN-only genes to investigate how the detailed distribution of methylation probes differs in the local genomic region surrounding these genes (Additional file 1: Table S6). Observing the top 10 genes from the canonical UPDOWN-only group, we spotted a clear trend: hypermethylated probes tend to form a cluster around the TSS of the associated genes, and this cluster usually overlaps with a CGI. The formation of clusters is here evaluated visually. The distances of probes to the TSS vary, with the majority being more distant than 50 bp. The UPDOWN-only genes can be grouped according to how hypermethylated and non-differentially methylated probes are distributed across the genes. Three genes (PLA2G3, WFDC2 and MFAP4) have one or two additional significantly hypermethylated probes located away from the cluster, which are less significantly hypermethylated according to the p-value. Seven genes (SCGB3A1, EFS, KLF8, COL3A1, TMEM106A, RGN and SPARCL1) have one to three non-differentially methylated probes located outside the hypermethylated cluster.

The ten most significant genes from the UPUP-only pattern are more challenging to group based on the distribution of the probes. However, two groups of genes with similar probe distribution patterns can be distinguished. Five genes (CPT1B, LTK, ZAR1, SRPX2 and LRRC25) are similar to the seven-gene UPDOWN-only pattern, with a cluster of hypermethylated probes around the TSS and one to three non-differentially methylated probes located outside of the cluster. Three genes (GSC, FEV and HIST1H3E) do not display any clear clusters of hypermethylated probes and also have at least four associated non-differentially methylated probes, which show no systematic distribution pattern. However, hypermethylated probes for this group of genes do overlap with CGIs. The last two genes in the UPUP-only group cannot be assigned to any clear pattern. The gene TLX1 is associated with 34 significantly hypermethylated probes and one non-differentially methylated probe (Fig. 6). The gene has three CGIs, which are covered by three clusters of probes, with a particularly dense cluster at one of the two TSSs of the gene. The other TSS is sparsely covered by four hypermethylated probes, two of which are among the least significantly hypermethylated (Pos31 and Pos32). In addition, DNA methylation fold changes are higher for the denser cluster that covers the second TSS. This distribution of methylation probes and the change in their methylation status could indicate usage of an alternative TSS, which could explain the upregulated gene expression as a result of insufficient hypermethylation of that alternative TSS. However, DBTSS and ZENBU do not show any alternative TSSs for this gene in prostate or other tissues. The last gene, TSPAN16, is the only gene in the UPUP-only top ten list that has five non-differentially methylated probes and only one very significantly hypermethylated probe, which also overlaps with the TSS.
Overall, we observe that although distribution patterns for UPUP-only genes have similarities with the patterns for UPDOWN-only genes, the UPUP-only pattern is more difficult to generalize due to its higher complexity.

UCSC Genome Browser window for the gene TLX1 together with methylation fold changes for each visualized position in the same order. The distribution of probes associated with this gene is a distinctive example for the UPUP-only regulation pattern group. Thirty-four significantly hypermethylated probes cover three CGIs with a higher density for one of the TSSs. Pos1–34 stand for significantly (p < 0.001 and p < 0.05) hypermethylated probes from most (Pos1) to least (Pos34) significantly methylated probe. PosA is a non-differentially methylated probe

Expression and Gene Regulation Network of TFF1 in Esophageal Carcinoma

*Corresponding author: Xiaolong Li, Department of Cell Biology and Genetics, School of Preclinical Medicine, Guangxi 530021, P.R. China and Yingwen Huang, Department of Central Laboratory, 89-9 Dongge Road, Nanning, Guangxi 530000, P.R. China

Submission: March 11, 2020; Published: July 20, 2020

ISSN: 2637-773X, Volume 5, Issue 1


TFF1, a member of the trefoil factor family (TFF), is an antiproteinolytic peptide. Abnormal TFF1 expression is associated with carcinogenesis. To investigate the expression of TFF1 in esophageal carcinoma (ESCA) and its potential gene regulatory network, we used sequencing data from The Cancer Genome Atlas database and Gene Expression Omnibus to analyze TFF1 expression and gene regulation networks in ESCA. TFF1 expression profiling was analyzed using Oncomine, while TFF1 mutations and related functional networks were identified using cBioPortal. LinkedOmics was used to identify genes differentially expressed in correlation with TFF1 and to analyze Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathways. We found that TFF1 is overexpressed in ESCA, that deletion is the most common TFF1 mutation type in ESCA, and that TFF1 gene mutation may also significantly affect the prognosis of ESCA patients. Functional network analysis showed that TFF1 may play a role in ESCA by participating in the NF-κB signaling pathway and the Hippo signaling pathway. Our results demonstrate that data mining efficiently reveals information about TFF1 expression and potential regulatory networks in ESCA, laying a foundation for further study of the role of TFF1 in carcinogenesis.

Keywords: Trefoil factor family; Esophageal carcinoma; Bioinformatics


Esophageal carcinoma (ESCA) ranks seventh in cancer incidence and sixth in cancer mortality worldwide [1]. According to histological classification, ESCA includes squamous cell carcinoma (ESCC), adenocarcinoma (EAC) and undifferentiated carcinoma. Among them, squamous cell carcinoma is the most common type of esophageal carcinoma [2]. In Asia, more than 90% of esophageal carcinomas are ESCC, while EAC mostly occurs in European and American countries [3]. China is one of the countries with a high incidence of esophageal cancer, most of which is ESCC. It has been reported that ESCC ranks fifth in cancer incidence and fourth in cancer mortality in mainland China [4]. ESCA is a highly lethal malignancy with a low survival rate [5]. Despite advances in diagnosis and treatment, most cases are diagnosed at an advanced stage and have a poor prognosis due to rapid metastasis [6]. Several studies have reported that more than 50% of patients already have metastases at initial diagnosis [3]. According to the stage of metastasis, malignant tumors can be divided into those with local, regional, and distant metastases. It has also been determined that the distant metastatic stage can cause death in approximately 50% of patients with malignancies [7]. Moreover, the prognosis of patients with ESCA is poor due to late diagnosis and poor response to treatment, and the 5-year survival rate is no more than 15% [4,8-10]. Consequently, finding biomarkers for preclinical diagnosis and risk stratification of ESCA is a pressing and essential task. New ESCA-related biomarkers may be discovered by screening gene networks related to tumor formation and progression.

Members of the trefoil factor family (TFF) include TFF1 or gastric peptide (pS2), TFF2 or spasmolytic peptide (SP), and TFF3 or intestinal trefoil factor. Structurally, a cysteine-rich motif is conserved among members of the TFF family. This motif can form a tricyclic structure through disulfide bonds, the so-called “three-leaf” domain. Previous studies have found that gastrointestinal mucosal epithelial cells express TFFs at high levels, and that TFFs help mucosal cells resist damage and protect the structural integrity of the mucosal epithelium [11]. However, the pathophysiological state of the tissue also greatly affects the physiological activity of TFFs [12]. Because the expression levels of TFFs vary across tumors and across stages of tumor development, their role in tumorigenesis is still controversial. TFF1 was originally found in breast cancer cells [13]. Some studies have suggested that TFF1 promotes cancer development [14,15]; however, other research has found that TFF1 predicts a positive response to hormone therapy in breast cancer [16,17]. In addition, a number of studies in different tumor tissues, including colon, pancreatic, and ovarian cancer, have indicated that the survival, invasion, and metastasis of tumor cells may be regulated by TFF1 [18-20]. Furthermore, TFF1 knockout mice have a high incidence of pyloric adenoma, and 30% of these adenomas progress to malignant gastric cancer [21].

These previous results suggest that TFF1 may be a new tumor marker. However, the expression of TFF1 in esophageal cancer and its gene regulatory network have not been reported. Therefore, the expression characteristics and mutation rate of TFF1 in ESCA were analyzed in this study, based on data from multiple public databases including The Cancer Genome Atlas (TCGA). Furthermore, TFF1-related genomic changes and functional regulatory networks in ESCA are described using a multidimensional analysis method. Our findings may provide novel biomarkers and strategies for early diagnosis and intervention in ESCA.

Materials and Methods

Oncomine analysis

TFF1 mRNA expression in ESCA was analyzed in the Oncomine 4.5 database. Oncomine is a comprehensive database and data mining platform for tumor-related gene research, including 715 gene expression data sets and data from 86,733 cancer and normal tissues [22]. In this study, differences in TFF1 transcription levels between specific tumors and adjacent normal tissues were obtained from the Oncomine database. A series of ESCA datasets, including the Kim, Kimchi and Hao datasets, were used in this study. Expression profiling of TFF1 in ESCA was compared with that in normal tissue. The t test was applied to analyze the difference between transcription levels, and a p value less than 0.01 was regarded as statistically significant. The mRNA data were filtered using the following thresholds: a fold change of at least 1.5 and a gene rank in the top 10%.

GEPIA analysis

The expression of TFF1 in esophageal cancer tissues and normal tissues was assessed using the Gene Expression Profiling Interactive Analysis (GEPIA) platform. GEPIA is a web-based tool that enables the interactive analysis of cancer and normal gene expression using data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) [23]. On this basis, box plots and tumor stage plots were generated. A p value less than 0.01 was considered statistically significant.

cBioPortal analysis

The cBioPortal is an open-access database that integrates and simplifies the contents of multiple cancer genome databases including TCGA, International Cancer Genome Consortium (ICGC) and GEO, and provides a friendly visual interface. In this study, cBioPortal was used to characterize TFF1 alterations in ESCA samples from TCGA. The search parameters included mutation and mRNA expression. The OncoPrint tab displays an overview of the genetic alterations of TFF1 in each sample. The Network tab visualizes the biological interaction network of TFF1 derived from public pathway databases, with color coding and filtering options based on the alteration frequency of each gene, and includes neighboring genes with a high alteration frequency. Kaplan-Meier plots were used to show the association between TFF1 gene mutation and overall survival (OS) in ESCA patients. The log-rank test was performed to determine the significance of the difference between survival curves. A p value less than 0.05 was regarded as statistically significant.
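
For readers unfamiliar with the log-rank test mentioned above, the following is a minimal two-group sketch using only the Python standard library. It is not cBioPortal's implementation; the function name and the toy survival times are hypothetical, and the p-value assumes the usual chi-square approximation with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 if death was observed, 0 if censored.
    Returns (chi-square statistic, p-value from a 1-df chi-square approximation).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                 # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)    # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)           # deaths at time t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square, 1 df
    return chi2, p_value


# Toy data in months (made up): an "altered" group with short survival and an
# "unaltered" group with longer survival and some censored patients.
altered_times, altered_events = [5, 8, 12, 14], [1, 1, 1, 1]
unaltered_times, unaltered_events = [20, 25, 30, 32, 40], [1, 0, 1, 0, 0]

chi2, p_value = logrank_test(altered_times, altered_events,
                             unaltered_times, unaltered_events)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

With groups as small as the altered group in a typical single-gene query, the chi-square approximation is rough, so p-values close to the 0.05 threshold deserve cautious interpretation.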

Analysis based on the LinkedOmics database

Multi-omics data for the 32 TCGA cancer types were analyzed using the online database LinkedOmics [24]. The LinkFinder module of LinkedOmics was used to identify the differentially expressed genes related to TFF1 in the TCGA ESCA cohort. The results were statistically analyzed using the Pearson correlation coefficient. All results are presented graphically as volcano plots, heat maps or scatter plots. The LinkInterpreter module of LinkedOmics analyzes the pathways and networks of differentially expressed genes. The data in the LinkFinder results are signed and ranked in WebGestalt [25], and GO (CC, BP, MF) and KEGG pathways are analyzed by GSEA.
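
The idea of a signed, ranked gene list of the kind passed on to GSEA can be sketched as follows. This is an illustration only, not the LinkedOmics or WebGestalt code; the gene labels, expression values, and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(query, others):
    """Return (gene, r) pairs sorted from most positive to most negative correlation."""
    scores = {gene: pearson(query, values) for gene, values in others.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy expression profiles across five samples (made-up numbers).
query_expr = [1.0, 2.0, 3.0, 4.0, 5.0]      # stands in for the query gene
other_expr = {
    "GENE_X": [2.1, 3.9, 6.2, 8.0, 9.9],    # rises with the query gene
    "GENE_Y": [9.0, 7.2, 5.1, 2.8, 1.1],    # falls as the query gene rises
    "GENE_Z": [3.0, 1.0, 4.0, 1.0, 3.0],    # no linear relationship
}

ranked = rank_by_correlation(query_expr, other_expr)
for gene, r in ranked:
    print(f"{gene}\t{r:+.3f}")
```

Keeping the sign of the correlation in the ranking lets positively and negatively co-expressed genes fall at opposite ends of the list, which is what a rank-based enrichment method needs.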


TFF1 mRNA expression in cancers

The main function of the Oncomine database is to analyze gene expression differences, the correlation between gene expression and clinical features, and the co-expression of multiple genes. Data from the Oncomine database were used to assess TFF1 mRNA expression in different tumor tissues as well as normal clinical specimens. In this study, 650 data sets comprising 80,551 samples were included. The results show that TFF1 mRNA expression was significantly elevated in head and central nervous system tumors and in breast, esophageal, ovarian, and pancreatic cancers, while it was downregulated in colorectal, kidney and malignant mesenchymal tumors (Figure 1). Thus, there are notable differences in TFF1 mRNA expression among diverse tumor types.

Figure 1: Pooled analyses on the mRNA expression of TFF1 in various carcinoma types. The mRNA expression of TFF1 (cancer vs. corresponding normal tissue) was evaluated using the Oncomine database (red represents significant overexpression and blue represents reduced expression). The following parameters were used as thresholds: P 2 and gene ranking in the top 10%.

The Expression of TFF1 in ESCA

Two databases, Gene Expression Omnibus (GEO) and TCGA, were used to evaluate the difference in the transcription level of TFF1 in different ESCA tissues. Data from the Oncomine 4.5 database show that the level of TFF1 mRNA expression in ESCA is significantly higher than that in normal tissues (p < 0.05). The fold changes were all greater than 2 and TFF1 was ranked in the top 1% based on mRNA expression (Figure 2). Based on the analysis of 182 esophageal cancer (ESCA) samples in the TCGA database, TFF1 also maintained a high transcription level in each tumor stage subgroup, and the highest level was found in stage I (Figure 3). Therefore, TFF1 expression may be a potential diagnostic indicator of ESCA.

Genomic mutation analysis of TFF1 in ESCA

Based on the sequencing data of ESCA patients from the TCGA database, cBioPortal was used to analyze the type and frequency of TFF1 alterations in ESCA. TFF1 was altered in 10 of the 265 ESCA patients (Figure 4A). The alterations were deletions in 8 cases (3%) and amplifications in 2 cases (1%). Therefore, deletion is the most common type of TFF1 mutation in ESCA. In addition, as shown in Figure 4B, Kaplan-Meier plots indicate that TFF1 gene mutations are associated with shorter OS in ESCA patients (p = 0.0258). These results suggest that mutations in the TFF1 gene may also significantly affect the prognosis of ESCA patients.
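
The percentages quoted above follow directly from the counts in the text, as the short check below shows. The variable and function names are hypothetical; the counts (265 patients, 8 deletions, 2 amplifications) are taken from the paragraph.

```python
def percent(count, total):
    """Share of `count` in `total`, expressed as a percentage."""
    return 100.0 * count / total


total_patients = 265
alterations = {"deletion": 8, "amplification": 2}  # counts reported in the text

altered_total = sum(alterations.values())
overall = percent(altered_total, total_patients)
by_type = {kind: percent(n, total_patients) for kind, n in alterations.items()}
most_common = max(alterations, key=alterations.get)

print(f"altered: {altered_total}/{total_patients} = {overall:.2f}%")
for kind, value in by_type.items():
    print(f"{kind}: {value:.2f}%")
print("most common:", most_common)
```

Rounded to whole percentages, these give the 3% and 1% figures above and an overall alteration rate of about 4%.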

Figure 2: TFF1 transcription in Esophagus carcinoma (Oncomine). Levels of TFF1 mRNA were significantly higher in Esophagus carcinoma than in normal tissue. Shown are fold changes and associated p values, based on Oncomine 4.5 analysis. (A–C) Box plots showing TFF1 mRNA levels in the Kim Esophagus, Kimchi Esophagus, and Hao Esophagus datasets, respectively.

Figure 3: TFF1 transcription in subgroups of patients with Esophagus carcinoma, stratified based on Stage (GEPIA). (A) Boxplot showing relative expression of TFF1 in normal and ESCA samples. (B) Expression of TFF1 in ESCA based on individual cancer stages.

To assess the biological interaction network of TFF1 in ESCA, we used the network tool in cBioPortal to further analyze the genes that interact with TFF1. The analysis results show that the network contains 13 nodes, including the query gene TFF1 and 12 TFF1 neighboring genes. The type of interaction is derived from BioPAX: blue connections indicate that the first protein is involved in regulating the second protein, while red connections indicate that these proteins are parts of the same complex. The shade of the color represents the degree of mutation. The darker red nodes include nuclear receptor coactivators 1, 2 and 3, estrogen receptor 1, and N-methylpurine DNA glycosylase; these genes have a higher degree of mutation than other genes in ESCA (Figure 4C).

Figure 4: Visual summary of TFF1 alterations biological interaction network and TFF1 association with OS in Esophagus carcinoma (cBioPortal). (A) OncoPrint of TFF1 alterations in ESCA. The OncoPrint provides an overview of genomic alterations in TFF1 affecting individual samples (columns) in ESCA from the TCGA. The different types of genetic alterations are highlighted in different colors. (B) Genetic alterations in TFF1 were associated with shorter OS(P

Analysis of TFF1-related co-expressed genes in ESCA

Figure 5: Genes differentially expressed in correlation with TFF1 in Esophagus carcinoma (LinkedOmics). (A) A Pearson test was used to analyze correlations between TFF1 and genes differentially expressed in ESCA. (B) Heat maps showing genes positively and negatively correlated with TFF1 in ESCA (top 50). (C) Red indicates positively correlated genes and green indicates negatively correlated genes.

The mRNA data of 184 ESCA patients from TCGA were analyzed using the LinkedOmics functional module. As shown in Figure 5A and Figure 5B, 5370 genes (red dots) are significantly positively correlated with TFF1, while 896 genes (green dots) are negatively correlated with TFF1. Furthermore, the 50 most significant genes that are negatively or positively correlated with TFF1 are shown in Figure 5C. These results indicate that TFF1 has a broad influence on the transcriptome. Figure 6 shows strong correlations between the expression of TFF1 and the expression of CAPN8 (calpain 8, Pearson correlation coefficient = 0.90, p = 1.702e-68), CTSE (cathepsin E, Pearson correlation coefficient = 0.89, p = 6.33e-64), and REG4 (regenerating family member 4, Pearson correlation coefficient = 0.88, p = 7.97e-61), suggesting that TFF1 may participate in tumorigenesis and tumor development by regulating the expression of CAPN8, CTSE and REG4.

Analysis of GO and KEGG pathways of TFF1-related co-expressed genes in ESCA

The GO analysis based on Gene Set Enrichment Analysis (GSEA) indicated that the TFF1-related co-expressed genes are mainly located in the microsome, endoplasmic reticulum membrane and Golgi apparatus, and are chiefly involved in cell transport, nucleotide-sugar metabolism and endoplasm-related processes. Their molecular functions mainly include triggering lipase activity, carboxylate hydrolase activity, and hydrolase activity acting on glycosyl bonds (Figure 7A-7C). In addition, KEGG pathway analysis shows that TFF1 may play a role in ESCA by participating in signal transduction through the NF-κB and Hippo pathways (Figure 7D).

Figure 6: Gene expression correlation analysis for TFF1, CAPN8, CTSE and REG4 (LinkedOmics). The scatter plots show the Pearson correlation of TFF1 expression with expression of CAPN8 (A), CTSE (B), and REG4 (C).

Figure 7: Significantly enriched GO annotations and KEGG pathways of TFF1 in Esophagus carcinoma. The significantly enriched GO annotations and KEGG pathways of TFF1 co-expression genes in ESCA were analyzed using GSEA. (A) Cellular components. (B) Biological processes. (C) Molecular functions. (D) KEGG pathway analysis.


TFF1 is a protein expressed in various tissues such as the mucosal epithelium. It is one of the earliest members of the trefoil factor family (TFF) to be discovered and contains a conserved tricyclic domain, called the TFF domain [26,27]. Maintaining the integrity of the mucosa in the gastrointestinal tract and promoting the regeneration of damaged mucosal epithelial cells are the primary functions of TFFs [11]. It has been reported that the expression of TFF1 in gastric cancer is impaired, and that the incidence of gastric cancer is increased in mice lacking the TFF1 gene [28]. In addition, TFF1 is expressed at a higher level in colon, pancreatic and ovarian tumors than in other tumor tissues, and is involved in stimulating cell survival, migration, invasiveness, and tumor spread. It has been shown that TFF1 is a potential tumor marker [19,20].

Our results here show that in ESCA patients, TFF1 is overexpressed, and its mRNA expression is significantly correlated with the individual cancer stage of patients. Moreover, we found that the mutation rate of TFF1 in ESCA patients was 4%, and that deletion was the most common type of TFF1 mutation in ESCA. Mutation of the TFF1 gene may also significantly affect the prognosis of ESCA patients. Functional network analysis shows that TFF1 may play a role in ESCA by participating in the NF-κB signaling pathway and the Hippo signaling pathway. The mechanism that leads to the progression of ESCA has not been fully elucidated, and biomarkers for its early diagnosis have not been confirmed and applied, which leads to poor diagnosis and treatment results. Our data show that TFF1 expression in the tumor tissues of ESCA patients is higher than in normal tissues, especially in stage I esophageal tumors. The results suggest that TFF1 is worthy of further research as a candidate biomarker for early diagnosis of ESCC. TFF1 is a mucosal protective factor that is upregulated in response to mucosal damage, promoting mucosal repair and maintaining mucosal integrity [29]. TFF1 can suppress the activation of the NF-κB signaling pathway by down-regulating the expression of inflammatory factors and anti-apoptotic proteins. When inhibition by TFF1 is lacking, the NF-κB signaling pathway is over-activated in gastric cancer cells [30]. Similarly, our functional network analysis shows that TFF1 may play a role in ESCA by participating in the NF-κB signaling pathway and the Hippo signaling pathway. Upregulation of TFF1 expression in Barrett's esophagus (BE) has been reported to increase the incidence of esophageal adenocarcinoma, suggesting that upregulation of TFF1 is a characteristic of precancerous lesions. Our results are consistent with these previous observations [16,31]. Because gastric acid reflux is closely related to BE, mucosal damage due to gastric acid reflux may be one of the reasons for the upregulation of TFF1 expression in BE. In summary, these results indicate that TFF1 is involved in the occurrence and development of ESCA.

Recent studies have shown that EAC is similar to gastric adenocarcinoma in its molecular changes, while ESCC is more analogous to head and neck squamous cell carcinomas [32]. However, there are few studies on TFF1 expression in cancers of the head and neck. It has been found that squamous epithelial cells in the upper respiratory and digestive tracts of patients with head and neck cancer also undergo abnormal changes [33]. As a biomarker of early carcinogenesis, the expression of TFF1 in the esophageal mucosa may be used not only for the early diagnosis of squamous cell carcinoma of the head and neck, but also as a warning indicator for a second primary tumor of the esophagus (SPTE). In salivary gland tumors, the expression of TFF1, TFF2, and TFF3 is increased, while in oral squamous cell carcinoma the expression of TFF2 and TFF3 is decreased and the expression of TFF1 is increased, compared to healthy tissue [34]. However, based on so few studies, we cannot yet conclude whether the function of TFFs, especially in head and neck cancer, contributes to ESCC, and this issue needs further exploration. In addition, functional network analysis indicates that TFF1 may play a role in ESCA by participating in the NF-κB signaling pathway and the Hippo signaling pathway; however, the specific mechanism through which it acts needs further confirmation.

In summary, our results show that TFF1 is highly expressed in ESCA patients and that its expression is related to clinical cancer stage, with the highest expression in tumor stage Ⅰ. In addition, we observed that TFF1 mutations in ESCA patients are mainly deletions, and that TFF1 gene mutations are associated with shorter OS in ESCA patients. These results indicate that TFF1 may be a new tumor marker and may provide new targets and strategies for the diagnosis and treatment of ESCA.


  1. Natural Science Foundation of Guangxi Province (Grant number: 2017GXNSFAA198045).
  2. Natural Science Foundation of Guangxi Province (Grant number: 2017GXNSFAA198063).

Availability of Data and Materials

The datasets used and/or analyzed during the present study are available from the corresponding authors on reasonable request.
