A tremendous amount of RNA sequencing data has been produced by large consortium projects such as TCGA and GTEx, creating new opportunities for data mining and a deeper understanding of gene functions. We introduce GEPIA (Gene Expression Profiling Interactive Analysis), a web-based tool that delivers fast and customizable functionalities based on TCGA and GTEx data. GEPIA is available at http://gepia.cancer-pku.cn/.
Highlights of GEPIA:
1. GEPIA is an interactive web-based tool for gene expression analysis based on 9,736 tumors and 8,587 normal samples from the TCGA and the GTEx databases, using a standard processing pipeline for RNA sequencing data.
2. GEPIA provides key interactive functions including differential expression analysis, customizable expression profile plotting, correlation analysis, patient survival analysis, similar gene detection and dimensionality reduction analysis.
3. Analysis results cover ~20,000 coding and ~25,000 non-coding genes, as well as ~14,000 pseudogenes and ~400 T-cell receptor segments.
4. GEPIA provides rapid and customizable selections and publication-quality vector statistical plots for commonly used analyses. (We also provide a tutorial for modifying vector statistical plots using Adobe Illustrator.)
5. GEPIA automatically adjusts its look and feel to different browsers and devices, ranging from desktop computers to tablets and smartphones.
Some examples of GEPIA usage:
With GEPIA, experimental biologists can easily explore the TCGA and GTEx datasets, find answers to their questions, and test their hypotheses.
In differential analysis and expression profile, users can easily discover differentially expressed genes, such as MPO in leukemia and UPK2 in bladder cancer.
MPO specifically expressed in leukemia:
UPK2 specifically expressed in bladder cancer:
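The tumor-versus-normal comparison described above can be illustrated with a minimal sketch. It assumes expression values in TPM, a log2(TPM + 1) transform, and a comparison of group medians against a fold-change cutoff. The function names, the cutoff, and the toy values are hypothetical, and none of this is taken from GEPIA's own implementation.

```python
import math
from statistics import median


def log2_fold_change(tumor_tpm, normal_tpm):
    """Difference between group medians of log2(TPM + 1) values.

    Assumption: the transform and the use of medians are illustrative
    choices, not a description of what GEPIA computes.
    """
    tumor = median(math.log2(x + 1) for x in tumor_tpm)
    normal = median(math.log2(x + 1) for x in normal_tpm)
    return tumor - normal


def is_differential(tumor_tpm, normal_tpm, cutoff=1.0):
    """Flag a gene when |log2 fold change| exceeds a hypothetical cutoff."""
    return abs(log2_fold_change(tumor_tpm, normal_tpm)) > cutoff


# Toy values, not real measurements.
tumor = [31, 63, 127]
normal = [0, 1, 3]
change = log2_fold_change(tumor, normal)
flagged = is_differential(tumor, normal)
```

A real analysis would also attach a statistical test and multiple-testing correction to each gene; the sketch only shows the effect-size part of the comparison.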
The chromosomal distribution of over- or under-expressed genes can be plotted in Differential Genes.
Over-expressed genes:
Under-expressed genes:
Both over-expressed and under-expressed genes:
In Survival analysis, genes with the most significant association with patient survival can be identified, such as MCTS1 in breast cancer and HILPDA in liver cancer.
MCTS1 in breast cancer:
HILPDA in liver cancer:
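As a rough illustration of how an expression-based survival comparison can be set up, the sketch below splits patients at the median expression of a gene and computes a Kaplan-Meier survival estimate for a group. The median cutoff, the function names, and the toy data are assumptions made for this example; the text above does not specify the grouping or statistics GEPIA uses.

```python
from statistics import median


def split_by_median(expression):
    """Return (high, low) lists of patient indices, split at the median.

    Assumption: a median cutoff is one common choice, used here only
    for illustration.
    """
    cutoff = median(expression)
    high = [i for i, x in enumerate(expression) if x > cutoff]
    low = [i for i, x in enumerate(expression) if x <= cutoff]
    return high, low


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival_probability), ...].

    events[i] is True when death was observed at times[i] and False
    when the patient was censored at that time.
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Toy follow-up data (arbitrary time units), not real patients.
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

Comparing the curves of the high and low groups, for example with a log-rank test, is the step that would produce a significance value; it is left out here to keep the sketch short.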
Gene expression is visualized by both a bodymap and a bar plot in General.
Gene expression by pathological stage is plotted in Stage plot.
Users can compare the expression of one gene in multiple cancers by Boxplot, or compare multiple genes by a matrix plot in Multiple gene comparison.
Boxplot:
Matrix plot:
GEPIA provides pair-wise gene correlation analysis of a given set of TCGA and/or GTEx expression data. Normalization is optional and customizable.
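A pair-wise correlation of this kind can be sketched as a Pearson coefficient between two genes measured across the same samples. The optional log2(TPM + 1) transform below stands in for a customizable preprocessing step; the `log_scale` parameter and the function name are hypothetical and are not GEPIA options.

```python
import math


def pearson(x, y, log_scale=True):
    """Pearson correlation between two equal-length expression vectors.

    Assumption: when log_scale is True the values are transformed with
    log2(value + 1) first, as an example of an optional preprocessing step.
    """
    if log_scale:
        x = [math.log2(v + 1) for v in x]
        y = [math.log2(v + 1) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy expression values for two genes across three samples.
r = pearson([1, 3, 7], [2, 5, 20])
```

The function raises `ZeroDivisionError` if either gene has constant expression, since the correlation is undefined in that case.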
GEPIA provides Principal Component Analysis of multiple genes and cancer types in PCA, and presents the results as 2D or 3D plots.
2D plots:
3D plots:
Variance distribution:
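To make the idea behind the PCA plots and the variance distribution concrete, here is a minimal sketch that finds only the first principal component of a samples-by-genes matrix using power iteration, along with the fraction of total variance it explains. This is a teaching example with hypothetical names and toy data; it is not how GEPIA computes its PCA, and a numerical library would normally be used to obtain all components at once.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit_vector, explained_variance_fraction) for the first PC.

    rows is a list of samples, each a list of per-gene expression values.
    Power iteration is used for simplicity; it fails if the start vector
    happens to be orthogonal to the leading component.
    """
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix between genes.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Deterministic, non-uniform start vector.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance along v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    return v, eigenvalue / total_variance


# Toy matrix: three samples, two genes, perfectly collinear.
component, fraction = first_principal_component([[1, 2], [2, 4], [3, 6]])
```

Projecting each centered sample onto the first two or three components gives the coordinates for a 2D or 3D scatter plot, and the explained fractions of successive components form the variance distribution.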
Genes with similar expression patterns can be identified in Similar Genes; for example, PGAP3 and GRB7 are similar to ERBB2.
ERBB2:
PGAP3:
GRB7:
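One simple way to look for genes with a similar expression pattern is to rank all other genes by their correlation with a query gene across the same samples. The sketch below does this with a Pearson coefficient. The similarity measure, the function names, and the placeholder gene labels and values are assumptions for illustration only; the text above does not state which measure GEPIA uses.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def most_similar(query, profiles, top_n=2):
    """Rank genes by correlation with the query gene, highest first.

    profiles maps a gene label to its expression values across samples.
    """
    target = profiles[query]
    scores = [(name, pearson(target, values))
              for name, values in profiles.items() if name != query]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


# Placeholder labels and made-up values, not real expression data.
profiles = {
    "QUERY": [1, 2, 3, 4],
    "GENE_A": [2, 4, 6, 8.5],
    "GENE_B": [4, 3, 2, 1],
    "GENE_C": [1, 3, 2, 4],
}
ranked = most_similar("QUERY", profiles)
```

With these toy profiles, `GENE_A` ranks first and `GENE_C` second, while the anti-correlated `GENE_B` falls outside the top two.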