The cell line transcriptome

The term transcriptome refers to the full set of RNA molecules transcribed from the genome in a population of cells, or in a specific cell, at a given time point. In contrast to the genome, which is stable across the different cell types of an organism, the transcriptome varies greatly between cell types and developmental stages, and in response to internal or external cues. This plasticity, together with the potential of the transcriptome to serve as a proxy for cellular identity and diversity, makes it appealing to study, and advances in high-throughput technologies have made it possible to analyze RNA expression in great detail.

In the Cell Atlas, the expression of 19670 protein-coding genes is analyzed by RNA sequencing of mRNA extracted from unsynchronized cells in log-phase growth. The expression levels of gene-specific transcripts are given as normalized expression (NX) values, and transcripts with NX values ≥1 are considered detected. Genes are then classified according to the specificity and distribution of mRNA expression across a panel of 69 different human cell lines (Figure 1, Thul PJ et al. (2017)).
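
The detection rule can be illustrated with a minimal sketch. Only the NX ≥1 cutoff comes from the text; the function name `detected_genes`, the gene names and the NX values are hypothetical and chosen purely for illustration.

```python
# Detection cutoff stated in the text: a transcript counts as detected at NX >= 1.
NX_DETECTION_CUTOFF = 1.0


def detected_genes(nx_by_gene, cutoff=NX_DETECTION_CUTOFF):
    """Return the sorted names of genes whose NX value reaches the cutoff."""
    return sorted(gene for gene, nx in nx_by_gene.items() if nx >= cutoff)


# Hypothetical NX values for one cell line (illustrative only, not Atlas data).
example_cell_line = {"GENE_A": 25.3, "GENE_B": 1.0, "GENE_C": 0.4, "GENE_D": 0.0}

print(detected_genes(example_cell_line))  # ['GENE_A', 'GENE_B']
```

A value of exactly 1 counts as detected because the rule is stated as "≥1".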

The Cell Atlas presents RNA expression for 98% (n=19242) of all protein-coding human genes. The data can be used for various transcriptomic analyses and as a resource for selecting cell lines that express particular genes of interest.

A diversity of cell lines

The 69 different cell lines used in the Cell Atlas have been selected to represent various cell populations in different tissue types and organs of the human body. The selection also aims to mirror the origin and phenotype of the solid cancer types represented in the Pathology Atlas (Uhlen et al., 2017), but with an additional emphasis on cancer cell types in the hematopoietic and immune systems. In addition to cancer-derived cell lines, there are a number of cell lines that have been generated through in vitro protocols for immortalization of normal cells, some primary cell lines and one type of induced pluripotent stem cells. Details regarding the different cell lines can be found here.

Cell lines are adapted to cultivation in vitro, and many of the cell lines used in the Cell Atlas are human cancer cell lines. While this in some respects limits their resemblance to normal human cells in the context of tissues and organs, unbiased hierarchical clustering of global RNA expression (Figure 1) shows that the cell lines cluster in good agreement with similarities in origin and phenotype of the cancer cells from which they are derived. Groups of related cell lines, such as the immortalized and transformed fibroblastic cell lines (BJ derivatives), the glioma cell lines (U-138 MG and U-251 MG), the melanoma cell lines (WM-115 and SK-MEL-30), the breast cancer cell lines (SK-BR-3, MCF7 and T-47d) and the endothelial cell lines (TIME and HUVEC), cluster closely together. At the highest level of separation, cell lines that grow in suspension and represent the hematopoietic and lymphoid cell systems cluster together and separate into two major clusters depending on their myeloid or lymphoid origin/phenotype.


Figure 1. Hierarchical clustering based on RNA sequencing data for the 69 cell lines. The color of the cell line name represents its origin: Grey - Lymphoid, Light red - Muscle, Dark red - Myeloid, Bright green - Mesenchymal, Green - Pancreas, Dark green - Lung, Yellow bold - Brain, Yellow thin - Eye, Light pink - Proximal digestive tract, Pink - Female reproductive system, Dark pink - Endothelial, Beige - Skin, Orange - Kidney and urinary bladder, Blue - Gastrointestinal tract, Light blue - Male reproductive system, Light purple - Liver and gallbladder.
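
The text does not state which distance measure or linkage method underlies Figure 1, so the following is only a sketch of the general idea of hierarchical clustering, under two assumptions: distance is taken as 1 minus the Pearson correlation between expression profiles, and clusters are joined by average linkage. The cell line names and expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they happen."""
    names = sorted(profiles)
    # Assumed distance between two cell lines: 1 - Pearson correlation.
    dist = {frozenset(pair): 1 - pearson(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}

    def cluster_distance(pair):
        # Average linkage: mean distance over all cross-cluster member pairs.
        a, b = pair
        total = sum(dist[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=cluster_distance)
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b))
    return merges


# Hypothetical profiles over five genes (illustrative only, not Atlas data).
profiles = {
    "line_A1": [10, 8, 1, 0, 5],
    "line_A2": [9, 9, 2, 1, 5],
    "line_B1": [1, 0, 10, 9, 5],
    "line_B2": [2, 1, 8, 10, 5],
}

for left, right in average_linkage(profiles):
    print(left, "+", right)
```

In this toy input the two "A" lines and the two "B" lines are each joined first, and the two pairs are joined last, which is the kind of grouping by shared origin that the clustering in Figure 1 is described as showing.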

Specificity of RNA expression

Approximately one third of all protein-coding genes (n=6186) are expressed in all cell lines, which indicates that the corresponding proteins have roles in fundamental cellular functions, or 'house-keeping' functions (Figure 2). In contrast, 2% (n=428) of all protein-coding genes are not detected in any of the analyzed cell lines, suggesting that the corresponding proteins are only expressed in unrepresented cell types, during specific developmental stages or under specific conditions, such as cellular stress. Among the protein-coding genes, 1640 display high RNA expression in a single cell line, while 1517 display high RNA expression in a small group of cell lines, in both cases relative to any of the other cell lines. Among the protein-coding genes, 8849 show elevated RNA expression in a group of cell lines compared to the average expression in all other cell lines. Table 1 shows the distribution of genes within these expression categories for each of the analyzed cell lines.
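
The proportions quoted in this section can be cross-checked with simple arithmetic. The sketch below uses only counts stated in the text (19670 genes analyzed, 6186 detected in all cell lines, 428 detected in none); the variable names are illustrative.

```python
TOTAL_GENES = 19670      # protein-coding genes analyzed
DETECTED_IN_ALL = 6186   # detected in every cell line
NOT_DETECTED = 428       # detected in no cell line

# Genes detected in at least one cell line.
detected_somewhere = TOTAL_GENES - NOT_DETECTED

share_all = 100 * DETECTED_IN_ALL / TOTAL_GENES
share_none = 100 * NOT_DETECTED / TOTAL_GENES
share_detected = 100 * detected_somewhere / TOTAL_GENES

print(detected_somewhere)         # 19242
print(round(share_all, 1))        # 31.4
print(round(share_none, 1))       # 2.2
print(round(share_detected, 1))   # 97.8
```

The result of 19242 genes, or about 98%, agrees with the coverage figure given for the Cell Atlas, and 31.4% corresponds to the "approximately one third" expressed in all cell lines.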

Figure 2. Pie chart showing the number of genes in the different RNA-based categories of gene expression in the panel of cell lines.

Table 1. Number of detected genes per cell line based on RNA sequencing (NX ≥1), and the number of genes in the enriched, group enriched and enhanced categories.

Cell line | Detectable genes | Enriched genes | Group enriched genes | Enhanced genes
A-431 | 11378 | 8 | 26 | 270
A549 | 11761 | 7 | 35 | 324
AF22 | 11829 | 26 | 82 | 535
AN3-CA | 11349 | 20 | 31 | 354
ASC diff | 11377 | 31 | 65 | 571
ASC TERT1 | 11413 | 2 | 37 | 481
BEWO | 11780 | 54 | 114 | 620
BJ | 11655 | 3 | 17 | 276
BJ hTERT+ | 11579 | 14 | 36 | 403
BJ hTERT+ SV40 Large T+ | 11316 | 0 | 7 | 120
BJ hTERT+ SV40 Large T+ RasG12V | 11355 | 1 | 6 | 142
CACO-2 | 11525 | 20 | 81 | 430
CAPAN-2 | 12003 | 12 | 59 | 530
Daudi | 10312 | 13 | 74 | 395
EFO-21 | 12273 | 22 | 63 | 448
fHDF/TERT166 | 11440 | 8 | 22 | 390
GAMG | 11829 | 14 | 31 | 312
HaCaT | 11766 | 19 | 79 | 475
HAP1 | 11297 | 6 | 35 | 258
HBEC3-KT | 11148 | 6 | 27 | 251
HBF TERT88 | 10878 | 0 | 2 | 99
HDLM-2 | 11134 | 85 | 74 | 590
HEK 293 | 11911 | 12 | 35 | 407
HEL | 11166 | 54 | 113 | 483
HeLa | 11877 | 18 | 41 | 407
Hep G2 | 11370 | 97 | 125 | 472
HHSteC | 11309 | 6 | 29 | 322
HL-60 | 10202 | 3 | 29 | 233
HMC-1 | 11534 | 71 | 101 | 675
HSkMC | 11799 | 15 | 68 | 491
hTCEpi | 11291 | 19 | 51 | 377
hTEC/SVTERT24-B | 11321 | 2 | 8 | 164
hTERT-HME1 | 10823 | 3 | 22 | 237
hTERT-RPE1 | 11674 | 7 | 19 | 406
HUVEC TERT2 | 11101 | 16 | 64 | 346
JURKAT | 11374 | 7 | 58 | 305
K-562 | 10735 | 22 | 75 | 331
Karpas-707 | 11095 | 37 | 89 | 668
LHCN-M2 | 11209 | 11 | 25 | 257
MCF7 | 11380 | 11 | 36 | 456
MOLT-4 | 10412 | 20 | 60 | 280
NB-4 | 11275 | 28 | 83 | 517
NTERA-2 | 12345 | 45 | 120 | 597
OE19 | 11290 | 55 | 108 | 570
PC-3 | 11747 | 6 | 40 | 338
REH | 10918 | 20 | 51 | 349
RH-30 | 11197 | 37 | 42 | 370
RPMI-8226 | 11107 | 35 | 93 | 507
RPTEC TERT1 | 11753 | 34 | 67 | 451
RT4 | 11634 | 39 | 83 | 533
SCLC-21H | 12411 | 110 | 182 | 819
SH-SY5Y | 12198 | 54 | 129 | 660
SiHa | 11420 | 4 | 26 | 237
SK-BR-3 | 11252 | 36 | 64 | 559
SK-MEL-30 | 11420 | 31 | 44 | 376
SuSa | 12401 | 20 | 99 | 487
T-47d | 11779 | 20 | 59 | 504
THP-1 | 11539 | 38 | 80 | 455
TIME | 11372 | 5 | 52 | 452
U-138 MG | 11448 | 7 | 13 | 257
U-2 OS | 12631 | 39 | 73 | 439
U-2197 | 11396 | 19 | 34 | 375
U-251 MG | 11110 | 2 | 10 | 140
U-266/70 | 11678 | 51 | 108 | 737
U-266/84 | 11075 | 30 | 84 | 485
U-698 | 10250 | 21 | 65 | 392
U-87 MG | 11817 | 14 | 33 | 416
U-937 | 10954 | 21 | 74 | 411
WM-115 | 11707 | 17 | 42 | 362

The cell line transcriptomes have been compared to the bulk transcriptomes of 37 different normal tissues and organs analyzed in the Tissue Atlas (Uhlén M et al. (2015)). There are 65 protein-coding genes that are only expressed in the panel of cell lines and not detected in any of the analyzed normal tissue types, while there are 277 protein-coding genes that are only expressed in normal human tissues and not detected in any of the analyzed cell lines. Several of the genes in the latter category encode proteins with functions associated with differentiated cells in specialized tissues, or subcompartments of tissues, that are not represented in the cell line panel. One example is ADAM30, which is expressed in spermatids of human testis.

  • 65 genes found only in cell lines and not tissues
  • 277 genes found only in tissues and not cell lines
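
The comparison between cell lines and tissues amounts to set differences over the genes detected in each resource. A minimal sketch follows; the function name `compare_detection` and the placeholder gene names are hypothetical, and only the placement of ADAM30 in the tissue-only group is taken from the text.

```python
def compare_detection(cell_line_genes, tissue_genes):
    """Split genes by where they are detected: cell lines, tissues, or both."""
    cell_line_genes, tissue_genes = set(cell_line_genes), set(tissue_genes)
    return {
        "only_cell_lines": cell_line_genes - tissue_genes,
        "only_tissues": tissue_genes - cell_line_genes,
        "both": cell_line_genes & tissue_genes,
    }


# Toy input: GENE_* names are placeholders, not Atlas data.
detected_in_cell_lines = {"GENE_SHARED", "GENE_CELL_LINE_ONLY"}
detected_in_tissues = {"GENE_SHARED", "ADAM30"}

result = compare_detection(detected_in_cell_lines, detected_in_tissues)
print(sorted(result["only_cell_lines"]))  # ['GENE_CELL_LINE_ONLY']
print(sorted(result["only_tissues"]))     # ['ADAM30']
```

Applied to the full detection lists, the sizes of the first two groups would correspond to the 65 and 277 genes reported above.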

Cell line enriched genes

Overall, there is a large degree of agreement between the RNA expression categories in cell lines and tissues. A majority of the cell line enriched genes, defined as having at least four times higher RNA expression in a single cell line compared to any other cell line, also belong to the tissue elevated gene expression categories (tissue enriched, group enriched and tissue enhanced). For example, the secreted proteins AHSG and ALB, which are only expressed in normal liver tissue, are also highly enriched in the liver-derived cell line Hep G2, where immunofluorescent analysis shows localization to the secretory pathway. The transcription factor HOXB13, which shows expression in the prostate, colon and rectum, is also enriched in the prostate-derived cell line PC-3, where it is localized to the nucleoplasm. The adhesion glycoprotein CDH15, which is enriched in skeletal muscle tissue, is also enriched in the sarcoma cell line RH-30, with some expression in the other sarcoma cell line LHCN-M2. The enzyme TYR, which is exclusively expressed in skin, is highly enriched in the melanoma-derived skin cell line SK-MEL-30, while the epidermal growth factor receptor EGFR, which is enriched in female tissues and skin, is enriched in the other skin-derived cell line A-431. The expression patterns in normal tissues and the functions of these proteins relate to the specific traits and functions of the corresponding normal tissue type and organ.
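
The definition of a cell line enriched gene, at least four times higher RNA expression in one cell line than in any other, can be sketched as a comparison of the two highest values. The function name `enriched_cell_line`, the cell line names and the NX values are hypothetical. Requiring the top value to also pass the NX ≥1 detection cutoff is an assumption made here, not something stated in the definition.

```python
def enriched_cell_line(nx_by_cell_line, fold=4.0, cutoff=1.0):
    """Return the cell line in which a gene is enriched, or None.

    Expects NX values for at least two cell lines.
    """
    ranked = sorted(nx_by_cell_line.items(), key=lambda item: item[1], reverse=True)
    (top_line, top_nx), (_, second_nx) = ranked[0], ranked[1]
    # Enriched: detected (assumed requirement) and at least `fold` times the runner-up.
    if top_nx >= cutoff and top_nx >= fold * second_nx:
        return top_line
    return None


# Hypothetical NX values for two genes across three cell lines (not Atlas data).
gene_one = {"line_1": 800.0, "line_2": 2.0, "line_3": 0.5}
gene_two = {"line_1": 10.0, "line_2": 5.0, "line_3": 4.0}

print(enriched_cell_line(gene_one))  # line_1
print(enriched_cell_line(gene_two))  # None
```

Comparing against the second-highest value is sufficient, because a gene that is four-fold above the runner-up is four-fold above every other cell line as well.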


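[Figure 3 panels: IHC images of AHSG, ALB, HOXB13, CDH15, TYR and EGFR in tissue, each paired with an IF image in a cell line: AHSG - Hep G2, ALB - Hep G2, HOXB13 - PC-3, CDH15 - RH-30, TYR - SK-MEL-30, EGFR - A-431.]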

Figure 3. Examples of proteins with enriched expression in a cell line and the corresponding tissue of origin. The proteins are AHSG, ALB, HOXB13, CDH15, TYR, and EGFR. The immunohistochemical (IHC) staining shows the protein expression pattern in tissue in brown. The immunofluorescent (IF) staining shows the protein subcellular expression pattern in cell lines in green. The nucleus and microtubules are shown in blue and red respectively in the IF images.

Relevant links and publications

Parikh K et al., Colonic epithelial cell diversity in health and inflammatory bowel disease. Nature. (2019)
PubMed: 30814735 DOI: 10.1038/s41586-019-0992-y

Menon M et al., Single-cell transcriptomic atlas of the human retina identifies cell types associated with age-related macular degeneration. Nat Commun. (2019)
PubMed: 31653841 DOI: 10.1038/s41467-019-12780-8

Wang L et al., Single-cell reconstruction of the adult human heart during heart failure and recovery reveals the cellular landscape underlying cardiac function. Nat Cell Biol. (2020)
PubMed: 31915373 DOI: 10.1038/s41556-019-0446-7

Wang Y et al., Single-cell transcriptome analysis reveals differential nutrient absorption functions in human intestine. J Exp Med. (2020)
PubMed: 31753849 DOI: 10.1084/jem.20191130

Liao J et al., Single-cell RNA sequencing of human kidney. Sci Data. (2020)
PubMed: 31896769 DOI: 10.1038/s41597-019-0351-8

MacParland SA et al., Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations. Nat Commun. (2018)
PubMed: 30348985 DOI: 10.1038/s41467-018-06318-7

Vieira Braga FA et al., A cellular census of human lungs identifies novel cell states in health and in asthma. Nat Med. (2019)
PubMed: 31209336 DOI: 10.1038/s41591-019-0468-5

Vento-Tormo R et al., Single-cell reconstruction of the early maternal-fetal interface in humans. Nature. (2018)
PubMed: 30429548 DOI: 10.1038/s41586-018-0698-6

Qadir MMF et al., Single-cell resolution analysis of the human pancreatic ductal progenitor cell niche. Proc Natl Acad Sci U S A. (2020)
PubMed: 32354994 DOI: 10.1073/pnas.1918314117

Solé-Boldo L et al., Single-cell transcriptomes of the human skin reveal age-related loss of fibroblast priming. Commun Biol. (2020)
PubMed: 32327715 DOI: 10.1038/s42003-020-0922-4

Henry GH et al., A Cellular Anatomy of the Normal Adult Human Prostate and Prostatic Urethra. Cell Rep. (2018)
PubMed: 30566875 DOI: 10.1016/j.celrep.2018.11.086

Chen J et al., PBMC fixation and processing for Chromium single-cell RNA sequencing. J Transl Med. (2018)
PubMed: 30016977 DOI: 10.1186/s12967-018-1578-4

Guo J et al., The adult human testis transcriptional cell atlas. Cell Res. (2018)
PubMed: 30315278 DOI: 10.1038/s41422-018-0099-2

Uhlen M et al., A proposal for validation of antibodies. Nat Methods. (2016)
PubMed: 27595404 DOI: 10.1038/nmeth.3995

Stadler C et al., Systematic validation of antibody binding and protein subcellular localization using siRNA and confocal microscopy. J Proteomics. (2012)
PubMed: 22361696 DOI: 10.1016/j.jprot.2012.01.030

Poser I et al., BAC TransgeneOmics: a high-throughput method for exploration of protein function in mammals. Nat Methods. (2008)
PubMed: 18391959 DOI: 10.1038/nmeth.1199

Skogs M et al., Antibody Validation in Bioimaging Applications Based on Endogenous Expression of Tagged Proteins. J Proteome Res. (2017)
PubMed: 27723985 DOI: 10.1021/acs.jproteome.6b00821

Takahashi H et al., 5' end-centered expression profiling using cap-analysis gene expression and next-generation sequencing. Nat Protoc. (2012)
PubMed: 22362160 DOI: 10.1038/nprot.2012.005

Lein ES et al., Genome-wide atlas of gene expression in the adult mouse brain. Nature. (2007)
PubMed: 17151600 DOI: 10.1038/nature05453

Kircher M et al., Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. (2012)
PubMed: 22021376 DOI: 10.1093/nar/gkr771

Pollard TD et al., Actin, a central player in cell shape and movement. Science. (2009)
PubMed: 19965462 DOI: 10.1126/science.1175862

Mitchison TJ et al., Actin-based cell motility and cell locomotion. Cell. (1996)
PubMed: 8608590 

Pollard TD et al., Molecular Mechanism of Cytokinesis. Annu Rev Biochem. (2019)
PubMed: 30649923 DOI: 10.1146/annurev-biochem-062917-012530

dos Remedios CG et al., Actin binding proteins: regulation of cytoskeletal microfilaments. Physiol Rev. (2003)
PubMed: 12663865 DOI: 10.1152/physrev.00026.2002

Campellone KG et al., A nucleator arms race: cellular control of actin assembly. Nat Rev Mol Cell Biol. (2010)
PubMed: 20237478 DOI: 10.1038/nrm2867

Rottner K et al., Actin assembly mechanisms at a glance. J Cell Sci. (2017)
PubMed: 29032357 DOI: 10.1242/jcs.206433

Bird RP., Observation and quantification of aberrant crypts in the murine colon treated with a colon carcinogen: preliminary findings. Cancer Lett. (1987)
PubMed: 3677050 DOI: 10.1016/0304-3835(87)90157-1

HUXLEY AF et al., Structural changes in muscle during contraction; interference microscopy of living muscle fibres. Nature. (1954)
PubMed: 13165697 

HUXLEY H et al., Changes in the cross-striations of muscle during contraction and stretch and their structural interpretation. Nature. (1954)
PubMed: 13165698 

Svitkina T., The Actin Cytoskeleton and Actin-Based Motility. Cold Spring Harb Perspect Biol. (2018)
PubMed: 29295889 DOI: 10.1101/cshperspect.a018267

Kelpsch DJ et al., Nuclear Actin: From Discovery to Function. Anat Rec (Hoboken). (2018)
PubMed: 30312531 DOI: 10.1002/ar.23959

Malumbres M et al., Cell cycle, CDKs and cancer: a changing paradigm. Nat Rev Cancer. (2009)
PubMed: 19238148 DOI: 10.1038/nrc2602

Massagué J., G1 cell-cycle control and cancer. Nature. (2004)
PubMed: 15549091 DOI: 10.1038/nature03094

Hartwell LH et al., Cell cycle control and cancer. Science. (1994)
PubMed: 7997877 DOI: 10.1126/science.7997877

Barnum KJ et al., Cell cycle regulation by checkpoints. Methods Mol Biol. (2014)
PubMed: 24906307 DOI: 10.1007/978-1-4939-0888-2_2

Weinberg RA., The retinoblastoma protein and cell cycle control. Cell. (1995)
PubMed: 7736585 DOI: 10.1016/0092-8674(95)90385-2

Morgan DO., Principles of CDK regulation. Nature. (1995)
PubMed: 7877684 DOI: 10.1038/374131a0

Teixeira LK et al., Ubiquitin ligases and cell cycle control. Annu Rev Biochem. (2013)
PubMed: 23495935 DOI: 10.1146/annurev-biochem-060410-105307

King RW et al., How proteolysis drives the cell cycle. Science. (1996)
PubMed: 8939846 DOI: 10.1126/science.274.5293.1652

Cho RJ et al., Transcriptional regulation and function during the human cell cycle. Nat Genet. (2001)
PubMed: 11137997 DOI: 10.1038/83751

Whitfield ML et al., Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell. (2002)
PubMed: 12058064 DOI: 10.1091/mbc.02-02-0030.

Boström J et al., Comparative cell cycle transcriptomics reveals synchronization of developmental transcription factor networks in cancer cells. PLoS One. (2017)
PubMed: 29228002 DOI: 10.1371/journal.pone.0188772

Lane KR et al., Cell cycle-regulated protein abundance changes in synchronously proliferating HeLa cells include regulation of pre-mRNA splicing proteins. PLoS One. (2013)
PubMed: 23520512 DOI: 10.1371/journal.pone.0058456

Ohta S et al., The protein composition of mitotic chromosomes determined using multiclassifier combinatorial proteomics. Cell. (2010)
PubMed: 20813266 DOI: 10.1016/j.cell.2010.07.047

Ly T et al., A proteomic chronology of gene expression through the cell cycle in human myeloid leukemia cells. Elife. (2014)
PubMed: 24596151 DOI: 10.7554/eLife.01630

Pagliuca FW et al., Quantitative proteomics reveals the basis for the biochemical specificity of the cell-cycle machinery. Mol Cell. (2011)
PubMed: 21816347 DOI: 10.1016/j.molcel.2011.05.031

Ly T et al., Proteomic analysis of the response to cell cycle arrests in human myeloid leukemia cells. Elife. (2015)
PubMed: 25555159 DOI: 10.7554/eLife.04534

Dueck H et al., Variation is function: Are single cell differences functionally important?: Testing the hypothesis that single cell variation is required for aggregate function. Bioessays. (2016)
PubMed: 26625861 DOI: 10.1002/bies.201500124

Snijder B et al., Origins of regulated cell-to-cell variability. Nat Rev Mol Cell Biol. (2011)
PubMed: 21224886 DOI: 10.1038/nrm3044

Thul PJ et al., A subcellular map of the human proteome. Science. (2017)
PubMed: 28495876 DOI: 10.1126/science.aal3321

Cooper S et al., Membrane-elution analysis of content of cyclins A, B1, and E during the unperturbed mammalian cell cycle. Cell Div. (2007)
PubMed: 17892542 DOI: 10.1186/1747-1028-2-28

Davis PK et al., Biological methods for cell-cycle synchronization of mammalian cells. Biotechniques. (2001)
PubMed: 11414226 DOI: 10.2144/01306rv01

Domenighetti G et al., Effect of information campaign by the mass media on hysterectomy rates. Lancet. (1988)
PubMed: 2904581 DOI: 10.1016/s0140-6736(88)90943-9

Scialdone A et al., Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods. (2015)
PubMed: 26142758 DOI: 10.1016/j.ymeth.2015.06.021

Sakaue-Sawano A et al., Visualizing spatiotemporal dynamics of multicellular cell-cycle progression. Cell. (2008)
PubMed: 18267078 DOI: 10.1016/j.cell.2007.12.033

Grant GD et al., Identification of cell cycle-regulated genes periodically expressed in U2OS cells and their regulation by FOXM1 and E2F transcription factors. Mol Biol Cell. (2013)
PubMed: 24109597 DOI: 10.1091/mbc.E13-05-0264

Semple JW et al., An essential role for Orc6 in DNA replication through maintenance of pre-replicative complexes. EMBO J. (2006)
PubMed: 17053779 DOI: 10.1038/sj.emboj.7601391

Kilfoil ML et al., Stochastic variation: from single cells to superorganisms. HFSP J. (2009)
PubMed: 20514130 DOI: 10.2976/1.3223356

Ansel J et al., Cell-to-cell stochastic variation in gene expression is a complex genetic trait. PLoS Genet. (2008)
PubMed: 18404214 DOI: 10.1371/journal.pgen.1000049

Colman-Lerner A et al., Regulated cell-to-cell variation in a cell-fate decision system. Nature. (2005)
PubMed: 16170311 DOI: 10.1038/nature03998

Liberali P et al., Single-cell and multivariate approaches in genetic perturbation screens. Nat Rev Genet. (2015)
PubMed: 25446316 DOI: 10.1038/nrg3768

Elowitz MB et al., Stochastic gene expression in a single cell. Science. (2002)
PubMed: 12183631 DOI: 10.1126/science.1070919

Kaern M et al., Stochasticity in gene expression: from theories to phenotypes. Nat Rev Genet. (2005)
PubMed: 15883588 DOI: 10.1038/nrg1615

Bianconi E et al., An estimation of the number of cells in the human body. Ann Hum Biol. (2013)
PubMed: 23829164 DOI: 10.3109/03014460.2013.807878

Malumbres M., Cyclin-dependent kinases. Genome Biol. (2014)
PubMed: 25180339 

Collins K et al., The cell cycle and cancer. Proc Natl Acad Sci U S A. (1997)
PubMed: 9096291 

Zhivotovsky B et al., Cell cycle and cell death in disease: past, present and future. J Intern Med. (2010)
PubMed: 20964732 DOI: 10.1111/j.1365-2796.2010.02282.x

Cho RJ et al., A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell. (1998)
PubMed: 9702192 

Spellman PT et al., Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. (1998)
PubMed: 9843569 

Orlando DA et al., Global control of cell-cycle transcription by coupled CDK and network oscillators. Nature. (2008)
PubMed: 18463633 DOI: 10.1038/nature06955

Rustici G et al., Periodic gene expression program of the fission yeast cell cycle. Nat Genet. (2004)
PubMed: 15195092 DOI: 10.1038/ng1377

Uhlén M et al., Tissue-based map of the human proteome. Science. (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419

Cellosaurus