Nomenclature |
---|
HUGO :
The Human Gene Nomenclature Database
(Hinxton, Uk)
The Human Gene Nomenclature Database Search tool provides access to the list of currently
approved human gene symbols as
maintained by the HUGO gene nomenclature committee. Many previously approved symbols are
also listed, with links
directing users to the current symbol. Minor changes to a previously approved symbol,
such as adding a number (e.g. NRAMP
becomes NRAMP1), may not be listed in this way, so users should try a "Symbol begins
with" search using the first few letters
of a symbol, instead of an exact search, if they fail to find a specific symbol.
Other symbols used in the literature (known as aliases) are collected and stored
by the HUGO Nomenclature Committee, and are now
searchable with this tool. The "Find a
gene" facility in GDB may be useful to search for other names/symbols which cannot
be found in the Human Gene
Nomenclature Database.
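The difference between an exact search and a "Symbol begins with" search can be sketched as follows. This is only an illustration over a small in-memory list. The function names and the sample symbols are assumptions made for the example, and this is not how the HUGO search tool itself is implemented.

```python
def exact_search(symbols, query):
    """Return the symbols that equal the query (case-insensitive)."""
    q = query.upper()
    return [s for s in symbols if s.upper() == q]


def begins_with_search(symbols, prefix):
    """Return the symbols that start with the prefix (case-insensitive)."""
    p = prefix.upper()
    return [s for s in symbols if s.upper().startswith(p)]


# Hypothetical sample list, for illustration only.
SAMPLE_SYMBOLS = ["NRAMP1", "NRAMP2", "NRAS", "TP53"]

# An exact search for the old symbol finds nothing, the prefix search does.
print(exact_search(SAMPLE_SYMBOLS, "NRAMP"))
print(begins_with_search(SAMPLE_SYMBOLS, "NRAMP"))
```

With this sample list, the exact search for NRAMP returns nothing, while the prefix search returns the renamed symbol NRAMP1 along with any other symbol sharing the prefix.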
LRG
(Hinxton, UK)
A Locus Reference Genomic (LRG) record contains stable reference sequences
that are used for reporting sequence variants with clinical implications.
International Classification of Diseases for Oncology (ICD-O)
(WHO, IARC, Lyon, Fr)
Purpose/Definition
Used principally in tumour or cancer registries for coding the site
(topography) and the histology (morphology) of neoplasms, usually
obtained from a pathology report.
Classification structure
A multi-axial classification of the site, morphology, behaviour, and grading of neoplasms.
The topography axis uses the ICD-10 classification of malignant
neoplasms (except those categories which relate to secondary neoplasms
and to specified morphological types of tumours) for all types of
tumours, thereby providing greater site detail for non-malignant tumours
than is provided in ICD-10. In contrast to ICD-10, the ICD-O includes
topography for sites of haematopoietic and reticuloendothelial tumours.
The morphology axis provides five-digit codes ranging from M-8000/0 to
M-9989/3. The first four digits indicate the specific histological term.
The fifth digit after the slash (/) is the behaviour code, which
indicates whether a tumour is malignant, benign, in situ, or uncertain
(whether benign or malignant).
A separate one-digit code is also provided for histologic grading (differentiation).
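The structure of a morphology code (four histology digits, a slash, then one behaviour digit) can be illustrated with a small parser. This is a sketch, not an official ICD-O tool. The function name is hypothetical, and the digit-to-behaviour mapping is an assumption: the description above names the behaviour categories but not the digits, so the mapping follows the commonly documented ICD-O convention and any other digit is reported as "other".

```python
import re

# Assumed mapping of behaviour digits (not stated in the text above).
BEHAVIOUR = {
    "0": "benign",
    "1": "uncertain whether benign or malignant",
    "2": "in situ",
    "3": "malignant",
}

# Optional "M-" prefix, four histology digits, a slash, one behaviour digit.
_CODE = re.compile(r"^(?:M-?)?(\d{4})/(\d)$")


def parse_morphology(code):
    """Split a morphology code such as 'M-8000/0' into its two parts."""
    match = _CODE.match(code.strip())
    if match is None:
        raise ValueError(f"not a morphology code: {code!r}")
    histology, behaviour = match.groups()
    return {
        "histology": histology,
        "behaviour_digit": behaviour,
        "behaviour": BEHAVIOUR.get(behaviour, "other"),
    }


print(parse_morphology("M-9989/3"))
```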
Portals |
---|
ICGC Data Portal
(Ontario, Ca)
The ICGC Data Portal provides tools for visualizing, querying and
downloading the data released quarterly by the consortium's member
projects.
The Pancancer Analysis of Whole Genomes (PCAWG) study is an
international collaboration to identify common patterns of mutation in
more than 2,800 cancer whole genomes from the International Cancer
Genome Consortium.
TCGA cBIOPortal
(MSKCC, New_York, Us)
The cBioPortal for Cancer Genomics provides visualization, analysis and download
of large-scale cancer genomics data sets.
Broad Tumor Portal
(Broad Institute, Boston, Us)
Explore datasets by tumor type: genes, cancers, DNA mutations and annotations.
Firebrowse Broad GDAC
(Broad Institute, Boston, Us)
Explore TCGA and Broad datasets by tumor cohort with different types of analysis:
clinical, copy number, correlation, miR, mRNA, mutation, pathways, RPPA, and others.
intoGen
Integrative Onco Genomics
(Barcelona, Es)
Mutational cancer driver database
OASIS Portal
(Pfizer, Us)
OASIS is an open-access web portal that enables cancer researchers to
perform exploratory and integrative analyses of somatic mutation, copy
number changes (CNV) and gene expression data from thousands of tumor,
normal tissues and cell lines representing a broad spectrum of
malignancies.
OASIS is developed by Pfizer Oncology Research Computational Biology in
collaboration with Research Business Technology (RBT). Please cite
"OASIS: A web-based platform for exploratory analysis of cancer genome
and transcriptome data (manuscript in preparation)" when publishing
results based on OASIS.
Cancer Browser
UCSC (Santa Cruz, Us)
The UCSC Cancer Browser allows researchers to interactively explore
cancer genomics data and its associated clinical information. Data can
be viewed in a variety of ways, including by value, chromosome location,
clinical feature, biological pathway or geneset of interest. It is also
possible to quickly perform and easily view statistical analysis on
subsets of the data.
canSAR
( ICR, Uk)
canSAR is an integrated knowledge-base that brings together
multidisciplinary data across biology, chemistry, pharmacology,
structural biology, cellular networks and clinical annotations, and
applies machine learning approaches to provide predictions useful for
drug discovery.
canSAR's goal is to enable cancer translational research and drug discovery
through providing this knowledge to researchers from across different
disciplines. It provides a single information portal to answer complex
multi-disciplinary questions including, among many others: what is known
about a protein, in which cancers is it expressed or mutated, and what
chemical tools and cell line models can be used to experimentally probe
its activity? What is known about a drug, its cellular sensitivity profile,
and what proteins is it known to bind that may explain unusual bioactivity?
CancerResource
(La Charite, Berlin, De)
CancerResource is a comprehensive knowledgebase for
drug-target relationships related to cancer as well as for supporting
information or experimental data. Furthermore, large-scale cancer
genomics data is integrated into the CancerResource database including
mRNA expression and non-synonymous mutations data. Therefore,
CancerResource allows an explorative data analysis based on cancer
related drug-target interactions, expression and mutation data as well
as drug sensitivity data.
Cards |
---|
Atlas of Genetics in Oncology and Haematology
(USAL, Spain)
Entrez_Gene
( NCBI, Bethesda, Us)
Entrez_Gene is the part of Entrez devoted to searching for information on genes, with links
to other databases such as RefSeq, maps, OMIM, UniGene and PubMed.
It is developed and maintained by NCBI.
EnSembl
(Sanger_EBI, Hinxton, Uk)
The Ensembl project produces genome databases for vertebrates and other
eukaryotic species, and makes this information freely available online
GeneCards: human genes, proteins and diseases
(Weizmann, Rehovot, Is)
GeneCards is a database of human genes, their products and their
involvement in diseases.
It offers concise information about the functions of all human genes
that have an approved symbol, as well as selected others [gene
listing]. It is especially useful for those who are
searching for information about large sets of genes or proteins,
e.g. for scientists working in functional genomics and proteomics.
AceView
(NCBI, Bethesda, Us)
AceView offers a comprehensive and non-redundant cDNA-supported annotation of
human and nematode genes. Our program
co-aligns the million mRNAs and ESTs available from GenBank,
dbEST and RefSeq on the genome sequence, quality-filters the cDNAs and clusters
them into alternative transcripts and genes. By construction, the cooperative
accuracy of these sequences,
ESTs or mRNAs, is brought up to the exceptional quality of the genome sequence.
GENATLAS
(Imagine , Paris, Fr)
The GENATLAS database compiles the information relevant to the mapping
efforts of the Human Genome Project. This information is collected from
original articles in the literature or from the proceedings of Human
Gene Mapping and Single Chromosome Workshops. It is catalogued in three
interactive directories: GENATLAS/GEN, GENATLAS/LINK and GENATLAS/REF. A
series of graphical maps, GENATLAS/MAP, is associated with them, as well
as a Comparative Map database edited by John H Edwards.
WikiGenes
WikiGenes is a non-profit initiative to provide a global collaborative
knowledge base for the life sciences, where authorship matters.
Search thousands of genes, chemicals, pathologies and much more...
SOURCE
(Princeton, Us)
SOURCE contains two types of pages, GeneReports and CloneReports.
GeneReports display information about genes including functional,
structural and expression data. GeneReports give an overview of a gene's
biology by describing its protein function, the tissue sources of cDNA
clones associated with the gene, links to microarray experiments that
included the queried gene, and the mapping of the gene within the human
genome.
CloneReports display information about a given cDNA clone (also known as
an Expressed Sequence Tag or EST) including putative ID, the size of
the insert, vector information, and links to BLAST searches and genome
browsing tools. Users can switch between the two types of reports by
clicking on the button at the top of each report page.
GHR
Genetics Home Reference
(Bethesda, Us)
Genetics Home Reference provides consumer-friendly information about the effects
of genetic variation on human health.
miRBase
(Hinxton, Uk)
miRBase: the microRNA database
miRBase provides the following services:
The miRBase database is a searchable database of published miRNA
sequences and annotation. Each entry in the miRBase Sequence database
represents a predicted hairpin portion of a miRNA transcript (termed mir
in the database), with information on the location and sequence of the
mature miRNA sequence (termed miR). Both hairpin and mature sequences
are available for searching and browsing, and entries can also be
retrieved by name, keyword, references and annotation. All sequence and
annotation data are also available for download.
The miRBase Registry provides miRNA gene hunters with unique names for
novel miRNA genes prior to publication of results. Visit the help pages
for more information about the naming service.
dbDEMC2.0
(Cn)
A Database of Differentially Expressed miRNAs in Human Cancers (version 2.0)
dbDEMC (database of Differentially Expressed MiRNAs in human Cancers)
is an integrated database that is designed to store and display
differentially expressed microRNAs (miRNAs) in human cancers detected by
high-throughput methods. In this updated version of dbDEMC, a total of
209 newly published data sets were collected from Gene Expression
Omnibus (GEO) and The Cancer Genome Atlas (TCGA).
H-invDB
(Ja)
An Integrated Database of Annotated Human Genes
H-Invitational Database (H-InvDB) is an integrated database of human
genes and transcripts. By extensive analyses of all human transcripts,
we provide curated annotations of human genes and transcripts that
include gene structures, alternative splicing variants, non-coding
functional RNAs, protein functions, functional domains, sub-cellular
localizations, metabolic pathways, protein 3D structure, genetic
polymorphisms (SNPs, indels and microsatellite repeats), relation with
diseases, gene expression profiling, molecular evolutionary features,
protein-protein interactions (PPIs) and gene families/groups. H-InvDB
is produced by the "Genome Information Integration Project" (2005-2008)
based upon the annotation technology established in the H-Invitational
Project for annotation of human full-length cDNAs, and presented as a
key integrated database of human genes in METI integrated database
project (2008-).
Genomic and cartography |
---|
The complete sequence of a human genome
Science 31 Mar 2022 Vol 376, Issue 6588 pp. 44-53
Since its initial release in 2000, the human reference genome has covered only the
euchromatic fraction of the genome, leaving important heterochromatic regions unfinished.
Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T)
Consortium presents a complete 3.055 billion base pair sequence of a human genome,
T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects
errors in the prior references, and introduces nearly 200 million base pairs of
sequence containing 1956 gene predictions, 99 of which are predicted to be protein
coding. The completed regions include all centromeric satellite arrays, recent
segmental duplications, and the short arms of all five acrocentric chromosomes,
unlocking these complex regions of the genome to variational and functional studies.
GoldenPath : Human Genome Project Working Draft - Human Genome Browser
(Santa Cruz, Us)
EnSembl : Human Genome Project Working Draft - Ensembl Map view
(Sanger_EBI, Hinxton, Uk)
GENCODE project
(Sanger Institute, Hinxton, Uk)
ImmunoBase
Vega : Human Genome Project Working Draft - Ensembl Map view
(Sanger Institute, Hinxton, Uk)
Genome Data Viewer:
(NCBI, Bethesda, Us)
Unigene
(NCBI, Bethesda, Us)
GenBank
(NCBI, Bethesda, Us)
RefSeq
(NCBI, Bethesda, Us)
CCDS
(UCSC, Santa Cruz, Us)
Easana-Genosplice
(Genosplice, Paris, Fr)
The ArrayExpress Gene Expression Atlas
(EBI, Hinxton, Uk)
GEO Profiles
(NCBI, Bethesda, Us)
SEEK
(Princeton, Us)
MEM - Multi Experiment Matrix
(Est)
Genevestigator
(Us)
BIOGPS
(Scripps, Us)
GTEX Portal
(Broad, Boston Us)
UniProt : Protein Sequence Database
(EBI, Hinxton, Uk)
SwissProt Protein Sequence Database
(SIB, Geneve, Ch)
This page contains links to an assembly of the current draft of the human genome.
The human genome is approximately
3.1 billion bases. Roughly 88% of the genome has been sequenced by the International
Human Genome Project. The Oct.
7th draft genome is composed of hundreds of thousands of fragments of various sizes.
The order and orientation of the
fragments is often not known from the sequencing process itself. In some cases
the same part of the genome will be
duplicated in several fragments.
Human Genome Browser
MapView
(NCBI, Bethesda, Us)
The Map Viewer provides special browsing capabilities for a subset of
organisms in Entrez Genomes; the subset is listed on the Map Viewer Home
Page. Map Viewer allows you to view and search
an organism's complete genome, display chromosome maps, and zoom into
progressively greater levels of detail, down to the sequence data for a
region of interest. The number and types of available maps vary by
organism, and are described in the "data and search tips" file for each
organism. If multiple maps are available for a chromosome, it displays
them aligned to each other based on shared marker and gene names, and,
for the sequence maps, based on a common sequence coordinate system.
Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop
a software system which produces and
maintains automatic annotation on eukaryotic genomes.
The National Human Genome Research Institute (NHGRI) launched a public
research consortium named ENCODE, the Encyclopedia Of DNA Elements, in
September 2003, to carry out a project to identify all functional
elements in the human genome sequence. After a successful pilot phase on
1% of the genome, the scale-up to the entire genome is now underway.
The Wellcome Trust Sanger Institute was awarded a grant to carry out a
scale-up of the GENCODE project for integrated annotation of gene
features.
Having been involved in successfully delivering the definitive
annotation of functional elements in the human genome, the GENCODE group
were awarded a second grant in 2013 in order to continue their human
genome annotation work and expand GENCODE to include annotation of the
mouse genome.
The GENCODE gene sets are used by the entire ENCODE consortium and by
many other projects (e.g. 1000 Genomes) as reference gene sets.
ImmunoBase is a web based resource focused on the genetics and genomics of
immunologically related human diseases. Our mission is to provide a curated
and integrated set of datasets and tools, across multiple diseases, to support
and promote research in this area.
Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop
a software system which produces and
maintains automatic annotation on eukaryotic genomes.
The NCBI Genome Data Viewer (GDV) is a genome browser supporting the exploration
and analysis of eukaryotic RefSeq genome assemblies. It allows users to visualize
different types of sequence-associated data in a genomic context. Genome Data Viewer
is also used by different NCBI resources, such as GEO, to display datasets associated
with specified experiments or samples in a genome browser context. Release notes
are available for each browser version, describing new features and bug fixes.
Videos are available on the GDV playlist to help you get started with various browser features.
UniGene is an experimental system for automatically partitioning GenBank
sequences into a non-redundant set of gene-oriented clusters. Each UniGene
cluster contains sequences that represent a unique gene, as well as related
information such as the tissue types in which the gene has been expressed and
map location.
Gene and transcription |
---|
GenBank is the NIH's database of all known nucleotide and protein
sequences including supporting bibliographic and biological information.
Since 1992 it has been based at the National Center for Biotechnology
Information (NCBI), a division of the National Library of Medicine, located
on the NIH campus. NCBI was created by Congress in 1988 and specifically
charged with developing automated information systems to support
molecular biology and biotechnology. Its other mission is to conduct basic
research: as part of the NIH Intramural Program, NCBI scientists pursue
research in genome analysis, molecular structure modeling and prediction,
and mathematical methods for sequence analysis.
The NCBI Reference Sequence project (RefSeq) will provide reference
sequence standards for the naturally occurring molecules of the central dogma,
from chromosomes to mRNAs to proteins. RefSeq standards provide a foundation
for the functional annotation of the human genome. They provide a stable reference
point for mutation analysis, gene expression studies, and polymorphism discovery.
The Consensus CDS (CCDS) project is a collaborative effort to identify a
core set of human and mouse protein coding regions that are
consistently annotated and of high quality. The long term goal is to
support convergence towards a standard set of gene annotations.
The Friendly Alternative Splicing and Transcripts Database
FAST DB: a website resource for the study of the expression regulation of human gene products.
FAST DB provides three kinds of analysis: human mRNAs, human mRNAs and ESTs, and mouse mRNAs.
The ArrayExpress Gene Expression Atlas is a semantically enriched
database of meta-analysis based summary statistics over a curated subset
of ArrayExpress Archive, servicing queries for condition-specific
gene expression patterns as well as broader exploratory searches
for biologically interesting genes/samples.
To cite the Atlas in your research or to learn more about it, please
refer to Kapushesky M et al. (2009) Gene Expression Atlas at the
European Bioinformatics Institute, Nucleic Acids Research Database Issue
(NAR 2009)
This database stores individual gene expression profiles from curated
DataSets in the Gene Expression Omnibus (GEO) repository. Search
for specific profiles of interest based on gene annotation or
pre-computed profile characteristics.
Search-Based Exploration of Expression Compendium [Human]
SEEK is a computational gene co-expression search engine. SEEK provides
biologists with a way to navigate the massive human expression
compendium that now contains thousands of expression datasets. SEEK
returns a robust ranking of co-expressed genes in the biological area of
interest defined by the user's query genes. In the meantime, it also
prioritizes thousands of expression datasets according to the user's
query of interest. The unique strengths of SEEK include its support for
multi-gene query and cross-platform analysis, as well as its rich
visualization features.
Mining for coexpression across hundreds of datasets using novel rank aggregation and visualisation methods.
GENEVESTIGATOR : The world's expression data at your fingertips
Characterize genes by finding out where, when and in response to what they are expressed
Learn more from your experiments by integrating and comparing them with public datasets
Explore expertly curated experiments to find supporting evidence for your hypotheses
Discover and prioritize your biomarkers and targets against thousands of conditions
A free extensible and customizable gene annotation portal, a complete resource for
learning about gene and protein function.
The GTEx Project
Correlations between genotype and tissue-specific gene expression levels
will help identify regions of the genome that influence whether and how
much a gene is expressed. GTEx will help researchers to understand
inherited susceptibility to disease and will be a resource database and
tissue bank for many studies in the future.
The Genotype-Tissue Expression (GTEx) project aims to provide to the
scientific community a resource with which to study human gene
expression and regulation and its relationship to genetic variation.
This project will collect and analyze multiple human tissues from donors
who are also densely genotyped, to assess genetic variation within
their genomes. By analyzing global RNA expression within individual
tissues and treating the expression levels of genes as quantitative
traits, variations in gene expression that are highly correlated with
genetic variation can be identified as expression quantitative trait
loci, or eQTLs.
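The idea of treating expression as a quantitative trait can be illustrated with a single variant and a single gene. The sketch below fits a least-squares line of expression on genotype dosage (0, 1 or 2 copies of an allele). It is a deliberately simplified illustration with a hypothetical function name and toy data. A real eQTL analysis also normalises expression, adjusts for covariates and corrects for multiple testing, and this is not the GTEx pipeline.

```python
def eqtl_slope(genotypes, expression):
    """Least-squares slope of expression on genotype dosage (0, 1 or 2)."""
    n = len(genotypes)
    if n != len(expression) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 donors")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    # Covariance and variance are left unnormalised; the factor cancels.
    cov = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    if var_g == 0:
        raise ValueError("genotype does not vary across donors")
    return cov / var_g


# Toy data, for illustration only: one value per donor.
dosage = [0, 1, 2, 0, 1, 2]
expr = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
print(eqtl_slope(dosage, expr))
```

A slope far from zero, relative to its uncertainty, is what would flag the variant as a candidate eQTL for that gene in that tissue.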
Protein : sequence, function, domain, 3D structure |
---|
The mission of UniProt is to provide the scientific community with a
comprehensive, high-quality and freely accessible resource of protein
sequence and functional information.
The SWISS-PROT Protein Sequence Database is a database of protein
sequences produced collaboratively by Amos Bairoch (University of Geneva)
and the EMBL Data Library. The data in Swiss-Prot are derived from
translations of DNA sequences from the EMBL Nucleotide Sequence
Database, adapted from the Protein Identification Resource (PIR) collection,
extracted from the literature and directly submitted by researchers. It contains
high-quality annotation, is non-redundant, and cross-referenced to several
other databases, notably the EMBL nucleotide sequence database, PROSITE
pattern database and PDB. SWISS-PROT is a curated protein sequence
database which strives to provide a high level of annotation (such as the
description of the function of a protein, its domain structure,
post-translational modifications, variants, etc), a minimal level of
redundancy and a high level of integration with other databases. Recent
developments of the database include: an increase in the number and scope
of model organisms; cross-references to seven additional databases; a variety
of new documentation files; the creation of TREMBL, an unannotated
supplement to SWISS-PROT. This supplement consists of entries in
SWISS-PROT-like format derived from the translation of all coding
sequences (CDS) in the EMBL nucleotide sequence database, except CDS
already included in SWISS-PROT.
NextProt :
Exploring the universe of human proteins
(SIB, Geneve, Ch)
Developed in collaboration between the SIB Swiss Institute of
Bioinformatics and Geneva Bioinformatics (GeneBio) SA, neXtProt will be a
comprehensive human-centric discovery platform, offering its users a
seamless integration of and navigation through protein-related data.
ENZYME
(SIB, Geneve, Ch)
The ENZYME data bank contains the following data for each type of characterized enzyme
for which an EC number has
been provided: EC number, Recommended name, Alternative names, Catalytic activity,
Cofactors, Pointers to the
SWISS-PROT entry (or entries) that correspond to the enzyme, Pointers to disease(s) associated
with a deficiency of the
enzyme.
INTENZ
(EBI, Uk)
IntEnz (Integrated relational Enzyme database) is a freely available
resource focused on enzyme nomenclature. IntEnz is created in
collaboration with the Swiss Institute of Bioinformatics (SIB). This
collaboration is responsible for the production of the ENZYME resource.
IntEnz contains the recommendations of the Nomenclature Committee of the
International Union of Biochemistry and Molecular Biology (NC-IUBMB) on
the nomenclature and classification of enzyme-catalysed reactions.
PhosphoSitePlus
(Danvers, Us)
PhosphoSitePlus (PSP) is an online systems biology resource providing
comprehensive information and tools for the study of protein
post-translational modifications (PTMs) including phosphorylation,
ubiquitination, acetylation and methylation.
Prosite :
Protein signatures
(SIB, Geneve, Ch)
The PROSITE database consists of a large collection of biologically meaningful
signatures that are described
as patterns or profiles. Each signature is linked to documentation that provides
useful biological information
on the protein family, domain or functional site identified by the signature.
The PROSITE web page has been
redesigned and several tools have been implemented to help the user discover
new conserved regions in their
own proteins and to visualize domain arrangements. We also introduced the facility
to search PDB with a
PROSITE entry or a user's pattern and visualize matched positions on 3D structures.
The latest version of
PROSITE (release 18.17 of November 30, 2003) contains 1676 entries.
The database is accessible at
http://www.expasy.org/prosite/.
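A pattern-type signature can be matched against a sequence by translating it into a regular expression. The sketch below covers only a common subset of the documented PROSITE pattern syntax: `-` separators, `x` wildcards, `[..]` allowed residues, `{..}` forbidden residues, `(n)` and `(n,m)` repeats, and the `<` and `>` terminal anchors. The function names are hypothetical, profiles are not handled, and this is not the search tool offered on the PROSITE site.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (2) or (2,4).
        repeat = ""
        if element.endswith(")") and "(" in element:
            element, count = element[:-1].split("(")
            repeat = "{" + count + "}"
        if element == "x":
            core = "."                        # any residue
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"  # forbidden residues
        else:
            core = element                    # one residue or an [..] class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."))
print(find_matches("N-{P}-[ST]-{P}.", "AANGSAA"))
```

Because `finditer` reports non-overlapping matches only, overlapping occurrences of a short pattern would need a lookahead-based variant.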
Interpro :
(Integrated Resource of Protein domains and Functional sites)
(EBI, Hinxton, Uk)
InterPro release 1.0 (March 2000) was built from Pfam 5.0, PRINTS 25.0, PROSITE 16 and the
current SWISS-PROT + TrEMBL data. This release of InterPro contains 2990 entries, representing
2373 families, 556 domains, 47 repeats and 14 post-translational modification sites encoded by 4884
different regular expressions, profiles, fingerprints and HMMs.
Interpro is a useful resource for whole genome analysis and has already been used for the proteome
analysis of a number of completely sequenced organisms. A preliminary proteome analysis was also
produced for the human genome.
PFAM - Sanger Center
(Sanger, Hinxton, Uk)
Pfam is a large collection of multiple sequence alignments and hidden Markov models
covering many common protein domains.
Pfam is a collection of protein families and domains. Pfam contains
multiple protein alignments and profile-HMMs of these families. Pfam
is a semi-automatic protein family database, which aims to be
comprehensive as well as accurate.
CDD A Conserved Domain Database and Search Service
(NCBI, Bethesda, Us)
Proteins often contain several modules or domains, each with a distinct
evolutionary origin and function. The CD-Search service may be used to
identify the conserved domains present in a protein sequence:
Computational biologists define conserved domains based on recurring sequence
patterns or motifs. CDD
currently contains domains derived from two popular collections, Smart and Pfam,
plus contributions from
colleagues at NCBI. The source databases also provide descriptions and links
to citations. Since
conserved domains correspond to compact structural units, CDs contain links
to 3D-structure via Cn3D whenever possible.
DMDM
Domain mapping of disease mutations (DMDM)
(Baltimore, Us)
Domain mapping of disease mutations (DMDM) is a database in which each
disease mutation can be displayed by its gene, protein or domain
location. DMDM provides a unique domain-level view where all human
coding mutations are mapped on the protein domain. To build DMDM, all
human proteins were aligned to a database of conserved protein domains
using a Hidden Markov Model-based sequence alignment tool (HMMer). The
resulting protein-domain alignments were used to provide a domain
location for all available human disease mutations and polymorphisms.
The number of disease mutations and polymorphisms in each domain
position are displayed alongside other relevant functional information
(e.g. the binding and catalytic activity of the site and the
conservation of that domain location). DMDM's protein domain view
highlights molecular relationships among mutations from different
diseases that might not be clearly observed with traditional
gene-centric visualization tools.
PRODOM
(PRABI, Lyon, Fr)
ProDom is a comprehensive set of protein domain families automatically generated from the
SWISS-PROT and TrEMBL sequence databases
PDB - Protein Database
( San Diego, Us)
The RCSB PDB provides a variety of tools and resources for studying the
structures of biological macromolecules and their relationships to
sequence, function, and disease
PDBSUM
( EBI, Hinxton, Uk)
PDBsum is a pictorial database providing an at-a-glance overview of
every macromolecular structure deposited in the Protein Data Bank (PDB).
It provides schematic diagrams of the molecules in each structure and of
the interactions between them. Entries are accessed by their PDB code,
by simple text search, or through the browse options.
IMB
(Jena, De)
The Jena Library of Biological Macromolecules (JenaLib) is aimed at a
better dissemination of information on three-dimensional biopolymer
structures with an emphasis on visualization and analysis.
SBKB
(Rutgers, Us)
The Structural Biology Knowledgebase provides the latest research data,
resources, and highlights from structural biology and the Protein
Structure Initiative.
AlphaFold PDBe-KB
(EMBL, Hinxton, Uk)
PDBe-KB (Protein Data Bank in Europe - Knowledge Base) is a community-driven resource
managed by the PDBe team, collating functional annotations and predictions for
structure data in the PDB archive. PDBe-KB is a collaborative effort between
PDBe and a diverse group of bioinformatics resources and research teams.
PDBe-KB contains data contributed by projects such as SIFTS and FunPDBe and
aims to place structures from the PDB in their biological context.
SCOP
(Berkeley, Us)
SCOPe is a database developed at the Berkeley Lab and UC Berkeley that
extends SCOP (version 1). SCOPe classifies many structures released
since SCOP 1.75 through a combination of automation and manual curation,
and corrects some errors, aiming to have the same accuracy as the fully
hand-curated SCOP releases. SCOPe also incorporates and updates the
Astral database.
CATH
( UCL, London, Uk)
CATH is a classification of protein structure downloaded from the PDB.
Human Protein Atlas
(Uppsala, Su)
The human protein atlas shows expression and localization of proteins in
a large variety of normal human tissues, cancer cells and cell lines
with the aid of immunohistochemistry (IHC) images and immunofluorescence
(IF) confocal microscopy images.
HPRD - Human Protein Reference Database
( Johns Hopkins, Baltimore, Us)
The Human Protein Reference Database represents a centralized platform
to visually depict and integrate information pertaining to domain
architecture, post-translational modifications, interaction networks and
disease association for each protein in the human proteome. All the
information in HPRD has been manually extracted from the literature by
expert biologists who read, interpret and analyze the published data.
HPRD has been created using an object oriented database in Zope, an open
source web application server, that provides versatility in query
functions and allows data to be displayed dynamically.
Protein Interaction databases |
---|
STRING ( EMBL)
STITCH
( EMBL)
STITCH : Chemical-Protein Interaction Networks
DIP
( UCLA, Us)
The DIP (Database of Interacting Proteins) database lists protein pairs
that are known to interact with each other. By interact we mean that two
amino acid chains were experimentally identified to bind to each other.
The database lists such pairs to aid those studying a particular
protein-protein interaction but also those investigating entire
regulatory and signaling pathways as well as those studying the
organisation and complexity of the protein interaction network at the
cellular level.
IntAct - EBI
( EBI, Hinxton, Uk)
IntAct provides a freely available, open source database system and analysis
tools for protein interaction data.
All interactions are derived from literature curation or direct user submissions
and are freely available.
Complex Portal - EBI
( EBI, Hinxton, Uk)
The Complex Portal is a manually curated, encyclopaedic resource of macromolecular
complexes from a number of key model organisms. The majority of complexes are made
up of proteins but may also include nucleic acids or small molecules. All data is
freely available for search and download.
Complexes are defined as an assembly of any two or more proteins and/or nucleic
acids that are stable enough in vitro to be reconstituted and have been demonstrated
to have a specific molecular function.
FunCoup
( KTH, Stockholm, Su)
FunCoup is a statistical framework of data integration for finding
functional coupling (FC) between proteins. It transfers information from
model organisms (M. musculus, D. melanogaster, C. elegans, S.
cerevisiae etc.) via orthologs found by the InParanoid program (Remm et al.,
2001).
Data of different sources and various natures, such as contacts of whole
proteins and individual domains, mRNA and protein expression,
localization in tissues and cellular compartments, miRNA and TF
targeting, similar phylogenetic profiles etc., are collected and
probabilistically evaluated in a Bayesian network (BN), trained on sets
of known FC cases (e.g. KEGG, IntAct, HPRD, or GRID resources) vs. sets
of randomly picked protein pairs as background reference.
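The probabilistic evaluation described above can be illustrated with a minimal naive-Bayes sketch. This is not FunCoup's actual implementation: the function names, the prior, and the per-evidence probabilities are hypothetical, and the evidence types are assumed to be conditionally independent.

```python
import math


def log_likelihood_ratio(p_given_coupled, p_given_background):
    """Log ratio of how likely an observation is among known functionally
    coupled pairs versus randomly picked background pairs."""
    return math.log(p_given_coupled / p_given_background)


def coupling_posterior(prior, evidence):
    """Combine independent evidence into a posterior probability of coupling.

    prior    -- assumed prior probability that a protein pair is coupled
    evidence -- list of (P(obs | coupled), P(obs | background)) tuples,
                one per data source (hypothetical values)
    """
    log_odds = math.log(prior / (1.0 - prior))
    for p_coupled, p_background in evidence:
        log_odds += log_likelihood_ratio(p_coupled, p_background)
    # Convert the summed log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Two hypothetical evidence types, e.g. co-expression and co-localization.
    print(coupling_posterior(0.01, [(0.9, 0.1), (0.6, 0.3)]))
```

Each evidence type contributes a log-likelihood ratio estimated from the training sets, so adding a new data source only adds one more term to the sum.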
BioGRID
( Toronto, Ca)
Biological General Repository for Interaction Datasets
BioGRID is an online interaction repository with data compiled through
comprehensive curation efforts. Our current index is version 3.1.78 and
searches 27,283 publications for 402,127 raw protein and genetic
interactions from major model organism species. All interaction data are
freely provided through our search index and available via download in a
wide variety of standardized formats.
Ontology - Pathways |
---|
Gene Ontology ( Us)
QuickGO
(EBI, Hinxton, Uk)
A fast browser for Gene Ontology terms and annotations.
PRO Protein Ontology
PRO provides an ontological representation of protein-related entities by explicitly defining
them and showing the relationships between them. Each PRO term represents
a distinct class of entities (including specific modified forms,
orthologous isoforms, and protein complexes) ranging from the taxon-neutral
to the taxon-specific (e.g. the entity representing all protein products
of the human SMAD2 gene is described in PR:Q15796; one particular
human SMAD2 protein form, phosphorylated on the last two serines of a
conserved C-terminal SSxS motif is defined by PR:000025934).
Current release: 67.0, August 08, 2022.
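The two identifier styles quoted above (PR:Q15796 and PR:000025934) can be told apart with a small sketch. It assumes that numeric PRO identifiers have nine digits, as in the example, and that the other style reuses a UniProt accession; the function name is hypothetical and other identifier forms are not covered.

```python
import re

# UniProt accession pattern, as given in the UniProt help pages.
UNIPROT_ACC = (
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)

# Assumption: a PRO identifier is "PR:" followed by either nine digits
# or a UniProt accession.
PRO_ID = re.compile(rf"^PR:(?:(?P<numeric>\d{{9}})|(?P<uniprot>{UNIPROT_ACC}))$")


def classify_pro_id(identifier):
    """Return 'numeric', 'uniprot', or None if the string does not match."""
    match = PRO_ID.match(identifier)
    if match is None:
        return None
    return "numeric" if match.group("numeric") else "uniprot"


if __name__ == "__main__":
    for pro_id in ("PR:Q15796", "PR:000025934", "GO:0008150"):
        print(pro_id, classify_pro_id(pro_id))
```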
Kegg (NCI)
Kegg (Kyoto) Kyoto Encyclopedia of Genes and Genomes
(Kyoto, Jp)
KEGG is a database resource for understanding high-level functions and
utilities of the biological system, such as the cell, the organism and
the ecosystem, from molecular-level information, especially large-scale
molecular datasets generated by genome sequencing and other
high-throughput experimental technologies (See Release notes for new and
updated features).
BioCarta Pathways
( NCI, Bethesda, Us)
Reactome
( OICR, Ca, New York, Us, EBI, Uk)
REACTOME is a free, online, open-source, curated pathway database encompassing many
areas of human biology. Information is
authored by expert biological researchers, maintained by the Reactome editorial staff
and cross-referenced to a
wide range of standard biological databases.
Pathway Commons
(Toronto, Dana-Farber)
Pathway Commons : Access and discover data integrated from public pathway and
interactions databases. 5772 Pathways -- 2424055 Interactions -- 22 Databases
NDEx (Network Data Exchange)
(University of California, Us)
Biomolecular interactions and cellular processes assembled into
authoritative human signaling pathways
The NDEx Public Server includes a large number of networks that are
marked as PUBLIC and are therefore accessible without signing in to a
user account. Public networks can be found, viewed, and queried
anonymously using the search bar provided in the NDEx Public Server's
landing page.
Atlas of Cancer Signalling Networks global map
(Curie, Paris, Fr)
ACSN is a pathway database and a web-based environment that contains a collection of
interconnected cancer-related signalling network maps. Cell signalling mechanisms
are depicted on the maps at the level of biochemical interactions, forming a large
network of 4600 reactions covering 1821 proteins and 564 genes and connecting
several major cellular processes. The Atlas is a "geographic-like" interactive "world map"
of molecular interactions involved in cancer.
Wiki Pathways
WikiPathways is an open, public platform dedicated to the curation of biological
pathways by and for the scientific community.
Orthology - Evolution |
---|
OrthoDB
The Hierarchical Catalog of Eukaryotic Orthologs
(Univ. Geneve, Ch)
OrthoDB presents a catalog of eukaryotic orthologous protein-coding
genes across 48 vertebrates, 33 arthropods, 73 fungi, and 12 basal
metazoans. Orthology refers to the last common ancestor of the species
under consideration, and thus OrthoDB explicitly delineates orthologs at
each radiation along the species phylogeny.
Homologene
(NCBI, Bethesda, Us)
HomoloGene is a system for automated detection of homologs among the
annotated genes of several completely sequenced eukaryotic genomes.
TREEFAM : Tree families database
(EBI, Hinxton, Uk)
TreeFam (Tree families database) is a database of phylogenetic trees of
animal genes. It aims at developing a curated
resource that gives reliable information about ortholog and paralog
assignments, and evolutionary history of various gene families.
Gene Sorter
(UCSC, Santa Cruz, Us)
The UCSC Gene Sorter is an excellent resource for exploring gene
families and the relationships among genes. This tool displays a table
of genes within a selected genome that are related to one another.
Several different relationships may be explored: protein-level homology,
similarity of gene expression profiles, or genomic proximity. The Gene
Sorter supports searches on a variety of terms and phrases, including
the gene name, the SwissProt protein name, a GenBank accession, or a
word or phrase present in a gene's description. The gene family display
is highly configurable, allowing the user to control the order and
number of columns, the number of rows, and the genes displayed. The tool
provides several output formats, including a simple tab-delimited format
that may be imported into a spreadsheet or a relational database.
InParanoid
( Stockholm, Su)
InParanoid: Eukaryotic Ortholog Groups
Gene fusion - Chromosomal Rearrangement |
---|
COSMIC ( Sanger Center, Hinxton, Uk)
TCGA Fusion Portal
(Jackson Lab, Us)
Transcript fusion resulting from genomic rearrangement is an important class of somatic
alteration, both as a cancer-initiating event and as a molecular therapeutic target for
specific tumors. Our Pipeline for RNA sequencing Data Analysis (PRADA) enables us
to detect fusion transcripts comprehensively and with high confidence. Based on integrated
analysis of paired-end RNA sequencing and DNA copy number data from The Cancer
Genome Atlas (TCGA), the Tumor Fusion Gene Data Portal provides a bona fide fusion
list across many tumor types.
Fusion Cancer (Beijing)
(Beijing, Cn)
Next-generation mRNA sequencing (RNA-seq) has long been recognized as an efficient
tool in dynamic transcriptome analysis. It can provide not only an increased base
coverage, but also a higher sample throughput. It facilitates the search for
alternatively spliced transcripts, post-transcriptional modifications, gene fusions,
mutations/SNPs and changes in gene expression. Many databases have been set up for
fusion gene detection research, such as the Mitelman Database of Chromosome Aberrations
and Gene Fusions in Cancer and ChimerDB. But they are derived either from experiments
or transcript sequences, and contain limited records. The huge amount of RNA-seq data
produced in the past few years provides abundant resources for fusion gene detection,
so the RNA-seq data in the Sequence Read Archive (SRA) at NCBI can be used
to look for fusion genes in human cancer genomes.
Y. Wang et al. (2015) Diagnostic Pathology,10,131.
FusionGDB
(UTH, Us)
FusionGDB is the Fusion Gene annotation DataBase, aiming to provide a resource or
reference for functional annotation of fusion genes in cancer for better therapeutic
targets. We first collected 48 117 FGs across pan-cancer from three representative
fusion gene resources: the improved database of chimeric transcripts and RNA-seq data
(ChiTaRS 3.1), an integrative resource for cancer-associated transcript fusions
(TumorFusions), and The Cancer Genome Atlas (TCGA) fusions by Gao et al.
For these ~48K FGs, we performed functional annotations including gene assessment
across pan-cancer fusion genes, open reading frame (ORF) assignment, and protein
domain retention search based on multiple isoform gene structure with multiple
break points, and finally provided the fusion transcript and amino acid sequences
for each break point and gene isoform. For each fusion partner gene, the user
can access multiple annotations such as gene summary, assessment scores of each
gene in pan-cancer, biological process gene ontologies, functional description,
retention information of 39 protein features and protein-protein interaction (PPI),
related drugs and diseases through six categories.
Among the ~44K FGs with checked ORFs, there were ~10K in-frame and ~11K frame-shift FGs.
Of these, we have identified 331, 303, 840, and 667 in-frame FGs retaining kinase
domains, DNA-binding domains, oncogene domains, and epifactor domains in the fusion
proteins, respectively.
Furthermore, we identified 896 and 118 in-frame FGs that did not retain the functional
domains of tumor suppressor genes and DNA damage repair genes, respectively.
On the other hand, we identified 6863 FGs that retained their functional domains
but lost their function due to the frame-shift.
ChimerDB
(Ewha Womans University, Kr)
Chromosome translocation and gene fusion are frequent events in the human genome
and are often the cause of many types of tumor. ChimerDB is designed to be a
knowledgebase of fusion transcripts collected from various public
resources such as the Sanger CGP, OMIM, PubMed, and Mitelman's database.
dbCRID
(Houston, Us)
dbCRID is a curated database of human chromosomal rearrangements (CRs) and associated
diseases. The current release of dbCRID includes 2,643 individually curated entries of
experimentally tested CRs, their associated diseases and/or clinical symptoms, as well
as detailed information about the CRs, including the precise locations of the
breakpoints, the genes involved, junction sequences, the experimental techniques
applied, and links to the original studies. These data were curated from 1,172 original studies.
Mitelman Database of Chromosome Aberrations in Cancer
The information in the Mitelman Database of Chromosome Aberrations in Cancer relates
chromosomal aberrations to tumor characteristics, based either on individual cases or associations.
All the data have been manually culled from the literature by Felix Mitelman, Bertil Johansson, and Fredrik Mertens.
Mitelman Cases Full Searcher
Archer - Quiver Fusion Database
(Boulder, Us)
Quiver is a curated database of known gene fusions involved in cancer.
The database includes internally curated data and entries imported from
publicly available sources. Current version: 4.5.
arrayMap - genomic arrays for copy number profiling in human cancer
(UZH-SIB, Zurich, Ch)
arrayMap is a curated reference database and bioinformatics resource
targeting copy number profiling data in human cancer.
The arrayMap database provides an entry point for meta-analysis and
systems level data integration of high-resolution oncogenomic CNA data.
For the majority of the samples, probe level visualization as well as
customized data representation facilitate gene level and
genome wide data review. Results from multi-case selections can be
connected to downstream data analysis and visualization tools,
as we provide through our Progenetix project.
arrayMap is developed by the group "Theoretical Cytogenetics and
Oncogenomics" at the Institute of Molecular Life Sciences of the
University of Zurich.
The link to arrayMap is constructed around the ICD-O3 topography
and ICD-O3 morphology standards.
If the tags are not adequate, it is always possible to select other terms.
CONAN : Cell lines Project: Copy Number Analysis
( Sanger Center, Hinxton, Uk)
Polymorphism : SNP, mutations |
---|
dbSNP
Single Nucleotide Polymorphism
(NCBI, Bethesda, Us)
A Database of Single Nucleotide Polymorphisms : A key aspect of research
in genetics is associating sequence variations with heritable
phenotypes. The most common variations are single nucleotide
polymorphisms (SNPs), which occur approximately once every 100 to 300
bases. Because SNPs are expected to facilitate large-scale association
genetics studies, there has recently been great interest in SNP
discovery and detection.
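As a rough illustration of what "once every 100 to 300 bases" implies, the sketch below converts that spacing into an expected SNP count. The genome length used in the example (3 billion bases) is an assumed round figure, not a value from the dbSNP text, and the function name is hypothetical.

```python
def expected_snp_range(genome_length_bp, min_spacing_bp=100, max_spacing_bp=300):
    """Return (low, high) estimates of the number of SNPs in a genome,
    assuming one SNP every min_spacing_bp to max_spacing_bp bases."""
    low = genome_length_bp // max_spacing_bp   # sparsest spacing
    high = genome_length_bp // min_spacing_bp  # densest spacing
    return low, high


if __name__ == "__main__":
    # Assumed genome length of 3,000,000,000 bases (round figure).
    print(expected_snp_range(3_000_000_000))
```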
HAPMAP
(NCBI, Bethesda, Us)
The International HapMap Project is a partnership of scientists and
funding agencies from Canada, China, Japan, Nigeria, the United Kingdom
and the United States to develop a public resource that will help
researchers find genes associated with human disease and response to
pharmaceuticals. See "About the International HapMap Project" for more
information.
Exome Variant server (EVS)
( Washington, Us)
The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover
novel genes and mechanisms contributing to heart, lung and blood
disorders by pioneering the application of next-generation sequencing of
the protein coding regions of the human genome across diverse,
richly-phenotyped populations and to share these datasets and findings
with the scientific community to extend and enrich the diagnosis,
management and treatment of heart, lung and blood disorders.
gnomAD
(Broad Institute, Boston, Us)
The Genome Aggregation Database (gnomAD) is a resource developed by an
international coalition of investigators, with the goal of aggregating
and harmonizing both exome and genome sequencing data from a wide
variety of large-scale sequencing projects, and making summary data
available for the wider scientific community.
The data set provided on this website spans 123,136 exome sequences and
15,496 whole-genome sequences from unrelated individuals sequenced as
part of various disease-specific and population genetic studies. The
gnomAD Principal Investigators and groups that have contributed data to
the current release are listed here.
Varsome
(US)
VarSome is a search engine, aggregator and impact analysis tool for
human genetic variation and a community-driven project aiming at sharing
global expertise on human variants. It renders and displays a detailed
annotation of the queried variant, including multiple notations,
predicted pathogenicity status from a variety of tools, genomic context,
as well as information from 35+ public databases. It allows users to
mark the pathogenicity of variants and to link variants to specific
phenotypes, diseases and publications. Finally, it provides an automated
pathogenicity assessment consistent with the widely accepted ACMG
guidelines. It therefore provides a powerful analysis resource as well
as a repository for the accumulated global knowledge of the genomics
community. From a technical point of view, it offers a convenient
programmable single-point interface (API) for accessing all its data.
M-CAP
(US)
Mendelian Clinically Applicable Pathogenicity (M-CAP) Score
M-CAP is the first pathogenicity classifier for rare missense variants
in the human genome that is tuned to the high sensitivity required in the
clinic (see Table). By combining previous pathogenicity scores
(including SIFT, PolyPhen-2 and CADD) with novel features and a powerful model,
we attain the best classifier at all thresholds, reducing a typical
exome/genome rare (<1%) missense variant (VUS) list from 300 to 120,
while never mistaking 95% of known pathogenic variants as benign.
Varity
(US)
VARITY (Improved pathogenicity prediction for rare human missense variants)
Web Application User Guide. This web application lets users:
1) Search and visualize VARITY predictions, features and feature contributions
for all possible single-nucleotide-change missense variants for
each of 18,239 human proteins.
2) Download VARITY predictions in one file for all 18,239 proteins.
NOTE: All VARITY predictions are for research purposes and should be appropriately
validated before clinical use.
ICGC
( OICR, Ontario, Ca)
ICGC Goal: To obtain a comprehensive description of genomic, transcriptomic and
epigenomic changes in 50 different tumor types and/or subtypes which are of
clinical and societal importance across the globe.
TCGA Copy Number Portal
(Broad Institute, Boston, Us)
This portal is designed to facilitate the use and understanding of high
resolution copy number data amassed from cancer samples in the TCGA. All
data in this portal were generated at the Broad Institute TCGA Genome
Characterization Center. This portal is modeled after Tumorscape which
contains copy number data from non-TCGA projects (Beroukhim et al.,
2010).
CENSUS
(Sanger Center, Hinxton, Uk)
The Cancer Gene Census is an ongoing effort to catalogue those genes for
which mutations have been causally implicated in cancer. The original
census and analysis was published in Nature Reviews Cancer and
supplemental analysis information related to the paper is also
available.
The census is not static but rather is updated regularly/as needed.
In particular we are grateful to Felix Mitelman and his colleagues for
providing information on more genes involved in uncommon translocations
in leukaemias and lymphomas. Currently, more than 1% of all human genes
are implicated via mutation in cancer. Of these, approximately 90% have
somatic mutations in cancer, 20% bear germline mutations that predispose
to cancer and 10% show both somatic and germline mutations.
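The three percentages quoted above are consistent with each other once the overlap is counted only once. The short sketch below checks this by inclusion-exclusion; the function name is hypothetical and the inputs are the rounded figures from the text.

```python
def census_breakdown(somatic_pct, germline_pct, both_pct):
    """Split overlapping percentages into disjoint groups.

    Returns (somatic_only, germline_only, both, total), where total is the
    share of census genes with somatic and/or germline mutations.
    """
    somatic_only = somatic_pct - both_pct
    germline_only = germline_pct - both_pct
    total = somatic_pct + germline_pct - both_pct  # inclusion-exclusion
    return somatic_only, germline_only, both_pct, total


if __name__ == "__main__":
    # Figures from the census description: 90% somatic, 20% germline, 10% both.
    print(census_breakdown(90, 20, 10))
```

With the quoted figures, 80% of census genes carry somatic mutations only, 10% germline only, and 10% both, which accounts for all genes in the census.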
COSMIC
(Sanger Center, Hinxton, Uk)
All cancers arise as a result of the acquisition of a series of fixed
DNA sequence abnormalities, mutations, many of which ultimately confer a
growth advantage upon the cells in which they have occurred. There is a
vast amount of information available in the published scientific
literature about these changes. COSMIC is designed to store and display
somatic mutation information and related details and contains
information relating to human cancers.
LOVD
Leiden Open Variation Database 3.0
(Leiden, Ne)
LOVD stands for Leiden Open (source) Variation Database.
LOVD's purpose: to provide a flexible, freely available tool for
gene-centered collection and display of DNA variations. LOVD 3.0 extends
this idea to also provide patient-centered data storage and storage of
NGS data, even of variants outside of genes. LOVD is open source,
released under the GPL license, and is actively being improved;
releases currently appear every month.
BioMuta v2
(George Washington Univ, Washington DC, Us)
BioMuta v2.0 is a curated single-nucleotide variation (SNV) and disease
association database where the variations are mapped to the
genome/protein/gene.
DoCM
DoCM Database of curated mutations
(WUSTL, Us)
DoCM, the Database of Curated Mutations, is a highly curated database of
known, disease-causing mutations that provides easily explorable
variant lists with direct links to source citations for easy
verification.
CIViC
Clinical Interpretations of Variants in Cancer
(WUSTL, Us)
The CIViC database is based on Evidence Items which reference their
parent Variants, Variant Groups, and Genes. You may explore the various
CIViC entities and their attributes using the menu to your left (or
above, if you're viewing this on a mobile display).
intoGen
Integrative Onco Genomics
(Barcelona, Es)
Mutational cancer driver database
NCG
Network of Cancer Genes
(London, Uk)
NCG collects 3,347 cancer driver genes from the Cancer Gene Census (CGC),
Vogelstein, Science 2013, Saito, Nature 2020 and screenings of cancer tissues,
as well as 95 healthy drivers from screenings of non-cancer tissues.
OncoKB
MSK's Precision Oncology Knowledge Base
(MSK, NY, USA)
An FDA-Recognized Human Genetic Variant Database
Powered by the clinical expertise of Memorial Sloan Kettering Cancer Center
When using OncoKB, please cite: Chakravarty et al., JCO PO 2017.
CancerInterpreter
(Barcelona, Es)
Cancer Genome Interpreter (CGI) is designed to support the
identification of tumor alterations that drive the disease and detect
those that may be therapeutically actionable. CGI relies on existing
knowledge collected from several resources and on computational methods
that annotate the alterations in a tumor according to distinct levels of
evidence.
Cancer3D
(Sanford Burnham, Ca)
Cancer3D is a database that unites information on somatic missense
mutations from TCGA and CCLE, allowing users to explore two different
cancer-related problems at the same time: drug sensitivity/biomarker
identification and prediction of cancer drivers. The database is an
interface to two novel algorithms, e-Driver and e-Drug, that make use of
information about the internal structure of a protein to predict novel
cancer drivers or drug biomarkers respectively. Moreover, it maps
somatic missense mutations from over 18,000 human proteins to more than
25,000 protein structures from PDB.
HGMD
Human Gene Mutation Database
( Institute of Medical Genetics, Cardiff, Uk)
Human gene mutation is a highly specific process, and this specificity has
important implications for the nature, prevalence and therefore diagnosis of
genetic disease. Indeed, the recognition that certain DNA sequences are
hypermutable has yielded clues as to the endogenous mutational mechanisms
involved and provided insights into the intricacies of the processes of DNA
replication and repair (Cooper and Krawczak 1993). In practical terms, a fuller
understanding of the mutational process may prove important in molecular
diagnostic medicine by contributing to improvements in the design and efficacy
of mutation search procedures and strategies in different genetic disorders.
The Human Gene Mutation Database (HGMD) represents an attempt to collate
known (published) gene lesions responsible for human inherited disease. This
database, whilst originally established for the study of mutational mechanisms
in human genes (Cooper and Krawczak 1993), has now acquired a much broader
utility in that it embodies an up-to-date and comprehensive reference source to
the spectrum of inherited human gene lesions. Thus, HGMD provides information of
practical diagnostic importance to (i) researchers and diagnosticians in human
molecular genetics, (ii) physicians interested in a particular inherited condition
in a given patient or family, and (iii) genetic counsellors.
dbVar
(Bethesda, NCBI, Us)
dbVar is NCBI's database of genomic structural variation. It contains insertions,
deletions, duplications, inversions, multinucleotide substitutions, mobile element
insertions, translocations, and complex chromosomal rearrangements.
DGV - Genomic Variants
(Toronto, Ca)
The objective of the Database of Genomic Variants is to provide a
comprehensive summary of structural variation in the human genome. We
define structural variation as genomic alterations that involve segments
of DNA that are larger than 1 kb. We now also annotate InDels in the
100 bp-1 kb range. The content of the database represents only
structural variation identified in healthy control samples.
The Database of Genomic Variants provides a useful catalog of control
data for studies aiming to correlate genomic variation with phenotypic
data. The database is continuously updated with new data from peer
reviewed research studies. We always welcome suggestions and comments
regarding the database from the research community.
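The size thresholds in the definition above can be written as a small classifier. This is a sketch only: the function name and labels are hypothetical, and treating both ends of the 100 bp to 1 kb range as inclusive is an assumption the text does not settle.

```python
def classify_by_size(length_bp):
    """Classify a genomic alteration by length, following the size ranges
    quoted in the DGV description.

    Assumption: the InDel range includes both 100 bp and 1,000 bp.
    """
    if length_bp > 1000:
        return "structural variant"
    if 100 <= length_bp <= 1000:
        return "indel"
    return "below annotated range"


if __name__ == "__main__":
    for size in (50, 100, 1000, 25000):
        print(size, classify_by_size(size))
```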
DECIPHER
(Sanger Center, Hinxton, Uk)
DECIPHER (DatabasE of Genomic variants and Phenotype in Humans Using
Ensembl Resources) is an interactive web-based database which
incorporates a suite of tools designed to aid the interpretation of
genomic variants.
DECIPHER enhances clinical diagnosis by retrieving information from a
variety of bioinformatics resources relevant to the variant found in the
patient. The patient's variant is displayed in the context of both
normal variation and pathogenic variation reported at that locus thereby
facilitating interpretation.
SNPs3D
(UMD, Us)
SNPs3D is a website which assigns molecular functional effects of non-synonymous
SNPs based on structure and sequence analysis.
Genetic Association Database
(NIH, Bethesda, Us)
The Genetic Association Database is an archive of human genetic
association studies of complex diseases and disorders. The goal of this
database is to allow the user to rapidly identify medically relevant
polymorphism from the large volume of polymorphism and mutational data,
in the context of standardized nomenclature.
Diseases |
---|
OMIM
Online Mendelian Inheritance in Man
(Johns Hopkins, Baltimore, Us)
OMIM is a catalog of
human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his
colleagues at Johns Hopkins and elsewhere, and developed for the World Wide Web by NCBI, the
National Center for Biotechnology Information. The database contains textual information,
pictures, and reference information. It also contains copious links to NCBI's Entrez database of
MEDLINE articles and sequence information.
MedGen
( NCBI, Bethesda, Us)
MedGen is NCBI's portal to information about human disorders and other
phenotypes having a genetic component. MedGen is structured to serve
health care professionals, the medical genetics community, and other
interested parties by providing centralized access to diverse types of
content. For example, because MedGen aggregates the plethora of terms
used for particular disorders into a specific concept, it provides a
Rosetta stone for stakeholders who may use different names for the same
disorder. Maintaining a clearly defined set of concepts and terms for
phenotypes is essential to support efforts to characterize genetic
variation by its effects on specific phenotypes. The assignment of
identifiers for those concepts allows computational access to phenotypic
information, an essential requirement for the large-scale analysis of
genomic data.
dbGaP
( NCBI, Bethesda, Us)
The database of Genotypes and Phenotypes (dbGaP) was developed to
archive and distribute the data and results from studies that have
investigated the interaction of genotype and phenotype in humans.
ClinVar
( NCBI, Bethesda, Us)
ClinVar is designed to provide a freely accessible, public archive of
reports of the relationships among human variations and phenotypes, with
supporting evidence. By so doing,
ClinVar facilitates access to and communication about the relationships
asserted between human variation and observed health status, and the
history of that interpretation.
ClinVar collects reports of variants found in patient samples,
assertions made regarding their clinical significance, information about
the submitter, and other supporting data. The alleles described in
submissions are mapped to reference sequences, and reported according to
the HGVS standard.
ClinVar then presents the data for interactive users as well as those
wishing to use ClinVar in daily workflows and other local applications.
ClinVar works in collaboration with interested organizations to meet the
needs of the medical genetics community as efficiently and effectively
as possible.
GTR (The Genetic Testing Registry)
( NIH, Bethesda, Us)
The Genetic Testing Registry (GTR) provides a central location for voluntary submission
of genetic test information by providers. The scope includes the test's purpose,
methodology, validity, evidence of the test's usefulness, and laboratory contacts and
credentials. The overarching goal of the GTR is to advance the public health and
research into the genetic basis of health and disease.
Open Targets
(Hinxton, Uk)
The Target Validation Platform (www.targetvalidation.org) aims to
support researchers in identifying early drug targets faster and with
more confidence. The platform integrates data from several public
databases and is the result of a collaboration between the Sanger
Institute, GlaxoSmithKline (GSK), the European Bioinformatics Institute
(EBI) and Biogen.
HuGE Navigator
HuGE Navigator provides access to a continuously updated knowledge base
in human genome epidemiology, including information on population
prevalence of genetic variants, gene-disease associations, gene-gene and
gene-environment interactions, and evaluation of genetic tests.
The Office of Public Health Genomics (OPHG), CDC
The Centers for Disease Control and Prevention (CDC) established the
Office of Public Health Genomics (OPHG) in 1997. OPHG aims to integrate
genomics into public health research, policy, and programs, which could
improve interventions designed to prevent and control the country's
leading chronic, infectious, environmental, and occupational diseases.
OPHG's efforts focus on conducting population-based genomic research,
assessing the role of family health history in disease risk and
prevention, supporting a systematic process for evaluating genetic
tests, translating genomics into public health research and programs,
and strengthening capacity for public health genomics in disease
prevention programs.
(Centers for Disease Control and Prevention (CDC))
ORPHANET :
Database of rare diseases and orphan drugs
(INSERM, Paris, Fr)
This project is the result of a commonly observed fact: rare diseases are difficult to
deal with for medical practitioners. This is due to their restricted knowledge of the
diseases' natural history, the patient care required, treatment, and sometimes even
of their existence. Scientific knowledge exists, or at least partial knowledge, but it is
scattered. Because of the physical media on which it is communicated, the
information is difficult to access for the great majority of physicians, not to
mention patients and their families. Only a very small number of doctors
specialize in these diseases, and their practices are scarcely known, sometimes
even totally unknown to other practitioners.
The fields currently covered are:
DisGeNET
(Es)
DisGeNET is a discovery platform containing one of the largest publicly
available collections of genes and variants associated with human diseases
(Piñero et al., 2016; Piñero et al., 2015). DisGeNET integrates data
from expert curated repositories, GWAS catalogues, animal models and the
scientific literature. DisGeNET data are homogeneously annotated with
controlled vocabularies and community-driven ontologies. Additionally,
several original metrics are provided to assist the prioritization of
genotype-phenotype relationships.
ClinGen :
Clinical Genome resource
(NIH, Bethesda, Us)
ClinGen is a National Institutes of Health (NIH)-funded resource
dedicated to building an authoritative central resource that defines the
clinical relevance of genes and variants for use in precision medicine
and research.
Clinical trial, drugs and therapy |
---|
BioCentury BCIQ
(Redwood City, Us)
BioCentury's BCIQ is unlike any other business intelligence and research
tool on the market. It combines over 20 years of BioCentury's leading
analysis of the biopharma industry with four, easy-to-use, fully
integrated modules. The data are fully vetted and meticulously
maintained. Quite simply, BCIQ is the most accurate, in-depth tool for
your research and business intelligence needs.
DGIdb The Drug Gene Interaction Database
(WUSTL, Us)
Search for drug-gene interactions by gene or drug name.
My Cancer Genome
(Vanderbilt, Us)
My Cancer Genome contains information on the clinical impact of molecular biomarkers
in cancer-related genes, proteins, and other biomarker types on the use of anticancer
therapies in cancer. This information is derived from FDA labels, NCCN and other
professional society guidelines, clinical trials, peer-reviewed publications, and more.
CTD : Comparative Toxicogenomics Database
( NC State Univ, Us)
The Comparative Toxicogenomics Database (CTD) elucidates molecular
mechanisms by which environmental chemicals affect human disease.
Chemical-gene/protein interactions and chemical- and gene-disease
relationships are curated from the published literature, and integrated
with diverse data (chemicals, genes/proteins, human diseases,
references, sequences, vertebrate and invertebrate organisms, and the
Gene Ontology) to facilitate environmental health research.
PharmGKB
( Stanford, Us)
PharmGKB is a comprehensive resource that curates knowledge about the
impact of genetic variation on drug response for researchers and
clinicians. We encompass clinical information including dosing
guidelines and drug labels, potentially clinically actionable gene-drug
associations and genotype-phenotype relationships.
Genomics of Drug Sensitivity in Cancer (Wellcome Trust)
(Sanger, Hinxton, Uk)
The Genomics of Drug Sensitivity in Cancer project is an academic
research program to identify molecular features of cancers that predict
response to anti-cancer drugs.
ClinicalTrials.gov
( NIH, Bethesda, Us)
ClinicalTrials.gov is a registry and results database of federally and
privately supported clinical trials conducted in the United States and
around the world. ClinicalTrials.gov gives you information about a
trial's purpose, who may participate, locations, and phone numbers for
more details. This information should be used in conjunction with advice
from health care professionals.
DEPO (Database of Evidence for Precision Oncology)
( Washington University, Us)
DEPO is the Database of Evidence for Precision Oncology, containing
druggable variant information such as drug therapy, evidence levels
(FDA-approved, Clinical Trials, Case Reports, Preclinical), and the
cancer types for intended treatments.
The curated variants (drug-sensitive or -resistant) are summarized
according to the following categories:
copy number variation (CNV), which corresponds to either copy number
amplification or loss;
gene fusion;
expression outlier, which refers to genes whose elevated or reduced
expression is associated with drug response; and
mutations, which refers to missense, nonsense, in-frame indel, and
frameshift mutations.
Pharos
Druggable Genome (IDG) program
( NIH, Us)
Pharos is the user interface to the Knowledge Management Center (KMC) for the
Illuminating the Druggable Genome (IDG) program funded by the National Institutes
of Health (NIH) Common Fund (Grant No. 1U24CA224370-01). The goal of KMC is
to develop a comprehensive, integrated knowledge-base for the Druggable Genome
(DG) to illuminate the uncharacterized and/or poorly annotated portion of the DG,
focusing on three of the most commonly drug-targeted protein families:
G-protein-coupled receptors (GPCRs), ion channels, and protein kinases.
BioGRID ORCS
( Toronto)
The BioGRID Open Repository of CRISPR Screens (ORCS) is an open
repository of CRISPR screens compiled through
comprehensive curation efforts. Our current index is version 1.1.12 and
searches 266 publications and 86,066 genes to return 1,592 CRISPR screens
from 4 major model organism species, 745 cell lines, and 127 cell types.
All screen data are freely provided through our search index and available
via download in a wide variety of standardized formats.
Miscellaneous |
---|
Bibliography, Data mining |
---|
PubMed (NLM , Us)
COREMINE
( Oslo, No)
Coremine Medical is a product of the PubGene Company designed to be used
by anyone seeking information on health, medicine and biology. It is
ideal for those seeking an overview of a complex subject while allowing
the possibility to "drill down" to specific details. Search results are
presented in a dashboard format composed of panels containing various
categories of information ranging from introductory sources to the
latest scientific articles.
EVEX ( Text mining )
( Turku, Fi)
EVEX is a text mining resource built on top of PubMed abstracts and
PubMed Central full texts. It contains over 40 million bio-molecular
events among more than 76 million automatically extracted gene/protein
name mentions. The text mining data has further been enriched with gene
normalization results, allowing straightforward integration with
external resources. Further, gene families from Ensembl and HomoloGene
provide homology-based event generalizations. EVEX presents both direct
and indirect associations between genes and proteins, enabling
explorative browsing of relevant literature.
The EVEX website provides summarized information on various
bio-molecular events, accounting for lexical variation of gene/protein
symbols and dealing with synonymy and abbreviations. Both direct and
indirect associations can be retrieved, and homology-based
generalizations provide the opportunity to retrieve information on
entire gene families.
Harmonizome
(Ma'ayan Laboratory of Computational Systems Biology, Us)
Search for genes or proteins and their functional terms extracted and organized
from over a hundred publicly available resources.
ARCHS4
(NY , Us)
ARCHS4: Massive Mining of Publicly Available RNA-seq Data from Human and Mouse
The Ma'ayan Laboratory applies machine learning and other statistical mining
techniques to study how intracellular regulatory systems function as networks
to control cellular processes such as differentiation, dedifferentiation,
apoptosis and proliferation.
Our research team develops software systems to help experimental biologists
form novel hypotheses from high-throughput data, while aiming to better
understand the structure and function of regulatory networks in mammalian
cellular and multi-cellular systems.
Bionity
( MSKCC, New-York, Us)
A network of genes and proteins extends through the scientific literature,
touching on phenotypes, pathologies and gene function. We report the
development of an information system that provides this network as a natural
way of accessing the more than ten million abstracts in PubMed. By using genes
and proteins as hyperlinks between sentences and abstracts, we convert the
information in PubMed into one navigable resource and bring all the advantages
of the internet to scientific literature investigation.
Moreover, this literature network can be superimposed on experimental
interaction data (e.g. yeast two-hybrid data from Drosophila melanogaster and
Caenorhabditis elegans) to make possible a simultaneous analysis of new and
existing knowledge. The network, called Information Hyperlinked over Proteins
(iHOP), contains half a million sentences and 30,000 different genes from
humans, mice, D. melanogaster, C. elegans, zebrafish, Arabidopsis thaliana,
yeast and Escherichia coli.
The iHOP server is publicly accessible here.
ZODIAC
( Evanston, Us)
Zodiac, like Google, is a search engine. You type queries in the search
bar, and Zodiac returns search results. Reading the tutorial first is
advised.
Zodiac helps you understand how genes interact in cancer conditions. The
networks contained in Zodiac are based on rigorous statistical
inference utilizing prior knowledge and TCGA data.
Zodiac helps discover interacting genes and, consequently, potential drug targets.
Zodiac identifies potential genetic aberrations such as gene fusions.
Zodiac is easy to use and allows for real-time look-up of genetic
interactions while reading a paper, listening to a seminar, or browsing
the internet.
DataMed Index
( Us)
The DataMed prototype (v3.0) is being developed for the NIH BD2K Data Discovery
Index (DDI) by the bioCADDIE project team. Once completed, DataMed will allow
the scientific community to search for and find data across different
repositories in one space. The bioCADDIE team is soliciting user feedback to
help shape DataMed's future development.