Databases links


  • Nomenclature
  • Cancer Portals
  • Cards
  • Genomic and Cartography
  • Genes and transcription
  • Protein : sequences, function, domain, 3D structure
  • Protein Interaction databases
  • Ontology - Pathways
  • Orthology - Evolution
  • Gene fusions - Chromosomal rearrangement
  • Polymorphism : SNP, mutations
  • Diseases
  • Clinical trials, drugs, therapy
  • Miscellaneous
  • Bibliography, Data mining

  • Nomenclature

    HUGO : The Human Gene Nomenclature Database (Hinxton, Uk)
    The Human Gene Nomenclature Database Search tool provides access to the list of currently approved human gene symbols as maintained by the HUGO gene nomenclature committee. Many previously approved symbols are also listed, with links directing users to the current symbol. Minor changes to a previously approved symbol, such as adding a number (eg NRAMP becomes NRAMP1), may not be listed in this way, so users should try a "Symbol begins with" search using the first few letters of a symbol, instead of an exact search, if they fail to find a specific symbol. Other symbols used in the literature (known as aliases) are collected and stored by the HUGO Nomenclature Committee, and are now searchable with this tool. The "Find a gene" facility in GDB may be useful to search for other names/symbols which cannot be found in the Human Gene Nomenclature Database.
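    The search advice above (try an exact match first, then fall back to a "Symbol begins with" search) can be sketched in a few lines of Python. This is only an illustration over a small in-memory list: the function name `find_symbols` and the toy symbol list are assumptions made for this example, not the actual search tool or its data.

```python
def find_symbols(query, approved_symbols):
    """Return the exact match if there is one, else all symbols starting with the query."""
    q = query.strip().upper()
    by_upper = {symbol.upper(): symbol for symbol in approved_symbols}
    if q in by_upper:
        # Exact search succeeded.
        return [by_upper[q]]
    # Fallback: "Symbol begins with" search on the first few letters.
    return sorted(s for s in approved_symbols if s.upper().startswith(q))


# Toy list for illustration only (not the real approved symbol list).
TOY_SYMBOLS = ["NRAMP1", "NRAMP2", "NRAS"]

print(find_symbols("NRAMP", TOY_SYMBOLS))
print(find_symbols("nras", TOY_SYMBOLS))
```

    With the toy list, the query "NRAMP" has no exact match, so the prefix search returns both numbered symbols, which mirrors the NRAMP/NRAMP1 case described above.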

    LRG (Hinxton, UK)
    A Locus Reference Genomic (LRG) record contains stable reference sequences that are used for reporting sequence variants with clinical implications

    International Classification of Diseases for oncology (WHO, IARC, Lyon, Fr)
    Purpose/Definition
    Used principally in tumour or cancer registries for coding the site (topography) and the histology (morphology) of neoplasms, usually obtained from a pathology report.
    Classification structure
    A multi-axial classification of the site, morphology, behaviour, and grading of neoplasms.
    The topography axis uses the ICD-10 classification of malignant neoplasms (except those categories which relate to secondary neoplasms and to specified morphological types of tumours) for all types of tumours, thereby providing greater site detail for non-malignant tumours than is provided in ICD-10. In contrast to ICD-10, the ICD-O includes topography for sites of haematopoietic and reticuloendothelial tumours.
    The morphology axis provides five-digit codes ranging from M-8000/0 to M-9989/3. The first four digits indicate the specific histological term. The fifth digit after the slash (/) is the behaviour code, which indicates whether a tumour is malignant, benign, in situ, or uncertain (whether benign or malignant).
    A separate one-digit code is also provided for histologic grading (differentiation).
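    The morphology code layout described above (four histology digits, a slash, then one behaviour digit) can be illustrated with a small parser. This is a minimal sketch: the function name `parse_morphology` is hypothetical, and the digit-to-label table follows the usual ICD-O convention (/0, /1, /2, /3), which the text above does not spell out; other behaviour digits are deliberately left unlabelled here.

```python
import re

# Behaviour digits as commonly defined in ICD-O (assumption: not stated in the text above).
BEHAVIOUR_LABELS = {
    "0": "benign",
    "1": "uncertain whether benign or malignant",
    "2": "in situ",
    "3": "malignant, primary site",
}

# Accepts "M-8000/0", "M8000/0" or "8000/0".
_MORPHOLOGY = re.compile(r"^(?:M-?)?(\d{4})/(\d)$")


def parse_morphology(code):
    """Split an ICD-O morphology code into histology and behaviour parts."""
    match = _MORPHOLOGY.match(code.strip())
    if match is None:
        raise ValueError(f"not a morphology code: {code!r}")
    histology, behaviour = match.groups()
    return {
        "histology": histology,
        "behaviour": behaviour,
        "behaviour_label": BEHAVIOUR_LABELS.get(behaviour, "not covered by this sketch"),
    }


print(parse_morphology("M-8000/0"))
print(parse_morphology("M-9989/3"))
```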


    Portals

    ICGC Data Portal (Ontario, Ca)
    The ICGC Data Portal provides tools for visualizing, querying and downloading the data released quarterly by the consortium's member projects.
    The Pancancer Analysis of Whole Genomes (PCAWG) study is an international collaboration to identify common patterns of mutation in more than 2,800 cancer whole genomes from the International Cancer Genome Consortium.

    TCGA cBIOPortal (MSKCC, New York, Us)
    The cBioPortal for Cancer Genomics provides visualization, analysis and download of large-scale cancer genomics data sets.

    Broad Tumor Portal (Broad Institute, Boston, Us)
    Explore dataset by tumor types : Genes, Cancers, DNA Mutations & Annotations

    Firebrowse Broad GDAC (Broad Institute, Boston, Us)
    Explore TCGA and Broad dataset by tumor cohort with different type of analyses : clinical, copy number, correlation, miR, mRNA, mutation, pathways, RPPA ..

    IntOGen Integrative Onco Genomics (Barcelona, Es)
    Mutational cancer driver database

    OASIS Portal (Pfizer, Us)
    OASIS is an open-access web portal that enables cancer researchers to perform exploratory and integrative analyses of somatic mutation, copy number changes (CNV) and gene expression data from thousands of tumor, normal tissues and cell lines representing a broad spectrum of malignancies.
    OASIS is developed by Pfizer Oncology Research Computational Biology in collaboration with Research Business Technology (RBT). Please cite "OASIS: A web-based platform for exploratory analysis of cancer genome and transcriptome data (manuscript in preparation)" when publishing results based on OASIS.

    Cancer Browser UCSC (Santa Cruz, Us)
    The UCSC Cancer Browser allows researchers to interactively explore cancer genomics data and its associated clinical information. Data can be viewed in a variety of ways, including by value, chromosome location, clinical feature, biological pathway or geneset of interest. It is also possible to quickly perform and easily view statistical analysis on subsets of the data.

    canSAR (ICR, Uk)
    canSAR is an integrated knowledge-base that brings together multidisciplinary data across biology, chemistry, pharmacology, structural biology, cellular networks and clinical annotations, and applies machine learning approaches to provide drug-discovery useful predictions.
    canSAR's goal is to enable cancer translational research and drug discovery by providing this knowledge to researchers from across different disciplines. It provides a single information portal to answer complex multi-disciplinary questions including, among many others: what is known about a protein, in which cancers is it expressed or mutated, and what chemical tools and cell line models can be used to experimentally probe its activity? What is known about a drug, its cellular sensitivity profile, and what proteins is it known to bind that may explain unusual bioactivity?

    CancerResource (La Charite, Berlin, De)
    CancerResource is a comprehensive knowledgebase for drug-target relationships related to cancer, as well as for supporting information and experimental data. Large-scale cancer genomics data, including mRNA expression and non-synonymous mutation data, are also integrated into the database. CancerResource therefore allows exploratory data analysis based on cancer-related drug-target interactions, expression and mutation data, and drug sensitivity data.


    Cards

    Atlas of Genetics in Oncology and Haematology (USAL, Spain)

    Entrez_Gene (NCBI, Bethesda, Us)
    Entrez_Gene is the part of Entrez devoted to searching for information on genes, with links to other databases such as RefSeq, maps, OMIM, UniGene and PubMed. It is developed and maintained by NCBI.

    EnSembl (Sanger_EBI, Hinxton, Uk)
    The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online

    GeneCards: human genes, proteins and diseases (Weizmann, Rehovot, Is)
    GeneCards is a database of human genes, their products and their involvement in diseases. It offers concise information about the functions of all human genes that have an approved symbol, as well as selected others [gene listing]. It is especially useful for those who are searching for information about large sets of genes or proteins, e.g. for scientists working in functional genomics and proteomics.

    AceView (NCBI, Bethesda, Us)
    AceView offers a comprehensive and non-redundant cDNA-supported annotation of human and nematode genes. Our program co-aligns the million mRNAs and ESTs available from GenBank, dbEST and RefSeq on the genome sequence, quality-filters the cDNAs and clusters them into alternative transcripts and genes. By construction, the cooperative accuracy of these sequences, ESTs or mRNAs, is brought up to the exceptional quality of the genome sequence.

    GENATLAS (Imagine, Paris, Fr)
    The GENATLAS database compiles the information relevant to the mapping efforts of the Human Genome Project. This information is collected from original articles in the literature or from the proceedings of Human Gene Mapping and Single Chromosome Workshops. It is catalogued in three interactive directories: GENATLAS/GEN, GENATLAS/LINK and GENATLAS/REF. A series of graphical maps, GENATLAS/MAP, is associated with it, as well as a Comparative Map database edited by John H Edwards.

    WikiGenes
    WikiGenes is a non-profit initiative to provide a global collaborative knowledge base for the life sciences, where authorship matters. Search thousands of genes, chemicals, pathologies and much more...

    SOURCE (Princeton, Us)
    SOURCE contains two types of pages, GeneReports and CloneReports. GeneReports display information about genes including functional, structural and expression data. GeneReports give an overview of a gene's biology by describing its protein function, the tissue sources of cDNA clones associated with the gene, links to microarray experiments that included the queried gene, and the mapping of the gene within the human genome.
    CloneReports display information about a given cDNA clone (also known as an Expressed Sequence Tag or EST) including putative ID, the size of the insert, vector information, and links to BLAST searches and genome browsing tools. Users can switch between the two types of reports by clicking on the button at the top of each report page.

    GHR Genetics Home Reference (Bethesda, Us)
    Genetics Home Reference provides consumer-friendly information about the effects of genetic variation on human health.

    miRBase (Hinxton, Uk)
    miRBase: the microRNA database
    miRBase provides the following services:
    The miRBase database is a searchable database of published miRNA sequences and annotation. Each entry in the miRBase Sequence database represents a predicted hairpin portion of a miRNA transcript (termed mir in the database), with information on the location and sequence of the mature miRNA sequence (termed miR). Both hairpin and mature sequences are available for searching and browsing, and entries can also be retrieved by name, keyword, references and annotation. All sequence and annotation data are also available for download.
    The miRBase Registry provides miRNA gene hunters with unique names for novel miRNA genes prior to publication of results. Visit the help pages for more information about the naming service.

    dbDEMC2.0 (Cn)
    A Database of Differentially Expressed miRNAs in Human Cancers (version 2.0)
    dbDEMC (database of Differentially Expressed MiRNAs in human Cancers) is an integrated database that is designed to store and display differentially expressed microRNAs (miRNAs) in human cancers detected by high-throughput methods. In this updated version of dbDEMC, a total of 209 newly published data sets were collected from Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA).

    H-invDB (Ja)
    An Integrated Database of Annotated Human Genes
    H-Invitational Database (H-InvDB) is an integrated database of human genes and transcripts. By extensive analyses of all human transcripts, we provide curated annotations of human genes and transcripts that include gene structures, alternative splicing variants, non-coding functional RNAs, protein functions, functional domains, sub-cellular localizations, metabolic pathways, protein 3D structure, genetic polymorphisms (SNPs, indels and microsatellite repeats), relation with diseases, gene expression profiling, molecular evolutionary features, protein-protein interactions (PPIs) and gene families/groups. H-InvDB is produced by the "Genome Information Integration Project" (2005-2008) based upon the annotation technology established in the H-Invitational Project for annotation of human full-length cDNAs, and presented as a key integrated database of human genes in METI integrated database project (2008-).


    Genomic and cartography

    The complete sequence of a human genome
    Science 31 Mar 2022 Vol 376, Issue 6588 pp. 44-53
    Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.

    GoldenPath : Human Genome Project Working Draft - Human Genome Browser (Santa Cruz, Us)
    This page contains links to an assembly of the current draft of the human genome. The human genome is approximately 3.1 billion bases. Roughly 88% of the genome has been sequenced by the International Human Genome Project. The Oct. 7th draft genome is composed of hundreds of thousands of fragments of various sizes. The order and orientation of the fragments is often not known from the sequencing process itself. In some cases the same part of the genome will be duplicated in several fragments.

    Human Genome Browser MapView (NCBI, Bethesda, Us)
    The Map Viewer provides special browsing capabilities for a subset of organisms in Entrez Genomes. The organism subset is shown below and also on the Map Viewer Home Page. Map Viewer allows you to view and search an organism's complete genome, display chromosome maps, and zoom into progressively greater levels of detail, down to the sequence data for a region of interest. The number and types of available maps vary by organism, and are described in the "data and search tips" file for each organism. If multiple maps are available for a chromosome, it displays them aligned to each other based on shared marker and gene names, and, for the sequence maps, based on a common sequence coordinate system.

    EnSembl : Human Genome Project Working Draft - Ensembl Map view (Sanger_EBI, Hinxton, Uk)
    Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop a software system which produces and maintains automatic annotation on eukaryotic genomes.

    GENCODE project (Sanger Institute, Hinxton, Uk)
    The National Human Genome Research Institute (NHGRI) launched a public research consortium named ENCODE, the Encyclopedia Of DNA Elements, in September 2003, to carry out a project to identify all functional elements in the human genome sequence. After a successful pilot phase on 1% of the genome, the scale-up to the entire genome is now underway. The Wellcome Trust Sanger Institute was awarded a grant to carry out a scale-up of the GENCODE project for integrated annotation of gene features.
    Having been involved in successfully delivering the definitive annotation of functional elements in the human genome, the GENCODE group were awarded a second grant in 2013 in order to continue their human genome annotation work and expand GENCODE to include annotation of the mouse genome.
    The GENCODE gene sets are used by the entire ENCODE consortium and by many other projects (eg. 1000 Genomes) as reference gene sets.

    ImmunoBase
    ImmunoBase is a web based resource focused on the genetics and genomics of immunologically related human diseases. Our mission is to provide a curated and integrated set of datasets and tools, across multiple diseases, to support and promote research in this area.

    Vega : Human Genome Project Working Draft - Ensembl Map view (Sanger Institute, Hinxton, Uk)
    Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop a software system which produces and maintains automatic annotation on eukaryotic genomes.

    Genome Data Viewer: (NCBI, Bethesda, Us)
    The NCBI Genome Data Viewer (GDV) is a genome browser supporting the exploration and analysis of eukaryotic RefSeq genome assemblies. It allows users to visualize different types of sequence-associated data in a genomic context. Genome Data Viewer is also used by different NCBI resources, such as GEO, to display datasets associated with specified experiments or samples in a genome browser context. Release notes are available for each browser version, describing new features and bug fixes. Videos are available on the GDV playlist to help you get started with various browser features.

    Unigene (NCBI, Bethesda, Us)
    UniGene is an experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location.


    Gene and transcription

    GenBank (NCBI, Bethesda, Us)
    GenBank is the NIH's database of all known nucleotide and protein sequences including supporting bibliographic and biological information. Since 1992 it has been based at the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine, located on the NIH campus. NCBI was created by Congress in 1988 and specifically charged with developing automated information systems to support molecular biology and biotechnology. Its other mission is to conduct basic research and as part of the NIH Intramural Program, NCBI scientists pursue research in genome analysis, molecular structure modeling and prediction, and mathematical methods for sequence analysis.

    RefSeq (NCBI, Bethesda, Us)
    The NCBI Reference Sequence project (RefSeq) will provide reference sequence standards for the naturally occurring molecules of the central dogma, from chromosomes to mRNAs to proteins. RefSeq standards provide a foundation for the functional annotation of the human genome. They provide a stable reference point for mutation analysis, gene expression studies, and polymorphism discovery.

    CCDS (UCSC, Santa Cruz, Us)
    The Consensus CDS (CCDS) project is a collaborative effort to identify a core set of human and mouse protein coding regions that are consistently annotated and of high quality. The long term goal is to support convergence towards a standard set of gene annotations.

    Easana-Genosplice (Genosplice, Paris, Fr)
    the Friendly Alternative Splicing and Transcripts Database
    FAST DB: a web resource for the study of the expression regulation of human gene products.
    Fast DB provides three kinds of analysis: human mRNAs, human mRNAs and ESTs, and mouse mRNAs.

    The ArrayExpress Gene Expression Atlas (EBI, Hinxton, Uk)
    The ArrayExpress Gene Expression Atlas is a semantically enriched database of meta-analysis based summary statistics over a curated subset of ArrayExpress Archive, servicing queries for condition-specific gene expression patterns as well as broader exploratory searches for biologically interesting genes/samples.
    To cite the Atlas in your research or to learn more about it, please refer to Kapushesky M et al. (2009) Gene Expression Atlas at the European Bioinformatics Institute, Nucleic Acids Research Database Issue (NAR 2009)

    GEO Profiles (NCBI, Bethesda, Us)
    This database stores individual gene expression profiles from curated DataSets in the Gene Expression Omnibus (GEO) repository. Search for specific profiles of interest based on gene annotation or pre-computed profile characteristics.

    SEEK (Princeton, Us)
    Search-Based Exploration of Expression Compendium [Human] SEEK is a computational gene co-expression search engine. SEEK provides biologists with a way to navigate the massive human expression compendium that now contains thousands of expression datasets. SEEK returns a robust ranking of co-expressed genes in the biological area of interest defined by the user's query genes. In the meantime, it also prioritizes thousands of expression datasets according to the user's query of interest. The unique strengths of SEEK include its support for multi-gene query and cross-platform analysis, as well as its rich visualization features.

    MEM - Multi Experiment Matrix (Est)
    Mining for coexpression across hundreds of datasets using novel rank aggregation and visualisation methods.

    Genevestigator (Us)
    GENEVESTIGATOR : The world's expression data at your fingertips
    Characterize genes by finding out where, when and in response to what they are expressed
    Learn more from your experiments by integrating and comparing them with public datasets
    Explore expertly curated experiments to find supporting evidence for your hypotheses
    Discover and prioritize your biomarkers and targets against thousands of conditions

    BIOGPS (Scripps, Us)
    A free extensible and customizable gene annotation portal, a complete resource for learning about gene and protein function.

    GTEX Portal (Broad, Boston Us)
    The GTEx Project: correlations between genotype and tissue-specific gene expression levels will help identify regions of the genome that influence whether and how much a gene is expressed. GTEx will help researchers to understand inherited susceptibility to disease and will be a resource database and tissue bank for many studies in the future. The Genotype-Tissue Expression (GTEx) project aims to provide to the scientific community a resource with which to study human gene expression and regulation and its relationship to genetic variation. This project will collect and analyze multiple human tissues from donors who are also densely genotyped, to assess genetic variation within their genomes. By analyzing global RNA expression within individual tissues and treating the expression levels of genes as quantitative traits, variations in gene expression that are highly correlated with genetic variation can be identified as expression quantitative trait loci, or eQTLs.
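    The idea of treating expression as a quantitative trait can be illustrated with a toy calculation: regress the expression level of one gene on the allele dosage (0, 1 or 2 copies of the alternative allele) at one variant. This is a minimal sketch using ordinary least squares and the Pearson correlation; the function name and the numbers are made up for illustration, and the sketch is not the GTEx analysis pipeline.

```python
from math import sqrt
from statistics import mean


def eqtl_association(dosages, expression):
    """Return (slope, pearson_r) for expression regressed on allele dosage."""
    if len(dosages) != len(expression):
        raise ValueError("one expression value is needed per genotyped donor")
    d_mean, e_mean = mean(dosages), mean(expression)
    sxx = sum((d - d_mean) ** 2 for d in dosages)
    syy = sum((e - e_mean) ** 2 for e in expression)
    sxy = sum((d - d_mean) * (e - e_mean) for d, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or expression")
    slope = sxy / sxx          # change in expression per extra allele copy
    r = sxy / sqrt(sxx * syy)  # strength of the linear association
    return slope, r


# Hypothetical data: six donors, one variant, one gene in one tissue.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, r = eqtl_association(dosages, expression)
print(f"slope={slope:.2f} r={r:.3f}")
```

    A slope far from zero with a strong correlation is the kind of signal that would flag the variant as a candidate eQTL; a real analysis would also correct for covariates and multiple testing.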


    Protein : sequence, function, domain, 3D structure

    UniProt : Protein Sequence Database (EBI, Hinxton, Uk)
    The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.

    SwissProt Protein Sequence Database (SIB, Geneve, Ch)
    The SWISS-PROT Protein Sequence Database is a database of protein sequences produced collaboratively by Amos Bairoch (University of Geneva) and the EMBL Data Library. The data in Swiss-Prot are derived from translations of DNA sequences from the EMBL Nucleotide Sequence Database, adapted from the Protein Identification Resource (PIR) collection, extracted from the literature and directly submitted by researchers. It contains high-quality annotation, is non-redundant, and is cross-referenced to several other databases, notably the EMBL nucleotide sequence database, PROSITE pattern database and PDB. SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc), a minimal level of redundancy and a high level of integration with other databases. Recent developments of the database include: an increase in the number and scope of model organisms; cross-references to seven additional databases; a variety of new documentation files; the creation of TREMBL, an unannotated supplement to SWISS-PROT. This supplement consists of entries in SWISS-PROT-like format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except CDS already included in SWISS-PROT.

    NextProt : Exploring the universe of human proteins (SIB, Geneve, Ch)
    Developed in collaboration between the SIB Swiss Institute of Bioinformatics and Geneva Bioinformatics (GeneBio) SA, neXtProt will be a comprehensive human-centric discovery platform, offering its users a seamless integration of and navigation through protein-related data.

    ENZYME (SIB, Geneve, Ch)
    The ENZYME data bank contains the following data for each type of characterized enzyme for which an EC number has been provided: EC number, Recommended name, Alternative names, Catalytic activity, Cofactors, Pointers to the SWISS-PROT entrie(s) that correspond to the enzyme, Pointers to disease(s) associated with a deficiency of the enzyme.

    INTENZ (EBI, Uk)
    IntEnz (Integrated relational Enzyme database) is a freely available resource focused on enzyme nomenclature. IntEnz is created in collaboration with the Swiss Institute of Bioinformatics (SIB). This collaboration is responsible for the production of the ENZYME resource. IntEnz contains the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) on the nomenclature and classification of enzyme-catalysed reactions.

    PhosphoSitePlus (Danvers, Us)
    PhosphoSitePlus (PSP) is an online systems biology resource providing comprehensive information and tools for the study of protein post-translational modifications (PTMs) including phosphorylation, ubiquitination, acetylation and methylation.

    Prosite : Protein signatures (SIB, Geneve, Ch)
    The PROSITE database consists of a large collection of biologically meaningful signatures that are described as patterns or profiles. Each signature is linked to documentation that provides useful biological information on the protein family, domain or functional site identified by the signature. The PROSITE web page has been redesigned and several tools have been implemented to help the user discover new conserved regions in their own proteins and to visualize domain arrangements. We also introduced the facility to search PDB with a PROSITE entry or a user's pattern and visualize matched positions on 3D structures. The latest version of PROSITE (release 18.17 of November 30, 2003) contains 1676 entries. The database is accessible at http://www.expasy.org/prosite/.
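    Searching a sequence with a user's pattern, as mentioned above, amounts to translating the PROSITE pattern notation into a regular expression. The sketch below handles the common elements of that notation (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for sequence ends). The function name `prosite_to_regex` is hypothetical, the pattern is used purely as an illustration, and rarer constructs (such as `>` inside square brackets) are not handled.

```python
import re

# One pattern element, optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    body = pattern.strip().rstrip(".")  # patterns may end with a period
    pieces = []
    for element in body.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                  # any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        pieces.append(prefix + core + repeat + suffix)
    return "".join(pieces)


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(regex)
```

    The resulting string can be passed to `re.search` or `re.finditer` to locate matching positions in a protein sequence written in one-letter code.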

    InterPro : (Integrated Resource of Protein Domains and Functional Sites) (EBI, Hinxton, Uk)
    Release 1.0 (March 2000) was built from Pfam 5.0, PRINTS 25.0, PROSITE 16 and the current SWISS-PROT + TrEMBL data. This release of InterPro contains 2990 entries, representing 2373 families, 556 domains, 47 repeats and 14 post-translational modification sites encoded by 4884 different regular expressions, profiles, fingerprints and HMMs.
    Interpro is a useful resource for whole genome analysis and has already been used for the proteome analysis of a number of completely sequenced organisms. A preliminary proteome analysis was also produced for the human genome.

    PFAM - Sanger Center (Sanger, Hinxton, Uk)
    Pfam is a large collection of protein families and domains, represented by multiple sequence alignments and profile hidden Markov models (profile-HMMs) covering many common protein domains. It is a semi-automatic protein family database that aims to be comprehensive as well as accurate.

    CDD A Conserved Domain Database and Search Service (NCBI, Bethesda, Us)
    Proteins often contain several modules or domains, each with a distinct evolutionary origin and function. The CD-Search service may be used to identify the conserved domains present in a protein sequence:
    Computational biologists define conserved domains based on recurring sequence patterns or motifs. CDD currently contains domains derived from two popular collections, Smart and Pfam, plus contributions from colleagues at NCBI. The source databases also provide descriptions and links to citations. Since conserved domains correspond to compact structural units, CDs contain links to 3D-structure via Cn3D whenever possible.

    DMDM Domain mapping of disease mutations (DMDM) (Baltimore, Us)
    Domain mapping of disease mutations (DMDM) is a database in which each disease mutation can be displayed by its gene, protein or domain location. DMDM provides a unique domain-level view where all human coding mutations are mapped on the protein domain. To build DMDM, all human proteins were aligned to a database of conserved protein domains using a Hidden Markov Model-based sequence alignment tool (HMMer). The resulting protein-domain alignments were used to provide a domain location for all available human disease mutations and polymorphisms. The number of disease mutations and polymorphisms in each domain position are displayed alongside other relevant functional information (e.g. the binding and catalytic activity of the site and the conservation of that domain location). DMDM's protein domain view highlights molecular relationships among mutations from different diseases that might not be clearly observed with traditional gene-centric visualization tools.

    PRODOM (PRABI, Lyon, Fr)
    ProDom is a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases

    PDB - Protein Database ( San Diego, Us)
    The RCSB PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function, and disease

    PDBSUM (EBI, Hinxton, Uk)
    PDBsum is a pictorial database providing an at-a-glance overview of every macromolecular structure deposited in the Protein Data Bank (PDB). It provides schematic diagrams of the molecules in each structure and of the interactions between them. Entries are accessed by their PDB code, by simple text search, or through the site's browse options.

    IMB (Jena, De)
    The Jena Library of Biological Macromolecules (JenaLib) is aimed at a better dissemination of information on three-dimensional biopolymer structures with an emphasis on visualization and analysis.

    SBKB (Rutgers, Us)
    The Structural Biology Knowledgebase provides the latest research data, resources, and highlights from structural biology and the Protein Structure Initiative.

    AlphaFold PDBe-KB (EMBL, Hinxton, Uk)
    PDBe-KB (Protein Data Bank in Europe - Knowledge Base) is a community-driven resource managed by the PDBe team, collating functional annotations and predictions for structure data in the PDB archive. PDBe-KB is a collaborative effort between PDBe and a diverse group of bioinformatics resources and research teams.
    PDBe-KB contains data contributed by projects such as SIFTS and FunPDBe and aims to place structures from the PDB in their biological context.

    SCOP (Berkeley, Us)
    SCOPe is a database developed at the Berkeley Lab and UC Berkeley that extends SCOP (version 1). SCOPe classifies many structures released since SCOP 1.75 through a combination of automation and manual curation, and corrects some errors, aiming to have the same accuracy as the fully hand-curated SCOP releases. SCOPe also incorporates and updates the Astral database.

    CATH (UCL, London, Uk)
    CATH is a classification of protein structures downloaded from the PDB.

    Human Protein Atlas (Uppsala, Se)
    The human protein atlas shows expression and localization of proteins in a large variety of normal human tissues, cancer cells and cell lines with the aid of immunohistochemistry (IHC) images and immunofluorescence (IF) confocal microscopy images.

    HPRD - Human Protein Reference Database (Johns Hopkins, Baltimore, Us)
    The Human Protein Reference Database represents a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome. All the information in HPRD has been manually extracted from the literature by expert biologists who read, interpret and analyze the published data. HPRD has been created using an object oriented database in Zope, an open source web application server, that provides versatility in query functions and allows data to be displayed dynamically.


    Protein Interaction databases

    STRING ( EMBL)
    STRING : Protein-Protein Interaction Network

    STITCH ( EMBL)
    STITCH : Chemical-Protein Interaction Networks

    DIP ( UCLA, Us)
    The DIP (Database of Interacting Proteins) database lists protein pairs that are known to interact with each other. By interact we mean that two amino acid chains were experimentally identified to bind to each other. The database lists such pairs to aid those studying a particular protein-protein interaction but also those investigating entire regulatory and signaling pathways as well as those studying the organisation and complexity of the protein interaction network at the cellular level.

    IntAct - EBI ( EBI, Hinxton, Uk)
    IntAct provides a freely available, open source database system and analysis tools for protein interaction data. All interactions are derived from literature curation or direct user submissions and are freely available.

    Complex Portal - EBI ( EBI, Hinxton, Uk)
    The Complex Portal is a manually curated, encyclopaedic resource of macromolecular complexes from a number of key model organisms. The majority of complexes are made up of proteins but may also include nucleic acids or small molecules. All data is freely available for search and download. Complexes are defined as an assembly of any two or more proteins and/or nucleic acids that are stable enough in vitro to be reconstituted and have been demonstrated to have a specific molecular function.

    FunCoup ( KTH, Stockholm, Su)
    FunCoup is a statistical framework of data integration for finding functional coupling (FC) between proteins. It transfers information from model organisms (M. musculus, D. melanogaster, C. elegans, S. cerevisiae etc.) via orthologs found by the InParanoid program (Remm et al., 2001). Data from different sources and of various natures, such as contacts of whole proteins and individual domains, mRNA and protein expression, localization in tissues and cellular compartments, miRNA and TF targeting, similar phylogenetic profiles etc., are collected and probabilistically evaluated in a Bayesian network (BN), trained on sets of known FC cases (e.g. KEGG, IntAct, HPRD, or GRID resources) vs. sets of randomly picked protein pairs as background reference.

    BioGRID ( Toronto, Ca)
    Biological General Repository for Interaction Datasets
    BioGRID is an online interaction repository with data compiled through comprehensive curation efforts. Our current index is version 3.1.78 and searches 27,283 publications for 402,127 raw protein and genetic interactions from major model organism species. All interaction data are freely provided through our search index and available via download in a wide variety of standardized formats.


    Ontology - Pathways

    Gene Ontology ( Us)
    The Gene Ontology project is a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases. The project provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data from GO Consortium members, as well as tools to access and process this data.

    QuickGO (EBI, Hinxton, Uk)
    A fast browser for Gene Ontology terms and annotations.

    PRO Protein Ontology
    PRO provides an ontological representation of protein-related entities by explicitly defining them and showing the relationships between them. Each PRO term represents a distinct class of entities (including specific modified forms, orthologous isoforms, and protein complexes) ranging from the taxon-neutral to the taxon-specific (e.g. the entity representing all protein products of the human SMAD2 gene is described in PR:Q15796; one particular human SMAD2 protein form, phosphorylated on the last two serines of a conserved C-terminal SSxS motif is defined by PR:000025934). Current release: 67.0, August 08, 2022.

    Kegg (NCI) Kegg (Kyoto) Kyoto Encyclopedia of Genes and Genomes (Kyoto, Jp)
    KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies (See Release notes for new and updated features).

    BioCarta Pathways ( NCI, Bethesda, Us)

    Reactome ( OICR, Ca, New York, Us, EBI, Uk)
    REACTOME is a free, online, open-source, curated pathway database encompassing many areas of human biology. Information is authored by expert biological researchers, maintained by the Reactome editorial staff and cross-referenced to a wide range of standard biological databases.

    Pathway Commons (Toronto, Dana-Farber)
    Pathway Commons : Access and discover data integrated from public pathway and interactions databases. 5772 Pathways -- 2424055 Interactions -- 22 Databases

    NDEx (Network Data Exchange) (University of California, Us)
    Biomolecular interactions and cellular processes assembled into authoritative human signaling pathways. The NDEx Public Server includes a large number of networks that are marked as PUBLIC and are therefore accessible without signing in to a user account. Public networks can be found, viewed, and queried anonymously using the search bar provided on the NDEx Public Server's landing page.

    Atlas of Cancer Signalling Networks global map (Curie, Paris, Fr)
    ACSN is a pathway database and a web-based environment that contains a collection of interconnected cancer-related signalling network maps. Cell signalling mechanisms are depicted on the maps at the level of biochemical interactions, forming a large network of 4600 reactions covering 1821 proteins and 564 genes and connecting several major cellular processes. The Atlas is a "geographic-like" interactive "world map" of molecular interactions involved in cancer.

    Wiki Pathways
    WikiPathways is an open, public platform dedicated to the curation of biological pathways by and for the scientific community.


    Orthology - Evolution

    OrthoDB The Hierarchical Catalog of Eukaryotic Orthologs (Univ. Geneve, Ch)
    OrthoDB presents a catalog of eukaryotic orthologous protein-coding genes across 48 vertebrates, 33 arthropods, 73 fungi, and 12 basal metazoans. Orthology refers to the last common ancestor of the species under consideration, and thus OrthoDB explicitly delineates orthologs at each radiation along the species phylogeny

    Homologene (NCBI, Bethesda, Us)
    HomoloGene is a system for automated detection of homologs among the annotated genes of several completely sequenced eukaryotic genomes.

    TREEFAM : Tree families database (EBI, Hinxton, Uk)
    TreeFam (Tree families database) is a database of phylogenetic trees of animal genes. It aims to develop a curated resource that gives reliable information about ortholog and paralog assignments, and the evolutionary history of various gene families.

    Gene Sorter (UCSC, Santa Cruz, Us)
    The UCSC Gene Sorter is an excellent resource for exploring gene families and the relationships among genes. This tool displays a table of genes within a selected genome that are related to one another. Several different relationships may be explored: protein-level homology, similarity of gene expression profiles, or genomic proximity. The Gene Sorter supports searches on a variety of terms and phrases, including the gene name, the SwissProt protein name, a GenBank accession, or a word or phrase present in a gene's description. The gene family display is highly configurable, allowing the user to control the order and number of columns, the number of rows, and the genes displayed. The tool provides several output formats, including a simple tab-delimited format that may be imported into a spreadsheet or a relational database.

    InParanoid ( Stockholm, Su)
    InParanoid: Eukaryotic Ortholog Groups


    Gene fusions - Chromosomal rearrangement

    COSMIC ( Sanger Center, Hinxton, Uk)
    All cancers arise as a result of the acquisition of a series of fixed DNA sequence abnormalities, mutations, many of which ultimately confer a growth advantage upon the cells in which they have occurred. There is a vast amount of information available in the published scientific literature about these changes. COSMIC is designed to store and display somatic mutation information and related details and contains information relating to human cancers.

    TCGA Fusion Portal (Jackson Lab, Us)
    Transcript fusion as a result of genomic rearrangement is an important class of somatic alteration, both as a cancer-initiating event and as a molecular therapeutic target for specific tumors. Our Pipeline for RNA sequencing Data Analysis (PRADA) enables us to detect fusion transcripts comprehensively and with high confidence. Based on integrated analysis of paired-end RNA sequencing and DNA copy number data from The Cancer Genome Atlas (TCGA), the Tumor Fusion Gene Data Portal provides a bona fide fusion list across many tumor types.

    Fusion Cancer (Beijing, Cn)
    Next-generation mRNA sequencing (RNA-seq) has long been recognized as an efficient tool in dynamic transcriptome analysis. It can provide not only an increased base coverage, but also a higher sample throughput. It facilitates the search for alternatively spliced transcripts, post-transcriptional modifications, gene fusions, mutations/SNPs and changes in gene expression. Many databases have been set up for fusion gene detection research, such as the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer and ChimerDB. But they are derived either from experiments or transcript sequences, and contain limited records. The huge amount of RNA-seq data produced in the past few years provides abundant resources for fusion gene detection, so the RNA-seq data in the Sequence Read Archive (SRA) at NCBI can be used to look for fusion genes in human cancer genomes. Y. Wang et al. (2015) Diagnostic Pathology, 10, 131.

    FusionGDB (UTH,Us)
    FusionGDB is the Fusion Gene annotation DataBase, aiming to provide a resource or reference for functional annotation of fusion genes (FGs) in cancer for better therapeutic targets. We first collected 48 117 FGs across pan-cancer from three representative fusion gene resources: the improved database of chimeric transcripts and RNA-seq data (ChiTaRS 3.1), an integrative resource for cancer-associated transcript fusions (TumorFusions), and The Cancer Genome Atlas (TCGA) fusions by Gao et al. For these ~48K FGs, we performed functional annotations including gene assessment across pan-cancer fusion genes, open reading frame (ORF) assignment, and protein domain retention search based on multiple isoform gene structure with multiple break points, and finally provided the fusion transcript and amino acid sequences for each break point and gene isoform. For each fusion partner gene, the user can access multiple annotations such as gene summary, assessment scores of each gene in pan-cancer, biological process gene ontologies, functional description, retention information of 39 protein features and protein-protein interaction (PPI), related drugs and diseases through six categories. Among the ~44K FGs with checked ORFs, there were ~10K in-frame and ~11K frame-shift FGs. Of these, we have identified 331, 303, 840, and 667 in-frame FGs retaining kinase domains, DNA-binding domains, oncogene domains, and epifactor domains, respectively, in fusion proteins. Furthermore, we identified 896 and 118 in-frame FGs that did not retain the functional domains of tumor suppressor genes and DNA damage repair genes, respectively. On the other hand, we identified 6863 FGs that retained their functional domains but lost their function due to the frame-shift.

    ChimerDB (Ewha Womans University, Kr)
    Chromosome translocation and gene fusion are frequent events in the human genome and are often the cause of many types of tumor. ChimerDB is designed to be a knowledgebase of fusion transcripts collected from various public resources such as the Sanger CGP, OMIM, PubMed, and Mitelman's database.

    dbCRID (Houston, Us)
    dbCRID is a curated database of human chromosomal rearrangements (CRs) and associated diseases. The current release of dbCRID includes 2,643 individually curated entries of experimentally tested CRs, their associated diseases and/or clinical symptoms, as well as detailed information about the CRs, including the precise locations of the breakpoints, the genes involved, the junction sequences, the experimental techniques applied, and links to the original studies. These data were curated from 1,172 original studies.

    Mitelman Database of Chromosome Aberrations in Cancer
    The information in the Mitelman Database of Chromosome Aberrations in Cancer relates chromosomal aberrations to tumor characteristics, based either on individual cases or associations. All the data have been manually culled from the literature by Felix Mitelman, Bertil Johansson, and Fredrik Mertens.
    Mitelman Cases Full Searcher

    Archer - Quiver Fusion Database (Boulder, Us)
    Quiver is a curated database of known gene fusions involved in Cancer. The database includes internally curated data and entries imported from publicly available sources. Current version: 4.5.

    arrayMap - genomic arrays for copy number profiling in human cancer (UZH-SIB, Zurich, Ch)
    arrayMap is a curated reference database and bioinformatics resource targeting copy number profiling data in human cancer. The arrayMap database provides an entry point for meta-analysis and systems level data integration of high-resolution oncogenomic CNA data. For the majority of the samples, probe level visualization as well as customized data representation facilitate gene level and genome wide data review. Results from multi-case selections can be connected to downstream data analysis and visualization tools, as we provide through our Progenetix project.
    arrayMap is developed by the group "Theoretical Cytogenetics and Oncogenomics" at the Institute of Molecular Life Sciences of the University of Zurich.
    The link to arrayMap is constructed around the definitions of the ICD-O3 topography and ICD-O3 morphology standards. If the tags are not adequate, it is always possible to select other terms.

    1. select all the terms you want in the interface
    2. submit
    3. select options and parameters
    4. analyse

    CONAN : Cell lines Project: Copy Number Analysis ( Sanger Center, Hinxton, Uk)


    Polymorphism : SNP, mutations

    dbSNP Single Nucleotide Polymorphism (NCBI, Bethesda, Us)
    A Database of Single Nucleotide Polymorphisms : A key aspect of research in genetics is associating sequence variations with heritable phenotypes. The most common variations are single nucleotide polymorphisms (SNPs), which occur approximately once every 100 to 300 bases. Because SNPs are expected to facilitate large-scale association genetics studies, there has recently been great interest in SNP discovery and detection.

    HAPMAP (NCBI, Bethesda, Us)
    The International HapMap Project is a partnership of scientists and funding agencies from Canada, China, Japan, Nigeria, the United Kingdom and the United States to develop a public resource that will help researchers find genes associated with human disease and response to pharmaceuticals. See "About the International HapMap Project" for more information.

    Exome Variant server (EVS) ( Washington, Us)
    The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein coding regions of the human genome across diverse, richly-phenotyped populations and to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders.

    gnomAD (Broad Institute, Boston, Us)
    The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community. The data set provided on this website spans 123,136 exome sequences and 15,496 whole-genome sequences from unrelated individuals sequenced as part of various disease-specific and population genetic studies. The gnomAD Principal Investigators and groups that have contributed data to the current release are listed here.

    Varsome (US)
    VarSome is a search engine, aggregator and impact analysis tool for human genetic variation and a community-driven project aiming at sharing global expertise on human variants. It renders and displays a detailed annotation of the queried variant, including multiple notations, predicted pathogenicity status from a variety of tools, genomic context, as well as information from 35+ public databases. It allows users to mark the pathogenicity of variants and to link variants to specific phenotypes, diseases and publications. Finally, it provides an automated pathogenicity assessment consistent with the widely accepted ACMG guidelines. It therefore provides a powerful analysis resource as well as a repository for the accumulated global knowledge of the genomics community. From a technical point of view, it offers a convenient, programmable single-point interface (API) for accessing all its data.

    M-CAP (US)
    Mendelian Clinically Applicable Pathogenicity (M-CAP) Score. M-CAP is the first pathogenicity classifier for rare missense variants in the human genome that is tuned to the high sensitivity required in the clinic (see Table). By combining previous pathogenicity scores (including SIFT, Polyphen-2 and CADD) with novel features and a powerful model, we attain the best classifier at all thresholds, reducing a typical exome/genome rare (<1%) missense variant (VUS) list from 300 to 120, while correctly classifying 95% of known pathogenic variants (that is, mistaking no more than 5% of them as benign).

    Varity (US)
    VARITY (Improved pathogenicity prediction for rare human missense variants) Web Application User Guide. This web application provides: 1) Search and visualize VARITY predictions, features and feature contributions for all possible single nucleotide change missense variants for each of 18,239 human proteins. 2) Download VARITY predictions in one file for all 18,239 proteins.
    NOTE: All VARITY predictions are for research purpose and should be appropriately validated before clinical use

    ICGC ( OICR, Ontario, Ca)
    ICGC Goal: To obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical and societal importance across the globe.

    TCGA Copy Number Portal (Broad Institute, Boston, Us)
    This portal is designed to facilitate the use and understanding of high resolution copy number data amassed from cancer samples in the TCGA. All data in this portal were generated at the Broad Institute TCGA Genome Characterization Center. This portal is modeled after Tumorscape which contains copy number data from non-TCGA projects (Beroukhim et al., 2010).

    CENSUS (Sanger Center, Hinxton, Uk)
    The Cancer Gene Census is an ongoing effort to catalogue those genes for which mutations have been causally implicated in cancer. The original census and analysis was published in Nature Reviews Cancer and supplemental analysis information related to the paper is also available.
    The census is not static but rather is updated regularly, as needed. In particular we are grateful to Felix Mitelman and his colleagues for providing information on more genes involved in uncommon translocations in leukaemias and lymphomas. Currently, more than 1% of all human genes are implicated via mutation in cancer. Of these, approximately 90% have somatic mutations in cancer, 20% bear germline mutations that predispose to cancer and 10% show both somatic and germline mutations.
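
    The three percentages quoted above overlap, which can look inconsistent at first glance (90% + 20% exceeds 100%). A minimal sketch of the inclusion-exclusion arithmetic, using only the rounded figures from the text (the variable names are illustrative, not part of the Census):

```python
# Rounded percentages of Cancer Gene Census genes, as quoted in the text.
somatic = 90    # genes with somatic mutations in cancer (%)
germline = 20   # genes with germline mutations that predispose to cancer (%)
both = 10       # genes showing both somatic and germline mutations (%)

# Inclusion-exclusion: share of genes falling in at least one category.
either = somatic + germline - both

# Genes that belong to exactly one category.
somatic_only = somatic - both
germline_only = germline - both

print(f"somatic only:  {somatic_only}%")
print(f"germline only: {germline_only}%")
print(f"both:          {both}%")
print(f"total covered: {either}%")
```

    With these rounded figures the categories account for every gene in the census, since each gene is implicated through a somatic mutation, a germline mutation, or both.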

    COSMIC (Sanger Center, Hinxton, Uk)
    All cancers arise as a result of the acquisition of a series of fixed DNA sequence abnormalities, mutations, many of which ultimately confer a growth advantage upon the cells in which they have occurred. There is a vast amount of information available in the published scientific literature about these changes. COSMIC is designed to store and display somatic mutation information and related details and contains information relating to human cancers.

    LOVD Leiden Open Variation Database 3.0 (Leiden, Ne)
    LOVD stands for Leiden Open (source) Variation Database. LOVD's purpose: to provide a flexible, freely available tool for gene-centered collection and display of DNA variations. LOVD 3.0 extends this idea to also provide patient-centered data storage and storage of NGS data, even of variants outside of genes. LOVD is open source, released under the GPL license, and is actively being improved; releases currently appear every month.

    BioMuta v2 (George Washington Univ, Washington DC, Us)
    BioMuta v2.0 is a curated single-nucleotide variation (SNV) and disease association database where the variations are mapped to the genome/protein/gene.

    DoCM Database of Curated Mutations (WUSTL, Us)
    DoCM, the Database of Curated Mutations, is a highly curated database of known, disease-causing mutations that provides easily explorable variant lists with direct links to source citations for easy verification.

    CIViC Clinical Interpretations of Variants in Cancer (WUSTL, Us)
    The CIViC database is based on Evidence Items, which reference their parent Variants, Variant Groups, and Genes.

    IntOGen Integrative Onco Genomics (Barcelona, Es)
    Mutational cancer driver database

    NCG Network of Cancer Genes (London, Uk)
    NCG collects 3,347 cancer driver genes from the Cancer Gene Census (CGC), Vogelstein, Science 2013, Saito, Nature 2020 and screenings of cancer tissues, as well as 95 healthy drivers from screenings of non-cancer tissues.

    OncoKB MSK's Precision Oncology Knowledge Base (MSK, NY, USA)
    An FDA-recognized human genetic variant database, powered by the clinical expertise of Memorial Sloan Kettering Cancer Center. When using OncoKB, please cite: Chakravarty et al., JCO PO 2017.

    CancerInterpreter (Barcelona, Es)
    Cancer Genome Interpreter (CGI) is designed to support the identification of tumor alterations that drive the disease and detect those that may be therapeutically actionable. CGI relies on existing knowledge collected from several resources and on computational methods that annotate the alterations in a tumor according to distinct levels of evidence.

    Cancer3D (Sanford Burnham, Ca)
    Cancer3D is a database that unites information on somatic missense mutations from TCGA and CCLE, allowing users to explore two different cancer-related problems at the same time: drug sensitivity/biomarker identification and prediction of cancer drivers. The database is an interface to two novel algorithms, e-Driver and e-Drug, that make use of information about the internal structure of a protein to predict novel cancer drivers or drug biomarkers respectively. Moreover, it maps somatic missense mutations from over 18,000 human proteins to more than 25,000 protein structures from PDB.

    HGMD Human Gene Mutation Database ( Institute of Medical Genetics, Cardiff, Uk)
    Human gene mutation is a highly specific process, and this specificity has important implications for the nature, prevalence and therefore diagnosis of genetic disease. Indeed, the recognition that certain DNA sequences are hypermutable has yielded clues as to the endogenous mutational mechanisms involved and provided insights into the intricacies of the processes of DNA replication and repair (Cooper and Krawczak 1993). In practical terms, a fuller understanding of the mutational process may prove important in molecular diagnostic medicine by contributing to improvements in the design and efficacy of mutation search procedures and strategies in different genetic disorders.
    The Human Gene Mutation Database (HGMD) represents an attempt to collate known (published) gene lesions responsible for human inherited disease. This database, whilst originally established for the study of mutational mechanisms in human genes (Cooper and Krawczak 1993), has now acquired a much broader utility in that it embodies an up-to-date and comprehensive reference source to the spectrum of inherited human gene lesions. Thus, HGMD provides information of practical diagnostic importance to (i) researchers and diagnosticians in human molecular genetics, (ii) physicians interested in a particular inherited condition in a given patient or family, and (iii) genetic counsellors.

    dbVar (Bethesda, NCBI, Us)
    dbVar is NCBI's database of genomic structural variation - it contains insertions, deletions, duplications, inversions, multinucleotide substitutions, mobile element insertions, translocations, and complex chromosomal rearrangements

    DGV -Genomic Variants (Toronto, Ca)
    The objective of the Database of Genomic Variants is to provide a comprehensive summary of structural variation in the human genome. We define structural variation as genomic alterations that involve segments of DNA larger than 1 kb. We now also annotate InDels in the 100 bp-1 kb range. The content of the database represents only structural variation identified in healthy control samples.
    The Database of Genomic Variants provides a useful catalog of control data for studies aiming to correlate genomic variation with phenotypic data. The database is continuously updated with new data from peer reviewed research studies. We always welcome suggestions and comments regarding the database from the research community.
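
    The size thresholds in the DGV definition above can be expressed as a small lookup. This is only an illustrative sketch: the function name and the returned labels are hypothetical, and treating exactly 1 kb as part of the InDel range (because the text says structural variation is "larger than" 1 kb) is an assumption, not a documented DGV rule.

```python
def classify_variant_by_size(length_bp: int) -> str:
    """Label a variant by length, following the thresholds quoted in the text.

    Hypothetical helper for illustration; not part of any DGV tool or API.
    """
    if length_bp <= 0:
        raise ValueError("length_bp must be a positive number of base pairs")
    if length_bp > 1000:
        # "segments of DNA larger than 1 kb"
        return "structural variation"
    if length_bp >= 100:
        # "InDels in the 100 bp-1 kb range" (boundary handling is assumed)
        return "InDel (100 bp-1 kb)"
    return "below the annotated size range"


for size in (50, 100, 1000, 1001, 250_000):
    print(size, "->", classify_variant_by_size(size))
```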

    DECIPHER (Sanger Center, Hinxton, Uk)
    DECIPHER (DatabasE of Genomic variants and Phenotype in Humans Using Ensembl Resources) is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants.
    DECIPHER enhances clinical diagnosis by retrieving information from a variety of bioinformatics resources relevant to the variant found in the patient. The patient's variant is displayed in the context of both normal variation and pathogenic variation reported at that locus thereby facilitating interpretation.

    SNPs3D (UMD, Us)
    SNPs3D is a website which assigns molecular functional effects of non-synonymous SNPs based on structure and sequence analysis.

    Genetic Association Database (NIH, Bethesda, Us)
    The Genetic Association Database is an archive of human genetic association studies of complex diseases and disorders. The goal of this database is to allow the user to rapidly identify medically relevant polymorphism from the large volume of polymorphism and mutational data, in the context of standardized nomenclature.

    Cancer variants portal (SIB, Geneve, Ch)


    Diseases

    OMIM Online Mendelian Inheritance in Man (Johns Hopkins, Baltimore, Us)
    OMIM is a catalog of human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere, and developed for the World Wide Web by NCBI, the National Center for Biotechnology Information. The database contains textual information, pictures, and reference information. It also contains copious links to NCBI's Entrez database of MEDLINE articles and sequence information.

    MedGen ( NCBI, Bethesda, Us)
    MedGen is NCBI's portal to information about human disorders and other phenotypes having a genetic component. MedGen is structured to serve health care professionals, the medical genetics community, and other interested parties by providing centralized access to diverse types of content. For example, because MedGen aggregates the plethora of terms used for particular disorders into a specific concept, it provides a Rosetta stone for stakeholders who may use different names for the same disorder. Maintaining a clearly defined set of concepts and terms for phenotypes is essential to support efforts to characterize genetic variation by its effects on specific phenotypes. The assignment of identifiers for those concepts allows computational access to phenotypic information, an essential requirement for the large-scale analysis of genomic data.

    dbGaP ( NCBI, Bethesda, Us)
    The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans.

    ClinVar ( NCBI, Bethesda, Us)
    ClinVar is designed to provide a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. By so doing, ClinVar facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation. ClinVar collects reports of variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and other supporting data. The alleles described in submissions are mapped to reference sequences, and reported according to the HGVS standard. ClinVar then presents the data for interactive users as well as those wishing to use ClinVar in daily workflows and other local applications. ClinVar works in collaboration with interested organizations to meet the needs of the medical genetics community as efficiently and effectively as possible.

    GTR (The Genetic Testing Registry) ( NIH, Bethesda, Us)
    The Genetic Testing Registry (GTR) provides a central location for voluntary submission of genetic test information by providers. The scope includes the test's purpose, methodology, validity, evidence of the test's usefulness, and laboratory contacts and credentials. The overarching goal of the GTR is to advance the public health and research into the genetic basis of health and disease.

    Open Targets (Hinxton, Uk)
    The Target Validation Platform (www.targetvalidation.org) aims to support researchers in identifying early drug targets faster and with more confidence. The platform integrates data from several public databases and is the result of a collaboration between the Sanger Institute, GlaxoSmithKline (GSK), the European Bioinformatics Institute (EBI) and Biogen.

    HuGE Navigator
    HuGE Navigator provides access to a continuously updated knowledge base in human genome epidemiology, including information on population prevalence of genetic variants, gene-disease associations, gene-gene and gene-environment interactions, and evaluation of genetic tests.
    The Office of Public Health Genomics (OPHG), CDC: The Centers for Disease Control and Prevention (CDC) established the Office of Public Health Genomics (OPHG) in 1997. OPHG aims to integrate genomics into public health research, policy, and programs, which could improve interventions designed to prevent and control the country's leading chronic, infectious, environmental, and occupational diseases.
    OPHG's efforts focus on conducting population-based genomic research, assessing the role of family health history in disease risk and prevention, supporting a systematic process for evaluating genetic tests, translating genomics into public health research and programs, and strengthening capacity for public health genomics in disease prevention programs. (Centers for Disease Control and Prevention (CDC))

    ORPHANET : Database of rare diseases and orphan drugs (INSERM, Paris, Fr)
    This project is the result of a commonly observed fact: rare diseases are difficult for medical practitioners to deal with. This is due to their restricted knowledge of the diseases' natural history, the patient care required, the treatment, and sometimes even of the diseases' existence. Scientific knowledge exists, or at least partial knowledge, but it is scattered. Because of the physical media on which it is communicated, the information is difficult to access for the great majority of physicians, not to mention patients and their families. Only a very small number of doctors specialize in these diseases, and their practices are scarcely known, sometimes even totally unknown, to other practitioners.
    The fields currently covered are:

  • a message-taking service that dispatches users' questions to an expert in the field.

    DisGeNET (Es)
    DisGeNET is a discovery platform containing one of the largest publicly available collections of genes and variants associated with human diseases (Piñero et al., 2016; Piñero et al., 2015). DisGeNET integrates data from expert-curated repositories, GWAS catalogues, animal models and the scientific literature. DisGeNET data are homogeneously annotated with controlled vocabularies and community-driven ontologies. Additionally, several original metrics are provided to assist the prioritization of genotype-phenotype relationships.

    ClinGen : Clinical Genome resource (NIH, Bethesda, Us)
    ClinGen is a National Institutes of Health (NIH)-funded resource dedicated to building an authoritative central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.


    Clinical trials, drugs, therapy

    BioCentury BCIQ (Redwood City, Us)
    BioCentury's BCIQ is unlike any other business intelligence and research tool on the market. It combines over 20 years of BioCentury's leading analysis of the biopharma industry with four easy-to-use, fully integrated modules. The data are fully vetted and meticulously maintained. Quite simply, BCIQ is the most accurate, in-depth tool for your research and business intelligence needs.

    DGIdb : The Drug Gene Interaction Database (WUSTL, Us)
    Search for drug-gene interactions by gene or drug name.

    My Cancer Genome (Vanderbilt, Us)
    My Cancer Genome contains information on the clinical impact of molecular biomarkers in cancer-related genes, proteins, and other biomarker types on the use of anticancer therapies in cancer. This information is derived from FDA labels, NCCN and other professional society guidelines, clinical trials, peer-reviewed publications, and more.

    CTD : Comparative Toxicogenomics Database (NC State Univ, Us)
    The Comparative Toxicogenomics Database (CTD) elucidates molecular mechanisms by which environmental chemicals affect human disease.
    Chemical-gene/protein interactions and chemical- and gene-disease relationships are curated from the published literature, and integrated with diverse data (chemicals, genes/proteins, human diseases, references, sequences, vertebrate and invertebrate organisms, and the Gene Ontology) to facilitate environmental health research.

    PharmGKB (Stanford, Us)
    PharmGKB is a comprehensive resource that curates knowledge about the impact of genetic variation on drug response for researchers and clinicians. We encompass clinical information including dosing guidelines and drug labels, potentially clinically actionable gene-drug associations and genotype-phenotype relationships.

    Genomics of Drug Sensitivity in Cancer (Wellcome Trust) (Sanger, Hinxton, Uk)
    The Genomics of Drug Sensitivity in Cancer project is an academic research program to identify molecular features of cancers that predict response to anti-cancer drugs.

    ClinicalTrials.gov (NIH, Bethesda, Us)
    ClinicalTrials.gov is a registry and results database of federally and privately supported clinical trials conducted in the United States and around the world. ClinicalTrials.gov gives you information about a trial's purpose, who may participate, locations, and phone numbers for more details. This information should be used in conjunction with advice from health care professionals.

    DEPO (Database of Evidence for Precision Oncology) (Washington University, Us)
    DEPO is the Database of Evidence for Precision Oncology, containing druggable variant information such as drug therapy, evidence levels (FDA-approved, Clinical Trials, Case Reports, Preclinical), and the cancer types for intended treatments. The curated variants (drug-sensitive or -resistant) are grouped as follows: copy number variation (CNV), which corresponds to either copy number amplification or loss; gene fusion; expression outlier, which refers to genes whose elevated or reduced expression is associated with drug response; and mutations, which refers to missense, nonsense, in-frame indels, and frameshift mutations.

    Pharos Druggable Genome (IDG) program (NIH, Us)
    Pharos is the user interface to the Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) program funded by the National Institutes of Health (NIH) Common Fund (Grant No. 1U24CA224370-01). The goal of KMC is to develop a comprehensive, integrated knowledge-base for the Druggable Genome (DG) to illuminate the uncharacterized and/or poorly annotated portion of the DG, focusing on three of the most commonly drug-targeted protein families.

    BioGRID ORCS (Toronto)
    The BioGRID Open Repository of CRISPR Screens (ORCS) is an open repository of CRISPR screens compiled through comprehensive curation efforts. Our current index is version 1.1.12 and searches 266 publications and 86,066 genes to return 1,592 CRISPR screens from 4 major model organism species, 745 cell lines, and 127 cell types. All screen data are freely provided through our search index and available via download in a wide variety of standardized formats.


    Miscellaneous
    Resources for Molecular Cytogenetics (Bari, It)
    Collection of PAC and BAC probes useful for specific tumors.


    Bibliography, Data mining

    PubMed (NLM, Us)

    COREMINE (Oslo, No)
    Coremine Medical is a product of the PubGene Company designed to be used by anyone seeking information on health, medicine and biology. It is ideal for those seeking an overview of a complex subject while allowing the possibility to "drill down" to specific details. Search results are presented in a dashboard format composed of panels containing various categories of information ranging from introductory sources to the latest scientific articles.

    EVEX (Text mining) (Turku, Fi)
    EVEX is a text mining resource built on top of PubMed abstracts and PubMed Central full texts. It contains over 40 million bio-molecular events among more than 76 million automatically extracted gene/protein name mentions. The text mining data have further been enriched with gene normalization results, allowing straightforward integration with external resources. In addition, gene families from Ensembl and HomoloGene provide homology-based event generalizations. EVEX presents both direct and indirect associations between genes and proteins, enabling explorative browsing of relevant literature.
    The EVEX website provides summarized information on various bio-molecular events, accounting for lexical variation of gene/protein symbols and dealing with synonymy and abbreviations. Both direct and indirect associations can be retrieved, and homology-based generalizations provide the opportunity to retrieve information on entire gene families.

    Harmonizome (Ma'ayan Laboratory of Computational Systems Biology, Us)
    Search for genes or proteins and their functional terms extracted and organized from over a hundred publicly available resources.

    ARCHS4 (NY, Us)
    ARCHS4: Massive Mining of Publicly Available RNA-seq Data from Human and Mouse
    The Ma'ayan Laboratory applies machine learning and other statistical mining techniques to study how intracellular regulatory systems function as networks to control cellular processes such as differentiation, dedifferentiation, apoptosis and proliferation.
    Our research team develops software systems to help experimental biologists form novel hypotheses from high-throughput data, while aiming to better understand the structure and function of regulatory networks in mammalian cellular and multi-cellular systems.

    Bionity (MSKCC, New-York, Us)
    A network of genes and proteins extends through the scientific literature, touching on phenotypes, pathologies and gene function. We report the development of an information system that provides this network as a natural way of accessing the more than ten million abstracts in PubMed. By using genes and proteins as hyperlinks between sentences and abstracts, we convert the information in PubMed into one navigable resource and bring all the advantages of the internet to scientific literature investigation.
    Moreover, this literature network can be superimposed on experimental interaction data (e.g. yeast two-hybrid data from Drosophila melanogaster and Caenorhabditis elegans) to make possible a simultaneous analysis of new and existing knowledge. The network, called Information Hyperlinked over Proteins (iHOP), contains half a million sentences and 30,000 different genes from humans, mice, D. melanogaster, C. elegans, zebrafish, Arabidopsis thaliana, yeast and Escherichia coli. The iHOP server is publicly accessible here.

    ZODIAC (Evanston, Us)
    Zodiac, like Google, is a search engine: you type queries in the search bar, and Zodiac returns search results.
    Zodiac helps you understand how genes interact in cancer conditions. The networks contained in Zodiac are based on rigorous statistical inference utilizing prior knowledge and TCGA data.
    Zodiac helps discover interacting genes and, consequently, potential drug targets.
    Zodiac identifies potential genetic aberrations such as gene fusions.
    Zodiac is easy to use and allows for real-time look-up of genetic interactions while reading a paper, listening to a seminar, or browsing the internet.

    DataMed Index (Us)
    The DataMed prototype (v3.0) is being developed for the NIH BD2K Data Discovery Index (DDI) by the bioCADDIE project team. Once completed, DataMed will allow the scientific community to search for and find data across different repositories in one space.



    Philippe Dessen
    Last update : Mon Oct 3 11:56:02 CET 2022