Jing Chen Eric Bardes









  • Jing Chen

  • Eric Bardes

  • Bruce Aronow








  • Gene Ontology (GO)







  • The UMLS Metathesaurus contains information about biomedical concepts and terms from many controlled vocabularies and classifications used in patient records, administrative health data, bibliographic and full-text databases, and expert systems.

  • The Semantic Network, through its semantic types, provides a consistent categorization of all concepts represented in the UMLS Metathesaurus. The links between the semantic types provide the structure for the Network and represent important relationships in the biomedical domain.

  • The SPECIALIST Lexicon is an English language lexicon with many biomedical terms, containing syntactic, morphological, and orthographic information for each term or word.



  • Over 1 million biomedical concepts

  • About 5 million concept names from more than 100 controlled vocabularies and classifications (some in multiple languages) used in patient records, administrative health data, bibliographic and full-text databases and expert systems.

  • The Metathesaurus is organized by concept or meaning. Alternate names for the same concept (synonyms, lexical variants, and translations) are linked together.

  • Each Metathesaurus concept has attributes that help to define its meaning, e.g., the semantic type(s) or categories to which it belongs, its position in the hierarchical contexts from various source vocabularies, and, for many concepts, a definition.

  • Customizable: Users can exclude vocabularies that are not relevant for specific purposes or not licensed for use in their institutions. MetamorphoSys, the multi-platform Java install and customization program distributed with the UMLS resources, helps users to generate pre-defined or custom subsets of the Metathesaurus.

  • Uses:

    • linking between different clinical or biomedical vocabularies
    • information retrieval from databases with human assigned subject index terms and from free-text information sources
    • linking patient records to related information in bibliographic, full-text, or factual databases
    • natural language processing and automated indexing research
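
The concept-centred organization and vocabulary subsetting described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the identifiers, names, and vocabulary labels are illustrative placeholders rather than actual Metathesaurus records, and `subset_by_vocabulary` is a hypothetical helper that merely imitates the idea of excluding vocabularies. It is not MetamorphoSys or any UMLS API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConceptName:
    text: str            # one name (synonym, lexical variant, or translation)
    vocabulary: str      # source vocabulary the name comes from (placeholder labels)
    language: str = "ENG"


@dataclass
class Concept:
    concept_id: str                      # placeholder identifier, not a real CUI
    semantic_types: list[str]            # categories the concept belongs to
    names: list[ConceptName] = field(default_factory=list)


def subset_by_vocabulary(concepts, excluded):
    """Drop names from excluded vocabularies; drop concepts left without names."""
    kept = []
    for concept in concepts:
        names = [n for n in concept.names if n.vocabulary not in excluded]
        if names:
            kept.append(Concept(concept.concept_id, concept.semantic_types, names))
    return kept


# Toy data: alternate names for the same meaning are linked under one concept.
concepts = [
    Concept("CONCEPT_1", ["Disease or Syndrome"], [
        ConceptName("heart attack", "VOCAB_A"),
        ConceptName("myocardial infarction", "VOCAB_B"),
        ConceptName("infarto de miocardio", "VOCAB_C", "SPA"),
    ]),
    Concept("CONCEPT_2", ["Gene or Genome"], [
        ConceptName("example gene", "VOCAB_C"),
    ]),
]

# Exclude a vocabulary that is, say, not licensed at the user's institution.
subset = subset_by_vocabulary(concepts, excluded={"VOCAB_C"})
print([c.concept_id for c in subset])
```

Keeping names as records attached to a concept, rather than as separate entries, is what lets a subset drop one vocabulary without losing the link between the remaining synonyms.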













  • Disease candidate gene studies




  • Assumption: genes involved in the same complex disease will have similar functions
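
One simple way to make this assumption operational is to score how much a candidate gene's annotations overlap with those of known disease genes. The sketch below uses the Jaccard index purely as an illustrative stand-in. It is not claimed to be the similarity measure used in this work, and the annotation terms are made-up placeholders.

```python
def jaccard(a, b):
    """Jaccard index of two annotation sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_similarity(candidate_terms, training_terms):
    """Average overlap between a candidate and each training gene's annotations."""
    if not training_terms:
        return 0.0
    total = sum(jaccard(candidate_terms, terms) for terms in training_terms)
    return total / len(training_terms)


# Placeholder annotation sets for two known disease genes (the training set).
training = [
    {"term_1", "term_2", "term_3"},
    {"term_2", "term_3", "term_4"},
]

similar_candidate = {"term_2", "term_3"}   # shares annotations with the training genes
unrelated_candidate = {"term_9"}           # shares nothing

print(mean_similarity(similar_candidate, training))
print(mean_similarity(unrelated_candidate, training))
```

Under the stated assumption, a candidate with the higher score would be ranked ahead of one with the lower score.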
















    • Gene Ontology: GO and NCBI Entrez Gene
    • Mouse Phenotype: MGI (used for the first time for human disease gene prioritization)
    • Pathways: KEGG, BioCarta, BioCyc, Reactome, GenMAPP, MSigDB
    • Domains: UniProt (Pfam, InterPro, etc.)
    • Interactions: NCBI Entrez Gene (BioGRID, Reactome, BIND, HPRD, etc.)
    • PubMed IDs: NCBI Entrez Gene
    • Expression: GEO
    • Cytoband: MSigDB
    • Cis-Elements: MSigDB
    • miRNA Targets: MSigDB



  • Random-gene cross-validation

    • Disease-gene relations from OMIM and GAD databases
    • Training set: disease genes with one gene (“target”) removed
    • Test set: 100 genes = “target” gene + 99 random genes
    • Rank of “target” gene
    • Control: random training sets
    • AUC and Sensitivity/Specificity
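
The procedure listed above can be sketched as a leave-one-out loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `score_fn`, the toy annotations, and the gene labels are hypothetical, ties are counted against the target, and the AUC is summarized as the mean fraction of random genes ranked below the target, which is one common way to compute it when each test set holds a single positive.

```python
import random


def target_rank(scores, target):
    """1-based rank of `target` by descending score; ties count against it."""
    better_or_equal = sum(
        1 for gene, s in scores.items() if gene != target and s >= scores[target]
    )
    return 1 + better_or_equal


def random_gene_cross_validation(disease_genes, background, score_fn,
                                 n_random=99, seed=0):
    """Hold out each disease gene in turn and rank it among random genes."""
    rng = random.Random(seed)
    known = set(disease_genes)
    pool = [g for g in background if g not in known]
    ranks = []
    for target in disease_genes:
        training = [g for g in disease_genes if g != target]   # target removed
        test_set = [target] + rng.sample(pool, n_random)       # 1 + 99 = 100 genes
        scores = {g: score_fn(g, training) for g in test_set}
        ranks.append(target_rank(scores, target))
    return ranks


def auc_from_ranks(ranks, test_size=100):
    """Mean fraction of the non-target test genes ranked below the target."""
    return sum((test_size - r) / (test_size - 1) for r in ranks) / len(ranks)


# Toy data: ten "disease genes" share one annotation term, the rest do not.
annotations = {
    f"G{i}": ({"shared_term"} if i < 10 else {"other_term"}) for i in range(300)
}
disease_genes = [f"G{i}" for i in range(10)]


def toy_score(gene, training):
    """Hypothetical scorer: number of annotation terms shared with training genes."""
    return sum(len(annotations[gene] & annotations[t]) for t in training)


ranks = random_gene_cross_validation(disease_genes, list(annotations), toy_score)
auc = auc_from_ranks(ranks)
print(ranks, auc)
```

The control in the list above would correspond to rerunning the same loop with randomly drawn training sets in place of the true disease genes and comparing the resulting ranks.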



  • Random-gene cross-validation: breast cancer example




  • Random-gene cross-validation result




  • Random-gene cross-validation with only one feature






  • Locus-region cross-validation using different feature sets




  • ToppGene web server (http://toppgene.cchmc.org)

  • For functional enrichment analysis
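
For readers unfamiliar with functional enrichment analysis, a widely used formulation is the hypergeometric test: given a gene list, how surprising is the number of genes carrying a given annotation? The sketch below shows that generic calculation only. The slides do not state which test the server applies, so this should not be read as a description of its internals, and all counts are made-up.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the annotation term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Made-up counts: 10 genes in total, 5 annotated with a term,
# and an input list of 5 genes that are all annotated with it.
p_value = hypergeom_enrichment_p(population=10, annotated=5, sample=5, overlap=5)
print(p_value)
```

In practice the test is repeated for many annotation terms, so the resulting p-values are usually adjusted for multiple testing before being interpreted.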











  • Example: Breast cancer




  • ToppGene web server (http://toppgene.cchmc.org)

  • For candidate gene prioritization







  • Example: Breast cancer study. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature. 2007 May 27.






  • Example: Breast cancer




  • General limitations of any training-test strategy:

  • Reliance on prior knowledge of disease-gene associations.

  • Assumption that disease genes yet to be discovered will be consistent with what is already known about a disease.

  • Dependence on the accuracy and completeness of the functional annotations.

    • Only one-fifth of the known human genes have pathway or phenotype annotations, and the functions of more than 40% of genes are still not defined.







