IBM Leadership Seminars

Clinical Genomic Analysis Workshop 2011
June 2, 2011
Organized by IBM Haifa Research Lab


Estimating Heritability Using Random Effects Models
David Golan, Tel Aviv University

Random effects models have recently been introduced as an approach for analyzing genome-wide association studies (GWAS) that allows the overall heritability of a trait to be estimated without explicitly identifying the genetic loci responsible. Using this approach, Yang et al. (2010) demonstrated that the heritability of height is much higher than the ~10% associated with identified genetic factors. However, Yang et al. relied on a heuristic for performing estimation in this model. We adopt the model framework of Yang et al. (2010) and develop a method for maximum likelihood (ML) estimation in this framework. Our method is based on MCEM (Wei et al., 1990), an expectation-maximization algorithm in which a Markov chain Monte Carlo approach is used in the E-step. We demonstrate that this method leads to more stable and accurate heritability estimation than the approach of Yang et al. (2010). It also allows us to find ML estimates of the proportion of markers that are causal, indicating whether the heritability stems from a small number of powerful genetic factors or a large number of less powerful ones.
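
For orientation, the random effects model of Yang et al. (2010) is usually written as below. The abstract does not spell out its notation, so the symbols here are assumptions: y is the phenotype vector for n individuals, W the n × m matrix of standardized genotypes, u the vector of random SNP effects, and ε the residual. This sketch covers only the basic variance-component model, not the MCEM estimation procedure or the causal-marker extension described in the talk.

```latex
\begin{align}
  y &= \mu \mathbf{1} + W u + \varepsilon,
  & u &\sim \mathcal{N}(0, \sigma_u^2 I_m),
  & \varepsilon &\sim \mathcal{N}(0, \sigma_e^2 I_n), \\
  \operatorname{Var}(y) &= A \sigma_g^2 + I_n \sigma_e^2,
  & A &= \tfrac{1}{m} W W^{\top},
  & \sigma_g^2 &= m \sigma_u^2, \\
  h^2 &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.
\end{align}
```

Under this formulation, heritability is a ratio of variance components, which is why it can be estimated without identifying any individual locus.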

Linkage Analysis in the Presence of Germline Mosaicism
Dan Geiger, Technion – Israel Institute of Technology

Genetic linkage analysis is a widely used statistical method for genetic mapping. This method is successful in mapping genes involved in simple Mendelian diseases, but is less powerful in mapping genes that do not follow simple Mendelian inheritance. Germline mosaicism is a genetic condition in which some germ cells of an individual contain a mutation. We extend the statistical model used for genetic linkage analysis to incorporate germline mosaicism. We develop a likelihood ratio test for detecting whether a genetic trait has been introduced into a pedigree by germline mosaicism. We analyze the statistical properties of this test and demonstrate its effectiveness via computer simulations. We further use this test to provide solid statistical evidence that the MDN syndrome studied by Genzer-Nir et al. originated from germline mosaicism. This work was done jointly by Omer Weissbrod and the speaker.

Generalized Alpha Investing: Definitions, Optimality Results, and Applications to Public Bioinformatics Databases
Ehud Aharoni, IBM Research - Haifa

The increasing prevalence and utility of large, public databases in the field of bioinformatics necessitate the development of appropriate methods for controlling false discovery. Motivated by this problem, we discuss the generic problem of testing a possibly infinite stream of null hypotheses. In this context, Foster and Stine (2007) proposed a false discovery measure they called mFDR, and an approach for controlling it named alpha investing. We generalize alpha investing and use our generalization to derive optimal allocation rules for the case of simple hypotheses. We demonstrate empirically that this approach is more powerful than alpha investing while controlling mFDR.
We then present the concept of quality preserving databases (QPD), originally introduced in Aharoni et al. (2010), which formalizes efficient public database management to simultaneously save costs and control false discovery. We show how one variant of generalized alpha investing can be used to control mFDR in a QPD and lead to a significant reduction in costs compared to the naïve approaches for controlling the family-wise error rate implemented in Aharoni et al. (2010).
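
As background, the original alpha investing procedure of Foster and Stine can be sketched in a few lines of Python. This is a minimal illustration, not the generalized procedure or the optimal allocation rules from the talk. The function name, the default wealth and payout values, and the rule of risking a fixed fraction of the current wealth on each test are all assumptions chosen for illustration.

```python
def alpha_investing(p_values, initial_wealth=0.05, payout=0.05, spend_fraction=0.5):
    """Sketch of basic alpha investing over a stream of p-values.

    Assumed spending rule (hypothetical): risk a fixed fraction of the
    current alpha-wealth on every test.
    """
    wealth = initial_wealth
    rejections = []
    for p in p_values:
        # Amount of wealth lost if this test fails to reject.
        cost = spend_fraction * wealth
        # Test level chosen so that alpha_j / (1 - alpha_j) == cost.
        alpha_j = cost / (1.0 + cost)
        if p <= alpha_j:
            # A rejection (discovery) earns the payout.
            rejections.append(True)
            wealth += payout
        else:
            # A non-rejection costs alpha_j / (1 - alpha_j).
            rejections.append(False)
            wealth -= cost
    return rejections, wealth


if __name__ == "__main__":
    decisions, remaining = alpha_investing([0.001, 0.5, 0.9])
    print(decisions, remaining)
```

Because only a fraction of the wealth is risked at each step, the wealth never reaches zero, so the procedure can keep testing an unbounded stream of hypotheses; each discovery replenishes the budget for later tests.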

Enrichment Statistics for Ranked Lists and Applications in Genomics
Zohar Yakhini, Agilent Laboratories and the Technion

I will describe a statistical approach to assessing the significance of a high density of 1s at either end of a binary vector. This method is used for analyzing the enrichment of elements at the top of ranked lists. The full distribution of this statistic can be characterized through a simple dynamic programming procedure.
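
The abstract does not define the statistic, so the sketch below assumes the one-sided minimum-hypergeometric (mHG) score, a common statistic for enrichment at the top of a ranked list: the smallest hypergeometric tail probability over all prefixes of the vector. The exact p-value is computed by dynamic programming over prefix counts of zeros and ones. Function names are hypothetical, and exact fractions are used for clarity rather than speed, so this is suitable only for short vectors.

```python
from fractions import Fraction
from math import comb


def hg_tail(b, N, B, n):
    """P(X >= b) for X ~ Hypergeometric(N items, B ones, n draws), exactly."""
    hits = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, min(n, B) + 1))
    return Fraction(hits, comb(N, n))


def mhg_statistic(vector):
    """Minimum hypergeometric tail over all prefixes of a 0/1 vector."""
    N, B = len(vector), sum(vector)
    best, b = Fraction(1), 0
    for n, bit in enumerate(vector, start=1):
        b += bit
        best = min(best, hg_tail(b, N, B, n))
    return best


def mhg_pvalue(vector):
    """Exact P(mHG of a random permutation <= observed mHG)."""
    N, B = len(vector), sum(vector)
    W = N - B  # number of zeros
    observed = mhg_statistic(vector)
    # survive[w][b]: probability that a uniformly random vector has a prefix
    # with w zeros and b ones, and no prefix so far scored <= observed.
    survive = [[Fraction(0)] * (B + 1) for _ in range(W + 1)]
    survive[0][0] = Fraction(1)
    for n in range(1, N + 1):
        remaining = N - (n - 1)  # items left before drawing position n
        for b in range(max(0, n - W), min(n, B) + 1):
            w = n - b
            if hg_tail(b, N, B, n) <= observed:
                continue  # this prefix is already as extreme: leave at 0
            prob = Fraction(0)
            if b > 0:  # arrived by drawing a one
                prob += survive[w][b - 1] * Fraction(B - (b - 1), remaining)
            if w > 0:  # arrived by drawing a zero
                prob += survive[w - 1][b] * Fraction(W - (w - 1), remaining)
            survive[w][b] = prob
    return 1 - survive[W][B]


if __name__ == "__main__":
    ranked = [1, 1, 0, 0]
    print(mhg_statistic(ranked), mhg_pvalue(ranked))
```

For the vector `[1, 1, 0, 0]`, only one of the six possible arrangements places both 1s at the top, so the statistic and its exact p-value are both 1/6.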

Useful applications include motif finding, the identification of sequence elements related to DNA methylation, enrichment of GO-derived gene sets (through the web-based application GOrilla), the joint analysis of miRNA and mRNA profiling data, and the study of interactions between miRNA and RBPs (RNA binding proteins). I will discuss examples with emphasis on the biological results. For example, we used the ranked lists approach to perform a joint miRNA and mRNA analysis in a study of a cohort of 100 breast cancer samples and discovered several novel relationships, including a direct association of miR-29 with the extracellular matrix density of the tumors.

Computational Analysis of Gene Regulation, Disease Classification, and Protein Networks
Ron Shamir, Tel Aviv University

Understanding complex disease is one of today's grand challenges. In spite of the rapid advance of biotechnology, disease understanding is still very limited, and further computational tools for disease-related data analysis are sorely needed. In this talk I will describe some of the tools that we are developing to meet these challenges. I will describe methods for utilizing expression profiles of sick and healthy individuals to identify pathways dysregulated in the disease, methods for integrated analysis of microRNA expression and protein interactions in stem cells, and methods for regulatory motif discovery.

Analysis of Complex Population Structure with Applications
Eran Halperin, Tel Aviv University

It is becoming increasingly evident that the analysis of genotype data from populations of complex structure, such as recently admixed populations, provides important insights into human demographic history and disease genetics. Such analyses have been used to find novel genomic regions associated with disease and to understand recombination rate variation and recent selection events. In this talk, I will provide an overview of the methods we developed for the analysis of such populations, and I will illustrate how these methods provide opportunities to identify regions under selection, reconstruct recombination maps, and reconstruct haplotypes of extinct populations.

Uncovering the Human Cell Lineage Tree: The Next Grand Scientific Challenge
Ehud Shapiro, Weizmann Institute of Science

The cell lineage tree of a person captures the history of the person's cells since conception. In computer science terms it is a rooted, labeled binary tree, where the root represents the primary fertilized egg, leaves represent extant cells, internal nodes represent past cell divisions, and vertex labels record cell types. It has approximately 100 trillion leaves and 100 trillion branches (≈100,000 times bigger than the Human genome), and it is unknown.

We should strive to know it, as many central questions in biology and medicine are actually specific questions about the Human cell lineage tree, in health and disease: Which cancer cells initiate relapse after chemotherapy? Which cancer cells can metastasize? Do insulin-producing beta cells renew in healthy adults? Do eggs renew in adult females? Which cells renew in healthy and in unhealthy adult brain? Knowing the Human cell lineage tree would answer all these questions and more.

Fortunately, our cell lineage tree is implicitly encoded in our cells' genomes via mutations that accumulate when body cells divide. Theoretically, it could be reconstructed with high precision by sequencing every cell in our body, at a prohibitive cost. Practically, analyzing only highly-mutable fragments of the genome is sufficient for cell lineage reconstruction. Our lab has developed a proof-of-concept method and system for cell lineage analysis from somatic mutations. The talk will describe the system and results obtained with it so far, and future plans for this project.