Minimally-Invasive Radiation Biodosimetry
Informatics and Biostatistics Core
Core Leader: Michael Bittner, Translational Genomics Research Institute
All three projects in this consortium make use of advanced biostatistics, informatics and data management techniques, both for data analysis and for tracking of results obtained with samples shared across projects. This Core provides a variety of statistical and analytical support to all three Projects as well as to the Irradiation Core and Pilot Projects. This Core also provides central, secure hosting for data exchange among all Consortium members.
In addition to the more specific analyses of high-throughput biomolecule measurements described below, the Core provides general statistical support for experimental design and mathematical data analysis.
Direct analytic efforts will center on assessing the molecular functions involved in responses to general and local radiation, and the impact of the types and levels of these functions on radiation sensitivity in mice and humans (Projects 2-3). This work will build on the feature selection and classification approaches already shown to yield highly reliable and informative lymphocyte transcriptional markers in Project 2, extending them to develop transcriptional and small molecule markers of the severity of radiation damage. It will also provide specialized statistical support for efforts to develop biodosimeters based on gene expression in lymphocytes and small molecule production in body fluids.
For biomarkers to be useful as biodosimeters, it is important to determine how accurately and effectively the marker measurements allow classification of irradiated individuals on the basis of the radiation doses they have sustained. To do this, the Consortium needs to rapidly find those markers that not only separate the classes but also do so with the greatest resistance to inter-individual variation and assay noise. The Consortium will also benefit from a clear understanding of the ways in which the fundamental mechanisms of cellular response to radiation give rise to the measurable outcomes in the various assays proposed. In concert with Projects 2 and 3, this Core will analyze functional genomic and metabolomic measurements from the responding cells in most of these assays. This allows assessment of the common and differential aspects of the transcriptomic/metabolomic response in the various systems, and will help us discern where the measurements provide complementary or redundant information.
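One way to picture the requirement that a marker separate dose classes while resisting inter-individual variation is a simple signal-to-noise style score. The sketch below is illustrative only: the score (difference of class means divided by the summed class standard deviations), the function names, and the marker values are assumptions made for this example, not the statistic or data used by the Core.

```python
import statistics


def separation_score(low_dose, high_dose):
    """Gap between class means relative to the within-class spread.

    Assumed illustrative score: a large value means the two dose classes
    are far apart compared with the variation between individuals.
    """
    gap = abs(statistics.mean(high_dose) - statistics.mean(low_dose))
    spread = statistics.stdev(low_dose) + statistics.stdev(high_dose)
    if spread == 0:
        # No within-class variation at all: perfect separation or no signal.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_markers(measurements):
    """Order marker names from most to least robustly separating.

    measurements: {marker_name: (low_dose_values, high_dose_values)}
    """
    scores = {
        name: separation_score(low, high)
        for name, (low, high) in measurements.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical measurements for two markers in two dose classes.
example = {
    "marker_A": ([1.0, 1.1, 0.9], [3.0, 3.1, 2.9]),  # tight, well separated
    "marker_B": ([1.0, 2.5, 0.2], [3.0, 1.0, 4.5]),  # noisy, overlapping
}
ranking = rank_markers(example)
print(ranking)
```

In this made-up example both markers shift with dose, but marker_A ranks first because its values vary little between individuals within each class.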
The overall functions of this Core are to:
Provide general statistical support for the three research projects, Irradiation Core, and the pilot projects, in terms of experimental design and data analysis.
Apply data viewing and analytical distribution testing to identify univariate trends among potential genomic or metabolomic biomarkers in cells as they react to radiation, in order to find independently informative radiation damage biomarkers that can be used to gauge the severity of damage for a given individual exposed to a particular dose and type of radiation.
Apply multivariate analysis to identify groups of genomic or metabolomic biomarkers that act collaboratively to carry out the cellular response to radiation.
Apply contextual analysis to develop genomic or metabolomic biomarker panels that are specific for particular cell types, dose scenarios, or population subgroups.
Provide a common, secure data hosting facility so that the considerable volume of data generated from shared biological materials can be readily exchanged.
We have developed the ExPattern (ExP) software to analyze genomic and metabolomic data in conjunction with clinical data such as radiation responsiveness, patient survival and patient reactivity to a stimulus. It is based on a novel algorithm, cellular context mining, that identifies molecularly homogeneous patterns with strong clinical association. The program is written in Java to support multiplatform use, including Windows, Unix/Linux and Mac OS. Currently, it can handle data sets with more than 20,000 molecular and clinical markers, sorting through complicated molecular patterns to identify those that are statistically significant in terms of both clinical association and gene pattern. Analytical results are presented to users as a list of molecular patterns, the relations among those patterns in graphical format, and the regulatory relations among molecular and clinical markers.
This approach to complex data analysis is based on a model of the decision making process of cells. When biological regulation is altered, either normally or in a way that results in pathology, the logic of the underlying regulatory system drives cascades of consequences. For example, exposure to ionizing radiation, the mutational activation of the ras oncogene, or reduction in p53 activity due to mdm2 gene amplification all produce or allow particular changes in the levels of transcripts, proteins and signaling-related protein modification in each of the samples where they are present. The set of alterations in genes and proteins influenced by the new regulatory setting provides a pattern that can be recognized and exploited using tools that discern these patterns.
The analysis proceeds in three steps, as illustrated in Figure 1. First, all data are translated into either discrete-valued or categorical variables, which allows all the variables to be analyzed in the same fashion. For simplicity, the figure shows all data as being in one of two states or categories; however, any small number of states would be suitable, for instance a range of exposure doses of ionizing radiation. Next, the matrix of samples and feature (gene) behavior for those samples is examined to find sets of samples and features where each feature behaves in a very homogeneous fashion. The probability of a given feature displaying the observed homogeneity over a fixed set of samples of any size can be estimated as a hypergeometric distribution likelihood, and improbable distributions can be aligned to find sample/feature subsets where there are blocks of homogeneous behavior. The improbability of the observed block is estimated with an ad hoc statistic: the frequency with which homogeneous blocks of x genes and y samples are observed in simulations that randomly produce the same distribution of feature behavior. Coherent behavior of multiple features over multiple samples becomes improbable very quickly. The power of considering the overall improbability of many features behaving homogeneously over a number of samples makes this method quite sensitive to such indications of collaborative action. Behavior that is present in only a small fraction of the samples surveyed, and that is thus too dilute to be detected by more conventional approaches such as linear discriminant analysis, can be easily recognized.
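The steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ExPattern implementation: the function names, the two-state threshold, and the simplification of testing one fixed sample subset (rather than searching over all candidate blocks) are choices made for this example.

```python
import math
import random


def discretize(values, threshold):
    """Map continuous measurements to two states (0 = low, 1 = high).

    The single threshold is an assumption of this sketch.
    """
    return [1 if v >= threshold else 0 for v in values]


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric variable.

    `population` samples in total, `successes` of them in the state of
    interest, `draws` samples in the chosen subset.
    """
    total = math.comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        math.comb(successes, k) * math.comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


def count_homogeneous(matrix, sample_idx):
    """Number of features (rows) showing one single state over the subset."""
    return sum(1 for row in matrix if len({row[j] for j in sample_idx}) == 1)


def block_p_value(matrix, sample_idx, n_sim=1000, seed=0):
    """Empirical chance of seeing at least as many homogeneous features.

    Each simulation shuffles every feature across samples independently,
    so each feature keeps its own distribution of states.
    """
    rng = random.Random(seed)
    observed = count_homogeneous(matrix, sample_idx)
    hits = 0
    for _ in range(n_sim):
        shuffled = [rng.sample(row, len(row)) for row in matrix]
        if count_homogeneous(shuffled, sample_idx) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_sim + 1)


# Hypothetical case: 20 samples, 8 in the "high" state, and a subset of
# 6 samples that are all "high" for this feature.
p_single = hypergeom_tail(20, 8, 6, 6)
print(p_single)
```

For a single feature the hypergeometric tail gives the chance of the observed homogeneity directly. The shuffling-based estimate covers several features at once, where the text notes that coherent behavior becomes improbable very quickly.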
As an example, in our analysis of a set of 31 melanomas, a relatively large block of samples showing differential behavior is discovered by univariate analysis (Figure 2), while a smaller block of samples with more genes involved is recognized by multivariate contextual analysis (Figure 3). This method is also able to detect the kinds of gene behavior common in many biological cascades. For instance, it is common for the same gene to be regulated by different upstream influences in different samples or different individuals. In the case of genotoxic signaling, there are many well-characterized examples of this tendency: over a thousand published articles examine responses that can be mediated either through p53 or, in the absence of functional p53, through other regulatory mechanisms.
While the mathematics behind the context analysis model is somewhat complex, and the computations required to fully explore the combinatorial space involved are quite extensive, the meaning of the observed blocks of coherent behavior in samples is extremely intuitive from both the biological and decision process perspectives. Any set of normal, differentiated tissues exhibits many such blocks that partition the samples according to the consequences of the different regulatory settings in play in each tissue type, making this analysis approach well suited to the identification of radiation dose-related gene expression signatures in the peripheral blood.
The program has been expanded in ways that improve presentation to the researcher, allow more explicit searches for specific types of patterns, and allow more direct connection to databases where functional characterization of the genes is available. To help the investigator interpret the results, the program can connect to various public biological data repositories such as NCBI/Entrez and PubMed. It also supports connections to a popular Gene Ontology mining tool, GoMiner, to help identify subsets of genes within a given pattern that can be associated with particular cellular functions or locations.
Translational Genomics Research Institute, Phoenix, AZ
University of Bern, Bern, Switzerland
website updated 07/27/2011