DOI: 10.1147/rd.506.0601
Multiscale biosystems integration: Coupling intracellular network analysis with tissue-patterning simulations
by S. M. Peirce, T. C. Skalak, and J. A. Papin
The last several years have produced an explosion of biological data. The complete genome sequences of hundreds of organisms have now been published [1], and the activities of these genomes are being characterized with gene microarray technologies at a dizzying pace [2]. Furthermore, protein–protein interaction experiments, fluorescence microscopy, and mass spectrometry are generating data on how the corresponding proteins interact with one another. However, the efficient generation of large amounts of data (i.e., “high-throughput” data) alone does not drive further biological discovery. Rather, it is the systematic integration and interrogation of the data that often facilitates the discovery process [3]. Thus, there is an ever-growing need to develop quantitative, computational frameworks for analyzing the properties of these biological systems [4].
The elucidation of glycolytic reaction mechanisms (energy-generating metabolic processes) ushered in decades of work to characterize entire metabolic networks [5]. Similarly, the characterization of the lac operon genes [6] led to the characterization of regulatory networks [7]. The discovery of phosphorylated tyrosine residues and the functioning of G-protein-coupled receptors has resulted in an entire field of cellular signal transduction in which current efforts are generating extensive “wiring diagrams” that indicate relationships between components in signaling networks [8]. Quantitative reconstructions of some networks now exist [9]. While the high-throughput data on which these reconstructions are based depends on population averages, experimental systems are being created that characterize component interactions on a single-cell level [10].
Current biological research at the multicell level, or tissue level, has also emerged from quantitative approaches. The progression in scientific knowledge of cardiovascular physiology is an excellent example. One of the first quantitative hypotheses in this field related fluid flux in the circulation to the oncotic (swelling) and hydrostatic pressures that develop in tissues [11]. Later, a mathematical model of radial oxygen diffusion from capillaries into surrounding muscle was developed that predicted the maximum distance that oxygen can diffuse into the tissue and provided a theoretical framework for understanding capillary bed architectures in tissue [12]. More recently, continuum models have been used to describe vascular wall remodeling [13], and finite element models have suggested and validated new mechanisms of microvascular network function, structure, and growth [14, 15].
An ongoing challenge requires researchers to relate biological information across temporal and spatial scales, and computational approaches have proven beneficial in advancing scientific understanding. Two examples illustrate the multiscale modeling of biological processes. First, the pioneering work of Hodgkin and Huxley connected ionic current flow with neural function [16], and the complex interplay between collections of neurons and brain behavior is now the subject of further mathematical analysis [17]. Second, many molecular components of the immune system have been characterized and subjected to extensive computational modeling and analysis [18]. These examples and others require researchers to investigate components (e.g., ion channels) of targeted cellular and tissue function. One example of tissue function is signal propagation in the form of a cardiac action potential that mediates the beating of the heart.
However, in order to connect genomic information with tissue function, there is a need to associate properties of genome-scale intracellular networks with tissue-level patterning and behavior (Figure 1). (The term intracellular networks refers to systems of biochemical, biomechanical, and bioelectrical signals inside a cell that are transmitted by cellular constituents such as DNA, RNA, and protein.) This review presents recent efforts to make this connection using genome-scale network-analysis techniques and agent-based modeling approaches. Note that multiscale techniques based on continuum models and sets of differential equations are currently under development. First, recent work is presented on reconstructing cellular biochemical networks that integrate genomic, transcriptomic, and proteomic data in a mathematical formalism that can be used to generate quantitative descriptions of network properties. The subsequent section discusses efforts to generate agent-based models of multicellular development in vascular and embryonic tissue—models that have been extensively validated with in vivo experimentation. Finally, we outline the challenges and promises of integrating these disparate length and time scales in a quantitative framework for advancing medical care in diseases as diverse as cancer and cardiovascular pathology. These multiscale efforts attempt to build a conceptual and functional linkage between intracellular and extracellular processes for the prediction and understanding of tissue patterns and adaptations to environmental stimuli.
Figure 1
The systems analysis of an intracellular network consists of two steps. The first step is the network reconstruction of the relevant chemical compounds and reactions. The second step is the analysis of this reconstructed network using computational techniques. These two steps are highly interconnected, and each process of sequentially rebuilding and analyzing the network can generate hypotheses for further interrogation.
The reconstruction process involves the integration of various high-throughput experimental data, and each dataset provides only one perspective on intracellular mechanisms and activities [8]. For example, expression array data can indicate which genes are on or off at a given time point, but cannot indicate which protein products interact with one another. A reconstruction provides a framework for representing the collection of intracellular activities. Thus, the reconstruction of the gene network provides a concise collection of existing hypotheses for a given system.
There are three principal types of networks for which reconstructions exist. Metabolic networks consist primarily of reactions that convert substrates (e.g., glucose) into products for biosynthetic and energy demands [e.g., amino acids and ATP (adenosine triphosphate)]. Regulatory networks comprise the set of relationships between proteins and the genes they regulate. Signaling networks consist of the interactions between proteins and metabolites that together transduce extracellular signals into intracellular events. Although the interconnectivity of these three networks is becoming increasingly clear [19], they are typically segregated in standard textbook descriptions.
Process of reconstruction
Each network involves a set of chemical transformations that can be represented as a set of stoichiometric reactions (i.e., reactions that are defined by a mass balance of reactants and products) in a matrix formalism (see Figure 2). The rows and columns of the stoichiometric matrix correspond respectively to compounds and their associated reactions. The elements of the matrix correspond to the stoichiometric coefficients in the associated reactions.
Figure 2
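The matrix representation described above can be illustrated with a small sketch. The four-reaction network, the compound names, and the helper functions below are hypothetical and chosen only for illustration; they are not taken from any published reconstruction.

```python
# Toy network (hypothetical, for illustration only):
#   R1:   -> A      (uptake)
#   R2: A -> B
#   R3: B -> 2 C
#   R4: C ->        (secretion)
compounds = ["A", "B", "C"]
reactions = ["R1", "R2", "R3", "R4"]

# Rows are compounds, columns are reactions.
# Negative coefficient = consumed, positive coefficient = produced.
S = [
    [1, -1,  0,  0],   # A
    [0,  1, -1,  0],   # B
    [0,  0,  2, -1],   # C
]

def net_production(S, fluxes):
    """Return S.v, the net production rate of each compound for flux vector v."""
    return [sum(coeff * v for coeff, v in zip(row, fluxes)) for row in S]

def is_mass_balanced(S, fluxes, tol=1e-9):
    """True when no compound accumulates or is depleted (S.v = 0)."""
    return all(abs(rate) <= tol for rate in net_production(S, fluxes))

print(is_mass_balanced(S, [1.0, 1.0, 1.0, 2.0]))  # R4 carries twice the flux of R3
print(is_mass_balanced(S, [1.0, 1.0, 1.0, 1.0]))  # C would accumulate
```

Each column lists the stoichiometric coefficients of one reaction, so multiplying the matrix by a vector of reaction fluxes gives the net rate of change of every compound.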
The first step in network reconstruction is to collect available high-throughput data for the system of interest. For example, the annotated genome sequence helps to identify which enzymatic reactions are available to a metabolic network. Expression arrays can indicate which genes are active in the cell type or system to be reconstructed. The reconstructions should be cell-type- or organism-specific where possible; otherwise, the subsequent analysis may not generate predictions that can be experimentally verified. Furthermore, the environmental conditions in which the data were generated should be well defined. For example, growth experiments in media with unknown constituents are difficult to incorporate in network reconstructions. It is difficult to analyze and generate quantitative predictions if the cell is exposed to an unknown variety of nutrients and growth factors.
The next step involves scanning the annotated genome and incorporating the associated stoichiometric reactions in a database or spreadsheet format. The use of the genome ensures that all reactions and activities can be accounted for as much as possible. In other words, by considering the entire genome in the analysis, researchers are taking into consideration all of the molecular components that could contribute to the system behavior. The outcome of this step is a well “curated” file with a list of the reactions occurring in the given system. This reconstruction then becomes the basis for further analysis. Significantly, the reconstruction itself is of tremendous interest, independent of the analyses that are performed on it. First, it provides a concise format for identifying what is present or absent in a given organism and can serve as a structured framework for refining annotations. Additionally, the stoichiometric reconstruction forces the researcher to ask questions that might otherwise be neglected. For example, in the reconstruction of a signaling network, we may identify genes corresponding to given receptor and adaptor proteins for a given signaling pathway. However, to generate a stoichiometric reconstruction, we may need to include information from mass spectrometry or yeast two-hybrid experiments to ascertain whether the given proteins are in monomeric or dimeric forms. The spreadsheet or database file generated for the reconstruction organizes the information that would otherwise be restricted to an individual researcher's knowledge base or to an unsorted collection of the literature. Stoichiometric reconstructions have produced a valuable new understanding of metabolic processes [20]. However, even without a stoichiometric level of detail for a given biological system, the process of reconstruction described above still generates immensely informative results [8].
Status of existing network reconstructions
The genome-scale metabolic networks of several microbial species have been reconstructed [9]. Certainly the most extensively reconstructed metabolic network is that of Escherichia coli [21]. To date, this reconstruction accounts for more than 1,000 open reading frames (i.e., DNA sequences that encode part or all of a protein) of the predicted 4,311 open reading frames in the E. coli genome. Additional metabolic systems have also been reconstructed; for example, glycolysis in the pathogen Trypanosoma brucei was reconstructed in order to understand its biochemical regulation in the bloodstream [22]. The regulatory network of E. coli was also reconstructed and used to predict the changes in gene expression and the resultant growth phenotypes under a variety of “knockouts” and environmental conditions [23]. (The term knockouts refers to experimental perturbations in which a gene has been functionally “knocked out” or effectively deleted through genetic manipulations.) The regulatory networks associated with developmental processes in sea urchins and other organisms were reconstructed and analyzed [7]. Perhaps the signaling system for which the most extensive reconstructions exist is associated with the epidermal growth factor receptor [24, 25]. Recently, the stoichiometric reconstruction of the JAK–STAT signaling network in the human B-cell was used to quantify structural and topological properties of the network [26]. (JAK stands for Janus Kinase; STAT stands for Signal Transducers and Activators of Transcription. The JAK–STAT pathway refers to a family of signaling pathways that are important for immune response.)
Challenges to network reconstruction
The primary challenge to current reconstruction efforts is a lack of data for given biological systems, as well as the significant amount of effort required to generate and curate an organism-specific stoichiometric matrix. For example, a large-scale reconstruction is difficult to initiate without a sequenced genome, or without a knowledge of the genes that are active in a particular differentiated cell. Without the data describing the intracellular network, the network can be reconstructed with limited confidence. However, the low-confidence reconstruction is still a powerful tool, because it provides a starting point for making predictions and focusing research efforts to characterize the less-characterized components and interactions. The laborious process of reconstruction and testing may also be improved with automated efforts to integrate high-throughput data into a mathematical representation. Some efforts at automated network generation are emerging [27].
The analysis of a given network is intimately connected to the reconstruction process. Computational analysis can generate predictions regarding growth rates of cells, the effect of gene knockouts, and the secretion rate of byproducts of biochemical reactions. These predictions can then be used to evaluate whether a given reaction should be included in the reconstruction. Thus, the analysis of a given network can lead to refinements of the network reconstruction itself. One example, the identification of the gene corresponding to malate dehydrogenase in Helicobacter pylori, a bacterium that causes gastric cancer, is described elsewhere [28].
Analysis techniques
Three general categories of analysis techniques have been applied to genome-scale reconstructions. Although the detailed implementation of these methods is beyond the scope of this review, the general approaches are described below.
The first analysis category involves approaches that identify single states (e.g., reaction flux distributions) of the cell subject to a given criterion, such as a criterion relating to the maximum growth rate of a cell. These approaches address, in part, the challenge of accounting for uncertainty in model parameters and experimental data by mathematically stating bounds that “confine” a network (e.g., thermodynamic constraints) and then identifying an “objective” that the cell is optimized to achieve. An example objective is a flux distribution that results in the maximum growth rate possible. These techniques include flux-balance analysis (FBA) [29], minimization of metabolic adjustment (MOMA) [30], and energy balance analysis (EBA) [31]. The input to each of these approaches is the stoichiometric matrix for a given system, a set of constraints on how the reactions are used (e.g., maximum substrate uptake rates), and an objective function (e.g., maximum growth rate). The output of each of these approaches is a distribution of flux values for all of the reactions that effectively provides a “state” of the given network. Because the calculation of these distributions is typically performed with linear programming (a mathematical technique for finding the maximum value of a given function subject to a set of mathematical criteria [32]), the calculation can be performed efficiently for relatively large biochemical networks.
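For FBA, the inputs and output listed above can be written as a single optimization problem. The notation here is introduced for illustration and is not taken from the cited references: S is the stoichiometric matrix (compounds by reactions), v is the vector of reaction fluxes, c selects the objective (for example, a 1 in the position of a growth reaction and 0 elsewhere), and v^min and v^max are the bounds on each flux. The equality S v = 0 is the usual steady-state mass-balance assumption.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for each reaction } j .
\end{aligned}
```

Both the objective and the constraints are linear in v, which is what allows the problem to be handed to a linear programming solver.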
The second type of analysis techniques involves network-based pathways, and these techniques have been applied at a genome scale. These approaches involve the calculation of “mass-balanced” pathways through a network (in which stoichiometric relationships are not violated). Network-based pathways have been calculated for metabolic networks [33] and have led to quantitative descriptions of several network properties (described below). More recently, this network-based pathway approach has been applied to a cellular signaling network [26].
The final analysis category includes Monte Carlo sampling procedures. Typically, the values for the reaction fluxes are varied, subject to upper and lower bounds of the flux through a reaction, and network properties are subsequently characterized. For example, researchers have implemented a sampling algorithm to characterize the distribution of fluxes in the metabolic network of the bacterium Escherichia coli [34]. The distribution of reaction fluxes was found to be highly uneven: a small number of reactions dominated a significant portion of the network activity. Random sampling was also used to calculate the dynamics of large-scale reaction networks [35].
Systems-level properties
Each of these analysis techniques can be used to generate systems-level descriptions of key cellular phenotypes and network behaviors. We provide two examples below.
Correlated reaction sets are groups of reactions that function together. They have been defined as unbiased modules (groups of reactions or components with a common function) in biochemical networks [36]. Researchers have identified these reaction sets in the JAK–STAT cellular signaling network in the human B-cell [26], as well as in the metabolic networks of a variety of microorganisms [37–39]. Because correlated reaction sets are groups of reactions that are functionally related, they provide information for hypotheses for regulatory structure. Indeed, expression array data indicates that the genes corresponding to these groups of reactions are coordinately regulated (see [36] for a description).
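One way to see how such sets can emerge is to sample feasible flux distributions at random and group reactions whose fluxes are perfectly correlated across the samples. The sketch below does this for a hypothetical five-reaction network (R1: uptake of A; R2: A to B; R3: secretion of B; R4: A to C; R5: secretion of C). The network, the bounds, the correlation threshold, and all function names are assumptions made for illustration; this is not the procedure of any cited study.

```python
import random
from itertools import combinations

REACTIONS = ["R1", "R2", "R3", "R4", "R5"]
UPTAKE_MAX = 15.0   # assumed upper bound on the uptake flux R1
BRANCH_MAX = 10.0   # assumed upper bound on the branch fluxes R2 and R4

def sample_fluxes(n, seed=0):
    """Draw n mass-balanced flux vectors by rejection sampling."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        v2 = rng.uniform(0.0, BRANCH_MAX)
        v4 = rng.uniform(0.0, BRANCH_MAX)
        v1 = v2 + v4                 # mass balance on A
        if v1 > UPTAKE_MAX:          # reject samples that violate the uptake bound
            continue
        # v3 = v2 (mass balance on B), v5 = v4 (mass balance on C)
        samples.append([v1, v2, v2, v4, v4])
    return samples

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def correlated_sets(samples, threshold=0.999):
    """Group reactions whose sampled fluxes are (nearly) perfectly correlated."""
    columns = list(zip(*samples))
    group_of = {i: {i} for i in range(len(columns))}
    for i, j in combinations(range(len(columns)), 2):
        if abs(pearson(columns[i], columns[j])) >= threshold:
            merged = group_of[i] | group_of[j]
            for k in merged:
                group_of[k] = merged
    unique = {frozenset(g) for g in group_of.values() if len(g) > 1}
    return sorted(sorted(REACTIONS[k] for k in g) for g in unique)

print(correlated_sets(sample_fluxes(500)))
```

In this toy network the mass balances force R2 and R3 to carry identical flux, and likewise R4 and R5, so those pairs are grouped. R1 depends on both branches and therefore joins neither set.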
The term pathway redundancy refers to the number of independent routes by which a given set of inputs into a network is converted into a given set of outputs out of the network. This system property was quantified with extreme pathway analysis, a computational tool to study biochemical networks [40]. For the bacterium Haemophilus influenzae, an average of 46 independent routes exist to convert minimal nutritional media into amino acids and associated byproducts [40]. For the bacterium H. pylori, under relatively similar conditions, an average of two independent routes exist [41]. Although the relative sizes of the genomes and associated metabolic network reconstructions are similar for both organisms, a significantly higher degree of pathway redundancy exists in H. influenzae. This characteristic may be indicative of the ecological niches in which the two organisms reside. For example, H. pylori is located primarily in the highly acidic gastric lining; H. influenzae can be found in environments that undergo wider variability. Thus, perhaps the metabolic network of H. pylori is “fine-tuned” to its very specific ecological niche.
Biological tissue is a composite of cellular and acellular material arranged in structures and substructures whose organization is often characterized by a repetition in patterns. The regularity of tissue patterns persists as long as the tissue is in a dynamically stable or a quiescent state. However, the structural and functional properties of tissues can be altered during physiological growth and pathological events, and often these events lead to changes in the arrangement of cells and acellular material, giving rise to new tissue patterns.
Biological tissues are considered “complex systems” because of their multifaceted composition and the interrelated spatial and temporal dynamics that characterize the way in which tissues adapt (e.g., alter their structure and function) to external stimuli. For example, in humans, tissues adapt to changing environmental conditions, changes in lifestyle, or the onset of disease. Understanding tissue patterning processes, particularly processes that arise in response to disease, is central to many areas of medicine, and advances in this area can contribute to the engineering of artificial tissues; the optimization of pharmacological therapies for heart disease, diabetes, and cancer; decreasing the incidence of birth defects; and more informed use of medical diagnostic techniques such as magnetic resonance imaging and ultrasound. Reaching the ultimate goal of understanding the basic science of tissue patterning, and being able to manipulate relevant processes in disease detection, treatment, and resolution, requires the following elements: 1) identification and characterization of the molecular, cellular, and acellular components of the tissue; 2) conceptualization of the way the components interact with one another; 3) spatial and temporal integration of cellular behaviors with the molecular signals that drive them; and 4) simulation with quantitative computation of these events. We believe that useful advances can be made only with the help of computational techniques that effectively consider the biological complexities across multiple length scales—from molecules to tissue-level behaviors.
Agent-based models (defined in the following section) have recently been used to study an array of different biological processes spanning molecular to organism to population levels of detail, and include research into DNA sequence evolution [42], ion channel electropotentials in heart arrhythmia [43], cell proliferation of contact-inhibited cells [44], avascular tumor growth [45], granuloma formation during M. tuberculosis infection [46], and the organization of rocky mussel beds on intertidal shores [47]. Here, the focus is on the use of a model in the area of tissue patterning. In the following section, we 1) provide an overview of the general concept of agent-based modeling, 2) review agent-based models that have been used to study different tissue-patterning processes, 3) describe how tissue-level properties can be elucidated from agent-based modeling approaches, and 4) outline the challenges of using this modeling technique to study tissue patterning.
Agent-based simulations assume that local interactions of autonomous members of a population (i.e., agents) give rise to global, or emergent, phenomena. Example agents used in models from past research include immune cells in tuberculosis studies. Each agent is programmed with rules that govern its behavior. In many studies, researchers permit the agents to cooperate or compete. Agents also sometimes navigate the surrounding environment, whose properties can vary over space and time. The underlying philosophy of this “bottom-up” modeling technique is that relatively simple rules for agent interaction can generate complex systems-level outcomes observed in the real world. This concept is highly compatible with the biology of tissue patterning, in which local cell–cell interactions are considered to be of primary importance in generating cellular responses from which emergent tissue-level properties arise. Agent-based simulations are useful for understanding tissue patterning because the simulations are computationally efficient, allow for spatially heterogeneous cell behaviors, and mimic the autonomy with which biological cells interact.
Agents and state variables
Agent-based models are composed of individual agents that have individual attributes, or state variables, as well as the ability to change their state variables within a range of finite values. In biological models, state variables can represent phenotypic states, protein expression levels, a proliferative state, a migratory state (a state in which the full complement of proteins required for the cell to move is expressed), or any other cell “behavior” or environmental condition that may affect the cell, such as local extracellular matrix composition or diffusible growth factor concentration [48, 49]. At discrete time points, agents can change states in parallel with one another by interacting with their neighboring agents within the computational framework [50]. In agent-based modeling of biological phenomena, a single agent frequently represents a single cell [44, 48, 49], and agent-based models may be composed of thousands of interacting agents. However, it is important to note that depending on the biological process being modeled, agents can be defined to represent animal populations (in ecological studies), molecules, or subsets of tissue components. The state variables for each individual agent can be recorded at discrete time points, enabling tracking of the state history of an individual agent. In this way, the phenotypic state (expression levels of a particular protein by a particular cell) of a simulated cell can be monitored over time and correlated with other dynamic cellular or environmental changes within the context of the entire simulated tissue (Figure 3). In Figure 3, the phrase concentration of extracellular matrix refers to the concentration of the protein that constitutes the extracellular matrix, such as collagen, elastin, or fibronectin. Expression of protein-A refers to the concentration of a different protein, for example a diffusible growth factor in the extracellular space.
Figure 3
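The idea of an agent that carries state variables and a recorded state history can be sketched as a small data structure. The class name, field names, and values below are hypothetical and chosen only for illustration; they do not reproduce the data model of the simulations discussed in this review.

```python
from dataclasses import dataclass, field

@dataclass
class CellAgent:
    """One agent standing in for one cell (all field names are illustrative)."""
    x: int
    y: int
    phenotype: str = "quiescent"    # e.g. "quiescent", "proliferating", "migrating"
    protein_a: float = 1.0          # expression level, arbitrary units
    history: list = field(default_factory=list)

    def record(self, t):
        """Store a snapshot of the state variables at discrete time point t."""
        self.history.append((t, self.x, self.y, self.phenotype, self.protein_a))

cell = CellAgent(x=2, y=3)
cell.record(0)

# Between time points the agent changes state: it starts migrating,
# moves one pixel, and raises its expression of protein A.
cell.phenotype = "migrating"
cell.x += 1
cell.protein_a = 2.5
cell.record(1)

for snapshot in cell.history:
    print(snapshot)
```

Because each snapshot is stamped with its time point, the history of one agent can later be compared with the histories of its neighbors or with environmental variables recorded at the same times.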
Spatial distribution of agents
Agent-based models can allow agents to operate within the confines of a one-, two-, or three-dimensional space that is divided into a discrete array, such as a grid represented by adjacent pixels. In these cases, each grid square is like a cell in a table; it is annotated with coordinates to describe its location and corresponds to an actual geometrical location in the modeled system. Subsets of agents can be programmed to move within the defined space, transitioning from pixel to neighboring pixel with time (Figure 3).
Agent rules
Agents are governed by rules that define initial conditions, boundary conditions, agent–agent interactions, and agent–pixel interactions. Individual agents can be prompted to respond to rules while taking into consideration their own state history and that of neighboring agents and pixels. Thus, agent-based models are deterministic in that individual agents follow rules that prescribe how their state will change in the next time step given information relating to the current time step. Rules, such as those listed in Table 1, are often obtained from real data and greatly influence the predictions made by agent-based models. Therefore, it is critical that the rules be accurate and used with appropriate spatial and temporal scaling that is suitable both for the computational framework and for the simulated biological processes.
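A minimal sketch of such a rule-driven update on a pixel grid follows. The one-dimensional strip, the single "move right if the next pixel is free" rule, and the function name are assumptions made for illustration, not rules from the cited models. Every agent computes its next state from the same snapshot of the current time step, so the update is deterministic.

```python
WIDTH = 5   # number of pixels in an assumed one-row strip

def step(positions):
    """Advance every agent one time step using only the current snapshot."""
    occupied = set(positions)
    next_positions = []
    for (x, y) in positions:
        target = (x + 1, y)
        inside = target[0] < WIDTH              # boundary condition: no exit on the right
        if inside and target not in occupied:   # agent-pixel rule: move only into a free pixel
            next_positions.append(target)
        else:                                   # blocked by a neighbor or by the boundary
            next_positions.append((x, y))
    return next_positions

state = [(0, 0), (1, 0), (3, 0)]   # initial condition: three agents on the strip
for t in range(4):
    print(t, state)
    state = step(state)
```

Running the loop shows the agents advancing until they pack against the right-hand boundary. That packed arrangement is a simple tissue-level outcome which no individual rule states explicitly.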
Table 1 Example rules that govern subsets of agents that represent different cell types (endothelial cell and smooth muscle cell). Brackets indicate concentration of the enclosed reactants. (NO: nitric oxide; VE: vascular/endothelial; PDGF–BB: platelet-derived growth factor, isoform designation BB; TGF: transforming growth factor; pM: picomolar; Rbound: concentration of receptor protein bound to the ligand.)

| Agent | Parameter | Rule |
| Endothelial cell | Steady-state production of NO | NO (pM/hr) = 10 |
| Endothelial cell | β-adrenergic-receptor-mediated production of NO | NO (pM/hr) = 5 × (0.01 × β − Rbound) |
| Endothelial cell | Expression of cell adhesion molecule VE–cadherin when contacting another cell | Expression (fold above steady state) = 2 × (number of neighbors) |
| Endothelial cell | Steady-state proliferation rate | Cell doubling time (hours) = 2,000 |
| Smooth muscle cell | Contraction in response to NO as a percentage of maximally dilated (relaxed) state | Percent contracted = [NO]^2 |
| Smooth muscle cell | Proliferation rate in response to PDGF–BB growth factor | Cell doubling time (hours) = 40 + 82.8 × e^(−3.2[PDGF–BB]) − 0.15[PDGF–BB] |
| Smooth muscle cell | Fold change in smooth muscle myosin heavy-chain expression in response to TGF–β growth factor | Expression (amount above steady state) = 0.73 × ln([TGF–β]) + 1.1 |
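Several of the rules in Table 1 translate directly into small functions, as in the sketch below. The function names are hypothetical, concentrations are assumed to be in the units used in the table, and the superscripts in the printed rules are read here as a square and an exponential, which is an interpretation rather than something the table states explicitly. The β-adrenergic rule is omitted because its printed form is ambiguous.

```python
import math

def ve_cadherin_fold(num_neighbors):
    """Endothelial cell: VE-cadherin expression, fold above steady state."""
    return 2 * num_neighbors

def smc_percent_contracted(no_conc):
    """Smooth muscle cell: percent contracted, read here as [NO] squared."""
    return no_conc ** 2

def smc_doubling_time_hours(pdgf_bb):
    """Smooth muscle cell: doubling time as a function of [PDGF-BB]."""
    return 40 + 82.8 * math.exp(-3.2 * pdgf_bb) - 0.15 * pdgf_bb

def smc_myosin_expression(tgf_beta):
    """Smooth muscle cell: myosin heavy-chain expression above steady state.

    Requires tgf_beta > 0 because of the logarithm.
    """
    return 0.73 * math.log(tgf_beta) + 1.1

print(ve_cadherin_fold(3))
print(smc_doubling_time_hours(0.0))
print(smc_myosin_expression(1.0))
```

Keeping each rule in its own function makes it straightforward to check a rule against the data it was derived from, and to change its spatial or temporal scaling, without touching the rest of a simulation.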
Peirce et al. have published two agent-based simulations that address two different biological patterning processes: 1) microvascular patterning in response to mechanical and biochemical factors in the adult mammal [49], and 2) cell and extracellular morphogenesis during embryonic development [48]. These two tissue-patterning events are similar with respect to the individual cell and molecular components and processes that drive them, such as cell migration, proliferation, and apoptosis, but they differ in scope and complexity. The former simulation included the behaviors of more than 1,000 cells over a 14-day time period, while the latter was much simpler, involving only 200 cells and five simulated hours. Despite these differences, both simulations employed the same computational platform, NetLogo [51], modeled the tissue in two dimensions, represented single cells by single agents, defined the initial simulated tissue geometry in the simulation space using data directly from the analogous in vivo system that was being modeled, assigned rules based on independent data, and were validated through independent bench-top experiments.
The studies described here were among the first to demonstrate the physiological relevance of using an agent-based simulation to study tissue-patterning processes in vertebrate animals in vivo or in whole living tissues.
Microvascular patterning
Patterning of the microcirculation in the adult animal occurs when existing microvessels (capillaries, arterioles, and venules) are stimulated to “structurally remodel” by growing in length or diameter, by sprouting into new branches, and by regressing. Such patterning takes place during naturally occurring processes, such as exercise, and in a number of pathological events, such as tumor growth, heart disease, and wound healing. Spatial and temporal coordination of cell behaviors caused by an array of diffusible molecular signals, or growth factors, is essential in orchestrating a properly patterned microvascular network [52]. In order to capture the multicellular “circuitry” of the tissue-level process of patterning in a blood vessel network, Peirce et al. [49] developed a quantitative agent-based computational simulation that was based on the integration of cell behaviors, which were independently reported by various researchers, and molecular mechanisms previously published in the literature. The simulation incorporated initial microvascular patterns of real tissues derived from small-animal studies. The tissue-level responses to two environmental stimuli were assessed: 1) network-wide changes in hemodynamic mechanical stresses, and 2) exogenous focal delivery of a pro-angiogenic growth factor, namely vascular endothelial growth factor (VEGF). (The term angiogenesis refers to the formation of new blood vessels. Focal delivery is delivery to a specific location in the tissue.) The agent-based model predicted increases in total vascular length and contractile vessel length at various time points during the 14 days following stimulation. Predictions were verified by comparison with measured values obtained in analogous but independent in vivo studies.
Thus, this work appropriately incorporated independent data (which described growth factor production and diffusion, as well as cellular proliferation, migration, and differentiation) in a computational framework that provided an accurate description of emergent vascular patterning phenomena.
This kind of simulation has value because it allows researchers to identify a functional module of interrelated processes: a combination of molecular signals and cellular behaviors that give rise to tissue-patterning events observed experimentally. Using this model, researchers may systematically perturb the various component signals in order to identify drug targets for either enhancing vascular growth (in ischemic disorders) or limiting it (in tumorigenesis).
Embryogenesis
Several researchers have described the cellular motions and molecular machinery that give rise to organized tissues during embryogenesis in the frog Xenopus laevis [53, 54]. As individual cells in stratified layers of the blastocoel roof (BCR) are intercalated during embryogenesis, the tissue thins and lengthens. (A blastocoel is the fluid-filled central cavity of the embryonic blastula.) Meanwhile, a fibronectin (FN) layer is deposited at the underside of the BCR as an organized pattern of extracellular matrix, and cell layers evolve. Researchers currently do not know the extent to which soluble growth factor signaling, distributed mechanical tension forces, cell-to-cell adhesion signaling, or other molecularly mediated aspects of the biology drive these behaviors. Better understanding of the dynamic interplay between the coordinated signals and responses that direct an appropriate tissue-patterning response requires a framework in which to compute the effects of these interacting factors over the relevant tissue geometry in space and time. Longo et al. [48] developed a simulation that accurately predicted the total time for BCR thinning of the Xenopus laevis embryo (approximately 4.5 hours) based on independently obtained cell migration rates and well-characterized nearest-neighbor cell-to-cell interactions (Figure 4). Verified by independent data from experimental studies, the simulations also predicted a temporal increase in FN matrix assembly on the underside of the BCR that resembles fibrillogenesis in the embryo. (The term fibrillogenesis refers to the development of fine fibrils normally present in collagen fibers of connective tissue.) When a multicell implant was placed in the simulated BCR, the simulation predicted accurate spatial dispersion patterns of the implanted cells when compared with those measured in the analogous in vivo intervention, namely the implantation of a plug of green fluorescent protein-labeled cells in an actual embryo.
Figure 4
The development of the two simulations described above can be considered an initial success, as each contributed to the understanding of the tissue-patterning process in ways not achievable through experimentation alone; each simulation now serves as an analysis tool for testing future hypotheses in silico before valuable resources and time are expended on bench-top experiments. Physical experiments with living tissues may, for several reasons, fail to supply adequate insight on their own. For example, actual tissues frequently present technical barriers to the isolation and manipulation of individual variables without perturbing other variables that affect the system response. Even in controlled genetic manipulations in which one gene is “knocked out” (effectively removed), compensatory mechanisms exist that interfere with our understanding of the actual effect of a particular manipulation. However, using computational simulations, it is feasible to isolate and vary individual variables without interfering with other parts of the system. For example, the aforementioned microvascular patterning simulation was able to identify a functional patterning “module” (a unique combination of biochemical signals and cellular behaviors). This module consisted of four different cell types, four different cell behaviors, and three different growth factor proteins. Together, these components were capable of quantitatively predicting vascular length increases and arterial formation, which are key aspects of vessel network patterning, in response to clinically relevant environmental stimuli. Future work is needed to assess the ability of the simulation to predict other relevant patterning metrics, such as vessel branching in the network.
A computational tool that predicts tissue-patterning responses, such as increases in vascular length, is useful because the predictions can direct further research aimed at artificially “engineering” blood vessel networks with a desired vascularity—or designing therapeutic treatments (drug-delivery schemes) to alternatively enhance or limit vessel growth in different disease states such as tissue injury or tumor growth.
The model of frog Xenopus laevis embryogenesis was capable of making independent predictions about many aspects of the BCR thinning process, all of which were verified by direct experimentation. The simulation was also used to test a novel hypothesis: that epithelial cells differentially adhere to the FN layer in the BCR, and cell-residency time is proportional to FN deposition and fibrillogenesis in the BCR. When this hypothesis was incorporated as an agent rule, the simulation predicted accurate FN deposition, thus supporting this new hypothesis, which can now be tested in the in vivo system. The results of this work suggest that an agent-based approach can be especially useful in instances in which specific experimental strategies may not be immediately obvious or existing experimental techniques may be too crude to permit isolation and parameterization of individual key variables.
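The idea of encoding a hypothesis as an agent rule can be illustrated with a deliberately small sketch. The code below is not the simulation of Longo et al.; it is a toy one-dimensional lattice in which every name and parameter (simulate_fn_deposition, base_move_prob, adhesion, deposit_rate) is a hypothetical choice made for illustration. Each cell deposits FN in proportion to the time it resides at a site, and, as an assumed form of differential adhesion, cells become less likely to migrate away from sites where more FN has accumulated.

```python
import random


def simulate_fn_deposition(n_sites=20, n_cells=5, n_steps=200,
                           base_move_prob=0.5, adhesion=0.05,
                           deposit_rate=1.0, seed=0):
    """Toy 1D agent model (illustrative only, not the published model).

    Returns (fn, residency): FN accumulated at each site and the number of
    cell-steps spent at each site.
    """
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    positions = [rng.randrange(n_sites) for _ in range(n_cells)]
    fn = [0.0] * n_sites
    residency = [0] * n_sites

    for _ in range(n_steps):
        for i, pos in enumerate(positions):
            # Hypothetical agent rule: FN deposition is proportional to the
            # time a cell resides at a site.
            residency[pos] += 1
            fn[pos] += deposit_rate

            # Assumed differential adhesion: more local FN lowers the
            # probability that the cell migrates away.
            move_prob = base_move_prob / (1.0 + adhesion * fn[pos])
            if rng.random() < move_prob:
                step = rng.choice((-1, 1))
                # Reflecting boundaries keep the cell on the lattice.
                positions[i] = min(max(pos + step, 0), n_sites - 1)

    return fn, residency


if __name__ == "__main__":
    fn, residency = simulate_fn_deposition()
    for site, (amount, steps) in enumerate(zip(fn, residency)):
        print(site, steps, amount)
```

In this sketch the FN at each site equals deposit_rate times the residency count by construction, so the output of interest is the spatial pattern that emerges from the feedback between deposition and adhesion. Comparing such a pattern against independently measured FN distributions is the kind of test the paragraph above describes.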
The accuracy of the governing rules for agent-based modeling of biological systems has a critical impact on the predictive capability of the simulation. The modeler depends on well-founded rules for agent behavior; however, if the rules are accurate and are obtained independently of the experimental study that is being used to validate the simulation, the predictions are likely to be valid because they are rooted in previously obtained experimental data and not generated only from theory. In practice, however, our ability to develop agent-based models for studying complex biological systems can be hindered by a lack of reliable raw data from which to generate agent rules. Researchers should also be careful not to make more assumptions than the minimum needed to describe the phenomena, because excess assumptions may give researchers an incorrect understanding of the biological process and yield a computationally inefficient model that also produces incorrect results. Numerous questions can arise when considering simulations that employ agent-based models. For example, how does the modeler know when enough rules have been incorporated? Is the simplest explanation (e.g., one using the fewest rules) the most accurate one? Often modelers attempt to address these concerns by performing a parametric analysis of the state variables (variables that describe the physical attributes, or characteristics, of the agent) to identify key parameters, bottlenecks in the system, or outcomes that are particularly sensitive to possible variations in the relevant parameter settings. (The term bottlenecks refers to rate-limiting biochemical reactions or biological processes.) Furthermore, agent-based models are particularly prone to becoming unstable if rules do not provide adequate feedback loops or “stop” conditions. Thus, it is the modeler's responsibility to screen for and safeguard against such instabilities, which may contribute to errors in the predictions.
Finally, despite the relative computational efficiency of agent-based modeling techniques, the large number of cells, interactions, pixels, and rules needed to simulate biological systems can undermine the feasibility of this approach. Despite these challenges, we believe that the use of agent-based computational techniques to integrate and compute genomic and proteomic information in the context of multicell tissues will expedite a basic understanding of biological patterning processes and the extent to which they can be manipulated for therapeutic purposes.
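A parametric analysis of the kind mentioned above can be sketched as a one-at-a-time sensitivity scan. The example below is a minimal illustration under stated assumptions, not a method taken from the cited studies: the growth model, the parameter names (initial_cells, division_rate, capacity), and the functions run_model and sensitivities are all hypothetical. The model includes a capacity feedback and a hard iteration cap, which stand in for the feedback loops and “stop” conditions discussed above.

```python
def run_model(params, max_steps=1000):
    """Toy discrete growth model; returns the final cell count.

    The capacity term is a negative feedback loop, and max_steps is a hard
    "stop" condition that bounds the run even if no steady state is reached.
    """
    cells = params["initial_cells"]
    for _ in range(max_steps):
        growth = params["division_rate"] * cells * (1.0 - cells / params["capacity"])
        if abs(growth) < 1e-9:  # steady state reached, stop early
            break
        cells += growth
    return cells


def sensitivities(params, rel_change=0.05):
    """One-at-a-time normalized sensitivity of the output to each parameter.

    For each parameter p: (relative change in output) / (relative change in p).
    """
    baseline = run_model(params)
    result = {}
    for name, value in params.items():
        perturbed = dict(params)
        perturbed[name] = value * (1.0 + rel_change)
        result[name] = ((run_model(perturbed) - baseline) / baseline) / rel_change
    return result


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    params = {"initial_cells": 10.0, "division_rate": 0.1, "capacity": 100.0}
    for name, value in sensitivities(params).items():
        print(f"{name}: {value:.3f}")
```

For this toy model the steady-state output is governed almost entirely by the capacity parameter, so a scan of this kind would flag capacity as the key parameter and the other two as having little influence on the final state. One-at-a-time scans ignore interactions between parameters, which is a known limitation of the approach.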
The challenges that face research efforts in cancer therapies emphasize the tremendous need to couple quantitative analysis of intracellular networks with tissue-level physiology. The idea of treating cancer as a molecular disease, that is, as a consequence of malfunctioning intracellular proteins, has led to the development of effective drug therapies that use imatinib mesylate. Imatinib is an inhibitor of the enzyme bcr–abl kinase, which is constitutively active (always active) in chronic myeloid leukemia (CML) [55]. (The terms bcr and abl respectively stand for breakpoint cluster region and Abelson, the name of a leukemia virus that carries a similar protein.) The treatment of CML patients with imatinib has been tremendously successful in that it has extended life expectancy in roughly 70% of treated patients. Additional tyrosine kinases are potential targets for developing new cancer therapeutics [55].
The idea of treating cancer as a developmental, or tissue-level, disease has also made significant progress. It is becoming increasingly clear that cell–cell and extracellular interactions are critical components of tumor progression [56]. For example, as a tumor reaches a critical size, its growth becomes highly dependent on a blood supply to meet metabolic demands and overcome oxygen and nutrient diffusion limitations. A recently developed anti-VEGF (vascular endothelial growth factor) drug, bevacizumab, has demonstrated effective antivascular effects to inhibit tumor growth [57]. A similar drug called Endostatin** was used in animal studies to assess a large set of signaling pathways associated with its anti-angiogenic effects [58]. Genome-wide expression profiling, RT–PCR (reverse transcriptase–polymerase chain reaction), and phosphorylation analysis indicated highly interconnected roles of key proteins (such as NF–κB, STAT, TNF, and AP–1) and other signaling pathways. NF–κB is a multisubunit transcription factor. STAT stands for Signal Transducers and Activators of Transcription. TNF stands for tumor necrosis factor, and AP–1 is a transcription factor.
Researchers consider the tissue-level and molecular-level components of cancer in order to develop therapeutics. Clearly, both perspectives will prove to be fruitful avenues for research into the progression of the disease. Tissue-level and molecular-level concerns also suggest a need for a multiscale approach that can quantitatively integrate molecular detail of intracellular networks with tissue-level analysis and experimentation. With such a coupling, researchers should be able to characterize the mechanism behind the anti-tumor-growth effects of a drug such as Endostatin.
Coronary heart disease is a complex pathological condition because it involves both bottom-up (molecular) and top-down (tissue) mechanisms and interactions. It can be considered a molecular disease that results in adverse tissue-level behaviors, such as those produced by oxidation of LDL (low-density lipoprotein) cholesterol via free radical damage leading to the development of fatty streaks in the blood vessel wall. Tissue-level behaviors, such as alterations in blood vessel wall geometry due to atherosclerotic plaque formation, can reinforce malfunctions at the molecular level, such as increased levels of circulating pro-inflammatory regulatory proteins called cytokines. Thus, this particular disease is well suited to investigation by using a combined computational approach that incorporates molecular-level detail with cellular-level patterning within the tissue.
A common pharmacological treatment for early-stage coronary heart disease is the systemic administration of beta-adrenergic receptor blocking agents, commonly known as “beta-blockers.” Beta-blockers competitively inhibit the binding of adrenaline to beta-1 and/or beta-2 adrenergic receptors on cardiac and smooth muscle cells, which slows nerve impulses to the heart [59]. This decreases heart rate, contractility, and blood pressure [60, 61]. Despite widespread administration of these drugs in the clinic, the underlying molecular mechanisms by which beta-blockers mitigate heart disease are not well understood. Because a single molecule, such as a beta-adrenergic receptor–agonist, can elicit varying cellular responses in different interacting cell types that may be collaborative or competing (i.e., either beneficial in preventing heart disease or destructive and augmenting heart disease), the details of these interactions may not be obvious using experimental approaches alone. In these instances, identifying cause-and-effect interactions can prove to be a frustrating endeavor without using a tool to access and manipulate the individual components within the framework of the entire system. In a multiscale computational approach, however, the integrated and complex mechanisms underlying this accepted treatment may potentially be elucidated and individually assessed in the context of the whole system.
In future research, one may construct a network analysis of the signaling cascade that describes all of the events subsequent to adrenaline binding to beta-receptors, including the intracellular molecular details leading to cross-bridge cycling during muscle contraction in cardiac and vascular smooth muscle cells [62–64]. The term cross-bridge cycling refers to the molecular interactions that cause a muscle cell to contract or shorten, thereby providing forces to operate the muscle tissue. The direct effect of beta-blockers on vascular endothelial cell nitric oxide production may also be taken into account [65]. The network analysis may provide quantitative data describing both the contractile state of individual muscle cells and the nitric oxide production of endothelial cells given adrenaline–receptor and antagonist–receptor binding distributions. This information may be passed to a corresponding agent-based simulation that may preserve the input–output relationships of each cell-type-specific network while scaling up to the multicell tissue space. If, for example, the modeled tissue space were a portion of the ventricular wall perfused by a major coronary artery, the agent-based model would enable real-time computation of both localized cardiomyocyte contraction (which would directly affect metabolic demand) and vascular tone in the blood vessels that feed the tissue space (which would be affected by metabolic demand) (Figure 5). In Figure 5, the three downward-pointing arrows represent the binding of beta-blocker to beta-receptor for different phases of cardiac functioning. The “seesaw” in Figure 5 indicates that these two tissue-level processes (metabolic demand and vascular tone) are delicately balanced in the cells of the heart tissue, and functionally linked to each other, so that a change in one causes a change in the other. 
Labels at the bottom of the figure in red indicate responses to the beta-blocker, while labels in black indicate responses to adrenaline. Ultimately, the simulation may predict the interactions between beta-blocker modulation of blood supply (via vascular tone) and beta-blocker modulation of tissue metabolic demand (via cardiomyocyte contraction). When the physiological manifestations of both processes exceed a certain threshold, the metabolic demand will exceed the chemical supply, and an eventual heart attack may ensue.
Figure 5
Coupling the network-level interactions with multiple cells in an agent-based model would allow the investigator to intervene in the biology at one level of scale (e.g., to simulate a drug interaction), and the simulation would propagate that intervention through the single-cell and multi-cell scales up to the tissue level, where clinically accessible physiological results (cardiac output, heart rate, or blood pressure) could be correlated with patient data. The pairing of these two computational techniques would allow systematic isolation and alteration of individual parameters affected by beta-blockers, thereby providing spatially and temporally detailed insight into cause-and-effect properties of this complex system.
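The coupling described above can be sketched in a few lines, with the intracellular network reduced to a single input–output function that each cell agent calls. This is a minimal sketch under strong assumptions and is not a model from the cited literature: the network is replaced by the standard equilibrium expression for competitive receptor binding, the mapping from occupancy to cellular response is a hypothetical linear gain, and the names CellAgent, agonist_occupancy, and tissue_summary are invented for illustration.

```python
from dataclasses import dataclass


def agonist_occupancy(agonist, antagonist, k_agonist, k_antagonist):
    """Fraction of receptors bound by agonist when a competitive antagonist
    is present (equilibrium binding; k_* are dissociation constants)."""
    a = agonist / k_agonist
    b = antagonist / k_antagonist
    return a / (1.0 + a + b)


@dataclass
class CellAgent:
    cell_type: str             # e.g. "cardiomyocyte" or "smooth_muscle"
    gain: float                # hypothetical scaling from occupancy to response
    k_agonist: float = 1.0     # assumed dissociation constants (arbitrary units)
    k_antagonist: float = 1.0

    def response(self, agonist, antagonist):
        # Stand-in for the cell-type-specific network: the agent only needs
        # the network's input-output relationship, not its internal detail.
        occupancy = agonist_occupancy(agonist, antagonist,
                                      self.k_agonist, self.k_antagonist)
        return self.gain * occupancy


def tissue_summary(cells, agonist, antagonist):
    """Aggregate single-cell responses into one total per cell type."""
    totals = {}
    for cell in cells:
        totals[cell.cell_type] = (totals.get(cell.cell_type, 0.0)
                                  + cell.response(agonist, antagonist))
    return totals


if __name__ == "__main__":
    # Hypothetical tissue: two cardiomyocytes and one smooth muscle cell.
    cells = [CellAgent("cardiomyocyte", 2.0),
             CellAgent("cardiomyocyte", 2.0),
             CellAgent("smooth_muscle", 1.0)]
    print("no blocker:  ", tissue_summary(cells, agonist=1.0, antagonist=0.0))
    print("with blocker:", tissue_summary(cells, agonist=1.0, antagonist=1.0))
```

The design choice worth noting is the interface: the tissue-level code never inspects the network, so a more detailed intracellular model could replace agonist_occupancy without changing the agents. A realistic implementation would also give each agent its own local concentrations and a spatial position, which this sketch omits.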
Several challenges with respect to the coupling of intracellular network analysis with tissue-level physiology have been discussed above. Conceptual advances will be required to marry the available quantitative techniques with the analysis of biological systems that evolve and are very difficult to elucidate. A significant need exists to develop the computational infrastructure required for such sophisticated analyses.
Relatively few systems have been extensively characterized from the molecular and physiological perspectives. For example, the genome of the frog Xenopus laevis, a model organism for developmental biology and tissue patterning, has not yet been sequenced. Researchers must constantly assess whether the available data is “sufficient.” We can generate high-throughput datasets for many organisms under many conditions, but biological systems adapt to changing environments. This fundamental property motivates the need for novel modeling approaches that can account for the inherent flexibility in biology [66].
In addition, each high-throughput experimental technology merely creates “snapshots” of particular aspects of a cell, and researchers will likely always lack “complete” data for a given system. All datasets also have a level of inaccuracy associated with the particular technology being used. Computational techniques will have to account for this level of uncertainty in the biological system.
Other questions remain. For example, is there a need to describe all of the processes of the cell with network-level information before researchers can achieve reasonable predictive capabilities and worthwhile simulations? Or is it sufficient to make assumptions at particular levels of complexity in order to make useful predictions and generate novel avenues for therapeutic developments? The answers to these questions require multiscale network analysis.
The existing software and analytical tools used for intracellular network analysis and agent-based analysis have developed independently. To integrate these different scales in biology, a need exists for interfacing software, database schemas, and ontologies that can seamlessly pass information between the interacting tools. In addition, tissue-patterning problems will require three-dimensional implementations of agent-based modeling techniques. Some intracellular network analyses are classified as NP-complete problems (a computer science term for problems for which no efficient, polynomial-time algorithm is known) and will thus likely require reformulations to generate useful results [33]. Computational infrastructure for scaling and calibrating spatial and temporal information will be required at the interface between the molecular and tissue levels because molecular events can occur on the order of picoseconds while tissues can evolve over the lifetime of an organism.
Researchers who wish to develop useful and realistic integrated modeling approaches to systems biology must design and specify practical deliverables for both basic scientific and therapeutic use. Reaching these goals may require more than one approach, and a simulation technique that is suitable for generating a mechanistic understanding of the basic science may not necessarily be of practical use in drug development. Building quantitative simulations that span spatial and temporal scales requires the researchers to consider the available data from various perspectives, and often the exercise of model building leads to insightful, novel hypotheses and reveals gaps in current understanding that must be filled.
To assess the potential of an integrated modeling approach, it is necessary to provide experimental validation of key model predictions. This validation is best accomplished by performing analogous interventions in both the computational and in vivo environments. Interventions will be specific to the biological system and will most likely be limited to what is experimentally feasible. For example, in embryogenesis studies, an intervention may consist of transplanting a genetically engineered patch of cells to a genetically normal and unmodified (i.e., wild-type) embryo and monitoring the activities of the mutant cells. Researchers must ensure that metrics obtained in each setting are analogous and require little or no scaling to enable accurate comparison between in vivo measurements and computationally predicted datasets. To ensure a reliable assessment of the model's validity based on existing data, it is also necessary to keep the results of experimental studies used for model verification separate from the simulation rules or inputs. In other words, the rules and parameters that comprise the simulation should obviously not be derived from the output of the experimental study that will ultimately be used to verify the simulation.
The results of agent-based simulations can sometimes suggest future biological experiments that test model results and that may eventually lead to new medical treatments. Even simple rules assigned to interacting agents can yield surprisingly complex behaviors that are difficult to predict when studying isolated components. Past studies have sometimes employed differential equations to help researchers understand cell behavior in an aggregate way. On the other hand, the agents of in-silico modeling make decisions in response to environmental parameters, and unusual activity of just a small set of cells can alter the overall system behavior in profound ways.
In conclusion, a central debate in biology has long revolved around a question that still stands today: Are tissue structures and functions determined strictly by precise information stored as genetic material, or do they develop as a result of interactions of cells and tissues with their environment? Valuable insight has been provided by gene knockout studies that remove the impact of a single gene and then determine the tissue physiology in its absence [67–69]. However, to fully answer these questions, researchers require a quantitative, computational framework that is capable of integrating genetic data with environmental stimuli in order to connect molecular mechanisms to tissue-level physiology and pathology.
**Trademark, service mark, or registered trademark of EntreMed, Inc. in the United States, other countries, or both.
Received November 20, 2005; accepted for publication December 19, 2005; Published online June 27, 2006.