DOI: 10.1147/rd.506.0645
Model-based design approaches in drug discovery: A parallel to traditional engineering approaches
by B. Schoeberl, U. B. Nielsen, and R. Paxson
One of the promises of systems biology is that the systems-level understanding of biological pathways and processes will allow for “smarter” drug development, as described by Aksenov et al. [1], Apic et al. [2], and Nielsen and Schoeberl [3]. The optimal therapeutic approach and drug design are selected on the basis of a detailed understanding of the drug mechanism of action, which results in targeted therapeutics with higher efficacy and fewer side effects. To achieve this goal, there is a need for model-based drug design that allows for the rapid in silico identification of the “optimal” target and identifies the best mechanism of action (MOA)—e.g., small molecule compared with antibody therapy, optimal inhibitor affinity, target ligand or receptor, possible induction of receptor internalization and degradation, and so forth.
In broad terms, successful model construction and rapid application by research teams requires systematic methods of building, annotating, analyzing, maintaining, and sharing these mathematical models. Further, successful models must bridge a virtual organizational gap by facilitating their use by experimentalists as well as by model creators.
Mathematical models of dynamic systems have long been used in the automobile, chemical, and other industries to support model-based component design and plant control. The day when the pharmaceutical industry can test a drug against a computational model with a high degree of confidence may arrive in the not-too-distant future. The computational approach is building momentum in the pharmaceutical industry and is already beginning to yield results. Mathematical models of relatively small signal transduction or metabolic networks could be the first building blocks in the systems integration of many models into a more comprehensive, higher-fidelity model—possibly even a multicell, full-organ, or full-body model.
There are methodologies and tools that support model construction, parameterization, aggregation, validation, and analysis. The automotive, chemical, semiconductor, and aerospace industries have been leaders in the application of model-based design (MBD) tools and have experienced significant improvement in design and shortened product development cycles based on their use. However, there is a key fundamental difference between modeling biological systems and modeling engineered systems: the former is a reverse-engineering task.
Here we examine the applicability of mathematical modeling methodologies and tools to the drug development process. Although the modeling approaches used in MBD have previously been applied in other industries, the challenges faced by drug companies using model-based design are more complex. One implication of having to reverse-engineer biological systems is that most of the kinetic parameters and species concentrations are unknown. More generally, the challenges include the extreme “stiffness” of these biochemical models, the high number of undetermined parameters, the difficulty of making models usable by both modelers and experimentalists, and the need for specialized software that supports model documentation and annotation based on prior knowledge from the literature and that combines the methodology and knowledge gained in other areas.
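To illustrate what stiffness means in practice, the following minimal sketch integrates two independent first-order decay processes whose rate constants differ by four orders of magnitude. The rate constants, step size, and function names are illustrative assumptions and are not taken from any model discussed in this paper. With a step size suited to the slow process, the explicit Euler method diverges on the fast process, whereas the implicit (backward) Euler method stays stable.

```python
def explicit_euler_step(y, rate, h):
    """One forward Euler step for dy/dt = -rate * y."""
    return y + h * (-rate * y)


def implicit_euler_step(y, rate, h):
    """One backward Euler step: solve y_new = y + h * (-rate * y_new)."""
    return y / (1.0 + h * rate)


def integrate(step, rates, y0, h, n_steps):
    """Advance every (independent) species n_steps times with the given stepper."""
    y = list(y0)
    for _ in range(n_steps):
        y = [step(yi, ki, h) for yi, ki in zip(y, rates)]
    return y


# Hypothetical rate constants: one fast process and one slow process.
rates = (1000.0, 0.1)
y0 = (1.0, 1.0)
h = 0.01        # reasonable for the slow process, far too large for the fast one
n_steps = 100   # integrate to t = 1

explicit = integrate(explicit_euler_step, rates, y0, h, n_steps)
implicit = integrate(implicit_euler_step, rates, y0, h, n_steps)

print("explicit Euler:", explicit)  # fast component grows without bound
print("implicit Euler:", implicit)  # fast component decays, slow one is tracked
```

The explicit scheme is stable only when the step is smaller than roughly 2 divided by the fastest rate constant, which is why stiff biochemical models are usually handed to implicit solvers from an established numerical library rather than integrated with hand-written steppers such as these.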
In the early 1990s, industries such as automotive and aerospace began to use software tools that facilitated the construction and application of models. More than a decade later, a complete MBD methodology has become standard for the construction and application of models in various design and development activities.
Figure 1 shows a model of an automobile transmission design using MBD. The model is composed of submodels that represent the vehicle, transmission, engine, and controller, which itself consists of the shift logic and shift point (depicted as threshold) calculation. On the basis of the driver's input (i.e., brake or accelerator), the model can be used to optimize system performance by shifting gears to maintain the operation of the engine at maximum fuel efficiency. Once the desired performance is obtained, the controller (shown in the figure as the shift logic and threshold calculation blocks) can be implemented automatically via code generation and tested, validated, and deployed on an embedded system.
Figure 1
In the case of model-based drug design (MBDD), the objective is to identify the protein that will be targeted by the novel drug and its best MOA. In contrast to the top-down approach typically used in engineering, a bottom-up strategy is favored in the early stages of the drug design process. The critical networks implicated in the disease of interest are initially identified and reverse-engineered (pathway capture). Well-known signaling or metabolic pathways can be reconstructed on the basis of literature knowledge, and network inference methods (e.g., Bayesian networks) can be applied to reverse-engineer protein interaction or gene networks from large protein or gene datasets, as described by Kholodenko et al. [4], Sachs et al. [5], and Needham et al. [6]. The resulting model can then be used to identify drug targets and determine the optimal MOA. After an initial validation with wet-lab experiments and model tuning, the submodel can be used in the assembly of a more encompassing model that accounts for pharmacokinetic and pharmacodynamic effects (PK/PD), as depicted in Figure 2. The final model, encompassing both signaling pathway models and PK/PD models, can later be applied in the clinic to identify responders (patient stratification). However, all-encompassing models for MBDD, such as the one shown in Figure 2, remain only a vision. Submodels such as PK/PD models and signaling pathway models are already being applied in the pharmaceutical industry [7, 8].
Figure 2
The MBD and MBDD design processes are shown in Figure 3. The first step in the construction of a model is to gather the model requirements. It is important to define the questions that the model should address and the information available to support the model construction. Throughout this activity, the model functions as a knowledge aggregator by accumulating the collective information known about the physical system and its decomposition into subunits. The model may be incomplete at this point, but through an iterative process between model simulations and experiments, the knowledge gaps can be filled during the design process. In Figure 3, the highly iterative process is depicted by double-ended arrows comparing the model with experiments in each step of the MBDD. This is in contrast to the traditional process of gathering requirements in written specifications and building simulations by writing software code. While written specifications serve as static pieces of information for the various stages of development, the model aggregates information and requirements across teams.
Figure 3
The MBD methodology has a well-established record of facilitating large-system design and reducing development cycles. It has proven useful in concentrating system knowledge in a single model and in facilitating model aggregation through the reuse of component models. A comparison of activities in the engineering and the drug development processes, summarized in Table 1, outlines ways in which the methodology can be applied and highlights areas in which new tools are needed to increase the applicability of MBD to problems and challenges in drug discovery.
Table 1 Comparison of model-based design methodologies applied to engineering and to biology and drug discovery.

|  | Engineering | Biology and drug discovery |
|---|---|---|
| Modeling strategy | Forward-engineered and designed | Reverse-engineered and inferred from literature |
| Modularity and hierarchy | Modules are part of the design architecture and have clear input and output characteristics | Modules are difficult to ascertain initially; modules are discovered via model analysis |
| Model aggregation | Submodels are plug-and-play because of the clear application programming interface | Variability of inputs and outputs according to the comprehensiveness of the model makes model aggregation from submodels difficult |
| Level of fidelity | Level of fidelity is increased as needed to reach design goals | Key components initially require a high level of fidelity |
| Interdisciplinary level | Modules are typically built by specific engineering teams; control engineers work on control logic, while mechanical engineers work on vehicle model | Interdisciplinary teams are required throughout the design process because of the need for high fidelity |
| Modeling goal | Robust controllers and code generation for verification with embedded systems | Plant model that is able to determine the best MOA and best drug target |
| Model parameters | Most parameters are directly measurable | Many of the model parameters are typically unknown and estimated with experimental data |
| Analysis | Nonlinear models are used, but linear analysis methods are established and widely applied | Nonlinear systems analysis tools, bifurcation, sensitivities, global optimization |
| Graphical representation | Well-established notation | No standardized notation yet |
A first step in system architecture is to specify its hierarchical decomposition into submodels and to specify their interconnectivity. The modular approach allows large and complex systems to be modeled in a more manageable and transparent fashion. In addition, each submodel can be individually built, validated, integrated, and tested in the full system.
Using an automobile transmission as an example, the initial model can be built with rough (low-fidelity) submodels for each component, e.g., the engine, transmission, and controller, all of which have well-defined input and output characteristics. During the development process, specialized teams improve and refine the submodels individually, increasing the fidelity of the overall model. Each submodel encapsulates the current state of the art and is updated with new knowledge or technology as it becomes available.
Because a submodel encapsulates a distinct part, it can be reused. For an automatic transmission, the controller unit can be used in the development process of another vehicle. The clear definition of the input and output characteristics of each component is a prerequisite for the “plug-and-play” approach to model building.
Because biological systems are characterized by large and complex networks, the modular approach appears to be appropriate. As depicted in Figure 2, this approach appears to be applicable on the macro level (tissue–organ–body) because the model components can be identified by defined inputs and outputs [9–11]. However, on the micro level, where the models account for protein or gene networks, it becomes difficult to break the system down into modules. This is the main difference between applying MBD to engineering and to biology.
To meet drug design goals, a high-fidelity model of a cancer cell and the tumor environment is needed, but little is known about these systems. Therefore, it is difficult to determine where low-fidelity models would be adequate. To learn how to adequately approximate the system behavior, a high-fidelity model is initially required. Currently, high-fidelity models are obtained by reverse-engineering protein or gene networks. However, depending on the design goal, very detailed submodels might still be needed to appropriately model the active mechanism of a particular drug.
In contrast to the engineering case, in which specialized teams work on submodels, MBDD requires interdisciplinary teams throughout the design process. Two factors drive this: modules in protein or gene networks are difficult to define, and the required high fidelity can be achieved only by gathering experimental data and continuously refining the model.
Unlike engineering models, biological submodels do not have clear input and output characteristics; therefore, model reuse and aggregation become more complex. Rather than being plug-and-play, the relationships among submodels must be manually edited after several models have been fused. For example, the mitogen-activated protein kinase (MAPK) cascade submodel is ubiquitous in many signaling pathways, but the nature of the interactions between it and other submodels (e.g., the phosphoinositide 3-kinase (PI3K) cascade) differs from pathway to pathway in connectivity and strength. Instead of plug-and-play, a strategy that uses a rule-based scheme to create the crosstalk interactions between the different submodels appears to be more adequate.
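One way to picture such a rule-based scheme is sketched below. This is only an illustration under stated assumptions: the `Reaction` and `CrosstalkRule` classes, the species names, the parameter names, and the single crosstalk rule are all hypothetical and heavily simplified, and they are not taken from any published model or tool. Each rule adds a crosstalk reaction only when all the species it requires are present in the merged model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    reactants: tuple
    products: tuple
    rate_constant: str  # name of a kinetic parameter; its value is not set here


@dataclass(frozen=True)
class CrosstalkRule:
    required_species: frozenset  # the rule fires only if all of these are present
    reaction: Reaction           # interaction added when the rule fires


def species_of(reactions):
    """Collect every species name mentioned in a list of reactions."""
    return {s for r in reactions for s in r.reactants + r.products}


def merge_submodels(submodels, rules):
    """Concatenate submodels, then add each crosstalk reaction whose rule matches."""
    merged = [r for submodel in submodels for r in submodel]
    present = species_of(merged)
    for rule in rules:
        if rule.required_species <= present:
            merged.append(rule.reaction)
    return merged


# Hypothetical submodels, reduced to one activation step each.
mapk_submodel = [Reaction(("Raf",), ("Raf_active",), "k_raf_activation")]
pi3k_submodel = [Reaction(("Akt",), ("Akt_active",), "k_akt_activation")]

# Hypothetical rule: when both cascades are present, active Akt inactivates Raf.
rules = [
    CrosstalkRule(
        required_species=frozenset({"Raf_active", "Akt_active"}),
        reaction=Reaction(
            ("Raf_active", "Akt_active"), ("Raf", "Akt_active"), "k_crosstalk"
        ),
    )
]

combined = merge_submodels([mapk_submodel, pi3k_submodel], rules)
mapk_only = merge_submodels([mapk_submodel], rules)

print(len(combined), "reactions with both submodels")
print(len(mapk_only), "reaction with the MAPK submodel alone")
```

Keeping the crosstalk in separate rules means the same submodels can be combined for different pathways by changing only the rule set, while the strength of each interaction remains a parameter to be estimated.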
An additional complication of model aggregation arises from the multiple occurrences of network motifs (e.g., the same kinase or phosphatase) in different signaling pathways. Thus, the combination of several signaling pathway models is complicated by uncertainty as to whether these network motifs use the same protein pools or are spatially separated by, for example, scaffolding proteins. This is reflected in the choice of initial species concentrations (single or multiple pools) and the value of the kinetic parameters. Experimental data addressing both issues is needed in order to parameterize the model appropriately and to obtain high model fidelity.
As opposed to engineering models, in which most parameters can be directly measured, kinetic parameters in biological systems must often be inferred from large and “noisy” datasets. This step in model refinement is done with parameter estimation; however, because of the large number of unknown parameters and the uncertainty in their values, global optimization techniques are needed [12]—more so than in engineering.
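The following minimal sketch shows the general idea of estimating kinetic parameters from noisy data with a search that explores the parameter space broadly before refining. It is a simple stand-in, not one of the global optimization techniques cited above: it combines log-uniform random starting points with a basic coordinate pattern search. The single-exponential model, the synthetic data, the parameter bounds, and all function names are assumptions made for illustration.

```python
import math
import random


def model(t, amplitude, rate):
    """Hypothetical first-order decay: y(t) = amplitude * exp(-rate * t)."""
    return amplitude * math.exp(-rate * t)


def sse(log_params, times, data):
    """Sum of squared errors; parameters are searched in log space."""
    amplitude, rate = (math.exp(p) for p in log_params)
    return sum((model(t, amplitude, rate) - y) ** 2 for t, y in zip(times, data))


def pattern_search(start, cost, step=0.5, tol=1e-4, max_sweeps=10_000):
    """Local refinement: try +/- step on each coordinate, halve step on failure."""
    best, best_cost = list(start), cost(start)
    for _ in range(max_sweeps):
        if step < tol:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                trial_cost = cost(trial)
                if trial_cost < best_cost:
                    best, best_cost, improved = trial, trial_cost, True
        if not improved:
            step /= 2
    return best, best_cost


def fit(times, data, bounds, n_starts=200, n_refine=5, seed=0):
    """Sample starts log-uniformly within bounds, refine the most promising ones."""
    rng = random.Random(seed)

    def cost(p):
        return sse(p, times, data)

    starts = [
        [rng.uniform(math.log(lo), math.log(hi)) for lo, hi in bounds]
        for _ in range(n_starts)
    ]
    starts.sort(key=cost)
    results = [pattern_search(s, cost) for s in starts[:n_refine]]
    best, best_cost = min(results, key=lambda r: r[1])
    return [math.exp(p) for p in best], best_cost


# Synthetic "measurements" generated from assumed true values plus noise.
TRUE_AMPLITUDE, TRUE_RATE = 2.0, 0.5
noise = random.Random(1)
times = [0.5 * i for i in range(21)]
data = [model(t, TRUE_AMPLITUDE, TRUE_RATE) + noise.gauss(0.0, 0.02) for t in times]

(amplitude_hat, rate_hat), residual = fit(
    times, data, bounds=[(0.01, 100.0), (0.001, 10.0)]
)
print("estimated amplitude:", amplitude_hat)
print("estimated rate:", rate_hat)
```

Searching in log space reflects the fact that kinetic parameters are often uncertain over several orders of magnitude. With realistic pathway models the number of parameters is far larger, which is why dedicated global optimizers are needed.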
Existing techniques such as identification, gain quantification, sensitivity analysis, and optimal control are well-developed areas of control theory commonly applied in engineering. For engineering systems, even though the models are nonlinear, the operating regions are generally known. This allows the system to be linearized and analyzed using linear methods. Because of the lack of well-characterized operating points in biological systems, nonlinear analysis methods must be used, and tools must be developed that can conveniently facilitate the exploration of model behavior across parameter space.
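The linearization step used for engineering systems can be sketched as follows: compute the Jacobian of the model at a known operating point and inspect its eigenvalues. The two-tier activation cascade, its rate constants, and the function names are hypothetical and chosen only so that the operating point is available in closed form.

```python
import cmath


def cascade_rhs(state, k1=1.0, k2=1.0, k3=2.0, k4=0.5, stimulus=1.0):
    """Hypothetical cascade: x is activated by the stimulus, y is activated by x."""
    x, y = state
    return (
        k1 * stimulus * (1.0 - x) - k2 * x,
        k3 * x * (1.0 - y) - k4 * y,
    )


def steady_state(k1=1.0, k2=1.0, k3=2.0, k4=0.5, stimulus=1.0):
    """Closed-form operating point of the cascade for a constant stimulus."""
    x = k1 * stimulus / (k1 * stimulus + k2)
    y = k3 * x / (k3 * x + k4)
    return (x, y)


def numerical_jacobian(rhs, point, eps=1e-6):
    """Central-difference Jacobian of rhs evaluated at the given point."""
    n = len(point)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, down = list(point), list(point)
        up[j] += eps
        down[j] -= eps
        f_up, f_down = rhs(up), rhs(down)
        for i in range(n):
            jac[i][j] = (f_up[i] - f_down[i]) / (2.0 * eps)
    return jac


def eigenvalues_2x2(jac):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    trace = jac[0][0] + jac[1][1]
    det = jac[0][0] * jac[1][1] - jac[0][1] * jac[1][0]
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return ((trace + disc) / 2.0, (trace - disc) / 2.0)


operating_point = steady_state()
jacobian = numerical_jacobian(cascade_rhs, operating_point)
eigenvalues = eigenvalues_2x2(jacobian)
is_stable = all(ev.real < 0.0 for ev in eigenvalues)

print("operating point:", operating_point)
print("eigenvalues:", eigenvalues)
print("locally stable:", is_stable)
```

The result is valid only near the chosen operating point. When operating points are poorly characterized, as in most biological systems, the same calculation would have to be repeated across parameter space, which is the kind of exploration that nonlinear analysis tools are meant to support.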
Sensitivity analysis is used as a bootstrap method during model tuning to determine which parameters should be estimated from experimental data. Given that these represent the most sensitive nodes of the network, the same analysis helps to identify the best drug targets in the system [13, 14]. Because of the broad range of its parameter values, the nonlinear system can display a variety of stable and unstable behaviors. Often bifurcation analysis is used to reveal regions of stability or instability in the network [15]. This is helpful because experiments can be specifically designed to observe interesting nonlinear phenomena predicted by the model. For instance, Hoffmann et al. [16] have developed a computational model that accounts for the temporal control of NF-κB activation by the coordinated degradation and synthesis of IκB proteins. The activity of NF-κB is controlled by three different isoforms of IκB: α, β, and ε. The model predicted that IκBα is responsible for the fast turn-off, and IκBβ and IκBε function to reduce the system's oscillatory potential and stabilize the NF-κB response. These model predictions were experimentally confirmed, and Hoffmann et al. showed that gene expression specificity is achieved by the signal characteristics (i.e., persistent vs. transient NF-κB signals). This example shows how bifurcation analysis serves both to validate the model and to facilitate learning about unexpected system behavior.
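A local sensitivity analysis of the kind described at the start of this paragraph can be sketched as follows. The example computes normalized sensitivities, d ln(output) / d ln(parameter), by central finite differences and ranks the parameters by magnitude. The toy two-tier cascade, its rate constants, the choice of output, and the function names are hypothetical assumptions made for illustration.

```python
import math


def cascade_rhs(state, p, stimulus=1.0):
    """Hypothetical cascade: x is activated by the stimulus, y is activated by x."""
    x, y = state
    return (
        p["k1"] * stimulus * (1.0 - x) - p["k2"] * x,
        p["k3"] * x * (1.0 - y) - p["k4"] * y,
    )


def rk4_step(rhs, state, p, h):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    s1 = rhs(state, p)
    s2 = rhs(shifted(state, s1, h / 2.0), p)
    s3 = rhs(shifted(state, s2, h / 2.0), p)
    s4 = rhs(shifted(state, s3, h), p)
    return tuple(
        v + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for v, a, b, c, d in zip(state, s1, s2, s3, s4)
    )


def simulate_output(p, t_end=20.0, h=0.01):
    """Model output: level of active y at t_end, starting from an inactive state."""
    state = (0.0, 0.0)
    for _ in range(round(t_end / h)):
        state = rk4_step(cascade_rhs, state, p, h)
    return state[1]


def normalized_sensitivities(p, rel_step=1e-3):
    """Central-difference estimate of d ln(output) / d ln(parameter)."""
    d_ln_param = math.log(1.0 + rel_step) - math.log(1.0 - rel_step)
    result = {}
    for name, value in p.items():
        up = dict(p, **{name: value * (1.0 + rel_step)})
        down = dict(p, **{name: value * (1.0 - rel_step)})
        d_ln_output = math.log(simulate_output(up)) - math.log(simulate_output(down))
        result[name] = d_ln_output / d_ln_param
    return result


# Hypothetical nominal parameter values.
nominal = {"k1": 1.0, "k2": 1.0, "k3": 2.0, "k4": 0.5}
sensitivities = normalized_sensitivities(nominal)
ranking = sorted(sensitivities, key=lambda n: abs(sensitivities[n]), reverse=True)

for name in ranking:
    print(name, round(sensitivities[name], 4))
```

Parameters at the top of such a ranking are the ones worth estimating carefully from data, and in a pathway model they point to the reactions whose perturbation changes the output most. Because the analysis is local, the ranking can change at other parameter values.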
It is foreseeable not only that engineering principles can be applied to biology, but also that entirely new theoretical problems will arise when questions in the field of systems biology are addressed with traditional control theory [17]. In the long run, MBD as it is applied to biological systems may advance the use of MBD in other industries, and vice versa.
The graphical language used to describe the automatic transmission model example (Figure 1) consists of signals flowing through the lines in the diagram between the various system components, represented by blocks. The mathematical operations represented by the blocks and the data carried by the signals together represent an unambiguous mathematical model of the system (such models typically result in large systems of nonlinear ordinary differential equations). In addition, the models are typically hierarchical. For instance, in Figure 1 the block representing the state machine for the shifting logic itself encapsulates lines and blocks that perform its function. Thus, the modeler can quickly build models by graphically adding blocks that contain underlying mathematical models. This speeds up the model-building process and makes it less error-prone. The semantics of the graphical language are well defined and widely used in control engineering. Other engineering domains (e.g., mechanical, electrical, and hydraulic) also have well-defined graphical notation that is useful in specifying models and communicating model architecture. An important aspect of these semantics is that they are capable of fully and unambiguously describing the mathematical model underlying the diagram.
Biologists use less standardized symbols to describe biological pathways—i.e., lines are used to represent interactions between species, and blocks are used to represent species and their concentration. Combinatorial complexity arising from protein complex formation is not accounted for in traditional graphical representations. Conventions are beginning to appear (e.g., those published by Kohn [18], Maimon and Browning [19], and Kitano et al. [20]) that attempt to uniquely determine the interaction type and therefore possibly the underlying mathematics. However, these notations are still far from the standardized representation used in engineering disciplines. Current graphical representations of pathways help interdisciplinary teams to communicate but do not provide an executable model that can be used for system simulation.
The granularity and size of protein and gene networks, in addition to the fact that they are reverse-engineered from large datasets, call for automatic graphical layout. Similarly, such models are often composed of hundreds to thousands of species, making manual graphical construction and representation of the model almost impossible.
About ten years ago, the first mathematical models of signal transduction were published. These began with subsystems in order to obtain a quantitative understanding of processes such as receptor trafficking [21], the signal transfer behavior of the MAPK cascade [22], or larger models of signaling pathways [13, 23, 24]. The success of model construction and parameterization was demonstrated by their experimental validation and created interest in the drug discovery industry.
With the sequencing of the human genome and an increased knowledge of the individual proteins that make up cellular pathways, the pharmaceutical industry has focused increasingly on individual molecules as targets in their quest for targeted therapeutics. However, this approach has not yet resulted in the delivery of the first systems biology drug, nor has it increased the number of drug approvals or reduced the toxicity of drug candidates. It is becoming evident that putative protein targets for drugs must be understood in the greater biological context in which they are active—molecular interactions, biochemical pathways, cellular compartments, tissues, and organs. To do this effectively, scientists must embrace the notion that biology is increasingly an engineering-based science and should take advantage of the proven tools developed in other engineering-based disciplines. The application of computational and engineering tools to life sciences research and drug discovery is beginning to provide a detailed quantitative understanding of the interactions occurring within biological systems; it promises to lead to the development of novel, targeted drugs offering improved efficacy and safety.
We thank Tad Stewart at Merrimack Pharmaceuticals for his critique of this paper.
Received November 7, 2005; accepted for publication January 13, 2006; published online September 15, 2006.