IBM Research


Systems Management Group
Our Projects

Dependencies in Distributed Service Management
The identification of dependencies between distributed services and their components has become increasingly important in today's networked environments because applications and services rely on a variety of supporting components that might be outsourced to a service provider. Failures occurring in lower service layers affect the performance and availability of end-to-end services that are offered to customers.
From an operational point of view, performance degradations and failures need to be detected and resolved quickly and efficiently. Solving this problem requires determining and computing the dependencies between services and applications. One therefore has to answer questions such as: What are the important characteristics of dependencies? In other words, when a managed entity X, such as a service or resource, depends on another managed entity Y (X is then termed the dependent and Y the antecedent), which properties of that dependency need to be recorded? How can dependencies be classified so that they can be used more efficiently for root cause or impact analysis in fault management?
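A dependency record that captures the dependent and antecedent roles could be sketched as follows. This is a minimal Python sketch, not the project's model: the `Strength` classification, the `attributes` field, and the helper `mandatory_antecedents` are hypothetical illustrations of the kind of properties and classifications the questions above ask about.

```python
from dataclasses import dataclass
from enum import Enum


class Strength(Enum):
    # Hypothetical classification, not taken from the project.
    MANDATORY = "mandatory"  # dependent fails when the antecedent fails
    OPTIONAL = "optional"    # dependent degrades but keeps working


@dataclass(frozen=True)
class Dependency:
    """Managed entity `dependent` (X) depends on `antecedent` (Y)."""
    dependent: str
    antecedent: str
    strength: Strength = Strength.MANDATORY
    # Hypothetical extra properties, e.g. (("protocol", "JDBC"),).
    attributes: tuple = ()


def mandatory_antecedents(deps, entity):
    """Antecedents whose failure directly breaks `entity`, sorted by name."""
    return sorted(
        d.antecedent
        for d in deps
        if d.dependent == entity and d.strength is Strength.MANDATORY
    )
```

Recording a classification such as `Strength` alongside each edge lets a fault management tool skip optional antecedents when it searches for the cause of a hard failure.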
The notion of dependencies can be applied at various levels of granularity. For example, threads within a running application may depend on each other's output, and a stored procedure within a database management system may depend on a lock administrator. The project does not consider such situations, because application management (focusing on application behavior observable from "outside") differs fundamentally from application debugging (focusing on the internal behavior of an application). We consider only dependencies of the former type, i.e., those that exist between different managed objects or components and are therefore visible from outside an application.
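The distinction between externally visible and internal dependencies can be expressed as a simple filter. In this sketch, each entity carries a `component` label naming the managed component that hosts it; that field, the `Entity` type, and `externally_visible` are assumptions made for illustration. Dependencies whose two ends lie in the same component are treated as debugging concerns and dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    # Hypothetical: the managed component (application, DBMS, server) hosting this entity.
    component: str


def externally_visible(deps):
    """Keep only (dependent, antecedent) pairs that cross a component boundary.

    Pairs inside one component, such as two threads of the same application,
    belong to application debugging and are discarded.
    """
    return [(x, y) for x, y in deps if x.component != y.component]
```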

Requirements 
Both service providers and customers require management tools that let them navigate the dependency hierarchy in order to analyze and track down the root cause of a service failure. In addition, service providers need tools that determine in advance the impact of a service outage on other services and users, so that they can schedule server maintenance intervals (e.g., deploying backup systems when a production server has to be brought down for a software upgrade).

  • Root-cause Analysis
    • Determine a particular customer’s service status
    • Today: services are tested manually by an operator; service dependencies are not explicitly specified
    • Needed: “drill-down”, (automated) traversal of layers to find root cause 
  • Impact Analysis
    • Determine which services/customers are affected by a problem
    • Today: proactive warnings for planned outages; no support for accidental outages
    • Needed: “drill-up”, determine potentially affected services/customers
  • Prerequisites
    • Make service dependencies explicit and available
    • Dependency model must allow upward and downward traversal 
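The "drill-down" and "drill-up" requirements both follow from storing each dependency in both directions. The minimal sketch below assumes an in-memory graph keyed by entity name (`DependencyGraph`, `drill_down`, and `drill_up` are hypothetical names) and uses breadth-first search: downward traversal collects candidate root causes, upward traversal collects potentially affected services.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Stores every edge in both directions so the model can be traversed
    downward (toward antecedents) and upward (toward dependents)."""

    def __init__(self):
        self._antecedents = defaultdict(set)  # dependent -> what it relies on
        self._dependents = defaultdict(set)   # antecedent -> what relies on it

    def add(self, dependent, antecedent):
        self._antecedents[dependent].add(antecedent)
        self._dependents[antecedent].add(dependent)

    @staticmethod
    def _reachable(start, edges):
        """Breadth-first search over `edges`; `start` itself is excluded."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)  # guards against cycles leading back to start
        return seen

    def drill_down(self, service):
        """Candidate root causes: everything `service` depends on, transitively."""
        return self._reachable(service, self._antecedents)

    def drill_up(self, resource):
        """Impact set: everything that depends on `resource`, transitively."""
        return self._reachable(resource, self._dependents)
```

For example, after `g.add("customer-portal", "web-server")`, `g.add("web-server", "db-server")`, and `g.add("billing", "db-server")`, calling `g.drill_up("db-server")` returns all three other entities, which is the set a provider would warn before taking `db-server` down for maintenance.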
Challenges 
The following issues complicate the discovery and representation of dependencies between service and application components:
  • Mapping between service, implementation, and runtime process is essential:
    • obtaining the three parts individually is tractable
    • combining the three parts into a uniform model is challenging
    • finding an efficiently computable dependency model is very hard because of:
      • Scalability (# of nodes representing systems, applications and services)
      • Dynamics (maintaining an up-to-date model that reflects the current state in the lifecycle)
      • Heterogeneity (operating systems, middleware, application types, communication infrastructures)
  • Today’s applications are not “wired for management”:
    • Incomplete and/or proprietary access to management information
    • Important software components are registered in a repository
    • Eliminate the need for application-specific instrumentation
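Combining the three separately obtained views into one model might look like the following sketch. It assumes two input mappings (service to implementing software components, and component to `(host, pid)` runtime processes); these shapes and the function `build_uniform_model` are hypothetical. Components with no runtime information are reported rather than silently dropped, reflecting the incomplete access to management information noted above.

```python
def build_uniform_model(service_impl, impl_processes):
    """Join two independently obtained mappings into one edge list spanning
    the service, implementation, and runtime-process layers.

    service_impl:   service -> list of implementing software components
    impl_processes: component -> list of (host, pid) runtime processes
    Both input shapes are assumptions made for illustration.
    Returns (sorted edges, sorted components lacking runtime information).
    """
    edges, unmapped = set(), set()
    for service, components in service_impl.items():
        for comp in components:
            edges.add((("service", service), ("component", comp)))
            processes = impl_processes.get(comp)
            if not processes:
                # No runtime view available for this component; record the gap.
                unmapped.add(comp)
                continue
            for host, pid in processes:
                # A set avoids duplicate edges when components are shared by services.
                edges.add((("component", comp), ("process", f"{host}:{pid}")))
    return sorted(edges), sorted(unmapped)
```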
Goals
The project investigates the complexities of managing distributed, networked applications. Its goal is to develop technologies and associated algorithms that enable management applications to perform end-to-end problem determination and isolation, performance monitoring, and fault management.
To achieve this, several new concepts and technologies are required. These include:
  • a model of distributed applications and services that ensures their manageability, based on the Common Information Model (CIM) standardized by the Distributed Management Task Force (DMTF).
  • a mechanism to model and express the functional dependencies and interrelationships between applications, networked services and resources, and business processes. 
  • a model reflecting the dynamic dependency relationships between different services; in addition, a management system should be able to provide various mechanisms for selecting parts of a dependency model according to user-defined criteria.
  • an architecture for retrieving and handling dependency information stemming from various managed resources in a web-based environment, which can then be used by fault management applications.
  • the ability to remotely monitor and alter the behavior of applications. To apply functionality common to different applications (start, stop, restart, suspend, resume, initialize, correlate parameters with time, etc.), it is desirable to define generic application management services. Successful deployment of these services requires an object-oriented, component-based infrastructure.
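Generic application management services could share one lifecycle model across applications. The sketch below assumes a simple state machine; the states, the transition table, and the `ManagedApplication` class are illustrative assumptions rather than a defined interface, and only the operation names (initialize, start, suspend, resume, stop, restart) come from the list above.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    INITIALIZED = "initialized"
    RUNNING = "running"
    SUSPENDED = "suspended"


# Hypothetical transition table: operation -> (allowed source states, target state).
TRANSITIONS = {
    "initialize": ({State.STOPPED}, State.INITIALIZED),
    "start": ({State.INITIALIZED}, State.RUNNING),
    "suspend": ({State.RUNNING}, State.SUSPENDED),
    "resume": ({State.SUSPENDED}, State.RUNNING),
    "stop": ({State.INITIALIZED, State.RUNNING, State.SUSPENDED}, State.STOPPED),
}


class ManagedApplication:
    """Generic management wrapper; application-specific behavior would be
    supplied by subclasses (not shown)."""

    def __init__(self, name):
        self.name = name
        self.state = State.STOPPED
        self.history = []  # simple log of (operation, resulting state)

    def invoke(self, operation):
        allowed, target = TRANSITIONS[operation]
        if self.state not in allowed:
            raise ValueError(f"{operation!r} not allowed in state {self.state.value}")
        self.state = target
        self.history.append((operation, target))
        return target

    def restart(self):
        """Restart composed from the generic stop, initialize, and start operations."""
        if self.state is not State.STOPPED:
            self.invoke("stop")
        self.invoke("initialize")
        return self.invoke("start")
```

Because the transition table is data rather than code, the same wrapper can be reused for any application whose lifecycle fits these states.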
