

SYSTEMS MANAGEMENT Group
Our Projects
Dependencies in Distributed Service Management
The identification of dependencies between distributed services and
their components has become increasingly important in today's
networked environments because applications and services rely on a
variety of supporting components that might be outsourced to a service
provider. Failures occurring in lower service layers affect the
performance and availability of end-to-end services that are offered to
customers.
From an operational point of view, performance degradations and
failures need to be detected and resolved quickly and efficiently.
Solving this problem requires the determination and computation of
dependencies between services and applications. One, therefore, has to
deal with questions such as: what are the important characteristics of
dependencies? In other words, when a managed entity X, such as a
service or resource, depends on another managed entity Y (X is then
termed dependent and Y antecedent), what are the
properties of such a dependency that need to be recorded? How can we
classify dependencies such that they can be used more efficiently to
do root cause or impact analysis in fault management?
The notion of dependencies can be applied at various levels of
granularity: For example, threads within a running application may be
dependent on each other's operational output; a stored procedure
within a database management system may be dependent on a lock
administrator, etc. The project does not consider such situations
because application management (focusing on application behavior
observable from "outside") differs fundamentally from application
debugging (focusing on the internal behavior of an application). We
consider only dependencies of the former type, i.e., those that exist
between different managed objects or components and, hence, are
visible from outside an application.
Requirements
Both service providers and customers require management tools that allow
navigation through the dependency hierarchy, in order to analyse and track
down the root cause of a service failure. In addition, service providers
are interested in tools to determine in advance the impact of a service
outage on other services and users for scheduling server maintenance intervals
(e.g., deploying backup systems when a production server has to be brought
down for performing a software upgrade).
- Root-cause Analysis
  - Determine a particular customer's service status
  - Today: manual test of services by operator; service dependencies not
    explicitly specified
  - Needed: “drill-down”, (automated) traversal of layers to find root cause
- Impact Analysis
  - Determine which services/customers are affected by a problem
  - Today: planned outages: proactive warnings; accidental outages: N/A
  - Needed: “drill-up”, determine potentially affected services/customers
- Prerequisites
  - Make service dependencies explicit and available
  - Dependency model must allow upward and downward traversal
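The two traversal directions can be illustrated with a minimal Python sketch. It is not the project's dependency model. It assumes a plain directed graph in which every edge runs from a dependent to its antecedent, and the class name `DependencyGraph`, its methods, and the sample services are all hypothetical.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Hypothetical model: each edge runs from a dependent to its antecedent."""

    def __init__(self):
        self._antecedents = defaultdict(set)  # dependent -> entities it relies on
        self._dependents = defaultdict(set)   # antecedent -> entities relying on it

    def add_dependency(self, dependent, antecedent):
        # Store the edge in both directions so either traversal is cheap.
        self._antecedents[dependent].add(antecedent)
        self._dependents[antecedent].add(dependent)

    @staticmethod
    def _reachable(start, edges):
        # Breadth-first search; the start entity itself is not reported.
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in edges.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return seen - {start}

    def drill_down(self, entity):
        """Root-cause candidates: everything the entity depends on."""
        return self._reachable(entity, self._antecedents)

    def drill_up(self, entity):
        """Impact set: everything that directly or indirectly depends on the entity."""
        return self._reachable(entity, self._dependents)


# Sample data, invented for illustration only.
graph = DependencyGraph()
graph.add_dependency("web shop", "web server")
graph.add_dependency("web shop", "database")
graph.add_dependency("web server", "host A")
graph.add_dependency("database", "host A")
graph.add_dependency("mail service", "host B")

print(sorted(graph.drill_down("web shop")))  # ['database', 'host A', 'web server']
print(sorted(graph.drill_up("host A")))      # ['database', 'web server', 'web shop']
```

Keeping both adjacency maps is one way to satisfy the prerequisite that the model allow upward and downward traversal. A real model would also need to address the scalability and dynamics concerns that a static in-memory graph ignores.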
Challenges
The following issues complicate the discovery and representation of dependencies between service and application components:
- Mapping between Service, Implementation and Runtime Process is essential:
  - obtaining the three parts individually is tractable
  - combining the three parts into a uniform model is challenging
  - finding an efficiently computable dependency model is very hard because
    of:
    - Scalability (# of nodes representing systems, applications and services)
    - Dynamics (maintaining up-to-date model reflecting the correct state in
      the lifecycle)
    - Heterogeneity (operating systems, middleware, application types,
      communication infrastructures)
- Today’s Applications are not “wired for Management”:
  - Incomplete and / or proprietary access to management information
  - Important Software Components are registered in Repository
  - Eliminate need for application-specific instrumentation
Goals
The project investigates the complexities of managing distributed, networked
applications. Its goals are to develop technologies and associated algorithms
that enable management applications to perform end-to-end problem
determination and isolation, performance monitoring, and fault management.
To achieve this, several new concepts and technologies are required.
These include:
- a model of distributed applications and services for ensuring their
  manageability based on the Common Information Model (CIM), standardized by
  the Distributed Management Task Force (DMTF).
- a mechanism to model and express the functional dependencies and
  interrelationships between applications, networked services and resources,
  and business processes.
- a model reflecting the dynamic dependency relationships between different
  services; in addition, a management system should be capable of providing
  various mechanisms to select parts of a dependency model according to
  user-defined criteria.
- an architecture for retrieving and handling dependency information stemming
  from various managed resources in a web-based environment, which can then
  be used by fault management applications.
- the ability to remotely monitor and alter the behavior of applications.
  In order to apply functionality common to different applications (start,
  stop, restart, suspend, resume, initialize, correlate parameters with time,
  etc.), it is desirable to define generic application management services.
  Successful deployment of these services requires an object-oriented,
  component-based infrastructure.
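As a rough illustration of what a generic application management service could look like in an object-oriented setting, the following Python sketch defines one interface for the lifecycle operations named above. The class names, the state strings, and the `restart_all` helper are hypothetical and not taken from the project. The "correlate parameters with time" function is left out.

```python
from abc import ABC, abstractmethod


class ApplicationManagementService(ABC):
    """Hypothetical interface for operations common to different applications."""

    @abstractmethod
    def initialize(self): ...

    @abstractmethod
    def start(self): ...

    @abstractmethod
    def stop(self): ...

    @abstractmethod
    def suspend(self): ...

    @abstractmethod
    def resume(self): ...

    def restart(self):
        # Generic default: a restart is a stop followed by a start.
        self.stop()
        self.start()


class InMemoryApplication(ApplicationManagementService):
    """Stand-in application that only records its lifecycle states."""

    def __init__(self, name):
        self.name = name
        self.state = "uninitialized"
        self.history = []

    def _set(self, state):
        self.state = state
        self.history.append(state)

    def initialize(self):
        self._set("initialized")

    def start(self):
        self._set("running")

    def stop(self):
        self._set("stopped")

    def suspend(self):
        self._set("suspended")

    def resume(self):
        self._set("running")


def restart_all(applications):
    """A management tool drives every application through the same interface."""
    for app in applications:
        app.restart()


# Invented sample applications.
apps = [InMemoryApplication("billing"), InMemoryApplication("mail")]
for app in apps:
    app.initialize()
    app.start()
restart_all(apps)
print(apps[0].history)  # ['initialized', 'running', 'stopped', 'running']
```

Because `restart_all` depends only on the abstract interface, the same management code can operate on any application that implements it, which is the point of defining the services generically.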