Adaptive Systems Department

Our Mission

The adaptive systems department within the IBM T.J. Watson Research Center develops technologies and methodologies for managing change in computing systems, an area that is critical to IBM's efforts in on-demand computing. The scope is broad. Examples of changes include: subscriber overloads, software memory leaks, application deployment, and system re-purposing.

We have developed a considerable array of technologies to address these concerns. Our event mining work addresses, among other things, the discovery of actionable patterns as well as profiling to determine what constitutes normal behavior. Our generic adaptive control project has developed an array of technologies (with an emphasis on formal techniques from control theory and statistical modeling) that provide for both regulation and optimization of computing systems (e.g., the Lotus Notes email server, the Apache web server, database management systems, and multi-tiered eCommerce systems). More recently, we have been applying planning and scheduling techniques to the problem of deploying and installing multi-product applications in a cost-effective way (e.g., minimizing service disruptions). Implied within these efforts is the need for supporting data and information models, such as CIM models that describe the computing infrastructure in sufficient detail.


Topics

Below is a list of activities that we are currently pursuing:

 

Generic Adaptive Control. Achieving good performance in distributed environments requires adapting to changes in hardware, software, and workloads. The vision of this project is to develop a generic, adaptive agent for automated tuning of complex applications. Our technical approach exploits techniques from control theory, applying them in real-world settings so as to assess the value provided. Our results have been incorporated into IBM’s database product (DB2), and work is underway with other products as well.
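As a rough illustration of the control-theoretic approach, the sketch below regulates a simulated system with an integral controller. The first-order model, its coefficients a and b, the gain ki, and the function name are all assumptions made for this example; they are not taken from the project's controllers or from any IBM product.

```python
def simulate_integral_control(reference, steps=200, a=0.8, b=0.5, ki=0.1):
    """Simulate y(k+1) = a*y(k) + b*u(k) under integral control.

    All coefficients are hypothetical. The controller accumulates the
    tracking error: u(k+1) = u(k) + ki * (reference - y(k)).
    """
    y, u = 0.0, 0.0          # measured output and control input
    outputs = []
    for _ in range(steps):
        error = reference - y
        y_next = a * y + b * u   # system responds to the current input
        u = u + ki * error       # integral action removes steady-state error
        y = y_next
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    trace = simulate_integral_control(reference=0.6)
    print(f"final output: {trace[-1]:.4f}")
```

With these assumed values the closed-loop poles have magnitude of about 0.92, so the output settles at the reference; a larger gain would respond faster at the risk of oscillation or instability.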

Change Management With Planning and Scheduling (CHAMPS). The large cost of owning computing systems is in part due to the difficulty of configuring them, a task that includes installing hardware and software, setting related parameters (e.g., in registries), and choosing compatible components.  The CHAMPS project explores the extent to which these tasks can be automated by dynamically creating workflows that control the sequence in which activities are performed. There are two sub-problems. The first is inferring the set of actions that are available based on the nature of the artifacts present along with determining constraints on the order in which actions can be taken. The second sub-problem is to bind actions to specific resources and schedule their execution in a way that optimizes business objectives (e.g., maximize profits, minimize downtimes, minimize the elapsed time).
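The two sub-problems can be sketched in a few lines, under strong simplifying assumptions. The task names, durations, and the greedy assignment rule below are hypothetical and stand in for whatever inference and optimization CHAMPS actually uses: precedence constraints are given directly rather than inferred, and a simple list-scheduling heuristic replaces a real business-objective optimizer.

```python
from graphlib import TopologicalSorter


def plan_workflow(durations, depends_on, n_workers=2):
    """Return {task: (start, end, worker)} honoring precedence constraints.

    durations:  {task: time units}             (hypothetical inputs)
    depends_on: {task: set of prerequisites}
    """
    graph = {task: set(depends_on.get(task, ())) for task in durations}
    order = TopologicalSorter(graph).static_order()  # prerequisites first

    free_at = [0] * n_workers   # time at which each worker becomes idle
    schedule = {}
    for task in order:
        # A task may start only after all of its prerequisites have ended.
        ready = max((schedule[d][1] for d in graph[task]), default=0)
        worker = min(range(n_workers), key=lambda i: free_at[i])
        start = max(ready, free_at[worker])
        end = start + durations[task]
        schedule[task] = (start, end, worker)
        free_at[worker] = end
    return schedule


if __name__ == "__main__":
    durations = {"install_db": 3, "install_app_server": 2,
                 "configure_app": 1, "deploy_app": 2}
    depends_on = {"configure_app": {"install_db", "install_app_server"},
                  "deploy_app": {"configure_app"}}
    for task, slot in plan_workflow(durations, depends_on).items():
        print(task, slot)
```

The greedy rule only approximates the "minimize the elapsed time" objective; objectives such as profit or downtime would need an explicit cost model.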

Event Mining. We are exploring the application of data mining techniques to availability and performance management. Our efforts employ pattern recognition (e.g., for periodic patterns) and classification algorithms to identify actionable situations. We have developed a tool for browsing and analyzing event data that includes techniques that aid in pattern recognition. We work closely with IBM installations to obtain data and to assess our techniques.

Automated Problem Determination. This effort seeks to isolate complex problems in a generic way. Included here are techniques for characterizing normal behavior, actively probing to identify and isolate problems, and analyzing expected behavior. One aspect of this project makes use of Bayesian Networks and Information Theory to gain insight into where to place probes in complex systems.
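One way to picture information-theoretic probe placement is a greedy selection by information gain. The sketch below is a simplified stand-in, not the project's Bayesian network method: it assumes exactly one faulty component, a uniform prior over components, and deterministic probes that fail exactly when their path includes the faulty component. All names are hypothetical.

```python
import math


def partition_entropy(groups, total):
    """Entropy (bits) of the outcome partition under a uniform single-fault prior."""
    return -sum((len(g) / total) * math.log2(len(g) / total) for g in groups)


def refine(groups, covered):
    """Split each group by whether the probe's path covers the component."""
    return [part for g in groups for part in (g & covered, g - covered) if part]


def choose_probes(components, probes, budget):
    """Greedily pick up to `budget` probes that best narrow down the fault."""
    groups = [set(components)]   # components not yet distinguishable
    total = len(groups[0])
    chosen = []
    for _ in range(budget):
        current = partition_entropy(groups, total)
        best_name, best_gain = None, 0.0
        for name, covered in probes.items():
            if name in chosen:
                continue
            gain = partition_entropy(refine(groups, set(covered)), total) - current
            if gain > best_gain + 1e-12:
                best_name, best_gain = name, gain
        if best_name is None:    # no remaining probe adds information
            break
        chosen.append(best_name)
        groups = refine(groups, set(probes[best_name]))
    return chosen


if __name__ == "__main__":
    probes = {"p_all": {"A", "B", "C", "D"}, "p1": {"A", "B"}, "p2": {"A", "C"}}
    print(choose_probes(["A", "B", "C", "D"], probes, budget=3))
```

A probe that traverses every component is never selected here because its outcome is the same wherever the fault lies, so it carries no information about the fault's location.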


People

  • Naga Avachitula
  • Alina Beygelzimer
  • Mark Brodie
  • Aaron Brown
  • Melissa Buco
  • Asit Dan
  • Yixin Diao
  • Bob Filepp
  • Steve Froehlich
  • Shang Guo
  • Joseph L Hellerstein (dept. mgr)
  • Bob Kearney
  • Alexander Keller (proj. lead)
  • Vijaya Krishnan
  • Laura Luan
  • Heiko Ludwig
  • Sujay Parekh
  • Chang-Shing (Charles) Perng
  • Kavitha Ranganathan
  • Irina Rish
  • Daniela Rosu
  • Maheswaran Surendra (mgr)
  • Peppo Valetto
  • Laura Shwartz
  • Chris Ward

Publications Since 1996

  1. ``A Flexible and Scalable Approach to Navigating Measurement Data in Performance Management Applications," Robert F. Berry and Joseph L. Hellerstein, Second International Conference on Systems Management, June 19-21, 1996.
  2. ``An Approach To Selecting Metrics for Detecting Performance Problems in Information Systems," Joseph L. Hellerstein, Second International Conference on Systems Management, June 19-21, 1996.
  3. ``Rules of Thumb for Selecting Detection Metrics,'' Proceedings of Computer Measurement Group, December, 1996.
  4. ``Automated Performance Tuning: Possibilities and Realities," Joseph L. Hellerstein, Paper and invited talk at the Computer Measurement Group, Orlando, Florida, December, 1997.
  5. ``Using Multidimensional Databases for Problem Determination and Planning of a Networked Application,'' Third International Conference on Systems Management, April 22-24, 1998.
  6. ``An Introduction to Change-Point Detection,'' Joseph L. Hellerstein, Proceedings of ACM Sigmetrics, June, 1998.
  7. ``Applications Management--Current Practices, Research Results and Future Directions," Paul Brusil, Joseph Hellerstein, and Hanan Lutfiyya, Journal of  Network and Systems Management, Vol. 6, No. 3., 1998.
  8. ``Characterizing Normal Operation of a Web Server: Application to Workload Forecasting and Capacity Planning," Joseph L. Hellerstein, Fan Zhang, and Perwez Shahabuddin, Computer Measurement Group, December, 1998.
  9. ``An Approach to Predictive Detection for Service Management," Joseph L. Hellerstein, Fan Zhang, and Perwez Shahabuddin. Symposium on Integrated Network Management, 1999.
  10. ``ETE: A Customizable Approach to Measuring End-to-End Response Times and Their Components in Distributed Systems," Joseph L. Hellerstein, Mark Maccabee, W. Nathaniel Mills, and John J. Turek. International Conference on Distributed Computing Systems, 1999.
  11. ``Predictive Models for Proactive Network Management: Application to a Production Web Server," Dongxu Sheng and Joseph L. Hellerstein, Network Operations and Management, 2000.
  12. ``EventBrowser: A Flexible Tool for Scalable Analysis of Event Data," Sheng Ma and Joseph L. Hellerstein, Distributed Operations and Management, 1999.
  13. ``Automated Drill Down: An Approach to Automated Problem Determination for Performance Management," David Hart, Joseph Hellerstein, and Po Yue, Proceedings of the Conference of the Computer Measurement Group, December, 1999.
  14. "Modeling Heterogeneous Network traffic wavelet domain: Part II-- non-Gaussian traffic," Sheng Ma and Chuanyi Ji,  IEEE networking, 1999.
  15.  "Modeling Network Traffic in Wavelet Domain", Sheng Ma and Chuanyi Ji, International Journal on Chaos Theory and  Applications, 1999.
  16. ``Independent Wavelet Models: Unified Models for Heterogeneous Network Traffic,'' Chuanyi Ji, Xusheng Tian, and Sheng Ma, March, INFOCOM'99.
  17. "General Re-weighting methods for combining random weak perceptrons," Sheng Ma and Chuanyi Ji, Workshop on learning, Snowbird, Utah, April 1999.
  18. "Performance and Efficiency: Recent Advances in Supervised Learning", Sheng Ma and Chuanyi Ji, Proceedings of the IEEE, 1999.
  19. ``Ordering Categorical Data to Improve Visualization," Sheng Ma and Joseph L. Hellerstein, IEEE Symposium on Information Visualization, 1999.
  20. "A Statistical Approach to Predictive Detection," Joseph L. Hellerstein, Fan Zhang and Perwez Shahabuddin, Computer Networks, January, 2000.
  21. "Toward Applying Machine Learning to Design Rule Acquisition for Automated Graphics Generation," Michelle X. Zhou and Sheng Ma, AAAI Symposium on Smart Graphics, 2000.
  22. "AutoTune: A Generic Agent for Automated Performance Tuning," JP Bigus, JL Hellerstein, TS Jayram, and MS Squillante, Practical Application of Intelligent Agents and Multi Agent Technology, 2000.
  23. "Recognizing End-User Transactions in Performance Management," JL Hellerstein, TS Jayram, I Rish, American Association of Artificial Intelligence, 2000.
  24. "Mining Partially Periodic Event Patterns With Unknown Periods," S Ma and JL Hellerstein,  International Conference on Data Engineering, 2000. (Also in Pattern Recognition and String Matching, edited by Dechang Chen and Xiuzhen Cheng, to be published by Kluwer.)
  25. "Scalable Visualization of Event Data," David Taylor, Nagui Halim, Joseph L Hellerstein, and Sheng Ma, Workshop on Distributed Systems Operations and Management (DSOM), Austin, Texas, December, 2000.
  26. "An Approach to On-Line Predictive Detection," Fan Zhang and Joseph L. Hellerstein,  MASCOTS, 2000.
  27. "Metrics for Performance Tuning of Web-Based Applications," W. Nathaniel  Mills III, LeRoy Krueger, Willy Chiu, Nagui Halim, Joseph L Hellerstein, Mark S Squillante, The Computer Measurement Group, 2000.
  28. "Analysis of Large-Scale Distributed Information Systems", JL Hellerstein, TS Jayram, and MS Squillante, MASCOTS 2000.
  29. "Mining Event Data for Actionable Patterns," JL Hellerstein and S Ma, The Computer Measurement Group, 2000.
  30. "A Systematic Approach to Discovering Correlation Rules for Event Management," L Burns, JL Hellerstein, S Ma, CS Perng, DA Rabenhorst, D Taylor, IFIP/IEEE International Symposium on Integrated Network Management, 2001.
  31. "Event Relationship Networks: A Framework for Action Oriented Analysis in Event Management," Thoenen, Jim Riosa, JL Hellerstein, RC 21843 and IFIP/IEEE International Symposium on Integrated Network Management, 2001.
  32. "Using Control Theory to Achieve Service Level Objectives in Performance Management," S Parekh, N Gandhi, JL Hellerstein, D Tilbury, TS Jayram, J Bigus, Real Time Systems Journal, Vol. 23, No. 1-2, 2002.
  33. "Feedback Control of a Lotus Notes Server: Modeling and Control Design," N. Gandhi, S. Parekh, J. Hellerstein, and D.M. Tilbury, American Control Conference, 2001. (Best paper in session.)
  34. "An Introduction to Control Theory With Applications to Computer Science," JL Hellerstein and S Parekh, ACM Sigmetrics, 2001.
  35. "EventMiner: An Integrated Mining Tool for Scalable Analysis of Event Data," Sheng Ma, Joseph L. Hellerstein, Chang-Shing Perng, Knowledge and Data Discovery Workshop on Visual Data Mining, 2001.
  36. "A Business-Oriented Approach to the Design of Feedback Loops for Performance Management," Yixin Diao, Joseph L. Hellerstein, Sujay Parekh, Distributed Operations and Management, 2001.
  37. "Dependency Analysis in Distributed Systems Using Fault Injection: Application to Problem Determination in an e-Commerce Environment," Saurabh Bagchi, Gautam Kar, and Joseph L. Hellerstein, Distributed Operations and Management, 2001.
  38. "Rule Induction of Computer Events," Ricardo Vilalta, Sheng Ma, and Joseph L. Hellerstein, Distributed Operations and Management, 2001.
  39. "FARM: A Framework for Exploring Mining Spaces with Multiple Attributes," Charles Perng, Haixun Wang, Sheng Ma, and Joseph L. Hellerstein, First IEEE Conference on Data Mining, 2001.
  40. "Mining Mutually Dependent Patterns," Sheng Ma and Joseph L. Hellerstein, IEEE Conference on Data Mining, 2001.
  41. "Managing the Performance of Lotus Notes: A Control Theoretic Approach," Neha Gandhi, Joseph L. Hellerstein, Sujay Parekh, and Dawn M Tilbury, Proceedings of the Computer Measurement Group, 2001.
  42. "Stochastic Modeling of Lotus Notes with a Queueing Model," Yixin Diao, Joseph L. Hellerstein, and Sujay Parekh, Proceedings of the Computer Measurement Group, 2001.
  43. "Using MIMO Feedback Control to Enforce Policies for Interrelated Metrics With Application to the Apache Web Server," Y Diao, N Gandhi, JL Hellerstein, S Parekh, and DM Tilbury. Network Operations and Management, April 15-19, 2002, pp. 219-234. (Best paper in conference.)
  44. "Managing Dynamic Services: A Contracts-Based Approach to a Conceptual Architecture," Alexander Keller, Heiko Ludwig, Gautam Kar, Asit Dan, Joseph L Hellerstein. Network Operations and Management, 2002.
  45. "Mining Mutually Dependent Patterns for System Management," Sheng Ma and Joseph L. Hellerstein, IEEE Journal on Selected Areas in Communications, 2002, pp. 726-735.
  46. "MIMO Control of an Apache Web Server: Modeling and Controller Design," Y Diao, N Gandhi, JL Hellerstein, S Parekh, and DM Tilbury, American Control Conference, 2002. (Best paper in session.)
  47. "A General-Purpose Algorithm for Quantitative Diagnosis of Performance Problems,"  Joseph L. Hellerstein, Journal of Network and Systems Management, June, 2003.
  48. "Using Fuzzy Control to Maximize Profits in Service Level Management," Y Diao, JL Hellerstein, S Parekh. IBM Systems Journal, Vol. 41, No. 3, 2002.
  49. "Case Studies In Prediction of Potential Failures in Computer Systems," R Vilalta, C Apte, JL Hellerstein, S Ma, S Weiss. IBM Systems Journal, Vol. 41, No. 3, 2002.
  50. "SLA-Driven Management of Distributed Systems Using the Common Information Model," Markus Debusmann and Alexander Keller, Symposium on Integrated Management, 2003.
  51. “Fast Track Introduction to Control Theory for Computer Scientists,” Yixin Diao, Joseph L. Hellerstein, Sujay Parekh, Symposium on Integrated Management, 2003.
  52. "Discovering Actionable Patterns in Event Data," JL Hellerstein, S Ma, C Perng. IBM Systems Journal, Vol. 41, No. 3, 2002.
  53. "User-Directed Exploration of Mining Space With Multiple Attributes," C Perng, H Wang, S Ma, and JL Hellerstein, Knowledge and Data Discovery, 2002.
  54. "A First-Principles Approach to Constructing Transfer Functions for Admission Control in Computing Systems," JL Hellerstein, Y Diao, and S Parekh. Conference on Decision and Control, 2002.
  55. "Optimizing Quality of Service Using Fuzzy Control," Y Diao, JL Hellerstein, S Parekh, Distributed Systems Operations and Management, 2002.
  56. "Managing Web Server Performance with AutoTune Agents," Y Diao, JL Hellerstein, S Parekh, JP Bigus. IBM Systems Journal, Vol. 42, No. 1, 2003.
  57. "Generic On-Line Discovery of Quantitative Models for Service Level Management," Y Diao, F Eskesen, S Froehlich, JL Hellerstein, A Keller, L Spainhower, and M Surendra, IFIP Symposium on Integrated Management, 2003.
  58. "On-Line Response Time Optimization of An Apache Web Server," Yixin Diao, Xue Liu, Steve Froehlich, Joseph L Hellerstein, Sujay Parekh, and Lui Sha. International Workshop on Quality of Service, 2003.
  59. "Generic, On-Line Optimization of Multiple Configuration Parameters With Application to a Database Server," Yixin Diao, Frank Eskesen, Steven Froehlich, Joseph L Hellerstein, Lisa Spainhower, and Maheswaran Surendra. IFIP Conference on Distributed Systems Operations and Management, 2003.
  60. "Managing the Performance Impact of Administrative Utilities," Sujay Parekh, Kevin Rose, Joseph L. Hellerstein, Sam Lightstone, Matthew Huras, and Victor Chang. IFIP Conference on Distributed Systems Operations and Management, 2003.
  61. "Dynamic Surge Protection: An Approach to Handling Unexpected Workload Surges With Resource Actions That Have Lead Times," E. Lassettre, DW Coleman, Y Diao, S Froehlich, JL Hellerstein, L Hsiung, T Mummert, M Raghavachari, G Parker, L Russell, M Surendra, V Tseng, N Wadia, and P Ye. IFIP Conference on Distributed Systems Operations and Management, 2003.
  62. "Towards Benchmarking Autonomic Computing Maturity," Sam Lightstone, Joseph Hellerstein, William Tetzlaff, Philippe Janson, Ed Lassettre, Carolyn Norton, Bala Rajaraman, and Lisa Spainhower. IEEE Workshop on Autonomic Computing Principles and Architectures, Banff, Alberta, Canada, 2003.
  63. "Enforcing Quality of Service Using Decentralized Runtime Feedback Control," Yixin Diao, Bruno Ciciani, Catherine H. Crawford. International  Computer Measurement Group Conference, 2003.
  64. ``The CHAMPS System: Change Management with Planning and Scheduling," Alexander Keller, Joseph L. Hellerstein, Joel L. Wolf, Kun-Lung Wu, Vijaya Krishnan. Network Operations and Management, 2004.
  65. "Using MIMO Linear Control for Load Balancing  in Computing Systems," Yixin Diao, Joseph L. Hellerstein, Adam Storm,  Maheswaran Surendra, Sam Lightstone, Sujay Parekh, and Christian Garcia-Arellano. American Control Conference, 2004.
  66. "Challenges in Control Engineering of Computing Systems," Joseph L. Hellerstein, American Control Conference, 2004.
  67. “Incorporating Cost of Control Into the Design of a Load Balancing Controller,” Yixin Diao, Joseph L. Hellerstein, Adam Storm, Maheswaran Surendra, Sam Lightstone, Sujay Parekh, and Christian Garcia-Arellano. Invited paper, Real-Time and Embedded Technology and Application Systems Symposium, 2004.
  68. "Throttling Utilities in the IBM DB2 Universal Database Server," Sujay Parekh, Kevin Rose, Yixin Diao, Victor Chang, Joseph L. Hellerstein, Sam Lightstone, Matthew Huras. American Control Conference, 2004.
  69. “An Approach to Benchmarking Configuration Complexity,” Aaron B. Brown and Joseph L. Hellerstein. SIGOPS 2004.
  70. “Automating the Provisioning of Application Services with the BPEL4WS Workflow Language,” Alexander Keller and Remi Badonnel, Distributed Systems Operations and Management (DSOM) 2004.
  71. Feedback Control of Computing Systems, Joseph L. Hellerstein, Yixin Diao, Sujay Parekh, and Dawn Tilbury. Wiley-Interscience. 2004.
  72. “Self-Managing Systems: A Control Theory Foundation,” Yixin Diao, Joseph L. Hellerstein, Gail Kaiser, Sujay Parekh, Dan Phung, OASIS 2004, keynote at High-Speed Local Networks 2004, and IEEE Second conference on Engineering of Autonomic Systems, 2005.
  73. “Service Level Management: A Dynamic Discovery and Optimization Approach,” Yixin Diao, Frank Eskesen, Steven Froehlich, Joseph L. Hellerstein, Alexander Keller, Lisa F. Spainhower, and Maheswaran Surendra. Electronic Transactions on Network and Systems Management, April, 2005.
  74. "A Business-Oriented Optimization of Performance and Availability for Utility-Based Computing," Joseph L Hellerstein, Kaan Katircioglu, and Maheswaran Surendra. Journal on Selected Areas of Communications, Oct., 2005.
  75. A Framework for Applying Inventory Control to Capacity Management for Utility Computing, Joseph L Hellerstein, Kaan Katircioglu, and Maheswaran Surendra. IFIP/IEEE Integrated Management, 2005, pp. 237-250.
  76. “A Model of Configuration Complexity and Its Application to a Change Management System,” Aaron Brown, Alexander Keller, and Joseph L. Hellerstein.  IFIP/IEEE Integrated Management, 2005, pp. 531-644. Best paper in conference.
  77. “Control Engineering Challenges in Computing Systems,” Joseph L. Hellerstein. To appear in IEEE Control Systems Magazine.
  78. “Comparative Studies of Load Balancing With Control and Optimization Techniques,” Yixin Diao, Chai Wah Wu, Joseph L. Hellerstein, Adam J. Storm, Maheswaran Surendra, Sam Lightstone, Sujay Parekh, Christian Garcia-Arellano, Matthew Carroll, Lee Chu, and Jerome Colaco. American Control Conference, 2005.
  79. Reducing the Cost of IT Operations---Is Automation Always the Answer? Aaron B. Brown and Joseph L. Hellerstein. Accepted to HotOS-X, 2005.
  80. Book Review: Fuzzy Control of Computing Systems. Joseph L. Hellerstein. Control Systems Magazine, pp. 94, June, 2005.
  81. A Control Theory Foundation for Self-Managing Systems, Yixin Diao, Rean Griffith, Joseph L. Hellerstein, Gail Kaiser, Sujay Parekh, Dan Phung. Accepted to Journal on Selected Areas of Communications.
  82. Control Considerations in Scaling Event Correlation, Wei Xu, Joseph L. Hellerstein, Bill Kramer, and David Patterson. To appear in Distributed Systems Operations and Management, 2005.


Research Reports and Recent Submissions

  1. "Modeling Heterogeneous Network traffic wavelet domain: Part I-- temporal correlation," Sheng Ma and Chuanyi Ji,  submitted, 1999.
  2. "Analysis of the Control of a Multiclass Queueing Network Based on Production Server Data," JL Hellerstein, TS Jayram, Sujay Parekh, and MS Squillante, In Progress.
  3. "An Analysis of Naive Bayes Classifiers on Low-Entropy Distributions," Irina Rish, Joseph L. Hellerstein, and Jayram Thathachar, RC91994, 2001.
  4. "An Analysis of Data Characteristics that Affect Naive Bayes Performance," Irina Rish, Joseph L. Hellerstein, and Jayram Thathachar, RC21993, 2001.
  5. Applying Control Theory to Computing Systems, Joseph L Hellerstein, Yixin Diao, and Sujay Parekh. Submitted to Communications of the ACM.
  6. Dynamic Adaptation of Rules for Temporal Event Correlation in Distributed Systems, Rean Griffith, Joseph L. Hellerstein, Yixin Diao, and Gail Kaiser. Submitted to 2nd International Conference on Autonomic Computing, 2005.
  7. Controlling Quality of Service in Multitier Web Applications, Yixin Diao, Joseph L. Hellerstein, Sujay Parekh, Hidayatullah Shaikh, Maheswaran Surendra. Submitted to International Conference on Distributed Computing Systems.

 


Presentations


Patents

1.      ``A General Purpose Mechanism for Detecting Performance Problems in Window-Based Systems" ( Robert Berry and Joseph L Hellerstein), issued 3/9/99, 5,881,222.

2.      ``A Simple Approach To Case-Based Reasoning for Data Navigation Tasks," (Joseph L Hellerstein) issued 2/10/98, 5,717,835.

3.      ``Method and Apparatus for Quantitative Diagnosis of Performance Problems Using External Representations,"  (Joseph L Hellerstein) 11/30/99, 5,996,090.

4.      ``System and Method for Automated Problem Isolation in Systems with Measurements Structured as a Multidimensional Database," (Joseph L. Hellerstein and Po C. Yue), 12/11/2002, 6,330,564.

5.      "Predictive Model-Based Measurement Acquisition," (JL Hellerstein and Haus), August 6, 2002, US 6,430,615.

6.      "Method and System for Optimal Problem Isolation for Data Structured as a Multidimensional Database" (Joseph L Hellerstein, Tracy Kimbrel, Robert D Kearney, Jayram Thathachar), November, 2002, 6,330,564.

7.      “System and methods for using Continuous Optimization for Ordering Categorical Data Sets,” Alina Beygelzimer, Joseph L. Hellerstein, Sheng Ma, and Charles Perng, September 2, 2003, 6,615,211.

8.      “Methods and Apparatus for Performance Management Using Self-Adjusting Model-Based Policies” issued 1/6/2004, 6,676,128.

9.      “Systems and methods for pairwise analysis of event data” issued  2/24/2004, 6,697,802.

10.  “System and Method for Systematic Construction of Correlation Rules for Event Management,” 6,697,791 on February 24, 2004.

11.  “Method, computer program product, and system for deriving web transaction performance metrics” issued 3/2/2004, 6,701,363.

12.  “Systems and methods for automated navigation between dynamic data with dissimilar structures” issued 3/9/2004, 6,704,721.

13.  “System and method for generic automated tuning for performance management”, issued 4/6/2004, 6,718,358.

14.  “Method, computer program product, and system for deriving web transaction performance metrics”, May, 2004, 6,701,363.

15.  “Object-oriented framework for generic adaptive control,” US 0302611, September 3, 2004.

16.  “Systems and methods for authoring and executing operational policies that use event rates,” US 6792456, September 14, 2004.

17.  “Systems and methods for discovering mutual dependence patterns,” US 6829608, December 7, 2004.

18.  “Systems and methods for exploratory analysis of data for event management,” US 6836894, December 28, 2004.

19.  “Method and system for recognizing end-user transactions,” US6925452, August 2, 2005.

20.  “System and method for on-line prediction using dynamic management of multiple sub-models,” US6937966, August 30, 2005.


Modified November 29,  2005.