**Machine Learning Seminar 2008May 25, 2008Organized by IBM Haifa Research Lab**

**Abstracts**

**An Overview of Agnostic Learning**

Adam Kalai, Georgia Institute of Technology

The agnostic model of learning (Kearns, Schapire, and Sellie, 1992) is a binary classification model that places little or no requirements on the data. The main assumption is that future examples will be drawn from the same distribution as training examples, while accuracy is compared to the most accurate classifier in some class. Hence, such algorithms can tolerate arbitrary or even adversarial noise. Furthermore, as in Valiant's PAC model, an agnostic algorithm should be computationally efficient. Designing computationally efficient agnostic learning algorithms has proven to be an interesting challenge. I will give an overview of older and more recent work on agnostic learning, focusing on provably efficient algorithms (instead of impossibility results). The analysis of agnostic learning algorithms sheds light on several key Machine Learning notions, such as Fourier learning, SVMs, active learning, decision tree learning, and boosting.

**Learning for Debugging Large-Scale Web Services**

Ira Cohen, HP Labs

Large scale IT systems, and web services in particular, are difficult to debug and manage. Commercial tools monitor these services, but provide little in terms of analysis and root cause information when problems occur. In recent years there has been an effort to leverage and develop machine learning methods for transforming these monitors to actionable information and improve the manageability of systems. The talk will describe our work in using machine learning for web services performance debugging. In particular, it will focus on the use of clustering with pair-wise constraints, work that is motivated by the fact that obtaining reliable labels for different types of performance problems has proven to be very difficult. I'll describe our probabilistic clustering method which allows soft pair-wise constraints, represented by an intuitive measure of confidence in the constraint, and compare it to existing methods in terms of robustness to misspecified constraints.

**Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression**

Saharon Rosset, Tel-Aviv University

In this talk I will introduce three important concepts, then bring them together to discuss some of my latest research. The three concepts are: 1. Estimation of conditional distributions P(Y|X) through quantile regression. This is an approach that allows maximal flexibility in understanding and modeling the dependence of a response Y on features X, as each quantile (=percentile) of P(Y|X) can be modeled separately. I will introduce kernel quantile regression (i.e., quantile regression in a reproducing kernel Hilbert space) as the main modeling tool of interest.2. The statistical role of regularization and the importance of model selection. Model selection can be driven by theoretical considerations (like SRM) or by empirical performance (via cross validation), which is our focus.3. Efficient generation of ''regularization paths'' to regularized problems. The idea here is to use geometrical considerations and homotopy approaches to follow solution paths of regularized problems for a small computational cost. Such algorithms have been developed for support vector machines, lasso, and other important modeling tools. Bringing these three elements together, I will present an algorithm to solve the quantile regression problem for all quantiles simultaneously, and obtain the optimal cross validated solution for every quantile, via a path-following approach.

**Revealing Structure in Unseeable Data**

Noam Slonim, IBM Haifa Research Lab

Given the pairwise affinity relations associated with a set of data items, clustering algorithms aim to reveal the underlying data structure by automatically partitioning the data into a small number of homogeneous clusters. An inherent difficulty in this task is the fact that the input size is quadratic in the number of data points. Specifically, if the number of data points is large or if computing each affinity relation is demanding, existing algorithms become infeasible. In this talk I will describe novel rigorous clustering strategies to handle such scenarios, and will present results for various domains of great practical importance. In particular, I will argue that in many real world problems the underlying data structure can be revealed by observing only small fractions of the entire data.

**Keynote: Learning in an Integrated Intelligent System: Examples from the CALO System**

Thomas G. Dietterich, Director of Intelligent Systems Research, Oregon State University

CALO (Cognitive Assistant that Learns and Organizes) is a large 5-year effort to build an integrated intelligent assistant for the desktop knowledge worker. A central focus of CALO was on integrating modern machine learning methods into an AI system. This talk will describe two themes emerging from this research: (a) an architecture for integrating multiple learning components to achieve end-to-end learning and (b) methods for learning procedures from user demonstrations by combining linguistic and data flow knowledge. The talk will include videos and demonstrations of some CALO components.

**Solving Biological Problems with Probabilistic Graphical Models**

Eran Segal, Weizmann Institute of Science

Genomic datasets, spanning many organisms and data types, are rapidly being produced, creating new opportunities for understanding the molecular mechanisms underlying human disease, and for studying complex biological processes on a global scale. Transforming these immense amounts of data into biological information is a challenging task. In this talk, I will describe a statistical modeling language that we developed, based on Bayesian networks, for representing heterogeneous biological entities and modeling the mechanism by which they interact. I will present the statistical learning approaches that we employ to automatically learn the details of these models (structure and parameters), and outline the computational and mathematical challenges that lie ahead.

**Machine Learning Techniques inside the Medical Information Hub**

Andre Elisseeff, IBM Zurich Research Lab

The complexity emerging from modern information systems and global standards is overwhelming medical practitioners and patients. Text based user interactions on unstructured data and complex standards take too much time and are not intuitive. In this talk, we will present a tool aiming at changing the interaction with medical records from a text based to a graphical based paradigm. It enriches the user experience by mapping all medical records to a browseable virtual 2D/3D representation of a human body. It is based on a large ontology (SNOMED with about half a million concepts) for the mapping and relies on machine learning for semantic search, data entry and user assistance as well as 3D navigation. This talk will focus on the application of machine learning techniques in the development of such a tool.

**Projection Methods: Algorithmic Structure and Acceleration Techniques**

Yair Censor, University of Haifa

Recognizing that the Perceptron algorithm is a projection method inspires our interest in the class of projection methods. Are there other methods in that class which can be interpreted as, or modified into, machine learning algorithms? What else can methods from this class "do" for us? How do projection methods work? What are the algorithmic structures available in the class of projection methods? what can be said about convergence properties and/or experimental initial behavior patterns? When, why and how are projection methods preferable over other methods? How can they be accelerated?

**Learning to See in Plato's Cave and Distributed Teachers**

Ran Gilad-Bachrach, Intel Research Israel Lab

In the allegory of the cave, the great Greek philosopher Plato describes a prisoner who is captured in a cave and can not see reality but only its reflections as they appear as shadows on the walls of the cave. The prisoner learns to distinguish between the different objects based on the shadows they cast. However, once the prisoner is released and steps outside of the cave, will he be able to identify objects in the reality? In this talk, we discuss the results of an experiment we ran on learning object class recognition tasks using images rendered by 3D image processing software. These tools make it possible to generate multiple views of a single object from different view points, under different illuminations and more. Therefore, it is easy to generate large and very informative training-sets using these tools. However, in this setting, our learner is trapped in the cave of virtual-reality images and the question remains how well the learner generalize when escaping the cave and faced with real world images.

Another method we present for augmenting available data for machine vision algorithm is the use of distributed teachers. Distributed teachers are plentiful but lack the inner consistency one would expect from labels. We present algorithms for unifying the labels of such labelers to generate consistent labels.

This is joint work by Aharon Bar-Hillel, Chen Goldberg, Eyal Krupka, Liat Ein-Dor and Yossi Ittach.