**Machine Learning Seminar 2012**

**Sunday October 14, 2012**

IBM Research - Haifa, Israel

IBM Research - Haifa, Israel

## Tab navigation

- Invitation
- Program- selected tab,
- Registration

PDF version for printing (1,121 KB)

**9:30** Registration

**10:00** Opening Remarks,

*Oded Cohn, Director Haifa IBM Research Lab Dr. Robert Sutor, VP, Business Analytics and Mathematical Sciences, IBM Research*

**10:15** Improved Active Learning Analysis,

*Dr. Nir Ailon, Technion*

**Abstract:**In active learning, unlike in traditional learning, the learner (algorithm) has control over the set of instances for which labels are available. In recent years much progress has been made in devising general purpose query complexity measures for active learning, and in identifying cases where active learning is superior to traditional one. The disagreement coefficient is one of these measures. I will present two problems for which we have both VC dimension and disagreement coefficient bounds, but the known algorithms and query complexity bounds using these parameters do not give any useful result. Instead, a new technique which we call Smooth Relative Regret Approximation (SRRA) gives almost optimal bounds. The technique can be shown to recover query complexity bounds based on the disagreement coefficient, and hence subsumes it.

**10:45**What Can We Learn from Data Statistics? An Overview of Statistical Query Learning,

*Dr. Vitali Feldman, IBM Research - Almaden*

**Abstract:**In the statistical query (SQ) model a learning algorithm can only estimate statistical properties of examples rather than observe individual examples. The model was originally defined by Kearns (JACM, 1998) as a method to design algorithms robust to classification noise. More recently, it was shown that any SQ learning algorithm can be easily parallelized on multi-core computers and can also be converted to a differential privacy-preserving algorithm. Further, it is now known that most common learning techniques, e.g. SVM, PCA, ICA, linear/logistic regression and k-means, can be expressed in the SQ model or adapted to it. An interesting property of the model is that the query complexity of SQ learning can be estimated using combinatorial parameters of the problem. In this talk I'll survey the basic results on SQ learning and describe some of the recent advances in bounding the SQ complexity of learning and optimization tasks.

**11:15**Data Mining in the Streaming Model: Approximating Massive Matrices,

*Dr. Edo Liberty, Yahoo! Research*

**Abstract:**This talk will briefly introduce the streaming computational model in the context of data mining. We will focus on working with matrices that are revealed over time and are too large to store. Working with such massive matrices requires creating a concise yet accurate approximation for them. These are called matrix sketches. This talk will shortly survey new results for matrix sketching in two streaming models.

In the first, the matrix is presented to the algorithm one entry at a time. Examples include recommender systems whose input is of the form "user i rated item j 3 stars". In the second, the matrix is revealed row by row, for example, "document i contains terms j_1,…,j_k". This is the case in web crawling where the crawler cannot store all the documents it visits.

**11:45**Break

**12:00**Genetic GPS: Mirroring Geography via Genetics,

*Prof. Eran Halperin, Tel-Aviv University*

**12:30 Keynote: Machine Learning from High-Dimensional and Large Amounts of Data at Google**,

*Prof.Yoram Singer, Google USA*

**Abstract:**We review the design, analysis and implementation of a few machine learning algorithms for learning from high-dimensional and large amounts of data. We focus on algorithms that are both efficient and can yield compact models using sparsity promoting regularization. We give examples of the usage of the algorithms in online advertisement, image annotation, and the Google's "Brain" project.

**13:30**Lunch

**14:30**Nearest Neighbors: Old and New,

*Dr. Aryeh Kontorovich, Ben Gurion University*

**Abstract:**We offer a new perspective on the nearest neighbor classifier, which yields tighter risk asymptotics than the classic Cover-Hart analysis. As a by-product, our analysis suggests a natural solution to the problem of noisy labels/outliers. Our result holds in doubling metric spaces, of which Euclidean spaces are a special case. The classifier may be learned in linear time and evaluated on new points in logarithmic time via approximate nearest neighbors. Joint work with Lee-Ad Gottlieb and Robert Krauthgamer.

**15:00**On the Trade-off Between Equivalence Constraints and Labels,

*Dr. Liat Ein-Dor, IBM Research - Haifa*

**Abstract:**Supervised learning is based predominantly on labeled examples which are often expensive and scarce. An alternative form of supervision is equivalence constraint, i.e. two examples which are known to be from the same or different classes, yet their class labels are unknown. Equivalence constraints are often easier and cheaper to obtain, but the theoretical underpinnings of their learning utility relative to labels is still lacking. In this study we develop novel framework for analyzing the learning utility of equivalence constraints. Specically, we extend the statistical mechanics perceptron capacity calculations, used thus far only for labeled data, to supervised learning from equivalence constraints. We then derive generalization bounds for training with equivalence constraints, using a link between perceptron capacity and Rademacher complexity. We prove that for large sample sizes, a sample with equivalence constraints supervision becomes as powerful as a fully labeled sample of the same size. We also prove that this result holds even when the examples in the constraints are highly correlated. To validate our theoretical findings we introduce an algorithm that learns a perceptron from equivalence constraints and compare its performance to a similar algorithm learning from labels, using synthetic and real data. The results support the generality of our findings.

**15:30**Break

**15:45**Visual Recognition over Multiple Categories: A Scalable Framework,

*Prof. Amnon Shashua, Hebrew University*

**Abstract:**Single category visual recognition, i.e., detecting an object category from an image has witnessed great success in both academic and industrial contexts. For example, systems today exhibit impressive performance both in accuracy and realtime efficiency when detecting faces, cars or pedestrians from images. However, human vision can effortlessly handle thousands of object classes. The issue of how to cope with the classifier's accuracy and resource complexity when dealing with a growing number of object categories largely remains an open problem. I will present a system designed such that the computational resources grow sub-linearly (poly-logarithmic) with the number of classes. This also implies that features used for recognition should be shared by several classes. I will present an implementation of the algorithm in the context a system for the aide of the visually impaired, where the system is trained for realtime reading (text in the wild, documents, newspapers) and object categorization.

Work done jointly with Shai Shalev-Schwartz (HU) and Yonatan Wexler (Orcam).

**16:45**Concluding Remarks,

*Moshe Levinger, IBM Research – Haifa*

**16:30**Poster Session