**Machine Learning Seminar 2011June 12, 2011Organized by IBM Haifa Research Lab**

## Abstracts

**Know What You Don't Know: Uncertain Models Learns with Partial Feedback**

Koby Crammer, Technion – Israel Institute of Technology

The talk will focus on a class of algorithms that maintain uncertainty information about classifier parameters. Informally, the uncertainty is modeled via a covariance matrix over the weights of a linear hypotheses. Learning in this framework modify parameters by estimating weights and reducing the uncertainty. I will describe few algorithms to learn models in this framework, both with full and partial feedback, and show a mistake bound analysis. The usefulness of maintaining confidence will be demonstrated in multi-class decision problems and domain adaptation.

**PARIS: Principal Atom Recognition In Sets**

Michal Aharon, HP Labs Israel

We introduce a new method for discovering latent topics in sets of objects, such as documents. Our method, which we call PARIS (for Principal Atoms Recognition In Sets), aims to detect principal sets of elements, representing latent topics in the data, that tend to appear frequently together. These latent topics, which we refer to as `atoms', are used as the basis for clustering, classification, collaborative filtering, and more. We develop a target function which balances compression and low error of representation, and the algorithm which minimizes the function. Optimization of the target function enables an automatic discovery of the number of atoms, representing the dimensionality of the data, and the atoms themselves, all in a single iterative procedure. We demonstrate PARIS's ability to discover latent topics, even when those are arranged hierarchically, on synthetic, documents and movie ranking data, showing improved performance compared to existing algorithms, such as LDA, on text analysis and collaborative filtering tasks.

**Learning visual similarity using classifiers**

Lior Wolf, The Blavatnik School of Computer Science, Tel-Aviv University

The One-Shot-Similarity (OSS) is a framework for classifier-based similarity functions. It is based on the use of background samples and was shown to excel in tasks ranging from face recognition to document analysis.

In this talk we will present the framework as well as the following results:

(1) when using a version of LDA as the underlying classiﬁer, this score is a Conditionally Positive Deﬁnite kernel and may be used within kernel-methods (e.g., SVM),

(2) OSS can be efﬁciently computed, and (3) a metric learning technique that is geared toward improved OSS performance.

**Multi-class Norm-based AdaBoost-like Algorithm**

Dan Gutfreund, IBM Haifa Research Lab

We propose a new approach to generalize the Adaboost algorithm of Freund & Schapire to the multi-class setting. The basic idea is to map labels and confidence-based classifiers to a normed vector space, and to measure performance by distances in this space. The result is a family of algorithms which can handle the standard multi-class classification setting, as well as the setting where there is a structure on the label-space in the form of distances between the labels. In the talk I will present both theoretical analysis as well as experimental results.

**Online Learning in The Manifold of Low-Rank Matrices**

Gal Chechik, Bar-Ilan University

Models parametrized as matrices are common in multi-task learning and metric learning, as in the case of learning a Mahalanobis distance matrix. But when the dimensionality of the data is large, learning such models becomes infeasible due to memory constraints. A natural approach to learn memory-restricted matrix models is to limit their ranks, since an n by m matrix of rank k can be stored using (m+n)k parameters. Unfortunately, the set of low-rank matrices is non-convex, and approaches like iterative projected gradient are very costly in run time.

Here we use the fact that low-rank matrices can be viewed as embedded Riemann manifolds in a Euclidean space and describe a procedure to learn such models in O((m+n)k) linear time. Our approach, Loreta, is based on *retractions*.These are operations that map a point onto the low-rank manifold as accurately as projections, and we show how to compute a special retraction in linear time when the gradients are of rank 1. We find that Loreta learns similarity of text documents through a ranking loss much more accurately than full rank models using the same memory footprint. Loreta also outperformed other methods using thesame memory when tested in a large scale image annotation task.

**Suggesting Friends Using the Implicit Social Graph**

Sigalit Bar, Google

Users of online communication tools implicitly cluster contacts, by virtue of their interactions with them, forming implicit groups. In this talk we describe the implicit social graph formed by these interactions, which is distinct from explicit social graphs in which users explicitly add other individuals as their "friends".

We introduce an interaction-based metric for estimating a user's affinity to his contacts and groups. We then describe a novel friend suggestion algorithm that uses a user's implicit social graph to generate a friend group, given a small seed set of contacts which the user has already labeled as friends.

We show experimental results that demonstrate the importance of both implicit group relationships and interaction-based affinity ranking in suggesting friends. Finally, we discuss two applications of the Friend Suggest algorithm that have been released as Gmail Labs features.

**What Else Can We Do with More Data**

Shai Shalev-Shwartz, The Hebrew University of Jerusalem

In some applications, data is plentiful. By now, we have a rather clear understanding of how more data can be used to improve the accuracy of learning algorithms. In the talk I'll show how more data can be beneficiary for other purposes. In particular, I'll present algorithms that can leverage large data sets to reduce the required training runtime, prediction runtime, and algorithms that can use the multitude of examples to compensate for lack of full information on each individual example.

**Automatically Tagging Email by Leveraging Other Users' Folders**

Yehuda Koren, Yahoo!

Most email applications devote a significant part of their real estate to organization mechanisms such as folders. Yet, we verified on the Yahoo! Mail service that 70% of email users have never defined a single folder. This implies that one of the most well known email features is underexploited. We propose here to revive the feature by providing a method for generating a lighter form of folders, or tags, benefiting even the most passive users. The method automatically associates, whenever possible, an appropriate semantic tag with a given email. This gives rise to an alternate mechanism for organizing and searching email.

We advocate a novel modeling approach that exploits the overall population of users, thereby learning from the wisdom-of-crowds how to categorize messages. Given our massive user base, it is enough to learn from a minority of the users who label certain messages in order to label that kind of messages for the general population. We design a novel cascade classification approach, which copes with the severe scalability and accuracy constraints we are facing. Significant efficiency gains are achieved by working within a low dimensional latent space, and by using a novel hierarchical classifier. Precision level is controlled by separating the task into a two-phase classification process.

We performed an extensive empirical study covering three different time periods, over 100 million messages, and thousands of candidate tags per message. The results are encouraging and compare favorably with alternative approaches. Our method successfully tags 72% of incoming email traffic. Performance-wise, the computational overhead, even on surge large traffic, is sufficiently low for our approach to be applicable in production on any large Web mail service.