IBM R&D Labs in Israel

Machine Learning Seminar 2015

Monday, November 9th, 2015
IBM Research - Haifa, Israel

Program


09:15-10:00

Registration

10:00-10:15

Opening Remarks,
Michal Rosen-Zvi, IBM Research - Haifa

10:15-10:45

Constrain, Train, Validate and Explain: A Classifier for Mission-Critical Applications,
Yaakov Engel, Rafael

Abstract: Classifiers used in mission-critical applications, where misclassification errors incur high costs, should be robust to training-set artifacts such as insufficient or misrepresentative coverage and severe forms of bias. As such, they are required to support intensive designer control and a range of validation procedures that go beyond cross-validation. For such applications, we advocate the use of a family of classifiers that employ a factored model of the posterior class probabilities. These classifiers are simple and interpretable, allow their designers to enforce a variety of domain-specific constraints, and can tolerate missing data both during training and at prediction time. They are also capable of explaining their decisions in terms of the basic measured quantities. Classifiers of this family are used in several projects, one of which is described in this talk.
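
The abstract does not specify the exact model; a minimal sketch of one natural instance of a factored posterior (a naive-Bayes-style factorization, with smoothing and interfaces chosen by us) shows how missing features drop out of the model cleanly:

```python
import numpy as np

class FactoredPosteriorClassifier:
    """Sketch of a factored posterior model (naive-Bayes-style).

    The posterior is modeled, up to normalization, as
        P(y | x) ~ P(y) * prod_j [ P(y | x_j) / P(y) ],
    so a feature whose value is missing (None) simply drops out of the
    product, both during training and at prediction time.
    """

    def __init__(self, n_classes, n_values):
        self.n_classes = n_classes   # number of class labels
        self.n_values = n_values     # discrete values per feature

    def fit(self, X, y):
        d = len(X[0])
        self.prior = np.bincount(y, minlength=self.n_classes) + 1.0
        self.prior /= self.prior.sum()
        # counts[j][v, c] = #examples with feature j == v and class c
        # (initialized to 1 for add-one smoothing)
        self.counts = [np.ones((self.n_values, self.n_classes))
                       for _ in range(d)]
        for xi, yi in zip(X, y):
            for j, v in enumerate(xi):
                if v is not None:          # skip missing entries
                    self.counts[j][v, yi] += 1
        return self

    def predict_proba(self, xi):
        log_p = np.log(self.prior)
        for j, v in enumerate(xi):
            if v is None:                  # missing feature: factor drops out
                continue
            p_y_given_xj = self.counts[j][v] / self.counts[j][v].sum()
            log_p += np.log(p_y_given_xj) - np.log(self.prior)
        p = np.exp(log_p - log_p.max())
        return p / p.sum()

# tiny usage example, with a missing value at prediction time
clf = FactoredPosteriorClassifier(n_classes=2, n_values=3).fit(
    X=[[0, 1], [0, 2], [1, None], [2, 0]], y=[0, 0, 1, 1])
print(clf.predict_proba([0, None]))
```

Because each factor depends on a single measured quantity, the per-feature terms log P(y | x_j) - log P(y) double as an explanation of the decision, in line with the abstract's emphasis on interpretability.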

10:45-11:45

Keynote: Deep Networks: a Theory?,
Tomaso Poggio, MIT

Abstract: IS-theory starts from the hypothesis that invariant representations of images are the main computational goal of the ventral stream in visual cortex. Invariant representations can be proved to lead to lower sample complexity in image recognition. We propose a biologically plausible simple-complex cells module (HW module) for computing components of an invariant signature. We use this module in a hierarchical architecture that adds selectivity to invariance by efficiently approximating multidimensional functions. The architecture uses an extension of additive splines that we call hierarchical additive splines. We show that today's deep convolutional networks can be characterized in terms of this theoretical framework.
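
A rough sketch of what a single HW module computes, under our own simplifying assumptions (1-D signals, cyclic shifts as the transformation group, histogram pooling): simple cells take normalized dot products with transformed copies of a stored template, and a complex cell pools these responses into a signature component that is invariant to applying the same transformation to the input.

```python
import numpy as np

def hw_module(signal, template, shifts, n_bins=8):
    """One simple-complex (HW) module, sketched for cyclic 1-D shifts.

    Simple cells: normalized dot products of the input with shifted
    copies of the template. Complex cell: pools the responses into a
    histogram, which is invariant to cyclically shifting the input
    when `shifts` covers the whole cyclic group.
    """
    x = signal / (np.linalg.norm(signal) + 1e-12)
    responses = []
    for s in shifts:
        t = np.roll(template, s)
        t = t / (np.linalg.norm(t) + 1e-12)
        responses.append(float(x @ t))     # simple-cell response
    hist, _ = np.histogram(responses, bins=n_bins, range=(-1.0, 1.0))
    return hist / hist.sum()               # pooled signature component

x = np.sin(np.linspace(0, 6.0, 64))
template = np.cos(np.linspace(0, 6.0, 64))
sig = hw_module(x, template, shifts=range(64))
sig_shifted = hw_module(np.roll(x, 7), template, shifts=range(64))
# the two signatures coincide: the pooled histogram is shift-invariant
```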

11:45-12:00

Break

12:00-12:30

Active Learning for Regression,
Sivan Sabato, Ben-Gurion University

Abstract: Active learning is the field of machine learning that studies learning when examples are abundant but labels are expensive. For instance, this occurs when the examples are documents or photos freely available on the web, but identifying their content reliably requires human labor. Active learning algorithms interactively select which labels to collect, taking into account the usefulness of each answer before it is known.

I will present a new active learning algorithm for parametric linear regression with random design. This algorithm has finite-sample convergence guarantees for general distributions in the misspecified model. It is the first active learner for this setting that can provably improve over passive learning. Following the stratification technique advocated in Monte-Carlo function integration, this active learner approaches the optimal risk using piecewise constant approximations.

Based on joint work with Rémi Munos, INRIA Lille.
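
The stratification idea can be illustrated with a toy sketch (our construction; the paper's algorithm and its finite-sample guarantees are more refined): partition the pool into strata, spend a pilot budget estimating the label spread in each, allocate the remaining budget Neyman-style, and predict with a piecewise constant function over the strata.

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_active_regression(X_pool, query_label, budget, n_strata=5):
    """Toy illustration of stratification for label-efficient regression.
    Assumes len(X_pool) >= n_strata and budget >= 2 * n_strata.

    Strata are equal-size bins along the first feature. A pilot round
    estimates each stratum's label spread, the remaining label budget
    follows Neyman allocation (stratum size times estimated std), and
    the predictor is piecewise constant over the strata."""
    order = np.argsort(X_pool[:, 0])
    strata = np.array_split(order, n_strata)        # equal-size strata
    cuts = [X_pool[s[0], 0] for s in strata[1:]]    # stratum boundaries

    labels = {}
    pilot = max(2, budget // (4 * n_strata))
    stds = []
    for s in strata:
        for i in rng.choice(s, size=min(pilot, len(s)), replace=False):
            labels[i] = query_label(i)
        stds.append(np.std([labels[i] for i in s if i in labels]) + 1e-9)

    remaining = max(0, budget - len(labels))
    weights = np.array([len(s) for s in strata]) * np.array(stds)
    alloc = np.floor(remaining * weights / weights.sum()).astype(int)

    means = []
    for s, m in zip(strata, alloc):
        fresh = [i for i in s if i not in labels]
        take = min(int(m), len(fresh))
        if take > 0:
            for i in rng.choice(fresh, size=take, replace=False):
                labels[i] = query_label(i)
        means.append(np.mean([labels[i] for i in s if i in labels]))

    def predict(x):
        return means[int(np.searchsorted(cuts, x[0], side="right"))]
    return predict
```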

12:30-13:00

Singular Values and Eigenvalues in Data Analysis,
Matan Gavish, Hebrew University

Abstract: Spectral algorithms have played a central part in data analysis across all branches of science since at least the 1930s. Singular values and eigenvalues of data matrices appear under numerous names: factors, principal components, canonical correlations, etc. Recent theoretical advances allow a systematic study of spectral algorithms, and sometimes lead to provably optimal estimation algorithms. I'll show how such optimal algorithms are derived in two problems: matrix denoising and covariance estimation.
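
For the matrix-denoising problem, one well-known optimal spectral algorithm is singular-value hard thresholding at the optimal threshold of Gavish and Donoho (2014); whether this is the exact estimator presented in the talk is our assumption. A minimal sketch for square matrices:

```python
import numpy as np

def denoise_square_matrix(Y, sigma=None):
    """Optimal-hard-threshold SVD denoising for an n-by-n matrix
    (after Gavish & Donoho, 2014; non-square matrices use different
    constants). For Y = X + noise with i.i.d. entries of level sigma,
    keep only singular values above (4/sqrt(3)) * sqrt(n) * sigma;
    if sigma is unknown, use the median-based rule
    tau = 2.858 * median(singular values)."""
    n = Y.shape[0]
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    if sigma is not None:
        tau = (4 / np.sqrt(3)) * np.sqrt(n) * sigma
    else:
        tau = 2.858 * np.median(s)
    s_hat = np.where(s >= tau, s, 0.0)     # hard thresholding
    return U @ np.diag(s_hat) @ Vt
```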

13:00-13:30

A Tight Convex Upper Bound on the Likelihood of a Finite Mixture,
Elad Mezuman, IBM Research - Haifa

Abstract: The likelihood function of a finite mixture model is a non-convex function with multiple local maxima and commonly used iterative algorithms such as EM will converge to different solutions depending on initial conditions. In this work we ask: is it possible to find the global maximum of the likelihood?

Since the likelihood of a finite mixture model can grow unboundedly by centering a Gaussian on a single datapoint and shrinking the covariance, we constrain the problem by assuming that the parameters of the individual models are members of a large discrete set (e.g. estimating a mixture of two Gaussians where the means and variances of both Gaussians are members of a set of a million possible means and variances). For this setting we show that a simple upper bound on the likelihood can be computed using convex optimization and we analyze conditions under which the bound is guaranteed to be tight. This bound can then be used to assess the quality of solutions found by EM (where the final result is projected on the discrete set) or any other mixture estimation algorithm. We also present a convex estimation algorithm that works directly on the discrete set. Taken together, for any dataset our method allows us to find a finite mixture model together with a dataset-specific bound on how far the likelihood of this mixture is from the global optimum of the likelihood.

Joint work with Yair Weiss.
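
A sketch of the central idea as we read the abstract (the paper's exact bound and algorithm may differ): once the component parameters are restricted to a discrete candidate set, the log-likelihood is concave in the mixture weights over the whole set, so its maximum over the full simplex upper-bounds every K-component mixture built from that set.

```python
import numpy as np
from scipy.stats import norm

def mixture_likelihood_upper_bound(x, cand_means, cand_stds, iters=500):
    """Convex upper bound on the likelihood of any finite mixture whose
    Gaussian components come from a fixed discrete candidate set.

    With the component parameters fixed to the candidates, the
    log-likelihood L(w) = sum_i log(sum_k w_k p_k(x_i)) is concave in
    the mixture weights w on the simplex; maximizing it over *all*
    candidates upper-bounds every sparse (K-component) mixture from
    the set. EM updates on the weights alone converge toward this
    global maximum."""
    # P[i, k] = density of candidate component k at data point x_i
    P = norm.pdf(x[:, None], loc=cand_means[None, :],
                 scale=cand_stds[None, :])
    M = P.shape[1]
    w = np.full(M, 1.0 / M)
    for _ in range(iters):
        r = P * w                        # responsibilities, unnormalized
        r /= r.sum(axis=1, keepdims=True)
        w = r.mean(axis=0)               # EM update on weights only
    return np.sum(np.log(P @ w)), w
```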

13:30-14:30

Lunch

14:30-15:00

Context-Sensitive Lexical Similarity via Joint-Context and Embedding Models,
Ido Dagan, Bar-Ilan University

Abstract: Identifying similarities between word meanings is a fundamental task in natural language processing, which has been found useful for many applications. A prominent unsupervised learning approach for this task is distributional similarity: two words are regarded as similar if they tend to appear in similar lexical contexts. Recently, this approach has gained tremendous attention thanks to novel word-embedding methods, and their efficient implementations, which represent target and context words as continuous vectors.

In this talk we present two recent advancements in modeling distributional word similarity. First, we show how joint contexts can be represented effectively via Substitute Vectors, based on language models, yielding a more informative context representation than typical bag-of-words models. Second, we show how similarity can be measured in a context-sensitive manner, allowing us to predict different similarities for a target word depending on the particular context in which it appears. Our empirical results show that context-sensitive similarity is best modeled using substitute vectors, but can also be approximated by a simple computation over word-embedding vectors.

Joint work with Oren Melamud, Jacob Goldberger, Omer Levy, Idan Szpektor and Deniz Yuret.
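
To make the embedding-based route concrete, here is a sketch of one simple context-sensitive measure (an illustrative approximation we constructed, not the paper's exact formula): a good substitute for a target word in a given context should be similar both to the target and to the surrounding context words.

```python
import numpy as np

def cos(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def context_sensitive_similarity(vec, target, candidate, context_words):
    """Context-sensitive similarity over word embeddings. `vec` maps a
    word to its embedding vector, e.g. a dict of numpy arrays."""
    target_sim = cos(vec[candidate], vec[target])
    context_sim = np.mean([cos(vec[candidate], vec[c])
                           for c in context_words])
    # geometric-mean style combination of the two sources of evidence
    return float(np.sqrt(max(target_sim, 0.0) * max(context_sim, 0.0)))

# e.g. score "sharp" as a substitute for "bright" in "bright student":
# context_sensitive_similarity(vec, "bright", "sharp", ["student"])
```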

15:00-15:30

Robust Inference and Local Algorithms,
Yishay Mansour, Microsoft Research and Tel-Aviv University

Abstract: Robust inference is an extension of probabilistic inference, where some of the observations may be adversarially corrupted. We limit the adversarial corruption to a finite set of modification rules. We model robust inference as a zero-sum game between an adversary, who selects a modification rule, and a predictor, who wants to accurately predict the state of nature.

There are two variants of the model: one where the adversary must pick the modification rule in advance, and one where the adversary can select the modification rule after observing the realized uncorrupted input. For both settings we derive an efficient, near-optimal policy that runs in polynomial time. Our efficient algorithms are based on methodologies for developing local computation algorithms.

Based on joint works with Uriel Feige, Aviad Rubinstein, Robert Schapire, Moshe Tennenholtz, and Shai Vardi.
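
In its naive matrix form, the zero-sum game of the first paragraph can be solved by linear programming. The sketch below is our illustration only; the paper's local-computation algorithms are designed precisely to avoid materializing such a (potentially huge) matrix.

```python
import numpy as np
from scipy.optimize import linprog

def predictor_minimax(A):
    """Solve a zero-sum game in matrix form.

    A[i, j] = predictor's expected loss when the predictor plays pure
    strategy i and the adversary plays modification rule j. Returns
    the predictor's minimax mixed strategy and the game value."""
    m, n = A.shape
    # variables: p_1..p_m (mixed strategy) and v (game value); minimize v
    c = np.r_[np.zeros(m), 1.0]
    A_ub = np.c_[A.T, -np.ones(n)]   # for every rule j: (A^T p)_j <= v
    b_ub = np.zeros(n)
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)   # sum_i p_i = 1
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    return res.x[:m], res.x[-1]
```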

15:30-15:45

Best Student Paper Award

15:45-16:15

Machine Learning Building Blocks,
Shai Fine, Intel

Abstract: Big Data analytics is attracting more interest than ever before. This, in turn, creates a flood of innovative ideas, problems, and tasks to handle. It also poses challenges for technologists who strive to keep pace with the explosion of new algorithmic and modelling toolsets and to provide relevant and competitive solutions.

The goal of this work is to help close this gap. To this end, we will introduce the concept of Machine Learning Building Blocks: a finite set of elements that can be mapped to hardware and software primitives and patterns. We will provide some intuition for the definition of the basic building blocks, along with specific examples of the mapping to commonly used algorithms and modeling techniques, data characteristics, and usage scenarios.

Next, we will present the design of a machine learning benchmark suite that provides comprehensive coverage of selected building blocks. The novel construction is based on a selection of representative algorithms, real and synthesized data sets, and activation parameters.

We will conclude with a few examples that demonstrate the utility of this approach for performance analysis.
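
A minimal sketch of what such a benchmark harness might look like (the block names and kernels are our illustrations, not the suite's actual selection): each building block is a callable kernel run on synthesized inputs of several sizes, reporting best-of-N wall-clock time.

```python
import time
import numpy as np

def softmax_rows(X):
    Z = np.exp(X - X.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)

def _timed(kernel, X):
    t0 = time.perf_counter()
    kernel(X)
    return time.perf_counter() - t0

def benchmark(blocks, sizes, repeats=5):
    """Run each named building-block kernel on synthesized data of
    several sizes; report the best-of-N wall-clock time per (block,
    size) pair."""
    rng = np.random.default_rng(0)
    results = {}
    for name, kernel in blocks.items():
        for n in sizes:
            X = rng.standard_normal((n, 64))
            results[(name, n)] = min(_timed(kernel, X)
                                     for _ in range(repeats))
    return results

# illustrative blocks standing in for the suite's representative kernels
blocks = {"gram_matrix": lambda X: X @ X.T, "row_softmax": softmax_rows}
print(benchmark(blocks, sizes=[256, 1024]))
```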

16:15-16:30

Closing Remarks,
Aya Soffer, IBM Research - Haifa

16:30-18:00

Poster Session