Machine Learning Seminar 2014
Sunday, November 23rd, 2014
IBM Research - Haifa, Israel
Program
09:15-10:00
Registration 

10:00-10:15
Opening Remarks
10:15-10:45
Multi-Modal Models for In-Depth Computational Semantics
Abstract: Vector space models (VSMs), particularly those based on neural embeddings, have led to significant progress in the modeling of natural language semantics over the past few years. In the standard setup of this problem, vector representations of words, phrases and even full sentences are automatically acquired from large textual corpora (e.g., Wikipedia) and are used for predicting human-based conceptual similarity. Recently, it has been established that combining information from multiple modalities, especially textual and visual, can enhance VSM quality, supporting the intuition that conceptual meaning is inherently linked to multiple sensory modalities and is not only linguistic in nature.
In this talk I will describe two inquiries into this standard setup. I will first address the question of utilizing visual information for the modeling of abstract concept meaning. Not surprisingly, the positive impact of such information on the expressive power of VSMs has only been demonstrated for concrete concepts (e.g., penguin, table), while no impact has been demonstrated on the vast majority of linguistic concepts (both verbal and nominal) that are known to be abstract (e.g., love, war). I will present models that can integrate information from multiple modalities for improved abstract concept modeling, and analyze their expressive power and limitations. I will then present an analysis of the leading data sets for the conceptual similarity task and demonstrate that the human ratings they contain strongly correlate with conceptual association (e.g., Freud and psychology) rather than similarity (e.g., car and train). To compensate for this bias, I will describe SimLex-999, a new gold standard in which word pairs are judged for similarity rather than association. An experimental study demonstrates that existing VSMs differ substantially in their ability to model these two semantic qualities, although their objectives were not designed to prefer either of them. I will conclude with a list of open questions which will demonstrate that, despite the substantial progress VSMs have brought to computational semantics, we are still far from capturing the richness of human language meaning.
10:45-11:45
Keynote: Scaling and Generalizing Variational Inference
Abstract: Latent variable models have become a key tool for the modern statistician, letting us express complex assumptions about the hidden structures that underlie our data. Latent variable models have been successfully applied in numerous fields, including natural language processing, computer vision, population genetics, and many others.
The central computational problem in latent variable modeling is posterior inference, the problem of approximating the conditional distribution of the latent variables given the observations. Inference is essential to both exploratory and predictive tasks. Modern inference algorithms have revolutionized Bayesian statistics, revealing its potential as a usable and general-purpose language for data analysis. Bayesian statistics, however, has not yet reached this potential. First, statisticians and scientists regularly encounter massive data sets, but existing algorithms do not scale well. Second, most approximate inference algorithms are not generic; each must be adapted to the specific model at hand. This requires significant model-specific analysis, which precludes us from easily exploring a variety of models. In this talk I will discuss our recent research on addressing these two limitations. First I will describe stochastic variational inference, an approximate inference algorithm for handling massive data sets. Stochastic inference is easily applied to a large class of Bayesian models, including topic models, time-series models, factor models, and Bayesian nonparametric models. Then I will discuss black box variational inference, a generic algorithm for approximating the posterior. We can use black box inference on many models with little model-specific derivation. Together, these algorithms make Bayesian statistics a flexible and practical tool for modern data analysis. This is joint work based on these two papers:

11:45-12:00
Break 
12:00-12:30
Online Principal Component Analysis
Abstract: We consider the online version of the well-known Principal Component Analysis (PCA) problem. In standard PCA, the input to the problem is a set of d-dimensional vectors x_1, ..., x_n and a target dimension k < d; the output is a set of k-dimensional vectors y_1, ..., y_n that best capture the top singular directions of the original vectors. In the online setting, the vectors x_t are presented to the algorithm one by one, and for every presented x_t the algorithm must output a vector y_t before receiving x_{t+1}.
We present the first approximation algorithms for this setting of online PCA. Our algorithm produces vectors of dimension k * poly(1/\epsilon) whose quality admits an additive \epsilon approximation to the optimal offline solution that is allowed to use k dimensions.
12:30-13:00
Probabilistic Graphical Models of Dyslexia
Abstract: Reading is a complex cognitive process, errors in which may assume diverse forms. To capture the complex structure of reading errors, a novel way of analyzing reading errors made by dyslexic people is proposed; it is based on probabilistic graphical models. The talk focuses on three questions: (a) which graphical model best captures the hidden structure of reading errors; (b) whether a graphical model can diagnose dyslexia in a way that closely matches expert diagnoses; and (c) how statistical models can support arguments in the debate about the definition and heterogeneity of dyslexia. I will show that a Naive Bayes model best agrees with labels given by clinicians and can therefore be used to automate the diagnosis process. An LDA-based model best captures patterns of reading errors and could therefore contribute to the understanding of dyslexia and to the diagnostic procedure. Finally, results on individuals' data clearly support a model assuming multiple dyslexia subtypes.
This is joint work with Yair Lakretz, Gal Chechik and Naama Fridman.
13:00-13:30
Single Sensory and Multisensory Information Processing for Internet of Things
Abstract: There will be over 50 billion connected "things" in the year 2020. Intel's Cloud Internet of Things Analytics Platform is designed to greatly reduce the complexity of ingesting and processing the massive amounts of data generated in IoT scenarios. Its vision includes collecting data from numerous devices and sensors and storing it in a cloud. In this talk we will describe innovative algorithms for single sensory and multisensory information processing, including sensor type determination and prototyping, followed by information-theoretic multisensory change detection and One-Class SVM-based anomaly detection.

13:30-14:45
Lunch Break 
14:45-15:15
SystemML: A Declarative Machine Learning System
15:15-15:45
Inference by Randomly Perturbing Max-Solvers
Abstract: Modern inference problems can be increasingly understood in terms of discrete structures such as arrangements of objects in computer vision, parses in natural language processing or molecular structures in computational biology. In a fully probabilistic treatment, all possible alternative assignments are considered; thus, sampling from traditional structured probabilistic models may be computationally expensive for many machine learning applications. These computational difficulties are circumvented with a variety of optimization techniques that provide max-solvers to predict the most likely structure.
In this talk I will present a new approach to relax the exponential complexity of probabilistic reasoning in structured models while relying on efficient predictions under random perturbations. This approach leads to a new inference framework that is based on probability models that measure the stability of the prediction to random changes of the structures' scores.
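As a toy illustration of the perturb-then-maximize idea in this abstract, the sketch below uses the classical Gumbel-max identity: adding independent Gumbel noise to every score and taking the argmax yields an exact sample from the Gibbs distribution proportional to exp(score). It assumes a tiny, explicitly enumerated set of three structures, so it is not the speaker's framework, which targets structured models where such enumeration is infeasible. The names max_solver, perturb_and_max and gibbs_probabilities are hypothetical.

```python
import math
import random
from collections import Counter

def max_solver(scores):
    """Return the index of the highest-scoring structure (the MAP prediction)."""
    return max(range(len(scores)), key=lambda i: scores[i])

def gumbel_noise(rng):
    """Draw one standard Gumbel sample by inverse-CDF sampling: -log(-log(U))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return -math.log(-math.log(u))

def perturb_and_max(scores, rng):
    """Perturb every score with independent Gumbel noise, then call the max-solver."""
    perturbed = [s + gumbel_noise(rng) for s in scores]
    return max_solver(perturbed)

def gibbs_probabilities(scores):
    """Exact probabilities proportional to exp(score), for comparison only."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)            # fixed seed so the run is repeatable
scores = [1.0, 0.0, -1.0]         # assumed toy scores for three structures
n_samples = 20000

counts = Counter(perturb_and_max(scores, rng) for _ in range(n_samples))
empirical = [counts[i] / n_samples for i in range(len(scores))]
exact = gibbs_probabilities(scores)

for i, (e, x) in enumerate(zip(empirical, exact)):
    print(f"structure {i}: empirical {e:.3f}, exact {x:.3f}")
```

Each sample costs one call to the max-solver, which is the appeal of the approach; the hard part, not shown here, is what happens when noise can only be added to a small number of score components.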
15:45-16:15
Fully Unsupervised Ranking and Ensemble Learning, or How to Make Good Decisions When You Know Nothing
Abstract: In various decision-making problems, one is given the advice or predictions of several experts of unknown reliability, over multiple questions or queries. This scenario is different from the standard supervised setting, where classifier accuracy can be assessed using available labeled training or validation data, and raises several questions: Given only the predictions of several classifiers of unknown accuracies, over a large set of unlabeled test data, is it possible to a) reliably rank them, and b) construct a meta-classifier more accurate than any individual classifier in the ensemble?
In this talk we'll show that under standard independence assumptions on classifier errors, this high-dimensional data hides a simple low-dimensional structure. We then present a spectral approach to address the above questions, and derive a new unsupervised spectral meta-learner (SML). We illustrate the competitive advantage of our approach on both simulated and real data, showing its robustness even in practical cases where some of the model assumptions are not precisely satisfied. Joint work with Fabio Parisi, Francesco Strino and Yuval Kluger (Yale) and with Ariel Jaffe (WIS).
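One possible reading of the low-dimensional structure mentioned in this abstract can be sketched on simulated data. The sketch assumes binary (+1/-1) labels, roughly balanced classes, and classifiers that err independently given the true label; under those assumptions the off-diagonal entries of the classifiers' covariance matrix are approximately those of a rank-one matrix whose leading eigenvector is proportional to (2 * accuracy - 1). The diagonal-refilling loop is a simple heuristic chosen for this illustration, not the speakers' estimator, and every function name and accuracy value is hypothetical. True labels are used only to simulate the data and to score the result.

```python
import random

def simulate(accuracies, n_items, rng):
    """Hidden +/-1 labels plus, per classifier, predictions correct with the given probability."""
    labels = [rng.choice((-1, 1)) for _ in range(n_items)]
    preds = [[y if rng.random() < acc else -y for y in labels]
             for acc in accuracies]
    return labels, preds

def covariance(preds):
    """Sample covariance matrix of the classifiers' predictions (no labels needed)."""
    m, n = len(preds), len(preds[0])
    means = [sum(p) / n for p in preds]
    centered = [[x - mu for x in p] for p, mu in zip(preds, means)]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
             for j in range(m)] for i in range(m)]

def leading_eigenpair(matrix, n_iter=200):
    """Power iteration; returns (eigenvalue magnitude, unit eigenvector)."""
    m = len(matrix)
    vec = [1.0 / m ** 0.5] * m
    value = 0.0
    for _ in range(n_iter):
        prod = [sum(matrix[i][j] * vec[j] for j in range(m)) for i in range(m)]
        value = sum(x * x for x in prod) ** 0.5
        vec = [x / value for x in prod]
    return value, vec

def rank_one_direction(cov, n_rounds=50):
    """Fit a rank-one matrix to the off-diagonal of cov by repeatedly refilling the diagonal."""
    m = len(cov)
    work = [row[:] for row in cov]
    for i in range(m):
        work[i][i] = 0.0  # the diagonal does not follow the rank-one pattern
    for _ in range(n_rounds):
        value, vec = leading_eigenpair(work)
        for i in range(m):
            work[i][i] = value * vec[i] ** 2
    _, vec = leading_eigenpair(work)
    if sum(vec) < 0:  # fix the sign, assuming most classifiers beat chance
        vec = [-x for x in vec]
    return vec

def accuracy(pred, labels):
    return sum(p == y for p, y in zip(pred, labels)) / len(labels)

rng = random.Random(0)
true_accuracies = [0.90, 0.82, 0.74, 0.66, 0.58]  # assumed, unknown to the method
labels, preds = simulate(true_accuracies, 8000, rng)

weights = rank_one_direction(covariance(preds))
ranking = sorted(range(len(weights)), key=lambda i: -weights[i])

# Weighted vote using the eigenvector entries as weights.
meta = [1 if sum(w * p[t] for w, p in zip(weights, preds)) >= 0 else -1
        for t in range(len(labels))]

individual = [accuracy(p, labels) for p in preds]
meta_accuracy = accuracy(meta, labels)
print("ranking (best first):", ranking)
print("best individual accuracy:", max(individual))
print("weighted-vote accuracy:", meta_accuracy)
```

The ranking and the vote weights come from the unlabeled predictions alone. The diagonal heuristic can become unreliable when a single classifier dominates the ensemble, which is one reason to treat this as a sketch.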
16:15-16:30
Closing Remarks
16:30-17:30