IBM Leadership Seminars


Information Retrieval/Search Technologies Seminar 2008
December 16, 2008
Organized by IBM Haifa Research Lab

Abstracts

The Social Media Opportunity for Multimedia Search
Mor Naaman, Rutgers University

Community-contributed collections of media on the web are becoming a vast, rich resource for images and video on a long-tailed array of topics. These multimedia resources present a new opportunity for multimedia search and retrieval -- but also pose new challenges. I will describe some initial exploration into turning social media content into a data source for image search. Using a combination of context- and content-based tools, we generate representative sets of images for location-driven features and landmarks, a common search task. To do that, we use location and other metadata, as well as tags associated with images, and the images' visual features. Finally, I show how a similar approach guides our effort in providing access to a different type of multimedia content: live music concert videos.


Automatic Personalization Using Implicit Social Graph Structure
Emil Ismalon, Collarity

As much as there is apparent benefit in utilizing the social graph to improve the relevancy of search results and recommendations, we view the social structure as an approximation to an optimal structure that maximizes information flow. We developed a method for constructing an optimized graph structure based on profiling users' interactions. In recently conducted research, we show significant improvement in all chosen relevancy metrics over some of the most popular search engines. We believe that our approach can result in a fundamental change in the way people navigate and share information on the web.


Public vs. Private - Comparing Public Social Network Information with Email
Ido Guy, IBM Haifa Research Lab

The goal of this research is to facilitate the design of systems which will mine and use sociocentric social networks without infringing privacy. We describe an extensive experiment we conducted within our organization comparing social network information gathered from various intranet public sources with social network information gathered from a private source - the organizational email system. We also report the conclusions of a series of interviews we conducted based on our experiment. The results shed light on the richness of public social network information, its characteristics, and added value over email network information.


A Social Aspect of Person Name Disambiguation in the Web
Ron Bekkerman, HP Labs

When you query a search engine for "Michael Jordan", you wish to obtain information about a particular person you have in mind, and not about his namesakes. Even if you provide additional information, such as "Michael Jordan, Professor", you will find a CS Professor, an English Professor, and an Orthopedic Surgeon. We notice that the problem can be successfully solved by seeking a few people who belong to the same social network: indeed, two namesake Michael Jordans are unlikely to both have an acquaintance named Tom Mitchell. Given search results on a few people from one social network, our goal is to construct a set of web pages ("the core") that refer to the people of our interest, rather than to their unrelated namesakes.

The problem can be tackled from two different perspectives: either mining the content of the web pages, or analyzing their link structure and determining their proximities in the web graph. First, we cluster the web pages and use a heuristic to choose one cluster that contains the pages of interest. We also propose a simple link structure method that matches hyperlinks of the original web pages. To establish a strong baseline, we use a hybrid of those two methods. Next, we expand our hyperlink matching method by performing heuristic search in the web graph. Finally, we propose a better content mining method, called One-Class Co-Clustering, which - combined with our link structure analysis method - obtains state-of-the-art results.
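
As a rough illustration of the hyperlink-matching idea, the sketch below keeps a page in the core when it shares a web domain (its own or one it links to) with a page retrieved for a different person from the same social network. The abstract does not spell out the matching rule, so the shared-domain criterion, the function names, and the sample URLs are all assumptions made for this example, not the authors' method.

```python
from urllib.parse import urlparse


def domain(url):
    """Return the lowercased host part of a URL ('' if it has none)."""
    return urlparse(url).netloc.lower()


def link_matched_core(results):
    """Select pages that are link-connected to pages of *other* people.

    results: dict mapping a person's name to a list of
             (page_url, [outgoing_link_urls]) tuples.
    Assumption: two pages "match" when their domain sets intersect.
    """
    # Describe each page by its own domain plus the domains it links to.
    pages = []
    for person, page_list in results.items():
        for url, links in page_list:
            domains = {domain(u) for u in [url, *links]}
            domains.discard("")  # ignore URLs without a host
            pages.append((person, url, domains))

    core = set()
    for i, (person_a, url_a, dom_a) in enumerate(pages):
        for person_b, url_b, dom_b in pages[i + 1:]:
            # Only matches across different people count as evidence.
            if person_a != person_b and dom_a & dom_b:
                core.add(url_a)
                core.add(url_b)
    return core


# Hypothetical search results for two people from one social network.
sample_results = {
    "Michael Jordan": [
        ("http://cs.example.edu/~jordan", ["http://ml.example.org/people"]),
        ("http://nba.example.com/jordan", ["http://sports.example.com/"]),
    ],
    "Tom Mitchell": [
        ("http://ml.example.org/mitchell", ["http://cs.example.edu/dept"]),
    ],
}
core_pages = link_matched_core(sample_results)
```

With this sample data, the professor's page and Tom Mitchell's page link into each other's domains and land in the core, while the basketball page shares no domain with anyone else's pages and is left out.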

Based on WWW-05, IJCAI-07, and EMNLP-08 papers. Joint work with Andrew McCallum, Shlomo Zilberstein, James Allan, and Koby Crammer.


Textual Entailment as a Framework for Semantic Information Access
Ido Dagan, Bar-Ilan University

Semantic information access has long been a rather vague, yet desperately sought-after, capability. Possibly the primary obstacle has been the lack of a clear definition of what "semantic" processing of textual information means, which has led to scattered and unfocused development of a myriad of seemingly relevant technologies. Textual Entailment is a recent paradigm we proposed to capture generic computational needs for text understanding. In this talk we will motivate these needs, present the textual entailment definition, and show how it captures semantic inferences required by information access applications. The talk will review the general components we are developing for a textual entailment "engine", addressing the major aspects of semantic variability and ambiguity, and suggest how the engine can be embedded under a common interface to provide semantic processing for applications.


How to Get a Good Answer Online?
Daphne Raban, School of Management and the Center for the Study of the Information Society, University of Haifa

One way to search for information online is to seek the help of others by posting a question in a question and answer (Q&A) web site. Q&A sites are places where users ask questions and others, experts or just anyone, may answer them. In a series of studies I describe several aspects of research trying to find an answer to the question: How to get a good answer online?

The first study investigates predictors of answer quality through a comparative, controlled field study of responses provided across several online Q&A sites. The main conclusions were that you get what you pay for in Q&A sites and that a community of users contributes to a site's success. The second study explains the incentive structure in a mixed economic and social market for information, showing that the market is catalyzed by social activity, not cannibalized by it, as may have been predicted by theory. The third study suggests that only implicit expressions of self-presentation are related to the provision of social and monetary feedback, ratings, and tips. The overall insight from all three studies is that social activity and personal behaviors are strongly related to economic activity and gains. Through this research the term 'social capital' assumes a literal meaning.


Keynote: The Future of Information Discovery: Designing to Support Social Media and Exploratory Search
Ben Shneiderman, University of Maryland

Current search strategies for web sites and databases deliver great service to users with basic search needs. Innovative designers have turned to supporting exploratory search processes that lead to important discoveries over weeks and months. They are developing collaborative search and social networking strategies to harness the collective intelligence of domain experts. Advanced visualization provides overviews that support sensemaking, while advanced interfaces enable systematic, yet flexible, exploration.


Searching the Social Web - the Challenges of Socially-Connected Search
Ofer Egozi, Delver

The vast amounts of information encoded in the social graph may hold the key to the next major leap in web search relevance. A search engine that takes into account the social context of the searcher and the content generated in that context can yield results that are significantly more relevant and meaningful to that specific user. Transforming this social information into a ranking algorithm is a challenging task, and demands breaking some common basic assumptions in search engineering. I will introduce the general concepts of the social graph and approaches to social search, and present the main algorithmic and engineering challenges in building a socially-connected search engine at Delver.


Social Search and Discovery using a Unified Approach
Sivan Yogev/Nadav Har'El, IBM Haifa Research Lab

This research explores new ways of augmenting search and discovery of relations between Web 2.0 entities using multiple types and sources of social information. Our goal is to allow searching for all object types, such as documents, persons, and tags, while also retrieving related objects of all types. To realize this goal, we implemented a social search engine using a unified approach. In this approach, the search space is expanded to represent heterogeneous information objects that are interrelated by several relation types. We propose a novel solution based on multifaceted search, which provides an efficient update mechanism for relations between objects, as well as efficient search over the heterogeneous data. We describe a social search engine positioned within a large enterprise, applied over social data gathered from several Web 2.0 applications. We conducted a large user study with over 600 people to evaluate the contribution of social data to search. Our results demonstrate the high precision of social search results and confirm the strong relationship of users and tags to the topics being retrieved.
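
To make the idea of a unified search space more concrete, here is a minimal sketch in which documents, persons, and tags live in one index and are connected by typed, weighted relations; a text query matches documents, and related objects of other types are scored through those relations. This is only a toy illustration under stated assumptions: the class name SocialIndex, the "type:name" identifier scheme, and the scoring rule are hypothetical, and the sketch does not reproduce the multifaceted-search implementation described in the abstract.

```python
from collections import defaultdict


class SocialIndex:
    """Toy unified index over heterogeneous objects (assumed design)."""

    def __init__(self):
        self.doc_terms = {}                  # document id -> set of terms
        self.relations = defaultdict(list)   # object id -> [(other id, type, weight)]

    def add_document(self, doc_id, text):
        self.doc_terms[doc_id] = set(text.lower().split())

    def relate(self, a, b, rel_type, weight=1.0):
        # Relations are stored in both directions so either end can be reached.
        self.relations[a].append((b, rel_type, weight))
        self.relations[b].append((a, rel_type, weight))

    def search(self, query):
        """Return {object type: [(object id, score), ...]}, best first."""
        terms = set(query.lower().split())
        # Direct matches: score a document by the number of shared terms.
        doc_scores = {
            doc: len(terms & words)
            for doc, words in self.doc_terms.items()
            if terms & words
        }
        # Related objects inherit score from the documents they connect to.
        related = defaultdict(float)
        for doc, score in doc_scores.items():
            for other, _rel_type, weight in self.relations.get(doc, []):
                if other not in doc_scores:
                    related[other] += score * weight

        grouped = defaultdict(list)
        for obj, score in list(doc_scores.items()) + list(related.items()):
            obj_type = obj.split(":", 1)[0]  # ids look like "person:alice"
            grouped[obj_type].append((obj, score))
        for hits in grouped.values():
            hits.sort(key=lambda hit: hit[1], reverse=True)
        return dict(grouped)


# Hypothetical data: two documents, their authors, and one tag.
index = SocialIndex()
index.add_document("doc:1", "social search engine")
index.add_document("doc:2", "cooking recipes")
index.relate("doc:1", "person:alice", "author")
index.relate("doc:1", "tag:search", "tagged")
index.relate("doc:2", "person:bob", "author")
hits = index.search("social search")
```

A query for "social search" then returns the matching document together with its author and tag, each grouped under its own object type, while objects related only to the non-matching document are not returned.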