Watson and Jeopardy!
What did IBM and Jeopardy! announce on 4/27?
IBM unveiled details of an advanced computing system that will be able to compete with humans at the game of Jeopardy!. Additionally, officials from Jeopardy! announced plans with IBM to produce a human vs. machine contest on the renowned quiz show.
What is Watson?
For nearly two years IBM scientists have been working on a highly advanced Question Answering (QA) system, codenamed "Watson." The scientists believe that the computing system will be able to understand complex questions and answer with enough precision, confidence, and speed to compete on Jeopardy!
What kind of technology is Watson based on?
Watson is an application of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, and Machine Learning technologies to the field of open-domain question answering. At its core, Watson is built on IBM's DeepQA technology for hypothesis generation, massive evidence gathering, analysis, and scoring.
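The stages named above (hypothesis generation, evidence gathering, scoring) can be pictured with a toy sketch. This is not IBM's DeepQA implementation: every function name, the capitalized-word heuristic for proposing candidates, and the word-overlap score are assumptions chosen only to make the shape of such a pipeline concrete.

```python
from dataclasses import dataclass

PUNCT = ".,;:!?\"'()"


@dataclass
class Hypothesis:
    answer: str
    score: float = 0.0


def tokens(text: str) -> list[str]:
    """Split on whitespace and strip surrounding punctuation."""
    return [w for w in (t.strip(PUNCT) for t in text.split()) if w]


def generate_hypotheses(question: str, passages: list[str]) -> list[Hypothesis]:
    """Toy hypothesis generation: every capitalized word not already in the question."""
    question_terms = {w.lower() for w in tokens(question)}
    found: list[str] = []
    for passage in passages:
        for word in tokens(passage):
            if word[0].isupper() and word.lower() not in question_terms and word not in found:
                found.append(word)
    return [Hypothesis(answer=w) for w in found]


def score_hypothesis(hyp: Hypothesis, question: str, passages: list[str]) -> float:
    """Toy evidence scoring: question terms shared with passages that mention the candidate."""
    question_terms = {w.lower() for w in tokens(question)}
    total = 0
    for passage in passages:
        passage_terms = {w.lower() for w in tokens(passage)}
        if hyp.answer.lower() in passage_terms:
            total += len(question_terms & passage_terms)
    return float(total)


def rank_answers(question: str, passages: list[str]) -> list[Hypothesis]:
    """Generate candidates, score each against the evidence, and sort best-first."""
    hypotheses = generate_hypotheses(question, passages)
    for hyp in hypotheses:
        hyp.score = score_hypothesis(hyp, question, passages)
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)


# Hypothetical three-passage corpus, for illustration only.
passages = [
    "Paris is the capital of France and lies on the Seine.",
    "Lyon is a city in France.",
    "Berlin is the capital of Germany.",
]
ranked = rank_answers("This capital of France lies on the Seine", passages)
print(ranked[0].answer)  # prints "Paris"
```

A real system would replace each of these stages with far deeper analysis; the sketch only shows candidates being proposed first and then ranked by how much supporting evidence they gather.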
What is the difference between Watson and IBM's overall QA research?
Watson, besides being the name of the founder of IBM and the name of the IBM T.J. Watson Research Laboratory, is the name of the computer system that will play Jeopardy! against humans in the planned contest. IBM's DeepQA research project and question answering technology are being used to create the Watson computer system.
What does it take to win at Jeopardy!?
Jeopardy! is a game covering a broad range of topics, such as history, literature, politics, arts and entertainment, and science. Jeopardy! poses a grand challenge for a computing system due to its broad range of subject matter, the speed at which contestants must provide accurate responses, and because the clues given to contestants involve analyzing subtle meaning, irony, riddles, and other complexities in which humans excel and computers traditionally do not.
To win, it takes the deep analysis of large volumes of content to deliver high accuracy, confidence and speed. The best Jeopardy! players, according to our analysis, provide correct, precise responses more than 85% of the time. They also "know what they don't know" and choose not to answer questions when they are unsure, since there is a penalty for being wrong.
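The trade-off between answering and abstaining can be sketched as a simple expected-value calculation. This is an illustrative simplification, not a description of how Watson or any contestant actually decides. It assumes that a correct response wins the clue's value, an incorrect one loses the same amount, and abstaining scores zero. The function names and example numbers are hypothetical.

```python
def expected_gain(confidence: float, clue_value: int) -> float:
    """Expected score change from answering: +value if right, -value if wrong."""
    return confidence * clue_value - (1.0 - confidence) * clue_value


def should_answer(confidence: float, clue_value: int) -> bool:
    """Answer only when the expected gain beats abstaining, which scores 0."""
    return expected_gain(confidence, clue_value) > 0.0


# At 85% confidence on a 400-point clue the expected gain is about +280 points,
# so answering pays; at 40% confidence it is about -80 points, so abstaining is better.
print(should_answer(0.85, 400), should_answer(0.40, 400))  # prints "True False"
```

Under these assumptions the break-even point is a confidence of 50%. That threshold only means something if the confidence estimate itself is accurate, which is why "knowing what you don't know" matters as much as knowing the answer.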
How good is Watson?
Watson will not be able to answer every possible Jeopardy! question. However, IBM has established a promising approach and is developing a robust and fast implementation. We expect to reach competitive levels of human performance, but we still have a way to go and will partner with interested universities to push the envelope.
Is web search enough to win Jeopardy!?
A web search engine has access to an immense source of information and can quickly find relevant web pages given a small number of query terms. To play Jeopardy!, however, you must generate precise answers to the questions. A web search engine doesn't return precise answers; rather, it is designed to return a ranked list of web pages the user may be trying to find. To answer a Jeopardy! clue you must go beyond the search result page, dig into documents that are likely to contain the answer, fetch them, read them, and locate the precise and correct answer within them.
Even finding the relevant documents is a challenge, because you must first choose the right set of keywords to retrieve them. Many Jeopardy! clues contain information that is not critical for answering the clue but is provided for educational and/or entertainment purposes. Moreover, the clues may use terms different from those used in answer-bearing documents. As a result, formulating an effective query that homes in on the relevant documents is a critical and non-trivial task.
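As a rough illustration of the query-formulation problem, the sketch below drops uninformative words from a clue and adds alternative wordings for the terms that remain. The stopword list, the synonym table, and the function name are all hypothetical; nothing here reflects how Watson builds its queries.

```python
PUNCT = ".,;:!?\"'()"
# Hand-written stopword list, assumed for this example only.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "this", "was"}


def build_query(clue: str, synonyms: dict[str, list[str]]) -> list[str]:
    """Drop stopwords and add alternative wordings for the remaining terms."""
    terms: list[str] = []
    for token in clue.lower().split():
        word = token.strip(PUNCT)
        if not word or word in STOPWORDS:
            continue
        # Keep the original term, then any alternative wordings supplied for it.
        for term in [word, *synonyms.get(word, [])]:
            if term not in terms:
                terms.append(term)
    return terms


query = build_query("This playwright penned the play Hamlet", {"penned": ["wrote"]})
print(query)  # prints ['playwright', 'penned', 'wrote', 'play', 'hamlet']
```

Adding "wrote" alongside "penned" addresses the vocabulary mismatch described above: a document that contains the answer may never use the clue's own wording.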
Even if you think you've found an answer to the question, there's still the issue of confidence. Remember, you are penalized for giving a wrong answer, so you must have enough confidence in your answer to decide you even want to attempt answering the question. To obtain this confidence you need to read enough supporting text around the answer to convince yourself that you have found the correct answer. You may even want to see the answer justified by multiple sources, especially if the penalty for getting it wrong is significant.
And then there are the clues with answers that must be synthesized — the answer is a list of items or a logical combination of two or more items. These answers do not appear in any one place. Rather, you must synthesize them from independent sources to form your final answer.
On top of all this you must be ready with a confident answer in a matter of seconds after receiving the clue.
The bottom line is that the Jeopardy! Challenge poses a different kind of problem than what is solved by web search. It demands that the computer deeply analyze the question to figure out exactly what is being asked, deeply analyze the available content to extract precise answers, and quickly compute a reliable confidence in light of whatever supporting or refuting information it finds. IBM believes that an effective and general solution to this challenge can help drive the broader impact of automatic question answering in science and the enterprise.
What data is stored in Watson?
All of Watson's data will be self-contained. Watson will perform without a connection to the web or any external resource. The vast majority of Watson's data will be a wide variety of natural language text. Some structured data (formal knowledge bases) and semi-structured data (tagged text) are also included, mostly to help interpret text and refine answers. Exactly which data will be used for competing on Jeopardy! will be revealed at a later date, but the specific content and how to analyze and manage it are part of the research agenda.
How can you find all these answers without being connected to the Internet?
Watson will not have enough data to answer every possible Jeopardy! question in its self-contained memory, nor can it possibly predict the questions it will get. In this sense it has the same limitations as do the best human contestants. The entire Watson computer system will be self-contained and on stage as are the human contestants – no external connections, no life-lines – what you see is what you get. The purpose of this technology showcase is to demonstrate the system's ability to deeply analyze the data it does have and to compute accurate confidences based on supporting or refuting natural language evidence. Think of it as if Watson has read a lot of books and in real time relates what it read to the question to find and support the right answers.
Will Watson be able to answer Audio/Visual clues?
While IBM is developing a host of advanced technologies for audio, image and video analysis, the first time around we are focusing squarely on the natural language understanding challenge. Future applications of the technology are planned to incorporate audio, image and video capabilities.
How can Watson handle the puns and wordplay that occur in Jeopardy!?
By reading many, many texts Watson learns how language is used. That means it learns the context and associations of words, which allows it to deal with some of the wordplay we find in Jeopardy!. But what is special about Watson is that it is also able to produce a confidence in its answer. If that confidence is low, it realizes that it may not understand the question; perhaps there is a pun or something else it's not getting. On the other hand, if the confidence is high, it knows it likely understood the question and stands a good chance of getting that question right.
When will the contest take place on Jeopardy! and who specifically will Watson play against?
The first step is for Watson to demonstrate its worthiness to play against champions by competing in a series of sparring matches starting sometime this year. The date and specific contestants for the final match have not been decided. When the date is set and the contestants are decided, Jeopardy! and IBM will make a public announcement.
On what kind of computer does this run?
To achieve the levels of accuracy, confidence, and speed required by the Jeopardy! Challenge, a massively parallel high-performance computing platform, such as BlueGene, may be used. The system can be scaled up or down depending on different application requirements.
Why should it surprise me that a computer might beat a human at Jeopardy!? Can't supercomputers do this today?
While computers can store and deliver a wealth of digital content created by humans, they are unable to operate over it in human terms. The quest for building a computer system that can do open-domain Question Answering is ultimately driven by a broader vision that sees computers operating more naturally in human terms rather than strictly computer terms. They should function in ways that understand complex information requirements, as people would express them, for example, in natural language questions or interactive human dialogs. Computers should deliver precise, meaningful responses, and synthesize, integrate, and rapidly reason over the breadth of human knowledge as it is most rapidly and naturally produced — in natural language text.
While competing at Jeopardy! is not the end-goal, it is a milestone in demonstrating a capability that no computer today exhibits — the ability to interact with humans in human terms over broad domains of knowledge.
Can other computer systems do this now? Why doesn't IBM take on another company's computer in this competition? It seems like that would be more challenging than taking on human contestants.
According to our analysis, two competing champion Jeopardy! players combined are able to answer 85%-90% of the questions over an incredibly broad domain of topics and accurately predict their own correctness with very high levels of confidence. And they do that in just seconds. IBM has not witnessed that level of performance by any other computer system to date. While other companies may be working on similar technologies, at this stage of the game the best human Jeopardy! players are the folks to beat.
Is this going to be like HAL in "2001: A Space Odyssey"?
Not exactly. The computer on Star Trek is a more appropriate comparison. The fictional computer system may be viewed as an interactive dialog agent that could answer questions and provide precise information on any topic. A primary goal for DeepQA is to greatly improve information seeking tasks over natural language content but ultimately, we would like to see the underlying technology help make computers more effective at communicating in human terms. Watson uses the DeepQA technology to push the envelope in natural language processing and automatic question answering. A powerful and fluent conversational agent, like the Star Trek computer, is a driving vision for this work.
How does Watson differ from Deep Blue?
Deep Blue demonstrated that computers can solve problems once thought the exclusive domain of human intelligence. The computer was tasked to evaluate a huge space of possible chess board configurations and use massive computing power to beat a grand master.
We are now at a similar juncture. This time it is about how vast quantities of digitally encoded unstructured information (e.g., natural language documents, corporate intranets, reference books, textbooks, technical reports, blogs, etc.) can be leveraged by computers to do what was once considered the exclusive domain of human intelligence: rapidly answer and rationalize open-domain natural language questions confidently, quickly and accurately.
What humans do naturally, computers find very difficult — deal with variety, ambiguity, subtlety, breadth and expressiveness of human language and meaning.
Watson faces a challenge that is entirely open-ended and defies the sort of well-bounded mathematical formulation that fits a game like Chess. Watson has to operate in the nearly limitless, ambiguous and highly contextual domain of human language and knowledge.
Watson is tasked to understand and answer human questions and to know when it does and doesn't know the answer — to assess its own knowledge and ability — something humans find relatively easy.
How does DeepQA compare with other work IBM has published in QA?
IBM Research has had small teams exploring Question Answering (QA) technologies for a number of years and as part of that has participated in the government's AQUAINT program and some of the annual TREC QA evaluations. This effort has led to many technical publications and has put IBM in a very good position to understand and advance the state-of-the-art in Question Answering.
However, for the Jeopardy! Challenge, the huge variety of question types and styles, the broad and varied range of topics, and the demand for high degrees of confidence and speed required a whole new approach to the problem. The DeepQA approach, while informed by IBM's prior work in QA, took a dramatic turn along a variety of algorithmic and engineering dimensions.
How has this work been influenced by the current state-of-the-art in QA?
DeepQA, of course, is deeply influenced by prior published work in Question Answering. While many prior techniques were not likely to work for the Jeopardy! Challenge, and many that we thought were likely to work did not pan out, many other existing technologies such as deep parsing and information extraction represent a core staple in QA and are used in DeepQA as described in the published literature or with some modifications.
How does DeepQA's approach compare to purely knowledge-based approaches?
Classic knowledge-based AI approaches to Question Answering (QA) try to logically prove an answer is correct from a logical encoding of the question and all the domain knowledge required to answer it. Such approaches are stymied by two problems: the prohibitive time and manual effort required to acquire massive volumes of knowledge and formally encode it as logical formulas accessible to computer algorithms, and the difficulty of understanding natural language questions well enough to exploit such formal encodings if available. Consequently they tend to falter in terms of breadth, but when they succeed they are very precise.
Even as the availability of structured knowledge grows, it pales in comparison to the amount of knowledge needed to answer broad-domain questions that is available as natural language text (e.g., texts of all kinds, web documents, reference books, novels, plays, encyclopedias, dictionaries, thesauri, textbooks, technical reports, etc.). Techniques for dealing with huge amounts of natural language text, such as Information Retrieval, suffer from nearly the opposite problem in that they can always find documents or passages containing some keywords in common with the query but lack the precision, depth, and understanding necessary to deliver correct answers with accurate confidences.
The DeepQA hypothesis is that by complementing classic knowledge-based approaches with recent advances in NLP, Information Retrieval, and Machine Learning to interpret and reason over huge volumes of widely accessible naturally encoded knowledge (or "unstructured knowledge"), we can build effective and adaptable open-domain QA systems. While such systems may not be able to formally prove an answer is correct in purely logical terms, they can build confidence using a combination of reasoning methods that operate directly on the raw natural language, on automatically extracted entities and relations, and on structured and semi-structured knowledge available from, for example, the Semantic Web.
How would a human use this QA technology in a business setting?
Beyond Jeopardy!, the challenge expands to demonstrating how core open-domain QA technologies can be quickly and effectively adapted to different business applications. These applications will demand a deep understanding of users' questions and analysis of huge volumes of natural language, structured and semi-structured content to rapidly deliver and justify precise, succinct, high-confidence answers.
DeepQA technology provides humans with a powerful tool for their information gathering and decision support. A typical scenario is for the end user to enter their question in natural language form, much as if they were asking another person, and for the system to sift through vast amounts of potential evidence to return a ranked list of the most compelling, precise answers. These answers include summaries of their justifying or supporting evidence, allowing the user to quickly assess the evidence and select the correct answer.
Business applications include Customer Relationship Management, Regulatory Compliance, Contact Centers, Help Desks, Web Self-Service, Business Intelligence, etc.
How does QA technology compare to document search?
The key difference between QA technology and document search is that document search takes a keyword query and returns a list of documents, ranked in order of relevance to the query, while QA technology takes a question expressed in natural language, seeks to understand it in much greater detail, and returns a precise answer to the question.
Although the Jeopardy! Challenge requires that a QA system return a single best answer, other business applications of QA technology typically admit returning a list of top answers along with supporting evidence, allowing the end user to assist in the final step of selecting the best answer. IBM's DeepQA technology extends this paradigm further by introducing confidence as a key feature. By calculating a meaningful confidence in each answer, DeepQA provides the end user with a significant, and often missing, element of the result, giving the user a more complete picture of the information they are using for their decision making.
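The two result styles described here, a ranked list with evidence for business use and a single best answer for Jeopardy!, can be sketched with one small data structure. The class, field names, threshold, and sample values below are hypothetical and are not part of any IBM API. Confidence is assumed to be a number between 0 and 1.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CandidateAnswer:
    text: str
    confidence: float  # assumed to lie in [0, 1]
    evidence: list[str] = field(default_factory=list)


def top_answers(candidates: list[CandidateAnswer], k: int = 3) -> list[CandidateAnswer]:
    """Business-style result: the k most confident answers, each with its evidence."""
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)[:k]


def single_best(candidates: list[CandidateAnswer], threshold: float = 0.5) -> Optional[CandidateAnswer]:
    """Jeopardy!-style result: one answer, or None when confidence is too low to attempt."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best if best.confidence >= threshold else None


# Made-up candidates and confidences, for illustration only.
candidates = [
    CandidateAnswer("Lyon", 0.05, ["Lyon is a city in France."]),
    CandidateAnswer("Paris", 0.90, ["Paris is the capital of France."]),
    CandidateAnswer("Berlin", 0.20, ["Berlin is the capital of Germany."]),
]
print([c.text for c in top_answers(candidates, k=2)])  # prints ['Paris', 'Berlin']
print(single_best(candidates).text)  # prints "Paris"
```

Keeping the evidence attached to each candidate lets a user check why an answer was proposed, while the confidence value lets the system decline to answer.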
How is IBM working with universities in the general field of Question Answering?
Early last year (2008), IBM and Carnegie Mellon University, along with several other universities doing research in this space, kicked off the "Open Advancement of Question Answering" (OAQA) initiative. This broad effort is intended to provide a foundation for effective collaboration among researchers to accelerate the science of automatic question answering. The initiative aims to develop common metrics, architectures, experimental methodologies, tools and driving challenge problems, like the Jeopardy! Challenge, to facilitate the collaborative and rapid advancement of the state-of-the-art in QA.
IBM intends to invite interested universities to collaborate on learning how to integrate their advanced QA component technologies and apply the key approaches in DeepQA (the technology underlying Watson) to different problems based on OAQA methodologies.
Does DeepQA use UIMA?
Yes. UIMA is a standard framework for building applications that perform deep analysis on unstructured content, including natural language text, speech, images and video. IBM contributed UIMA to open source (see the Apache UIMA web site) to help facilitate and accelerate work in deep content analytics. UIMA is also now an OASIS standard. UIMA-AS implements UIMA on asynchronous messaging middleware. DeepQA and the Watson system use UIMA-AS as their principal infrastructure for assembling, scaling out, and deploying all their analytic components.