Advancing AI with Project Debater

Since 2014, the IBM Project Debater team has released more than 30 technical papers and benchmark datasets across multiple research domains, with several more publications pending. The following resources highlight the scientific advances that drive many of Project Debater’s capabilities. Download the Project Debater datasets here.

Have a question for the Project Debater team? Click here.


Argument Mining

Claims and evidence are the main components of an argument; identifying and using them correctly are essential to framing an argument in a debate. The IBM Project Debater team has invested substantial effort in developing machine learning techniques to mine massive corpora for claims and evidence and use them to generate arguments relevant to a controversial topic.

Detecting claims in relevant documents

We were the first to define and implement the challenging task of detecting topic-related claims within unstructured text. Our method automatically pinpoints relevant claims within a set of documents that can be used to support or contest a given controversial topic. We accomplish this using a cascade of AI algorithms exploiting various linguistic features.

Context Dependent Claim Detection

Ran Levy et al.

COLING, 2014

Detecting evidence in relevant documents

We were also the first to define relevant-evidence detection as a task and to develop methods that accomplish it. Given a controversial topic and a claim, our method finds text segments in unstructured text from relevant documents that can serve as evidence supporting the claim. Our approach classifies three common evidence types: study, expert, and anecdotal.

Show Me Your Evidence - An Automatic Method for Context Dependent Evidence Detection

Ruty Rinott et al.

EMNLP, 2015

Negating claims

We developed an approach to automatically generate a meaningful negation to a given claim about a controversial topic. The algorithm has two parts: a rule-based approach to determine what constitutes an effective negation, then a statistical approach to determine when an automatically generated negation can plausibly be used.

Automatic Claim Negation: Why, How and When

Yonatan Bilu et al.

2nd Argument Mining Workshop, NAACL, 2015
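To make the two-stage idea concrete, here is a minimal, purely illustrative sketch of a rule-based negation stage. The auxiliary list and the fallback template are hypothetical editorial simplifications, not the published rules:

```python
def negate_claim(claim: str) -> str:
    """Rule-based negation sketch: negate a claim at its auxiliary verb,
    falling back to a generic template when no auxiliary is found."""
    auxiliaries = {"is": "is not", "are": "are not", "should": "should not",
                   "can": "cannot", "will": "will not"}
    tokens = claim.split()
    for i, tok in enumerate(tokens):
        if tok.lower() in auxiliaries:
            negated = auxiliaries[tok.lower()]
            if tok[0].isupper():  # preserve sentence-initial capitalization
                negated = negated.capitalize()
            return " ".join(tokens[:i] + [negated] + tokens[i + 1:])
    # No auxiliary found: wrap the claim in a generic negation template.
    return "It is not true that " + claim[0].lower() + claim[1:]

print(negate_claim("Video games should be banned"))
# -> "Video games should not be banned"
```

The statistical second stage described in the paper would then decide whether such an automatically generated negation is plausible enough to use.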

Synthesizing novel claims

It is one thing to detect claims included within relevant documents, and quite another to generate claims “de novo.” We developed a method to do this by “recycling” existing arguments. Fundamental text elements extracted from a database of argumentative text are combined to construct claims that are grammatically correct, meaningful, and relevant.

Claim Synthesis via Predicate Recycling

Yonatan Bilu and Noam Slonim

ACL, 2016

Detecting claims throughout a corpus

We were the first to expand claim detection methods beyond preselected relevant documents by developing a framework for unsupervised, corpus-wide claim detection. Our system can pinpoint claims in a huge corpus relying solely on linguistic cues that are inherent to natural language, eliminating the need for costly and time-consuming human annotation.

Unsupervised Corpus-Wide Claim Detection

Ran Levy et al.

4th Argument Mining Workshop, EMNLP, 2017
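As an illustration of how far cues inherent to natural language can go, the sketch below filters a corpus with a single hypothetical lexical cue: a sentence that mentions the topic's main concept and contains the word "that", which in ordinary prose frequently introduces a claim (as in "critics argue that ..."). This is an editorial simplification, not the published pipeline:

```python
def candidate_claim_sentences(sentences, topic_concept):
    """Unsupervised lexical filter sketch: keep sentences that mention the
    topic's main concept and contain the cue word "that"; treat the text
    after the cue as the candidate claim."""
    results = []
    for sent in sentences:
        lower = sent.lower()
        if topic_concept.lower() in lower and " that " in lower:
            claim = sent[lower.index(" that ") + len(" that "):]
            results.append((sent, claim))
    return results

hits = candidate_claim_sentences(
    ["Critics of zoos argue that zoos are cruel.",
     "The zoo opened in 1899."],
    "zoo")
# Only the first sentence survives, with "zoos are cruel." as the candidate claim.
```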

Improving corpus-wide claim detection

We are exploring how to use corpus-wide claim detection to develop an argumentative content search engine. We have obtained high-quality results using DNNs trained via weak supervision with automatically labeled data and no human intervention.

Towards an Argumentative Content Search Engine Using Weak Supervision

Ran Levy et al.

COLING, 2018 (to appear)

Assessing argumentation quality

With academic collaborators, we are researching ways to assess the quality of machine-generated arguments. We used existing theories and approaches to derive a systematic taxonomy for computational argumentation quality assessment. We also showed that quality assessments based on theory versus practice generally agree and support one another.

Computational Argumentation Quality Assessment in Natural Language

Henning Wachsmuth et al.

EACL, 2017

Argumentation Quality Assessment: Theory vs. Practice

Henning Wachsmuth et al.

ACL, 2017

Relating arguments across texts

To take full advantage of corpus-wide argumentation mining, a system needs to combine argument units from different texts. We designed a joint inference method for this task by modeling argument relation classification and stance classification jointly. To our knowledge, this is the first time joint inference has been used in this context.

Argument Relation Classification Using a Joint Inference Model

Yufang Hou and Charles Jochim

4th Argument Mining Workshop, EMNLP, 2017

Stance Classification and Sentiment Analysis

An automatic debating system must be able to identify whether an argument supports or contests a given topic. This is fairly easy for humans but difficult for machines, as it requires great sensitivity to the rich subtleties and nuances of natural language. We have made important progress in this intriguing line of research.

Determining expert opinion stance

Expert opinion is important evidence in constructing arguments, but its stance is often hard to determine from the text itself. We developed an innovative approach to this problem. By mining knowledge from Wikipedia with minimal human supervision, we developed a resource of over 100,000 experts and their stance toward over 100 controversial topics.

Expert Stance Graphs for Computational Argumentation

Orith Toledo-Ronen et al.

3rd Argument Mining Workshop, ACL, 2016

Determining claim stance

We designed a method to determine whether a given claim supports or contests a new controversial topic. Our model breaks down the complex cognitive process of determining stance into a sequence of simpler sub-tasks. We identified effective AI solutions to these sub-tasks, which can combine to predict claim stance with high precision.

Stance Classification of Context-Dependent Claims

Roy Bar-Haim et al.

EACL, 2017
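The decomposition can be illustrated with a final combination step. Assuming upstream sub-task classifiers have already produced the sentiment the topic expresses toward its target, the sentiment the claim expresses toward its own target, and whether the two targets are consistent or contrastive, a hypothetical sketch of how those outputs might combine is:

```python
def combine_stance(topic_sentiment: int,
                   claim_sentiment: int,
                   targets_consistent: bool) -> str:
    """Combine sub-task outputs into a claim stance (illustrative only).
    topic_sentiment, claim_sentiment: +1 or -1, each text's sentiment
    toward its own target; targets_consistent: whether the two targets
    refer to consistent (True) or contrastive (False) concepts."""
    relation = 1 if targets_consistent else -1
    stance = topic_sentiment * claim_sentiment * relation
    return "PRO" if stance > 0 else "CON"

# Topic "We should ban video games" is negative toward games (-1);
# claim "Video games cause violence" is also negative toward games (-1),
# and both targets refer to the same concept, so the claim supports the topic:
print(combine_stance(-1, -1, True))  # -> "PRO"
```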

Improving claim stance classification

To improve claim stance classification, we developed a classifier that predicts the sentiment of a given word based on its context. This overcomes the limitations of manually composed sentiment lexicons. We also identified contextual features that can improve sentiment classification and enable classification of claims with no explicit sentiment.

Improving Claim Stance Classification with Lexical Knowledge Expansion and Context Utilization

Roy Bar-Haim et al.

4th Argument Mining Workshop, EMNLP, 2017

Classifying sentiment of phrases

We designed a novel method for predicting the sentiment of a phrase based on its constituents. Using only the sentiment of individual words, our algorithm correctly handles complex phenomena such as sentiment reversal and mixed sentiment.

Learning Sentiment Composition from Sentiment Lexicons

Orith Toledo-Ronen et al.

COLING, 2018 (to appear)
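A toy sketch of sentiment composition conveys the flavor of the task. The lexicon, the reverser list, and the tie-breaking rule for mixed sentiment below are all hypothetical simplifications chosen for illustration:

```python
REVERSERS = {"not", "never", "hardly"}          # toy reverser list
LEXICON = {"good": 1, "great": 1, "bad": -1, "failure": -1}  # toy lexicon

def phrase_sentiment(phrase: str) -> int:
    """Compose a phrase's sentiment from word-level sentiment only.
    A reverser flips the polarity of the next sentiment word (handling
    reversal, e.g. "not bad" -> positive); with mixed sentiment the last
    sentiment word wins, a hypothetical tie-breaking rule."""
    sentiment = 0
    flip = 1
    for word in phrase.lower().split():
        if word in REVERSERS:
            flip = -flip
            continue
        s = LEXICON.get(word, 0)
        if s != 0:
            sentiment = flip * s
            flip = 1  # reverser scope ends at the next sentiment word
    return sentiment

print(phrase_sentiment("not bad"))  # -> 1 (sentiment reversal)
```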

Classifying sentiment of idioms

Claims and evidence often include idiomatic expressions, and a debating system must be able to analyze them to properly classify their stance. Because the sentiment of idiomatic expressions often cannot be deduced from their constituent words, we developed a sentiment lexicon of 5,000 common idiomatic expressions to improve sentiment analysis.

SLIDE - A Sentiment Lexicon of Common Idioms

Charles Jochim et al.

LREC, 2018
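A sentiment lexicon of idioms is typically consulted before word-level analysis, matching the longest idiom first so that "over the moon" is not scored word by word. The toy entries and lookup below are an editorial sketch, not the SLIDE resource itself (which covers some 5,000 idioms):

```python
IDIOM_LEXICON = {"break a leg": 1, "bite the dust": -1, "over the moon": 1}
WORD_LEXICON = {"happy": 1, "sad": -1}

def token_sentiments(text):
    """Longest-match lookup sketch: try idioms before single words, since
    idiom sentiment cannot be deduced from constituent words."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        matched = False
        for end in range(len(tokens), i, -1):  # longest span first
            phrase = " ".join(tokens[i:end])
            if phrase in IDIOM_LEXICON:
                out.append((phrase, IDIOM_LEXICON[phrase]))
                i = end
                matched = True
                break
        if not matched:
            out.append((tokens[i], WORD_LEXICON.get(tokens[i], 0)))
            i += 1
    return out

print(token_sentiments("he was over the moon"))
# -> [('he', 0), ('was', 0), ('over the moon', 1)]
```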

Deep Neural Nets (DNNs) and Weak Supervision

DNNs hold immense potential for improving automatic understanding of language, but training them is notorious for requiring large amounts of high-quality, manually labeled data. We developed tools and methods to train DNNs using weak supervision, alleviating that bottleneck. We also used DNNs in developing Project Debater’s speaking and listening skills.

Scoring arguments

A debating system needs to score claims and evidence with respect to the topic of debate. We evaluated 19 different DNN-based methods of scoring arguments to help identify the best deep learning architecture for this task.

An Empirical Evaluation of Various Deep Learning Architectures for Bi-Sequence Classification Tasks

Anirban Laha and Vikas Raykar

COLING, 2016

Understanding Automatic Speech Recognition (ASR) output

A debating system needs to understand arguments made by its opponent, which it receives as ASR transcripts. To do this, it must properly parse the ASR output into sentences by adding punctuation. We used DNNs to perform this task.

Joint Learning of Correlated Sequence Labelling Tasks Using Bidirectional Recurrent Neural Networks

Vardaan Pahuja et al.

Interspeech, 2017
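Once a sequence model has predicted a punctuation label per token, a mechanical post-processing step reattaches punctuation and splits the stream into sentences. The label set below is a hypothetical choice for illustration:

```python
def restore_sentences(tokens, labels):
    """Reassemble an unpunctuated ASR token stream into sentences, given
    per-token punctuation labels in {"O", "COMMA", "PERIOD", "QUESTION"}
    (as a sequence-labeling model would predict)."""
    marks = {"COMMA": ",", "PERIOD": ".", "QUESTION": "?"}
    sentences, current = [], []
    for tok, lab in zip(tokens, labels):
        current.append(tok + marks.get(lab, ""))
        if lab in ("PERIOD", "QUESTION"):
            sentences.append(" ".join(current).capitalize())
            current = []
    if current:  # flush any trailing, unterminated sentence
        sentences.append(" ".join(current).capitalize())
    return sentences

print(restore_sentences(
    ["we", "disagree", "however", "the", "point", "stands"],
    ["O", "PERIOD", "COMMA", "O", "O", "PERIOD"]))
# -> ['We disagree.', 'However, the point stands.']
```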

Predicting phrase breaks

Phrase breaks are essential to delivering long sentences in continuous speech, as a debating system must do. We developed a novel DNN model for predicting phrase breaks and a new training process using phonetically aligned speech data and a weakly labeled large text corpus. This makes Project Debater’s speech intelligible, natural, and expressive.

Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels

Asaf Rendel et al.

Interspeech, 2017

Emphasizing words and phrases

We developed DNN-based models to enable controllable word-level emphasis and sentence-level emphasis in expressive TTS systems. Both models preserve quality and naturalness of the baseline TTS output while significantly improving the perceived emphasis.

Emphatic Speech Prosody Prediction with Deep LSTM Networks

Slava Shechtman and Moran Mordechay

ICASSP, 2018 (to appear)

Improving speech patterns

We built an expressive TTS system, based on DNNs, with one module that predicts which words to emphasize in a text and another that generates speech patterns based on the predictions. The prediction module outperforms methods with hand-crafted features, and the overall system is perceived as more expressive via crowd-sourced listening tests.

Word Emphasis Prediction for Expressive Text to Speech

Yosi Mass et al.

Interspeech, 2018 (to appear)

Identifying similar sentences

To train a DNN to predict thematic similarities between sentences, we automatically created a weakly labeled dataset of sentence triplets (a pivot sentence from a Wikipedia article, another sentence from the same section of the article, and a third sentence from a different section of the article). Our model outperformed state-of-the-art methods.

Learning Thematic Similarity Metric from Article Sections Using Triplet Networks

Liat Ein-Dor et al.

ACL, 2018 (to appear)
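The triplet objective behind such networks can be written in a few lines: pull the pivot's embedding toward the same-section sentence and push it away from the different-section sentence, up to a margin. The plain-Python version below is a sketch of that loss, with a hypothetical margin value:

```python
def triplet_loss(pivot, positive, negative, margin=1.0):
    """Triplet loss over embedding vectors (plain-Python sketch):
    max(0, d(pivot, positive) - d(pivot, negative) + margin)."""
    def dist(u, v):
        # Euclidean distance between two vectors.
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    return max(0.0, dist(pivot, positive) - dist(pivot, negative) + margin)

# Positive already much closer than negative: loss is zero.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 0.0]))  # -> 0.0
```

During training, gradients of this loss (via an autodiff framework) would shape the embedding space so same-section sentences cluster together.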

Improving argument mining

We developed a method to improve the performance of DNNs in argument mining by blending a small amount of high-quality, manually labeled data with a large amount of lower-quality, automatically labeled (weakly supervised) data.

Will it Blend? Blending Weak and Strong Labeled Data in a Neural Network for Argumentation Mining Labels

Eyal Shnarch et al.

ACL, 2018 (to appear)
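One simple way to blend the two label sources is to schedule how much weakly labeled data accompanies the strong data in each epoch. The decaying-share schedule below is a hypothetical illustration of the blending idea, not the paper's exact training regime:

```python
import random

def blended_epochs(strong, weak, epochs=5, weak_fraction=0.5):
    """Yield (epoch, training_batch) pairs. Each epoch uses all strong
    (manually labeled) examples plus a random sample of weak
    (automatically labeled) examples whose share decays linearly, so
    later epochs rely mostly on the high-quality labels."""
    for epoch in range(epochs):
        share = weak_fraction * (1 - epoch / max(1, epochs - 1))
        k = int(len(weak) * share)
        batch = strong + random.sample(weak, k)
        random.shuffle(batch)
        yield epoch, batch

# With 4 strong and 10 weak examples over 3 epochs, batch sizes shrink: 9, 6, 4.
```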

Searching for claims throughout a corpus

Searching for sentences containing claims in a large text corpus is a key component in developing an argumentative content search engine. We used DNNs trained via weak supervision (i.e., with automatically labeled data) to obtain high-quality results with no human intervention.

Towards an Argumentative Content Search Engine Using Weak Supervision

Ran Levy et al.

COLING, 2018 (to appear)

Determining concept abstractness

We used a DNN with weak supervision to determine the level of abstractness embodied within a given concept. Understanding whether the topic of the debate is abstract, as in ‘freedom of speech’, or concrete, as in ‘zoo’, can guide the Debater system in developing more relevant arguments.

Learning Concept Abstractness Using Weak Supervision

Rabinovich et al.

(under review)

Algorithms for Natural Language Processing (NLP)

NLP describes the automatic understanding, interpretation, and manipulation of human language (such as speech and text) by computers. NLP is a key factor in interactions between humans and machines, and the IBM Project Debater team is naturally active in NLP research.

Identifying related documents

We developed a novel method for measuring semantic relatedness between two documents. Documents are represented as compact concept graphs, where nodes represent concepts extracted from the document, and edges represent relationships between concepts. The similarity of two concept graphs reflects the semantic similarity of the documents.

Semantic Documents Relatedness Using Concept Graph Representation

Yuan Ni et al.

WSDM, 2016
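The idea can be sketched with toy graphs: each document becomes a set of concept nodes and concept-pair edges, and relatedness grows when documents share both. The Jaccard-average score below is a hypothetical formula chosen for illustration, not the paper's exact similarity measure:

```python
def concept_graph(concepts, edges):
    """A document as a compact concept graph: nodes are extracted
    concepts, edges are unordered pairs of related concepts."""
    return {"nodes": set(concepts), "edges": {frozenset(e) for e in edges}}

def graph_relatedness(g1, g2):
    """Illustrative relatedness score: average Jaccard overlap of the two
    graphs' node sets and edge sets, so documents score high when they
    share both concepts and the relationships between them."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    return 0.5 * (jaccard(g1["nodes"], g2["nodes"])
                  + jaccard(g1["edges"], g2["edges"]))

doc_a = concept_graph({"ai", "debate"}, [("ai", "debate")])
doc_b = concept_graph({"ai", "debate"}, [("ai", "debate")])
print(graph_relatedness(doc_a, doc_b))  # -> 1.0 (identical concept graphs)
```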

Detecting argumentative structures

We devised a method to automatically extract rich linguistic features of text and then recognize these patterns in new texts. The algorithm, GrASP (GReedy Augmented Sequential Patterns), helps computers to address the challenge of variability in natural language.

GrASP: Rich Patterns for Argumentation Mining

Eyal Shnarch et al.

EMNLP, 2017

Understanding Automatic Speech Recognition (ASR) output

A debating system needs to understand arguments made by its opponent, which it receives as ASR transcripts. To do this, it must properly parse the ASR output into sentences by adding punctuation. We used DNNs to perform this task.

Joint Learning of Correlated Sequence Labelling Tasks Using Bidirectional Recurrent Neural Networks

Vardaan Pahuja et al.

Interspeech, 2017

Identifying similar sentences

To train a DNN to predict thematic similarity between sentences, we automatically created a weakly labeled dataset of sentence triplets (a pivot sentence from a Wikipedia article, another sentence from the same section of the article, and a third sentence from a different section of the article). Our model, trained over these data, outperformed state-of-the-art methods.

Learning Thematic Similarity Metric from Article Sections Using Triplet Networks

Liat Ein-Dor et al.

ACL, 2018 (to appear)

Text-to-Speech (TTS) Systems

Unlike a personal assistant or navigator, a debating system needs to speak continuously and persuasively for a few minutes on a subject not known in advance, while keeping the audience engaged. We developed new TTS algorithms and techniques to give Project Debater a clear, fluent, and persuasive voice.

Predicting phrase breaks

Phrase breaks are essential to delivering long sentences in continuous speech. We developed a novel DNN model for predicting where a phrase break or pause is needed and a new training process using phonetically aligned speech data and a weakly labeled large text corpus. This makes Project Debater’s speech intelligible, natural, and expressive.

Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels

Asaf Rendel et al.

Interspeech, 2017

Emphasizing words and phrases

We developed DNN-based models to enable controllable word-level emphasis and sentence-level emphasis in expressive TTS systems. Both models preserve quality and naturalness of the baseline TTS output while significantly improving the perceived emphasis.

Emphatic Speech Prosody Prediction with Deep LSTM Networks

Slava Shechtman and Moran Mordechay

ICASSP, 2018 (to appear)

Improving speech patterns

We built an expressive TTS system, based on DNNs, with one module that predicts which words to emphasize in a text and another that generates speech patterns based on the predictions. The prediction module outperforms methods with hand-crafted features, and the overall system is perceived as more expressive via crowd-sourced listening tests.

Word Emphasis Prediction for Expressive Text to Speech

Yosi Mass et al.

Interspeech 2018 (to appear)

Benchmark Datasets

Datasets are the engines of progress in machine learning. Any new task to be automated requires an appropriate dataset. The IBM Project Debater team has released multiple novel datasets relevant to tasks in computational argumentation and NLP, which have been downloaded nearly 1,000 times to date. Download the Project Debater datasets here.

Annotated argument elements

A major obstacle in developing automatic argumentation mining techniques was the scarcity of relevant high-quality annotated data. We developed the first dataset to address this need. It includes 2,683 argument elements, collected in the context of 33 controversial topics, organized under a simple claim-evidence structure.

A Benchmark Dataset for Automatic Detection of Claims and Evidence in the Context of Controversial Topics

Ehud Aharoni et al.

1st Argument Mining Workshop, ACL, 2014

Multi-word term relatedness

Many modern NLP systems rely on the ability to predict semantic relatedness of two words. But natural language often involves multi-word expressions, whose meaning cannot be inferred from their constituents. We developed a first-of-its-kind dataset of nearly 10,000 pairs of words and multi-word expressions, thoroughly evaluated for semantic relatedness by human annotators.

TR9856: A Multi-word Term Relatedness Benchmark

Ran Levy et al.

ACL, 2015

Wikification

An important NLP capability, known as entity linking or Wikification, is the ability to automatically identify Wikipedia concepts mentioned in free text. We developed a large, high-quality Wikification dataset, covering named and general entities, in both written and spoken data.

What did you Mention? A Large Scale Mention Detection Benchmark for Spoken and Written Text

Yosi Mass et al.

arXiv, 2018

Concept relatedness

Determining the relatedness of concepts is useful in a variety of NLP tasks. We introduced a new concept relatedness dataset composed of nearly 20,000 pairs of Wikipedia terms. We also developed a tool for predicting concept relatedness that outperformed state-of-the-art methods.

Semantic Relatedness of Wikipedia Concepts - Benchmark Data and a Working Solution

Liat Ein-Dor et al.

LREC, 2018

Sentiment of idiomatic expressions

Claims and evidence often include idiomatic expressions, and a debating system must be able to analyze them to properly classify their stance. Because the sentiment of idiomatic expressions often cannot be predicted from their constituent words, we developed a sentiment lexicon of 5,000 common idiomatic expressions to improve sentiment analysis.

SLIDE - A Sentiment Lexicon of Common Idioms

Charles Jochim et al.

LREC, 2018

Debate speeches

We released a first-of-its-kind dataset of real-life debate speeches to enhance research in computational argumentation by expanding resources beyond written text. The dataset allows developing and testing algorithms on audio, automatic transcripts, and/or manual transcripts of spoken argumentative content.

A Recorded Debating Dataset

Shachar Mirkin et al.

LREC, 2018

Argumentative content for listening comprehension

We created a new, rich labeled dataset for the task of machine listening comprehension in the argumentation domain. The dataset contains 200 recorded speeches by expert human debaters on 50 controversial topics, along with more than 800 textual arguments, and a mapping specifying which arguments were used in each speech.

Listening Comprehension over Argumentative Content

Shachar Mirkin et al.

(under review)

Workshops and Tutorials

In addition to publishing our work and sharing benchmark datasets, the IBM Project Debater team participates actively in the research community. We are co-leaders of the following seminars in computational argumentation and debating technologies.

Upcoming∶ 5th Workshop on Argument Mining

The 5th Workshop on Argument Mining will take place during EMNLP 2018.

Learn more

Dagstuhl Seminar on Debating Technologies

During this one-week workshop in Dagstuhl, Germany, in 2015, the term “Computational Argumentation” was coined to denote this rapidly evolving field of research.

Learn more

NLP Approaches to Computational Argumentation

A tutorial delivered during ACL 2016.

Learn more

The debate is just beginning.


Follow us @IBMResearch