Risk information extraction

Probability & dependence information

Risk modeling in healthcare is both ubiquitous and in its infancy. A significant proportion of medical research focuses on determining the factors that influence the incidence, severity and treatment of diseases, which is a form of risk identification. Those studies investigate the micro-level of risk modeling, i.e., the existence of dependences between a small set of variables. However, the macro-level of risk modeling, i.e., articulating how a large number of such risk factors interact and jointly affect diseases and treatments, is not widespread, though it is essential for medical decision support.

Bayesian belief networks [1] are a convenient tool for such models and have long been advocated as a useful decision and risk modeling framework in medicine [2]. They are used in several decision-support systems [3, 4], yet the difficulty of building Bayesian networks from scratch has limited their adoption [5]. In fact, [5] reports that building the structure of a model for therapy selection in cancer of the esophagus (about 40 variables) required about 300 man-hours. Our focus on automating the extraction and aggregation of relevant information seeks to address this practical challenge.
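To make the idea of a Bayesian network concrete, here is a minimal sketch of a three-node network in which two binary risk factors influence one disease. The variable names (`factor_a`, `factor_b`) and every probability are invented for illustration and are not taken from the literature or from MedicalRecap. Inference is done by brute-force enumeration, which is only practical for tiny networks.

```python
from itertools import product

# Hypothetical prior probabilities that each binary risk factor is present.
priors = {"factor_a": 0.30, "factor_b": 0.10}

# Hypothetical conditional probability table: P(disease | factor_a, factor_b).
# A binary node with k binary parents needs 2**k such rows.
cpt_disease = {
    (False, False): 0.01,
    (True, False): 0.05,
    (False, True): 0.08,
    (True, True): 0.20,
}


def joint(a: bool, b: bool, d: bool) -> float:
    """Joint probability P(factor_a=a, factor_b=b, disease=d)."""
    pa = priors["factor_a"] if a else 1.0 - priors["factor_a"]
    pb = priors["factor_b"] if b else 1.0 - priors["factor_b"]
    pd = cpt_disease[(a, b)] if d else 1.0 - cpt_disease[(a, b)]
    return pa * pb * pd


def prob_disease() -> float:
    """Marginal P(disease), summing the joint over both risk factors."""
    return sum(joint(a, b, True) for a, b in product((False, True), repeat=2))


def prob_factor_a_given_disease() -> float:
    """Posterior P(factor_a | disease) via Bayes' rule."""
    numerator = sum(joint(True, b, True) for b in (False, True))
    return numerator / prob_disease()


if __name__ == "__main__":
    print(f"P(disease) = {prob_disease():.4f}")
    print(f"P(factor_a | disease) = {prob_factor_a_given_disease():.4f}")
```

Even this toy model needs six numbers. Because each table grows exponentially with the number of parents, eliciting all probabilities by hand quickly becomes expensive for networks with tens of variables.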

We have developed a system called MedicalRecap, which automatically extracts risk information from medical papers and then aggregates this knowledge into a Bayesian network. It is a web-based tool designed to extract risk information from PubMed and to facilitate its combination into a coherent risk model for decision support.

The tool is composed of three modules: information extraction, term clustering, and model building. The objective of the demonstration is to present a full workflow of the tool starting with query terms and ending with a fully parameterized risk model.
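The three-stage workflow can be pictured with a toy sketch. It is not MedicalRecap's implementation, whose internals are not described here. The regular expression, the `SYNONYMS` table, the function names, and the example sentences are all hypothetical stand-ins. The sketch stops at the network structure and leaves out parameterization.

```python
import re
from collections import defaultdict

# Stage 1 (information extraction): a toy pattern for sentences of the form
# "<cause> increases/raises the risk of <effect>". Real text needs proper NLP.
PATTERN = re.compile(
    r"(?P<cause>[\w\s]+?)\s+(?:increases|raises)\s+the\s+risk\s+of\s+(?P<effect>[\w\s]+)",
    re.IGNORECASE,
)

# Stage 2 (term clustering): a hypothetical hand-written synonym table that
# maps surface variants to one canonical term.
SYNONYMS = {
    "tobacco smoking": "smoking",
    "lung carcinoma": "lung cancer",
}


def extract_dependences(sentences):
    """Return (cause, effect) pairs found by the toy pattern."""
    pairs = []
    for sentence in sentences:
        match = PATTERN.search(sentence)
        if match:
            pairs.append((match.group("cause").strip(), match.group("effect").strip()))
    return pairs


def canonical(term):
    """Lowercase a term and collapse known synonyms."""
    key = term.lower()
    return SYNONYMS.get(key, key)


def build_structure(pairs):
    """Stage 3 (model building): map each effect to its sorted parent terms."""
    parents = defaultdict(set)
    for cause, effect in pairs:
        parents[canonical(effect)].add(canonical(cause))
    return {node: sorted(causes) for node, causes in parents.items()}


if __name__ == "__main__":
    sentences = [
        "Smoking increases the risk of lung cancer.",
        "Tobacco smoking raises the risk of lung carcinoma.",
        "Obesity increases the risk of type 2 diabetes.",
        "The cohort included 500 patients.",
    ]
    print(build_structure(extract_dependences(sentences)))
```

Clustering matters because different papers name the same concept differently. Without it, the first two sentences would produce two separate edges instead of one.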

The target audience for the tool includes both general practitioners seeking to understand risk factors associated with specific diseases, symptoms, and medicines in order to better advise their patients, and medical researchers performing literature-based discovery, seeking to make sense of the qualitative and quantitative interactions mentioned in previous research.

To the best of our knowledge, MedicalRecap is a first-of-a-kind cognitive system for assisting users in building decision support systems in the medical domain using information extracted from the literature.

This approach can be reused in other areas of risk analysis that have a large text corpus containing relevant dependencies.


Charles Jochim


[1] Pearl J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco: Morgan Kaufmann Publishers Inc., 1988.

[2] Pauker SG, Wong JB. The influence of influence diagrams in medicine. Decision Analysis 2005; 2: 238–244.

[3] Hoffer E, Feldman M, Kim R, Famiglietti K, Barnett G. Dxplain: patterns of use of a mature expert system. AMIA 2005 Symposium Proceedings 2005: 321–325.

[4] Fuller G. Simulconsult: www.simulconsult.com. Journal of Neurology, Neurosurgery and Psychiatry 2005; 76(10).

[5] van der Gaag LC, Renooij S, Witteman CLM, Aleman B, Taal BF. How to elicit many probabilities. In: Laskey KB and Prade H, eds. San Francisco: Morgan Kaufmann Publishers, 1999: 647–654.

MedicalRecap is an information extraction tool that uses natural language processing to extract expert medical knowledge from published medical literature.

From a user query, we extract

