
10-100x faster screening

AI-enriched Simulation: Understanding more about data with less computing.

*By reducing the number of simulations needed, AI-enriched Simulation can speed up screening by a factor of 10 to 100.

AI-enriched Simulation accelerates the discovery process by using AI to identify the most promising simulations to run on a massive data set. Just as importantly, it determines the computing infrastructure best suited for the task, whether that's a basic calculator or even a quantum computer.

See how we used AI-enriched Simulation to discover a new molecule

How it works

AI-enriched Simulation helps researchers maximize efficiency and effectiveness at multiple points in the discovery process.

There are two keys to accelerating a process: doing things right (efficiency) and doing the right thing (effectiveness). In the world of modelling and simulation, this is a complicated problem.

Simulation workflows are notoriously complex. There are a multitude of data points, methods, and systems to choose from. Choose wrong at any point, and the entire experiment could be flawed before it has begun, costing money and hours of computing.

AI-enriched Simulation helps researchers determine what to study and how to go about doing it. It efficiently optimizes the candidate search process, so researchers can focus their work on a smaller, more promising pool of options. It also determines the simulation method that will minimize time and computational load while yielding effective results.

Over time, our AI acceleration engine learns how to automate and streamline simulation workflows, so what once took researchers hours to repetitively program and execute becomes a simpler, "lighter" process.

Figure A1.


Finding the right data set.

The more broadly you look, the more likely you are to find what you're looking for. AI enables us to sample a far broader set of options quickly and efficiently, giving researchers the flexibility to cover more ground in less time and at lower cost.

Our AI acceleration engine uses Bayesian optimization, a mathematical strategy that leverages what is already known to make predictions about what has not yet been measured. By predicting rather than computing everything, the engine efficiently identifies the data points with the highest probability of meeting the desired parameters, cutting processing time.
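The select-evaluate-update loop behind this idea can be sketched in a few lines. This is only a toy illustration, not IBM's engine: it swaps the Gaussian-process surrogate and acquisition function a real Bayesian optimizer would use for a simple nearest-neighbour mean plus a distance-based exploration bonus, and the function names are made up for the example.

```python
def toy_bayesian_screen(candidates, objective, budget):
    """Toy Bayesian-style screening: simulate only `budget` candidates.

    A real engine would fit a Gaussian process and maximize an
    acquisition function such as expected improvement; here the
    surrogate is the nearest evaluated neighbour's score plus a
    bonus for being far from any evaluated point (exploration).
    """
    observed = {}                              # candidate -> simulated score
    observed[candidates[0]] = objective(candidates[0])   # seed the surrogate
    for _ in range(budget - 1):
        def acquisition(x):
            nearest = min(observed, key=lambda o: abs(o - x))
            mean = observed[nearest]           # exploit: predicted score
            uncertainty = abs(nearest - x)     # explore: far from data
            return mean + 0.5 * uncertainty
        # Simulate only the most promising unevaluated candidate.
        pick = max((c for c in candidates if c not in observed),
                   key=acquisition)
        observed[pick] = objective(pick)
    return max(observed, key=observed.get)

# Hidden "expensive simulation" whose optimum is at x = 7.
best = toy_bayesian_screen(list(range(20)),
                           lambda x: -(x - 7) ** 2,
                           budget=6)
```

With only 6 of 20 candidates actually "simulated," the loop converges near the true optimum, which is the essence of the speed-up described above.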

Graphic that illustrates Bayesian optimization


Graphic that illustrates multi-fidelity optimization

Identifying the best method.

In addition to deciding which candidates to test, AI helps us find the best way to test them.

Our engine uses a process called multi-fidelity optimization to determine both the kind of test and the type of computing resource needed to measure the desired factors at the lowest cost.

Being able to automatically “pair” the right model and infrastructure gives researchers the flexibility to balance speed, cost, and fidelity before moving on to testing.
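One common multi-fidelity pattern is a screening cascade: rank everything with a cheap, approximate model, then spend the expensive, accurate model only on the survivors. This sketch is an assumption about the general technique, not IBM's implementation; all names and the example cost figures are invented for illustration.

```python
def multi_fidelity_screen(candidates, models, keep_fraction=0.2):
    """Screen candidates through models of increasing fidelity and cost.

    `models` is a list of (score_fn, cost_per_eval) pairs, ordered
    cheapest to most expensive. Each stage ranks the survivors with
    the current model and passes only the top fraction onward.
    """
    survivors = list(candidates)
    total_cost = 0
    for score, cost in models:
        total_cost += cost * len(survivors)        # pay for this stage
        survivors.sort(key=score, reverse=True)    # rank at this fidelity
        keep = max(1, int(len(survivors) * keep_fraction))
        survivors = survivors[:keep]               # trim before next stage
    return survivors, total_cost

cheap = lambda x: -abs(x - 7)       # fast, approximate estimate
exact = lambda x: -(x - 7) ** 2     # slow, accurate simulation
top, cost = multi_fidelity_screen(range(100),
                                  [(cheap, 1), (exact, 100)])
```

Here the high-fidelity model runs on 20 candidates instead of 100, so the total cost is 2,100 units instead of the 10,000 a brute-force high-fidelity sweep would take, while the true optimum still survives the cascade.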

Figure A2.


Running the simulation.

With the candidates and method set, researchers are ready to run the simulation. OpenShift, the hybrid cloud platform from IBM's Red Hat, connects cloud systems and architectures from around the world to create a flexible, fully functioning simulation platform.

Rather than intricately programming the simulation across different computers and databases over and over again, researchers can simply use the platform to deploy the tests whenever needed. This automation saves hours of programming.

Graphic that illustrates running a simulation on IBM’s OpenShift platform


Graphic that illustrates the process of memoization.

Streamlining the process.

Our AI-enriched simulation platform uses a productivity-boosting process called "memoization." Whenever a user runs a workflow, the central "brain" breaks it into steps and checks whether any of those steps have been run in previous workflows.

If there are repeats, the engine reuses the earlier results automatically, shaving off computation time, energy, and money. Even with an entirely new workflow, the AI can identify similar functions in its memory bank and give researchers the option to swap them in to streamline the workflow.
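The mechanism can be illustrated with a minimal step cache. This is a hedged sketch of memoization in general, assuming that a step is identified by its name and exact parameters; the class and method names are hypothetical, not part of IBM's platform.

```python
import hashlib
import json

class WorkflowEngine:
    """Toy workflow engine that memoizes step results.

    A step is keyed by its name and parameters; if the same step has
    been run before, the cached result is returned instead of
    recomputing it.
    """
    def __init__(self):
        self._cache = {}
        self.cache_hits = 0

    def run_step(self, name, fn, params):
        # Stable key: hash of the step name plus its sorted parameters.
        key = hashlib.sha256(
            json.dumps([name, params], sort_keys=True).encode()
        ).hexdigest()
        if key in self._cache:          # seen before: reuse the old result
            self.cache_hits += 1
            return self._cache[key]
        result = fn(**params)           # new step: compute and remember
        self._cache[key] = result
        return result

engine = WorkflowEngine()
r1 = engine.run_step("relax_geometry", lambda x: x * 2, {"x": 21})
r2 = engine.run_step("relax_geometry", lambda x: x * 2, {"x": 21})
```

The second call never executes the step function: the engine recognizes the repeated step and returns the stored result, which is exactly the saving described above.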

A more efficient way to get to impactful results: AI-enriched Simulation lightens the most computation-heavy points of the discovery process.

AI-enriched Simulation for Discovery
Dr. Ed Pyzer-Knapp
Dr. Michael Johnston
AI-enriched Simulation at work:

Project Photoresist

Narrowing the search from 5000+ PAG candidates to 10, in a tenth of the time.

We had thousands of PAGs to compare. AI-enriched Simulation critically accelerated our learning and sorting process.

While Deep Search provided us with a comprehensive catalogue of known PAG molecules, we were missing a vital layer of data: their material properties. As our goal was sustainability, we needed to examine the toxicity and performance of each of these candidates.

Without AI, this process would have been time- and labor-intensive. Chemists would have needed to synthesize all 5000+ PAG molecules in a lab to test and measure each one. Even virtual simulation would have required a dedicated team of computer scientists to design complex models to estimate each property, then run every molecule through both models.

AI enhancement cut through this bottleneck by running “virtual experiments” on our data. Our scientists simply input the parameters: data set, time and budget limit, and desired fidelity of the results. Our AI acceleration engine took it from there. It used Bayesian optimization to determine key subgroups of the candidate base to test (rather than “brute-forcing” through the entire library).

And, with multi-fidelity optimization, it was able to identify the best model and computing infrastructure combination to meet the time, budget, and output parameters.

From there, our scientists could simply deploy the same virtual experiment on the desired candidates—no intensive coding or computing required. With this layer of information now added to our Deep Search data, we were able to use the properties of promising PAG candidates to train a future generative model.

Figure A3.

Missing information

We had a Deep Search database of most known PAGs, but we were missing information on many of their properties (not many source documents reported molecular properties).

Graphic with the labels “Extracted PAG families”, “Lambda Max”, “Biodegradability”, and “LD50”

Figure A4.

Graphic with the labels “Extracted PAG families”, “Lambda Max”, “Biodegradability”, and “LD50”

New data

AI-enriched Simulation accurately estimated the missing properties, filling in the knowledge gap without costly physical testing.

Our AI engine then used this augmented dataset to determine which PAG candidates were most likely to have the right ratio of attributes we were looking for, accelerating the candidate vetting process and identifying target properties.

Case Studies

Case study

Formulating novel polymers - We automated and accelerated the process of identifying properties of novel chemical formulations for speciality chemicals company Johnson Matthey.

Case study

Helping chemists run virtual experiments - We worked with Unilever to enable chemists to efficiently deploy computational models of liquid mixtures on high performance infrastructure.


Case study

IBM Research reduced the time to market for the Power10 processor by reducing the number of signal integrity tests by 79%.

Case study

IBM Research reduced the number of simulations required to design a new industrial lubricant by 60%.

IBM Research identified the highest potency compounds in a library of over 20,000 molecules, speeding up the drug discovery process by 30–40x.



Discovery Workloads on the Hybrid Cloud

Emerging discovery workflows are posing new challenges for compute, network, storage, and usability. IBM Research supports these new workflows by bringing together world-class physical infrastructure, a hybrid cloud platform that unifies computing, data, and the user experience, and full-stack intelligence for orchestrating discovery workflows across computing environments.