IBM Research @ Hartree Centre is seeking several outstanding PhD researchers to work as interns for three months at the Daresbury research facility, in collaboration with the Science and Technology Facilities Council’s Hartree Centre.

IBM Research @ Hartree Centre

The successful candidates will join the IBM Research team at Daresbury Laboratory, which aims to have a tangible business impact on UK industry through cutting-edge research in technologies and applications, especially by implementing next-generation High-Performance Computing, Big Data and Cognitive Solutions.

Each intern will join one of the four IBM Research groups:
· Chemistry
· Life Sciences
· Engineering
· Enabling Technologies.
Please see the team website for more information.


Resident status

These are paid internships, and candidates must either be UK/EU citizens or hold a non-student work visa. Interns will be IBM employees during their tenure, and work done during the internship cannot be reused as part of a PhD dissertation. Permission to intermit the PhD for three months must be obtained from the intern’s academic supervisor and academic institution or university.

The three-month internships will take place over three cycles each year in spring, summer and autumn.


How to apply

Please submit your application in the form of a CV and cover letter to, with the relevant application reference number(s) highlighted in the header.

Open positions

  • Structured representation and reasoning over unstructured text

    PI: Mohab Elkaref
    Group: Enabling Tech (Machine Learning)
    Internship Cycle: Summer (June to August 2020)
    Application Reference Number: ME022020ML

    Current research efforts in natural language processing focus on building language models from large collections of unstructured text. These models are then used as a starting point for a wide variety of other NLP tasks. Conversely, building structured representations from unstructured text is often done using comparatively smaller text corpora, and has seen limited application in areas of NLP that are not themselves concerned with building structured representations.

    The aim of this internship is to explore ways of improving both syntactic and semantic representations of text, as well as ways to incorporate them into downstream NLP tasks typically reserved for large-scale language models, such as question answering and summarisation.

    Desired skills:

    • Experience with natural language processing
    • Experience with deep learning frameworks

    Please submit your application in the form of a CV and cover letter to .

  • Geospatial data for public health: Exploring the interface between clinical, urban, environmental and socio-demographic datasets

    PI: Blair Edwards
    Group: Enabling Tech (Data Technologies)
    Internship Cycle: Summer (June to August 2020)
    Application Reference Number: BE022020DT

    The factors that influence people’s health and wellbeing are complex and varied. The study of personal and clinical factors is a well-established area, but the integration of a wide range of urban, environmental and socio-demographic factors at scale has not been realised. We are looking to collaborate with academic researchers to integrate data from geospatial sources (such as land use, air quality, distance to roads, green space, location of amenities, UK census) with clinical datasets, enabling the application of machine learning techniques to explore and determine the significance of different factors.

    During this project, we will explore potential datasets, store them in the IBM PAIRS Geoscope geospatial database, apply machine learning to discover different significant factors and (if time allows) develop a dashboard to display and explore the results.
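    As a rough illustration of the workflow described above, the sketch below joins area-level geospatial factors to a clinical outcome and ranks the factors by the strength of their association with it. Everything in it is an assumption made for illustration: the area codes, factor names and values are invented, the data is held in plain Python dictionaries rather than in IBM PAIRS Geoscope, and a simple Pearson correlation stands in for the machine learning step.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical area-level factors, keyed by an invented area code.
area_factors = {
    "A": {"no2": 10.0, "green_space": 0.50},
    "B": {"no2": 20.0, "green_space": 0.40},
    "C": {"no2": 30.0, "green_space": 0.45},
    "D": {"no2": 40.0, "green_space": 0.20},
}

# Hypothetical clinical outcome (e.g. an admission rate) per area.
clinical_rate = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}

# Join the two sources on the areas they have in common.
areas = sorted(set(area_factors) & set(clinical_rate))
outcome = [clinical_rate[a] for a in areas]

# Score each factor by its correlation with the outcome,
# then rank by absolute strength of association.
scores = {
    name: pearson([area_factors[a][name] for a in areas], outcome)
    for name in area_factors[areas[0]]
}
ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

    In practice the correlation step would likely be replaced by a model that handles confounding and non-linear effects; the join on a shared spatial key is the part this sketch is meant to show.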

    Desired skills:

    • Python — at least to a basic level, including pandas and numpy (required)
    • Geospatial data — some understanding of geospatial concepts
    • Machine learning — some knowledge of basic techniques and concepts

    Please submit your application in the form of a CV and cover letter to .

  • Using ML to improve energy efficiency of HPC systems

    PI: Robert Tracey
    Group: Enabling Tech (HPC & Cloud)
    Internship Cycle: Summer (June to August 2020)
    Application Reference Number: RT022020HPC

    The project takes a holistic approach to energy and power management, which can be described as energy-aware scheduling (EAS). EAS uses performance and power consumption models, together with software-hardware co-design, to implement various energy/power-aware scheduling policies at the node, job and cluster levels.
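    To give a flavour of what a job-level policy of this kind might look like, here is a minimal sketch, not the project's actual method. It assumes a performance model and a power model have already produced a predicted runtime and average power for each CPU frequency setting, and it picks the setting with the lowest predicted energy that does not slow the job down by more than a given factor. The class name, function name, slowdown bound and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrequencyOption:
    ghz: float        # CPU frequency setting
    runtime_s: float  # predicted runtime at this setting (performance model)
    power_w: float    # predicted average node power (power model)

    @property
    def energy_j(self) -> float:
        """Predicted energy to solution in joules."""
        return self.runtime_s * self.power_w


def pick_frequency(options, max_slowdown=1.10):
    """Return the lowest-energy option whose predicted runtime is within
    max_slowdown times the fastest predicted runtime."""
    fastest = min(o.runtime_s for o in options)
    allowed = [o for o in options if o.runtime_s <= fastest * max_slowdown]
    return min(allowed, key=lambda o: o.energy_j)


# Invented model predictions for a single job.
options = [
    FrequencyOption(ghz=3.0, runtime_s=100.0, power_w=300.0),  # 30000 J
    FrequencyOption(ghz=2.6, runtime_s=108.0, power_w=250.0),  # 27000 J
    FrequencyOption(ghz=2.0, runtime_s=130.0, power_w=200.0),  # 26000 J
]

choice = pick_frequency(options)
print(choice.ghz, choice.energy_j)
```

    With the default 10% slowdown bound the 2.0 GHz setting is excluded even though it uses the least energy, which shows the trade-off between performance and energy that such policies have to manage.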

    The ideal candidate would be studying for a PhD in a mathematical, scientific, engineering or computing domain. Experience with HPC and with programming in C/C++ and Python is a plus.

    Please submit your application in the form of a CV and cover letter to .

  • Biosurfactant modelling

    PI: James McDonagh
    Group: Chemistry (Modelling and simulation)
    Internship Cycle: Summer (June to August 2020)
    Application Reference Number: JM022020CHEM

    This internship will be focused on modelling biosurfactant molecules. Biosurfactants are an industrial growth area with a plethora of applications in pharmaceuticals, personal care products and environmental remediation. These molecules are typically composed of a hydrophilic sugar moiety and a hydrophobic hydrocarbon chain.

    This project will focus on simulating the aggregation behaviour of biosurfactants and predicting their properties. We have existing work on sugar modelling, which we will extend to biosurfactants. Some software development is also likely to form part of the project. The ideal candidate will be studying for a PhD in Chemistry, Physics, Biology or Computer Science and have a background that includes molecular simulation. A working knowledge of Python and HPC would be an advantage.

    References

    1. E. Munusamy et al., Structural properties of nonionic monorhamnolipid aggregates in water studied by classical molecular dynamics simulations. The Journal of Physical Chemistry B, 121(23):5781–5793, 2017.

    2. Fakruddin Md, Biosurfactant: production and application. J. Pet. Environ. Biotechnol., 3(124):2, 2012.

    Please submit your application in the form of a CV and cover letter to .

  • Application of machine learning to genomics datasets to identify important biological features

    PI: Laura-Jayne Gardiner
    Group: Enabling Tech (Machine Learning)
    Internship Cycle: Spring (February to April 2020)
    Application Reference Number: LJG012020ML

    The life sciences have seen an explosion in the generation of large datasets, specifically omics or DNA sequencing information (genomes, transcriptomes and epigenomes). As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyse and interpret these large biological datasets, typically using HPC resources. We process large genomics datasets bioinformatically and then apply machine learning to predict key biological features of importance, e.g. genes with favourable or detrimental functions in plants, animals and humans.

    The candidate will therefore apply machine learning to real-life biological problems embedded in industrially relevant projects. The internship offers working experience in a professional research organization. The ideal candidate would be studying for a PhD in a computational, biological or mathematical domain. They will have an interest in bioinformatics and machine learning, with experience in at least one of these areas. A working knowledge of Python is essential and experience in HPC would be an advantage.

    Please submit your application in the form of a CV and cover letter to .