@ IBM Research

Visual and natural language comprehension are rapidly evolving areas of artificial intelligence (AI). A prime example that combines both is image captioning: the task of generating one or more natural language descriptions of an image from the visual input alone. Success at this task demonstrates a machine's comprehension of the visual content as well as its ability to describe that content in natural language. Image captioning remains a very active research area in academic and industrial labs, including IBM Research.

IBM Research at CVPR 2017: Helping AI systems to 'see' with latest computer vision innovations

Microsoft COCO Image Captioning Challenge

IBM Watson submitted its first entry to the Microsoft COCO Image Captioning Challenge, a competition that has been running since 2015, and is currently in the top spot on the leaderboard! The Watson entry's scores on the various evaluation metrics can be viewed on the CodaLab results page (row labeled "etiennem") and on the MS COCO results page (Watson Multimodal entry under Table-C5 or Table-C40).

IBM advances Watson’s image captioning accuracy


Microsoft COCO Image Captioning Challenge (Watson Multimodal on Table-C5)


Progress in AI, through collaborative research


Get started with the Watson Visual Recognition API