
Computer Vision
IBM Research is a leading player in the quest to give AI systems sight. We’re enabling Watson, IBM's AI platform, to interpret visual content as easily as it does text.

About us
The field of computer vision has been transformed by the introduction of deep learning. State-of-the-art computer vision systems can now achieve superhuman accuracy and speed for certain tasks in image recognition and analysis. But these systems are still far from truly understanding what they see and making intelligent use of visual data. We aim to advance computer vision analysis from static scenes and images toward dynamic scenes and to integrate audio-visual perception, eventually enabling these systems to understand video input.
CVPR 2020
IBM Research AI is expanding AI’s field of computer vision at CVPR 2020
Featured work

Auto Curation of Sports Highlights
Coach Advisor uses IBM and Red Hat hybrid cloud capabilities to put AI and analytics directly in the hands of USTA coaches to help drive a new level of insight into tennis player performance.

Visual Learning with Limited Labeled Data
IBM Research provides a set of examples and methods for learning visual models from limited labeled data, lowering costs and shortening the time to a proof of concept.

Neurosymbolic AI
Neurosymbolic AI combines the power of neural networks with symbolic methods to help AI reason more effectively.

SpotTune: Transfer Learning through Adaptive Fine-Tuning
IBM Research, in collaboration with the University of California, San Diego and the University of Texas at Austin, created a novel adaptive fine-tuning method called SpotTune that automatically decides which layers of a model should be frozen and which should be fine-tuned.
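
As a rough illustration of the idea described above (choosing, layer by layer, between a frozen pretrained block and a fine-tuned copy), here is a minimal Python sketch. It is not the SpotTune implementation: the function name `route_forward`, the scalar "blocks", and the hand-written `use_tuned` decisions are hypothetical stand-ins, and the residual-style update is an assumption made for illustration. In a real system the per-layer decisions would come from a learned component rather than a fixed list.

```python
from typing import Callable, Sequence

# A "block" here is just a function on a float; real blocks would be network layers.
Block = Callable[[float], float]


def route_forward(
    x: float,
    frozen_blocks: Sequence[Block],
    tuned_blocks: Sequence[Block],
    use_tuned: Sequence[bool],
) -> float:
    """Pass x through a stack of layers, picking one of two blocks per layer.

    use_tuned[i] is True  -> layer i uses the fine-tuned block.
    use_tuned[i] is False -> layer i uses the frozen pretrained block.
    """
    if not (len(frozen_blocks) == len(tuned_blocks) == len(use_tuned)):
        raise ValueError("all three sequences must have one entry per layer")

    for frozen, tuned, pick_tuned in zip(frozen_blocks, tuned_blocks, use_tuned):
        block = tuned if pick_tuned else frozen
        x = x + block(x)  # residual-style update (an assumption of this sketch)
    return x


if __name__ == "__main__":
    frozen = [lambda v: 1.0, lambda v: 2.0]
    tuned = [lambda v: 10.0, lambda v: 20.0]
    # Layer 0 stays frozen, layer 1 is fine-tuned.
    print(route_forward(0.0, frozen, tuned, [False, True]))
```

Keeping the frozen and fine-tuned blocks side by side makes the freeze-or-tune choice a per-layer switch rather than a single global setting.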

Workshop on Multi-modal Video Analysis and Moments in Time Challenge
The Workshop on Multi-modal Video Analysis and Moments in Time Challenge at ICCV 2019 focuses on modeling, understanding, and leveraging the multi-modal nature of video.

Fashion Interactive Queries demo and challenge
IBM Research proposes a new natural language-based system for interactive, fine-grained image retrieval. The proposed framework learns to seek natural language feedback from the user and iteratively refines the retrieval result.
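
The feedback loop described above can be pictured with a toy Python sketch. This is only an assumption-laden illustration, not the IBM system: the function `refine_retrieval`, the tag-based catalog, and the keyword-overlap scoring are hypothetical stand-ins for what would, in the real framework, be learned models operating on images and free-form text.

```python
from typing import Dict, List, Set


def refine_retrieval(catalog: Dict[str, Set[str]], feedback_rounds: List[str]) -> List[str]:
    """Return the top-ranked item id after each round of user feedback.

    catalog maps an item id to a set of descriptive tags (hypothetical data).
    Each feedback string is split into words and added to the running query,
    so later rounds refine the result using everything said so far.
    """
    wanted: Set[str] = set()
    top_per_round: List[str] = []
    for feedback in feedback_rounds:
        wanted |= set(feedback.lower().split())
        # Score = number of tags shared with the accumulated feedback.
        # Iterating over sorted ids makes ties resolve alphabetically.
        best = max(sorted(catalog), key=lambda item: len(catalog[item] & wanted))
        top_per_round.append(best)
    return top_per_round


if __name__ == "__main__":
    catalog = {
        "a": {"red", "heels"},
        "b": {"red", "flat", "shoes"},
        "c": {"blue", "flat", "shoes"},
    }
    print(refine_retrieval(catalog, ["red", "flat shoes"]))
```

The point of the sketch is the loop structure: each round adds feedback to the query and re-ranks the candidates, instead of restarting the search.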

Skin lesion analysis towards melanoma detection
Recently, we’ve been developing techniques in computer vision that could one day enable clinical staff to use pictures to help them screen for disease. Our vision is that taking pictures to diagnose melanoma might one day be as routine as drawing blood to detect other diseases.
Publications
The IBM Research AI Computer Vision team aims to advance computer vision analysis from static scenes and images toward dynamic scenes and to integrate audio-visual perception, eventually enabling these systems to understand video input. Our research has been recognized at major conferences such as CVPR, NeurIPS, and ICLR.
Please explore all of our computer vision research papers
TITLE | RESEARCH AREA | VENUE | ACCESS |
---|---|---|---|
Video Instance Segmentation Tracking | | CVPR (2020) | |
Leveraging 2D Data to Learn Textured 3D Mesh Generation | | CVPR (2020) | |
Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation | | CVPR (2020) | |
Non-Adversarial Video Synthesis with Learned Priors | | CVPR (2020) | |
Camera On-boarding for Person Re-identification using Hypothesis Transfer Learning | | CVPR (2020) | |
Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining | | CVPR (2020) | |
Music Gesture for Visual Sound Separation | | CVPR (2020) | |
Dense Regression Network for Video Grounding | | CVPR (2020) | |
Bottom-up Higher-Resolution Networks for Multi-Person Pose Estimation | | CVPR (2020) | |
Towards Verifying Robustness of Neural Networks against Semantic Perturbations | | CVPR (2020) | |
Adversarial Robustness: From Self-Supervised Pretraining to Fine-Tuning | | CVPR (2020) | |
Improving the affordability of robustness training for DNNs | | CVPR Workshop on Adversarial Machine Learning in Computer Vision (2020) | |
Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation | | CVPR Workshop on Visual Learning with Limited Labels (2020) | |
Relationship Matters: Relation Guided Knowledge Transfer for Incremental Learning of Object Detectors | | CVPR Workshop on Continual Learning in Computer Vision (2020) | |
StarNet: towards weakly supervised few-shot detection and explainable few-shot classification | | CVPR Workshop on Visual Learning with Limited Labels (2020) | |
MetAdapt: Meta-Learned Task-Adaptive Architecture for Few-Shot Classification | | CVPR Workshop on Visual Learning with Limited Labels (2020) | |
TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification | | CVPR Workshop on Visual Learning with Limited Labels (2020) | |
DBA: Distributed Backdoor Attacks against Federated Learning | | ICLR (2020) | |
CLEVRER: CoLlision Events for Video REpresentation and Reasoning | | ICLR (2020) | |
Once for All: Train One Network and Specialize it for Efficient Deployment | | ICLR (2020) | |
DADI: Dynamic Discovery of Fair Information with Adversarial Reinforcement Learning | | ICLR (2020) | |
Sign-OPT: A Query-Efficient Hard-label Adversarial Attack | | ICLR (2020) | |
Federated Learning with Matched Averaging | | ICLR (2020) | |
Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness | | ICLR (2020) | |
AdvIT: Adversarial Frames Identifier Based on Temporal Consistency in Videos | | ICCV (2019) | |
Face Alignment With Kernel Density Deep Neural Network | | ICCV (2019) | |
Bayesian Graph Convolution LSTM for Skeleton Based Action Recognition | | ICCV (2019) | |
Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine Grained Action Detection | | ICCV (2019) | |
Seeing What a GAN Cannot Generate | | ICCV (2019) | |
Self-supervised Moving Vehicle Tracking with Stereo Sound | | ICCV (2019) | |
The Sound of Motions | | ICCV (2019) | |
Graph Convolutional Networks for Temporal Action Localization | | ICCV (2019) | |
TSM: Temporal Shift Module for Efficient Video Understanding | | ICCV (2019) | |
SPGNet: Semantic Prediction Guidance for Scene Parsing | | ICCV (2019) | |
Learning Implicit Generative Models by Matching Perceptual Features | | ICCV (2019) | |
On the Design of Black-box Adversarial Examples by Leveraging Gradient-free Optimization and Operator Splitting Method | | ICCV (2019) | |
Reasoning about Human-Object Interactions through Dual Attention Networks | | ICCV (2019) | |
Adversarial Robustness vs Model Compression, or Both? | | ICCV (2019) | |
ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | NeurIPS (2019) | |
More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation | | NeurIPS (2019) | |
Cross-channel Communication Networks | | NeurIPS (2019) | |
LaSO: Label-Set Operations Networks for Multi-label Few-shot Learning | Visual learning with limited labeled data | CVPR (2019) | |
SpotTune: Transfer Learning Through Adaptive Fine-Tuning | Visual learning with limited labeled data | CVPR (2019) | |
RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection | Visual learning with limited labeled data | CVPR (2019) | |
Transferable AutoML by Model Sharing Over Grouped Datasets | Visual learning with limited labeled data | CVPR (2019) | |
Adversarial Semantic Alignment for Improved Image Captions | Vision and language | CVPR (2019) | |
Dialog-based Interactive Image Retrieval | Vision and language | NeurIPS (2018) | |
Delta-Encoder: an Effective Sample Synthesis Method for Few-shot Object Recognition | Visual learning with limited labeled data | NeurIPS (2018) | |
Moments in Time dataset: one million videos for event understanding | Multimodal video understanding | TPAMI (2019) | |
Automatic Curation of Sports Highlights using Multimodal Excitement Features | Multimodal video understanding | TMM (2018) | |
Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC) | Skin image analysis | ISBI (2018) | |
Latest news and blog
Discover the latest news and research from the IBM Research AI Computer Vision Team.
IBM Research AI at CVPR 2020
Rogerio Feris | June 12, 2020
IBM Research AI at CVPR 2019
Rogerio Feris | June 14, 2019
SpotTune: Transfer Learning through Adaptive Fine-Tuning
Rogerio Feris | June 14, 2019
IBM Research AI Advancing, Trusting, and Scaling Learning at ICLR
John R. Smith | May 2, 2019
Dialog-Based Interactive Image Retrieval
Xiaoxiao Guo and Hui Wu | February 27, 2019
Efficient Adversarial Robustness Evaluation of AI Models with Limited Access
Pin-Yu Chen and Sijia Liu | January 30, 2019
Delta-Encoder: Synthesizing a Full Set of Samples From One Image
IBM Research AI | November 27, 2018
Restoring Balance in Machine Learning Datasets
IBM Research AI | October 10, 2018
Video Scene Detection Using Optimal Sequential Grouping
IBM Research AI | September 13, 2018
AI and Human Creativity Go Hand in Hand
IBM Research AI | October 19, 2018
MIT-IBM Watson AI Lab
We are a community of scientists at MIT and IBM Research. We conduct AI research and work with global organizations to bridge algorithms to impact for business and society.