Computer Vision

IBM Research is a leading player in the quest to give AI systems sight. We’re enabling Watson, IBM's AI platform, to interpret visual content as easily as it does text.

About us

The field of computer vision has been transformed by the introduction of deep learning. State-of-the-art computer vision systems can now achieve superhuman accuracy and speed for certain tasks in image recognition and analysis. But these systems are still far from truly understanding what they see and making intelligent use of visual data. We aim to advance computer vision analysis from static scenes and images toward dynamic scenes and to integrate audio-visual perception, eventually enabling these systems to understand video input.

CVPR 2020

IBM Research AI is expanding AI’s Field of Computer Vision at CVPR 2020

Featured work

Auto Curation of Sports Highlights

Coach Advisor uses IBM and Red Hat hybrid cloud capabilities to put AI and analytics directly in the hands of USTA coaches to help drive a new level of insight into tennis player performance.

Visual Learning with Limited Labeled Data

IBM Research provides a set of examples and methods for learning visual models from limited labeled data, lowering costs and shortening the time to a proof of concept.

Neurosymbolic AI

Neurosymbolic AI combines the power of neural networks with symbolic methods to help AI reason more effectively.

SpotTune: Transfer Learning through Adaptive Fine-Tuning

IBM Research, in collaboration with University of California, San Diego and University of Texas at Austin, created a novel adaptive fine-tuning method called SpotTune that automatically decides which layers of a model should be frozen or fine-tuned.
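
As a rough illustration of the idea, the sketch below routes an input through either a frozen or a fine-tuned copy of each layer, according to a per-layer decision. It is a toy under stated assumptions, not the SpotTune implementation: the layers are plain Python functions, the decisions are supplied by hand rather than learned, and the names `route_forward`, `frozen_layers`, `tuned_layers`, and `use_tuned` are hypothetical.

```python
from typing import Callable, Sequence

# A "layer" here is just a function on a number; real layers act on tensors.
Layer = Callable[[float], float]


def route_forward(
    x: float,
    frozen_layers: Sequence[Layer],
    tuned_layers: Sequence[Layer],
    use_tuned: Sequence[bool],
) -> float:
    """Pass x through one copy of each layer, chosen per layer by use_tuned."""
    if not (len(frozen_layers) == len(tuned_layers) == len(use_tuned)):
        raise ValueError("need one frozen layer, one tuned layer, and one decision per depth")
    for frozen, tuned, pick_tuned in zip(frozen_layers, tuned_layers, use_tuned):
        # Hypothetical routing rule: True selects the fine-tuned copy,
        # False keeps the frozen pretrained copy.
        layer = tuned if pick_tuned else frozen
        x = layer(x)
    return x


# Toy stand-ins for pretrained (frozen) layers and their fine-tuned copies.
frozen_layers = [lambda v: v + 1.0, lambda v: v * 2.0]
tuned_layers = [lambda v: v + 10.0, lambda v: v * 3.0]

# Keep the first layer frozen and use the fine-tuned copy of the second.
result = route_forward(1.0, frozen_layers, tuned_layers, [False, True])
print(result)
```

In a real system the per-layer decisions would come from a learned component rather than a hand-written list.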

Workshop on Multi-modal Video Analysis and Moments in Time Challenge

The Workshop on Multi-modal Video Analysis and Moments in Time Challenge at ICCV 2019 focuses on modeling, understanding, and leveraging the multi-modal nature of video.

Fashion Interactive Queries demo and challenge

IBM Research proposes a new natural language-based system for interactive, fine-grained image retrieval. The proposed framework learns to seek natural language feedback from the user and iteratively refines the retrieval results.

Skin lesion analysis towards melanoma detection

Recently, we’ve been developing techniques in computer vision that could one day enable clinical staff to use pictures to help them screen for disease. Our vision is that taking pictures to diagnose melanoma might one day be as routine as drawing blood to detect other diseases.

Publications

The IBM Research AI Computer Vision team aims to advance computer vision analysis from static scenes and images toward dynamic scenes and to integrate audio-visual perception, eventually enabling these systems to understand video input. Our research has been recognized at major conferences such as CVPR, NeurIPS, and ICLR.

Please explore all of our computer vision research papers

All publications

| Title | Research area | Venue |
| --- | --- | --- |
| Video Instance Segmentation Tracking | | CVPR (2020) |
| Leveraging 2D Data to Learn Textured 3D Mesh Generation | | CVPR (2020) |
| Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation | | CVPR (2020) |
| Non-Adversarial Video Synthesis with Learned Priors | | CVPR (2020) |
| Camera On-boarding for Person Re-identification using Hypothesis Transfer Learning | | CVPR (2020) |
| Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining | | CVPR (2020) |
| Music Gesture for Visual Sound Separation | | CVPR (2020) |
| Dense Regression Network for Video Grounding | | CVPR (2020) |
| Bottom-up Higher-Resolution Networks for Multi-Person Pose Estimation | | CVPR (2020) |
| Towards Verifying Robustness of Neural Networks against Semantic Perturbations | | CVPR (2020) |
| Adversarial Robustness: From Self-Supervised Pretraining to Fine-Tuning | | CVPR (2020) |
| Improving the affordability of robustness training for DNNs | | CVPR Workshop on Adversarial Machine Learning in Computer Vision (2020) |
| Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation | | CVPR Workshop on Visual Learning with Limited Labels (2020) |
| Relationship Matters: Relation Guided Knowledge Transfer for Incremental Learning of Object Detectors | | CVPR Workshop on Continual Learning in Computer Vision (2020) |
| StarNet: towards weakly supervised few-shot detection and explainable few-shot classification | | CVPR Workshop on Visual Learning with Limited Labels (2020) |
| MetAdapt: Meta-Learned Task-Adaptive Architecture for Few-Shot Classification | | CVPR Workshop on Visual Learning with Limited Labels (2020) |
| TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification | | CVPR Workshop on Visual Learning with Limited Labels (2020) |
| DBA: Distributed Backdoor Attacks against Federated Learning | | ICLR (2020) |
| CLEVRER: CoLlision Events for Video REpresentation and Reasoning | | ICLR (2020) |
| Once for All: Train One Network and Specialize it for Efficient Deployment | | ICLR (2020) |
| DADI: Dynamic Discovery of Fair Information with Adversarial Reinforcement Learning | | ICLR (2020) |
| Sign-OPT: A Query-Efficient Hard-label Adversarial Attack | | ICLR (2020) |
| Federated Learning with Matched Averaging | | ICLR (2020) |
| Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness | | ICLR (2020) |
| AdvIT: Adversarial Frames Identifier Based on Temporal Consistency in Videos | | ICCV (2019) |
| Face Alignment With Kernel Density Deep Neural Network | | ICCV (2019) |
| Bayesian Graph Convolution LSTM for Skeleton Based Action Recognition | | ICCV (2019) |
| Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine Grained Action Detection | | ICCV (2019) |
| Seeing What a GAN Cannot Generate | | ICCV (2019) |
| Self-supervised Moving Vehicle Tracking with Stereo Sound | | ICCV (2019) |
| The Sound of Motions | | ICCV (2019) |
| Graph Convolutional Networks for Temporal Action Localization | | ICCV (2019) |
| TSM: Temporal Shift Module for Efficient Video Understanding | | ICCV (2019) |
| SPGNet: Semantic Prediction Guidance for Scene Parsing | | ICCV (2019) |
| Learning Implicit Generative Models by Matching Perceptual Features | | ICCV (2019) |
| On the Design of Black-box Adversarial Examples by Leveraging Gradient-free Optimization and Operator Splitting Method | | ICCV (2019) |
| Reasoning about Human-Object Interactions through Dual Attention Networks | | ICCV (2019) |
| Adversarial Robustness vs Model Compression, or Both? | | ICCV (2019) |
| ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | NeurIPS (2019) |
| More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation | | NeurIPS (2019) |
| Cross-channel Communication Networks | | NeurIPS (2019) |
| LaSO: Label-Set Operations Networks for Multi-label Few-shot Learning | Visual learning with limited labeled data | CVPR (2019) |
| SpotTune: Transfer Learning Through Adaptive Fine-Tuning | Visual learning with limited labeled data | CVPR (2019) |
| RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection | Visual learning with limited labeled data | CVPR (2019) |
| Transferable AutoML by Model Sharing Over Grouped Datasets | Visual learning with limited labeled data | CVPR (2019) |
| Adversarial Semantic Alignment for Improved Image Captions | Vision and language | CVPR (2019) |
| Dialog-based Interactive Image Retrieval | Vision and language | NeurIPS (2018) |
| Delta-Encoder: an Effective Sample Synthesis Method for Few-shot Object Recognition | Visual learning with limited labeled data | NeurIPS (2018) |
| Moments in Time dataset: one million videos for event understanding | Multimodal video understanding | TPAMI (2019) |
| Automatic Curation of Sports Highlights using Multimodal Excitement Features | Multimodal video understanding | TMM (2018) |
| Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC) | Skin image analysis | ISBI (2018) |

MIT-IBM Watson AI Lab

We are a community of scientists at MIT and IBM Research. We conduct AI research and work with global organizations to bridge algorithms to impact for business and society.
