DL-based Supervised Object Boundary Detection
Object boundary detection is a vital component of vision systems that perform tasks such as object recognition and image segmentation. Segmenting objects correctly requires distinguishing semantic object boundaries from other, 'uninteresting' edges. In this work, we train a ConvNet-based system to discriminate between these two types of edges. We formulate the task as learning to map local image patch pixel values into probability maps of semantic boundaries.
The proposed deep neural network architecture is depicted in Fig. 1. The input to the first convolutional layer, which has 5x5 filters, is a 36x36x3 feature map comprising patches taken from the original RGB image. Average-pooling layers operate on 3x3 neighborhoods with a stride of 2 pixels. The second and third convolutional layers each have 32 filters of size 5x5, and the fourth convolutional layer has 64 filters of size 5x5. Both fully-connected layers have 256 neurons. The output of the second fully-connected layer feeds a sigmoid layer, whose output is reshaped into a 16x16 map of boundary probabilities.
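The described patch-to-probability-map network can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: the filter count of the first convolutional layer, the use of ReLU non-linearities, zero-padding of the convolutions, and the exact placement of the two pooling layers are all assumptions made to reproduce the stated 36x36 input and 16x16 output sizes.

```python
# Hypothetical sketch of the described architecture; layer details not
# specified in the text (first-layer filter count, padding, activation
# functions, pooling placement) are assumptions.
import torch
import torch.nn as nn

class BoundaryNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, padding=2),   # first conv (filter count assumed)
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=3, stride=2),        # 36x36 -> 17x17
            nn.Conv2d(32, 32, kernel_size=5, padding=2),  # second conv, 32 filters
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=3, stride=2),        # 17x17 -> 8x8
            nn.Conv2d(32, 32, kernel_size=5, padding=2),  # third conv, 32 filters
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, padding=2),  # fourth conv, 64 filters
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256),  # first fully-connected layer
            nn.ReLU(),
            nn.Linear(256, 256),         # second fully-connected layer
            nn.Sigmoid(),                # per-pixel boundary probabilities
        )

    def forward(self, x):
        # x: (N, 3, 36, 36) RGB patches -> (N, 16, 16) probability maps
        z = self.classifier(self.features(x))
        return z.view(-1, 16, 16)
```

Note that 256 output neurons reshape exactly into the 16x16 probability map, which is consistent with the sizes given in the text.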
Fig. 2 shows examples of applying the state-of-the-art method and the proposed method to semantic boundary detection on the BSDS500 dataset. Our method attains similar performance figures but yields arguably better visual results.
Fig. 1. The regression-output network architecture: the input is a 36x36-pixel image patch; the output is a 16x16 patch of scores in [0…1], corresponding to the probability that a semantic object boundary is present at each pixel.
Fig. 2. Examples of semantic boundary detection: (a) original image, (b) state-of-the-art method of P. Dollar, (c) proposed method.