Below is the dataset used in the paper:
Human Instance Segmentation from Video using Detector-based Conditional Random Fields.
Vibhav Vineet, Jonathan Warrell, Lubor Ladicky, and Philip Torr
British Machine Vision Conference (BMVC), 2011
pdf
Dataset download:
Buffy.zip.
The dataset contains images selected from the TV series "Buffy the Vampire Slayer".
We select a set of 452 images from the first two episodes for training and 160 images from the next three episodes for test purposes.
We provide a pixel-level human/non-human segmentation of these images, in which every human instance receives the same label/color.
These segmentations serve as the ground truth images for training and testing. In addition, we select a set of 60 images from the training set and provide
a pixel-level segmentation in which each human instance receives a distinct label/color.
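As a rough illustration of how such color-coded ground truth might be consumed, the sketch below converts an RGB mask into an integer instance map and a binary human/non-human mask. It makes several assumptions that the dataset description does not confirm: that the ground truth is stored as RGB images, that the background is pure black, and that each distinct non-background color marks one human instance. The function name `color_mask_to_instances` is hypothetical and not part of the dataset.

```python
import numpy as np


def color_mask_to_instances(mask_rgb, background=(0, 0, 0)):
    """Convert an H x W x 3 color-coded mask into an H x W integer instance map.

    Assumption (not confirmed by the dataset page): background pixels have the
    color `background`, and every other distinct color is one human instance.
    Background pixels map to 0; instances map to 1, 2, ... in sorted color order.
    """
    mask_rgb = np.asarray(mask_rgb)
    h, w, _ = mask_rgb.shape
    flat = mask_rgb.reshape(-1, 3)

    # Distinct colors, plus the index of its color for every pixel.
    colors, inverse = np.unique(flat, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)

    # Assign 0 to the background color and consecutive ids to the rest.
    ids = np.zeros(len(colors), dtype=np.int64)
    next_id = 1
    for k, color in enumerate(colors):
        if tuple(int(v) for v in color) == tuple(background):
            ids[k] = 0
        else:
            ids[k] = next_id
            next_id += 1

    return ids[inverse].reshape(h, w)


# Tiny synthetic mask: black background, one red and one green "instance".
example = np.array(
    [[[0, 0, 0], [255, 0, 0], [255, 0, 0]],
     [[0, 0, 0], [0, 255, 0], [0, 0, 0]]],
    dtype=np.uint8,
)
instance_map = color_mask_to_instances(example)
human_mask = instance_map > 0  # binary human/non-human segmentation
```

For the same-label ground truth, only `human_mask` is meaningful; for the 60 images with distinct labels, `instance_map` separates the individual people under the assumptions stated above.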
Examples from the dataset:
Original Image
Pixel-level human/non-human segmentation, with each human instance receiving the same label/color
Pixel-level human/non-human segmentation, with each human instance receiving a different label/color
In this work, we propose a method for instance-based human segmentation in images and videos, extending the recent detector-based conditional random field model of Ladicky et al. Instance-based human segmentation involves pixel-level labeling of an image, partitioning it into distinct human instances and background. To achieve this, we add three new components to their framework. First, we include human parts-based detection potentials to take advantage of the structure present in human instances. Second, to generate a consistent segmentation from different human parts, we incorporate shape prior information, which biases the segmentation towards characteristic overall human shapes. Third, we enhance the representative power of the energy function by adopting exemplar instance-based matching terms, which help our method adapt easily to different human sizes and poses. Finally, we extensively evaluate the proposed method on the Buffy dataset with our new segmented ground truth images, and show a substantial improvement over existing CRF methods.
Some results
Original Images
Ground truth
Detector + iterative graph cut without shape prior
Model 0 (Ladicky et al.)
Model 1 (Our results)
Model 2 (Our results)
References
L. Ladicky, P. Sturgess, K. Alahari, C. Russell, and P. H. S. Torr. What, where and how many? Combining object detectors and CRFs. In ECCV (4), pages 424-437, 2010.