Comraf models offer a great
variety of design choices for multi-modal clustering. Some modalities can be
clustered top-down, some bottom-up, some can remain flat, and some may not be clustered
at all. In this work, we present a light-weight Comraf model, called Comraf*,
in which only one modality is to be clustered. We show how to translate a
general Comraf model into a series of Comraf* models.
We test the resulting models
on an image clustering task, where the modalities are images, their colors and
texture, their rectangular regions (local features), as well as words from
their captions.
IsraelImages dataset. To evaluate our methods, I collected an image
dataset that consists of 1823 images downloaded from the Israel Images website. Each image is
assigned to one of 11 categories, which represent the main aspects of Israeli
scenery and society. Each image is 375 by 250 pixels and has a caption of 1 to 18
words. For copyright reasons, I cannot host the images
on my website; however, I can provide a list of their URLs for download from the original Israel Images website.
The Israel Images website owners declare that their images are free to download
and use for non-commercial purposes. Please contact me if you experience any
difficulty downloading the images.
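After downloading the images, it may help to check a local copy against the properties stated above (1823 images, 11 categories, 375 by 250 pixels, captions of 1 to 18 words). The following is a minimal sketch of such a check. The `ImageRecord` structure and its field names are hypothetical and do not reflect any official metadata format for the dataset. The sketch also assumes either orientation of 375 by 250 is acceptable.

```python
from dataclasses import dataclass

# Properties stated in the dataset description.
EXPECTED_IMAGES = 1823
EXPECTED_CATEGORIES = 11
EXPECTED_DIMENSIONS = [250, 375]   # compared in sorted order (assumption: either orientation)
MIN_CAPTION_WORDS, MAX_CAPTION_WORDS = 1, 18


@dataclass
class ImageRecord:
    """Hypothetical per-image metadata record; not an official format."""
    url: str
    category: str
    width: int
    height: int
    caption: str


def check_record(rec):
    """Return a list of human-readable problems found in one record."""
    problems = []
    if sorted((rec.width, rec.height)) != EXPECTED_DIMENSIONS:
        problems.append(f"unexpected size {rec.width}x{rec.height}")
    n_words = len(rec.caption.split())
    if not MIN_CAPTION_WORDS <= n_words <= MAX_CAPTION_WORDS:
        problems.append(f"caption has {n_words} words")
    return problems


def check_dataset(records):
    """Return all problems: dataset-level counts first, then per-record issues."""
    problems = []
    if len(records) != EXPECTED_IMAGES:
        problems.append(f"expected {EXPECTED_IMAGES} images, found {len(records)}")
    categories = {rec.category for rec in records}
    if len(categories) != EXPECTED_CATEGORIES:
        problems.append(
            f"expected {EXPECTED_CATEGORIES} categories, found {len(categories)}"
        )
    for rec in records:
        for problem in check_record(rec):
            problems.append(f"{rec.url}: {problem}")
    return problems
```

An empty list returned by `check_dataset` means the local copy matches the stated description; any other result lists what differs, for example images that failed to download.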
R. Bekkerman and J. Jeon. Multi-modal Clustering for Multimedia Collections. In Proceedings
of CVPR, 2007.