Image Retrieval with Multiple Regions-of-Interest

With the proliferation of multimedia, the web, and digital imaging, there is now a high demand for intelligent tools for image management, most importantly indexing, search, and retrieval, an approach commonly referred to as "query-by-image-content" (QBIC). Existing systems rely on global attributes such as overall color distributions, which ignore the actual composition of the image in terms of its internal structures.

The goal of this project has been to develop a new image retrieval system based on the principle that it is the user, not the computer, who is most qualified to specify the "content" of an image. The user is therefore asked to mark salient "regions-of-interest" (ROIs) and their spatial arrangement in the query image. This technique leads to more acceptable matches returned by the search engine and therefore a more powerful image retrieval tool.

Abbreviations: QBIC: Query-by-Image-Content, ROI: Region-of-Interest

Background and objectives: Most current "query-by-image-content" database indexing and retrieval systems rely on global image characteristics such as color and texture histograms. While these simple descriptors are fast and often succeed in capturing a vague essence of the user's query, they more often fail due to the lack of higher-level knowledge about what exactly was of interest to the user in the query image, i.e., the user-defined content. The goal of this project was to develop and test a new image retrieval technique using local image representations, grouping them into multiple user-specified "regions-of-interest" while preserving their relative spatial relationships, in order to build a more powerful search engine for various image database retrieval applications.
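The global-histogram baseline criticized above can be sketched in a few lines. This is an illustrative minimal implementation, not code from the project; the function names, the 8-bin quantization, and the use of RGB rather than a perceptual color space are all assumptions made for brevity.

```python
import numpy as np

def global_color_histogram(image, bins=8):
    """Quantize each color channel into `bins` levels and build a joint
    3D color histogram over the whole image, normalized to sum to 1.
    Note that all spatial layout is discarded: two very different
    compositions with the same color mix produce identical histograms."""
    h, w, c = image.shape
    assert c == 3
    q = (image.astype(np.float64) / 256.0 * bins).astype(int)
    q = np.clip(q, 0, bins - 1)
    hist = np.zeros((bins, bins, bins))
    for px in q.reshape(-1, 3):
        hist[tuple(px)] += 1
    return hist / hist.sum()

def histogram_distance(h1, h2):
    """Degree-of-match as a Euclidean norm on the histogram difference,
    as in typical global QBIC systems (smaller means more similar)."""
    return np.linalg.norm(h1 - h2)
```

Ranking a database then amounts to sorting images by `histogram_distance` to the query's histogram, which is exactly the scheme whose limitations motivate the region-based approach below.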

Technical discussion: Image retrieval in general is based on two key components: a set of image features (such as color or texture attributes) and a similarity metric used to compare images. To date, most systems use global color histograms to represent the color composition of an image, thus ignoring the spatial layout of color in the query image. Likewise, a single global vector (or histogram) of texture measures, usually computed as the output of a set of linear filters at multiple scales, is used to represent non-color image attributes such as coarseness. The similarity metric used to compute the degree-of-match between two images is typically a Euclidean norm on the difference between two such global color/texture representations.

In contrast, our system divides the image into an array of 16-by-16-pixel blocks, each of which contains the following feature representations: a joint color histogram in LUV color space and a joint 3D histogram consisting of the edge magnitude, Laplacian, and dominant edge orientation, computed at two octave scales. These non-parametric densities represent local color and texture and, due to the additive property of histograms, can easily be combined to form larger image blocks, all the way up to the entire image, at which point they become complete global representations.

When the user specifies a region of interest, its underlying blocks are "pooled" to represent a "meta-block" to be searched for in the database. Multiple regions are searched likewise, and the intersection of the best matches determines the final similarity ranking of images in the database. In addition, the user has the option of specifying whether multiple selected regions should maintain their respective spatial arrangement.
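The block decomposition and additive pooling described above can be sketched as follows. This is a simplified illustration under stated assumptions: it shows only the color histogram (omitting the edge/Laplacian/orientation texture histogram and the LUV conversion), and all function names, the 8-bin quantization, and the block-coordinate ROI interface are hypothetical.

```python
import numpy as np

BLOCK = 16  # block side in pixels, as in the text

def block_histograms(image, bins=8):
    """Compute a joint color histogram for every 16x16 block.
    Histograms are kept as raw counts (not frequencies) so that they
    remain additive and can be summed into larger regions."""
    h, w, _ = image.shape
    q = np.clip((image.astype(np.float64) / 256.0 * bins).astype(int),
                0, bins - 1)
    grid = np.empty((h // BLOCK, w // BLOCK), dtype=object)
    for by in range(h // BLOCK):
        for bx in range(w // BLOCK):
            hist = np.zeros((bins, bins, bins))
            patch = q[by * BLOCK:(by + 1) * BLOCK,
                      bx * BLOCK:(bx + 1) * BLOCK]
            for px in patch.reshape(-1, 3):
                hist[tuple(px)] += 1
            grid[by, bx] = hist
    return grid

def pool_meta_block(grid, y0, y1, x0, x1):
    """Sum block histograms over a rectangular ROI given in block
    coordinates, then normalize. Pooling the entire grid recovers the
    global color histogram, illustrating the additive property."""
    total = sum(grid[by, bx]
                for by in range(y0, y1)
                for bx in range(x0, x1))
    return total / total.sum()
```

A query ROI drawn by the user would be snapped to block boundaries, pooled into a meta-block with `pool_meta_block`, and compared against candidate meta-blocks at every position in each database image; with several ROIs, each region is matched separately and the intersection of the per-region best matches yields the final ranking.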

Collaboration: This project was aided by interns from New York University and Carnegie Mellon University.

Future Directions: Currently, the search for ROIs is computationally intensive, and pruning strategies should be implemented to avoid searching the entire database for a "meta-block" query. Nevertheless, this region-based image retrieval approach should prove useful not only for general image databases, but especially for medical applications, where both appearance and spatial factors play a significant diagnostic role.

Author: Baback Moghaddam

October 11, 1998
MERL - A Mitsubishi Electric Research Laboratory
Mitsubishi Electric Information Technology Center America (ITA)


Baback Moghaddam
Henning Biermann (NYU)
Dimitris Margaritis (CMU)
Defining Image Content with Multiple Regions-of-Interest
Moghaddam B., Biermann H., Margaritis D.
MERL Tech Report TR99-10
