Feature-Driven Emergence of Model Graphs for Object Recognition and Categorization

An important requirement for the expression of cognitive structures is the ability to form mental objects by rapidly binding together constituent parts. In this sense, one may conceive of the brain's data structure as having the form of graphs whose nodes are labeled with elementary features. These provide a versatile data format with the ability to render the structure of any mental object. Because of the multitude of possible object variations, the graphs are required to be dynamic. Upon presentation of an image, a so-called model graph should rapidly emerge: driven by the image features, memorized subgraphs derived from earlier learning examples are bound together. In this model, the richness and flexibility of the mind is made possible by a combinatorial game of immense complexity. Consequently, emergence of model graphs is a laborious task which, in computer vision, has most often been disregarded in favor of employing model graphs tailored to specific object categories, such as faces in frontal pose. Invariant recognition or categorization of arbitrary objects, however, demands dynamic graphs.

In this work we propose a form of graph dynamics which proceeds in three steps. In the first step, position-invariant feature detectors, which decide whether a feature is present in an image, are set up from training images. For processing arbitrary objects, these features are small regular graphs, termed parquet graphs, whose nodes are attributed with Gabor amplitudes. In the second step, these classifiers are combined into a linear discriminant that conforms to Linsker's infomax principle, which implements a weighted majority voting scheme. This network, termed the preselection network, is well suited to quickly rule out most irrelevant matches and leaves only the ambiguous cases, so-called model candidates, to be processed in a third step using a rudimentary version of elastic graph matching, a standard correspondence-based technique for face and object recognition. To further differentiate between model candidates with similar features, the features are required to be in a similar spatial arrangement for the model to be selected. Model graphs are constructed dynamically by assembling model features into larger graphs according to their spatial arrangement. The model candidate whose model graph attains the best similarity to the input image is chosen as the recognized model.
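The preselection stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cosine_similarity` and `preselect`, the cosine similarity measure, the parameters `threshold` and `keep_ratio`, and the hand-set weights are all assumptions made for illustration. In the paper the features are parquet graphs with Gabor amplitudes and the weights follow from the infomax-based discriminant, neither of which is reproduced here.

```python
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def preselect(image_features, memory, weights, threshold=0.9, keep_ratio=0.5):
    """Weighted majority voting over memorized features (illustrative only).

    image_features: feature vectors extracted from the input image; their
        positions are ignored, which makes the detectors position-invariant.
    memory: list of (feature_vector, model_label) pairs from training images.
    weights: one non-negative vote weight per memory entry (hypothetical
        values here; the paper derives them from a linear discriminant).
    Returns (label, vote) pairs, best first, for every model whose vote is
    at least keep_ratio times the best vote.
    """
    votes = defaultdict(float)
    for (stored, label), weight in zip(memory, weights):
        # A detector fires if the memorized feature occurs anywhere in the image.
        if any(cosine_similarity(stored, f) >= threshold for f in image_features):
            votes[label] += weight
    if not votes:
        return []
    best = max(votes.values())
    ranked = sorted(votes.items(), key=lambda item: item[1], reverse=True)
    return [(label, vote) for label, vote in ranked if vote >= keep_ratio * best]


if __name__ == "__main__":
    # Toy memory: two features learned from "cup" images, one from "car".
    memory = [([1.0, 0.0], "cup"), ([0.0, 1.0], "cup"), ([1.0, 1.0], "car")]
    weights = [1.0, 1.0, 0.5]
    image = [[1.0, 0.05], [0.02, 1.0], [1.0, 1.0]]
    print(preselect(image, memory, weights, keep_ratio=0.2))
```

Only the models that survive this vote would be passed on to the more expensive third step, in which the spatial arrangement of the matched features is compared.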

We report the results of experiments on standard databases for object recognition and categorization. The method achieved high recognition rates on identity, object category, and pose, provided that individual object variations are sufficiently covered by learning examples. Unlike many other models, the presented technique can also cope with varying background, multiple objects, and partial occlusion.