Scene recognition combining structural and textural features
Automatic recognition of the contents of a scene is an important issue in the field of computer vision. Although considerable progress has been made, the complexity of scenes remains an important challenge to computer vision research. Most previous approaches for scene recognition are based on the so-called "bag of visual words" model, which uses clustering methods to quantize numerous local region descriptors into a codebook. The size of the codebook and the selection of initial clustering centers greatly affect the performance. Furthermore, the large size of the codebook leads to high computational costs and large memory consumption. To overcome these weaknesses, we present an unsupervised natural scene recognition approach that is not based on the "bag of visual words" model. This approach constructs multiple images of different resolutions and extracts structural and textural features from these images. The structural features are represented by weighted histograms of the gradient orientation descriptor, which is presented in this paper, and the textural features are represented by filter responses of Gabor filters and a Schmid set. We regard the structural and textural features as two independent feature channels, and combine them to realize automatic categorization of scenes using a support vector machine. We then evaluated our approach using three commonly used datasets with various scene categories. Our experiments demonstrate that the weighted histograms of the gradient orientation descriptor outperform the classical scale invariant feature transform descriptor in natural-scene recognition, and our approach achieves good performance with respect to current state-of-the-art methods.
Science China(Information Sciences)
2013年07期
立即查看 >
图书推荐
相关工具书