Learning Sparse Representations for Computer Vision Applications




Keywords: Crowd Counting, Feature Selection, Object Classification, People Counting, Multi-Modal Dictionary Learning, Low-rank Learning, Sparse Representation, Compressed Sensing, Dictionary Learning, Dimensionality Reduction, Joint Optimization, Joint Dictionary Learning and Dimensionality Reduction

Foroughi, Homa

Supervisor and department: Zhang, Hong (Computing Science); Ray, Nilanjan (Computing Science)

Examining committee member and department: Jagersand, Martin (Computing Science); Boulanger, Pierre (Computing Science); Jepson, Allan (Computer Science, University of Toronto)

Department: Department of Computing Science

Specialization:

Date accepted: 2017-03-17T13:12:58Z

Graduation date: 2017-06 (Spring 2017)

Degree: Doctor of Philosophy

Degree level: Doctoral

Abstract: At the core of many computer vision methods lies the question of how to represent data. Representing data in a meaningful way that highlights its most useful properties can significantly affect the performance of any vision-based application. Traditional systems rely heavily on hand-designed representations that are mostly domain-specific and require significant domain knowledge and human effort. Recently, there has been much research on learning representations from data, and one of the successful approaches is sparse representation, which aims to represent data as a linear combination of a few elements of a basis or dictionary. A good sparse representation of an image is expected to have high fidelity to the observed image content while also revealing its underlying structure and semantic information. In this thesis, we address the problem of how to learn such a representation, or dictionary, from training images, particularly for crowd counting, image classification, and dimensionality reduction tasks. Counting pedestrians in videos is of great interest in areas such as visual surveillance, public resource management, and security. Crowd counting can be challenging due to severe occlusions, scene perspective distortions, and diverse crowd distributions. We propose two crowd counting methods based on compressed sensing and sparse representation theory, each of which resolves some of these issues. First, we present a counting method based on an image retrieval framework and introduce a compact global image descriptor, built using compressed sensing theory, to estimate the crowd count. Next, we propose a crowd counting method based on sparse representation-based classification and random projection. We adopt a semi-supervised elastic net to provide a rich training set that can span the variations present under testing conditions.
By exploiting the sequential information in the vast quantity of readily available unlabeled data, we are able to annotate a large portion of the data with just a handful of labeled images. Experiments on crowd counting benchmark datasets demonstrate the effectiveness and reliability of the proposed methods, especially on large-scale datasets. Image classification based on visual content is a challenging task, mainly because there is usually a large amount of intra-class variability arising from illumination and viewpoint variations, occlusion, and corruption. In addition, many real-world vision applications face the problem of high-dimensional data and a small number of training samples. To address these issues, we propose a joint learning framework in which the subspace projection matrix, the dictionary, and the sparse coefficients are learned simultaneously. By incorporating suitable constraints such as low rank, incoherence, and neighborhood preservation, we are able to learn discriminative and robust sparse representations of images, especially for challenging classification scenarios. Experimental results on several benchmark datasets verify the superior performance of our method for object classification on small datasets that include considerable amounts of different kinds of variation. Feature selection is another way to deal with high-dimensional data, and sparsity constraints have recently been used to select a subset of features. We propose a feature selection method based on the decision rule of dictionary learning, and integrate low-rank matrix recovery, reconstruction residuals, and row-sparsity constraints into the framework. As a result, the proposed method selects an optimal subset of features and, at the same time, provides well-separated classes in the reduced space. Our method is capable of selecting discriminative features even when the data are contaminated by occlusion, illumination or pose variations, and corruption.
Extensive experiments on benchmark datasets verify the superior performance of the proposed method for feature selection, image and video classification, and counting specific populations of tumor cells in microscopic images.
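The abstract's second crowd counting method combines random projection with sparse representation-based classification (SRC). The sketch below illustrates only that general decision rule, under stated assumptions, and is not the thesis's implementation: the Gaussian projection, the greedy orthogonal matching pursuit (OMP) solver standing in for the l1 minimization usually used in SRC, the helper names (`random_projection`, `omp`, `src_classify`), and the toy features labelled "low" and "high" are all hypothetical. A query is coded as a combination of a few training samples (the dictionary atoms), and the label whose coefficients alone best reconstruct the query is returned.

```python
import math
import random


def random_projection(dim_in, dim_out, seed=0):
    """Hypothetical Gaussian random projection: (dim_out x dim_in), entries ~ N(0, 1/dim_out)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)] for _ in range(dim_out)]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit Euclidean norm (a zero vector is returned unchanged)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def solve(a, b):
    """Solve a small square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def omp(atoms, y, k, tol=1e-10):
    """Orthogonal matching pursuit: greedily pick up to k atoms, refitting their
    coefficients by least squares; returns a coefficient vector with <= k nonzeros."""
    support, coef, residual = [], [], list(y)
    for _ in range(min(k, len(atoms))):
        if math.sqrt(dot(residual, residual)) < tol:
            break  # y is already reconstructed to numerical precision
        # Select the unused atom most correlated with the current residual.
        best = max((j for j in range(len(atoms)) if j not in support),
                   key=lambda j: abs(dot(atoms[j], residual)))
        support.append(best)
        # Least-squares refit on the current support via the normal equations.
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        coef = solve(gram, [dot(atoms[i], y) for i in support])
        approx = [sum(c * atoms[j][t] for c, j in zip(coef, support)) for t in range(len(y))]
        residual = [a - b for a, b in zip(y, approx)]
    x = [0.0] * len(atoms)
    for c, j in zip(coef, support):
        x[j] = c
    return x


def src_classify(train, labels, y, k=3):
    """SRC decision rule: sparse-code y over all training samples, then return the
    label whose own coefficients reconstruct y with the smallest residual."""
    atoms = [normalize(a) for a in train]
    x = omp(atoms, y, k)
    best_label, best_res = None, float("inf")
    for label in sorted(set(labels)):
        recon = [sum(x[j] * atoms[j][t] for j in range(len(atoms)) if labels[j] == label)
                 for t in range(len(y))]
        res = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, recon)))
        if res < best_res:
            best_label, best_res = label, res
    return best_label


# Toy usage: hypothetical 6-D features with two placeholder density labels.
train_raw = [
    [1, 2, 0, 0, 1, 0], [2, 1, 1, 0, 0, 0], [0, 1, 2, 0, 0, 1],  # "low"
    [0, 0, 0, 3, 1, 2], [1, 0, 0, 2, 3, 1], [0, 0, 1, 1, 2, 3],  # "high"
]
labels = ["low"] * 3 + ["high"] * 3
P = random_projection(6, 4, seed=1)
train = [matvec(P, v) for v in train_raw]
query = [2.0 * t for t in train[4]]  # a rescaled "high" training sample
print(src_classify(train, labels, query))  # high
```

Because every class-wise residual is computed from the same sparse code, the decision depends on which training samples the solver selects. Replacing OMP with an l1 solver changes how that selection is made but leaves the residual-based decision rule unchanged.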

Language: English

DOI: doi:10.7939/R3H41K01S

Rights: This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.






Source: https://era.library.ualberta.ca/


Learning Sparse Representations for Computer Vision Applications, by Homa Foroughi. A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Department of Computing Science, University of Alberta. © Homa Foroughi, 2017.




