Regularization methods for neural networks and related models
Author: Sergey Demyanov
Date: 2015
Affiliation: Computing and Information Systems
Document Type: PhD thesis
Access Status: Open Access
Description: © 2015 Dr. Sergey Demyanov
Abstract
Neural networks have become very popular in the last few years. They have demonstrated the best results in image classification, image segmentation, speech recognition, and text processing. The major breakthrough happened in the early 2010s, when it became feasible to train deep neural networks (DNN) on a GPU, which made the training process several hundred times faster. At the same time, large labeled datasets with millions of objects, such as ImageNet, became available. The GPU implementation of a convolutional DNN with over 10 layers and millions of parameters could handle the ImageNet dataset in just a few days. As a result, such networks could decrease the classification error in the image classification competition LSVRC-2010 by 40\% compared with hand-crafted feature algorithms.
Deep neural networks are able to demonstrate excellent results on tasks with a complex classification function and a sufficient amount of training data. However, since DNN models have a huge number of parameters, they are also easily overfitted when the amount of training data is not large enough. Thus, regularization techniques for neural networks are crucially important to make them applicable to a wide range of problems. In this thesis we provide a comprehensive overview of existing regularization techniques for neural networks, together with their theoretical explanation.
Training of neural networks is performed using the backpropagation (BP) algorithm. Standard BP has two passes: forward and backward. It computes the predictions for the current input and the loss function on the forward pass, and the derivatives of the loss function with respect to the input and the weights on the backward pass. The nature of the data usually implies that two very close data points share the same label, so the predictions of a classifier should not change quickly near the points of a dataset. We propose a natural extension of the backpropagation algorithm that additionally minimizes the length of the vector of derivatives of the loss function with respect to the input values, and demonstrate that this algorithm improves the accuracy of the trained classifier.
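The core idea of penalizing the input gradient can be illustrated on a toy model. The sketch below trains a single logistic unit with an added penalty $\|\partial L/\partial x\|^2$; it is a minimal illustration of the input-gradient-penalty idea, not the thesis' exact invariant backpropagation algorithm, and the function and parameter names are my own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(w, b, x, y, lr=0.1, lam=0.1):
    """One gradient-descent step on cross-entropy loss plus an input-gradient
    penalty ||dL/dx||^2 (illustrative sketch only, not the thesis' algorithm).

    For z = w.x + b and p = sigmoid(z), dL/dz = p - y = e, so the input
    gradient is dL/dx = e * w and the penalty is P = e^2 * ||w||^2.
    """
    z = w @ x + b
    p = sigmoid(z)
    e = p - y                       # dL/dz for sigmoid + cross-entropy
    # Standard backpropagation gradients
    grad_w = e * x
    grad_b = e
    # Gradient of the penalty P = e^2 * ||w||^2, using de/dw = sigma'(z) * x
    s = p * (1.0 - p)               # sigma'(z)
    pen_grad_w = 2.0 * e * s * (w @ w) * x + 2.0 * e**2 * w
    pen_grad_b = 2.0 * e * s * (w @ w)
    w = w - lr * (grad_w + lam * pen_grad_w)
    b = b - lr * (grad_b + lam * pen_grad_b)
    return w, b
```

Driving the penalty toward zero flattens the loss surface around training points, which is exactly the smoothness prior described above.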
The proposed invariant backpropagation algorithm requires an additional hyperparameter that defines the strength of regularization, and therefore controls the flexibility of a classifier. In order to achieve the best results, the value of this hyperparameter needs to be carefully chosen, usually on a validation set or by cross-validation. However, these methods might not be accurate and can be slow. We propose a method of choosing the hyperparameter that controls a classifier's flexibility and demonstrate its performance on Support Vector Machines. The method is based on the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), and uses the disposition of misclassified objects and the VC-dimension of the classifier.
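For reference, the two criteria mentioned above have the standard closed forms AIC = 2k - 2 ln L-hat and BIC = k ln n - 2 ln L-hat, where k is the number of effective parameters, n the sample size, and L-hat the maximized likelihood. The sketch below shows the textbook formulas only; how the thesis plugs in the disposition of misclassified objects and the VC-dimension is not reproduced here.

```python
import numpy as np

def aic(log_likelihood, k):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L-hat).
    Lower values indicate a better accuracy/complexity trade-off."""
    return 2.0 * k - 2.0 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: BIC = k ln(n) - 2 ln(L-hat).
    Penalizes model complexity more heavily than AIC for large n."""
    return k * np.log(n) - 2.0 * log_likelihood
```

To select a hyperparameter value with such a criterion, one fits the model for each candidate value and keeps the one with the smallest criterion, avoiding a separate validation set.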
In some tasks, the data consists of feature values together with additional information about feature locations in one or more dimensions, usually space or time. For example, image pixels are described by their coordinates on the horizontal and vertical axes, and time series inherently record when their elements were observed. This information can be exploited by a classifier. Some regularizers are specifically targeted at this goal, restricting a classifier from learning an inappropriate model. We present an overview of such regularization methods, describe some of their applications, and report the results of their usage.
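A classic example of a structural regularizer that exploits spatial location is convolutional weight sharing: one small kernel slides over the whole image, encoding the prior that nearby pixels are related and that the same local pattern is meaningful anywhere in the image. The naive sketch below is a generic illustration of this mechanism, not a method specific to the thesis.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation. Sharing one (kh x kw) kernel
    across every spatial position acts as a structural regularizer: it
    replaces a full dense weight matrix with a handful of parameters."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

A dense layer mapping a 4x4 image to a 3x3 output would need 16*9 weights; the shared 2x2 kernel above needs only 4, which is precisely the kind of restriction that keeps a classifier from learning a spatially inappropriate model.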
Video is one of the domains where one has to consider time. One of the challenging tasks in this domain is deception detection from visual cues. The psychology literature indicates that ordinary people are unable to detect deception with high accuracy: their performance is only slightly better than a random guess. At the same time, trained individuals were shown to be able to detect liars with an accuracy of up to 73\% \cite{ekman1991can, ekman1999few}. This result confirms that the visual and audio channels contain enough information to detect deception. In this thesis we describe an automated multilevel system of video processing and feature engineering based on facial movements. We demonstrate that the extracted features provide a classification accuracy that is statistically significantly better than a random guess. Another contribution of this thesis is the collection of one of the largest datasets of videos of truthful and deceptive people, recorded in more natural conditions than existing datasets.
Keywords
regularization; neural networks; model selection; support vector machines; deception detection