dc.contributor.author: Grebenyuk, Dmitry
dc.date.accessioned: 2020-11-13T01:16:16Z
dc.date.available: 2020-11-13T01:16:16Z
dc.date.issued: 2020
dc.identifier.uri: http://hdl.handle.net/11343/251377
dc.description: © 2020 Dmitry Grebenyuk
dc.description.abstract: A Markov decision process (MDP) cannot be used to learn end-to-end control policies in reinforcement learning when the dimension of the feature vector changes from one trial to the next, as happens, for example, in an environment where the number of blocks to manipulate varies. Because we cannot learn a separate policy for each number of blocks, we suggest framing the problem as a POMDP instead of an MDP, which allows us to construct a constant observation space over a dynamic state space. There are two ways to achieve such a construction. First, we can design a hand-crafted set of observations for a particular problem; however, such a set cannot be readily transferred to another problem and often requires domain-dependent knowledge. Alternatively, a set of observations can be derived from visual observations. This approach is universal, and it allows us to easily incorporate the geometry of the problem into the observations, which can be challenging to hard-code in the former method. In this thesis, we examine both of these methods. Our goal is to learn policies that generalise to new tasks. First, we show that a more general observation space can improve the performance of policies tested on tasks unseen during training. Second, we show that meaningful feature vectors can be obtained from visual observations. If properly regularised, these vectors can reflect the spatial structure of the state space and be used for planning. Using these vectors, we construct an auto-generated reward function from which working policies can be learned.
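The "constant observation space over a dynamic state space" idea from the abstract can be illustrated with a minimal sketch: a variable number of blocks is rendered onto a fixed-size occupancy grid, so the observation shape never changes even though the underlying state dimension does. The `render_observation` helper and grid encoding below are hypothetical illustrations, not taken from the thesis.

```python
import numpy as np

def render_observation(block_positions, grid_size=8):
    """Render a variable number of blocks onto a fixed-size occupancy grid.

    The state (a list of (x, y) block positions) can change dimension
    between trials, but the observation shape is always
    (grid_size, grid_size), so one policy can consume all of them.
    """
    obs = np.zeros((grid_size, grid_size), dtype=np.float32)
    for x, y in block_positions:
        obs[y, x] = 1.0  # mark the cell occupied by this block
    return obs

# Two states with different numbers of blocks share one observation space.
obs_two = render_observation([(1, 2), (3, 4)])
obs_three = render_observation([(0, 0), (5, 5), (7, 7)])
assert obs_two.shape == obs_three.shape == (8, 8)
```

This is the image-like route the abstract favours: the geometry of the scene is carried by the grid itself, rather than hand-crafted per-problem features.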
dc.rights: Terms and Conditions: Copyright in works deposited in Minerva Access is retained by the copyright owner. The work may not be altered without permission from the copyright owner. Readers may only download, print and save electronic copies of whole works for their own personal non-commercial use. Any use that exceeds these limits requires permission from the copyright owner. Attribution is essential when quoting or paraphrasing from these works.
dc.subject: reinforcement learning
dc.subject: variational autoencoders
dc.subject: latent space regularisation
dc.subject: machine learning
dc.subject: unsupervised learning
dc.subject: computer vision
dc.title: Learning to generalise through features
dc.type: Masters Research thesis
melbourne.affiliation.department: Computing and Information Systems
melbourne.affiliation.faculty: Engineering
melbourne.thesis.supervisorname: Nir Lipovetzky
melbourne.contributor.author: Grebenyuk, Dmitry
melbourne.thesis.supervisorothername: Peter Stuckey
melbourne.thesis.supervisorothername: Miguel Ramirez Javega
melbourne.thesis.supervisorothername: Krista Ehinger
melbourne.tes.fieldofresearch1: 080101 Adaptive Agents and Intelligent Robotics
melbourne.tes.fieldofresearch2: 080104 Computer Vision
melbourne.tes.confirmed: true
melbourne.accessrights: Open Access

