Computing relationships and relatedness between contextually diverse entities
Affiliation: Science - Information Systems
Document Type: PhD thesis
Citations: Grieser, K. (2011). Computing relationships and relatedness between contextually diverse entities. PhD thesis, Science - Information Systems, The University of Melbourne.
Access Status: Open Access
© 2011 Dr. Karl Grieser
When presented with a pair of entities such as a ball and a bat, a person may make the connection that both of these entities are involved in sport (e.g., baseball or cricket, depending on the individual's background), that the composition of the two entities is similar (e.g., a wooden ball and a wooden stick), or, if the person is especially creative, that the pair evokes a fancy dress ball to which someone has come dressed as a bat. All of these connections are equally valid, but depending on the context the person is familiar with (e.g., sport, wooden objects, fancy dress), a particular connection may be more apparent to that person. From a computational perspective, identifying these relationships and calculating the level of relatedness of entity pairs requires consideration of all ways in which the entities are able to interact with one another. Existing approaches to identifying the relatedness of entities and the semantic relationships that exist between them fail to take into account the multiple diverse ways in which these entities may interact, and hence do not explore all potential ways in which entities may be related. In this thesis, I use the collaborative encyclopedia Wikipedia as the basis for the formulation of a measure of semantic relatedness that takes into account the contextual diversity of entities (called the Related Article Conceptual Overlap, or RACO, method), and describe several methods of relationship extraction that utilise the taxonomic structure of Wikipedia to identify pieces of text that describe relations between contextually diverse entities. I also describe the construction of a dataset of museum exhibit relatedness judgements used to evaluate the performance of RACO.
I demonstrate that RACO outperforms state-of-the-art measures of semantic relatedness over a collection of contextually diverse entities (museum exhibits), and that the taxonomic structure of Wikipedia provides a basis for identifying valid relationships between contextually diverse entities. As this work is presented in regard to the domain of Cultural Heritage and using Wikipedia as a basis for representation, I additionally describe the process for adapting the principle of conceptual overlap for calculating semantic relatedness and the relationship extraction methods based on taxonomic links to alternate contextually diverse domains, and for use with other representational resources.
Keywords: Wikipedia; information extraction; cultural heritage; semantic relatedness; natural language processing