Computing and Information Systems - Theses

  • Item
    Interest-based negotiation in multi-agent systems
    Rahwan, Iyad (2004)
    Software systems involving autonomous interacting software entities (or agents) present new challenges in computer science and software engineering. A particularly challenging problem is the engineering of various forms of interaction among agents. Interaction may be aimed at enabling agents to coordinate their activities, cooperate to reach common objectives, or exchange resources to better achieve their individual objectives. This thesis is concerned with negotiation: a process through which multiple self-interested agents can reach agreement over the exchange of scarce resources. In particular, I focus on settings where agents have limited or uncertain information, precluding them from making optimal individual decisions. I demonstrate that this form of bounded rationality may lead agents to sub-optimal negotiation agreements. I argue that rational dialogue based on the exchange of arguments can enable agents to overcome this problem. Since agents make decisions based on particular underlying reasons, namely their interests, beliefs and planning knowledge, rational dialogue over these reasons can enable agents to refine their individual decisions and consequently reach better agreements. I refer to this form of interaction as “interest-based negotiation.” (For complete abstract open document)
  • Item
    Network intrusion detection techniques for single source and coordinated scans
    ZHANG, DANA (2005-10)
    Most malicious network attacks are preceded by a systematic scan of the target network. Scans attempt to gather intelligence on the internal structure of a network with the aim of finding weaknesses to exploit. Various algorithms have been developed to identify these types of intrusions; however, there is no heuristic to confirm the accuracy of their results. Current algorithms deal only with attackers scanning from single sources, with no consideration for attackers that may be working from multiple locations. This thesis addresses the need for a conclusive evaluation technique and the need to effectively detect coordinated scans. Two innovative algorithms have been developed. The first is an improved comparison technique for current single scan detection algorithms that can accurately measure the false positive rate and precision of identified scanners. The second is a coordinated scan detection algorithm that is capable of correctly identifying sets of sources that are working in collusion to explore the topology of a network.
  • Item
    Statistical interpretation of compound nouns
    NICHOLSON, JEREMY (2005-10)
    We present a method for detecting compound nominalisations in open data, and deriving an interpretation for them. Discovering the semantic relationship between the modifier and head noun in a compound nominalisation is first construed as a two-way disambiguation task between an underlying subject or object semantic relation between a head noun and its modifier, and second as a three-way task between subject, direct object, and prepositional object relations. The detection method achieves about 89% recall on a data set annotated by way of Celex and Nomlex, and about 70% recall on a randomly sampled data set based on the British National Corpus, with 77% recall on detecting a more general set of compound nouns from this data. The interpretation method achieved about 72% accuracy in the two-way task, and 57% in the three-way task, using a statistical measure based on z-scores - the confidence interval - in selecting one of the relations. Our proposed method has the advantage over previous research that it can act over open data to detect and interpret compound nominalisations, as opposed to only operating in a limited domain or requiring hand-selection or hand-tuning.
  • Item
    The effects of part-of-speech tagsets on tagger performance
    MACKINLAY, ANDREW (2005-11)
    In natural language processing (NLP), a crucial subsystem in a wide range of applications is a part-of-speech (POS) tagger, which labels (or classifies) unannotated words of natural language with part-of-speech labels corresponding to categories such as noun, verb or adjective. Mainstream approaches are generally corpus-based: a POS tagger learns from a corpus of pre-annotated data how to correctly tag unlabelled data. Previous work has tended to focus on applying new algorithms to the problem of adding hand-tuned features to assist in classifying difficult instances. Using these methods, a number of distinct approaches have plateaued to similar accuracy figures of 96.9 ± 0.3%. Here we approach the problem of improving accuracy in POS tagging from a unique angle. We use a representative set of tagging algorithms and attempt to optimise performance by modifying the inventory of tags (or tagset) used in the pre-labelled training data. We modify tagsets by systematically mapping the tags of the training data to a new tagset. Our aim is to produce a tagset which is more conducive to automatic POS tagging by more accurately reflecting the underlying linguistic distinctions which should be encoded in a tagset. The mappings are reversible, enabling the original tags to be trivially recovered, which facilitates comparison with previous work and between competing mappings. We explore two different broad sources of these mappings. Our primary focus is on using linguistic insight to determine potentially useful distinctions which we can then evaluate empirically. We also evaluate an alternative data-driven approach for extracting patterns of regularity in a tagged corpus. Our experiments indicate the approach is not as successful as we had predicted. Our most successful mappings were data-driven, which give improvements of approximately 0.01% in token level accuracy over the development set using specific taggers, with increments of 0.03% over the test set. We show a wide range of linguistically motivated modifications which cause a performance decrement, while the best linguistic approaches maintain performance approximately over the development data and produce up to 0.05% improvement over the development data. Our results lead us to believe that this line of research is unlikely to provide significant gains over conventional approaches to POS tagging.
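    The idea of a reversible tagset mapping described in this abstract can be illustrated with a minimal sketch. The refinement rule, the `+` separator and the function names below are assumptions made for illustration only; they are not the mappings evaluated in the thesis.

```python
# Hypothetical illustration of a reversible tagset mapping.
# The rule (splitting the preposition tag IN for the word "that") and the
# "+" separator are assumptions, not taken from the thesis.

def refine_tag(word: str, tag: str) -> str:
    """Map an original tag to a finer-grained tag in a new tagset."""
    if tag == "IN" and word.lower() == "that":
        return "IN+that"
    return tag


def restore_tag(new_tag: str) -> str:
    """Recover the original tag by dropping any refinement suffix."""
    return new_tag.split("+", 1)[0]


tagged = [
    ("I", "PRP"), ("know", "VBP"), ("that", "IN"),
    ("he", "PRP"), ("is", "VBZ"), ("in", "IN"), ("Rome", "NNP"),
]

# Apply the mapping to the training data, then undo it.
mapped = [(word, refine_tag(word, tag)) for word, tag in tagged]
restored = [(word, restore_tag(tag)) for word, tag in mapped]
```

    Because every refined tag keeps its original tag as a prefix, output from a tagger trained on the new tagset can be mapped back and scored against the original annotation.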
  • Item
    An investigation of interactivity and flow: student behaviour during online instruction
    PEARCE, JON MALCOLM (2004-12)
    This thesis combines ideas from human-computer interaction, education and psychology to explore the interactions of students in an online learning environment. The motivation for the work was to understand better how to engage students in a highly enjoyable experience of online learning. The thesis describes three experiments. The first experiment was an exploratory study investigating the influence of learner interactions in an online physics learning task. Students worked through an online learning experience that offered high and low levels of interactivity. The aim was to explore their interactions and choices in an environment in which they could elect to move from the highly interactive mode to the less interactive mode at any time. Web logs were used to track their interactions and question probes gathered data on their emotions, learning goals and strategies. The analysis revealed a number of different patterns of interaction. Statistical analysis showed that most, but not all, preferred to follow an interactive path through the material. Students who used the interactive materials showed improved learning gains in transfer-style questions compared to those in the less interactive mode. Several issues were identified as important to consider in a follow-up study: emotions, affect, challenge, and the degree of control that the learner perceives.
  • Item
    Browsing and searching compressed documents
    Wan, Raymond (2003-12)
    Compression and information retrieval are two areas of document management that exist separately due to the conflicting methods of achieving their goals. This research examines a mechanism which provides lossless compression and phrase-based browsing and searching of large document collections. The framework for the investigation is an existing off-line dictionary-based compression algorithm. (For complete abstract open document)
  • Item
    Efficient mining of interesting emerging patterns and their effective use in classification
    FAN, HONGJIAN (2004-07)
    Knowledge Discovery in Databases (KDD), or Data Mining, is used to discover interesting or useful patterns and relationships in data, with an emphasis on large volumes of observational data. Among the many types of information (knowledge) that can be discovered in data, patterns that are expressed in terms of features are popular because they can be understood and used directly by people. The recently proposed Emerging Pattern (EP) is one such type of knowledge pattern. Emerging Patterns are sets of items (conjunctions of attribute values) whose frequency changes significantly from one dataset to another. They are useful as a means of discovering distinctions inherently present amongst a collection of datasets and have been shown to be a powerful method for constructing accurate classifiers. (For complete abstract open document)
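    The notion of a frequency that "changes significantly from one dataset to another" can be made concrete with a small sketch. It assumes the commonly used growth-rate formulation (the ratio of an itemset's support in a target dataset to its support in a background dataset); the toy data, the function names and the threshold value are illustrative assumptions, and the mining algorithms of the thesis are not reproduced here.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], dataset: List[Transaction]) -> float:
    """Fraction of transactions in the dataset that contain every item."""
    if not dataset:
        return 0.0
    hits = sum(1 for transaction in dataset if itemset <= transaction)
    return hits / len(dataset)


def growth_rate(itemset: FrozenSet[str],
                background: List[Transaction],
                target: List[Transaction]) -> float:
    """Ratio of support in the target dataset to support in the background."""
    s_background = support(itemset, background)
    s_target = support(itemset, target)
    if s_background == 0.0:
        # Absent from the background: infinite growth if it occurs in the target.
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_background


# Toy datasets (assumed): attribute=value items for two classes.
edible = [
    frozenset({"odor=none", "cap=flat"}),
    frozenset({"odor=none", "cap=round"}),
    frozenset({"odor=foul", "cap=flat"}),
    frozenset({"odor=none", "cap=flat"}),
]
poisonous = [
    frozenset({"odor=foul", "cap=flat"}),
    frozenset({"odor=foul", "cap=round"}),
    frozenset({"odor=foul", "cap=flat"}),
    frozenset({"odor=none", "cap=round"}),
]

RHO = 2.5  # assumed growth-rate threshold
pattern = frozenset({"odor=foul"})
rate = growth_rate(pattern, edible, poisonous)  # 0.75 / 0.25
is_emerging = rate >= RHO
```

    Under this formulation, an itemset counts as emerging from the background to the target dataset when its growth rate meets the chosen threshold, which is why such patterns are natural features for distinguishing classes.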
  • Item
    Natural language as an agent communication language
    MARCH, OLIVIA (2004)
    Intelligent agents should be able to communicate with each other using an extensible, expressive language. Agents should have the ability to work together in a heterogeneous environment to solve complex goals, while acting on their own initiative and maintaining autonomy. Current agent communication languages are not expressive enough to facilitate coordination of agents in a heterogeneous system. Natural languages, such as English, have evolved to become expressive enough to advance the human race to be the dominant species. They have been refined over millennia and are proven extensible. This research demonstrates the feasibility of using natural language as an agent communication language for intelligent agents solving a collaborative task.
  • Item
    Combining part of speech induction and morphological induction
    Wilson, Charlotte (2004-11)
    Linguistic information is useful in natural language processing, information retrieval and a multitude of sub-tasks involving language analysis. Two types of linguistic information in all languages are part of speech and morphology. Part of speech information reflects syntactic structure and can assist in tasks such as speech recognition, machine translation and word sense disambiguation. Morphological information describes the structure of words and has application in automated spelling correction, natural language generation and information retrieval for morphologically complex languages. Machine learning methods in natural language processing acquire linguistic information from corpora of natural language text. While supervised learning algorithms are trained on texts that have been annotated with linguistic features, induction algorithms learn linguistic information from unannotated corpora. Such algorithms avoid any requirement for linguistically annotated training data - a resource that is highly time-intensive to produce. However, in learning from unannotated corpora, only limited sources of information are available. In practice, part of speech induction methods usually learn from distributional evidence about the contexts in which words occur. In contrast, morphological induction methods tend to be based on the orthographic structure of the words in the corpus. However, a word’s morphological form and syntactic function often correlate: a word’s morphology may indicate its syntactic function and vice versa. Thus, both distributional and orthographic evidence may be useful for both tasks. This thesis investigates the extent to which the information induced by one learner can be used to bootstrap the other: specifically, whether the incorporation of explicit annotations from one learner can improve the performance of the other.
  • Item
    XSLT as a linguistic query language
    Taylor, Claire Louise (2003-11)
    With the growing use of linguistic data, suitable storage techniques and query languages need to be developed. A traditional relational database management system is inappropriate for linguistic data because such data typically has some sort of structure associated with it, which can represent hierarchical or sequential relationships. Although there are many different forms of linguistic annotation, there are few query languages that succinctly service the data by providing the necessary features such as data accessibility, transformation and integration. The current challenge facing the creators of linguistic corpora and the corresponding query languages is to find a query language that is expressive enough to enable the features mentioned above while still providing an interface to the data that allows the corpus to be queried in terms of the user’s conceptual model. Previous work in this area has suggested that the hierarchical nature of XML would be well suited to linguistic data and that an existing XML query language could be applied to linguistic queries. This thesis represented two linguistic corpora, TIMIT and the Penn Treebank, in XML. Two possible XML representations for TIMIT were explored to illustrate that a permutation in the structure of the data has a significant effect on the ease of writing queries for it. Data structures that were closely related to the user’s conceptual model of the data for a given query were easier to write queries for. It was concluded that the final XML representation for a given corpus would depend on the possible uses of the data.
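    The observation that the structure of the XML affects how easy a query is to write can be sketched as follows. The sketch uses the limited XPath support in Python's standard `xml.etree.ElementTree` module rather than XSLT, and both XML layouts, the element and attribute names, and the time offsets are hypothetical; they are not the representations developed in the thesis.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout 1: phones are nested inside the word they belong to.
NESTED = """
<utterance>
  <word text="she"><phone label="sh"/><phone label="iy"/></word>
  <word text="had"><phone label="hv"/><phone label="ae"/><phone label="dcl"/></word>
</utterance>
"""

# Hypothetical layout 2: words and phones are parallel tiers linked by time.
FLAT = """
<utterance>
  <words>
    <word text="she" start="0" end="2"/>
    <word text="had" start="2" end="5"/>
  </words>
  <phones>
    <phone label="sh" start="0" end="1"/>
    <phone label="iy" start="1" end="2"/>
    <phone label="hv" start="2" end="3"/>
    <phone label="ae" start="3" end="4"/>
    <phone label="dcl" start="4" end="5"/>
  </phones>
</utterance>
"""


def phones_nested(root: ET.Element, word: str) -> list:
    """With nesting, one path expression selects the phones of a word."""
    path = f".//word[@text='{word}']/phone"
    return [phone.get("label") for phone in root.findall(path)]


def phones_flat(root: ET.Element, word: str) -> list:
    """With parallel tiers, the query must compare time spans explicitly."""
    labels = []
    for w in root.findall(f"./words/word[@text='{word}']"):
        w_start, w_end = int(w.get("start")), int(w.get("end"))
        for phone in root.findall("./phones/phone"):
            p_start, p_end = int(phone.get("start")), int(phone.get("end"))
            if w_start <= p_start and p_end <= w_end:
                labels.append(phone.get("label"))
    return labels


nested_result = phones_nested(ET.fromstring(NESTED.strip()), "she")
flat_result = phones_flat(ET.fromstring(FLAT.strip()), "she")
```

    Both functions answer the same question ("which phones make up this word?"), but the nested layout matches that question directly, while the tiered layout needs an explicit join on time offsets. A query about phone sequences that cross word boundaries would favour the tiered layout instead.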