XSLT as a linguistic query language
Author: Taylor, Claire Louise
Affiliation: Engineering: Department of Computer Science and Software Engineering
Document Type: Honours thesis
Citation: Taylor, C. L. (2003). XSLT as a linguistic query language. Honours thesis, Department of Computer Science and Software Engineering, The University of Melbourne.
Access Status: Open Access
Deposited with permission of the author. © 2003 Claire Louise Taylor
With the growing use of linguistic data, suitable storage techniques and query languages need to be developed. A traditional relational database management system is poorly suited to linguistic data, because such data typically carries structure that represents hierarchical or sequential relationships. Although there are many different forms of linguistic annotation, few query languages serve the data succinctly by providing the necessary features: data accessibility, transformation and integration. The current challenge facing the creators of linguistic corpora and their query languages is to find a language that is expressive enough to support these features while still providing an interface that allows the corpus to be queried in terms of the user’s conceptual model. Previous work in this area has suggested that the hierarchical nature of XML is well suited to linguistic data and that an existing XML query language could be applied to linguistic queries. This thesis represented two linguistic corpora, TIMIT and the Penn Treebank, in XML. Two possible XML representations of TIMIT were explored to illustrate that a rearrangement of the structure of the data has a significant effect on the ease of writing queries for it. Queries were easier to write when the data structure closely matched the user’s conceptual model of the data for that query. It was concluded that the final XML representation for a given corpus would depend on the possible uses of the data.
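The point about structure and query difficulty can be illustrated with a minimal sketch. It is not taken from the thesis: the two XML layouts, the element and attribute names, the timing values and the helper functions `phones_nested` and `phones_flat` are all hypothetical, and Python's standard `xml.etree.ElementTree` module stands in for XSLT. The query "which phones make up a given word?" is a single path expression when phones are nested inside words, but needs an explicit time-containment join when words and phones sit in parallel tiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested layout: each phone is a child of the word it belongs to.
# All names and timing values are invented for illustration.
NESTED = """
<utterance>
  <word text="she" start="0" end="300">
    <phone label="sh" start="0" end="180"/>
    <phone label="iy" start="180" end="300"/>
  </word>
  <word text="had" start="300" end="700">
    <phone label="hv" start="300" end="420"/>
    <phone label="ae" start="420" end="600"/>
    <phone label="d" start="600" end="700"/>
  </word>
</utterance>
"""

# Hypothetical flat layout: words and phones live in parallel tiers and are
# related only through their start and end times.
FLAT = """
<utterance>
  <words>
    <word text="she" start="0" end="300"/>
    <word text="had" start="300" end="700"/>
  </words>
  <phones>
    <phone label="sh" start="0" end="180"/>
    <phone label="iy" start="180" end="300"/>
    <phone label="hv" start="300" end="420"/>
    <phone label="ae" start="420" end="600"/>
    <phone label="d" start="600" end="700"/>
  </phones>
</utterance>
"""


def phones_nested(xml_text, word):
    """Phones of `word` in the nested layout: one path expression."""
    root = ET.fromstring(xml_text)
    return [p.get("label") for p in root.findall(f"./word[@text='{word}']/phone")]


def phones_flat(xml_text, word):
    """Phones of `word` in the flat layout: join the two tiers on time spans."""
    root = ET.fromstring(xml_text)
    labels = []
    for w in root.findall(f"./words/word[@text='{word}']"):
        w_start, w_end = int(w.get("start")), int(w.get("end"))
        for p in root.findall("./phones/phone"):
            # A phone belongs to the word if its span lies inside the word's span.
            if w_start <= int(p.get("start")) and int(p.get("end")) <= w_end:
                labels.append(p.get("label"))
    return labels


if __name__ == "__main__":
    print(phones_nested(NESTED, "had"))
    print(phones_flat(FLAT, "had"))
```

Both functions return the same labels for this toy data, but only the nested layout matches the "a word contains phones" view of the query. A query phrased in terms of time alone, such as finding everything that overlaps a given interval, might be no harder against the flat layout.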
Keywords: natural language generation; automatic speech recognition; TIMIT; Penn Treebank; linguistic query language