Fast query for large treebanks
Author
GHODKE, SUMUKH; BIRD, STEVEN
Date
2010
Source Title
Human Language Technologies: Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics
Publisher
Association for Computational Linguistics
Affiliation
Engineering - Computer Science and Software Engineering
Document Type
Conference Paper
Citations
Ghodke, S., & Bird, S. (2010). Fast query for large treebanks. In Human Language Technologies: Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics, Los Angeles, USA.
Access Status
Open Access
Description
This is a pre-print of a paper from Human Language Technologies: Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics 2010 published by Association for Computational Linguistics. http://naaclhlt2010.isi.edu/
Abstract
A variety of query systems have been developed for interrogating parsed corpora, or treebanks. With the arrival of efficient, wide-coverage parsers, it is feasible to create very large databases of trees. However, existing approaches that use in-memory search, or relational or XML database technologies, do not scale up. We describe a method for storage, indexing, and query of treebanks that uses an information retrieval engine. Several experiments with a large treebank demonstrate excellent scaling characteristics for a wide range of query types. This work facilitates the curation of much larger treebanks, and enables them to be used effectively in a variety of scientific and engineering tasks.
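
The abstract does not give the details of the indexing scheme, so the following is only a toy illustration of the general idea of answering tree queries from an inverted index rather than by scanning every tree in memory. It is not the method of the paper. The tree encoding (nested tuples), the posting layout (tree id, leaf span, depth), and the function names `index_tree`, `build_index`, and `dominates` are all assumptions made for this sketch.

```python
from collections import defaultdict

# Assumed encoding: a tree is a nested tuple (label, child, child, ...);
# a leaf (word) is a plain string.

def index_tree(tree_id, tree, index, counter=None, depth=0):
    """Add one posting (tree_id, left, right, depth) per node.

    left/right is the half-open span of leaf positions the node covers.
    """
    if counter is None:
        counter = [0]  # mutable leaf counter shared across the recursion
    if isinstance(tree, str):
        left = counter[0]
        counter[0] += 1
        index[tree].append((tree_id, left, counter[0], depth))
        return
    label, *children = tree
    left = counter[0]
    for child in children:
        index_tree(tree_id, child, index, counter, depth + 1)
    index[label].append((tree_id, left, counter[0], depth))

def build_index(trees):
    """Build an inverted index: label -> list of postings."""
    index = defaultdict(list)
    for tree_id, tree in enumerate(trees):
        index_tree(tree_id, tree, index)
    return index

def dominates(index, ancestor, descendant):
    """Ids of trees in which a node labelled `ancestor` properly dominates
    a node labelled `descendant`, using only the two posting lists."""
    by_tree = defaultdict(list)
    for tid, left, right, depth in index.get(descendant, []):
        by_tree[tid].append((left, right, depth))
    hits = set()
    for tid, left, right, depth in index.get(ancestor, []):
        for d_left, d_right, d_depth in by_tree.get(tid, []):
            # Span containment plus greater depth means proper dominance.
            if left <= d_left and d_right <= right and depth < d_depth:
                hits.add(tid)
                break
    return sorted(hits)

if __name__ == "__main__":
    trees = [
        ("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats"))),
        ("S", ("NP", "birds"), ("VP", ("V", "sing"))),
    ]
    idx = build_index(trees)
    print(dominates(idx, "VP", "NP"))
```

In this sketch a query touches only the posting lists of the two labels involved, which is the property that lets index-based lookup avoid loading and walking every tree. A production information retrieval engine would store such postings on disk in compressed form.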