Unified Stanford Dependencies: rich linguistic annotation for Modern Hebrew texts
What
This page provides rich linguistic annotation for Modern Hebrew, extending the Stanford Dependencies (SD) scheme of de Marneffe et al. 2006. We provide synchronized analyses of syntactic categories, functional features and morphological information for raw Hebrew texts.
How
Both resources are annotated with grammatical function labels from the U-SD scheme defined in Tsarfaty 2013. The treebanks come in two flavors:
- a constituency treebank annotated with SD types as labeled edges
- a dependency treebank annotated with SD types as labeled arcs
- a functional treebank annotated with SD types as labeled nodes
Data
The treebanks are derived from the Hebrew treebank developed at the Knowledge Center for Processing Hebrew at the Technion.
- the constituency version extends the relational-realizational resources of Tsarfaty 2010.
- the dependency version extends the unlabeled-dependencies resources of Goldberg 2011.
Download
The treebanks are in the data directory.
- the constituency treebank is provided in bracketed (Lisp-like) format, one tree per line, where every node-label is formatted as follows:
CATEGORY-[SD]-[[MORPH_FEATS]]
- the dependency treebank is provided in CoNLL-X format, one word per line, where every word is annotated as follows:
INDEX WORD LEMMA CPOS POS MORPH_FEATS PARENT SD _ _
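The two formats above can be read with a few lines of Python. The sketch below is only an illustration, not part of the distributed resources: the names `Token`, `read_conllx` and `parse_bracketed` are hypothetical, and the sample rows and labels in the usage note are made-up placeholders. It assumes that columns are separated by whitespace, that sentences in the CoNLL-X file are separated by blank lines, that no field or label contains whitespace or parentheses, and that every opening bracket is followed by a node label. Node labels are kept as opaque strings and are not decomposed further.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    """One word of the dependency treebank (first eight columns)."""
    index: int
    word: str
    lemma: str
    cpos: str
    pos: str
    morph_feats: str
    parent: int
    sd: str


def read_conllx(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one sentence (a list of Token) per blank-line-separated block."""
    sentence: List[Token] = []
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        # The last two columns are unused ("_ _") and are ignored here.
        index, word, lemma, cpos, pos, feats, parent, sd = fields[:8]
        sentence.append(
            Token(int(index), word, lemma, cpos, pos, feats, int(parent), sd)
        )
    if sentence:
        yield sentence


def parse_bracketed(line: str):
    """Parse one bracketed tree into nested (label, children) tuples.

    Leaves are returned as plain strings.
    """
    tokens = line.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            leaf = tokens[pos]
            pos += 1
            return leaf
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(node())
        pos += 1  # skip ")"
        return (label, children)

    return node()
```

For example, `list(read_conllx(open(path, encoding="utf-8")))` would load a whole file as a list of sentences, where `path` stands for whichever file in the data directory holds the dependency treebank. On a toy line such as `(S (NP a) (VP b))`, `parse_bracketed` returns a tuple whose first element is the root label `S`.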
License
The treebanks are distributed under the GPL v3 license or the license currently published on MILA's website; the more restrictive license applies. This page is provided as a free service with no particular guarantees.
Citing
If you find these resources useful for academic or other publications, please cite:
Reut Tsarfaty, A Unified Morpho-Syntactic Scheme of Stanford Dependencies, In: Proceedings of ACL, 2013
Contact
- mail me any questions or comments you might have