Difference between revisions of "Splitting and joining words"

From Icelandic Parsed Historical Corpus (IcePaHC)

Revision as of 09:39, 6 July 2010

Items that are split

Definite article (determiner):

(NP-VOC (N-N gæska$-gæska)
        (D-N $n-hinn)
        (NP-POS (PRO-N mín-minn)))

Suffixed þú 'you' on finite verbs. The suffixed forms -du, -ðu and -tu are always tagged NP-SBJ:

	      (ADVP-RSP (ADV þá-þá))
	      (VBPI veis$-vita)
	      (NP-SBJ (PRO-N $tu-þú))
	      (NP-OB1 (PRO-A það-það)
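
As an illustration only, the split forms above can be rejoined mechanically. The sketch below assumes that each leaf has the shape form-lemma (split at the last hyphen) and that `$` marks the edge where a word was split; both assumptions are read off the examples here, and the function name `join_split_tokens` is hypothetical, not part of any IcePaHC tool.

```python
def join_split_tokens(tokens):
    """Rejoin surface forms that were split with the '$' marker.

    Each token is assumed to look like 'form-lemma'.
    """
    words = []
    for token in tokens:
        # Everything before the last hyphen is taken as the surface form.
        form = token.rsplit("-", 1)[0]
        if form.startswith("$") and words and words[-1].endswith("$"):
            # Glue this part onto the previous one, dropping both '$' markers.
            words[-1] = words[-1][:-1] + form[1:]
        else:
            words.append(form)
    return words


print(join_split_tokens(["þá-þá", "veis$-vita", "$tu-þú", "það-það"]))
print(join_split_tokens(["gæska$-gæska", "$n-hinn", "mín-minn"]))
```

Under these assumptions, veis$ and $tu are rejoined as veistu, and gæska$ and $n as gæskan, while unsplit words pass through unchanged.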

Items treated as unitary

Items of this kind may be written as one word or as several. Written as one word, the item gets a simple POS tag; written apart, each part gets its own numbered POS tag, and the parts together project a single tag:

(NP-OB1 (NS-D (N21-A kapal-kapall) (NS22-D hestum-hestur))))))

The first digit (2) gives the number of parts in the item; the second (1 or 2) gives each part's position in the sequence.
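
The numbering scheme can be decoded with a small pattern match. This is a minimal sketch, assuming a numbered tag consists of an uppercase base tag, exactly two digits (number of parts, then position) and an optional case suffix after a hyphen; the pattern and the name `parse_numbered_tag` are illustrative assumptions and would not cover items with ten or more parts.

```python
import re

# Assumed shape: BASE + <number of parts> + <position> + optional "-CASE",
# e.g. "N21-A" or "NS22-D". Plain tags such as "NS-D" do not match.
NUMBERED_TAG = re.compile(r"^([A-Z]+)(\d)(\d)(?:-([A-Z]+))?$")


def parse_numbered_tag(tag):
    """Return the parts of a numbered POS tag, or None for a plain tag."""
    match = NUMBERED_TAG.match(tag)
    if match is None:
        return None
    base, total, position, case = match.groups()
    return {
        "base": base,
        "total": int(total),
        "position": int(position),
        "case": case,
    }


for tag in ["N21-A", "NS22-D", "NS-D"]:
    print(tag, parse_numbered_tag(tag))
```

For the example above, N21-A is read as part 1 of 2 with case A, NS22-D as part 2 of 2 with case D, and the projected tag NS-D is not treated as numbered.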

Note that in the case of nouns, adjectives and pronouns, different parts of these items usually don't have the same case, as in the example above; the first part usually gets accusative or genitive since it modifies the last part which is assigned case by