Difference between revisions of "Nonstructural labels"

From Icelandic Parsed Historical Corpus (IcePaHC)
Fragments are grammatical utterances that consist of at least two constituents but do not contain enough material to construct an IP.
See [http://www.ling.upenn.edu/hist-corpora/annotation/disfluencies.htm#rep PPCHE].

Latest revision as of 15:24, 12 August 2014


Click here for PPCME2, PPCEME documentation.

When unsure of a parse, add a comment (COM stands for comment):

(CODE ({COM:unsure_of_parse}))
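
A minimal sketch of how such comments could be collected from annotated text for later review. The function name `find_comments` is hypothetical and not part of any corpus tool; the sketch only assumes that comments appear as `{COM:...}` tokens, as in the line above.

```python
import re

# Matches comment tokens of the form {COM:some_text}, as used inside CODE nodes.
COMMENT_RE = re.compile(r"\{COM:([^}]*)\}")

def find_comments(tree_text):
    """Return the text of every {COM:...} token found in tree_text."""
    return COMMENT_RE.findall(tree_text)

sample = "(CODE ({COM:unsure_of_parse}))"
print(find_comments(sample))  # ['unsure_of_parse']
```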

Foreign language passages


Foreign-language passages of more than one word are labeled with the language name (e.g. LATIN). If the passage forms its own clause, there are two options.

1) If the passage is direct speech or, e.g., a prayer, it is parsed as QTP, which immediately dominates LATIN (which in turn immediately dominates the FWs).

2) If the passage is not direct speech, it is parsed as LATIN at the clause level (immediately dominating the FWs).

Rule of thumb: if a word is not found in the Icelandic Dictionary, tag it as FW (foreign word).

( (LATIN (FW Assumptio-assumptio) (FW sancte-sancte) (FW Marie-marie)))
	  (IP-MAT=1 (CONJ en-en)
		    (PP (P í-í)
			(NP (OTHER-D öðru-annar) (N-D lífi-líf)))
		    (VB veita-veita)
		    (NP-OB2 (PRO-D oss-ég))
		    (NP-OB1 (ADJR-D meiri-mikill)
			    (N-D dýrð-dýrð)
			    (PP (P en-en)
				(CP-CMP (WNP-4 0)
					(C 0)
					(IP-SUB (NP-OB1 *T*-4)
						(NP-SBJ (PRO-N vér-ég))
						(ADVP-TMP (ADV nú-nú))
						(VB biðja-biðja))))))
	  (, .-.)
	  (NP-PRN (D-N SÁ-SÁ)
		  (D-N inn-inn)
		  (ADJ-N sami-samur)
		  (NPR-N Jesús-jesús)
		  (NPR-N Kristur-kristur)
		  (, ,-,)
		  (CP-REL (WNP-1 0)
			  (C ER-ER)
			  (IP-SUB (NP-SBJ *T*-1)
				  (PP (P með-með)
				      (NP (N-D feður-faðir)
					  (CONJP (CONJ og-og)
						 (NP (ADJ-D helgum-helga) (N-D anda-andi)))))
				  (VBPI (VBPI lifir-lifa) (CONJ og-og) (VBPI ríkir-ríkja)))))
	  (LATIN (FW per-per) (FW omnia-omnia) (FW secula-secula) (FW seculorum-seculorum))
	  (. .-.)))
	  (ADVP-TMP-RSP (ADV þá-þá))
	  (VBPI fer-fara)
	  (NP-SBJ (PRO-N hann-hann))
	  (PP (P til-til)
	      (LATIN (FW templum-templur) (FW Domini-dominur)))
	  (IP-INF-PRP (TO að-að)
		      (VB bera-bera)
		      (ADVP-DIR (ADV þar-þar))
		      (NP-OB1 (N-A reykelsi-reykelsi)))
	  (. .-.)))
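
The LATIN/FW convention shown in the trees above can be checked mechanically. The following is a rough sketch, not an official IcePaHC tool: it assumes leaves have the form word-lemma and that foreign words sit directly under the language node. The helper names (`tokenize`, `parse`, `foreign_passages`) are made up for illustration.

```python
def tokenize(text):
    """Split bracketed tree text into parentheses and plain tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens, pos=0):
    """Parse one bracketed node starting at tokens[pos] == '('.

    Returns (node, next_pos). A node is a list whose first item is
    normally the label; leaves are plain strings.
    """
    assert tokens[pos] == "("
    pos += 1
    node = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
            node.append(child)
        else:
            node.append(tokens[pos])
            pos += 1
    return node, pos + 1

def label_of(node):
    """Return the node label, or None for an unlabeled wrapper node."""
    return node[0] if node and isinstance(node[0], str) else None

def foreign_passages(node, language="LATIN"):
    """Collect the word forms of the FW leaves directly under each language node."""
    passages = []
    if label_of(node) == language:
        words = [child[1].rsplit("-", 1)[0]  # drop the lemma after the last hyphen
                 for child in node[1:]
                 if isinstance(child, list) and label_of(child) == "FW"]
        passages.append(words)
    for child in node:
        if isinstance(child, list):
            passages.extend(foreign_passages(child, language))
    return passages

tree, _ = parse(tokenize(
    "(PP (P til-til) (LATIN (FW templum-templur) (FW Domini-dominur)))"))
print(foreign_passages(tree))  # [['templum', 'Domini']]
```

The parser keeps unlabeled wrapper nodes such as `( (LATIN ...))` as lists without a label, so whole corpus trees can be passed in as well as fragments.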


Click here for PPCME2, PPCEME documentation.

Used, e.g., in chapter headings that are part of the author's text (and not the editor's).


Fragments are grammatical utterances that consist of at least two constituents but do not contain enough material to construct an IP.




Quotation phrase


QTP and FRAG do not immediately dominate arguments of verbs. They can, however, immediately dominate NP-VOC, NP-ADV ...

When "yes" or "no" is an argument of a verb, it is parsed as QTP (and tagged as INTJ).
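
The restriction on QTP and FRAG could be checked with a small script. This is only a sketch under stated assumptions: the set of labels treated as verb arguments (`NP-SBJ`, `NP-OB1`, `NP-OB2`) is an assumption, the function name `find_violations` is hypothetical, and the sample trees are invented for illustration rather than taken from the corpus.

```python
# Labels treated as arguments of verbs. This list is an assumption and
# may need extending for real corpus data.
ARGUMENT_LABELS = ("NP-SBJ", "NP-OB1", "NP-OB2")
NONSTRUCTURAL = ("QTP", "FRAG")

def base_label(label):
    """Strip dash tags and '=' indices, e.g. 'QTP-1' -> 'QTP'."""
    return label.split("-")[0].split("=")[0]

def find_violations(tree_text):
    """Return (parent, child) label pairs where QTP or FRAG immediately
    dominates a node whose label starts with an argument label."""
    tokens = tree_text.replace("(", " ( ").replace(")", " ) ").split()
    stack = []        # labels of the currently open nodes
    violations = []
    for i, tok in enumerate(tokens):
        if tok == "(":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            # An unlabeled wrapper node gets an empty label.
            label = nxt if nxt not in ("(", ")") else ""
            if (stack and base_label(stack[-1]) in NONSTRUCTURAL
                    and label.startswith(ARGUMENT_LABELS)):
                violations.append((stack[-1], label))
            stack.append(label)
        elif tok == ")" and stack:
            stack.pop()
    return violations

# Invented examples: the first follows the rule, the second breaks it.
ok = "(QTP (NP-VOC (NPR-N Jesús-jesús)))"
bad = "(FRAG (NP-SBJ (PRO-N hann-hann)) (ADVP (ADV þar-þar)))"
print(find_violations(ok))   # []
print(find_violations(bad))  # [('FRAG', 'NP-SBJ')]
```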

