Treatment of individual words

From Icelandic Parsed Historical Corpus (IcePaHC)
Revision as of 14:19, 30 August 2022 by Einarfs (Talk | contribs) (Þ)


A

A for negation, see AT

AÐEINS, tag as FP=focus particle, cf. only in English

AF HVERJU 'why', parse (WPP (P af) (WNP (WPRO-D hverju)))

AFTAN can be ADV or P, depending on whether it takes a complement or not.

AFTUR, 'again', tag as ADVP, not as ADVP-TMP, cf. PPCME. When AFTUR means 'back', tag as ADVP-DIR. When it is ambiguous, tag as ADVP-DIR.

AFVEGA is tagged ADV.

ALLA REIÐU, parse NP-TMP.

ALLUR, ALLIR, tagged Q

ALLEINASTA, ALLEINA is generally used as a focus particle and accordingly tagged FP, as with English ONLY in focus particle use.

ALLTAF is ADVP-TMP. When written ALLT AF, tag it as ADV21 allt ADV22 af, projecting ADVP-TMP.

ALLT AF LÉTTA, "af létta" is a PP and the lemma of létta is létti.

ALLT Í EINU

ALLT TIL + NP (e.g. "allt til enda veraldar"), the whole thing is a PP immediately dominating Q-N allt:

(PP (Q-N allt-allur)
    (P til-til)
    (NP (ADJS-G yðsta-ytri)
	(NP-POS (N-G jarðar-jörð))
	(N-G enda-endi)))
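
The labeled bracketings on this page can be inspected programmatically. The following is a minimal Python sketch, not part of the corpus tooling: the function names `parse_bracketed` and `daughter_labels` are hypothetical, and the parser assumes one well-balanced tree per string. It reads the ALLT TIL example and lists the nodes that the PP immediately dominates.

```python
def parse_bracketed(text):
    """Parse one labeled bracketing into nested (label, children) tuples.

    A leaf such as (P til-til) becomes ("P", ["til-til"]).
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected an opening bracket")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unbalanced or trailing material after the tree")
    return tree


def daughter_labels(tree):
    """Labels of the nodes that a node immediately dominates."""
    return [child[0] for child in tree[1] if isinstance(child, tuple)]


example = """
(PP (Q-N allt-allur)
    (P til-til)
    (NP (ADJS-G yðsta-ytri)
        (NP-POS (N-G jarðar-jörð))
        (N-G enda-endi)))
"""
tree = parse_bracketed(example)
print(tree[0], daughter_labels(tree))  # PP ['Q-N', 'P', 'NP']
```

A tree excerpt with surplus closing brackets is rejected with a ValueError, which makes the sketch usable as a quick balance check on examples copied out of larger trees.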

ALLS can be an ADV or P. ALLS is always tagged P when it introduces a CP-ADV, in which case it has a meaning akin to "þar sem" or "af því að".

ALLSKYNS

ALÞING(I) is tagged NPR rather than N.

ANGRA is *not* treated as having an accusative subject. So in sentences like "Hún angraði þau" (= "She angered them"), "Hún" is parsed as the NP-SBJ. This analysis is based on the intuitions of modern speakers, but note that it may not be the correct analysis for *all* texts and periods of the language.

ANNAÐHVORT tagged OTHER+WPRO, and should not have case (only) when it functions as a correlative conjunction.

ANNAÐ TVEGGJA, tag as NP-ADV

ANNAR, in most cases is tagged as OTHER; AÐRIR as OTHERS. However, if ANNAR clearly means the ordinal number "second" in context, it is tagged ADJ as all ordinal numbers are.

ANNARS, when it occurs alone, without being assigned the genitive case, parse as NP-ADV.

ANNO Latin "year", used for dates, is tagged FW.

ARNA/ARNI the swearing-expletive-like element, as in "skituna þá arna", is simply tagged N.

AT or A, negation suffix (the lemma is -at): "þá verðura honum gagn" 'then will.be-not him use'

AUK can be P or ADV, depending on whether it takes a complement or not.

Á

Á either RP or P; see RP

ÁÐUR is ADV when it occurs alone, projecting an ADVP-TMP. But when it introduces an adverbial clause alone, it is a P. When ÁÐUR introduces a comparative clause (which has an adverbial function) along with "en", see ÁÐUR EN below.

ÁÐUR EN, ÁÐUR EN AÐ: ÁÐUR is tagged ADVR and projects an ADVP-TMP. The EN is a P and frequently takes a CP-CMP complement; see the documentation there (cf. also FYRR EN, FYRR EN AÐ). Note that this construction sometimes occurs without the EN in older texts.

ÁLÍKA is tagged ADVR when it introduces a comparative clause.

ÁN, this preposition assigns genitive in Modern Icelandic, but in Old Icelandic, such as Homiliubok (12th century), Thorlakur (13th century), Alexander (13th/14th century), Marta (14th century), Bandamenn (15th century), Ector (15th century) and Judit (15th century), it sometimes assigns dative.

B

Á BAK VIÐ, the whole expression is a PP; the PP headed by VIÐ is the complement of the noun BAK. In the cases where the preposition Á is missing, BAK VIÐ, parse it nevertheless as a PP with an empty preposition. This is similar to the Á MÓTI expression (which takes an NP-COM complement).

BAKA is treated as NS-G in the PP til baka.

BARA is tagged FP.

BÁÐIR, tagged as Q.

BÆÐI is tagged CONJ when it is part of a correlative conjunction, but otherwise it is a form of the quantifier BÁÐIR.

BÆÐI OG is a FLOATED CONJUNCTION. Note that bæði can be a neuter form of the quantifier BÁÐIR.

BRAUT undeclined "braut" indicating motion away ("abroad"), either occurring alone or inside a PP, is tagged ADV. Any declined forms or otherwise clearly nominal forms are tagged N.

BRÁÐ, as in Í BRÁÐ, Í BRÁÐINA, is a noun projecting an NP.

BRÁTT, tag as ADVP-TMP.

BURTSÉÐ is parsed as BURT$ $SÉÐ and tagged as ADV VAN.
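
When searching or reading split forms, it can help to restore the original orthographic word. The following is a minimal Python sketch under the assumption that split parts are marked exactly as in the examples on this page (BURT$ $SÉÐ, fyrstu$ $nni); the function name `rejoin_split_words` is hypothetical and not part of the corpus tooling.

```python
def rejoin_split_words(tokens):
    """Rejoin word parts that were split with the "$" convention.

    A part ending in "$" is glued to a following part that starts with "$".
    """
    words = []
    for token in tokens:
        if words and words[-1].endswith("$") and token.startswith("$"):
            # Drop the trailing "$" of the left part and the leading "$" of the right part.
            words[-1] = words[-1][:-1] + token[1:]
        else:
            words.append(token)
    return words


print(rejoin_split_words(["BURT$", "$SÉÐ"]))         # ['BURTSÉÐ']
print(rejoin_split_words(["í", "fyrstu$", "$nni"]))  # ['í', 'fyrstunni']
```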

BÚA in the periphrastic perfect construction VERA BÚINN AÐ X (where X is some VB forming an IP-INF with AÐ), BÚA is tagged VBN, not VAN. Click on link for an example.

D

DAGLEGANA, and some other adverbs ending with "lega+na", are temporal (cf. also NÝLEGANA)

DÆMI is N, and "til dæmis" can project a FRAG in cases such as the one below:

( (QTP (" ")
       (PP (P Af-af)
	   (NP (PRO-D því-það)
	       (, ,-,)
	       (CP-QUE-SPE-PRN (WNP-1 (WPRO-N hvað-hver))
			       (C 0)
			       (IP-SUB-SPE (NP-ADV-XXX *T*-1)
					   (NP-SBJ (PRO-N þér-þú))
					   (VBPI veljið-velja)
					   (NP-OB2 (PRO-D yður-þú))
					   (NP-OB1 (ADJ-A góða-góður)
						   (NS-A vini-vinur)
						   (, ,-,)
						   (FRAG (PP (P t.-til)
							     (NP (N-G d.-dæmi)))
							 (NP (N-A herra-herra) (NPR-A Þorlák-þorlák))))))))
       (. .-.)
       (" ")))

E

EF 'if', tagged as P; it introduces an adverbial clause (like ÞEGAR often does)

EF TIL VILL, e.t.v. 'maybe, perhaps'

EFTIR can be P or ADV. See also discussion in RP and NP-TMP. When "eftir" modifies an NP-TMP, the structure is:

(NP-TMP (OTHER-A Annan-annar) (N-A dag-dagur) (ADV eftir-eftir))

EI, usually tagged as NEG. However, it sometimes means 'always, forever' (as in EILÍFUR). Then it is parsed ADVP-TMP.

EIGIN, tag as ADJ.

EILÍFUR, tagged ADJ. In the PP að eilífu, EILÍFUR projects ADJP.

EINMITT is generally tagged ADV, but it may also have a focus particle use, and so the tagging convention may be revised (to FP) in later versions of the corpus.

EINHVER and EITTHVAÐ, etc, are tagged ONE+Q (when the meaning is "some"), not ONE+WPRO, see also EINHVERN TÍMA.

EINN, usually tagged as ONE. However, if it means "alone" in a copular clause (e.g. "Jón var þar einn"), it is tagged ADJ. Also, it can be tagged FP following the English corpora in the following case:

"When ONE means ONLY, ALONE and follows the noun or pronoun it focuses or when it follows NOT in the meaning NOT ONLY, it is treated as a focus particle (FP)."

EINN can also be in the plural, in which case it is tagged ONES.

EINNIG, EINNINN 'also', tag as ALSO

EINS, meaning 'alike' as in ekki fór eins fyrir honum og henni, tag as ADJ. Otherwise, ADVR. In the EINS OG construction (a type of comparative) or any other comparative construction (see CP-CMP), "eins" is tagged ADVR. See EINS OG in ADJP#ADJ_heads_of_ADJP and ADJP#ADVR_heads_of_ADJP. Also, "eins" is ADVR in "undir eins".

EINSKONAR and margskonar, etc: Q+N-G, projecting NP-POS.

EINUNGIS is tagged FP.

EITTHVAÐ is tagged ONE+Q.

ELLEGAR, "otherwise", is tagged ADV and projects ADVP.

EN can be tagged CONJ or P, much like BUT in English. It can also be FP if it participates in the NEG...BUT construction.

ENDA, usually tagged as ADV, but it can be CONJ in cases where it clearly conjoins clauses. ENDA can also be tagged P, but *only* where it *clearly* introduces a subordinate clause of the CP-ADV type; in this latter case ENDA usually means something like "on the condition that", and it introduces a CP-ADV without a C node but with V-to-C movement of a conditional verb [1]: "enda sé hann svo lítillátur..."

ENGINN, tagged Q

ENNÞÁ or ENN ÞÁ, tag as (ADVP-TMP (ADV enn) (ADV þá))

ER When it means 'which, that' it is a complementizer of a relative clause (CP-REL). When ER means "when", there are two possibilities (1) if there is no antecedent we take it to be a C as before, projecting a CP-ADV with no wh-word (as in the CP-ADV complement of "þegar") (2) if there is a temporal antecedent it introduces a CP-REL clause. Rarely, but on occasion ER can also be a complementizer projecting a CP-THT clause. (Check the latter parse when you find it in the corpora, as there may be confusion on this point).

ETC as in the English corpora, "etc" is tagged FW. It can appear at the clause level as FW in some cases, though it generally functions as an adverb phrase there.

EYKT is tagged N and projects NP-TMP. It means "half past 3 o'clock".

F

FÁ can be VB, or also MD when it takes a VBN complement (see also GETA).

FEIKN meaning "a great quantity" is tagged N and usually projects a NP-MSR.

FIRRUM is tagged ADV, projecting ADVP-TMP, like FORÐUM.

FJARRI, tag either as ADVR or ADJR and lemmatize as FJÆR. The superlative of FJARRI is FJARST (ADVS) or FJARSTUR (ADJS) (both lemmatized as FJÆR); the superlative of FJÆR is FJÆRST (ADVS) or FJÆRSTUR (ADJS).

FJARST, FJARSTUR, FJÆR, FJÆRST, FJÆRSTUR, lemmatized as FJÆR. See FJARRI.

FJÓRÐUNGUR is tagged N, like HUNDRUÐ (not like HÁLFUR).

FRAM is tagged RP when it does not take a complement. Note that as with many of the words tagged RP, when FRAM immediately precedes a preposition heading a PP, it is parsed as a specifier of PP.

FRAMAR, FRAMAST are usually ADJR, ADJS respectively, projecting an NP-MSR like English "further/farther", "furthest/farthest". See also LANGT.

FRAMVEGIS, o.s.frv., og svo framvegis, etc: tagged ADV. These words can project an ADVP which can be coordinated with any category, as in the examples below. Note that when "svo" appears, it is attached at the level of CONJ, not inside the ADVP headed by FRAMVEGIS.

                  (PP-1 (P about)
                        (NP (NP (NS Males))
                            (CONJP (CONJ and)
                                   (NP (NS Females)))
		            (, ,)
                            (CONJP (CONJ and)
                                   (ADVP (ADV so))
                                   (ADVP (ADV forth)))))

(NP-PRN-1 (NP (N-D stöðuglyndi-stöðuglyndi)
			(CONJ og-og)
			(N-D sparsemi-sparsemi)
			(IP-INF-PRP (TO að-að)
				    (VB passa-passa)
				    (RP upp-upp)
				    (PP (P á-á)
					(NP (N-A heilsu-heilsa)
					    (NP-POS (PRO-A sína-sinn))))))
		    (CONJP (CONJ og-og)
			   (ADVP (ADV svo-svo))
			   (ADVP (ADV framvegis-framvegis))))
	  (. .-.)))

FREMI, tag as ADV

FULLUR

FRÁ is tagged RP when it does not take a complement, like English "fro" in the English corpora. FRÁ can also be an ADV projecting an ADVP *only* when it is the complement of a preposition.

FYRIRGEFA when it takes 2 arguments, the dative (usually animate) one is NP-OB2, and the accusative one (the sin to be forgiven) is NP-OB1.

FYRIR OFAN, parse as recursive PPs

FYRR EN, FYRR EN AÐ: FYRR is tagged ADVR and projects an ADVP-TMP. The EN is a P and it frequently takes a CP-CMP complement (cf. ÁÐUR EN, ÁÐUR EN AÐ).

FYRST, tagged as P introducing CP-ADV when the meaning is 'since', as in I will do it since you won't. When it is a temporal adverb, it is tagged ADVS (though there may be some inconsistency about whether it is tagged ADV or ADVS in the corpus).

FYRSTA unlike the English corpora, FYRSTA is tagged ADJ (not ADJS, i.e. not superlative), projecting an NP, in the PP í fyrstu 'at first' but not as ADV in that case. A strong argument for not doing it as in the English corpora is that FYRSTA can have a determiner, cf. í fyrstunni. For the ordinal number form FYRSTA as in í fyrsta skipti 'for the first time', see FYRSTI. For the temporal adverb parallel to English first, see FYRST.

(PP (P í)
    (NP (ADJ-D fyrstu)))
(PP (P í)
    (NP (ADJ-D fyrstu$) (D-D $nni)))

FYRSTI, the ordinal number FYRSTI is tagged ADJ.

G

GAMAN is tagged as an N by default. However, when modified by an adverb it is treated as an adjective.

GEGN, GEGNUM can be P when it takes a complement, otherwise ADV. See also PP

GER see GJÖR

GERA is tagged DO, DODI, DOPI, etc., in all meanings. GERAST, however, is not tagged DO; GERAST is VB in the meanings "to happen" and "to become" in which case it takes a predicate. However, it is wise to include both DO and VB in searches for GERAST. See also Lemmatization.

GJÖR is usually counterpart of the DO verb GERA, but when it means 'better' it is ADVR (comparative, the superlative is GJÖRST/GERST). GER sometimes has the meaning 'good' and is tagged as ADJ.

GJÖRSVOVEL (subject to revision) we split this up into VBPI, ADV, and ADV, and parse normally. In this way, "við bara gjörsvovel og veiðum hann" (= "we just go ahead and catch him") will be split into two matrix tokens, and it will be necessary to search for "(VBDI gjör$)" in order to find such examples.

GETA when it takes a participle and means "be able to", it is tagged as a modal (MD). See modals MD. When GETA means 'mention' it is a regular verb, VB.

GIFTA can be a double object verb, taking NP-OB1 and NP-OB2.

GIFTAST where this verb takes a single object, who is the person that the subject is marrying, that object is NP-OB2.

GÆR as in "í gær" is tagged N-A.

H

HANDA projects NP in 'til handa honum' but PP in 'handa honum'

HÁLFUR usually NUM, following PPCME2 guidelines for HALF.

HÁTTUR, see MEÐ SAMA HÆTTI

HEILL tagged ADJ.

HEILSA takes an NP-OB1 object, unlike ÞAKKA.

HEIM, HEIMA is tagged ADV. This is not like "home" in the English corpora, which is N. HEIMA is different because unlike English "home", HEIMA is generally not used as a noun (except in the construction "að eiga heima").

HEIMKOMINN "heim" is split off as an ADV projecting an ADVP-DIR, and "kominn" is VBN.

HELDRI is ADJR, see HELSTUR for ADJS.

HELDUR is ADVR; where it means "but", we assume there is a silent "but" or no conjunction. HELDUR can occasionally be tagged FP, especially if it appears to participate in the NEG...BUT construction. In the FP use, HELDUR can be translated as English "only" (in the NEG...BUT construction, NEG and NEMA together mean "only"). Click on the link for more information, as well as for examples of the HELDUR EN ("rather than") construction.

Please note that the NEG...BUT construction is *not* the same as (and does not have the same meaning as) NEG inside of a conjunction with HELDUR, which also occurs.

"Allra helst" consists of an ADVS and an NP-POS, projecting an NP-ADV (preliminary version). Similarly, "allra flest" consists of a QS and an NP-POS, projecting an NP-MSR (preliminary version).

HELSTUR is tagged ADJS. HELST can also occur as an ADVS.

HÉR 'here' is usually tagged as ADVP-LOC. When it does not have a locative meaning, as in hér eftir 'from now on', it is only tagged as ADVP.

HINN is tagged D. It is still tagged D even when it is used in the meaning 'other'.

HINUMEGIN meaning "on the other side" is tagged N-D, and it usually takes an NP-POS complement, as in "hinumegin árinnar" (="on the other side of the river"). It projects an NP-ADV.

HJÁLPA takes a dative object parsed as NP-OB2 rather than NP-OB1.

HUNDRAÐ and other quantity words that can occur in the plural (e.g. ÞÚSUND, TYLFT) are tagged N or NS, rather than NUM. This is following the English corpora guidelines for the plurals of such quantity words.

HVAÐ, sometimes WADV, as in HVAÐ ER ÞETTA MIKIÐ? (similar to HVE MIKIÐ ER ÞETTA?).

HVAÐA, as in Hvaða fólk er þetta, is tagged WD.

HVAÐAN AF, WPP like English whereto

HVAÐVETNA, HOTVETNA, or HVERSVETNA, usually tagged as Q, projecting an ADVP-LOC. However, in older texts, it can also be a wh-word.

HVAR, meaning 'where' tag as WADV

HVARS, old form for hvar es (= "hvar er"), see also ÞARS

HVARVETNA, HORVETNA, when it does not introduce a CP, it is tagged as WADV projecting ADVP-LOC.

HVATKI

HVE, HVERSU 'how', tag as WADV, like English how.

HVERIGUR is sometimes tagged WD.

HVERNIG, HVERNINN

HVER meaning 'each' (as in HVER ANNAR 'each other') is tagged Q, but WPRO when it means 'who' (interrogative pronoun or relative pronoun).

HVERGI meaning "nowhere" is tagged Q+ADV, it projects ADVP-LOC or ADVP-DIR. HVERGI can also be a quantifier, meaning 'every (one)'.

HVERT is tagged as WADV.

HVÍ introduces CP-QUE. It usually means 'why' and is tagged WADV. However, in older Icelandic, it sometimes is the dative form of HVAÐ 'what' (as in "Hví sætir það?"). Then it is tagged WPRO-D.

HVÍLÍKUR can be tagged SUCH, or it can be WD, in which case it functions as the wh-word counterpart to words like SLÍKUR and ÞVÍLÍKUR.

HVOR, as in hvor hjá öðrum 'each with the other'

HVORKI, HVORTKI meaning "neither" and used with NÉ (= "nor") is tagged CONJ.

HVORT usually tagged WQ, in which case it means "whether" and introduces an indirect question. Occasionally it can be WPRO and project a WNP, as in the case of English "whether" when it means "which of two", and HVORT is tagged WPRO in the expression "hvort sem er" (see CP-FRL).

HVORTVEGGJA, HVORTTVEGGJA, and hvorirtveggja, hvorartveggja, etc., are tagged Q+NUM. HVORTVEGGJA is sometimes used as a correlative conjunction, similar to ANNAÐHVORT ... EÐA.

HVORUGUR, HVORGI is tagged Q when a quantifier, CONJ if it is part of a correlative conjunction ("hvorugt...né..."). See the documentation in the English corpora for EITHER, NEITHER.

HÆGRI 'right' and VINSTRI 'left', tag ADJ. These are frequently accusative, and may project an NP-DIR.

I

INNANTIL, utantil, útifyrir, etc. are tagged ADV and generally project an ADVP-LOC.

ITEM the Latin word for "also", used in lists. We tag it FW, like "etc.", and it is similarly attached at the IP level in most cases. (See also ETC).

J

JÁ 'yes', tag INTJ. JÚ is sometimes tagged as an adverb (Það er jú markmiðið)

JAFN(T) usually ADVR or ADJR, though it can also be ADJ like SAMUR. For more information about the ADVR and ADJR use, see the page linked to JAFN(T).

JAFNFRAMT is tagged ADVR+P.

JAFNFÆTIS ADVR+ADV, which frequently has an NP-CMP sister.

JAFNVEL is tagged FP when it means 'even' but ADVR+ADV when it means equally well, and similarly other compounds with JAFN- are tagged "ADVR+"...

K

KANNSKI 'perhaps, maybe', tagged ADV. KANN SKE is tagged (ADV (ADV21 KANN) (ADV22 SKE))
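
The numbered tags used for words written in several parts (ADV21/ADV22 here, Q21/Q22 for NÉ EINN, C21/C22 for SEM AÐ) follow a regular pattern that can be sketched in Python. This is an illustration only: the function name `multiword_leaf` is hypothetical, and reading the first digit as the number of parts and the second as the position is an assumption, since this page only shows two-part cases.

```python
def multiword_leaf(tag, parts):
    """Build the bracketing for one word written as several orthographic parts.

    Assumption: the first digit is the number of parts and the second digit
    is the position of the part, as in ADV21/ADV22.
    """
    count = len(parts)
    if not 2 <= count <= 9:
        raise ValueError("expected between 2 and 9 parts")
    inner = " ".join(
        f"({tag}{count}{index} {part})"
        for index, part in enumerate(parts, start=1)
    )
    return f"({tag} {inner})"


print(multiword_leaf("ADV", ["KANN", "SKE"]))  # (ADV (ADV21 KANN) (ADV22 SKE))
print(multiword_leaf("Q", ["né", "einn"]))     # (Q (Q21 né) (Q22 einn))
```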

KONAR, tag N-G. It is usually modified by a quantifier, cf. ALLS KONAR, and projects NP-POS. When written in one, ALLSKONAR is tagged Q+N-G; similarly EINSKONAR is tagged Q+N-G.

KRING, KRINGUM 'around, round', as in round the edges of the flowers, is tagged P:

(PP (P kringum)
    (NP (PRO-A hana)))

As in the case of MILLI, KRING(UM) always projects a PP, even if it is sometimes intransitive.

                            (IP-SUB (ADVP-TMP *T*-2)
				    (NP-SBJ (NPR-N Pétur-pétur))
				    (VBDI ferðaðist-ferðast)
				    (PP (P um-um)
					(PP (P kring-kring)))
				    (IP-INF-PRP (NP-OB1 (Q-G allra-allur))
						(TO að-að)
						(VB vitja-vitja)))))
( (IP-MAT (CONJ og-og)
	  (NP-SBJ (Q-N allar-allur) (NS-N ekkjur$-ekkja) (D-N $nar-hinn))
	  (VBDI flykktust-flykkjast)
	  (ADVP-LOC (ADV utan-utan))
	  (PP (P um-um)
	      (PP (P kring-kring)
		  (NP (PRO-A hann-hann))))
	  (IP-PPL (VAG grátandi-gráta))))

See also UMKRINGIS. See KRINGUR for cases like í krók og kring

KRINGUR is a noun, as in í krók og (í) kring

L

LANGUR usually ADJ and LENGRI is usually ADJR, both frequently project NP-MSR like English "further/farther"; see especially NP-MSR#NP-MSR_heading_ADJP. See also FRAMAR.

LANGTUM, parsed as NP-MSR.

LENGI is ADV, as in "hann var þar lengi" (lit. he was there long, i.e. 'he stayed there for a long time'). However, LENGI, like LANGUR, frequently projects an NP-MSR. Not to be confused with forms of LANGUR.

LIFA (verb, meaning live), often takes NP-MSR. "He lived 80 years."

LIFANDI, tag VAG

LÍFS, usually used as a predicate, "Hann er lífs" 'He is alive'. LÍFS projects an NP-POS which projects NP-PRD. With verbs like KOMAST, as in "Hann komst þaðan lífs", LÍFS projects NP-ADV.

LÍKA 'also', tag as ALSO; can exceptionally be ADVR and license a CP-CMP in Oddur Gottskálksson's New Testament.

LÍKT is ADVR when it licenses a comparative.

LÍTILL, LÍTIÐ 'little, not much', tag as ADV in "Þær þekktust lítið", cf. "Þær þekktust vel". Otherwise it is usually Q, QR, or QS, parallel to MIKIÐ provided it cannot be replaced by "smár" in the given usage. Anywhere that LÍTILL can be replaced by "smár", "smærri", or "smæstur", it is tagged ADJ, ADJR, or ADJS as appropriate. In cases in doubt (as to the precise meaning of the word in context), the default is Q (or QR, QS). See also NP-MSR.

M

MANNGI, MANGI, tagged Q

MARGUR, tagged Q

MEÐ SAMA HÆTTI

MEÐAL is P. See also MILLUM, MILLI.

MEÐAN, Á MEÐAN, MEÐAN Á, Á MEÐAN Á; usually tagged P, projecting a PP and taking a CP-ADV complement, as with English "while". When MEÐAN takes no complement, it is tagged ADV and projects ADVP (this is the parse whether or not MEÐAN is the complement of another preposition). When MEÐAN takes no complement and occurs without Á, it generally projects an ADVP-TMP.

MEÐFERÐ tagged N, and frequently projects an NP-ADV as in "En-en kóngar-kóngur þeir-sá sem-sem þú-þú hefir-hafa meðferðar-meðferð eru-vera mínir-minn fangar-fangi" (usually "meðferðis" in the modern language).

MEGIN tagged N. In "öðru megin", MEGIN is tagged N-D and the phrase projects NP-ADV.

MIÐUR

MIKIÐ, MIKILL This is tagged as a quantifier, Q, QR (for MEIRA), or QS, when it cannot be replaced by "stór", "stærri", "stærstur". If it can be replaced by "stór", then it is tagged ADJ, ADJR, or ADJS as appropriate. In cases in doubt (as to the precise meaning of the word in context), the default is Q (or QR, QS). See also NP-MSR.
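
The substitution test above can be written down as a small decision function. This is a hedged Python sketch, not an official procedure: the function name `size_word_tag`, the degree labels, and the use of `None` for "annotator in doubt" are all hypothetical conventions chosen for the illustration.

```python
from typing import Optional

TAGS = {
    "quantifier": {"positive": "Q", "comparative": "QR", "superlative": "QS"},
    "adjective": {"positive": "ADJ", "comparative": "ADJR", "superlative": "ADJS"},
}


def size_word_tag(degree: str, replaceable_by_stor: Optional[bool]) -> str:
    """Choose the tag for a form of MIKILL.

    replaceable_by_stor is the annotator's judgment on whether "stór",
    "stærri" or "stærstur" could replace the word; None means "in doubt",
    which falls back to the quantifier tags.
    """
    kind = "adjective" if replaceable_by_stor is True else "quantifier"
    return TAGS[kind][degree]


print(size_word_tag("comparative", False))  # QR
print(size_word_tag("positive", True))      # ADJ
print(size_word_tag("superlative", None))   # QS
```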

MILLI/MILLUM tagged as P, as in (PP (P millum) (NP bæja)) or (PP (P á) (PP (P millum) (NP bæja))). If there is no complement of MILLI there can be cases where the PP immediately dominates only (P milli).

MITT can be the neuter form of the ADJ "miður", but when it means "in the middle" and does not agree with some argument in number and case, we tag it ADV and it projects an ADVP-LOC.

When MITT is a measure phrase, it is ADJ-A and projects NP-MSR.

MIÐUR as in "því miður":

	      (ADVP (NP-MSR (PRO-D því-það))
		    (ADVR miður-miður))

MISKUNNA takes NP-OB1.


See also NP-MSR.

MJÖG is tagged ADV. When it occurs alone and means "much" or "a lot", it projects NP-MSR.

MÓTI as in á móti honum tagged "N", even when it means "facing", e.g. "hvor á móti öðrum". In this construction, "móti" is considered a dative N taking an NP-COM, analogously to English "side"; see the PPCME2,PPCEME guidelines on Complements of N and NP-COM. When the preposition is missing parse the whole phrase as a PP with silent P.

N

NÉ is tagged CONJ, like English "nor"

NEI 'no', tag as INTJ.

NEINN 'no one, (not) any', tag as Q. In older texts, NÉ EINN is written meaning NEINN. This is tagged (Q (Q21 né) (Q22 einn))

NEMA tagged as P, analogously to English "except". NEMA can occasionally be tagged FP, especially if it appears to participate in the NEG...BUT construction. In the FP use, NEMA can be translated as English "only" (in the NEG...BUT construction, NEG and NEMA together mean "only"). See also HELDUR and EN.

NÉ EINN, see NEINN.

NOKKUR usually Q. But when NOKKUÐ means "quite" or "somewhat", it is tagged ADV.

NÓGU, as in "nógu góður", is tagged ADVR.

NÓGUR, GNÓGUR is tagged ADJR, and it frequently licenses an IP-INF-DEG or CP-DEG.

NÁLÆGT is generally tagged ADJ and frequently projects an ADJP-LOC, like NÆR. It can also be ADV.

NÆR, NÆRRI, NÆSTUM, are either ADV or ADJ. See the discussion in ADVP#Complements_of_ADVP and ADJP#Complements_of_ADJ. Note that NÆR can sometimes mean 'when', in which case it is ADV. NÆR can also be the wh-word "when" in Vídalínspostilla, in which (exceptional) case it is tagged WADV.

NÆRINDIS is usually ADJ, but otherwise like NÆRRI below; see the discussion in ADVP#Complements_of_ADVP and ADJP#Complements_of_ADJ.

NÆSTA 'very', tag as ADV; when it means 'almost', see NÆSTUM

NÆST meaning "next" in "því næst" and "þessu næst" is tagged ADVS.

NÆSTUR when it is morphologically an adjective is tagged ADJS, whether prenominal or postnominal.

O

OG is most frequently tagged CONJ, but it can also be ALSO depending on the meaning. In comparatives OG is tagged P, including the case of "um leið og .." (= at the same time as). See CP-CMP.

OGSVO can be tagged ALSO, if it has that meaning.

OF tagged as ADVR when it means 'too' as in 'too much', P when it takes a complement, and RP otherwise (the last case applies often when of is a word that has no obvious meaning in Old Icelandic)

OFSÖGUR, in ofsögum sagt, OFSÖGUM projects NP-ADT.

R

RAUNAR, tagged as ADV.

S

SAKIR or SÖKUM is always tagged as NS, projecting an NP. It frequently projects a PP with a silent preposition, like MÓTI.

SAMAN is tagged ADV, usually attached high (Þeir komu saman). Sometimes, though, it is part of an NP (Ég hitti þau saman, tímunum saman). For TIL SAMANS, see PP documentation.

SAMUR 'same', tag as ADJ

SAMT SEM ÁÐUR, see CP-CMP

SANNLEGA, two of these, SANNLEGA, SANNLEGA, are parsed as one ADVP, cf. English, TRULY, TRULY (Bible)

SEINN usually an ADV projecting an ADVP-TMP, when it is a clausal modifier denoting the time that something happened. It can also be tagged ADJ when it modifies a noun. See the PPCME2/PPCEME guidelines on EARLY and LATE.

SEM is always tagged as a complementizer, C. This is true even for comparatives such as feitur sem svín, 'as fat as a pig'; see CP-CMP for a full discussion. This is not the same as the treatment of "as" in the PPCME2, PPCEME, and in general, our treatment of comparatives differs somewhat. All comparatives are treated as clausal, i.e. involving a CP-CMP, in Icelandic.

When SEM introduces an adverbial clause, it is still a complementizer, and it simply projects a CP-ADV in which it occupies the C position.

SEM AÐ is treated as a single C, like: (C (C21 sem-sem) (C22 að-að)).

SÉRHVER is tagged Q, just like the quantifier usage of HVER.

SICUT is tagged FW. It frequently introduces a CP-CMP. However, it sometimes doesn't.

SÍÐAN, is usually an ADV projecting ADVP-TMP. It can also be a preposition (Ég hef verið hér síðan í gær 'I have been here since yesterday') which can introduce CP-ADV (Ég hef verið þreyttur síðan þú komst heim 'I have been tired since you came home')

SÍÐASTA is tagged ADJ (not ADJS, i.e. not superlative), projecting an NP, in the PP að síðustu(nni) 'at last'

SÍÐUR is tagged QR when it means "less", and it projects an ADVP in the PP "að síður".

SJALDAN 'seldom', sometimes SKJALDAN (e.g. in Ævisaga Jóns Steingrímssonar), projects ADVP-TMP

SJÁLFUR is tagged PRO, and is parsed as an NP-PRN when it modifies another pronoun, parallel to emphatic "himself", "herself", etc., in the English corpora.

SKIPA, when it means 'appoint' it takes an NP-SPR.

SKÖMMU, as in SKÖMMU SÍÐAR, is NP-MSR.

SLÍKUR, 'such' is tagged SUCH, like English such

SMÁR, tagged ADJ. But "smám saman":

(NP-MSR (ADV Smám) (ADV saman))

SODAN, SODDAN, SVODDAN tagged SUCH.

SPYRJA, 'ask' takes ((NP-SBJ NOM) SPYR (NP-OB2 ACC) (NP-OB1 GEN))

STRAX is ADVP-TMP, it sometimes introduces CP-CMP, as in STRAX OG, STRAX SEM.

SVO is tagged as ADVR when it is a degree adverb, e.g. when it modifies an adjective or another adverb or occurs in a svo...að clause. As with English "so", when "svo" is not used in a degree sense (ADVR) or as a preposition (P), SVO is tagged ADV. In its adverbial (ADV) use, SVO can generally be paraphrased by "þannig" or "á þá leið" (in English, IN THAT WAY).

SVO SEM

STADDUR Although this is etymologically the participle of "steðja", it is generally used as an adjective and we tag it ADJ in most cases. On the very rare occasion that this parse is unlikely, STADDUR can also be tagged VAN or VBN.

STÓRUM when it means "much" is tagged ADV and projects an NP-MSR.

STUNDUM is tagged NS-D, and projects an NP-TMP.

SUMUR, SUMIR, tagged as Q.

SUNDUR is ADV, projecting an ADVP complement in Í SUNDUR

T

TÍÐUM is NS-D and projects NP-TMP, like STUNDUM.

TUGUR is tagged N, and it frequently occurs within an NP-MSR (though this need not always be true).

TRÁSS can be a P taking a dative complement, meaning "even though", like German trotz or Danish.

TVENNUR, TVISVAR, ÞRENNUR, etc. are tagged NUM, analogously to "once", "twice", "thrice" in English. These words are not tagged for case when they occur alone, but they are tagged for case if they appear inside a larger NP (e.g. "tvisvar sinnum").

TVÖFALDUR (TVEFALDUR) 'double', ÞREFALDUR, etc. are tagged NUM.

U

UMKRINGIS is tagged ADV projecting ADVP-LOC when it does not take a complement. When it does, it is tagged as P. This is similar to the convention for UMHVERFIS. When UMKRINGIS is written in two words, it is parsed as two Ps, similar to UM KRING (see KRINGUM).

UMHVERFIS When it does not have a complement, it is tagged ADV and projects an ADVP-LOC. When UMHVERFIS does take a complement, as in "umhverfis borgina", it is tagged P.

UNDIR is a P. In expressions like "undir eins", "undir hádegi", or "undir eins og CP-CMP", UNDIR is still tagged P and takes a PP complement. See also PP.

UNNVÖRPUM is NS-D with lemma UNNVARP, projecting NP-ADT.

UNS 'until', tag as P with CP-ADV as sister.

UTAN can be ADV or P, depending on whether it takes a complement or not. It can also be FP if it occurs in the NEG...BUT construction, like NEMA.

V

VAKIUÐR, tag VAN.

VANUR Note that VANUR frequently takes IP-INF complements; see the discussion in IP-INF and ADJP.

VERÐA is tagged with its own tag, RD (RDDI, RDPI, RDDS, RDPS, etc.), in all uses. This is because, like BE, it has auxiliary and non-auxiliary uses in Icelandic. This matches the treatment of "werden" in Caitlin Light's Early New High German corpus.

VETTUGI is tagged Q, meaning "nothing", "naught", as in "meta e-n að vettugi".

VINSTRI 'left' and HÆGRI 'right', tag ADJ.

VINSÆLL is an ADJ, even in sentences like "Hann var vinsæll af öllum mönnum" 'He was popular by every man'

VIRKA takes a small clause (IP-SMC) when it takes a predicate: "Þetta virkar ótrúverðugt".

VOÐA is tagged as an adverb, as in voða lítið.

Y

YFIR is tagged P when it takes a complement and projects a PP, and RP when it does not take a complement. It can also be a degree modifier, like English OVER, in which case it is tagged ADVR: "yfir tvær þúsundir manns".

YFRIÐ is tagged as ADV when it modifies an adjective. YFRINN is, however, an adjective.

Ý

ÝMIST meaning "variously" is tagged Q without case and projecting an ADVP.

Ö

ÖLDUNGIS can be FP or ADV, depending on meaning.

ÖNDVERÐUR is ADJ. It projects an ADJP, even when it is inside a PP.

ÖRSKOT is tagged N, but it can project an NP-ADV in many contexts.

ÖÐRUVÍSI is tagged ADV. There may be some inconsistencies currently on this point, but it is easy to find.

Þ

ÞAÐ The object pronoun ÞAÐ is always tagged PRO. The subject pronoun ÞAÐ is tagged as PRO in any syntactic context where it *never* disappears under subject-finite-verb inversion. In Icelandic, when the finite verb fronts over the subject under V-to-C movement in matrix clauses or embedded topicalizations, truly non-expletive ÞAÐ will still surface in subject position, but truly expletive ÞAÐ will disappear. In any syntactic context in which ÞAÐ would disappear under subject-verb inversion, it is tagged ES (in accordance with Caitlin Light's Early New High German corpus). Such contexts include, at least: weather expressions, impersonals, and existentials.

ÞAÐ is tagged ES even in contexts where there is inter-speaker or inter-text (or diachronic) variation with regard to whether it disappears.

See Expletives also, for *exp*, which is the empty category corresponding to "(ES ÞAÐ)". Note that when ÞAÐ disappears under verb-movement, "(NP-SBJ *exp*)" is only inserted in the sentence if there is no other possible subject (e.g. it is not inserted in subject-postposition constructions).
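
The PRO/ES distinction for ÞAÐ described above reduces to two questions, which the following Python sketch makes explicit. It is an illustration under stated assumptions: the function name `thad_tag` and both parameters are hypothetical, and the judgment about whether ÞAÐ would disappear under inversion has to be supplied by the annotator.

```python
def thad_tag(is_subject: bool, may_drop_under_inversion: bool) -> str:
    """Choose between PRO and ES for ÞAÐ.

    may_drop_under_inversion should be True whenever ÞAÐ would disappear
    under subject-verb inversion in this context, including contexts where
    speakers, texts or periods vary on this point.
    """
    if not is_subject:
        return "PRO"  # the object pronoun is always PRO
    return "ES" if may_drop_under_inversion else "PRO"


print(thad_tag(is_subject=True, may_drop_under_inversion=True))    # ES
print(thad_tag(is_subject=True, may_drop_under_inversion=False))   # PRO
print(thad_tag(is_subject=False, may_drop_under_inversion=False))  # PRO
```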

ÞAÐ ER AÐ SEGJA, ÞAÐ ER, Þ.E. 'that is to say'

ÞANNIG

ÞANGAÐ TIL, ÞAR TIL 'until' and ÞAR TIL AÐ, ÞANGAÐ TIL AÐ, 'until that', introduce an adverbial clause

ÞAR Á MEÐAL, treated like ÞAR Á MILLI

ÞAR Á MILLI, the second PP immediately dominates a P and a trace of the R-pronoun "þar".

ÞARS, old form for þar es (= "þar er"), split "(ADV þar$-þar) (C $s-er)", see also HVARS.

ÞÁ ER, ÞÁ ÞEGAR

ÞEGAR, 'when', is tagged as P when it introduces an adverbial clause, CP-ADV (i.e. in the case that there is no antecedent for the "when"-clause), so þegar Norðmenn tóku ..., 'when Norwegians took ...', is (PP (P þegar) (CP-ADV (C 0) (IP-SUB (NP-SBJ Norðmenn) (VBDI tóku) (...)))).

However, ÞEGAR is tagged WADV when it introduces an indirect question or a relative clause, just as with English WHEN in the PPCME2/PPCEME.

ÞEGAR can also be tagged ADV, projecting an ADVP-TMP or an ADVP where it unambiguously means "immediately" or "promptly".
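
The three uses of ÞEGAR can be summarized as a lookup from use to tag. The following Python sketch is only a mnemonic: the function name `thegar_tag` and the use labels are hypothetical, and deciding which use applies remains the annotator's judgment.

```python
def thegar_tag(use: str) -> str:
    """Map an annotator-supplied use of ÞEGAR to its tag."""
    rules = {
        "adverbial_clause": "P",      # no antecedent; P takes a CP-ADV complement
        "indirect_question": "WADV",
        "relative_clause": "WADV",
        "immediately": "ADV",         # projects ADVP-TMP or ADVP
    }
    try:
        return rules[use]
    except KeyError:
        raise ValueError(f"unknown use: {use!r}") from None


print(thegar_tag("adverbial_clause"))  # P
print(thegar_tag("relative_clause"))   # WADV
print(thegar_tag("immediately"))       # ADV
```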

ÞEIM MUN, ÞESS AÐ

ÞEYGI Icel. '(þó) eigi', '(though) not', tag (ADV þ$) (NEG $eygi).

ÞÓ, ÞÓTT 'while' and ÞÓ AÐ, ÞÓTT AÐ, 'while that', introduce an adverbial clause (cf. PPCME)

ÞVÍ is normally a dative pronoun, PRO. However, when it is a wh-word, meaning "hví" ("why"), it is tagged WADV. Where ÞVÍ means "because" by itself, or appears to function as an adverb by itself roughly meaning "therefore", it is still tagged PRO: see CP-THT for more information on how these constructions are parsed.

ÞVÍSA, tag as D-D.

ÞVÍLÍKUR is tagged SUCH, in the same way as SLÍKUR

For þar, þangað, þaðan, as in þar sem (e. where), þangað sem (e. to where), þaðan sem (e. from where), see CP-REL

Æ

ÆTLI