Difference between revisions of "Annotation Process"

From Icelandic Parsed Historical Corpus (IcePaHC)

Revision as of 13:11, 11 April 2010

This is a guide for the local annotation team only. It is still under construction.

Documenting the annotation history of a file

Every file that is edited has exactly one file with notes about its edit history. If the file name is piltur1.psd, the corresponding notes file is piltur1.notes.txt.
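
The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function name notes_path_for is made up here, and the check that the input ends in .psd is an assumption about how the rule would be enforced, not something the guide requires.

```python
from pathlib import Path


def notes_path_for(psd_path):
    """Return the notes-file path that belongs to a .psd file.

    Hypothetical helper: piltur1.psd -> piltur1.notes.txt
    """
    psd = Path(psd_path)
    if psd.suffix != ".psd":
        # Assumption: only .psd files have a notes file.
        raise ValueError(f"expected a .psd file, got {psd.name!r}")
    # Keep the directory, swap the extension for ".notes.txt".
    return psd.with_name(psd.stem + ".notes.txt")


if __name__ == "__main__":
    print(notes_path_for("piltur1.psd"))
```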

Syntax of the notes file

  • For each sentence there is a section in the file that starts with the sentence number.
  • Each section is a lettered list (a, b, c, ...) of notes about the sentence in question.
  • Each note starts with the initials of the annotator who wrote it.
  • The format is always exactly the same.

Example:

1)
a) AKI: changed lemma of "ekki" from "ekki" to "ekkert"

2)
a) AKI: added missing expletive subject to IP-MAT 
b) EFS: changed tag of "epli" from N to NS.
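
Because the format is fixed, a notes file is easy to read by program. The following is a minimal sketch, not a tool the project provides: the name parse_notes and the two regular expressions are assumptions derived from the example above (a sentence header such as "2)" on its own line, notes such as "a) AKI: ...", and any other non-empty line treated as a continuation of the previous note).

```python
import re

SENTENCE = re.compile(r"^(\d+)\)\s*$")    # section header, e.g. "2)"
NOTE = re.compile(r"^([a-z])\)\s+(.*)$")  # lettered note, e.g. "a) AKI: ..."

EXAMPLE = '''1)
a) AKI: changed lemma of "ekki" from "ekki" to "ekkert"

2)
a) AKI: added missing expletive subject to IP-MAT
b) EFS: changed tag of "epli" from N to NS.
'''


def parse_notes(text):
    """Return {sentence_number: [note_text, ...]} for the text of a notes file."""
    sections = {}
    current = None  # list of notes for the sentence being read
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue  # blank lines only separate sections
        header = SENTENCE.match(line)
        if header:
            current = sections.setdefault(int(header.group(1)), [])
            continue
        note = NOTE.match(line)
        if note and current is not None:
            current.append(note.group(2))
        elif current:
            # Assumed continuation line: attach it to the previous note.
            current[-1] += "\n" + line.strip()
    return sections


if __name__ == "__main__":
    for number, notes in parse_notes(EXAMPLE).items():
        print(number, notes)
```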

Note categories

Every note is classified according to its nature. The types of notes are as follows:

  • (no label), a change that corrects an error in the file. This is the default and the most common kind of note, so no label is needed.
  • NOTE, important information about the parse that does not reflect a change. This is typically used by the first annotator to share some information with the reviewers. This includes, in particular, arguments for the parse that resulted from a difficult choice between two or more alternatives (in which case citing documentation may be a good idea). NOTE can also be used to express that the meaning of the sentence is unclear to the annotator.
  • DISCUSS, a request that something be discussed among the annotators. Use sparingly, and try hard to reach a clear conclusion that results in a clear decision (change the parse or keep the previous one). DISCUSS should be used when there is an apparent inconsistency in the corpus or the documentation; its goal is to increase consistency where needed.

Example:

1)
a) JW NOTE: made "að honum látnum" be an IP-SMC complement of P because it looks a lot like English examples with "with" (cf. url-to-docs)
b) AKI: changed lemma of "ekki" from "ekki" to "ekkert"
c) AKI DISCUSS: Treatment of NP-PRN is not consistent with NP-SBJ in "file-x.psd" sentence 4. We should decide between those two parses, correct it in one of the places and document the decision.

2)
a) JW NOTE: I'm not sure what "jarteinir" means here, can this be something other than a noun?
            AKI: yes, this is a verb in this context! changed parse accordingly
b) AKI: added missing expletive subject to IP-MAT 
c) EFS: changed tag of "epli" from N to NS.
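
The three categories can be read off the start of a note. The sketch below is an assumption-laden illustration: classify is a hypothetical name, "CHANGE" is a label invented here for notes without a label, and the set of letters allowed in initials is a guess. A misspelled label such as DISUSS is rejected rather than silently accepted.

```python
import re

# Initials, an optional NOTE/DISCUSS label, a colon, then the note text.
NOTE_HEAD = re.compile(
    r"^(?P<initials>[A-ZÁÐÉÍÓÚÝÞÆÖ]+)"
    r"(?:\s+(?P<label>NOTE|DISCUSS))?"
    r":\s*(?P<text>.*)$",
    re.DOTALL,
)


def classify(note):
    """Split a note into (initials, category, text).

    The category is "NOTE", "DISCUSS", or "CHANGE" (a made-up name
    for the unlabeled default category).
    """
    match = NOTE_HEAD.match(note)
    if match is None:
        raise ValueError(f"note does not follow the format: {note!r}")
    return (
        match.group("initials"),
        match.group("label") or "CHANGE",
        match.group("text"),
    )


if __name__ == "__main__":
    print(classify('AKI: changed lemma of "ekki" from "ekki" to "ekkert"'))
    print(classify("AKI DISCUSS: Treatment of NP-PRN is not consistent"))
```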

How to parse and review parses

General principles

  • Always make sure that all of your notes are labeled with your initials
  • Use the Checklist
  • Be careful not to spend too much time on decisions
  • Be careful not to make notes that cause unnecessary delays or discussions
  • Still, if something really needs to be discussed, discuss it

First annotator

  • Create a notes file. If the file name is piltur1.psd, the corresponding notes file is piltur1.notes.txt.
  • If you don't understand the sentence properly, pick a plausible parse and make a NOTE about the problem in the notes file.
  • If you spend a lot of time making a decision (studying documentation, etc.) or if you believe the reviewer(s) need to know about some argument for the parse, make a NOTE (and cite documentation if you think that will be useful)
  • If you are unsure of the parse after spending some time on it, make a NOTE (like "AKI NOTE: Unsure of parse").
  • React to changes made to the file by reviewer(s) as necessary
    • write AGREE/DISAGREE in front of each change that has been made
    • discuss the DISAGREE points with the reviewer who made them

Example:

1)
a) JW NOTE: made "að honum látnum" be an IP-SMC complement of P because it looks a lot like 
            English examples with "with" (cf. url-to-docs)
b) AGREE, AKI: changed lemma of "ekki" from "ekki" to "ekkert"
c) AKI DISCUSS: Treatment of NP-PRN is not consistent with NP-SBJ in "file-x.psd" sentence 4. 
                We should decide between those two parses, correct it in one of the places and document the decision.

2)
a) JW NOTE: I'm not sure what "jarteinir" means here, can this be something other than a noun?
            AKI: yes, this is a verb in this context! changed parse accordingly
b) DISAGREE, there is a subject there already!, AKI: added missing expletive subject to IP-MAT 
c) AGREE, EFS: changed tag of "epli" from N to NS.
  • Make sure that you track the state of your file until it has been placed in the "finished" directory (the first annotator of a file is responsible for the file)
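
The AGREE/DISAGREE responses in the example above are written in front of the reviewer's note, optionally with a short reason. A possible way to separate the response from the original note is sketched below. The function name split_response and the pattern are assumptions based only on the two forms shown in the example; a reason that itself contains something that looks like "XX:" after a comma would confuse it.

```python
import re

RESPONSE = re.compile(
    r"^(?P<verdict>AGREE|DISAGREE)"
    r"(?:,\s*(?P<reason>.+?))?"                          # optional reason
    r",\s*(?P<note>[A-Z]+(?:\s+(?:NOTE|DISCUSS))?:.*)$"  # the original note
)


def split_response(line):
    """Return (verdict, reason, note); verdict and reason are None if absent."""
    match = RESPONSE.match(line)
    if match is None:
        return None, None, line  # no response written yet
    return match.group("verdict"), match.group("reason"), match.group("note")


if __name__ == "__main__":
    print(split_response('AGREE, AKI: changed lemma of "ekki" from "ekki" to "ekkert"'))
    print(split_response(
        "DISAGREE, there is a subject there already!, "
        "AKI: added missing expletive subject to IP-MAT"
    ))
```

A first annotator could run something like this over the reviewer's notes to list the ones where the verdict is still None.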

Review

  • Add notes to the existing collection of notes for this file; do not create a new file!
  • Do not use the review to point out that there is a potential ambiguity in the sentence to discuss. The previous annotator already spent time on making a decision. If you believe an ambiguity was resolved incorrectly, change the parse; otherwise the parse should not be changed. If you are unsure whether it should be changed, do not change it.
  • Make sure that all changes you document in the notes file are reflected in the updated version of the psd-file
  • Some decisions are necessarily judgment calls. Do not spend time on those unless you disagree quite strongly with the previous parse. Those include:
    • PP-attachment (which usually does not have serious effects on searching anyway)
  • At the end of a review, look over any DISCUSS points you made and see if they can be eliminated by making a clear decision
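
To look over the remaining DISCUSS points at the end of a review, something like the following sketch could list them per sentence. It is an illustration only: open_discuss_points is a hypothetical name, and the pattern assumes that the DISCUSS label appears before the first colon of a lettered note line, as in the examples on this page.

```python
import re

SENTENCE = re.compile(r"^(\d+)\)\s*$")
DISCUSS = re.compile(r"^[a-z]\)\s+[^:]*\bDISCUSS:")


def open_discuss_points(notes_text):
    """Return [(sentence_number, note_line), ...] for every DISCUSS note."""
    sentence = None
    found = []
    for line in notes_text.splitlines():
        header = SENTENCE.match(line)
        if header:
            sentence = int(header.group(1))
        elif DISCUSS.match(line):
            found.append((sentence, line.strip()))
    return found


if __name__ == "__main__":
    sample = (
        "1)\n"
        "a) JW NOTE: unsure of parse\n"
        "b) AKI DISCUSS: Treatment of NP-PRN is not consistent\n"
        "\n"
        "2)\n"
        "a) AKI: added missing expletive subject to IP-MAT\n"
    )
    for number, note in open_discuss_points(sample):
        print(number, note)
```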