Annotation Process

From Icelandic Parsed Historical Corpus (IcePaHC)

Revision as of 12:33, 11 April 2010

This is a guide for the local annotation team only. This page is still under construction.

Every edited file has exactly one notes file that records its edit history. If the file name is piltur1.psd, the corresponding notes file is piltur1.notes.txt.
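The naming rule can be written as a small function. This is a minimal Python sketch for illustration only. The name `notes_file_for` is hypothetical and is not part of any IcePaHC tooling.

```python
def notes_file_for(psd_name: str) -> str:
    """Map an annotated file name to its notes file, e.g. piltur1.psd -> piltur1.notes.txt."""
    suffix = ".psd"
    if not psd_name.endswith(suffix):
        raise ValueError(f"expected a .psd file name, got {psd_name!r}")
    # Drop ".psd" and append ".notes.txt"; any directory prefix is kept as-is.
    return psd_name[: -len(suffix)] + ".notes.txt"


print(notes_file_for("piltur1.psd"))  # piltur1.notes.txt
```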

Syntax of the notes file

  • Each sentence has its own section in the file, headed by the sentence number.
  • Each section is a lettered list (a, b, c, ...) of notes about that sentence.
  • Each note starts with the initials of the annotator who wrote it.
  • The format is always exactly the same.

Example:

1)
a) AKI: changed lemma of "ekki" from "ekki" to "ekkert"

2)
a) AKI: added missing expletive subject to IP-MAT 
b) EFS: changed tag of "epli" from N to NS.
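This page does not describe any tool for reading notes files. For illustration only, here is a minimal Python sketch of a parser for the syntax above. The names `Note` and `parse_notes` are hypothetical. The line patterns are inferred from the examples on this page: a section header is a sentence number followed by ")", and a note is a letter, ")", the annotator's initials, an optional label (see Note categories), a colon and the text. Any other non-blank line, such as an indented reply, is assumed to continue the previous note.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Patterns inferred from the examples on this page (an assumption, not a spec).
SECTION_RE = re.compile(r"^(\d+)\)\s*$")  # e.g. "2)"
NOTE_RE = re.compile(r"^([a-z])\)\s+([A-Z]+)(?:\s+([A-Z]+))?:\s*(.*)$")  # e.g. "b) EFS: ..."


@dataclass
class Note:
    sentence: int         # sentence number from the section header
    letter: str           # a, b, c, ...
    initials: str         # annotator who wrote the note
    label: Optional[str]  # e.g. NOTE or DISCUSS; None for a plain change
    text: str


def parse_notes(text: str) -> List[Note]:
    """Parse the contents of a notes file into a flat list of notes."""
    notes: List[Note] = []
    sentence: Optional[int] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        section = SECTION_RE.match(line)
        if section:
            sentence = int(section.group(1))
            continue
        note = NOTE_RE.match(line)
        if note and sentence is not None:
            letter, initials, label, body = note.groups()
            notes.append(Note(sentence, letter, initials, label, body))
        elif notes:
            # Any other line (e.g. an indented reply) continues the previous note.
            notes[-1].text += "\n" + line
        else:
            raise ValueError(f"unexpected line before first note: {raw!r}")
    return notes
```

Keeping the notes in one flat list makes it easy to filter them later by sentence, annotator or label.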

Note categories

Every note is classified according to its nature. The types of notes are as follows:

  • (no label): a change to correct an error in the file. This is the default and the most common kind of note, so no label is needed.
  • NOTE: important information about the parse that does not reflect a change. The first annotator typically uses it to share information with the reviewers, in particular arguments for a parse that resulted from a difficult choice between two or more alternatives (in which case citing documentation may be a good idea). NOTE can also be used to indicate that the meaning of the sentence is unclear to the annotator.
  • DISCUSS: a request that something be discussed among the annotators. Use it sparingly, and try hard to reach a clear conclusion that results in a clear decision (change or keep the previous parse). DISCUSS should be used when there is an apparent inconsistency in the corpus or the documentation; its goal should be to increase consistency where needed.

Example:

1)
a) JW NOTE: made "að honum látnum" be an IP-SMC complement of P because it looks a lot like English examples with "with" (cf. url-to-docs)
b) AKI: changed lemma of "ekki" from "ekki" to "ekkert"
c) AKI DISCUSS: Treatment of NP-PRN is not consistent with NP-SBJ in "file-x.psd" sentence 4. We should decide between these two parses, correct one of the two places and document the decision.

2)
a) JW NOTE: I'm not sure what "jarteinir" means here, can this be something other than a noun?
            AKI: yes, this is a verb in this context! changed parse accordingly
b) AKI: added missing expletive subject to IP-MAT 
c) EFS: changed tag of "epli" from N to NS.
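For an overview of a notes file, the three categories can be distinguished by the word between the initials and the colon. The following minimal Python sketch assumes the layout shown in the example above. The helper names `note_category` and `count_categories` are hypothetical.

```python
import re
from collections import Counter
from typing import Optional

# Assumed layout of a note header: letter, ")", initials, optional label, ":".
NOTE_HEADER_RE = re.compile(r"^\s*[a-z]\)\s+[A-Z]+(?:\s+(NOTE|DISCUSS))?:")


def note_category(line: str) -> Optional[str]:
    """Return "change", "NOTE" or "DISCUSS" for a note header line, else None."""
    match = NOTE_HEADER_RE.match(line)
    if match is None:
        return None  # section number, reply line, or blank line
    # An unlabeled note is the default kind: a change correcting an error.
    return match.group(1) or "change"


def count_categories(notes_text: str) -> Counter:
    """Count how many notes of each category a notes file contains."""
    return Counter(
        category
        for category in map(note_category, notes_text.splitlines())
        if category is not None
    )
```

For the example above, this sketch would count three unlabeled changes, two NOTEs and one DISCUSS. The reply line under sentence 2 is not counted as a separate note.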

First annotator

  • If you don't understand the sentence properly, pick a plausible parse and make a NOTE in the notes file.
  • If you spend a lot of time making a decision (studying documentation, etc.), or if you believe the reviewer(s) need to know about an argument for the parse, make a NOTE and cite the documentation.
  • If you are unsure of the parse after spending some time, make a NOTE (like "AKI NOTE: Unsure of parse").
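Every note line follows the fixed format of the notes file. As an illustration only, here is a minimal Python sketch that builds such a line. The helper `format_note` is hypothetical and is not an existing tool.

```python
from typing import Optional

LABELS = (None, "NOTE", "DISCUSS")  # None = plain change, no label


def format_note(letter: str, initials: str, text: str, label: Optional[str] = None) -> str:
    """Build one note line in the notes-file format, e.g. 'a) AKI NOTE: Unsure of parse'."""
    if label not in LABELS:
        raise ValueError(f"unknown label {label!r}; use NOTE, DISCUSS or no label")
    # The label, if any, sits between the initials and the colon.
    head = initials if label is None else f"{initials} {label}"
    return f"{letter}) {head}: {text}"


print(format_note("a", "AKI", "Unsure of parse", label="NOTE"))
# a) AKI NOTE: Unsure of parse
```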

Review

  • Add notes to the existing collection of notes for this file, do not create a new file!
  • Do not use the review to flag potential ambiguities in a sentence for discussion. The previous annotator has already spent time making a decision. If you believe an ambiguity was resolved the wrong way, change the parse; otherwise, leave it unchanged. If you are unsure whether it should be changed, do not change it.
  • Some decisions are necessarily judgment calls. Do not spend time on those unless you disagree quite strongly with the previous parse. These include:
    • PP-attachment (which usually does not have serious effects on searching anyway)
  • At the end of a review, look over any DISCUSS points you made and see whether they can be eliminated by making a clear decision.
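For the final step, a reviewer may want a quick list of their own DISCUSS points. This is a minimal Python sketch that assumes the notes-file layout shown in the examples above. The name `open_discuss_points` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

SECTION_RE = re.compile(r"^(\d+)\)$")  # e.g. "1)"
DISCUSS_RE = re.compile(r"^([a-z])\)\s+([A-Z]+)\s+DISCUSS:\s*(.*)$")  # e.g. "c) AKI DISCUSS: ..."


def open_discuss_points(notes_text: str, initials: str) -> List[Tuple[Optional[int], str, str]]:
    """Return (sentence, letter, text) for every DISCUSS note written by `initials`."""
    points = []
    sentence: Optional[int] = None
    for raw in notes_text.splitlines():
        line = raw.strip()
        section = SECTION_RE.match(line)
        if section:
            sentence = int(section.group(1))
            continue
        note = DISCUSS_RE.match(line)
        if note and note.group(2) == initials:
            points.append((sentence, note.group(1), note.group(3)))
    return points
```

Each entry in the result is a point to revisit before finishing the review, to check whether a clear decision can replace it.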