Annotald

From Icelandic Parsed Historical Corpus (IcePaHC)

Annotald is a program that facilitates editing of phrase structure in labeled bracketing format. The program is tightly integrated with the Icelandic Parsed Historical Corpus (IcePaHC). The program, as well as this documentation, is a work in progress, and should be expected to be incomplete in various ways. Users are encouraged to save their files frequently and back them up to a repository to prevent loss of data.

Installing and running

In addition to the Annotald program files, which are included in the treedrawing directory of the IcePaHC git repository, the following third-party programs and libraries are required:

  • Python 2 (not yet compatible with Python 3)
  • CherryPy 3 (not compatible with CherryPy 2)
  • Google Chrome (not compatible with any other browser)

CherryPy is required to launch the Annotald web server, which mediates between the web-based Annotald user interface and the files on the hard drive. To install CherryPy 3 on Ubuntu, type in a terminal:

sudo apt-get install python-cherrypy3

To run Annotald from the IcePaHC parsing directory and open the file "filename.psd", type:

./annotald filename.psd

This is equivalent to:

python ../treedrawing/treedrawing.py filename.psd

When the program runs, it starts a web server on port 8080. You can then edit "filename.psd" by pointing Google Chrome to the following address:

http://localhost:8080

General philosophy

Annotald is designed to maximize the speed of phrase structure annotation. To make the most of the program, it is important to understand the philosophy of its design. Most tasks can be performed without ever taking the right hand off the mouse or the left hand off its normal position at the left side of the keyboard. Sometimes it is possible to enter a phrase label or tag using the full keyboard, but this is rarely a good idea, since it takes the annotator's hands out of the previously described position where all the shortcuts are accessible.

Selecting and unselecting nodes

Nodes are selected by left clicking them. The first node selected becomes the "startnode" and the second node selected becomes the "endnode" for the purposes of commands that deal with more than one node at a time. Only two nodes can be selected at any one time. Clicking a selected node unselects the node. When one of two selected nodes is unselected, the remaining selected node becomes the "startnode", whether it had that status before or not.
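
The selection rules above can be summarized in a small Python sketch. This is an illustration only, not Annotald's actual code: the class name Selection and its methods are hypothetical, and ignoring a click on a third node is an assumption, since the text only says that at most two nodes can be selected.

```python
class Selection(object):
    """Tracks at most two selected nodes: a startnode and an endnode."""

    def __init__(self):
        self.startnode = None
        self.endnode = None

    def click(self, node):
        """Apply a left click on `node` and update the selection."""
        if node == self.startnode:
            # Unselecting the startnode promotes the endnode (if any).
            self.startnode, self.endnode = self.endnode, None
        elif node == self.endnode:
            # Unselecting the endnode leaves the startnode as it was.
            self.endnode = None
        elif self.startnode is None:
            self.startnode = node
        elif self.endnode is None:
            self.endnode = node
        # Assumption: a click on a third node is ignored.

    def clear(self):
        """Clear the whole selection, as the space bar does."""
        self.startnode = None
        self.endnode = None
```

For example, after clicking "NP" and then "VP", clicking "NP" again leaves "VP" as the startnode and no endnode.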

Clear selection with space bar

The fastest way to clear the selection entirely, including both nodes if two are selected, is to press the space bar. Another way to unselect everything is to left click the highlighted area on either side of the editing area where the sentences are displayed. Left clicking the area to the left of the editing area is a fairly fast way to clear the selection.

Moving nodes around

Left click + Right click

A node that has been selected can be moved under another node by right clicking the target node. Clause level nodes are highlighted (using a different background color) to make it easier to identify the current "floor" of the sentence. Annotald will not allow the user to move nodes in a way that violates the linear order of the words in the sentence, or results in an otherwise ill-formed structure.

Select 2 sisters + Right click

Multiple nodes can be moved at once by making use of a "startnode" and an "endnode". Select two nodes that are sisters and right click a target to move the two sisters and all their sisters that are in between them. Annotald will again not allow movement that results in an ill-formed structure.
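
As a rough illustration of which nodes take part in such a move, the following Python sketch computes the span from a list of sisters. The function name sister_span and the representation of sisters as a plain list of labels are assumptions made for the example. Annotald's own well-formedness checks are not modeled.

```python
def sister_span(children, startnode, endnode):
    """Return startnode, endnode and every sister between them, in order.

    `children` is the ordered list of daughters of one parent node.
    A ValueError is raised if either node is not in that list, that is,
    if the two selected nodes are not sisters.
    """
    i = children.index(startnode)
    j = children.index(endnode)
    lo, hi = min(i, j), max(i, j)
    return children[lo:hi + 1]
```

The order in which the two nodes were selected does not matter in this sketch: selecting "PP" first and "VBDI" second yields the same span as the reverse.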

Splitting and merging tokens

Tokens are split and merged by moving nodes, just like any other move in Annotald. To split a node or nodes away from their current token, move them to the leftmost part of the editing area using the left click + right click method. To merge tokens, move the root node of one token inside an adjacent token in the same way.

Changing labels

The label of a selected node can be changed using shortcuts or by typing it in. The latter method should be avoided, because it is slower for a trained user and introduces typing mistakes as a potential source of errors.

Mouse shortcuts

Many of the most common label changes can be performed using mouse shortcuts. To bring up a context menu for a node, right click the node. When no node is selected (that is, no "startnode" or "endnode" has been set by left clicking), any node can be right clicked to bring up its menu. When one node is selected, that node, and only that node, has an active context menu. The context menu is not active at all when two nodes are selected.

Simple relabeling from context menu

A simple relabeling of the node is done by selecting an option from the "Label" section of the context menu. The "Label" section appears for all nodes. In most cases Annotald offers a set of labels related to the current label. When it has no definition of which labels are related to the current one, it falls back to showing verb tags.

Changing the case of a phrase or tag

When a right click is used to bring up the context menu for a phrase that has case, most commonly a noun phrase, a special "case" section appears in the context menu. Selecting case from the context menu of a phrase changes all the case extensions on tags that are immediately dominated by the phrase. This saves time when correcting case on multiple items, like when an NP has a quantifier, an adjective and a noun. The "case" section is also displayed in the context menu for tag nodes that have case, but only one tag can be corrected at a time using that feature.

Keyboard shortcuts

Changing a label using keyboard shortcuts can be very fast. When a node is selected (always the "startnode"), an annotator can pick a label for it using shortcuts on the left side of the keyboard. For this reason, the left hand should always rest on the left side of the keyboard while annotating. Each of the following keys on the left side of the keyboard represents a group of labels:

NOTE: The shortcuts might change a little bit as the program moves from beta to stable

  • E = CP-ADV, CP-CMP
  • R = CP-REL, CP-FRL, CP-CAR, CP-CLF
  • S = IP-SUB, IP-MAT, IP-IMP
  • V = IP-SMC, IP-INF, IP-INF-PRP
  • T = CP-THT, CP-THT-PRN, CP-DEG, CP-QUE
  • G = ADJP, ADJP-SPR, NP-MSR, QP
  • F = PP, ADVP, ADVP-TMP, ADVP-LOC, ADVP-DIR
  • 2 = NP, NP-PRN, NP-POS, NP-COM
  • Q = CONJP, ALSO, FP
  • W = NP-SBJ, NP-OB1, NP-OB2, NP-PRD

When one of the above shortcut keys is pressed, the label of the selected node (the startnode) is updated. If the previous label is not in the group of the key just pressed, the label is set to the first item in the list. If the previous label is in the group of the key, then pressing the key repeatedly rotates through the labels in the group. Therefore, an XP can be changed into NP-OB1 by pressing the W key twice.
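
The rotation rule can be sketched in a few lines of Python. The groups are copied from the list above. The names LABEL_GROUPS and next_label are hypothetical, and wrapping around to the first label after the last one is an assumption about what "rotates" means here.

```python
# Label groups as listed above, keyed by shortcut key.
LABEL_GROUPS = {
    "E": ["CP-ADV", "CP-CMP"],
    "R": ["CP-REL", "CP-FRL", "CP-CAR", "CP-CLF"],
    "S": ["IP-SUB", "IP-MAT", "IP-IMP"],
    "V": ["IP-SMC", "IP-INF", "IP-INF-PRP"],
    "T": ["CP-THT", "CP-THT-PRN", "CP-DEG", "CP-QUE"],
    "G": ["ADJP", "ADJP-SPR", "NP-MSR", "QP"],
    "F": ["PP", "ADVP", "ADVP-TMP", "ADVP-LOC", "ADVP-DIR"],
    "2": ["NP", "NP-PRN", "NP-POS", "NP-COM"],
    "Q": ["CONJP", "ALSO", "FP"],
    "W": ["NP-SBJ", "NP-OB1", "NP-OB2", "NP-PRD"],
}


def next_label(key, current):
    """Return the label a node gets when `key` is pressed."""
    group = LABEL_GROUPS[key]
    if current in group:
        # Already in this group: move to the next label (assumed to wrap).
        return group[(group.index(current) + 1) % len(group)]
    # Not in this group: start at the first label of the group.
    return group[0]
```

With this sketch, next_label("W", "XP") gives NP-SBJ and next_label("W", "NP-SBJ") gives NP-OB1, which matches the two key presses described above.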

L + Typing

Pressing the L key triggers the edit label mode. This key is on the right side of the keyboard because typing a label requires the rare move of the right hand away from the mouse anyway. When L is pressed, an edit box is displayed with the current label selected. To replace the label entirely, start typing and the previous label will disappear. To change only the last part of the label, for example to add or change an extension on the tag, press the space bar while in edit mode. This unselects the label and places the caret at the end of it. Press Enter to confirm the label that has been typed.

Creating nodes

X for XP

Pressing X creates an XP over the currently selected node or nodes. The XP remains selected for further editing. To create an NP-SBJ over a selected node (or over a span of nodes, by selecting a startnode and an endnode), press X and then W. To create an NP-OB1, press XWW. A node with a label that is not accessible from the keyboard shortcuts can be created by using XL, which opens edit mode for the newly created node.

Ctrl + Left Click

When a startnode has been selected, holding down Ctrl and left clicking a sister of the startnode creates an XP that spans the area from the startnode to the node that was clicked. This has the same effect as left clicking to add an endnode to the selection and then pressing X to create the node.

B and A for new leaves before and after

B and A are shortcuts for creating arbitrary leaves before or after the selected node. This can be used for empty categories like expletive subjects. The edit mode for the new leaf works the same as the edit mode for labels: typing replaces the content of the box, and the space bar moves the caret to the end for editing the last part. The Tab key is used to move between the two boxes. Enter confirms what has been typed.

B and A for traces before and after

When two nodes are selected, a startnode and an endnode, B and A work as in the previous section, but create a trace of the endnode before or after the startnode. The label of the endnode is copied to the newly created trace. If the endnode is a W-phrase of some sort, a *T* trace is created and the W is removed from the copied label that appears over the trace. If the endnode is not a W-phrase, an *ICH* trace is created. Pressing Enter after B/A confirms the initial suggestion, but editing is possible and works as in other uses of the B/A shortcuts.
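
The choice of the suggested trace can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Annotald's implementation: the function name suggest_trace is hypothetical, a W-phrase is assumed to be any label that starts with the letter W, and the literal strings *T* and *ICH* used for the trace leaves are assumptions about how the two kinds of traces are written. Indices are not handled.

```python
def suggest_trace(endnode_label):
    """Return (label, text) for the trace suggested for an endnode."""
    if endnode_label.startswith("W"):
        # Assumed W-phrase: drop the W from the copied label, use *T*.
        return (endnode_label[1:], "*T*")
    # Any other phrase: copy the label unchanged, use the other trace type.
    return (endnode_label, "*ICH*")
```

Under these assumptions, an endnode labeled WNP yields a suggested trace labeled NP with the text *T*, while an endnode labeled PP yields a trace labeled PP.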

Leaf before from context menu

Right clicking a node brings up the context menu. On the right side of the menu, the "Leaf before" section provides a fast way to add the most common empty categories. This is the preferred method for creating empty subjects and empty C and P heads in Annotald.


Delete/prune node (D)

Pressing the D key prunes a selected phrase node. D normally has no effect on tag nodes, since words should not be deleted. The exceptions are empty categories (which start with *), comments (in curly brackets) and special tags (which start with a less-than sign, <), all of which can be deleted using the D key.
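
The rule for when D has an effect can be written as a small predicate. This Python sketch is only an illustration: the function name can_delete and its arguments are hypothetical, and it assumes that all three exceptions are recognized from the text of the leaf rather than from its tag.

```python
def can_delete(is_phrase, text=""):
    """Return True if pressing D would remove the node.

    `is_phrase` is True for phrase nodes; `text` is the word or other
    content of a tag node (assumed to be where the markers appear).
    """
    if is_phrase:
        return True  # Phrase nodes can always be pruned.
    is_empty_category = text.startswith("*")
    is_comment = text.startswith("{") and text.endswith("}")
    is_special = text.startswith("<")
    return is_empty_category or is_comment or is_special
```

An ordinary word is therefore protected, while a leaf such as *T* or a comment in curly brackets can be removed.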

CoIndex (C)

Pressing C co-indexes two selected nodes. If they are already coindexed, repeatedly pressing C rotates through combinations of - (normal) and = (gapping) indices. The order for selected startnode/endnode is:

  • --
  • -= (gapping)
  • =- (backwards gapping)
  • == (coindex two instances of gapping)
  • remove all indices

Selecting a single indexed node and pressing C removes the index on that node.

If the indices become unordered during editing, their ordering is reset to a tidy, consistent one when the file is saved.
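
The rotation through index combinations for two selected nodes can be sketched in Python. The names INDEX_STATES and next_index_state are hypothetical, and the sketch assumes that pressing C on two nodes without indices starts the cycle again at "--".

```python
# States in the order listed above; None means "no indices".
# Each string gives the index type for startnode and endnode.
INDEX_STATES = ["--", "-=", "=-", "==", None]


def next_index_state(current):
    """Return the state reached by pressing C once from `current`."""
    position = INDEX_STATES.index(current)
    return INDEX_STATES[(position + 1) % len(INDEX_STATES)]
```

Starting from two nodes without indices, five presses of C pass through all four combinations and end with the indices removed again.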

Undo (Z)/redo (1)

Undo can be triggered using the Z key or a button to the right of the editing area.

Redo can be triggered using the 1 key or a button to the right of the editing area.

Save

Use the button in the menu to the right of the edit area. The last save is copied into a backup file in the same directory and the current state of the file is saved under the same name as the original file. This is different from CorpusDraw, which saves under ".new". The method used in Annotald is more appropriate for a project that uses a version control system, since the version control system keeps track of the various stages of the file. The user is of course responsible for committing changes frequently enough in case there is a problem with Annotald.
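
The save behavior described above follows a common pattern, sketched here in Python. This is not Annotald's code: the function name save_with_backup and the ".bak" suffix are assumptions, since the text does not say what the backup file is called.

```python
import os
import shutil


def save_with_backup(path, new_contents, backup_suffix=".bak"):
    """Copy the last saved version aside, then overwrite the original."""
    if os.path.exists(path):
        # Keep the previous save next to the original file.
        shutil.copyfile(path, path + backup_suffix)
    with open(path, "w") as handle:
        handle.write(new_contents)
```

Because the file keeps its original name, a version control system sees each save as a change to the same file, and only one generation of backup exists on disk at any time.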

Known issues

  • Splitting and joining words is not currently possible in Annotald. The current workaround is to save the file in Annotald, open it in a text editor, make the change there, and use refresh to load the updated version into Annotald. This will be fixed soon.
  • Annotald does not give feedback when the save operation is complete. Saving appears to always work, but be careful: save and commit changes frequently.