The development of the Icelandic Error Corpus at our lab has been going pretty well lately. Thanks to Xindan Xu’s data analysis magic, here is a nice graph of the annotation progress over the last few months. Move the mouse over the data points for details.

This general Icelandic error corpus is at version 0.9.4 at the time of writing, but new editions are pushed out regularly with corrections and additions. The latest version at any given time is available on GitHub under a CC-BY 4.0 license, free for anyone to tinker with and do something cool that we have not thought of yet.

The annotation scheme includes just over 200 error codes, and these are still evolving. This is all made possible by the great work of everyone at the Language and Technology Lab, by the Icelandic government’s Language Technology Programme, and by the government’s COVID-19 stimulus package. It would also not be possible without a strong commitment by the Icelandic Language Technology community to keep its infrastructure free and open source.

Importantly, our friends, collaborators, and industry partners at Miðeind are actively working on delivering really cool products that make use of this resource. At this point, a large part of the improvements to the corpus comes directly from the constant back and forth between our lab and Miðeind. It is a really wonderful dynamic, and it feels good to have the research flow directly into exciting and innovative software products outside academia.

Lots of new updates are to be expected; stay tuned for more!