Anton Karl Ingason

Workshop on Formal Ways of Analyzing Variation (FWAV)

We are planning a workshop on Formal Ways of Analyzing Variation (FWAV), which will be part of the 25th SCL (Scandinavian Conference of Linguistics) in Reykjavik, May 13-15, 2013.

We invite abstracts for 20-minute papers (plus 10 minutes for questions). Abstract submission for SCL workshops uses the same procedure as for the general conference, so please refer to the general call for papers for guidelines (and indicate that your abstract is for the FWAV workshop): http://conference.hi.is/scl25/call-for-papers/

Deadline: November 15, 2012
Notification of acceptance: December 1, 2012

Anton Karl Ingason
Einar Freyr Sigurdsson
Charles Yang

Formal Ways of Analyzing Variation (FWAV)

Labov’s pioneering study of contraction and deletion of the copula in African American Vernacular English (1969), and subsequent work on linguistic variation and change, have drawn substantial attention to the relationship between formal analysis and quantitative usage patterns. Robust quantitative regularities have been studied in synchronic as well as diachronic corpus data within a variety of theoretical frameworks. Recent evidence shows that discrete acceptability judgments in syntax, collected from a large sample of speakers, also exhibit regular quantitative patterns (Thráinsson 2012).

This themed session is a venue for case studies on formal analyses of variation and their implications for grammatical theory, acquisition, and change. A specific focus will be on methodologies that provide ready access to data and development tools, so as to facilitate replication and extension of research results.

What do formal analyses of variation predict to be possible and impossible?

The session aims to investigate the empirical content of analyses of speaker variation. Representative research questions include, but are not limited to:
  • What are the limits of variation?
  • Do our analyses provide unifying accounts for apparently disparate clusters of linguistic properties?
  • How does the child analyze a heterogeneous pool of primary linguistic data?
  • What types of diachronic trajectories are consequences of language acquisition under variation?
  • Is the statistical distribution of variation constrained by grammatical factors?
  • How do we make the best use of statistical tools for formal linguistic analysis?
  • On a more practical note, the session hopes to contribute to the practice of replicability, data access, and collaborative development.
What does the variation attach to?

We also ask about the relationship between the linguistic machinery and the mechanisms responsible for how speakers alternate between functionally equivalent variants. One line of research adopts the design of Chomskyan structure building while proposing independent mechanisms for the acquisition of probabilities (Labov 1969, Kroch 1989, Yang 2002). A constraint-based parallel is found in Stochastic OT (Boersma & Hayes 2001). Other proposals suggest that frequency distributions in language use are tightly interwoven with the grammar itself. Guy (1991) argued that repeated rule application in Lexical Phonology was responsible for an exponential decay in final -t/-d production in English. Anttila (1997) and Adger (2006) have proposed analyses in which usage probabilities reflect the number of equally likely paths through the grammar that lead to a particular output. Coetzee (2004) suggested that the comparison-based nature of OT imposes an ordering on the frequency of variants. How can we compare and contrast such a multitude of formal proposals?
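
To make the counting idea concrete, here is a minimal Python sketch of a partially ordered constraint grammar, in the spirit of the path-counting proposals mentioned above but not presented as any cited author's own formalization. Every total ranking consistent with a set of fixed ranking pairs is treated as equally likely. Each ranking selects an OT winner, and a variant's predicted frequency is the share of rankings it wins. The constraints C1 to C3, the two candidates and their violation counts are hypothetical toy values, not an analysis from the cited work.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Hypothetical toy grammar (not taken from any cited analysis):
# three constraints and two output variants with their violation counts.
CONSTRAINTS = ["C1", "C2", "C3"]
CANDIDATES = {
    "variant_a": {"C1": 0, "C2": 1, "C3": 0},
    "variant_b": {"C1": 0, "C2": 0, "C3": 1},
}
# Fixed ranking pairs (higher, lower) that every total ranking must respect.
FIXED = [("C1", "C2")]


def respects(order, fixed):
    """True if the total ranking `order` keeps every fixed pair in place."""
    rank = {c: i for i, c in enumerate(order)}
    return all(rank[hi] < rank[lo] for hi, lo in fixed)


def winners(order, candidates):
    """Standard OT evaluation: compare violation profiles lexicographically
    in ranking order and return every candidate with the best profile."""
    profiles = {cand: tuple(viol[c] for c in order) for cand, viol in candidates.items()}
    best = min(profiles.values())
    return [cand for cand, prof in profiles.items() if prof == best]


def predicted_frequencies(constraints, fixed, candidates):
    """Treat each admissible total ranking as equally likely. A variant's
    predicted frequency is its share of rankings won (ties split evenly)."""
    orders = [o for o in permutations(constraints) if respects(o, fixed)]
    share = Counter()
    for order in orders:
        best = winners(order, candidates)
        for cand in best:
            share[cand] += Fraction(1, len(best) * len(orders))
    return {cand: share[cand] for cand in candidates}


freqs = predicted_frequencies(CONSTRAINTS, FIXED, CANDIDATES)
print({cand: str(p) for cand, p in freqs.items()})
# prints {'variant_a': '2/3', 'variant_b': '1/3'}
```

With `FIXED` empty, all six rankings are admissible and each variant comes out at 1/2. Adding the single fixed pair C1 >> C2 removes three rankings and shifts the prediction to 2/3 vs. 1/3. In this kind of grammar, frequency follows from the number of paths that lead to an output.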

It may not be the case that all instances of variable usage are of the same nature. Even if we assume that acquired probabilities are part of a speaker’s knowledge of language, the variation may still be due to other, non-linguistic factors. Furthermore, different domains of language may be subject to different constraints on variation. It has been suggested that syntax, unlike phonology, is less sensitive to social evaluation (Labov & Harris 1986), but formulating this effect concretely is a nuanced task (Ingason et al. 2012). The role of interfaces is also important, since variables in syntax can be affected by constraints that operate across an interface, for example prosodic constraints on variation in other domains (Labov 1969, Anttila et al. 2010). Representative questions include:
  • Where does the variation come from and how can we distinguish the formal models empirically?
  • How do we know which type of mechanism is responsible for which part of language usage?
  • How does a formal analysis of variation handle different domains of language and the interfaces between them?
References

Adger, David. 2006. Journal of Linguistics 42:503–530.
Anttila, Arto. 1997. Deriving variation from grammar. In Variation, change, and phonological theory, ed. Frans Hinskens, Roeland van Hout, and W. Leo Wetzels, 35–68. Amsterdam: John Benjamins.
Anttila, Arto, Matthew Adams, and Michael Speriosu. 2010. The role of prosody in the English dative alternation. Language and Cognitive Processes 25(7–9):946–981.
Boersma, Paul, and Bruce Hayes. 2001. Empirical tests of the gradual learning algorithm. Linguistic Inquiry 32:45–86. Available on the Rutgers Optimality Archive, http://ruccs.rutgers.edu/roa.html.
Coetzee, Andries. 2004. What it means to be a loser: Non-optimal candidates in Optimality Theory. Ph.D. dissertation, UMass Amherst.
Fowler, Joy. 1986. The social stratification of (r) in New York City Department Stores, 24 years after Labov. NYU term paper.
Guy, G. R. 1991. Explanation in variable phonology. Language Variation and Change 3(1):1–22.
Ingason, Anton Karl, Einar Freyr Sigurðsson, and Joel C. Wallenberg. 2012. Antisocial Syntax: Disentangling the Icelandic VO/OV parameter and its lexical remains. Paper presented at DiGS 14, Lisbon, 6 July 2012.
Kroch, Anthony S. 1989. Reflexes of grammar in patterns of language change. Language Variation and Change 1:199–244.
Labov, William. 1966. The social stratification of English in New York City. Washington, DC: Center for Applied Linguistics.
Labov, William. 1969. Contraction, Deletion and Inherent Variability of the English Copula. Language 45(4):715–762.
Labov, William, and Wendell A. Harris. 1986. De facto segregation of black and white vernaculars. In Diversity and Diachrony, ed. D. Sankoff, 1–24. Philadelphia: John Benjamins.
MacDonald, Jeff. 1984. The social stratification of (r) in New York City department stores revisited. Paper written for Anthropology 150, Anthropological Linguistics, for Nancy Bonvillain.
Thráinsson, Höskuldur. 2012. Ideal speakers and other speakers: The case of dative and other cases. In Variation in Datives: A Micro-Comparative Perspective. Oxford Studies in Comparative Syntax. Oxford: Oxford University Press.
Yang, Charles. 2002. Knowledge and Learning in Natural Language. Oxford: Oxford University Press.

Last Updated on Sunday, 14 October 2012 16:46
 

IcePaHC 0.9: 1 million words of syntactically parsed (hand-corrected) Icelandic

We are very pleased to announce that version 0.9 of the Icelandic Parsed Historical Corpus (IcePaHC) is now available for free download.

The corpus can be downloaded from:
www.linguist.is/icelandic_treebank/Download

The corpus is a treebank of over 1 million words, annotated with a full phrase structure parse and hand-corrected, using an adaptation of the annotation scheme used by the Penn Treebank and the Penn parsed corpora of historical English (http://www.ling.upenn.edu/hist-corpora/). Note that this release contains all of the text for version 1.0, but some minor corrections remain to be made.

The corpus contains:

- 1 002 361 words total, consisting of ~100 000-word samples from each century from the 12th to the beginning of the 21st century.
- Annotated with a phrase structure parse, part-of-speech-tagged, and lemmatized.
- The entire parse, pos-tagging, and lemmata for every sentence have been *hand-corrected*.
- Text samples are balanced for genre within each century.
- LGPL license: You are free to copy, modify and redistribute the corpus for research and/or profit with appropriate citation.

The corpus is distributed as raw UTF-8 data in labeled bracketing format and it is therefore compatible with various existing programs, including CorpusSearch (http://corpussearch.sourceforge.net/).
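
Because labeled bracketing is a plain nested-parenthesis format, it is also easy to read from a short script. The following is a minimal Python sketch, assuming well-formed input with balanced parentheses. It turns each tree into nested `(label, children)` tuples and counts node labels. The helper names and the two-sentence sample are hypothetical illustrations in the general Penn style, not actual corpus data. Real IcePaHC files may follow conventions this sketch ignores, so treat it as illustrative only.

```python
import re
from collections import Counter

# Tokens are "(", ")" or any run of characters that are neither
# whitespace nor parentheses (labels, words, punctuation, IDs).
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_trees(text):
    """Parse a string of labeled-bracketing trees into nested
    (label, children) tuples; leaves are plain strings.
    Assumes well-formed input with balanced parentheses."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
        pos += 1
        label = ""  # the outer wrapper "( (IP-MAT ...) (ID ...))" has no label
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    trees = []
    while pos < len(tokens):
        trees.append(read_node())
    return trees


def node_labels(tree):
    """Yield every non-empty node label in a parsed tree, depth-first."""
    label, children = tree
    if label:
        yield label
    for child in children:
        if isinstance(child, tuple):
            yield from node_labels(child)


# Hypothetical two-sentence sample in Penn-style bracketing; not corpus data.
SAMPLE = """
( (IP-MAT (NP-SBJ (PRO-N hann)) (VBDI kom) (. .)) (ID EXAMPLE,1))
( (IP-MAT (NP-SBJ (PRO-N hún)) (VBDI fór) (. .)) (ID EXAMPLE,2))
"""

trees = parse_trees(SAMPLE)
counts = Counter(label for tree in trees for label in node_labels(tree))
print(len(trees), counts["NP-SBJ"], counts["VBDI"])  # prints 2 2 2
```

To run it on a real file, read the file with `encoding="utf-8"` and pass its contents to `parse_trees`. For actual syntactic queries, CorpusSearch remains the more complete option.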

A plain text version without markup and a set of info files containing philological information accompany the corpus download.

The entire corpus can be downloaded as a plain text version, with a platform-independent GUI, or with a Windows-compatible GUI for ease of searching.

Further information on the annotation guidelines and project organization can be found on the project wiki:
www.linguist.is/icelandic_treebank/


Joel C. Wallenberg
Anton Karl Ingason
Einar Freyr Sigurðsson
Eiríkur Rögnvaldsson
University of Iceland

We are grateful to have received support for this project through the following grants:

Icelandic Research Fund (RANNÍS), grant no. 090662011, "Viable Language Technology beyond English – Icelandic as a test case".

U.S. National Science Foundation (NSF) International Research Fellowship Program (IRFP), grant #OISE-0853114, "Evolution of Language Systems: a comparative study of grammatical change in Icelandic and English".

University of Iceland Research Fund (Rannsóknasjóður Háskóla Íslands), grant "Icelandic Diachronic Treebank" (Sögulegur íslenskur trjábanki).

Last Updated on Monday, 29 August 2011 14:04
 

Available: IcePaHC 0.4 (now includes a visual Windows version)

IcePaHC 0.4, the latest version of the Icelandic Parsed Historical Corpus, is now available for download:

http://linguist.is/icelandic_treebank/Download

- 440,000 words total from every century from the 12th to the 19th, inclusive, annotated for phrase structure, part-of-speech-tagged, and lemmatized
- An optional easy-to-install visual user interface for Windows
- LGPL license: You are free to copy, modify and redistribute the corpus for research and/or profit

Joel C. Wallenberg
Anton Karl Ingason
Einar Freyr Sigurðsson
Eiríkur Rögnvaldsson
University of Iceland

The project is funded by the following grants:

Icelandic Research Fund (RANNÍS), grant no. 090662011, "Viable Language Technology beyond English – Icelandic as a test case".

U.S. National Science Foundation (NSF) International Research Fellowship Program (IRFP), grant #OISE-0853114, "Evolution of Language Systems: a comparative study of grammatical change in Icelandic and English".

--------------------------------

IcePaHC 0.4, the Icelandic treebank (now with a Windows version)


IcePaHC 0.4, the latest version of the Icelandic treebank, has been released:

http://linguist.is/icelandic_treebank/Download

- A total of 440,000 words from every century from the 12th through the 19th, syntactically parsed, part-of-speech-tagged, and lemmatized
- Simple Windows installation of a graphical user interface
- LGPL license: Users may copy, modify, and redistribute the corpus for research and/or profit

Joel C. Wallenberg
Anton Karl Ingason
Einar Freyr Sigurðsson
Eiríkur Rögnvaldsson

The project is funded by:

RANNÍS, grant no. 090662011, "Hagkvæm máltækni utan ensku – íslenska tilraunin" ("Viable Language Technology beyond English – Icelandic as a test case").

U.S. National Science Foundation (NSF) International Research Fellowship Program (IRFP), grant #OISE-0853114, "Evolution of Language Systems: a comparative study of grammatical change in Icelandic and English".
Last Updated on Tuesday, 12 April 2011 15:45
 

Fix accent problem in TexMaker on Ubuntu

To fix the accent problem in TexMaker on Ubuntu, where accents stop combining with the following character and are instead written before it (so you get 'a instead of á when typing Icelandic, Spanish, etc.), install the ibus-qt4 package:

sudo apt-get install ibus-qt4

Fedora equivalent:

yum install ibus-qt

(Source thread)

An easy fix for a very annoying and unpredictable problem. I have no idea why this only happens occasionally when that package is missing, but according to online sources the bug also affects some other Linux distributions, even with the latest version of TexMaker.
Last Updated on Wednesday, 24 August 2011 09:36
 

Available: Icelandic Parsed Historical Corpus (IcePaHC), V0.2

We are pleased to announce that version 0.2 of the Icelandic Parsed Historical Corpus (IcePaHC) is now available for free download.

The corpus is syntactically parsed, annotated for full phrase structure using an adaptation of the annotation scheme used by the Penn parsed corpora of historical English and other corpora in that tradition (see links from the website). The corpus contains ca. 120,000 words from six different centuries (12th, 13th, 16th, 17th, 18th, and 19th). Please note that this is only a small portion of the completed corpus, which is planned to contain ca. 1 million words from the 12th to the 19th centuries.

The corpus is distributed as raw UTF-8 data in labeled bracketing format and it is therefore compatible with various existing programs, including CorpusSearch.

The corpus can be downloaded from:
www.linguist.is/icelandic_treebank/Download

Further information on the annotation guidelines and project organization can be found on the project wiki:
www.linguist.is/icelandic_treebank/

We hope that this release will result in feedback that allows us to improve the resource in upcoming versions. Updates are released every three months; the upcoming version 0.3 will be released on January 1, 2011. Between releases, development can be tracked in our open repository on GitHub (http://github.com/antonkarl/icecorpus), but we encourage the use of released versions to ensure that results can be replicated.

Texts included in Version 0.2:
4585 words from The First Grammatical Treatise (entire text) (12th century)
8179 words from Íslensk hómilíubók (Icelandic book of homilies) (12th century)
3459 words from Egils saga (theta fragment) (13th century)
22719 words from Sturlunga saga (13th century)
20683 words from the New Testament's Gospel of John (1540)
16421 words from the New Testament's Acts (1540)
4521 words from Jón Indíafari's travelogue (1661)
22097 words from Jón Steingrímsson's biography (1791)
17837 words from Piltur og stúlka (novel by Jón Thoroddsen) (1850)
Total number of words: 120355

Joel Wallenberg
Anton Karl Ingason
Einar Freyr Sigurðsson
Eiríkur Rögnvaldsson
University of Iceland

The project is funded by the following grants:

Icelandic Research Fund (RANNÍS), grant no. 090662011, "Viable Language Technology beyond English – Icelandic as a test case".

U.S. National Science Foundation (NSF) International Research Fellowship Program (IRFP), grant #OISE-0853114, "Evolution of Language Systems: a comparative study of grammatical change in Icelandic and English".
Last Updated on Friday, 01 October 2010 17:53
 