Description
Language complexity has been a hotly debated topic over the past decade and still engages researchers from diverse areas of linguistics and beyond. This workshop aims to bring together complexity researchers from typology, second language acquisition research, psycholinguistics, computational linguistics, language evolution and other related fields. The goal is to evaluate and compare different measures of language complexity by means of a shared task.
The workshop will be organised into traditional keynote talks and presentations of the shared task. We will conclude the workshop with an interactive session in which the different measures will be evaluated. While the workshop is open for discussion of all topics relevant to research on language complexity, we will specifically focus on the following practical and theoretical issues:
- How do different complexity metrics correlate across parallel and non-parallel corpora, and other types of data?
- How well do different complexity metrics deal with different language types, i.e. are some language types/families easier or more consistently measurable than others?
- How well do measures within each domain correlate? Is it true that morphological complexity measures show better agreement than syntactic complexity measures (cf. Berdicevskis et al. 2018. Using Universal Dependencies in cross-linguistic complexity research. In: Proceedings of the Universal Dependencies Workshop 2018, EMNLP, Brussels, Belgium. Association for Computational Linguistics.)?
- How robust are trade-offs, such as between morphology and syntax, across different measures and corpora?
- How do corpus-based complexity metrics correlate with the feature-based complexity information available in The World Atlas of Language Structures (WALS)?
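As an illustration of the kind of comparison these questions call for, the following is a minimal sketch that computes a Spearman rank correlation between two complexity measures over the languages they share. It is not the evaluation procedure of the shared task; the language labels, the scores and the function names (`ranks`, `pearson`, `spearman`) are hypothetical and chosen for illustration only.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(scores_a, scores_b):
    """Spearman correlation over the languages present in both score tables."""
    shared = sorted(set(scores_a) & set(scores_b))
    x = [scores_a[lang] for lang in shared]
    y = [scores_b[lang] for lang in shared]
    return pearson(ranks(x), ranks(y))


# Hypothetical scores: one value per language for each of two measures.
measure_1 = {"lang_a": 0.42, "lang_b": 0.57, "lang_c": 0.31, "lang_d": 0.66}
measure_2 = {"lang_a": 1.8, "lang_b": 2.4, "lang_c": 1.9, "lang_d": 3.0, "lang_e": 2.2}

rho = spearman(measure_1, measure_2)
print(round(rho, 2))  # 0.8
```

Restricting the comparison to shared languages mirrors the focus on languages common to both data sources; a rank-based coefficient is used here only because the two hypothetical measures are on different scales.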
Participation requirements
After acceptance, participants are required to apply their own complexity (or complexity-related) measure(s) to either (A) a parallel text database, or (B) a non-parallel annotated text database. Track (A) covers a typologically diverse set of 30 languages selected from The World Atlas of Language Structures (WALS) database (https://wals.info/), and will consist of a parallel text corpus of these languages (e.g. Bible corpora such as http://homepages.inf.ed.ac.uk/s0787820/bible/). Track (B) will consist of an annotated non-parallel corpus, specifically the Universal Dependencies corpora (http://universaldependencies.org/). The measures will be compared and evaluated, with special focus on the languages common to both tracks (A) and (B). Prior to the workshop, participants will be required to submit their results, calculations, and necessary scripts (if applicable), including detailed descriptions of their methodologies and explanations of their results. These will serve as the basis for the statistical and theoretical evaluation in the interactive session.
Do not hesitate to contact us if you are unsure about the suitability of your measure for the proposed task.
Submission
For both tracks, please submit preliminary anonymised abstracts of 500-600 words. The abstracts should give a short overview of the theoretical background of the measure and detailed explanations of (1) which level of language (e.g. morphology, syntax) is addressed, (2) what exactly is measured (e.g. irregularity, transparency, lexical diversity), and (3) how the measure is operationalised/calculated. Additionally, indicate whether you will participate in track (A), for text-based measures, or track (B), for annotation-based measures.
Key dates
Preliminary abstract submission: 31 December 2018
Notification of acceptance: 15 February 2019
Publication of datasets: 15 February 2019
Shared task submission: 30 June 2019
Registration: 15 May 2019
Workshop: 12-13 September 2019
Invited speakers
Michael Cysouw
María Dolores Jiménez-López