Description

An influential line of thinking within evolutionary linguistics holds that languages change in response to socioecological pressures, i.e. that they adapt to their environmental niches. Language complexity is a parameter commonly used to test for such adaptation. It is, however, notoriously difficult to define and measure, and virtually every study of complexity uses its own operationalization and measure. On the one hand, this diversity benefits the field, since an intricate phenomenon is studied from different angles. On the other hand, it hampers comparison across studies. This is particularly problematic when different measures lead to different conclusions, since there is currently little consensus on how the measures themselves can be evaluated and compared.

To address this, we are organizing a shared task on linguistic complexity (a format widely used in computational linguistics): measure and compare the complexity of 37 language varieties from 7 families, listed in the Data section. Participants are free to choose whether they measure a single facet of complexity (e.g. phoneme/grapheme inventory, morphology, word order) or try to develop an overall complexity measure. The measure may be based on any conceivable metric. Submissions must, however, clearly state: 1) what exactly is being measured (e.g. overspecification, lexical diversity, irregularity, verbosity, opacity); 2) how the measure is calculated, and the theoretical rationale behind the method; 3) the resulting value for each language.

To make the different measures comparable, we ask participants who apply corpus-based measures to use the corpora of the Universal Dependencies (UD) project, version 2.1. The subsample to be used for this workshop can be downloaded from the Data section. Participants are free to decide which level of annotation to use, and plain-text files are also available for those who do not need any annotation. Participants whose measures do not rely on corpora are exempt from this requirement. All participants must also submit the relevant calculations and scripts as supplementary materials.
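Purely as an illustration of what a simple corpus-based measure on the UD data might look like, here is a minimal sketch of one facet mentioned above, lexical diversity, computed as a type-token ratio from a file in the CoNLL-U format used by UD. The function names, the lowercasing, and the 1000-token window are our own assumptions and not requirements of the task. Since the raw type-token ratio depends on text length, the sketch computes it over a fixed number of tokens so that corpora of different sizes can be compared.

```python
import sys
from typing import Iterable, List


def read_conllu_forms(lines: Iterable[str]) -> List[str]:
    """Collect word forms (2nd column) from CoNLL-U lines.

    Skips comment lines (starting with '#'), blank sentence separators,
    multiword-token ranges (IDs like '1-2') and empty nodes (IDs like '8.1').
    """
    forms = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue
        forms.append(cols[1])
    return forms


def type_token_ratio(forms: List[str], window: int = 1000) -> float:
    """Lowercased type-token ratio over the first `window` tokens.

    Hypothetical choice: a fixed window so that corpora of different
    sizes are compared on equally long samples.
    """
    sample = [f.lower() for f in forms[:window]]
    if not sample:
        raise ValueError("no tokens found")
    return len(set(sample)) / len(sample)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ttr.py some-ud-file.conllu  (file name is a placeholder)
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(f"{type_token_ratio(read_conllu_forms(fh)):.4f}")
```

Whether to count word forms or lemmas (3rd CoNLL-U column), and how large a window to use, are exactly the kinds of choices a submission would need to state and justify.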

The presentation format will depend on the number and quality of submissions. We will allot a slot for a detailed discussion of the proposed measures: their strengths and drawbacks, their comparability and relation to each other, and the prospects for developing uniform approaches to measuring language complexity and its evolution. We are considering writing a joint paper based on the conclusions of the workshop.

Submission

Papers may be at most 4 pages long, excluding references; please use the Evolang templates (http://evolang.org/submissions). You must submit three files:

1) a pdf of the paper (anonymized according to the Evolang rules);
2) a .csv file with your results (use the template provided with the dataset);
3) a single archive with your supplementary materials (SM). You may decide what to include in the SM; the main requirement is that we must be able to reproduce your results from the SM without further explanation. The SM should also be referenced in the main text.


We aim to publish electronic proceedings of the workshop.