Phase 2

Phase 2.1. Online Course Creation.

We have identified seven core subjects in Digital Linguistics. Below is a list of the corresponding courses and detailed descriptions for each one.

Course title Mining and Managing Multilingual Terminology
Course instructor Špela Vintar
Course description Systematic terminology management is essential for efficient communication of specialised content, whether within the language industry, in business or in institutional environments. This course provides students with an understanding of specialised discourse and an inventory of theoretical foundations, methods and tools for terminology management in multilingual contexts. A core component of the course is developing the skills needed to compile domain-specific text collections and to apply computational methods to extract specialised knowledge from such collections.
Course objectives
  • To recognise terms and their properties in different communicative settings
  • To understand the needs of various users of terminology
  • To plan, design and populate a termbase in any terminology management tool of choice
  • To compile a specialised corpus of texts to be used for terminology mining
  • To use a corpus workbench to extract single- and multi-word terms, definitions and other relevant information, in one or several languages
  • To validate, evaluate and implement the results of terminology mining
Topics covered
  • Basic concepts in terminology
  • Concepts, definitions and relations
  • Terms across specialised domains, genres, languages
  • Terminology management: structuring an entry, available tools
  • Building a corpus for term mining
  • Approaches to term mining
  • Finding keywords and terms with a built-in term grammar
  • Extracting terms and definitions with advanced CQL queries
  • Multilingual term mining
  • Validation and evaluation of results
Level Intermediate
Modality Interactive presentations, video & screen recordings, exercises, guided tasks and assignments, reading & self-guided research
Time commitment 120 hrs
Course title Localization Tools and Workflows
Course instructor Caroline Reiss
Course description Today’s localization industry is faced with the task of translating a huge volume of texts to produce high-quality localized products in a short turnaround time, to satisfy the needs of global and local marketplaces. Such an undertaking would be inconceivable without the use of technology and, in recent decades, the development of localization tools has been instrumental and has given rise to changes in collaborative workflows. This course aims to equip learners with a critical understanding of the use of these tools, in particular computer-assisted translation (CAT) tools.
Course objectives
  • To describe the different types of computer tools available to support the localization process;
  • To employ the basic functionalities of computer-assisted translation tools;
  • To combine the use of translation tools with other computer tools and resources to process language data;
  • To assess the current limitations of these tools;
  • To demonstrate an awareness of the level of planning and management involved in the localization process.
Topics covered
  • The historical development of the localisation industry
  • The content production and localisation cycle
  • Controlled language
  • Translation technologies
  • Basic concepts in translation memory software
  • Localisation project lifecycle and management
Level Introductory
Modality Recorded presentations, interactive activities, quizzes, guided exploration and a project-based approach to learning about language technologies
Time commitment 100 hrs
Course title Computational lexicology and lexicography
Course instructor Nives Mikelić Preradović
Course description Computational lexicology may be defined as the application of computers to the study of the lexicon. Taken in its broadest sense, it is a multidisciplinary field involving the analysis of man-made dictionaries using computers to study their machine-readable text as well as a study of the computational linguistic content and organization of lexicons for use by natural-language processing applications. This course provides theoretical and practical information regarding current processes for building dictionaries and lexical databases used by natural-language processing applications. The topic is covered from the point of view of a computational lexicographer preparing dictionaries with the use of natural-language processing. Technical issues of dictionary building are also covered. In the project, students will explore dictionary entries in different computational lexicons that were built using the described tools, data and processes.
Course objectives
  • To understand the content and limitations of print dictionaries for computational purposes
  • To critically compare the design, structure and content of various kinds of monolingual and bilingual subcategorization (valency) lexicons
  • To explain the theoretical aspects and most important methods of building subcategorization lexicons
  • To construct the valency entry in a bilingual valency lexicon
  • To compare the design and content of various kinds of sentiment lexicons
  • To plan a small-scale lexicographic project and implement it by applying the techniques discussed in class
Topics covered
  • Introduction to Computational lexicology and lexicography
  • Electronic lexicography, computational and corpus lexicography
  • Morphological lexicons
  • Derivational and inflectional morphological lexicons of different European languages
  • Lexical relations and lexical databases
  • Wordnets for different EU languages
  • Subcategorization (valency) lexicons
  • Semantic lexicons
  • Sentiment lexicons
  • Formats, standards and automatic acquisition of computational lexicons
Level Intermediate
Modality Interactive presentations, video & screen recordings, exercises, knowledge quizzes, guided research tasks and assignments, directed readings
Time commitment 120 hrs
Course title Introduction to Python for Linguists
Course instructor Petra Bago
Course description Python is widely regarded as one of the most accessible programming languages for beginners. This course provides students with an understanding of elementary programming concepts, focusing on the knowledge and skills necessary for text processing. It is aimed at students of linguistics and other disciplines who have no prior programming experience and are interested in learning Python in order to process large volumes of text.
Course objectives
  • To understand and implement Python syntax and semantics.
  • To identify, describe, and implement variables, operators and functions in Python.
  • To identify, describe and implement integers, floating-point numbers, strings, lists, files and dictionaries in Python.
  • To identify, describe and implement control flow in Python.
  • To identify, describe and implement regular expressions in Python.
Topics covered
  • Introduction to Python.
  • Basic data types.
  • Variables.
  • Basic operators.
  • Basic functions.
  • Working with strings.
  • Working with lists.
  • Control flow.
  • Working with files.
  • Basic regular expressions.
  • Working with dictionaries.
Level Introductory
Modality Interactive presentations, video & screen recordings, exercises, knowledge quizzes, readings, and assignments with instructor’s feedback for student submissions.
Time commitment 120 hrs
Course title Post-Editing Machine Translation
Course instructors Jean Nitzke, Anke Tardel
Course description This course gives an introduction to post-editing (the correction of raw machine-translated output by a human translator). It covers both the theoretical background and practical exercises.
Course objectives
  • To learn how machine translation works
  • To understand what post-editing is and how it can be executed
  • To distinguish between human translation, post-editing, and proofreading
  • To evaluate the factors influencing the post-editing process
  • To learn about post-editing in practice and research
Topics covered
  • MT history
  • MT approaches
  • General PE
  • Different PE styles
  • PE and Translation Memory Systems
  • PE and controlled languages
  • PE in research
  • PE in practice
Level Introductory to intermediate
Modality Interactive presentations, audio & screen recordings, exercises, knowledge quizzes, readings
Time commitment 120 hrs
Course title Introduction to Text Processing and Analysis
Course instructors Lucie Chlumská, Pavel Vondřička
Course description The main objective of the course is to provide beginner-level digital linguistics students with all the necessary information on text processing and analysis. Starting with basic topics, such as characteristics of a plain text format and the difference between data and metadata, the course goes on to explain the specifics of XML and different types of text annotation, to introduce the process of tokenization, segmentation and morphological analysis, to describe the limits and possibilities of syntactic and semantic tagging and, finally, to summarize the principles of CQL and corpus querying, including the use of regular expressions and querying parallel corpora.
Course objectives
  • To understand how computers work with textual data;
  • To distinguish between different data formats and extract textual content from them (e.g. using OCR);
  • To understand the specifics of plain text and XML formats;
  • To understand the principles and issues of text annotation, incl. morphological analysis, syntactic and semantic tagging;
  • To learn about available resources for text processing and analysis, including taggers and concordancers;
  • To be able to analyse existing corpora as well as their own in a variety of available corpus-based tools;
  • To build complex CQL queries, including regular expressions and logical operators.
Topics covered
  • File formats related to textual data
  • Plain text: Encoding, data and metadata
  • Extensible Markup Language or XML
  • Regular expressions
  • Tokenization and corpus-data formats
  • Morphological analysis: principles and tools
  • Syntactic and semantic annotation
  • Corpus exploration and analysis
  • Querying corpus data with CQL
  • Text alignment and parallel corpora
Level Introductory
Modality Interactive presentations, audio & screen recordings, exercises, knowledge quizzes, readings
Time commitment 100 hrs
Course title Variability of languages in time and space
Course instructors Anna Nedoluzhko, Magda Ševčíková, Šárka Zikánová
Course description The course provides students with basic information about the diversity of natural languages around the globe and the main dimensions along which they differ.
Course objectives
  • To understand basic linguistic notions needed for analyzing language variance on different levels of language description
  • To distinguish language types according to different typological criteria such as morphology and word order
  • To identify main types of writing systems
  • To understand the main principles of phonological, morphological and syntactic language changes from the diachronic point of view
  • To analyze implications of language diversity for contemporary language technology
Topics covered
  • Languages around the world
  • Linguistic sign, language system
  • Grammar, lexicon, and word formation
  • Word formation across languages
  • Linguistic typology of grammar: Morphology
  • Linguistic typology of grammar: Syntax
  • Writing systems around the world
  • Linguistic typology of grammar: Phonology
  • Influence of diachronic language processes on language variability
  • Diachronic changes in languages
Level Introductory
Modality Presentations and recorded presentations, accompanied by recommended additional reading and interspersed with interactive quizzes and other types of activities.
Time commitment 80 hrs