Computational morphosyntax

Code
570519
Credits
5cr

Prior recommendations

Students should have a good working knowledge of the Python programming language (Python 3) before the start of the course. Students new to programming or to Python should first take the course Natural Language Processing in the first semester.

 

A selection of Python tutorials that may help students prepare:

Gauld, Alan (2011) Learning to program: http://www.alan-g.me.uk/l2p2/index.htm  

This is a basic introduction that also covers some general programming concepts (in addition to Python 3, some examples are given in JavaScript and VBScript). Suitable for absolute beginners in programming. At least the sections under “Concepts” and “The basics”, plus the first two under “Advanced Topics”, should be read and practiced.

Zed A. Shaw (2017) Learn Python 3 the Hard Way: A Very Simple Introduction to the Terrifyingly Beautiful World of Computers and Code (Third Edition): https://www.pdfdrive.com/learn-python-3-the-hard-way-e52089947.html

This is a thorough, step-by-step introduction to the basics of Python 3. Suitable for people who want to acquire programming skills. The first 39 sections should be read and practiced.

Charles R. Severance (2016) Python for Everybody. Exploring Data Using Python 3: https://www.py4e.com/

This is a complete introduction to programming in Python 3, with a broad view of what Python 3 can do for data management. The first 11 topics should be read and practiced.

In all cases, the sections/chapters should be worked through in a Python 3 interpreter. Just reading them is not enough!

 

Goals

Referring to knowledge

The main goal is for students to identify state-of-the-art techniques used in industry and academia to structure language data and extract information from it. This goal can be further subdivided into two. First, from a theoretical perspective, the aim is for students to understand how linguistic data can be processed and analysed with different computational methods, and to recognize the advantages and drawbacks of the different options. Second, on the practical side, the aim is for students to be able to process natural language data on their own, and to build on the knowledge acquired in this class to tackle problems not covered in it.

 

Referring to abilities, skills

This course provides an introduction to central aspects of natural language processing. Emphasis is placed on hands-on experience with the acquisition, manipulation, curation, and processing of linguistic data. The course covers both symbolic and statistical methods, from a theoretical and a practical angle.

The main goal of this course is for students to identify state-of-the-art techniques used in industry and academia to structure language data and extract information from it, while also gaining confidence in applying this knowledge to new problems outside the scope of the course.

 

Associated skills

  •  Programming (Python)
  •  Linguistic data acquisition, manipulation, curation, and processing
  •  Machine learning
  •  Quantitative reasoning applied to language sciences

 

Contents

1. Handling text

2. Language models

3. Tagging

4. Parsing

5. Information extraction

6. Other topics of interest to students (e.g., human-in-the-loop machine learning and summarization)

 

Teaching methods

This course is largely based on a flipped-classroom format. Students are expected to prepare weekly readings and to lead a section of a weekly session (see "Evaluation"). The remaining time is devoted to theoretical discussions and practical applications of the concepts introduced.

 

Evaluation

20% participation in class discussions/presentations

80% practical exercises (exercise 1: 25%, exercise 2: 25%, exercise 3: 30%)
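
As an illustration of how these weights combine, here is a minimal sketch in Python. It assumes that every component is marked on a 0 to 10 scale (an assumption inferred from the marks mentioned under "Re-evaluation") and uses made-up example marks; the names WEIGHTS, final_mark and example_marks are hypothetical and are not part of any official grading tool.

```python
# Weights taken from the evaluation scheme above (they sum to 1.0).
WEIGHTS = {
    "participation": 0.20,
    "exercise_1": 0.25,
    "exercise_2": 0.25,
    "exercise_3": 0.30,
}


def final_mark(marks):
    """Return the weighted final mark.

    Assumes each component in `marks` is on a 0-10 scale (an assumption,
    not stated explicitly in the evaluation scheme).
    """
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Made-up marks, for illustration only.
example_marks = {
    "participation": 8.0,
    "exercise_1": 6.0,
    "exercise_2": 7.0,
    "exercise_3": 5.0,
}

print(round(final_mark(example_marks), 2))
```

With these example marks, the weighted sum is 0.20 × 8 + 0.25 × 6 + 0.25 × 7 + 0.30 × 5.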

 

Re-evaluation

The re-evaluation consists of resubmitting the failed assignments. Only students with a final mark of at least 3 (i.e. from 3 to 4.9) can opt for the resit. The only possible final mark after the resit is 5. The resit takes place one week after the evaluation period ends.
 

Examination-based assessment

Under exceptional and justified circumstances, a single examination (100% of the grade) can be requested during the first 30 days of the semester using the proper form. The requirement is a 10-page essay (with code) or an exam, on a topic to be agreed with the instructor.

 

Bibliography

 

Book

Jurafsky, Daniel & Martin, James H. (2021), Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3rd edition.

Manning, Christopher D. & Schütze, Hinrich (1999), Foundations of Statistical Natural Language Processing. The MIT Press.

Vasiliev, Yuli (2020), Natural Language Processing with Python and SpaCy: A Practical Introduction.


Web page

There are plenty of Python tutorials on the web. In addition to the ones listed above under "Prior recommendations", here are a couple of tutorials that may be useful to you:

  1. For beginners: https://www.pythonprogramming.net/python-fundamental-tutorials/
  2. Advanced content: https://diveintopython3.net/