This repository contains course material (lectures, labs, and assignments) for the 2022 fall edition of the course.
The course followed a standard format, with lectures, assignments, and group projects. Lectures were held in person by the lecturer and included discussions, Kahoot quizzes, and hands-on exercises. They were not recorded or streamed; physical attendance was expected. Labs were led by TAs and dedicated to working on individual (graded) assignments. Students who needed help with assignments were expected to attend in person. For the group projects, groups could book weekly 15-minute slots to get help and feedback on progress and ideas from the lecturer and TAs.
The semester (13 weeks) was divided into two periods: (1) lectures and individual assignments during the first 10 weeks, with group project work running in parallel, and (2) the last 3 weeks, dedicated entirely to the group projects. Dedicated time was provided to get help with the individual assignments (from TAs) as well as to get feedback on the group project (from the lecturer and TAs).
The overall grade came from two components, each of which had to receive a passing grade (better than F) in order to pass the course:
- 40% Project work. Half of the score came from the individual assignments and the other half from the group project.
- 60% Written (digital) exam.
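
The weighting above can be sketched in Python. This is only an illustration: it assumes each component is expressed as a fraction between 0 and 1, and the function and constant names are hypothetical. The actual conversion from scores to letter grades is not described here, so it is left out.

```python
# Weights taken from the grading scheme above.
PROJECT_WEIGHT = 0.40  # project work (assignments + group project)
EXAM_WEIGHT = 0.60     # written (digital) exam


def overall_score(assignments: float, group_project: float, exam: float) -> float:
    """Combine component scores (each assumed to be a fraction in [0, 1]).

    Project work is split evenly between the individual assignments
    and the group project.
    """
    project = 0.5 * assignments + 0.5 * group_project
    return PROJECT_WEIGHT * project + EXAM_WEIGHT * exam


if __name__ == "__main__":
    # Full marks on the assignments alone contribute 20% of the total.
    print(round(overall_score(1.0, 0.0, 0.0), 2))
```

Note that the sketch only computes the weighted total; it does not model the requirement that both components must individually be passed.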
Module | Topic | Lecture | Exercises |
---|---|---|---|
1 | Text preprocessing and similarity | lecture | exercises |
2 | Text classification and similarity | lecture | exercises |
3 | Search engine architecture and basic retrieval models | lecture | exercises |
4 | Advanced retrieval models | lecture | exercises |
5 | Retrieval evaluation | lecture | exercises |
6 | Knowledge bases and entity retrieval | lecture | exercises |
7 | Entity linking | lecture | exercises |
8 | Semantic search | lecture | exercises |
9 | Word embeddings and dense retrieval | lecture | exercises |
10 | Transformer-based models | lecture | exercises |
11 | Conversational information access | lecture | |
12 | Advanced evaluation | lecture | |
13 | Conversational search systems | lecture | exercises |
14 | Fairness and transparency in IR | lecture | exercises |
During the course, students had to complete 9 assignments (deliverables) individually. These were graded and accounted for 50% of the project work (i.e., 20% of the final grade). Students were given a fixed time limit (typically 10 days) to complete each assignment. For each assignment, students were provided with a Python file containing a code skeleton, which they were expected to complete according to the task description. Assignments were graded automatically, using a combination of public and hidden tests. Public tests were released together with the assignment; if the solution passed these tests, it was likely correct. In addition, the solution was run against hidden tests after submission; these typically contained larger inputs/datasets, corner cases, or other inputs designed to check that the student fully understood the methods and followed the instructions.
The assignments are not released publicly as they might be reused in future editions of the course. (Feel free to reach out to us by email if you would like to get a copy of them.)
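
To give a rough idea of the public/hidden test setup described above, here is a purely hypothetical sketch. The function `term_frequencies` and both tests are invented for illustration and are not taken from any actual assignment; the real skeletons and tests are not public.

```python
from collections import Counter
from typing import Dict


def term_frequencies(text: str) -> Dict[str, int]:
    """Hypothetical skeleton function a student would complete.

    Counts lowercased, whitespace-separated tokens.
    """
    return dict(Counter(text.lower().split()))


def test_public_small_input() -> None:
    # A public test: small input, released together with the assignment.
    assert term_frequencies("a b a") == {"a": 2, "b": 1}


def test_hidden_empty_input() -> None:
    # A hidden test might probe a corner case such as empty input.
    assert term_frequencies("") == {}


if __name__ == "__main__":
    test_public_small_input()
    test_hidden_empty_input()
    print("all tests passed")
```

A solution that handles only the small public example could still fail the corner case, which is the kind of gap hidden tests are meant to expose.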
Id | Topic | Points |
---|---|---|
A1.1 | Classifier: Feature extraction | 2 |
A1.2 | Classifier: Evaluation | 3 |
A1.3 | Classifier: Training and evaluation | 5 |
A2.1 | Search engine: Indexing | 4 |
A2.2 | Search engine: Scoring | 7 |
A2.3 | Search engine: Evaluation | 4 |
A3.1 | Entity retrieval | 8 |
A3.2 | Entity ranking | 7 |
A3.3 | Learning to Rank | 10 |
- Zhai & Massung. Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. Morgan & Claypool, 2016.
- Balog. Entity-Oriented Search. Springer, 2018.
Krisztian Balog (course responsible), Ivica Kostric (lecturer), Nolwenn Bernard (TA), Weronika Lajewska (TA)