Tutorials

Neuro-Symbolic Representations for Information Retrieval

Duration: Full-day

Laura Dietz (University of New Hampshire, USA)
Hannah Bast (University of Freiburg, Germany)
Shubham Chatterjee (University of Glasgow, Scotland)
Jeff Dalton (University of Glasgow, Scotland)
Edgar Meij (Bloomberg, UK)
Arjen de Vries (Radboud University, The Netherlands)

This tutorial will provide an overview of recent advances in neuro-symbolic approaches for information retrieval. A decade ago, knowledge graphs and semantic annotation technology led to active research on how to best leverage symbolic knowledge. At the same time, neural methods have proven to be versatile and highly effective. From a neural network perspective, the same representation approach can serve document ranking or knowledge graph reasoning. End-to-end training makes it possible to optimize complex methods for downstream tasks. We are at the point where both the symbolic and the neural research advances are coalescing into neuro-symbolic approaches. The underlying research questions are how to best combine symbolic and neural approaches, which symbolic/neural approaches are most suitable for which use case, and how to best integrate both ideas to advance the state of the art in information retrieval.

Understanding and Mitigating Gender Bias in Information Retrieval Systems

Duration: Half-day

Amin Bigdeli (Ryerson University, Toronto, Canada)
Negar Arabzadeh (Google Brain, Montreal, Canada)
Shirin Seyedsalehi (Ryerson University, Toronto, Canada)
Morteza Zihayat (Ryerson University, Toronto, Canada)
Ebrahim Bagheri (Ryerson University, Toronto, Canada)

Recent studies have shown that information retrieval systems may exhibit stereotypical gender biases in their outcomes, which may lead to discrimination against minority groups, such as different genders, and affect users’ decision making and judgements. In this tutorial, we inform the audience of studies that have systematically reported the presence of stereotypical gender biases in Information Retrieval (IR) systems and different pre-trained Natural Language Processing (NLP) models. We further classify existing work on gender biases in IR systems and NLP models as being related to (1) relevance judgement datasets, (2) structure of retrieval methods, (3) representations learnt for queries and documents, and (4) pre-trained embedding models. Based on these categories, we present a host of methods from the literature that can be leveraged to measure, control, or mitigate stereotypical biases within IR systems and the NLP models used for downstream tasks. In addition, we introduce available datasets and collections that are widely used for studying gender biases in IR systems and NLP models, the evaluation metrics that can be used for measuring the level of bias and the utility of the models, and de-biasing methods that can be leveraged to mitigate gender biases within those models.

Deep Learning Methods for Query Auto Completion (Online only)

Duration: Half-day

Manish Gupta (Microsoft, Hyderabad, India)
Meghana Joshi (Microsoft, Vancouver, Canada)
Puneet Agrawal (Microsoft, Hyderabad, India)

Query Auto Completion (QAC) aims to help users reach their search intent faster and is a gateway to search for users. Every day, billions of keystrokes across hundreds of languages are served by Bing Autosuggest in less than 100 ms. The expected suggestions can differ depending on user demography, previous search queries and current trends. In general, the suggestions in the Autosuggest block are expected to be relevant, personalized, fresh and diverse, and must be guarded against being defective, hateful, adult or offensive in any way. In this tutorial, we will first discuss various critical components in QAC systems. We will then discuss traditional machine learning and deep learning architectures proposed for four main components: ranking in QAC, personalization, spell correction and natural language generation for QAC.

Uncertainty Quantification for Text Classification

Duration: Half-day

Dell Zhang (Thomson Reuters Labs, UK)
Murat Sensoy (Amazon Alexa AI, UK)
Masoud Makrehchi (Thomson Reuters Labs, Canada)
Bilyana Taneva-Popova (Thomson Reuters Labs, Switzerland)

This half-day tutorial introduces modern techniques for practical uncertainty quantification, specifically in the context of multi-class and multi-label text classification. First, we explain the usefulness of estimating aleatoric uncertainty and epistemic uncertainty for text classification models. Then, we describe several state-of-the-art approaches to uncertainty quantification and analyze their scalability to big text data: Virtual Ensemble in GBDT, Bayesian Deep Learning (including Deep Ensemble, Monte-Carlo Dropout, Bayes by Backprop, and their generalization Epistemic Neural Networks), as well as Evidential Deep Learning (including Prior Networks and Posterior Networks). Next, we discuss typical application scenarios of uncertainty quantification in text classification (including in-domain calibration, cross-domain robustness, and novel class detection). Finally, we list popular performance metrics for evaluating the effectiveness of uncertainty quantification in text classification. Practical hands-on examples and exercises are provided so that attendees can experiment with different uncertainty quantification methods on a few real-world text classification datasets such as CLINC150.

Trends and Overview: The Potential of Conversational Agents in Digital Health

Duration: Half-day

Tulika Saha (University of Liverpool, UK)
Abhishek Tiwari (Indian Institute of Technology Patna, India)
Sriparna Saha (Indian Institute of Technology Patna, India)

With the COVID-19 pandemic serving as a trigger, 2020 saw an unparalleled global expansion of tele-health. While the pandemic sped up the adoption of virtual healthcare delivery in numerous nations, it also accelerated the creation of a wide range of other technology-enabled systems and procedures for providing virtual healthcare to patients. One important technological advancement is the increasing use of Conversational Agents (CAs) or Virtual Assistants (VAs) in people’s lives, which now have numerous health applications. Numerous healthcare surveys conducted over the past years have revealed a worrying shortage of doctors relative to the population in physical health, and an even more severe shortage in mental health. Thus, CAs in healthcare are becoming more and more popular, driven by the need to assist doctors and use their time effectively. It has become imperative, more than ever, to focus on understanding and analysing the growing trends in human-computer interfaces, i.e., VAs, which will further pave the way for developing robust computational models in healthcare, including mental health. Our motivation behind proposing this tutorial is to analyze the growing trend of VAs in healthcare and provide IR researchers with an overall perspective of where the AI and NLP communities are heading, which can further pave the way for ground-breaking novelties benefitting the research community and society at large.

Website: https://healthassistant-ecir23.github.io/

Crowdsourcing for Information Retrieval

Duration: Half-day

Dmitry Ustalov (Toloka, Serbia)
Alisa Smirnova (Toloka, Switzerland)
Natalia Fedorova (Toloka, Serbia)
Nikita Pavlichenko (Toloka, Serbia)

In our tutorial, we will share more than six years of our crowdsourcing experience and bridge the gap between the crowdsourcing and information retrieval communities by showing how to incorporate crowdsourcing into a retrieval system to gather real human feedback on the model predictions. Most of the tutorial time is devoted to hands-on practice, in which the attendees will, under our guidance, implement an end-to-end process for information retrieval, from problem statement and data labeling to machine learning model training and evaluation.

Legal IR and NLP: the History, Challenges, and State-of-the-Art

Duration: Half-day

Debasis Ganguly (University of Glasgow, UK)
Jack G. Conrad (Thomson Reuters Labs, USA)
Kripabandhu Ghosh (Indian Institute of Science Education and Research Kolkata, India)
Saptarshi Ghosh (Indian Institute of Technology Kharagpur, India)
Pawan Goyal (Indian Institute of Technology Kharagpur, India)
Paheli Bhattacharya (Indian Institute of Technology Kharagpur, India)
Shubham Kumar Nigam (Indian Institute of Technology Kanpur, India)
Shounak Paul (Indian Institute of Technology Kharagpur, India)

Artificial Intelligence (AI), Machine Learning (ML), Information Retrieval (IR) and Natural Language Processing (NLP) are transforming the way legal professionals and law firms approach their work. The significant potential for the application of AI to Law, for instance, by creating computational solutions for legal tasks, has intrigued researchers for decades. This appeal has only been amplified with the advent of Deep Learning (DL). It is worth noting that working with legal text is far more challenging than working with text in many other subdomains of IR/NLP, mainly due to factors like lengthy documents, complex language and a lack of large-scale datasets. In this tutorial, we shall introduce the audience to the nature of legal systems and texts, and the challenges associated with processing legal documents. We shall then touch upon the history of AI and Law research, and how it has evolved over the years from rudimentary approaches to DL techniques. There will also be a brief introduction to recent, state-of-the-art research in general-domain IR and NLP. We shall then discuss in more detail specific IR/NLP tasks in the legal domain and their solutions, available tools and datasets, as well as the industry perspective. This will be followed by a hands-on coding/demo session, which is likely to be of great practical benefit to the attendees.