Ricardo Campos (INESC TEC; Ci2-Smart Cities Research Center - Polytechnic
Institute of Tomar, Tomar, Portugal)
Alípio M. Jorge (INESC TEC; University of Porto, Portugal)
Adam Jatowt (University of Innsbruck, Austria)
Sumit Bhatia (Adobe Media and Data Science Research Lab, India)
Marina Litvak (Shamoon Academic College of Engineering, Israel)
Recent years have brought a continuous stream of evolving information, making it unmanageable and time-consuming for an interested reader to track and keep up with all the essential information and the various aspects of a story. Automated narrative extraction from text offers a compelling approach to this problem. It involves identifying the subset of interconnected raw documents, extracting the critical narrative story elements, and representing them in an adequate final form (e.g., timelines) that conveys the key points of the story in an easy-to-understand format. Although information extraction and natural language processing have made significant progress towards the automatic interpretation of texts, the automated identification and analysis of the different elements of a narrative present in a document (set) still poses significant unsolved challenges. In the sixth edition of the Text2Story workshop, we aim to bring to the forefront the challenges involved in understanding the structure of narratives and in incorporating their representation into well-established models, as well as into modern architectures (e.g., transformers), which are now common and form the backbone of almost every IR and NLP application. We hope that the workshop will provide a common forum to consolidate multi-disciplinary efforts and foster discussions to identify the wide-ranging issues related to the narrative extraction task.
Duration: Half-day (morning)
Xuke Hu (German Aerospace Center, Germany)
Yingjie Hu (University at Buffalo, USA)
Bernd Resch (University of Salzburg, Austria)
Jens Kersten (German Aerospace Center, Germany)
Kristin Stock (Massey University, New Zealand)
Vast and ever-increasing amounts of semi- and unstructured text data, such as social media posts, website texts, historical archives, and scientific articles, are available online and offline. These documents often refer to geographic regions or specific places and contain valuable but textually encoded geographic information in the form of toponyms, place names, and complex location descriptions. This information is useful not only for scientific studies, such as spatial humanities, but can also contribute to various practical applications, such as disaster management, traffic management, and disease surveillance. Scientists from many fields, including information retrieval, natural language processing, and geographic information science, have shown increasing interest in researching and applying methods to infer the geographical focus of documents or to extract geographic references from unstructured and heterogeneous texts and, finally, resolve these references unambiguously to places or spaces on the Earth’s surface. Despite the encouraging progress in geographic information extraction, many challenges and issues remain unsolved, ranging from methods, systems, and data to applications and privacy. In the first GeoExT workshop, we aim to foster discussion and exchange on recent advances in different aspects of geographic information extraction from texts, such as the methods, datasets, and systems for geolocating documents and for recognizing and resolving toponyms and location descriptions. Our goal is to establish a common and long-term forum to consolidate multi-disciplinary efforts from both researchers and practitioners in Europe and beyond.
Suzan Verberne (LIACS, Leiden University, Netherlands)
Evangelos Kanoulas (University of Amsterdam, Netherlands)
Gineke Wiggers (Leiden University, Netherlands)
Florina Piroi (TU Wien, Austria)
Arjen de Vries (Radboud University, Netherlands)
Legal professionals spend up to a third of their time doing research and investigation, in which legal information retrieval plays an important role. Although this is the first legal IR workshop organized at ECIR, the topic has a long history of prior successful events and benchmark campaigns. Two specific legal tasks that have attracted the attention of the IR community in the past decades are eDiscovery and case law retrieval. But these are only two of many retrieval tasks in the legal domain; other legal IR tasks have received less attention, for example legal web search in commercial legal search engines, legal community question answering, and lawyer finding. In this workshop, we aim to address the complete scope of legal IR tasks, their challenges, and the methods needed to address those challenges. The full-day workshop features three keynote speakers (Maura Grossman, Tjerk de Greef, and Milda Norkute), talks based on extended abstracts, and time for discussion.
Marinella Petrocchi (IIT-CNR, Italy)
Marco Viviani (University of Milano-Bicocca, Italy)
With the advent of the Social Web, we are, more than ever, constantly exposed to different kinds of information pollution, which may lead to severe issues for both individuals and society as a whole. In this context, it becomes essential to guarantee users access to genuine information that does not distort their perception of reality. For this reason, numerous approaches have been proposed in recent years for the identification of misinformation, in different contexts and for different purposes. However, the problem has not yet been sufficiently addressed in the field of Information Retrieval, where it has been treated primarily as a classification task distinguishing information from misinformation. Hence, the purpose of this workshop is to call on the IR community for solutions in which the genuineness of information is considered one of the dimensions of relevance within search engines or recommender systems, early detection of misinformation can be achieved, the results obtained are explainable to the users of Information Retrieval systems, and users' privacy is taken into consideration.
Ludovico Boratto (University of Cagliari, Italy)
Stefano Faralli (Sapienza University of Rome, Italy)
Mirko Marras (University of Cagliari, Italy)
Giovanni Stilo (University of L'Aquila, Italy)
Creating efficient and effective search and recommendation algorithms has been the main objective of industry practitioners and academic researchers over the years. However, recent research has shown how these algorithms trained on historical data lead to models that might exacerbate existing biases and generate potentially negative outcomes. Defining, assessing and mitigating these biases throughout experimental pipelines is a primary step for devising search and recommendation algorithms that can be responsibly deployed in real-world applications. This workshop aims to collect novel contributions in this field and offer a common ground for interested researchers and practitioners.
Ingo Frommholz (University of Wolverhampton, United Kingdom)
Philipp Mayr (GESIS, Germany)
Guillaume Cabanac (IRIT - Université Paul Sabatier, Toulouse, France)
Suzan Verberne (LIACS, Leiden University, Netherlands)
The Bibliometric-enhanced Information Retrieval workshop series (BIR/BIRNDL) tackles issues
related to academic search, at the intersection between Information Retrieval and
Bibliometrics. BIR is a hot topic investigated by both academia (e.g., ArnetMiner,
CiteSeerX) and industry (e.g., Google Scholar, Microsoft Academic Search, and Semantic
Scholar). Searching for scientific information is a long-lived information need. In the
early 1960s, Salton was already striving to enhance information retrieval by including clues
inferred from bibliographic citations. The development of citation indexes pioneered by
Garfield proved determinant for such a research endeavor at the intersection between the
nascent fields of Information Retrieval (IR) and Bibliometrics. The pioneers who established
these fields in Information Science—such as Salton and Garfield—were followed by scientists
who specialised in one of these, leading to the two loosely connected fields we know of
today. BIR tries to bridge this gap. An overview of the BIR/BIRNDL workshop series can be
found at https://sites.google.com/view/bir-ws/home.
Going into its 13th iteration, BIR will tackle issues related to academic search, at the intersection between Information Retrieval and Bibliometrics. We strive to bring together the ‘retrievalists’ and ‘citationists’ active in both academia and industry who are developing search engines and recommender systems for scholarly search.
Giorgio Maria Di Nunzio (University of Padua, Italy)
Evangelos Kanoulas (University of Amsterdam, Netherlands)
Prasenjit Majumder (DAIICT, India)
Technology-assisted review systems (TARS) use a kind of human-in-the-loop approach where
classification and/or ranking algorithms are continuously trained according to the relevance
feedback from expert reviewers. This is a high-recall task in which machine-learning methods
need large amounts of human relevance assessments which represent the primary cost of such
methods. It is necessary to evaluate these systems both in terms of traditional IR
“batch”/offline performance and in terms of the time spent per assessment, the hourly pay
rate for assessors, and the quality of the assessors. Consequently, we want to compare both
1) the vetting approach that uses evaluation collections to optimize systems and carry out
pre-hoc evaluation, and 2) the validation of the system to measure the actual outcome of the
system in real situations.
In this workshop, we aim to assess the effectiveness of TARS along different dimensions of evaluation. Some of the important open questions are: How do labeling errors in TARS differ from those in traditional IR systems? Are these differences (if any) meaningful for the effectiveness of a TAR system? What is the objective function during optimization (effectiveness/efficiency)? Are there any ethical issues (hidden or explicit) in the workflow process? If so, is there a mitigation process that is (or should be) taken into account? Is there any bias/fairness issue in the human-AI interaction? In this regard, have you ever experienced a serious flaw in the choice of seed documents or in the active learning process that makes this issue evident? And what about explainability in TAR systems?
Duration: Half-day (morning)
Guglielmo Faggioli (University of Padova, Italy)
Nicola Ferro (University of Padova, Italy)
Josiane Mothe (Institut de Recherche en Informatique de Toulouse, France)
Fiana Raiber (Yahoo Research, Israel)
Query Performance Prediction (QPP) is currently primarily used for ad-hoc retrieval tasks.
The Information Retrieval (IR) field is reaching new heights thanks to recent advances in
large language models and neural networks, as well as emerging new ways of searching, such
as conversational search. Such advancements are quickly spreading to adjacent research
areas, including QPP, necessitating a reconsideration of how we perform and evaluate QPP.
This workshop aims to stimulate discussion on three main aspects concerning the future of QPP: i) What are the emerging QPP challenges posed by new methods and technologies, including but not limited to dense retrieval, contextualized embeddings, and conversational search? ii) How might these new techniques be used to improve the quality of QPP? and iii) Can we claim that the current techniques for evaluating QPP are effective in all arising scenarios? Can we envision new evaluation protocols capable of ensuring generalizability to new domains?