Making LLM-Generated Text Reliable: Spotlight on Claire Gardent’s Research Chair

At the intersection of linguistics and computer science, Claire Gardent is a specialist in Natural Language Processing (NLP) and a CNRS Senior Research Director. Her contributions to the field were recognized with the CNRS Silver Medal in 2022, and she was named a Fellow of the Association for Computational Linguistics (ACL). She has been conducting her research at Loria since 2001 and currently holds the ENACT research chair “Semantically Consistent LLM Based Text Generation.”

Q: Could you introduce yourself and tell us about your academic and professional background?

My path has been somewhat winding—I certainly didn’t expect my career to take this direction. Initially, I wanted to become an agricultural engineer. But I travelled extensively: I spent time in New Zealand, Ireland, and Germany. When I returned, I decided to resume my studies to become an interpreter.

During my years at the interpreting school in Geneva, Maghi King, director of the ISSCO (Dalle Molle Institute for Semantic and Cognitive Studies), introduced me to machine translation. I then pursued a master’s degree in England, in what was then called “expert systems,” which was essentially artificial intelligence.

After that, I completed a PhD in cognitive science in Edinburgh, although by that time my work was already focused on Natural Language Processing. I went on to do postdoctoral research in the Netherlands for three years, and then in Germany for seven years before being recruited by CNRS as a Senior Research Scientist in 2001.

Q: Could you present the research chair you lead?

This research chair focuses on text generation. My goal is to address the following question: How can we improve the outputs produced by large language models so that they are factually correct and consistent with the input data?

Large language models (LLMs) are extremely good at generating text, which means that many of the challenges we previously worked on have become less relevant. However, despite the high quality of the generated text, it is not always semantically correct. The output may contain information that is inaccurate given our knowledge of the world, or inconsistent with the input data.

My research mainly focuses on conditional text generation, that is, generating text from structured or unstructured input data (databases, texts, graphs, images, or videos). Quite often, the generated text is not faithful to the information contained in the input. These cases are referred to as hallucinations, when the text states information not supported by the input, or omissions, when relevant information that should appear in the generated text is missing. I study ways to address these issues in order to improve performance across different applications. Automatic summarization is a good example: when asking an LLM to produce a summary from one or several documents, the resulting summary must remain factually consistent with the input information.
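The distinction between hallucinations and omissions can be made concrete with a small sketch. Assume (hypothetically) that facts have already been extracted from both the input data and the generated text as (subject, relation, object) triples; the fact-extraction step itself, which is the hard part in practice, is taken as given here.

```python
# Minimal sketch: hallucinations vs. omissions in conditional text generation.
# The triples below are invented for illustration.

def faithfulness_report(input_facts: set, generated_facts: set) -> dict:
    """Compare the facts expressed in a generated text against the input data."""
    return {
        # Facts stated in the text but absent from the input: hallucinations.
        "hallucinations": generated_facts - input_facts,
        # Input facts the text failed to express: omissions.
        "omissions": input_facts - generated_facts,
        # Facts correctly conveyed.
        "supported": input_facts & generated_facts,
    }

input_facts = {
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "award", "Nobel Prize in Physics"),
}
generated_facts = {
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "award", "Nobel Prize in Chemistry"),  # not in the input
}

report = faithfulness_report(input_facts, generated_facts)
```

A fact asserted only by the model ends up in the hallucination set; an input fact the text never expresses ends up in the omission set.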

A few years ago, I worked with Angela Fan (from Meta) on the automatic generation of biographies from texts collected on the web. We observed that for women’s biographies there was significantly less available information, which resulted in lower-quality generated texts. We also found that this problem became more pronounced in a multilingual setting. For instance, when working with English-language documents and attempting to generate the biography of an Asian scientist, the quality was significantly lower than for the biography of an English scientist. This was largely due to the fact that the documents we used were only in English. Clearly, if we want to generate the biography of a Thai woman, it is much better to retrieve texts written in Thai. Generating text not only from multiple documents but also from multilingual sources remains a difficult task. This is another topic I would like to address within the ENACT research chair.

Finally, I am also interested in how these models are evaluated. Typically, we rely on a benchmark dataset that contains both the inputs (raw data) and the expected outputs. The texts generated by the models are then compared to the outputs in this benchmark dataset. The challenge is that, in multilingual settings, we often lack such benchmark datasets, and creating them is extremely costly. One of the objectives of the ENACT chair is therefore to propose innovative methods to automate the creation of benchmark datasets, and/or to develop reference-free evaluation metrics. These would make it possible to evaluate the outputs of LLMs not against reference texts, but directly against the input data—for example by comparing a generated summary with the document(s) it summarizes.
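The idea of a reference-free metric can be illustrated with a deliberately crude sketch: score a generated summary directly against the source document, with no human-written reference involved. Real reference-free metrics rely on learned models (entailment or question answering); plain token overlap stands in for them here, purely for illustration.

```python
# Toy reference-free evaluation: no benchmark reference text is needed,
# only the source document and the generated summary.

def reference_free_score(source: str, summary: str) -> dict:
    src_tokens = set(source.lower().split())
    sum_tokens = set(summary.lower().split())
    overlap = src_tokens & sum_tokens
    return {
        # Precision-like: how much of the summary is grounded in the source?
        "grounding": len(overlap) / len(sum_tokens) if sum_tokens else 0.0,
        # Recall-like: how much of the source does the summary cover?
        "coverage": len(overlap) / len(src_tokens) if src_tokens else 0.0,
    }

source = "claire gardent studies text generation at loria"
summary = "claire gardent studies text generation"
scores = reference_free_score(source, summary)
```

A low grounding score would flag hallucinated material, while a low coverage score would flag omissions; a learned metric replaces the overlap computation but keeps the same reference-free structure.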

Q: What kind of collaborations do you aim to establish, and what impact do you expect this chair to have in the short and long term?

An academic collaboration is already in place through a doctoral contract funded by the chair. The PhD student will be co-supervised with Professor Gatt from Utrecht University in the Netherlands.

On the industry side, we collaborate with the company DeuxTec, based in Luxembourg, as part of the supervision of Alejandra Lorenzo’s PhD. The company develops software to monitor clinical activities in oncology. Together, we are working on a system capable of processing clinicians’ letters in order to automatically extract specific information (sets of attribute–value pairs) and transfer it into web forms. Currently, this task is performed manually.
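The extraction task described above—turning free text into attribute–value pairs ready to fill a web form—can be sketched very simply. This is not DeuxTec's actual system; the field names, patterns, and letter text below are all invented for illustration, and a real system would use an LLM or trained extractor rather than regular expressions.

```python
import re

# Hypothetical attribute -> pattern mapping for a clinician's letter.
PATTERNS = {
    "diagnosis": re.compile(r"[Dd]iagnosis:\s*(.+)"),
    "tumor_stage": re.compile(r"[Ss]tage\s+(I{1,3}V?)"),
    "treatment": re.compile(r"[Tt]reatment:\s*(.+)"),
}

def extract_fields(letter: str) -> dict:
    """Extract attribute-value pairs that could pre-fill a web form."""
    fields = {}
    for attribute, pattern in PATTERNS.items():
        match = pattern.search(letter)
        if match:
            fields[attribute] = match.group(1).strip()
    return fields

letter = """Diagnosis: invasive ductal carcinoma
The patient presents with stage II disease.
Treatment: adjuvant chemotherapy"""

fields = extract_fields(letter)
```

The resulting dictionary maps each form field to the value found in the letter, replacing the manual copy-and-paste step.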

I also collaborate with the Nancy-based start-up Cyber4Care. The objective is to develop sovereign conversational agents tailored to the operational and regulatory needs of organizations, particularly for crisis management in a wide range of contexts—such as municipalities, hospitals, or industrial companies—while ensuring continuity of operations.

A third collaboration is underway with URS (Unified Resource Sphere), a Paris-based startup. The goal is to develop a sovereign software platform for the management, traceability, and validation of knowledge extracted from texts. Given a specific information need, such a system would make it possible to generate validated and sourced knowledge and structure it in a way that reveals connections between insights from different domains, thereby fostering serendipity.

These three projects reflect a broader ambition: to tackle the challenge of the reliability of texts produced by LLMs.

Read more about Claire Gardent's work: