Abstract:
This thesis explores the development and evaluation of automated fact-checking systems, focusing on matching claims and tweets to fact-checking articles. We assess retrieval and re-ranking methods, such as the BM25 algorithm and the SBERT model. Key contributions include:
• Sentence-Level Similarity: A novel sentence-level approach to SBERT re-ranking improves accuracy in tweet-article matching.
• Language-Specific Analysis: Comparative analysis of English and French claims highlights the need for language-specific models.
• FactCheckBureau Platform: A web application designed to help researchers and journalists develop accurate systems for matching claims to fact-checks.
Our experiments reveal the strengths and limitations of these methods. While BM25 serves as a robust baseline, SBERT with sentence-level granularity enhances precision. We also explore tweet enrichment techniques, such as OCR and image captioning, to improve tweet representation. This research advances automated fact-checking by offering tools and insights to combat misinformation, and the FactCheckBureau platform supports effective claim verification and the spread of accurate information online.