Inter-Rater Disagreements in Applying the Montreal Classification for Crohn's Disease: The Five-Nations Survey Study

Offir Ukashi*, Aurelien Amiot, David Laharie, Luis Menchén, Ana Gutiérrez, Samuel Fernandes, Tommaso Pessarelli, Fábio Correia, Carlos Gonzalez-Muñoza, Julia López-Cardona, Giulio Calabrese, Rocio Ferreiro-Iglesias, Natalie Tamir-Degabli, Nikolas Konstantine Dussias, Amjad Mousa, Raquel Oliveira, Nicolas Richard, Ido Veisman, Kassem Sharif, Shomron Ben-Horin, Carlos Soutullo-Castiñeiras, Gabriele Dragoni, Silvia Rotulo, Agnese Favale, Louis Calméjane, Thomas Bazin, Alfonso Elosua, Sara Lopes, Carla Felice, Violeta Mauriz, Inês Coelho Rodrigues, Julia Jougon, Inês Botto, Helena Tavares de Sousa, Lorenzo Bertani, Paula Ripoll Abadía, Alice De Bernardi, Yamile Zabana, Xavier Serra-Ruiz, Anna Viola, Manuel Barreiro-de Acosta, Henit Yanai, Alessandro Armuzzi, Fernando Magro, Uri Kopylov

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: The Montreal classification has been widely used in Crohn's disease since 2005 to categorize patients by age of onset (A), disease location (L), behavior (B), and upper gastrointestinal tract and perianal involvement. With evolving management paradigms in Crohn's disease, we aimed to assess the performance of gastroenterologists in applying the Montreal classification.

Methods: An online survey was conducted among participants at an international educational conference on inflammatory bowel diseases. Participants classified 20 theoretical Crohn's disease cases using the Montreal classification. Agreement rates with the inflammatory bowel diseases board (three expert gastroenterologists whose consensus rating was considered the gold standard) were calculated for specialist gastroenterologists and for fellows/specialists with ≤ 2 years of clinical experience. A majority vote of < 75% among participants was considered a notable disagreement. The same cases were classified using three large language models (LLMs), ChatGPT-4, Claude-3, and Gemini-1.5, and assessed for agreement with the board and with the gastroenterologists. Fleiss' kappa was used to assess within-group agreement.

Results: Thirty-eight participants from five countries completed the survey. In defining the Montreal classification as a whole, specialists (21/38 [55%]) had a higher agreement rate with the board than fellows/young specialists (17/38 [45%]) (58% vs. 49%, p = 0.012) and than LLMs (58% vs. 18%, p < 0.001). Disease behavior classification was the most challenging, with 76% agreement with the board among specialists and fellows/young specialists and 48% among LLMs. Regarding disease behavior, within-group agreement was moderate (specialists: κ = 0.522, fellows/young specialists: κ = 0.532, LLMs: κ = 0.577; p < 0.001 for all). Notable points of disagreement included defining disease behavior in the presence of obstructive symptoms, assessing disease extent via video capsule endoscopy, and evaluating treatment-related reversibility of the disease phenotype.

Conclusions: There is significant inter-rater disagreement in applying the Montreal classification, particularly for disease behavior in Crohn's disease. Improved education or revisions to phenotype criteria may be needed to enhance consensus on the Montreal classification.
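The two within-group measures named in the Methods, Fleiss' kappa and the < 75% majority-vote threshold for a notable disagreement, can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names (`fleiss_kappa`, `majority_share`, `is_notable_disagreement`) and the example ratings are hypothetical, and the sketch assumes every case is rated by the same number of raters. The labels B1, B2, and B3 are the Montreal behavior categories (non-stricturing non-penetrating, stricturing, penetrating).

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of cases, each a list of labels (one per rater).

    Assumes every case has the same number of raters (at least two).
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("each case needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters who assigned case i to category j
    counts = [[Counter(case)[c] for c in categories] for case in ratings]

    # Observed agreement per case, then averaged over cases
    p_i = [
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_cases

    # Chance agreement from the overall share of each category
    p_j = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        # Only one category was ever used: kappa is undefined (0/0)
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


def majority_share(case):
    """Fraction of raters who chose the most common label for one case."""
    return Counter(case).most_common(1)[0][1] / len(case)


def is_notable_disagreement(case, threshold=0.75):
    """True when the majority vote falls below the threshold (75% in the abstract)."""
    return majority_share(case) < threshold


if __name__ == "__main__":
    # Hypothetical behavior ratings: 3 cases, 5 raters each (illustrative only)
    example = [
        ["B1", "B1", "B1", "B1", "B1"],
        ["B2", "B2", "B1", "B2", "B3"],
        ["B3", "B3", "B3", "B2", "B3"],
    ]
    print("kappa:", round(fleiss_kappa(example, ["B1", "B2", "B3"]), 3))
    for case in example:
        print(case, "notable disagreement:", is_notable_disagreement(case))
```

The threshold comparison is strict, so a case on which exactly 75% of raters agree is not flagged, matching the "< 75%" wording in the Methods.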

Original language: English
Journal: United European Gastroenterology Journal
State: Accepted/In press - 2025

Keywords

  • Crohn's disease
  • complicated disease phenotype
  • inflammatory bowel diseases
  • large language models
  • Montreal classification
