Table 1 Diagnostic accuracy and ranking distribution

From: Comparative analysis of large language models on rare disease identification

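| Model | Correct, Rank 1 | Correct, Rank 2 | Correct, Rank 3 | Correct, Rank 4 | Correct, Rank 5 | Incorrect |
|---|---|---|---|---|---|---|
| ChatGPT-4o | 46.05% | 7.24% | 4.61% | 2.63% | 2.63% | 36.84% |
| Claude 3.5 Sonnet | 64.47% | 9.21% | 3.95% | 1.32% | 0.00% | 21.05% |
| Gemini Advanced | 51.32% | 7.89% | 2.63% | 3.95% | 1.97% | 32.24% |
| Llama 3.1 405B | 36.84% | 9.21% | 3.95% | 3.95% | 3.29% | 42.76% |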