From: Comparative analysis of large language models on rare disease identification
Model | Correct: Rank 1 | Correct: Rank 2 | Correct: Rank 3 | Correct: Rank 4 | Correct: Rank 5 | Incorrect |
---|---|---|---|---|---|---|
ChatGPT-4o | 46.05% | 7.24% | 4.61% | 2.63% | 2.63% | 36.84% |
Claude 3.5 Sonnet | 64.47% | 9.21% | 3.95% | 1.32% | 0.00% | 21.05% |
Gemini Advanced | 51.32% | 7.89% | 2.63% | 3.95% | 1.97% | 32.24% |
Llama 3.1 405B | 36.84% | 9.21% | 3.95% | 3.95% | 3.29% | 42.76% |
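
The rank columns can be summed to give cumulative (top-k) accuracy for each model. The following is a minimal sketch, assuming that "Rank k" means the correct diagnosis appeared at position k of the model's ranked list; the names `results` and `top_k_accuracy` are illustrative and do not come from the article.

```python
from itertools import accumulate

# Percentages copied from the table: Rank 1..Rank 5, then Incorrect.
results = {
    "ChatGPT-4o": [46.05, 7.24, 4.61, 2.63, 2.63, 36.84],
    "Claude 3.5 Sonnet": [64.47, 9.21, 3.95, 1.32, 0.00, 21.05],
    "Gemini Advanced": [51.32, 7.89, 2.63, 3.95, 1.97, 32.24],
    "Llama 3.1 405B": [36.84, 9.21, 3.95, 3.95, 3.29, 42.76],
}


def top_k_accuracy(row):
    """Return cumulative percentages for top-1 through top-5.

    Assumes the first five entries are the Rank 1..Rank 5 shares
    and the last entry is the share of incorrect answers.
    """
    rank_shares = row[:5]
    # Running sum over the rank columns, rounded to the table's precision.
    return [round(total, 2) for total in accumulate(rank_shares)]


if __name__ == "__main__":
    for model, row in results.items():
        cumulative = top_k_accuracy(row)
        print(f"{model}: top-1 {cumulative[0]:.2f}%, top-5 {cumulative[4]:.2f}%")
```

As a consistency check, each model's top-5 figure should equal 100% minus its Incorrect share; for Claude 3.5 Sonnet both come to 78.95%.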