AI might excel at certain tasks, such as coding or generating a podcast. But it struggles to pass a high-level history exam, a ...
A team of researchers has developed a novel benchmark to evaluate the historical knowledge of leading large language models ...