
Premise Order Matters in Reasoning with Large Language Models


Abstract

Large language models (LLMs) have achieved remarkable reasoning performance in various domains. However, we observe that LLMs are surprisingly brittle to the ordering of premises, even though such reordering does not alter the underlying task. In particular, LLMs achieve the best performance when the premise order aligns with the context required in the intermediate reasoning steps. For example, in deductive reasoning tasks, presenting the premises in the prompt in the same order as the ground-truth proof (as opposed to a random ordering) drastically increases the model's accuracy. We first examine the effect of premise ordering on deductive reasoning across a variety of LLMs, and our evaluation shows that even when a model performs decently with the optimal order, permuting the premises can cause a performance drop of over 30%. In addition, we introduce the R-GSM benchmark, based on GSM8K, to examine the ordering effect for math problem solving, and we again observe a significant accuracy decrease relative to the original GSM8K problems.
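The ordering manipulation described above is straightforward to illustrate. Below is a minimal sketch, not the authors' evaluation code: the premises, the question, and the `build_prompt` helper are hypothetical, and it only shows how the same deductive-reasoning problem can be prompted with premises in proof order versus a random permutation.

```python
import random

# Hypothetical deductive-reasoning premises, listed in the order they are
# used by the ground-truth proof (the "optimal" order described in the abstract).
premises = [
    "If Alice is at the park, then Bob is at home.",
    "If Bob is at home, then Carol is at work.",
    "Alice is at the park.",
]
question = "Is Carol at work?"


def build_prompt(premise_list, question):
    """Join premises and the question into a single prompt string."""
    lines = [f"Premise {i + 1}: {p}" for i, p in enumerate(premise_list)]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Premises presented in proof order.
forward_prompt = build_prompt(premises, question)

# The same premises in a random order; the task is logically unchanged,
# but the abstract reports that accuracy can drop by over 30%.
shuffled_prompt = build_prompt(random.sample(premises, k=len(premises)), question)

print(forward_prompt)
print("---")
print(shuffled_prompt)
```

Feeding both prompts to the same model and comparing accuracies over many such problems is one way to reproduce the kind of ordering effect the paper measures.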

Authors

Xinyun Chen, Ryan A. Chi, Xuezhi Wang, Denny Zhou

Venue

arXiv