DQDF: Data-Quality-Aware Dataframes
Phanwadee Sinthong, Dhaval Patel, et al.
VLDB 2022
Developers using LLMs and LLM-based agents in their applications have provided plenty of anecdotal evidence that in-context learning (ICL) is fragile. Beyond the quantity and quality of examples, we show that the order in which in-context examples are listed in the prompt affects the output of the LLM and, consequently, its performance. While prior work has explored improving ICL through dataset-dependent techniques, we introduce OptiSeq, a purely inference-time, dataset-free optimization method that efficiently determines the best example order. OptiSeq leverages the log probabilities of LLM-generated outputs to systematically prune the search space of possible orderings and recommend the best order(s), distinguishing orderings that yield high accuracy from those that underperform. Extensive empirical evaluation on multiple LLMs, datasets, and prompts demonstrates that OptiSeq improves accuracy by 5.5 to 10.5 percentage points across multiple tasks.
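The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `log_prob_fn` is a hypothetical callable standing in for a real model query that returns the log probability the model assigns to its own generated output under a given example ordering, and the exhaustive enumeration here stands in for OptiSeq's actual pruning of the search space.

```python
from itertools import permutations

def best_orderings(examples, log_prob_fn, keep=1):
    """Rank every ordering of the in-context examples by the log
    probability the model assigns to its own output, and keep the
    top `keep` orderings. Exhaustive enumeration is used here for
    clarity; the paper's method prunes this search space instead."""
    scored = [(log_prob_fn(order), order) for order in permutations(examples)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [order for _, order in scored[:keep]]

# Toy stand-in for a real model call: pretend the model is most
# confident whenever example "b" appears first in the prompt.
def fake_log_prob(ordering):
    return 1.0 if ordering[0] == "b" else 0.0

print(best_orderings(["a", "b", "c"], fake_log_prob, keep=2))
```

In practice the scoring call is the expensive step, so pruning which orderings ever get scored, as OptiSeq does, is what makes the search tractable as the number of examples grows.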
Tyler Baldwin, Wyatt Clarke, et al.
Big Data 2022
Amit Alfassy, Assaf Arbelle, et al.
NeurIPS 2022
Avirup Saha, Prerna Agarwal, et al.
CODS-COMAD 2024