Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback

Han Shao, Lee Cohen, Avrim Blum, Yishay Mansour, Aadirupa Saha, Matthew R. Walter

Research output: Contribution to journal › Conference article › peer-review

Abstract

In this work, we propose a multi-objective decision making framework that accommodates different user preferences over objectives, where preferences are learned via policy comparisons. Our model consists of a known Markov decision process with a vector-valued reward function, with each user having an unknown preference vector that expresses the relative importance of each objective. The goal is to efficiently compute a near-optimal policy for a given user. We consider two user feedback models. We first address the case where a user is provided with two policies and returns their preferred policy as feedback. We then move to a different user feedback model, where a user is instead provided with two small weighted sets of representative trajectories and selects the preferred one. In both cases, we suggest an algorithm that finds a nearly optimal policy for the user using a number of comparison queries that scales quasilinearly in the number of objectives.
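The first feedback model (the user is shown two policies and returns the preferred one) can be illustrated with a small sketch. This is not the authors' algorithm: it is a naive baseline whose query count grows with the number of candidate policies rather than quasilinearly in the number of objectives. It assumes, for illustration only, that each policy is summarized by a vector of per-objective values and that the user prefers the policy with the larger inner product with their hidden preference vector. The names `w_hidden`, `policy_values`, `compare`, and `best_by_comparisons` are hypothetical.

```python
def scalarized_value(w, v):
    """Inner product <w, v>; assumes the user's utility is linear in the objectives."""
    return sum(wi * vi for wi, vi in zip(w, v))


def compare(w_hidden, v_a, v_b):
    """Simulated user feedback: 0 if policy A is preferred, 1 otherwise (ties go to A).

    The learner only ever sees this 0/1 answer, never w_hidden itself.
    """
    return 0 if scalarized_value(w_hidden, v_a) >= scalarized_value(w_hidden, v_b) else 1


def best_by_comparisons(w_hidden, policy_values):
    """Naive sequential tournament over a finite candidate set.

    Uses len(policy_values) - 1 comparison queries; shown only to make the
    feedback model concrete, not as an efficient method.
    """
    best = 0
    queries = 0
    for i in range(1, len(policy_values)):
        queries += 1
        if compare(w_hidden, policy_values[best], policy_values[i]) == 1:
            best = i
    return best, queries


# Hypothetical example: 3 objectives, 4 candidate policies.
w_hidden = [0.2, 0.7, 0.1]       # unknown to the learner
policy_values = [
    [1.0, 0.0, 0.0],             # strong on objective 1 only
    [0.0, 1.0, 0.0],             # strong on objective 2 only
    [0.5, 0.5, 0.5],             # balanced
    [0.3, 1.0, 0.1],             # mostly objective 2, some of the others
]
best, queries = best_by_comparisons(w_hidden, policy_values)
```

The learner in this sketch never reads `w_hidden` directly; it learns about the user only through the answers returned by `compare`, which is the defining constraint of comparison-based elicitation.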

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 36
State: Published - 2023
Event: 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: 10 Dec 2023 → 16 Dec 2023

Funding

Funders (with funder number where given):
Yandex Initiative for Machine Learning
Eric and Wendy Schmidt Fund
Tel Aviv University
European Research Council
Horizon 2020: 882396
Defense Advanced Research Projects Agency: HR00112020003
National Science Foundation: CCF-2212968, ECCS-2216899
Israel Science Foundation: 993/17
