Content

Machine learning from human preferences provides mechanisms for capturing human feedback and using it to design reward functions that are otherwise difficult to specify quantitatively, e.g., for socio-technical applications such as algorithmic fairness and for many language and robotics tasks. Although learning from human preferences has become an increasingly important component of modern machine learning, credited with advancing the state of the art in language modeling and reinforcement learning, existing approaches are largely reinvented independently in each subfield, with few connections drawn among them.
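
To make the core idea concrete, the following is a minimal sketch of the simplest version of this pipeline: pairwise preferences are modeled with the Bradley-Terry (logit) choice model covered in Week 2, and a linear reward function is fit by maximizing the resulting log-likelihood. The sketch is illustrative only; the synthetic data, the linear reward, and all names in it are assumptions made for this example, not code from the assigned readings.

```python
# Illustrative sketch: fitting a reward function from pairwise human
# preferences under the Bradley-Terry model. Everything here (linear
# reward, synthetic "human", variable names) is an assumption for the
# example, not any paper's reference implementation.
import numpy as np

rng = np.random.default_rng(0)
d = 5                                  # feature dimension of an outcome
w_true = rng.normal(size=d)            # hidden reward that generates labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Simulate comparisons: outcome a is preferred to b with probability
# sigmoid(r(a) - r(b)), the Bradley-Terry / logit choice likelihood.
n = 2000
A = rng.normal(size=(n, d))            # features of outcome a in each pair
B = rng.normal(size=(n, d))            # features of outcome b in each pair
prefs = (rng.random(n) < sigmoid((A - B) @ w_true)).astype(float)

# Fit a linear reward r(x) = w . x by gradient ascent on the mean
# Bradley-Terry log-likelihood (a concave objective, so this converges).
w = np.zeros(d)
lr = 0.5
for _ in range(200):
    p = sigmoid((A - B) @ w)           # model's P(a preferred to b)
    w += lr * (A - B).T @ (prefs - p) / n

# Up to scale, the learned weights should align with the hidden reward.
cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
print(f"cosine(learned, true) = {cos:.3f}")
```

The same pairwise likelihood underlies reward modeling in the RLHF papers on the schedule; later topics replace pieces of this sketch, e.g., with neural reward models, actively selected queries, or likelihoods that account for human biases.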

This course covers the foundations of learning from human preferences from first principles and draws connections to the growing literature on the topic, including but not limited to the topics in the schedule below.

This is a graduate-level course. By the end of the course, students should be able to understand and implement state-of-the-art methods for learning from human feedback and be prepared to conduct research on these topics. Given how fast this area is growing, the course will consist of weekly lectures, presentations, and student-led discussions of papers. Students will compile course notes and complete a final course project. If you are a CS PhD student at Stanford, this course counts toward the breadth requirement for "Learning and Modeling" or "Human and Society".

Instructor

Course Assistant

Logistics


Schedule

The current class schedule is below (subject to change). A tentative reading list can be found here.


Date Description Course Materials Deadline
Week 1: Sep 27 [Lecture] Course Introduction. Recommended reading: None Deadline:
  1. Sign up for Presentation and Scribe
Week 2: Oct 2 [Lecture] Human preference models. Recommended reading:
  1. Train. Qualitative Choice Analysis: Theory, Econometrics, and an Application to Automobile Demand. MIT Press. 1985.
  2. McFadden, Train. Mixed MNL Models for Discrete Response. Journal of Applied Econometrics. 2000.
  3. Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley. 1959.
Additional reading:
  1. Ben-Akiva, Lerman. Discrete Choice Analysis: Theory and Application to Travel Demand. Transportation Studies. 1985.
  2. Park, Simar, Zelenyuk. Nonparametric Estimation of Dynamic Discrete Choice Models for Time Series Data. Computational Statistics & Data Analysis. 2017.
  3. Rafailov, Sharma, Mitchell, Ermon, Manning, Finn. Direct Preference Optimization: Your Language Model Is Secretly a Reward Model. Preprint. 2023.
Deadline:
  1. Pre-class survey
Week 2: Oct 4 [Student Presentation] Interaction models Recommended reading:
  1. Cattelan. Models for Paired Comparison Data: A Review with Emphasis on Dependent Data. Statistical Science. 2012.
  2. Bhatia, Pananjady, Bartlett, Dragan, Wainwright. Preference Learning Along Multiple Criteria: A Game-Theoretic Perspective. NeurIPS. 2020.
  3. Shah, Gundotra, Abbeel, Dragan. On the Feasibility of Learning, Rather Than Assuming, Human Biases for Reward Inference. ICML. 2019.
  4. Ghosal, Zurek, Brown, Dragan. The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types. AAAI. 2023.
Deadline:
  1. Presentation slides and presentation feedback for "Interaction models".
Week 3: Oct 9 [Fireside chat] Psychology and Marketing Perspectives: Noah Goodman, Jonathan Levav, S. Christian Wheeler Additional reading:
  1. Evangelidis, Levav, Simonson. The Upscaling Effect: How the Decision Context Influences Tradeoffs Between Desirability and Feasibility. Journal of Consumer Research. 2023.
  2. Evangelidis, Levav, Simonson. A Reexamination of the Impact of Decision Conflict on Choice Deferral. Management Science. 2023.
  3. Shennib, Catapano, Levav. Preference Reversals Between Digital and Physical Goods. ACR North American Advances. 2019.
  4. Tamkin, Handa, Shrestha, Goodman. Task Ambiguity in Humans and Language Models. arXiv. 2022.
  5. Hawkins, Berdahl, Pentland, Tenenbaum, Goodman, Krafft. Flexible Social Inference Facilitates Targeted Social Learning When Rewards Are Not Observable. Nature Human Behaviour. 2023.
  6. Yu, Goodman, Mu. Characterizing Tradeoffs Between Teaching via Language and Demonstrations in Multi-Agent Systems. arXiv. 2023.
Deadline: None
Week 3: Oct 11 [Student Presentation] Human biases and reward models Recommended reading:
  1. The Decision Lab. Biases Index. 2023.
  2. Slovic. The Construction of Preference. Shaping Entrepreneurship Research. 2020.
  3. Hogarth. Insights in Decision Making: A Tribute to Hillel J. Einhorn. University of Chicago Press. 1990.
  4. Cooke. Experts in Uncertainty: Opinion and Subjective Probability in Science. Oxford University Press. 1991.
  5. Chan, Critch, Dragan. Human Irrationality: Both Bad and Good for Reward Inference. arXiv. 2021.
  6. Bobu, Scobee, Fisac, Sastry, Dragan. Less is More: Rethinking Probabilistic Models of Human Behavior. ACM/IEEE International Conference on Human-Robot Interaction. 2020.
Deadline:
  1. Presentation slides and presentation feedback for "Human biases and reward models"
Week 4: Oct 16 [Student Presentation] Metric elicitation Recommended reading:
  1. Hiranandani, Boodaghians, Mehta, Koyejo. Performance Metric Elicitation from Pairwise Classifier Comparisons. AISTATS. 2019.
  2. Hiranandani, Boodaghians, Mehta, Koyejo. Multiclass Performance Metric Elicitation. NeurIPS. 2019.
  3. Hiranandani, Narasimhan, Koyejo. Fair Performance Metric Elicitation. NeurIPS. 2020.
  4. Hiranandani, Mathur, Narasimhan, Koyejo. Quadratic Metric Elicitation with Application to Fairness. UAI. 2022.
Additional reading:
  1. Ali, Upadhyay, Hiranandani, Glassman, Koyejo. Metric Elicitation: Moving from Theory to Practice. NeurIPS Workshop on Human-Centered AI (HCAI). 2022.
  2. Riabacke, Danielson, Ekenberg. State-of-the-Art Prescriptive Criteria Weight Elicitation. Advances in Decision Sciences. 2012.
Deadline:
  1. Presentation slides and presentation feedback for "Metric elicitation"
  2. Scribe for "Human preference models"
Week 4: Oct 18 [Student Presentation] Active learning Recommended reading:
  1. Cohn, Ghahramani, Jordan. Active Learning with Statistical Models. JAIR. 1996.
  2. Biyik, Sadigh. Batch Active Preference-Based Learning of Reward Functions. CoRL. 2018.
  3. Sadigh, Dragan, Sastry, Seshia. Active Preference-Based Learning of Reward Functions. RSS. 2017.
  4. Jamieson, Nowak. Active Ranking Using Pairwise Comparisons. NeurIPS. 2011.
  5. Holladay, Javdani, Dragan, Srinivasa. Active Comparison Based Learning Incorporating User Uncertainty and Noise. RSS Workshop on Model Learning for Human-Robot Communication. 2016.
Additional reading:
  1. Settles. Active Learning Literature Survey. University of Wisconsin-Madison. 2009.
Deadline:
  1. Presentation slides and presentation feedback for "Active learning".
  2. Scribe for "Interaction models".
Week 5: Oct 23 [Student Presentation] Bandits and Probabilistic Methods Recommended reading:
  1. Agarwal, Hsu, Kale, Langford, Li, Schapire. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. ICML. 2014.
  2. Bouneffouf, Rish, Aggarwal. Survey on Applications of Multi-Armed and Contextual Bandits. IEEE Congress on Evolutionary Computation (CEC). 2020.
  3. Sui, Zoghi, Hofmann, Yue. Advancements in Dueling Bandits. IJCAI. 2018.
  4. Yue, Broder, Kleinberg, Joachims. The K-Armed Dueling Bandits Problem. Journal of Computer and System Sciences. 2012.
Deadline:
  1. Proposal deadline
  2. Presentation slides and presentation feedback for "Bandits and Probabilistic Methods".
  3. Scribe feedback for "Human preference models"
Week 5: Oct 25 [Student Presentation] Multimodal rewards; Meta reward learning Recommended reading:
  1. Hejna, Sadigh. Few-Shot Preference Learning for Human-in-the-Loop RL. CoRL. 2023.
  2. Zhou, Jang, Kappler, Herzog, Khansari, Wohlhart, Bai, Kalakrishnan, Levine, Finn. Watch, Try, Learn: Meta-Learning from Demonstrations and Reward. arXiv. 2019.
  3. Myers, Bıyık, Anari, Sadigh. Learning Multimodal Rewards from Rankings. arXiv. 2021.
Deadline:
  1. Presentation slides for "Multimodal rewards; Meta reward learning"
  2. Scribe for "Human biases and reward models"
  3. Scribe feedback for "Interaction models"
Week 6: Oct 30 [Guest lecture] Pat Langley (Institute for the Study of Learning and Expertise): Human computing Recommended reading: None Deadline:
  1. Scribe for "Metric elicitation"
  2. Scribe rebuttal for "Human preference models"
Week 6: Nov 1 [Student Presentation] Alignment; Expert and non-expert stakeholders Recommended reading:
  1. Brown, Schneider, Dragan, Niekum. Value Alignment Verification. ICML. 2021.
  2. Bobu, Bajcsy, Fisac, Dragan. Learning under Misspecified Objective Spaces. CoRL. 2018.
  3. Jeon, Milli, Dragan. Reward-Rational (Implicit) Choice: A Unifying Formalism for Reward Learning. NeurIPS. 2020.
  4. Bobu, Peng, Agrawal, Shah, Dragan. Aligning Robot and Human Representations. arXiv. 2023.
Deadline:
  1. Presentation slides and presentation feedback for "Alignment; Expert and non-expert stakeholders"
  2. Scribe for "Active learning"
  3. Scribe feedback for "Human biases and reward models"
  4. Scribe rebuttal for "Interaction models"
Week 7: Nov 6 [Guest lecture] Meredith Ringel Morris (Google DeepMind): HCI considerations in learning from humans (Virtual) Recommended reading: None Deadline:
  1. Scribe for Pat Langley
  2. Scribe for "Bandits and Probabilistic Methods"
  3. Scribe feedback for "Metric elicitation"
Week 7: Nov 8 [Guest lecture] Vasilis Syrgkanis (Stanford): Truthfulness and mechanism design Recommended reading:
  1. Balcan, Sandholm, Vitercik. Tutorial on Mechanism Design. 2023.
  2. Roughgarden. Lectures 1 & 2 on the General Mechanism Design Problem and the Idea of Incentive Compatibility.
  3. Linstone, Turoff. The Delphi Method. Addison-Wesley. 1975.
  4. Prelec. A Bayesian Truth Serum for Subjective Data. Science. 2004.
Deadline:
  1. Scribe for "Multimodal rewards; Meta reward learning"
  2. Scribe feedback for "Active learning"
  3. Scribe rebuttal for "Human biases and reward models"
Week 8: Nov 13 [Guest lecture] Jason Hartline (Northwestern): Truthfulness and mechanism design Recommended reading:
  1. Schenk, Guittard. Crowdsourcing: What Can Be Outsourced to the Crowd, and Why? HAL Open Science. 2009.
  2. Quinn, Bederson. Human Computation: A Survey and Taxonomy of a Growing Field. SIGCHI Conference on Human Factors in Computing Systems. 2011.
  3. Kong. Dominantly Truthful Multi-Task Peer Prediction with a Constant Number of Tasks. ACM-SIAM Symposium on Discrete Algorithms. 2020.
  4. Kong, Schoenebeck. An Information Theoretic Framework for Designing Information Elicitation Mechanisms That Reward Truth-Telling. ACM Transactions on Economics and Computation. 2019.
Deadline:
  1. Scribe for Meredith Ringel Morris
  2. Scribe feedback for Pat Langley
  3. Scribe feedback for "Bandits and Probabilistic Methods"
  4. Scribe rebuttal for "Metric elicitation"
Week 8: Nov 15 [Guest lecture] Dorsa Sadigh (Stanford): Inverse reinforcement learning from human feedback for robotics Recommended reading:
  1. Ng, Russell. Algorithms for Inverse Reinforcement Learning. ICML. 2000.
  2. Hadfield-Menell, Russell, Abbeel, Dragan. Cooperative Inverse Reinforcement Learning. NeurIPS. 2016.
  3. Arora, Doshi. A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress. Artificial Intelligence. 2021.
  4. Hadfield-Menell, Milli, Abbeel, Russell, Dragan. Inverse Reward Design. NeurIPS. 2017.
  5. Shin, Dragan, Brown. Benchmarks and Algorithms for Offline Preference-Based Reward Learning. arXiv. 2023.
  6. Ghosal, Zurek, Brown, Dragan. The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types. AAAI. 2023.
  7. Bıyık, Losey, Palan, Landolfi, Shevchuk, Sadigh. Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences. The International Journal of Robotics Research. 2022.
Deadline:
  1. Scribe for "Alignment; Expert and non-expert stakeholders"
  2. Scribe for Vasilis Syrgkanis
  3. Scribe feedback for "Multimodal rewards; Meta reward learning"
  4. Scribe rebuttal for Pat Langley
  5. Scribe rebuttal for "Active learning"
Week 9: Nov 20 Thanksgiving Recess (no classes)
Week 9: Nov 22 Thanksgiving Recess (no classes)
Week 10: Nov 27 [Guest lecture] Diyi Yang (Stanford): Ethics and HCI Recommended reading:
  1. Busarovs. Ethical Aspects of Crowdsourcing, or Is It a Modern Form of Exploitation. International Journal of Economics & Business Administration. 2013.
  2. Denton, Díaz, Kivlichan, Prabhakaran, Rosen. Whose Ground Truth? Accounting for Individual and Collective Identities Underlying Dataset Annotation. arXiv. 2021.
Deadline:
  1. Project deadline
  2. Scribe for Jason Hartline
  3. Scribe feedback for Meredith Ringel Morris
  4. Scribe rebuttal for "Bandits and Probabilistic Methods"
Week 10: Nov 29 [Guest lecture] Nathan Lambert (HuggingFace): Reinforcement learning from human feedback for language models Recommended reading:
  1. Bansal, Dang, Grover. Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models. arXiv. 2023.
  2. Christiano, Leike, Brown, Martic, Legg, Amodei. Deep Reinforcement Learning from Human Preferences. NeurIPS. 2017.
  3. Ziegler, Stiennon, Wu, Brown, Radford, Amodei, Christiano, Irving. Fine-Tuning Language Models from Human Preferences. arXiv. 2019.
Deadline:
  1. Scribe for Dorsa Sadigh
  2. Scribe feedback for "Alignment; Expert and non-expert stakeholders"
  3. Scribe feedback for Vasilis Syrgkanis
  4. Scribe rebuttal for "Multimodal rewards; Meta reward learning"
  5. Scribe rebuttal for Meredith Ringel Morris
Week 11: Dec 4 [Lecture] Open Questions & Frontiers Recommended reading:
  1. Wirth, Akrour, Neumann, Fürnkranz. A Survey of Preference-Based Reinforcement Learning Methods. JMLR. 2017.
  2. Casper et al. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. arXiv. 2023.
Deadline:
  1. Scribe for Diyi Yang
  2. Scribe feedback for Jason Hartline
Week 11: Dec 6 Poster session Recommended reading: None Deadline:
  1. Scribe for Nathan Lambert
  2. Scribe feedback for Dorsa Sadigh
  3. Scribe rebuttal for "Alignment; Expert and non-expert stakeholders"
  4. Scribe rebuttal for Jason Hartline
  5. Scribe rebuttal for Vasilis Syrgkanis
Week 12: Dec 11 Final week: No class Deadline:
  1. Scribe rebuttal for Dorsa Sadigh
  2. Scribe feedback for Diyi Yang
  3. Scribe feedback for Nathan Lambert
Week 12: Dec 13 Final week: No class Deadline:
  1. Scribe rebuttal for Diyi Yang
  2. Scribe rebuttal for Nathan Lambert

Grading