Content

Machine learning from human preferences provides mechanisms for capturing human feedback and using it to design reward functions that are otherwise difficult to specify quantitatively, e.g., for socio-technical applications such as algorithmic fairness and for many language and robotics tasks. Although learning from human preferences has become an increasingly important component of modern machine learning, credited with advancing the state of the art in language modeling and reinforcement learning, existing approaches are largely reinvented independently in each subfield, with few connections drawn among them.
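
To make the core idea concrete, the following is a minimal sketch of the simplest version of this pipeline: pairwise preferences are modeled with the Bradley-Terry (logit) choice model covered in Week 2, and a linear reward function is fit by maximizing the resulting log-likelihood. The sketch is illustrative only; the synthetic data, the linear reward, and all names in it are assumptions made for this example, not code from the assigned readings.

```python
# Illustrative sketch: fitting a reward function from pairwise human
# preferences under the Bradley-Terry model. Everything here (linear
# reward, synthetic "human", variable names) is an assumption for the
# example, not any paper's reference implementation.
import numpy as np

rng = np.random.default_rng(0)
d = 5                                  # feature dimension of an outcome
w_true = rng.normal(size=d)            # hidden reward that generates labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Simulate comparisons: outcome a is preferred to b with probability
# sigmoid(r(a) - r(b)), the Bradley-Terry / logit choice likelihood.
n = 2000
A = rng.normal(size=(n, d))            # features of outcome a in each pair
B = rng.normal(size=(n, d))            # features of outcome b in each pair
prefs = (rng.random(n) < sigmoid((A - B) @ w_true)).astype(float)

# Fit a linear reward r(x) = w . x by gradient ascent on the mean
# Bradley-Terry log-likelihood (a concave objective, so this converges).
w = np.zeros(d)
lr = 0.5
for _ in range(200):
    p = sigmoid((A - B) @ w)           # model's P(a preferred to b)
    w += lr * (A - B).T @ (prefs - p) / n

# Up to scale, the learned weights should align with the hidden reward.
cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
print(f"cosine(learned, true) = {cos:.3f}")
```

The same pairwise likelihood underlies reward modeling in the RLHF papers on the schedule; later topics replace pieces of this sketch, e.g., with neural reward models, actively selected queries, or likelihoods that account for human biases.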

This course covers the foundations of learning from human preferences from first principles and draws connections to the growing literature on the topic, including but not limited to the topics in the schedule below.

This is a graduate-level course. By the end of the course, students should be able to understand and implement state-of-the-art methods for learning from human feedback and be prepared to conduct research on these topics. Given how fast this area is growing, the course will consist of weekly lectures, presentations, and student-led discussions of papers. Students will compile course notes and complete a final course project. If you are a CS PhD student at Stanford, this course counts toward the breadth requirement for "Learning and Modeling" or "Human and Society".

Instructor

Course Assistant

Logistics


Schedule

The current class schedule is below (subject to change). A tentative reading list can be found here.


Date Description Course Materials Deadline
Week 1: Sep 27 [Lecture] Course Introduction. Recommended reading: None Deadline:
  1. Sign up for Presentation and Scribe
Week 2: Oct 2 [Lecture] Human preference models. Recommended reading:
  1. Train. Qualitative Choice Analysis: Theory, Econometrics, and an Application to Automobile Demand. MIT Press. 1985.
  2. McFadden, Train. Mixed MNL Models for Discrete Response. Journal of Applied Econometrics. 2000.
  3. Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley. 1959.
Additional reading:
  1. Ben-Akiva, Lerman. Discrete Choice Analysis: Theory and Application to Travel Demand. Transportation Studies. 1985.
  2. Park, Simar, Zelenyuk. Nonparametric Estimation of Dynamic Discrete Choice Models for Time Series Data. Computational Statistics & Data Analysis. 2017.
  3. Rafailov, Sharma, Mitchell, Ermon, Manning, Finn. Direct Preference Optimization: Your Language Model Is Secretly a Reward Model. Preprint. 2023.
Deadline:
  1. Pre-class survey
Week 2: Oct 4 [Student Presentation] Interaction models Recommended reading:
  1. Cattelan. Models for Paired Comparison Data: A Review with Emphasis on Dependent Data. Statistical Science. 2012.
  2. Bhatia, Pananjady, Bartlett, Dragan, Wainwright. Preference Learning Along Multiple Criteria: A Game-Theoretic Perspective. NeurIPS. 2020.
  3. Shah, Gundotra, Abbeel, Dragan. On the Feasibility of Learning, Rather Than Assuming, Human Biases for Reward Inference. ICML. 2019.
  4. Ghosal, Zurek, Brown, Dragan. The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types. AAAI. 2023.
Deadline:
  1. Presentation slides and presentation feedback for "Interaction models".
Week 3: Oct 9 [Fireside chat] Psychology and Marketing Perspectives: Noah Goodman, Jonathan Levav, S. Christian Wheeler Additional reading:
  1. Evangelidis, Levav, Simonson. The Upscaling Effect: How the Decision Context Influences Tradeoffs Between Desirability and Feasibility. Journal of Consumer Research. 2023.
  2. Evangelidis, Levav, Simonson. A Reexamination of the Impact of Decision Conflict on Choice Deferral. Management Science. 2023.
  3. Shennib, Catapano, Levav. Preference Reversals Between Digital and Physical Goods. ACR North American Advances. 2019.
  4. Tamkin, Handa, Shrestha, Goodman. Task Ambiguity in Humans and Language Models. arXiv. 2022.
  5. Hawkins, Berdahl, Pentland, Tenenbaum, Goodman, Krafft. Flexible Social Inference Facilitates Targeted Social Learning When Rewards Are Not Observable. Nature Human Behaviour. 2023.
  6. Yu, Goodman, Mu. Characterizing Tradeoffs Between Teaching via Language and Demonstrations in Multi-Agent Systems. arXiv. 2023.
Deadline: None
Week 3: Oct 11 [Student Presentation] Human biases and reward models Recommended reading:
  1. The Decision Lab. Biases Index. 2023.
  2. Slovic. The Construction of Preference. Shaping Entrepreneurship Research. 2020.
  3. Hogarth. Insights in Decision Making: A Tribute to Hillel J. Einhorn. University of Chicago Press. 1990.
  4. Cooke. Experts in Uncertainty: Opinion and Subjective Probability in Science. Oxford University Press. 1991.
  5. Chan, Critch, Dragan. Human Irrationality: Both Bad and Good for Reward Inference. arXiv. 2021.
  6. Bobu, Scobee, Fisac, Sastry, Dragan. Less is More: Rethinking Probabilistic Models of Human Behavior. ACM/IEEE International Conference on Human-Robot Interaction. 2020.
Deadline:
  1. Presentation slides and presentation feedback for "Human biases and reward models"
Week 4: Oct 16 [Student Presentation] Metric elicitation Recommended reading:
  1. Hiranandani, Boodaghians, Mehta, Koyejo. Performance Metric Elicitation from Pairwise Classifier Comparisons. AISTATS. 2019.
  2. Hiranandani, Boodaghians, Mehta, Koyejo. Multiclass Performance Metric Elicitation. NeurIPS. 2019.
  3. Hiranandani, Narasimhan, Koyejo. Fair Performance Metric Elicitation. NeurIPS. 2020.
  4. Hiranandani, Mathur, Narasimhan, Koyejo. Quadratic Metric Elicitation with Application to Fairness. UAI. 2022.
Additional reading:
  1. Ali, Upadhyay, Hiranandani, Glassman, Koyejo. Metric Elicitation: Moving from Theory to Practice. NeurIPS Workshop on Human-Centered AI (HCAI). 2022.
  2. Riabacke, Danielson, Ekenberg. State-of-the-Art Prescriptive Criteria Weight Elicitation. Advances in Decision Sciences. 2012.
Deadline:
  1. Presentation slides and presentation feedback for "Metric elicitation"
  2. Scribe for "Human preference models"
Week 4: Oct 18 [Student Presentation] Active learning Recommended reading:
  1. Cohn, Ghahramani, Jordan. Active Learning with Statistical Models. JAIR. 1996.
  2. Biyik, Sadigh. Batch Active Preference-Based Learning of Reward Functions. CoRL. 2018.
  3. Sadigh, Dragan, Sastry, Seshia. Active Preference-Based Learning of Reward Functions. RSS. 2017.
  4. Jamieson, Nowak. Active Ranking Using Pairwise Comparisons. NeurIPS. 2011.
  5. Holladay, Javdani, Dragan, Srinivasa. Active Comparison Based Learning Incorporating User Uncertainty and Noise. RSS Workshop on Model Learning for Human-Robot Communication. 2016.
Additional reading:
  1. Settles. Active Learning Literature Survey. University of Wisconsin-Madison. 2009.
Deadline:
  1. Presentation slides and presentation feedback for "Active learning".
  2. Scribe for "Interaction models".
Week 5: Oct 23 [Student Presentation] Bandits and Probabilistic Methods Recommended reading:
  1. Agarwal, Hsu, Kale, Langford, Li, Schapire. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. ICML. 2014.
  2. Bouneffouf, Rish, Aggarwal. Survey on Applications of Multi-Armed and Contextual Bandits. IEEE Congress on Evolutionary Computation (CEC). 2020.
  3. Sui, Zoghi, Hofmann, Yue. Advancements in Dueling Bandits. IJCAI. 2018.
  4. Yue, Broder, Kleinberg, Joachims. The K-Armed Dueling Bandits Problem. Journal of Computer and System Sciences. 2012.
Deadline:
  1. Proposal deadline
  2. Presentation slides and presentation feedback for "Bandits and Probabilistic Methods".
  3. Scribe feedback for "Human preference models"
Week 5: Oct 25 [Student Presentation] Multimodal rewards; Meta reward learning Recommended reading:
  1. Hejna, Sadigh. Few-Shot Preference Learning for Human-in-the-Loop RL. CoRL. 2023.
  2. Zhou, Jang, Kappler, Herzog, Khansari, Wohlhart, Bai, Kalakrishnan, Levine, Finn. Watch, Try, Learn: Meta-Learning from Demonstrations and Reward. arXiv. 2019.
  3. Myers, Bıyık, Anari, Sadigh. Learning Multimodal Rewards from Rankings. arXiv. 2021.
Deadline:
  1. Presentation slides for "Multimodal rewards; Meta reward learning"
  2. Scribe for "Human biases and reward models"
  3. Scribe feedback for "Interaction models"
Week 6: Oct 30 [Guest lecture] Pat Langley (Institute for the Study of Learning and Expertise): Human computing Recommended reading: None Deadline:
  1. Scribe for "Metric elicitation"
  2. Scribe rebuttal for "Human preference models"
Week 6: Nov 1 [Student Presentation] Alignment; Expert and non-expert stakeholders Recommended reading:
  1. Brown, Schneider, Dragan, Niekum. Value Alignment Verification. ICML. 2021.
  2. Bobu, Bajcsy, Fisac, Dragan. Learning under Misspecified Objective Spaces. CoRL. 2018.
  3. Jeon, Milli, Dragan. Reward-Rational (Implicit) Choice: A Unifying Formalism for Reward Learning. NeurIPS. 2020.
  4. Bobu, Peng, Agrawal, Shah, Dragan. Aligning Robot and Human Representations. arXiv. 2023.
Deadline:
  1. Presentation slides and presentation feedback for "Alignment; Expert and non-expert stakeholders"
  2. Scribe for "Active learning"
  3. Scribe feedback for "Human biases and reward models"
  4. Scribe rebuttal for "Interaction models"
Week 7: Nov 6 [Guest lecture] Meredith Ringel Morris (Google DeepMind): HCI considerations in learning from humans (Virtual) Recommended reading: None Deadline:
  1. Scribe for Pat Langley
  2. Scribe for "Bandits and Probabilistic Methods"
  3. Scribe feedback for "Metric elicitation"
Week 7: Nov 8 [Guest lecture] Vasilis Syrgkanis (Stanford): Truthfulness and mechanism design Recommended reading:
  1. Balcan, Sandholm, Vitercik. Tutorial on Mechanism Design. 2023.
  2. Roughgarden. Lectures 1 & 2 on the General Mechanism Design Problem and the Idea of Incentive Compatibility.
  3. Linstone, Turoff. The Delphi Method. Addison-Wesley. 1975.
  4. Prelec. A Bayesian Truth Serum for Subjective Data. Science. 2004.
Deadline:
  1. Scribe for "Multimodal rewards; Meta reward learning"
  2. Scribe feedback for "Active learning"
  3. Scribe rebuttal for "Human biases and reward models"
Week 8: Nov 13 [Guest lecture] Jason Hartline (Northwestern): Truthfulness and mechanism design Recommended reading:
  1. Schenk, Guittard. Crowdsourcing: What Can Be Outsourced to the Crowd, and Why? HAL Open Science. 2009.
  2. Quinn, Bederson. Human Computation: A Survey and Taxonomy of a Growing Field. SIGCHI Conference on Human Factors in Computing Systems. 2011.
  3. Kong. Dominantly Truthful Multi-Task Peer Prediction with a Constant Number of Tasks. ACM-SIAM Symposium on Discrete Algorithms. 2020.
  4. Kong, Schoenebeck. An Information Theoretic Framework for Designing Information Elicitation Mechanisms That Reward Truth-Telling. ACM Transactions on Economics and Computation. 2019.
Deadline:
  1. Scribe for Meredith Ringel Morris
  2. Scribe feedback for Pat Langley
  3. Scribe feedback for "Bandits and Probabilistic Methods"
  4. Scribe rebuttal for "Metric elicitation"
Week 8: Nov 15 [Guest lecture] Dorsa Sadigh (Stanford): Inverse reinforcement learning from human feedback for robotics Recommended reading:
  1. Ng, Russell. Algorithms for Inverse Reinforcement Learning. ICML. 2000.
  2. Hadfield-Menell, Russell, Abbeel, Dragan. Cooperative Inverse Reinforcement Learning. NeurIPS. 2016.
  3. Arora, Doshi. A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress. Artificial Intelligence. 2021.
  4. Hadfield-Menell, Milli, Abbeel, Russell, Dragan. Inverse Reward Design. NeurIPS. 2017.
  5. Shin, Dragan, Brown. Benchmarks and Algorithms for Offline Preference-Based Reward Learning. arXiv. 2023.
  6. Ghosal, Zurek, Brown, Dragan. The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types. AAAI. 2023.
  7. Bıyık, Losey, Palan, Landolfi, Shevchuk, Sadigh. Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences. The International Journal of Robotics Research. 2022.
Deadline:
  1. Scribe for "Alignment; Expert and non-expert stakeholders"
  2. Scribe for Vasilis Syrgkanis
  3. Scribe feedback for "Multimodal rewards; Meta reward learning"
  4. Scribe rebuttal for Pat Langley
  5. Scribe rebuttal for "Active learning"
Week 9: Nov 20 Thanksgiving Recess (no classes)
Week 9: Nov 22 Thanksgiving Recess (no classes)
Week 10: Nov 27 [Guest lecture] Diyi Yang (Stanford): Ethics and HCI Recommended reading:
  1. Busarovs. Ethical Aspects of Crowdsourcing, or Is It a Modern Form of Exploitation. International Journal of Economics & Business Administration. 2013.
  2. Denton, Díaz, Kivlichan, Prabhakaran, Rosen. Whose Ground Truth? Accounting for Individual and Collective Identities Underlying Dataset Annotation. arXiv. 2021.
Deadline:
  1. Project deadline
  2. Scribe for Jason Hartline
  3. Scribe feedback for Meredith Ringel Morris
  4. Scribe rebuttal for "Bandits and Probabilistic Methods"
Week 10: Nov 29 [Guest lecture] Nathan Lambert (HuggingFace): Reinforcement learning from human feedback for language models Recommended reading:
  1. Bansal, Dang, Grover. Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models. arXiv. 2023.
  2. Christiano, Leike, Brown, Martic, Legg, Amodei. Deep Reinforcement Learning from Human Preferences. NeurIPS. 2017.
  3. Ziegler, Stiennon, Wu, Brown, Radford, Amodei, Christiano, Irving. Fine-Tuning Language Models from Human Preferences. arXiv. 2019.
Deadline:
  1. Scribe for Dorsa Sadigh
  2. Scribe feedback for "Alignment; Expert and non-expert stakeholders"
  3. Scribe feedback for Vasilis Syrgkanis
  4. Scribe rebuttal for "Multimodal rewards; Meta reward learning"
  5. Scribe rebuttal for Meredith Ringel Morris
Week 11: Dec 4 [Lecture] Open Questions & Frontiers Recommended reading:
  1. Wirth, Akrour, Neumann, Fürnkranz. A Survey of Preference-Based Reinforcement Learning Methods. JMLR. 2017.
  2. Casper et al. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. arXiv. 2023.
Deadline:
  1. Scribe for Diyi Yang
  2. Scribe feedback for Jason Hartline
Week 11: Dec 6 Poster session Recommended reading: None Deadline:
  1. Scribe for Nathan Lambert
  2. Scribe feedback for Dorsa Sadigh
  3. Scribe rebuttal for "Alignment; Expert and non-expert stakeholders"
  4. Scribe rebuttal for Jason Hartline
  5. Scribe rebuttal for Vasilis Syrgkanis
Week 12: Dec 11 Final week: No class Deadline:
  1. Scribe rebuttal for Dorsa Sadigh
  2. Scribe feedback for Diyi Yang
  3. Scribe feedback for Nathan Lambert
Week 12: Dec 13 Final week: No class Deadline:
  1. Scribe rebuttal for Diyi Yang
  2. Scribe rebuttal for Nathan Lambert

Grading