The paper by Christiano et al. (2017) is titled "Deep reinforcement learning from human preferences" and was published at NeurIPS 2017. arXiv:
https://arxiv.org/abs/1706.03741 ; GitHub:
https://github.com/mrahtz/learning-from-human-preferences (implemented in TensorFlow).