B-T simplification of Christiano preference learning - Pastebin.com

Posted by a guest, May 16th, 2019

Playing with GPT-2 for various things (mostly poetry: https://www.gwern.net/GPT-2 ), I've been thinking about the potential for preference learning and I think the original architecture can be simplified & improved. The motivation for the double-critic architecture is that the data being collected from humans is pairwise, and so one trains the critic to predict c

1 mention: @gwern
Date: 2019/05/16 15:47

Referring Tweets

@gwern Another idea: simplify by dropping the comparison part from the NN architecture, which complicates it considerably and makes it harder to use for anything else. A simple Bradley-Terry model can produce real cardinal values to train the critic w/regression: https://t.co/BvumwIauF8
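
The idea in the tweet, fitting a Bradley-Terry model to pairwise human choices and using the resulting cardinal scores as regression targets for a critic, can be sketched roughly as follows. This is a minimal illustration and not the paste's actual code: the function names `fit_bradley_terry` and `win_probability`, the minorize-maximize fitting loop, and the `prior` pseudo-count (virtual wins and losses against a phantom item of strength 1, added so items that never win or never lose still get finite scores) are all assumptions made for the example.

```python
import math
from collections import defaultdict


def fit_bradley_terry(comparisons, n_items, iters=500, prior=0.5):
    """Fit Bradley-Terry scores from (winner, loser) index pairs.

    Returns one log-strength per item, centered to mean zero.
    `prior` is a hypothetical regularizer: each item gets `prior` virtual
    wins and `prior` virtual losses against a phantom item of strength 1.
    Keep it above zero so every update stays finite.
    """
    wins = [0.0] * n_items
    pair_counts = defaultdict(float)  # (i, j) -> number of times i met j
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pair_counts[(winner, loser)] += 1.0
        pair_counts[(loser, winner)] += 1.0

    opponents = defaultdict(list)  # i -> [(j, games between i and j)]
    for (i, j), count in pair_counts.items():
        opponents[i].append((j, count))

    strength = [1.0] * n_items
    for _ in range(iters):
        updated = []
        for i in range(n_items):
            # Games against the phantom item: 2 * prior in total.
            denom = 2.0 * prior / (strength[i] + 1.0)
            for j, count in opponents[i]:
                denom += count / (strength[i] + strength[j])
            updated.append((wins[i] + prior) / denom)
        strength = updated

    log_strength = [math.log(s) for s in strength]
    mean = sum(log_strength) / n_items
    return [s - mean for s in log_strength]


def win_probability(score_a, score_b):
    """Modeled probability that item a is preferred over item b."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))


if __name__ == "__main__":
    # Toy data: sample 0 usually beats 1, and 1 usually beats 2.
    data = [(0, 1)] * 3 + [(1, 0)] + [(1, 2)] * 3 + [(2, 1)] + [(0, 2)] * 2
    scores = fit_bradley_terry(data, n_items=3)
    print(scores)  # one real-valued score per sample
```

Under these assumptions, each rated sample ends up with a single real-valued score, so a critic could be trained by ordinary regression on (sample, score) pairs rather than taking two samples as input and predicting which one wins.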