Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generalization, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states. P

Date: 2021/01/13 00:52

@agarwl_ Yayy, my first spotlight! ICLR'21 AC: "The reviewers unanimously praised the work in terms of theory, algorithm and empirical evaluation. This is a novel and technically deep contribution that advances the SOTA for RL generalization." #tweeprint soon.
@agarwl_ @bucketofkets @rico_jski [Self-Plug] We do have a contrastive method (although with a bit of the sequential aspect of RL baked in) that performs quite well on this benchmark.

