IN-RIL

Imitation or reinforcement learning for policy fine-tuning? You may need both! Check out IN-RIL, which interleaves the two objectives for better sample efficiency and stability.
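
To make "interleaving" concrete, here is a minimal sketch of alternating between two kinds of updates on a fixed schedule. The 3:1 ratio, the function names `interleaved_schedule` and `fine_tune`, and the `il_update` / `rl_update` callbacks are all hypothetical placeholders; the text above does not say how IN-RIL schedules or combines the two objectives, so this only illustrates the general idea.

```python
from itertools import islice
from typing import Callable, Dict, Iterator


def interleaved_schedule(il_steps: int, rl_steps: int) -> Iterator[str]:
    """Yield "il" or "rl" forever: il_steps imitation updates, then rl_steps RL updates.

    The fixed ratio is an assumption made for illustration only.
    """
    if il_steps < 1 or rl_steps < 1:
        raise ValueError("both step counts must be at least 1")
    while True:
        for _ in range(il_steps):
            yield "il"
        for _ in range(rl_steps):
            yield "rl"


def fine_tune(
    total_updates: int,
    il_update: Callable[[], None],  # hypothetical: one imitation-loss gradient step
    rl_update: Callable[[], None],  # hypothetical: one RL-objective gradient step
    il_steps: int = 3,
    rl_steps: int = 1,
) -> Dict[str, int]:
    """Run total_updates updates, switching objective according to the schedule."""
    counts = {"il": 0, "rl": 0}
    for kind in islice(interleaved_schedule(il_steps, rl_steps), total_updates):
        if kind == "il":
            il_update()
        else:
            rl_update()
        counts[kind] += 1
    return counts
```

In practice the two callbacks would wrap whatever optimizer steps a given training setup uses, for example `fine_tune(8, il_update=step_bc, rl_update=step_rl)` with user-supplied `step_bc` and `step_rl`.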