Examples of tasks learned solely with SOLE-R1 rewards through online RL (starting from random policies)