Focused Skill Discovery

Learning to Control Specific State Variables while Minimizing Side Effects

¹McGill University and Mila, ²MIT, ³UC Berkeley
Reinforcement Learning Conference (RLC) 2025

Abstract

Skills are essential for unlocking higher levels of problem solving. A common approach to discovering these skills is to learn ones that reliably reach different states, thus empowering the agent to control its environment. However, existing skill discovery algorithms often overlook the natural state variables present in many reinforcement learning problems, so the discovered skills lack control over specific state variables. This can significantly hamper exploration efficiency, make skills more challenging to learn with, and lead to negative side effects in downstream tasks when the goal is under-specified. We introduce a general method that enables these skill discovery algorithms to learn focused skills: skills that target and control specific state variables. Our approach improves state space coverage by a factor of three, unlocks new learning capabilities, and automatically avoids negative side effects in downstream tasks.
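The abstract's core idea, learning skills that change a targeted state variable while minimizing side effects on the others, can be illustrated with a toy reward-shaping sketch. This is only an illustrative assumption about how such an objective might look: the function name, the L1 side-effect penalty, and the `side_effect_coef` weight are all hypothetical and not taken from the paper.

```python
import numpy as np

def focused_skill_reward(s, s_next, target_idx, base_reward, side_effect_coef=1.0):
    """Illustrative sketch (not the paper's actual objective): keep the base
    skill-discovery reward for the targeted state variable, and penalize
    changes to every other ("off-target") state variable."""
    s = np.asarray(s, dtype=float)
    s_next = np.asarray(s_next, dtype=float)
    # Mask selecting all state variables except the one the skill targets.
    off_target = np.ones(len(s), dtype=bool)
    off_target[target_idx] = False
    # Total (L1) change in off-target variables = side effects of the transition.
    side_effects = np.abs(s_next - s)[off_target].sum()
    return base_reward - side_effect_coef * side_effects

# A skill that only moves its target variable keeps its full reward:
print(focused_skill_reward([0, 0, 0], [1, 0, 0], target_idx=0, base_reward=1.0))  # 1.0
# A skill that also disturbs other variables is penalized:
print(focused_skill_reward([0, 0, 0], [1, 2, 0], target_idx=0, base_reward=1.0))  # -1.0
```

Under this kind of penalty, maximizing return pushes the policy toward trajectories that leave non-targeted state variables untouched, which is one simple way to read the paper's "minimizing side effects" criterion.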

Examples

FourRooms

Skill Trajectories in FourRooms.

ForageWorld

Skill Trajectories in ForageWorld.

MudWorld

Skill Trajectories in MudWorld.

Exploration Efficiency


Exploration efficiency in the FourRooms environment, as measured by the number of unique states visited for a given number of skill executions. Focused skills explore three times more efficiently than unfocused skills, as measured by the Area Under the Curve (AUC). DUSDi is not applicable to Lipschitz-constrained Skill Discovery (LSD).

Performance in Downstream Tasks


Learning performance in downstream tasks. Focused skills (solid lines) lead to faster learning and are the only ones that can accomplish the task in MudWorld.

Performance with Underspecified Goals


Task performance in ForageWorld and MudWorld when agents are trained with a proxy reward instead of the true reward. Focused skills (solid lines) are the only ones that maximize the true return when given only the proxy reward, meaning they alone automatically avoid unwanted changes that are not explicit in the agent's goal.

BibTeX

@article{carr2025focused,
    title={Focused skill discovery: {L}earning to control specific state variables while minimizing side effects},
    author={Cola{\c{c}}o Carr, Jonathan and Sun, Qinyi and Allen, Cameron},
    journal={Reinforcement Learning Journal},
    volume={6},
    pages={2585--2599},
    year={2025}
}