10 changes: 5 additions & 5 deletions skyrl-train/docs/examples/mini_swe_agent.rst
@@ -56,10 +56,10 @@ By running this workflow as a Ray task, we are also able to scale up generation
 model = get_model(litellm_model_name, sweagent_config.get("model", {}))
 error = None
 try:
-    env = get_sb_environment(sweagent_config, instance, data_source)
-    agent = DefaultAgent(model, env, **sweagent_config.get("agent", {}))
-    exit_status, model_patch = agent.run(instance["problem_statement"])
-    eval_result = evaluate_trajectory(instance, model_patch, sweagent_config, data_source)
+    env = get_sb_environment(sweagent_config, instance, data_source)
+    agent = DefaultAgent(model, env, **sweagent_config.get("agent", {}))
+    exit_status, model_patch = agent.run(instance["problem_statement"])
+    eval_result = evaluate_trajectory(instance, model_patch, sweagent_config, data_source)
 except Exception as e:
     error = str(e)
 return agent.messages, eval_result, error
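
Review note: as excerpted in this hunk, the final `return` reads `agent` and `eval_result`, which are bound only inside the `try`. If `get_sb_environment` raised, the `return` would itself fail with `UnboundLocalError` instead of reporting the error. The full source may already guard against this. The following is only a minimal sketch of one way to pre-bind the names; `run_with_safe_defaults`, `make_env`, `make_agent`, and `evaluate` are hypothetical stand-ins for the real helpers, not part of SkyRL or mini-swe-agent.

```python
from typing import Any, Callable, List, Optional, Tuple


def run_with_safe_defaults(
    make_env: Callable[[], Any],
    make_agent: Callable[[Any], Any],
    evaluate: Callable[[str], Any],
    problem_statement: str,
) -> Tuple[List[dict], Optional[Any], Optional[str]]:
    """Hypothetical variant of the documented workflow with names bound before `try`."""
    agent = None        # bound up front so the return below cannot fail
    eval_result = None  # stays None if evaluation never runs
    error = None
    try:
        env = make_env()
        agent = make_agent(env)
        _exit_status, model_patch = agent.run(problem_statement)
        eval_result = evaluate(model_patch)
    except Exception as e:
        error = str(e)
    # Fall back to an empty message list when the agent was never created.
    messages = agent.messages if agent is not None else []
    return messages, eval_result, error
```

With this shape, a failed environment setup yields an empty message list, which is the case the `if not len(messages)` check in the `mini_swe_generator.py` hunk appears to handle.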
@@ -90,7 +90,7 @@ Training

 Prerequisites: Ensure that you have the required environment backend installed for generating trajectories with Mini-SWE-Agent. By default, we use `Podman <https://podman.io/docs>`_. This can be modified in :code_link:`examples/mini_swe_agent/swebench.yaml`

-We provide two example scripts: One for Qwen3-8B model and another for the `Qwen/Qwen3-Coder-30B-A3B-Instruct <https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct>` model. While the first script for Qwen3-8B requires a single 8xH100 node, the script for the 30B model requires 2 8xH100 nodes for training.
+We provide two example scripts: One for Qwen3-8B model and another for the `Qwen/Qwen3-Coder-30B-A3B-Instruct <https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct>`_ model. While the first script for Qwen3-8B requires a single 8xH100 node, the script for the 30B model requires 2 8xH100 nodes for training.

 .. code-block:: bash

5 changes: 2 additions & 3 deletions skyrl-train/examples/mini_swe_agent/mini_swe_generator.py
@@ -123,10 +123,9 @@ async def minisweagent_agent_loop(
     ) -> Tuple[List[int], float, str, List[int], List[int], Optional[List[int]]]:

         sweagent_config = yaml.safe_load(get_config_path(self.generator_cfg.miniswe_config_path).read_text())
-        instance: Dict[str, Dict[str, Any]] = env_extras["instance"]
         # NOTE (sumanthrh): Input `prompt` is not used here because mini-swe-agent uses a similar entry from the `instance` obj
         messages, reward, error = await init_and_run.remote(
-            instance,
+            env_extras["instance"],
             self.litellm_model_name,
             sweagent_config,
             self.generator_cfg,
@@ -136,7 +135,7 @@ async def minisweagent_agent_loop(
         if not len(messages):
             return None, None, None, None, None, None

-        # TODO (sumanthrh):This is currently hardcoded for SWEBench with 2 initial messages (system and user).
+        # TODO (sumanthrh): This is currently hardcoded for SWEBench with 2 initial messages (system and user).
         response_messages = messages[2:]

         for message in messages[:2]:
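
Review note on the TODO above: the split at index 2 assumes exactly one system and one user message before the first model turn. One possible way to drop the hardcoded index is sketched below. It assumes each message is a dict with a `"role"` key in the usual chat format and that the first `"assistant"` message marks the start of the response. Both are assumptions, and `split_initial_messages` is a hypothetical helper, not existing SkyRL code.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def split_initial_messages(messages: List[Message]) -> Tuple[List[Message], List[Message]]:
    """Split at the first assistant turn instead of a fixed index (hypothetical helper)."""
    for i, message in enumerate(messages):
        if message.get("role") == "assistant":
            # Everything before the first assistant turn is the initial prompt.
            return messages[:i], messages[i:]
    # No assistant turn found: treat the whole list as the prompt.
    return messages, []
```

For the SWEBench layout described in the TODO (system, user, then model turns), this returns the same two slices as `messages[:2]` and `messages[2:]`.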
3 changes: 1 addition & 2 deletions skyrl-train/examples/mini_swe_agent/mini_swe_utils.py
@@ -53,9 +53,8 @@ def get_docker_image_name(instance: dict, data_source: str) -> str:
 def evaluate_trajectory(
     instance: Dict[str, Any], model_patch: str, sweagent_config: dict, data_source: str
 ) -> MiniSWEEvaluationResult:
-    instance_id = instance["instance_id"]

-    ret = MiniSWEEvaluationResult(instance_id=instance_id, resolved=False, eval_error=None)
+    ret = MiniSWEEvaluationResult(instance_id=instance["instance_id"], resolved=False, eval_error=None)

     env = None
     try:
4 changes: 3 additions & 1 deletion skyrl-train/examples/mini_swe_agent/run_mini_swe_30B.sh
@@ -9,7 +9,9 @@ set -x
 DATA_DIR="$DATA/data/swe_gym_subset"

 CKPT_PATH="$DATA/ckpts/llm_mini_swe"
-# save trajectories here for debugging.
+
+# Save trajectories here for debugging.
+# NOTE: For a multi-node cluster, ensure that this is on NFS so that you can save all trajectories in the same path
 MINISWE_TRAJ_DIR="$HOME/mini_swe_agent_trajs_32B"

 NUM_GPUS=8
4 changes: 3 additions & 1 deletion skyrl-train/examples/mini_swe_agent/run_mini_swe_8B.sh
@@ -7,7 +7,9 @@ set -x

 DATA_DIR="$HOME/data/swe_gym_subset"
 CKPT_PATH="$HOME/ckpts/llm_mini_swe"
-# save trajectories here
+
+# Save trajectories here for debugging
+# NOTE: For a multi-node cluster, ensure that this is on NFS so that you can save all trajectories in the same path
 MINISWE_TRAJ_DIR="$HOME/mini_swe_agent_trajs"

 NUM_GPUS=8