[train][TBench] Cherrypick Terminus integration and use Harbor#637

Merged
CharlieFRuan merged 1 commit into main from pr-1106-bump-harbor on Nov 7, 2025
Conversation

@CharlieFRuan
Member

This PR cherry-picks the recent Terminus-training changes from the dev branch at https://github.com/NovaSky-AI/SkyRL/tree/dev/sandboxes, at commit 86de228.

In addition, we rebase to accommodate the changes from Harbor (the new version of Sandboxes).

@gemini-code-assist bot left a comment (Contributor)


Code Review

This pull request successfully integrates Terminus training with Harbor, replacing the older sandboxes implementation. The introduction of TerminalBenchTaskDataset is a key improvement for handling task data. The changes are generally well-structured. My review focuses on improving code correctness, robustness, and maintainability. I've identified a potential infinite loop, a type mismatch when creating GeneratorInput, opportunities to improve logging, and some leftover code that can be cleaned up.

Comment on lines +73 to +96
    def __getitem__(self, index: int) -> dict:
        """Get a task path by index as a dictionary with 'prompt', 'env_class', and 'env_extras' keys."""
        if index >= len(self.task_paths):
            raise IndexError(f"Index {index} out of range for dataset of size {len(self.task_paths)}")
        return {
            "prompt": str(self.task_paths[index]),
            "env_class": None,
            "env_extras": {"data_source": str(self.task_paths[index])},
            "uid": str(index),
        }

    def __len__(self) -> int:
        """Return the number of tasks in the dataset."""
        return len(self.task_paths)

    def __iter__(self):
        """Iterate over all task paths as dictionaries."""
        for index, task_path in enumerate(self.task_paths):
            yield {
                "prompt": str(task_path),
                "env_class": None,
                "env_extras": {"data_source": str(task_path)},
                "uid": str(index),
            }
Contributor


high

There's code duplication between __getitem__ and __iter__ for creating the item dictionary. This can be simplified by having __iter__ leverage __getitem__.

Additionally, env_class is set to None, which violates the List[str] type hint for env_classes in GeneratorInput. This can cause downstream errors. It's better to return an empty string "" to conform to the type.

Here's a suggested refactoring that addresses both points:

    def __getitem__(self, index: int) -> dict:
        """Get a task path by index as a dictionary with 'prompt', 'env_class', and 'env_extras' keys."""
        if index >= len(self.task_paths):
            raise IndexError(f"Index {index} out of range for dataset of size {len(self.task_paths)}")
        return {
            "prompt": str(self.task_paths[index]),
            "env_class": "",
            "env_extras": {"data_source": str(self.task_paths[index])},
            "uid": str(index),
        }

    def __len__(self) -> int:
        """Return the number of tasks in the dataset."""
        return len(self.task_paths)

    def __iter__(self):
        """Iterate over all task paths as dictionaries."""
        for i in range(len(self)):
            yield self[i]
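Outside the SkyRL codebase, the delegation pattern in the suggestion can be checked with a minimal stand-in (the `TaskDataset` name and the fixed paths below are hypothetical, not the actual `TerminalBenchTaskDataset`):

```python
class TaskDataset:
    """Minimal stand-in showing __iter__ delegating to __getitem__."""

    def __init__(self, task_paths):
        self.task_paths = list(task_paths)

    def __getitem__(self, index: int) -> dict:
        if index >= len(self.task_paths):
            raise IndexError(f"Index {index} out of range for dataset of size {len(self.task_paths)}")
        return {
            "prompt": str(self.task_paths[index]),
            "env_class": "",  # empty string, not None, to satisfy a List[str] hint
            "env_extras": {"data_source": str(self.task_paths[index])},
            "uid": str(index),
        }

    def __len__(self) -> int:
        return len(self.task_paths)

    def __iter__(self):
        # Single source of truth: every item comes from __getitem__.
        for i in range(len(self)):
            yield self[i]


ds = TaskDataset(["tasks/a", "tasks/b"])
assert list(ds) == [ds[0], ds[1]]  # iteration and indexing now agree by construction
```

With this shape, any future change to the item dictionary only needs to be made in one place.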

Comment on lines 69 to 74
  input_batch = GeneratorInput(
-     prompts=["" for _ in range(num_prompts)],
+     prompts=[item["prompt"] for item in self.train_dataset],
      env_classes=None,
      env_extras=None,
      sampling_params=None,
  )
Contributor


high

GeneratorInput is not being populated correctly. env_classes and env_extras are passed as None, which violates the GeneratorInput type definition. This could lead to runtime errors if the generator's implementation changes to use these fields. They should be populated from the dataset.

Suggested change
- input_batch = GeneratorInput(
-     prompts=[item["prompt"] for item in self.train_dataset],
-     env_classes=None,
-     env_extras=None,
-     sampling_params=None,
- )
+ dataset_items = list(self.train_dataset)
+ input_batch = GeneratorInput(
+     prompts=[item["prompt"] for item in dataset_items],
+     env_classes=[item["env_class"] for item in dataset_items],
+     env_extras=[item["env_extras"] for item in dataset_items],
+     sampling_params=None,
+ )
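The exact `GeneratorInput` definition lives in SkyRL; as a rough sketch only (field names taken from the snippet above, types assumed from the review's description of a `List[str]` hint), the typed-container view of why `None` is a mismatch looks like:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class GeneratorInput:
    """Assumed shape, for illustration only -- not SkyRL's actual class."""
    prompts: List[str]
    env_classes: List[str]
    env_extras: List[Dict[str, Any]]
    sampling_params: Optional[dict]


# Hypothetical dataset items mirroring the __getitem__ dictionaries above.
dataset_items = [
    {"prompt": "tasks/a", "env_class": "", "env_extras": {"data_source": "tasks/a"}},
    {"prompt": "tasks/b", "env_class": "", "env_extras": {"data_source": "tasks/b"}},
]

batch = GeneratorInput(
    prompts=[item["prompt"] for item in dataset_items],
    env_classes=[item["env_class"] for item in dataset_items],
    env_extras=[item["env_extras"] for item in dataset_items],
    sampling_params=None,
)

# All per-item lists stay aligned, one entry per prompt.
assert len(batch.prompts) == len(batch.env_classes) == len(batch.env_extras)
```

Passing `env_classes=None` would type-check against `Optional[...]` but not against `List[str]`, which is why a static checker (or a downstream consumer iterating the field) can break.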

Comment on lines 125 to +140
  while True:
-     results = await trial.run()
-     reward = results.verifier_result.rewards
-     chat_history = results.agent_result.all_messages
-     if len(chat_history) > 0:
-         break
-     else:
-         print(f"[WARNING] Agent {self.agent_name} did not return a response")
+     try:
+         results = await trial.run()
+         print(f"Results: {results}")
+         if not results.verifier_result:
+             print(f"[WARNING] Exception info: {results.exception_info}")
+             continue
+         reward = results.verifier_result.reward
+         chat_history = results.agent_result.all_messages
+         if len(chat_history) > 0:
+             break
+         else:
+             print(f"[WARNING] Agent {self.agent_name} did not return a response")
+     except Exception as e:
+         print(f"Error running trial: {e}")
+         continue
Contributor


high

This while True loop could lead to an infinite loop if trial.run() consistently fails or returns empty results. It's safer to add a maximum number of retries to prevent the process from getting stuck. Also, print statements should be replaced with logger calls for better logging practices.

Here's a suggestion that adds a retry limit and uses logger (you'll need to add from loguru import logger at the top of the file):

Suggested change
- while True:
-     try:
-         results = await trial.run()
-         print(f"Results: {results}")
-         if not results.verifier_result:
-             print(f"[WARNING] Exception info: {results.exception_info}")
-             continue
-         reward = results.verifier_result.reward
-         chat_history = results.agent_result.all_messages
-         if len(chat_history) > 0:
-             break
-         else:
-             print(f"[WARNING] Agent {self.agent_name} did not return a response")
-     except Exception as e:
-         print(f"Error running trial: {e}")
-         continue
+ max_retries = 3
+ for attempt in range(max_retries):
+     try:
+         results = await trial.run()
+         logger.debug(f"Results: {results}")
+         if not results.verifier_result:
+             logger.warning(f"Exception info: {results.exception_info}")
+             continue
+         reward = results.verifier_result.reward
+         chat_history = results.agent_result.all_messages
+         if len(chat_history) > 0:
+             break
+         else:
+             logger.warning(f"Agent {self.agent_name} did not return a response on attempt {attempt + 1}")
+     except Exception as e:
+         logger.error(f"Error running trial on attempt {attempt + 1}: {e}")
+         continue
+ else:  # no-break
+     raise RuntimeError(f"Failed to get a valid response from trial after {max_retries} attempts.")
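The `else:  # no-break` clause in the suggestion relies on Python's for/else semantics: the `else` branch runs only when the loop completes without hitting `break`. A self-contained sketch of the same retry shape (`flaky_call` is invented here to stand in for `trial.run()`):

```python
attempts = {"n": 0}


def flaky_call():
    # Invented stand-in for trial.run(): fails twice, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


max_retries = 3
for attempt in range(max_retries):
    try:
        result = flaky_call()
        break  # success: skip the for/else
    except RuntimeError as e:
        print(f"attempt {attempt + 1} failed: {e}")
else:  # runs only if the loop exhausted all attempts without break
    raise RuntimeError(f"failed after {max_retries} attempts")

print(result)  # ok
```

The for/else form encodes "retry up to N times, then give up" without a separate success flag, which is why the suggestion can raise in a single trailing clause instead of checking state after the loop.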

Comment on lines +60 to +61
# If it's a file, treat it as a single task (files can't be valid task directories)
logger.warning(f"File {source_path} cannot be a valid task directory (missing instruction.md)")
Contributor


medium

The comment here is misleading. It states that a file is treated "as a single task", but the code only logs a warning and then skips the file. The comment should be updated to reflect the actual behavior, which is to ignore files.

Suggested change
- # If it's a file, treat it as a single task (files can't be valid task directories)
- logger.warning(f"File {source_path} cannot be a valid task directory (missing instruction.md)")
+ # Files are not valid task directories, so log a warning and skip.
+ logger.warning(f"File {source_path} is not a directory and cannot be a valid task, skipping.")

Comment on lines +102 to +104
    def collate_fn(self, item_list):
        """Collate function for batching task dictionaries."""
        return item_list
Contributor


medium

Adding type hints to collate_fn will improve code clarity and enable better static analysis (note that List must be imported from typing if it isn't already).

    def collate_fn(self, item_list: List[dict]) -> List[dict]:
        """Collate function for batching task dictionaries."""
        return item_list

NUM_GPUS=1
LOGGER="console" # change to "console" to print to stdout
TBENCH_CONFIG_DIR="examples/terminal_bench"
SANDBOXES_DIR="sandboxes" # TODO: For now, `sandboxes` is cloned into SkyRL/skyrl-train.
Contributor


medium

The SANDBOXES_DIR variable appears to be a remnant from the previous sandboxes implementation and is no longer used. It should be removed to avoid confusion.

NUM_GPUS=1
LOGGER="console" # change to "console" to print to stdout
TBENCH_CONFIG_DIR="examples/terminal_bench"
SANDBOXES_DIR="sandboxes" # TODO: For now, `sandboxes` is cloned into SkyRL/skyrl-train.
Contributor


medium

The SANDBOXES_DIR variable is defined but no longer used in this script since its usage was removed in this PR. It should be removed to keep the script clean.

@CharlieFRuan CharlieFRuan merged commit b7890a0 into main Nov 7, 2025
3 checks passed
li-boxuan pushed a commit to li-boxuan/SkyRL that referenced this pull request Nov 23, 2025
…ky-AI#637)

This PR cherrypicks the recent changes on the dev branch for Terminus
training on https://github.com/NovaSky-AI/SkyRL/tree/dev/sandboxes, at
commit
NovaSky-AI@86de228

Besides, we rebase to accommodate the changes from Harbor (the new
version of Sandboxes)
@tyler-griggs tyler-griggs deleted the pr-1106-bump-harbor branch January 9, 2026 19:33
dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026
…ky-AI#637)

This PR cherrypicks the recent changes on the dev branch for Terminus
training on https://github.com/NovaSky-AI/SkyRL/tree/dev/sandboxes, at
commit
NovaSky-AI@02c0f37

Besides, we rebase to accommodate the changes from Harbor (the new
version of Sandboxes)