[fix] Resolve timeout and cleanup issues in GPU CI pipeline #483

Merged
erictang000 merged 2 commits into NovaSky-AI:main from tyler-griggs:tgriggs/ci_fixes
Oct 15, 2025

@tyler-griggs tyler-griggs commented Oct 15, 2025

The primary change removes the bespoke ways that different tests set up and tear down resources (e.g., `ray.init()`, `ray.shutdown()`); tests now use `ray_init_fixture` instead.

This also fixes an indent error in the creation of dp>1 Ray-wrapped inference engines.
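
The fixture-based setup and teardown described above can be sketched roughly as follows. This is only an illustration of the pattern, not the repository's actual `ray_init_fixture`: `FakeRay`, `ray_init_fixture_sketch`, `ray_session`, and `failing_body` are hypothetical names, and `FakeRay` stands in for the real `ray` module so the snippet runs without a cluster. In a real `conftest.py` the generator would presumably be decorated with `@pytest.fixture` and call the real `ray.init()` / `ray.shutdown()`.

```python
from contextlib import contextmanager
from typing import Iterator


class FakeRay:
    """Hypothetical stand-in for the `ray` module (assumption, not the real API surface)."""

    def __init__(self) -> None:
        self.initialized = False

    def init(self) -> None:
        self.initialized = True

    def shutdown(self) -> None:
        self.initialized = False

    def is_initialized(self) -> bool:
        return self.initialized


ray = FakeRay()  # in a real test suite this would be `import ray`


def ray_init_fixture_sketch() -> Iterator[None]:
    # In a real conftest.py this generator would carry @pytest.fixture.
    if ray.is_initialized():
        ray.shutdown()  # start every test from a clean state
    ray.init()
    try:
        yield
    finally:
        ray.shutdown()  # teardown runs even if the test body raises


# Drive the generator with contextlib so the sketch runs without pytest.
ray_session = contextmanager(ray_init_fixture_sketch)


def failing_body() -> None:
    """Simulates a test that fails after Ray has been initialized."""
    with ray_session():
        assert ray.is_initialized()
        raise RuntimeError("simulated test failure")
```

Centralizing setup and teardown this way means individual tests no longer need their own try/except around `ray.init()` and `ray.shutdown()`, which is why the test bodies could be un-indented.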

Member Author

This diff looks messy, but it mostly just removes the try/except blocks and un-indents the rest of the tests.

**lora_kwargs,
)
inference_engine_actors.append(engine)
engine = actor_class.options(
Member Author

Embarrassing indent error in the dp engine creation path.

server_process = subprocess.Popen(remote_server_command, env=env)

- wait_for_server(url=f"localhost:{engine_port}", health_path="health")
+ wait_for_server(url=f"localhost:{engine_port}", health_path="health", timeout=120)
Member Author

Bump timeout from 60s to 120s because our tp=2, dp=2 runs were taking ~63s.
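
To make the effect of the timeout concrete, here is a minimal sketch of a health-check polling loop. The PR does not show how `wait_for_server` is implemented, so `wait_for_server_sketch`, its `interval` parameter, and the use of `urllib` are assumptions for illustration only. With a loop like this, a server that takes ~63s to come up would fail under a 60s deadline and pass under 120s.

```python
import time
import urllib.request


def wait_for_server_sketch(
    url: str, health_path: str, timeout: float = 60.0, interval: float = 0.5
) -> float:
    """Poll http://{url}/{health_path} until it answers 200 or `timeout` seconds elapse.

    Hypothetical helper: returns the elapsed seconds on success and raises
    TimeoutError if the server never becomes healthy in time.
    """
    start = time.monotonic()
    deadline = start + timeout
    endpoint = f"http://{url}/{health_path}"
    while True:
        try:
            with urllib.request.urlopen(endpoint, timeout=interval) as resp:
                if resp.status == 200:
                    return time.monotonic() - start
        except OSError:
            # Covers connection refused, socket timeouts, and urllib's URLError/HTTPError:
            # the server is not ready yet, so keep polling.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{endpoint} not healthy after {timeout}s")
        time.sleep(interval)
```

A call mirroring the one in the diff would look like `wait_for_server_sketch(f"localhost:{engine_port}", "health", timeout=120)`, where `engine_port` comes from the surrounding test.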

@tyler-griggs tyler-griggs marked this pull request as ready for review October 15, 2025 19:15
Collaborator

@erictang000 erictang000 left a comment

thanks!!!! 🐐🐐🐐

@erictang000 erictang000 merged commit 0c237a1 into NovaSky-AI:main Oct 15, 2025
3 checks passed
li-boxuan pushed a commit to li-boxuan/SkyRL that referenced this pull request Nov 23, 2025
…AI#483)

@SumanthRH SumanthRH mentioned this pull request Dec 8, 2025
dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026
…AI#483)

The primary change was to remove the bespoke ways that different tests
are setting up and tearing down resources (e.g,. `ray.init()`,
`ray.shutdown()`), and in this PR we instead prefer the use of
`ray_init_fixture`.

This also fixes an indent error in the creation of dp>1 Ray-wrapped
inference engines.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants