
[train] Add per-token hard masking for off-policy correction#1264

Merged
erictang000 merged 7 commits into main from tgriggs/per-token-masking on Mar 4, 2026

@tyler-griggs (Member) commented Mar 3, 2026

Summary

  • Zeros individual divergent tokens whose train/infer importance-sampling (IS) ratio falls outside configurable bounds
  • Unlike outlier masking, which rejects entire sequences, this masks only the divergent tokens
  • Configure with off_policy_correction.token_mask_eps_low/high
  • For full IS-corrected masking, combine with tis_ratio_type: "token"
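A minimal sketch of the per-token rule summarized above, assuming per-token log-probabilities from the trainer and from the inference engine are available as plain lists. The function name `per_token_hard_mask` is hypothetical (not from the PR), the bounds are assumed to be inclusive, and plain Python stands in for the tensor ops a real trainer would use.

```python
import math
from typing import List


def per_token_hard_mask(
    train_logprobs: List[float],
    infer_logprobs: List[float],
    loss_mask: List[int],
    eps_low: float = 0.2,
    eps_high: float = 0.28,
) -> List[int]:
    """Hypothetical helper: zero tokens whose train/infer IS ratio leaves the bounds."""
    lower, upper = 1.0 - eps_low, 1.0 + eps_high
    out = []
    for lp_train, lp_infer, m in zip(train_logprobs, infer_logprobs, loss_mask):
        # IS ratio pi_train / pi_infer for this token, computed from log-probs.
        ratio = math.exp(lp_train - lp_infer)
        # Keep the token only if it was already valid AND its ratio is in bounds;
        # the other tokens of the sequence are left untouched.
        out.append(m if lower <= ratio <= upper else 0)
    return out


# The second token's ratio is exp(-0.5) ~ 0.61 < 0.8, so only that token is dropped.
print(per_token_hard_mask([-1.0, -1.0, -1.0], [-1.0, -0.5, -1.1], [1, 1, 1]))
# prints [1, 0, 1]
```

Computing the ratio as `exp(lp_train - lp_infer)` rather than dividing probabilities avoids underflow for low-probability tokens.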

Zeros individual tokens where the train/infer importance ratio falls
outside configurable bounds, while keeping the rest of the sequence.
Unlike outlier_token_mask (which rejects entire sequences), this
surgically removes only the divergent tokens.
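To make the contrast concrete, the sketch below applies both policies to the same per-token ratios. The sequence-level rule here is an assumption for illustration only (drop the whole sequence if any token is out of bounds); the actual outlier_token_mask criterion is not described in this PR and may differ. The function name is hypothetical.

```python
from typing import List, Tuple


def token_vs_sequence_masks(
    ratios: List[float], lower: float = 0.8, upper: float = 1.28
) -> Tuple[List[int], List[int]]:
    """Hypothetical comparison of per-token masking vs whole-sequence rejection."""
    # Per-token: each token is judged on its own ratio.
    token_mask = [1 if lower <= r <= upper else 0 for r in ratios]
    # Sequence-level (assumed rule): one divergent token discards every token.
    keep_sequence = all(token_mask)
    seq_mask = [1 if keep_sequence else 0] * len(ratios)
    return token_mask, seq_mask


token_mask, seq_mask = token_vs_sequence_masks([1.0, 1.05, 2.3, 0.97])
print(token_mask)  # [1, 1, 0, 1]: only the divergent token is removed
print(seq_mask)    # [0, 0, 0, 0]: the whole sequence is rejected
```

With one bad token out of four, per-token masking still trains on three tokens, while sequence rejection trains on none.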

Configure with:
    off_policy_correction.token_mask_eps_low: 0.2   # lower bound = 0.8
    off_policy_correction.token_mask_eps_high: 0.28  # upper bound = 1.28
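Written out, the rule these two settings configure is shown below, where $r_t$ is the train/infer importance ratio of token $t$ and $m_t$ is its mask. The notation ($\pi_{\text{train}}$, $\pi_{\text{infer}}$, $y_t$) is introduced here for illustration, and treating the bounds as inclusive is an assumption; the PR text does not say.

```latex
r_t = \frac{\pi_{\text{train}}(y_t \mid y_{<t})}{\pi_{\text{infer}}(y_t \mid y_{<t})},
\qquad
m_t = \mathbb{1}\left[\, 1 - \epsilon_{\text{low}} \le r_t \le 1 + \epsilon_{\text{high}} \,\right]

\epsilon_{\text{low}} = 0.2 \;\Rightarrow\; 1 - 0.2 = 0.8,
\qquad
\epsilon_{\text{high}} = 0.28 \;\Rightarrow\; 1 + 0.28 = 1.28
```

Because $\epsilon_{\text{high}} > \epsilon_{\text{low}}$, the accepted band is asymmetric: it extends 0.28 above 1 but only 0.2 below it.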

For full IS-corrected masking, combine with tis_ratio_type: "token".
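A rough sketch of what combining the two might look like per token, assuming `tis_ratio_type: "token"` means each surviving token is reweighted by its own truncated IS ratio. The function name and the `tis_cap` value are hypothetical, not taken from the PR.

```python
from typing import List


def masked_token_is_weights(
    ratios: List[float],
    eps_low: float = 0.2,
    eps_high: float = 0.28,
    tis_cap: float = 2.0,  # hypothetical truncation cap for the IS weight
) -> List[float]:
    """Hypothetical per-token loss weights: hard mask first, then truncated IS."""
    lower, upper = 1.0 - eps_low, 1.0 + eps_high
    weights = []
    for r in ratios:
        if lower <= r <= upper:
            # Surviving token: correct for the train/infer mismatch with its
            # own ratio, truncated at tis_cap.
            weights.append(min(r, tis_cap))
        else:
            # Divergent token: zeroed, contributes nothing to the loss.
            weights.append(0.0)
    return weights


print(masked_token_is_weights([1.0, 0.6, 1.1, 1.5]))
# prints [1.0, 0.0, 1.1, 0.0]
```

With these particular numbers the cap never binds, since every surviving ratio is already at most 1.28; truncation only matters if the cap is set below 1 + eps_high.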

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@erictang000 erictang000 marked this pull request as ready for review March 4, 2026 00:54
@erictang000 (Collaborator) commented: [image]

@erictang000 (Collaborator) commented: [image]

gemini-code-assist[bot] left a comment (marked as resolved).
devin-ai-integration[bot] left two comments (both marked as resolved).

@erictang000 (Collaborator) commented:

Token metric propagated correctly: [image]
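The PR does not show how the token metric is defined. As a hypothetical sketch, one natural choice is the fraction of valid (loss-mask) tokens that the hard mask removed; the function name and the `token_mask_frac` key are illustrative only.

```python
from typing import Dict, List


def token_mask_metrics(
    loss_mask: List[List[int]], hard_mask: List[List[int]]
) -> Dict[str, float]:
    """Hypothetical metric: share of valid tokens zeroed by per-token masking."""
    valid = 0
    dropped = 0
    for seq_loss, seq_hard in zip(loss_mask, hard_mask):
        for m, h in zip(seq_loss, seq_hard):
            if m:  # ignore padding / prompt tokens
                valid += 1
                if not h:
                    dropped += 1
    # Guard against an all-padding batch.
    frac = dropped / valid if valid else 0.0
    return {"token_mask_frac": frac}


print(token_mask_metrics([[1, 1, 1, 0], [1, 1, 0, 0]], [[1, 0, 1, 0], [1, 1, 0, 0]]))
# prints {'token_mask_frac': 0.2}
```

Normalizing by valid tokens rather than all positions keeps the metric from shrinking as padding grows.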

@erictang000 erictang000 merged commit 4ded618 into main Mar 4, 2026
7 checks passed
@erictang000 erictang000 deleted the tgriggs/per-token-masking branch March 4, 2026 18:04