Remove unused utility methods and inline remaining contents of Utils.swift #312

Merged
pcuenca merged 7 commits into main from mattt/remove-unused-utils
Feb 12, 2026
mattt (Collaborator) commented Feb 12, 2026

While reviewing #308, I noticed a change from CFAbsoluteTimeGetCurrent to Date().timeIntervalSinceReferenceDate that I wanted to double-check, and then found that we aren't actually using the affected method anywhere.

This PR removes it along with the other unused utility methods in Utils.swift, and inlines the remaining ones.

Copilot AI left a comment

Pull request overview

This PR removes the Utils.swift file containing utility methods that are either unused or can be inlined at their call sites. The change simplifies the codebase by eliminating a utility class and directly incorporating the necessary functionality where it's used.

Changes:

  • Removed unused utility methods (time, dateNow, clamp, fakeThrowable) from Utils.swift
  • Inlined dictionary inversion logic using reduce(into:) at three call sites
  • Converted isChineseChar static function to a UnicodeScalar extension property isCJKUnifiedIdeograph
  • Inlined substring extraction logic in BertTokenizer's WordpieceTokenizer
  • Moved PUNCTUATION_REGEX constant from public Constants enum to a private file-level constant in PreTokenizer.swift
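
For readers unfamiliar with the dictionary inversion mentioned in the list above, here is a minimal sketch of the `reduce(into:)` pattern. The vocabulary and the names `tokensToIds` and `idsToTokens` are hypothetical and chosen for illustration; the actual call sites in the PR may differ in types and naming.

```swift
// Hypothetical vocabulary for illustration; real tokenizers load theirs from model files.
let tokensToIds: [String: Int] = ["[PAD]": 0, "[UNK]": 1, "hello": 2]

// Invert the mapping (value -> key) in a single pass with reduce(into:).
// If two keys shared the same value, whichever is visited last would win,
// so this assumes the values are unique.
let idsToTokens = tokensToIds.reduce(into: [Int: String]()) { result, pair in
    result[pair.value] = pair.key
}
```

Because `reduce(into:)` mutates one accumulator in place, the inversion avoids building an intermediate array of pairs, which is one reason it can reasonably replace a dedicated helper.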

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| Sources/Tokenizers/Utils.swift | Entire file removed, eliminating unused utilities and those that have been inlined |
| Sources/Tokenizers/PreTokenizer.swift | Inlined punctuation regex as a private file-level constant |
| Sources/Tokenizers/Normalizer.swift | Added UnicodeScalar extension with isCJKUnifiedIdeograph property to replace Utils.isChineseChar |
| Sources/Tokenizers/ByteEncoder.swift | Inlined dictionary inversion using reduce(into:) |
| Sources/Tokenizers/BertTokenizer.swift | Inlined dictionary inversion, substring extraction, and updated to use isCJKUnifiedIdeograph extension |
| Sources/Tokenizers/BPETokenizer.swift | Inlined dictionary inversion using reduce(into:) |
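
As a rough illustration of the `UnicodeScalar` extension described for Normalizer.swift, the sketch below shows what such a property could look like. The code point ranges are an assumption: they are the ones used by the original BERT tokenizer's Chinese-character check, and the PR's actual implementation may cover a different set.

```swift
extension UnicodeScalar {
    /// Sketch only: ranges assumed from the original BERT tokenizer's
    /// Chinese-character check, not confirmed against this PR.
    var isCJKUnifiedIdeograph: Bool {
        switch value {
        case 0x4E00...0x9FFF,    // CJK Unified Ideographs
             0x3400...0x4DBF,    // Extension A
             0x20000...0x2A6DF,  // Extension B
             0x2A700...0x2B73F,  // Extension C
             0x2B740...0x2B81F,  // Extension D
             0x2B820...0x2CEAF,  // Extension E
             0xF900...0xFAFF,    // CJK Compatibility Ideographs
             0x2F800...0x2FA1F:  // CJK Compatibility Ideographs Supplement
            return true
        default:
            return false
        }
    }
}

// Usage: classify each scalar of a mixed string.
let flags = "中a".unicodeScalars.map { $0.isCJKUnifiedIdeograph }
// flags == [true, false]
```

Exposing the check as a property on the scalar lets call sites read as `scalar.isCJKUnifiedIdeograph` rather than going through a static helper on a utility type.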

pcuenca (Member) left a comment

Oh nice, thanks for taking the time to double-check and clean up!

pcuenca merged commit f2ed1cd into main on Feb 12, 2026
8 checks passed
pcuenca deleted the mattt/remove-unused-utils branch on February 12, 2026 12:23
3 participants