Merged
18 changes: 6 additions & 12 deletions spec/locale-sensitive-functions.html
@@ -65,27 +65,21 @@ <h1>String.prototype.toLocaleLowerCase ( [ _locales_ ] )</h1>
1. Else,
1. Let _requestedLocale_ be DefaultLocale().
1. Let _noExtensionsLocale_ be the String value that is _requestedLocale_ with all Unicode locale extension sequences (<emu-xref href="#sec-unicode-locale-extension-sequences"></emu-xref>) removed.
1. Let _availableLocales_ be a List with the language tags of the languages for which the Unicode character database contains language sensitive case mappings.
1. Let _availableLocales_ be a List with language tags that includes the languages for which the Unicode Character Database contains language sensitive case mappings. Implementations may add additional language tags if they support case mapping for additional locales.
1. Let _locale_ be BestAvailableLocale(_availableLocales_, _noExtensionsLocale_).
1. If _locale_ is *undefined*, let _locale_ be `"und"`.
1. Let _cpList_ be a List containing in order the code points of _S_ as defined in ES2020, <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>, starting at the first element of _S_.
1. For each code point _c_ in _cpList_, if the Unicode Character Database provides a lower case equivalent of _c_ that is either language insensitive or for the language _locale_, replace _c_ in _cpList_ with that/those equivalent code point(s).
1. Let _cuList_ be a new empty List.
1. For each code point _c_ in _cpList_, in order, append to _cuList_ the elements of the UTF-16 Encoding (defined in ES2020, <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>) of _c_.
1. Let _L_ be a String whose elements are, in order, the elements of _cuList_.
1. Let _cuList_ be a List where the elements are the result of a lower case transformation of the ordered code points in _cpList_ according to the Unicode Default Case Conversion algorithm or an implementation defined conversion algorithm. A conforming implementation's lower case transformation algorithm must always yield the same _cpList_ given the same _cuList_ and locale.
1. Let _L_ be a String whose elements are the UTF-16 Encoding (defined in ES2020, <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>) of the code points of _cuList_.
1. Return _L_.
</emu-alg>
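
The locale-resolution steps at the start of the algorithm above can be sketched in JavaScript. This is an illustrative approximation, not spec text: `availableLocales` is assumed to hold only `"az"`, `"lt"`, and `"tr"` (an implementation may include more), `bestAvailableLocale` and `resolveCaseLocale` are hypothetical helper names, and the extension-stripping regular expression is a simplification that assumes a well-formed, canonicalized language tag.

```javascript
// Assumption: the engine tailors case mapping only for these languages.
const availableLocales = ["az", "lt", "tr"];

// Rough equivalent of BestAvailableLocale: drop trailing subtags
// until a candidate matches, or return undefined if none does.
function bestAvailableLocale(available, locale) {
  let candidate = locale;
  while (true) {
    if (available.includes(candidate)) return candidate;
    let pos = candidate.lastIndexOf("-");
    if (pos === -1) return undefined;
    // Also drop a preceding singleton subtag such as "-x" or "-u".
    if (pos >= 2 && candidate[pos - 2] === "-") pos -= 2;
    candidate = candidate.slice(0, pos);
  }
}

// Hypothetical helper combining the steps: strip Unicode locale
// extension sequences, look up the best match, fall back to "und".
function resolveCaseLocale(requestedLocale) {
  const noExtensionsLocale = requestedLocale.replace(
    /-u(?:-[0-9a-z]{2,8})+/gi,
    ""
  );
  return bestAvailableLocale(availableLocales, noExtensionsLocale) ?? "und";
}

console.log(resolveCaseLocale("tr-TR-u-co-search")); // "tr"
console.log(resolveCaseLocale("en-US"));             // "und"
```

With this sketch, any request that does not resolve to a tailored language falls through to `"und"`, which corresponds to untailored case mapping.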

<p>
The result must be derived according to the case mappings in the Unicode character database (this explicitly includes not only the UnicodeData.txt file, but also the SpecialCasings.txt file that accompanies it).
Lower case code point mappings may be derived according to a tailored version of the Default Case Conversion Algorithms of the Unicode Standard. Implementations may use locale specific tailoring defined in SpecialCasings.txt and/or CLDR and/or any other custom tailoring.
</p>

<emu-note>
As of Unicode 10.0, the _availableLocales_ list contains the elements `"az"`, `"lt"`, and `"tr"`.
Member

It would be valid to leave in this line, modifying it to say that availableLocales must include those three (but may include others). What would you think of that change?

Author

There are more locales in Unicode that have case mappings than those three, so it seems misleading to mention only them. Do you think I should include a list of all of them?

Member

Good to know. I am fine with removing this list.

</emu-note>

<emu-note>
The case mapping of some code points may produce multiple code points. In this case the result String may not be the same length as the source String. Because both *toLocaleUpperCase* and *toLocaleLowerCase* have context-sensitive behaviour, the functions are not symmetrical. In other words, *s.toLocaleUpperCase().toLocaleLowerCase()* is not necessarily equal to *s.toLocaleLowerCase()*.
The case mapping of some code points may produce multiple code points. In this case the result String may not be the same length as the source String. Because both `toLocaleUpperCase` and `toLocaleLowerCase` have context-sensitive behaviour, the functions are not symmetrical. In other words, `s.toLocaleUpperCase().toLocaleLowerCase()` is not necessarily equal to `s.toLocaleLowerCase()`.
</emu-note>
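
A small JavaScript illustration of the note above, assuming an engine that applies the Unicode default case mappings for `"en-US"` (as ICU-backed engines do). The helper name `caseRoundTrip` is hypothetical, and since implementations may tailor their mappings, the results shown are typical rather than guaranteed.

```javascript
// Hypothetical helper: upper-case a string, lower-case the result,
// and also lower-case the original directly for comparison.
function caseRoundTrip(s, locale) {
  const upper = s.toLocaleUpperCase(locale);
  return {
    upper,
    roundTrip: upper.toLocaleLowerCase(locale),
    lower: s.toLocaleLowerCase(locale),
  };
}

// U+00DF LATIN SMALL LETTER SHARP S upper-cases to two code points.
const r = caseRoundTrip("\u00DF", "en-US");
console.log(r.upper.length);          // 2 ("SS") under the default mappings
console.log(r.roundTrip === r.lower); // false: "ss" versus "\u00DF"
```

The one-character input grows to two characters when upper-cased, and lower-casing that result does not recover the original string.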

<emu-note>
@@ -102,7 +96,7 @@ <h1>String.prototype.toLocaleUpperCase ( [ _locales_ ] )</h1>
</p>

<p>
This function interprets a string value as a sequence of code points, as described in ES2020, <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>. This function behaves in exactly the same way as `String.prototype.toLocaleLowerCase`, except that characters are mapped to their _uppercase_ equivalents as specified in the Unicode character database.
This function interprets a string value as a sequence of code points, as described in ES2020, <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>. This function behaves in exactly the same way as `String.prototype.toLocaleLowerCase`, except that characters are mapped to their _uppercase_ equivalents. A conforming implementation's upper case transformation algorithm must always yield the same result given the same sequence of code points and locale.
</p>
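
As a hedged illustration of the locale dependence and the determinism requirement described above, the following JavaScript compares upper-casing under `"tr"` and `"en-US"`. It assumes an engine with Turkish case tailoring (for example one backed by ICU), and `upperTwice` is a hypothetical helper name.

```javascript
// Hypothetical helper: run the same transformation twice so the two
// results can be compared for the determinism requirement.
function upperTwice(s, locale) {
  return [s.toLocaleUpperCase(locale), s.toLocaleUpperCase(locale)];
}

const [tr1, tr2] = upperTwice("i", "tr");
const [en1, en2] = upperTwice("i", "en-US");

// With Turkish tailoring, "i" maps to U+0130 (capital I with dot above).
console.log(tr1 === "\u0130"); // true in engines with Turkish tailoring
console.log(en1 === "I");      // true under the default mappings

// Same code points and same locale must give the same result.
console.log(tr1 === tr2 && en1 === en2); // true
```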

<emu-note>