doc: improve Buffer's encoding documentation

- Add a paragraph about case-insensitivity of encoding options.
- Document "utf-8", "utf-16le" and "ucs-2" aliases.
- Always use "utf8" in documentation for defaults and examples.

PR-URL: https://github.com/nodejs/node/pull/37945
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Michaël Zasso 2021-03-27 14:26:39 +01:00 committed by Myles Borins
parent f3fabb57cf
commit a57dc06adf
3 changed files with 20 additions and 16 deletions

@ -79,17 +79,21 @@ console.log(Buffer.from('fhqwhgads', 'utf16le'));
// Prints: <Buffer 66 00 68 00 71 00 77 00 68 00 67 00 61 00 64 00 73 00>
```
+Node.js buffers accept all case variations of encoding strings that they
+receive. For example, UTF-8 can be specified as `'utf8'`, `'UTF8'` or `'uTf8'`.
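
The case-insensitivity described above can be checked with a short sketch. It is illustrative only and not part of this change: the variable names are hypothetical, the sample string is borrowed from the earlier example, and it assumes a Node.js version that accepts the `'utf-8'` alias documented here.

```js
const { Buffer } = require('buffer');

// Spellings of the same encoding that differ only in letter case or use
// the hyphenated alias. (Hypothetical sample list for illustration.)
const spellings = ['utf8', 'UTF8', 'uTf8', 'utf-8', 'UTF-8'];

// Encode the same string once per spelling and keep the bytes as hex.
const encoded = spellings.map(
  (encoding) => Buffer.from('fhqwhgads', encoding).toString('hex')
);

// Every spelling is expected to produce identical bytes.
const allEqual = encoded.every((hex) => hex === encoded[0]);
console.log(allEqual);
// Prints: true
```
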
The character encodings currently supported by Node.js are the following:
* `'utf8'`: Multi-byte encoded Unicode characters. Many web pages and other
document formats use [UTF-8][]. This is the default character encoding.
When decoding a `Buffer` into a string that does not exclusively contain
valid UTF-8 data, the Unicode replacement character `U+FFFD` <20> will be used
to represent those errors.
* `'utf8'` (alias: `'utf-8'`): Multi-byte encoded Unicode characters. Many web
pages and other document formats use [UTF-8][]. This is the default character
encoding. When decoding a `Buffer` into a string that does not exclusively
contain valid UTF-8 data, the Unicode replacement character `U+FFFD` <20> will be
used to represent those errors.
* `'utf16le'`: Multi-byte encoded Unicode characters. Unlike `'utf8'`, each
character in the string will be encoded using either 2 or 4 bytes.
Node.js only supports the [little-endian][endianness] variant of [UTF-16][].
* `'utf16le'` (alias: `'utf-16le'`): Multi-byte encoded Unicode characters.
Unlike `'utf8'`, each character in the string will be encoded using either 2
or 4 bytes. Node.js only supports the [little-endian][endianness] variant of
[UTF-16][].
* `'latin1'`: Latin-1 stands for [ISO-8859-1][]. This character encoding only
supports the Unicode characters from `U+0000` to `U+00FF`. Each character is
@ -132,11 +136,11 @@ The following legacy character encodings are also supported:
* `'binary'`: Alias for `'latin1'`. See [binary strings][] for more background
on this topic. The name of this encoding can be very misleading, as all of the
encodings listed here convert between strings and binary data. For converting
-between strings and `Buffer`s, typically `'utf-8'` is the right choice.
+between strings and `Buffer`s, typically `'utf8'` is the right choice.
-* `'ucs2'`: Alias of `'utf16le'`. UCS-2 used to refer to a variant of UTF-16
-that did not support characters that had code points larger than U+FFFF.
-In Node.js, these code points are always supported.
+* `'ucs2'`, `'ucs-2'`: Aliases of `'utf16le'`. UCS-2 used to refer to a variant
+of UTF-16 that did not support characters that had code points larger than
+U+FFFF. In Node.js, these code points are always supported.
```js
Buffer.from('1ag', 'hex');
@ -900,7 +904,7 @@ Returns `true` if `encoding` is the name of a supported character encoding,
or `false` otherwise.
```js
-console.log(Buffer.isEncoding('utf-8'));
+console.log(Buffer.isEncoding('utf8'));
// Prints: true
console.log(Buffer.isEncoding('hex'));

@ -2323,7 +2323,7 @@ changes:
-->
* `chunk` {string | Buffer}
-* `encoding` {string} Optional, **Default**: `utf-8`
+* `encoding` {string} Optional, **Default**: `utf8`
* `callback` {Function} Optional
* Returns: {this}
@ -2565,7 +2565,7 @@ changes:
-->
* `chunk` {string | Buffer}
-* `encoding` {string} **Default**: `utf-8`
+* `encoding` {string} **Default**: `utf8`
* `callback` {Function}
* Returns {boolean}

@ -2759,7 +2759,7 @@ const fs = require('fs');
pipeline(
fs.createReadStream('object.json')
-.setEncoding('utf-8'),
+.setEncoding('utf8'),
new Transform({
decodeStrings: false, // Accept string input rather than Buffers
construct(callback) {