Node.js Buffer: The Basics

Posted Jun 6, 2020 (3 min read)

Concept

What is stored in the buffer

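A Buffer is an object for working with raw bytes. Underneath it is a byte array (in current Node.js versions, Buffer is a subclass of Uint8Array). The bytes themselves are plain numbers; console.log simply displays each one as a two-digit hexadecimal number.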

var str = 'hello buffer'
var buffer = Buffer.from(str, 'utf-8') // new Buffer(str) is deprecated
console.log(buffer) // prints the bytes in hexadecimal: <Buffer 68 65 6c 6c 6f 20 62 75 66 66 65 72>

Each element of a Buffer is one byte, displayed as a two-digit hexadecimal number, so its value ranges from 0 to 255.
The largest value is 0xFF = 15 × 16^1 + 15 × 16^0 = 255.
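
As a quick check of the point above, here is a minimal sketch that looks at the same string as hexadecimal text and as plain numbers. It assumes a Node.js version that provides Buffer.from; the variable names are only illustrative.

```javascript
// Encode a string as UTF-8 and look at the raw bytes.
const helloBuf = Buffer.from('hello buffer', 'utf-8');

// Hex is only a display format; each element is a number from 0 to 255.
const hexView = helloBuf.toString('hex');
const firstByte = helloBuf[0];

console.log(helloBuf);        // <Buffer 68 65 6c 6c 6f 20 62 75 66 66 65 72>
console.log(hexView);         // 68656c6c6f20627566666572
console.log(firstByte);       // 104, i.e. 0x68, the code for 'h'
console.log(helloBuf.length); // 12
```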

What happens on overflow

Elements of a Buffer can be assigned directly by index:

var buffer = Buffer.alloc(10) // new Buffer(size) is deprecated
buffer[0] = 300
console.log(buffer[0]) // 44, because 300 - 256 = 44

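If the assigned value has a fractional part, the fractional part is discarded.
If the value is out of range, it wraps around: a value greater than 255 has 256 subtracted repeatedly until it is at most 255, and a value less than 0 has 256 added repeatedly until it is at least 0. In other words, the stored value is the integer part taken modulo 256.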

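 x = Math.trunc(x) // discard the fractional part

 while (x > 255) {
   x = x - 256
 }

 while (x < 0) { // a loop, since one addition is not enough for values below -256
   x = x + 256
 }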
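
To see that the rule described above matches what a Buffer actually does, the following sketch compares a hand-written wrap function with real index assignment. The helper wrapToByte and the sample values are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper that mirrors the wrap-around rule.
function wrapToByte(x) {
  x = Math.trunc(x);          // drop the fractional part
  while (x > 255) x -= 256;   // too large: subtract 256 until it fits
  while (x < 0) x += 256;     // negative: add 256 until it fits
  return x;
}

const samples = [300, -1, -300, 3.7, 255, 256];
const probe = Buffer.alloc(1);

// Let the Buffer do the conversion and read the stored byte back.
const stored = samples.map(function (value) {
  probe[0] = value;
  return probe[0];
});
const predicted = samples.map(wrapToByte);

console.log(stored);    // [ 44, 255, 212, 3, 255, 0 ]
console.log(predicted); // [ 44, 255, 212, 3, 255, 0 ]
```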

Supplement:
Negative numbers are stored in the computer in two's complement form. The highest bit is the sign bit (0 is positive, 1 is negative). To get the two's complement of a negative number, start from its sign-and-magnitude form, invert every bit except the sign bit, then add 1 to the result.
For example, -1 in 8 bits:
Sign and magnitude: 1000 0001
Invert all bits except the sign bit: 1111 1110
Add 1: 1111 1111
Read as an unsigned byte, 1111 1111 is 255.
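
The same bit pattern can be observed from Node.js itself. This is a small sketch using the Buffer methods writeInt8, readInt8 and readUInt8; the variable names are illustrative.

```javascript
const oneByte = Buffer.alloc(1);

oneByte.writeInt8(-1, 0);                 // store -1 as a signed 8-bit integer
const bits = oneByte[0].toString(2);      // the raw bit pattern
const asUnsigned = oneByte.readUInt8(0);  // the same bits read as unsigned
const asSigned = oneByte.readInt8(0);     // the same bits read as signed

console.log(bits);        // 11111111
console.log(asUnsigned);  // 255
console.log(asSigned);    // -1
```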

Conversion of String and Buffer

The encodings that can be converted directly between strings and Buffers are:
ASCII; UTF-8; UTF-16LE/UCS-2; Base64; Binary (an alias of Latin-1); Hex

For encodings that are not supported, a third-party library can handle the encoding and decoding.
iconv: calls a C++ implementation, so every conversion has to cross from JS into C++, which adds overhead.
iconv-lite: a pure JS implementation, which avoids that overhead.

Basic conversion

  var buffer = Buffer.from(str, [encoding])
  buffer.toString([encoding])

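As an illustration of the basic conversion, this sketch takes one string through two of the built-in encodings and back. The sample string and variable names are assumptions made for the example.

```javascript
const source = 'hello';

// String -> Buffer -> string in another representation.
const asHex = Buffer.from(source, 'utf-8').toString('hex');
const asBase64 = Buffer.from(source, 'utf-8').toString('base64');

// Decoding must use the encoding the text was produced with.
const fromHex = Buffer.from(asHex, 'hex').toString('utf-8');
const fromBase64 = Buffer.from(asBase64, 'base64').toString('utf-8');

console.log(asHex);               // 68656c6c6f
console.log(asBase64);            // aGVsbG8=
console.log(fromHex, fromBase64); // hello hello
```
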
Segmented conversion

Note that decoding must use the same encoding that was used for encoding.

  // write() is an instance method; it returns the number of bytes written, not a new Buffer
  var bytesWritten = buffer.write(str, [offset], [length], [encoding])

  buffer.toString([encoding], [start], [end])

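A short sketch of segmented conversion follows: two strings are written into one pre-allocated Buffer at different offsets, and then only part of it is decoded. The buffer size and strings are assumptions chosen so that everything fits exactly.

```javascript
const target = Buffer.alloc(11);

// buf.write returns the number of bytes written.
const firstLen = target.write('hello', 0, 5, 'utf-8');
const secondLen = target.write(' world', 5, 6, 'utf-8');

// Decode the whole buffer, then only bytes 6 up to (not including) 11.
const whole = target.toString('utf-8');
const part = target.toString('utf-8', 6, 11);

console.log(firstLen, secondLen); // 5 6
console.log(whole);               // hello world
console.log(part);                // world
```
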
Use of library iconv/iconv-lite

GBK and GB2312, which are commonly used for Chinese text, are not among the encodings that Buffer supports natively, so a third-party library is needed to encode and decode them.

Library | Implementation | Handling of content that cannot be converted | Summary
iconv-lite | pure JS | outputs garbled characters directly | no switching between C++ and JS, so performance is better
iconv | calls a C++ implementation | offers ignore or transliteration handling | handles unconvertible characters more gracefully

var iconv = require('iconv-lite');
var buffer = iconv.encode('Moonlight before bed','GBK');
var str = iconv.decode(buffer,'GBK')

Buffer concatenation: avoiding garbled characters

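When a file is read through a stream, each chunk delivered to the data event handler is a Buffer. Joining chunks with a plus sign implicitly converts every chunk to a string on its own, so a multi-byte character that is split across two chunks cannot be decoded and shows up as garbled text.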

var fs = require('fs')
var rs = fs.createReadStream('test.md', {highWaterMark:11})
var data =''
rs.on('data', function(chunk) {
    data += chunk
})
rs.on('end', function() {
    // The output is garbled. A Chinese character takes 3 bytes in UTF-8, and 11 is not a
    // multiple of 3, so the first chunk holds three whole characters plus only two bytes
    // of the fourth. That incomplete character cannot be decoded.
    console.log(data)
})

The correct approach is to store each chunk in an array as it is read, then use Buffer.concat to merge them into a single Buffer and decode it once.

var fs = require('fs')
var iconv = require('iconv-lite');
var rs = fs.createReadStream('test.md', {highWaterMark:11})
var chunks = []
var size = 0
rs.on('data', function(chunk) {
    chunks.push(chunk)
    size += chunk.length;
})
rs.on('end', function() {
    var buffer = Buffer.concat(chunks, size)
    var str = iconv.decode(buffer,'utf-8')
    console.log(str)
})
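
The difference between the two approaches can be reproduced without a file. The sketch below simulates a stream with a highWaterMark of 11 by cutting a Buffer into chunks by hand; the sample text is a hypothetical stand-in for the contents of test.md.

```javascript
// Hypothetical sample text: five Chinese characters, 3 bytes each in UTF-8.
const text = '床前明月光';
const full = Buffer.from(text, 'utf-8');   // 15 bytes

// Simulate highWaterMark = 11: an 11-byte chunk, then the remaining 4 bytes.
const pieces = [full.subarray(0, 11), full.subarray(11)];

// Naive approach: += converts each chunk to a string on its own.
let naive = '';
pieces.forEach(function (chunk) { naive += chunk; });

// Collect the chunks and decode once, after all bytes are available.
const total = pieces.reduce(function (n, chunk) { return n + chunk.length; }, 0);
const joined = Buffer.concat(pieces, total).toString('utf-8');

console.log(naive === text);  // false, the fourth character was split
console.log(joined === text); // true
```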

Performance advantages of buffer

In network transmission, sending data that has already been converted to a Buffer can improve performance, because a string otherwise has to be encoded into bytes each time it is sent.
For file operations, the file itself stores binary data, so when the content does not need to be changed, passing the Buffer through directly gives the best performance.
The value of highWaterMark also affects performance. If it is too small, the file is read in too many small steps. If it is too large, memory may be wasted when reading small files (although the extra memory that was allocated can still be reused by the next read). For large files, a larger highWaterMark can improve performance.