np.ascontiguousarray() function in numpy

Posted Jun 15, 2020 (3 min read)

The NumPy documentation describes the function as follows:

"Return a contiguous array (ndim >= 1) in memory (C order)."

Purpose

The ascontiguousarray function returns a C-contiguous version of its input: an array whose elements are not stored contiguously in memory is copied into one contiguous block, so that later operations on it can run faster.
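As a minimal sketch of this behavior (the variable names are illustrative and not from the original post), the snippet below takes a strided view, which is not contiguous, and passes it through np.ascontiguousarray. It also illustrates the "ndim >= 1" part of the documentation quote: a scalar input comes back as a one-element array.

```python
import numpy as np

base = np.arange(10)
view = base[::2]  # every second element: a non-contiguous view of base

packed = np.ascontiguousarray(view)

print(view.flags['C_CONTIGUOUS'])    # False
print(packed.flags['C_CONTIGUOUS'])  # True
print(np.array_equal(view, packed))  # True: same values, different layout

# A scalar is promoted to a 1-d array (ndim >= 1).
scalar_as_array = np.ascontiguousarray(5)
print(scalar_as_array.shape)         # (1,)
```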


C order vs Fortran order

  • C order refers to row-major order, that is, the elements of the same row are stored together in memory.
  • Fortran order refers to column-major order, that is, the elements of the same column are stored together in memory.

Pascal, C, C++, and NumPy (by default) store arrays row by row, while Fortran and MATLAB store them column by column.
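The difference can be seen directly in NumPy, which supports both layouts through the order argument of np.array. The following is a small sketch with made-up values; ravel(order='K') is used here because it reads the elements in the order they sit in memory.

```python
import numpy as np

values = [[1, 2, 3],
          [4, 5, 6]]

c_arr = np.array(values, order='C')  # row-major layout
f_arr = np.array(values, order='F')  # column-major layout

# The two arrays are equal element by element...
print(np.array_equal(c_arr, f_arr))     # True

# ...but their elements are laid out differently in memory.
print(c_arr.ravel(order='K').tolist())  # [1, 2, 3, 4, 5, 6]
print(f_arr.ravel(order='K').tolist())  # [1, 4, 2, 5, 3, 6]
```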


Contiguous array

A contiguous array is one whose elements occupy a single unbroken block of memory addresses (note that memory addresses are actually one-dimensional).

Consider the 2-dimensional array arr = np.arange(12).reshape(3, 4). Its structure is as follows:

     0  1  2  3
     4  5  6  7
     8  9 10 11

In memory, it is actually stored as a single run of values:

     0 1 2 3 4 5 6 7 8 9 10 11

arr is in C order, so it is stored row by row in memory. To move down one row within a column, you need to skip 3 elements (for example, going from 0 to 4 skips 1, 2, and 3).
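NumPy exposes this step size through the strides attribute, measured in bytes. The sketch below fixes the dtype to int64 so that every element is 8 bytes on any platform; flat_offset is a hypothetical helper written only for this illustration.

```python
import numpy as np

# dtype is fixed so that each element takes 8 bytes regardless of platform
arr = np.arange(12, dtype=np.int64).reshape(3, 4)

print(arr.itemsize)  # 8
print(arr.strides)   # (32, 8): 32 bytes to the next row, 8 bytes to the next column


def flat_offset(a, row, col):
    """Position of a[row, col] in the underlying buffer, counted in elements."""
    return (row * a.strides[0] + col * a.strides[1]) // a.itemsize


print(flat_offset(arr, 0, 0))  # 0
print(flat_offset(arr, 1, 0))  # 4: the elements 1, 2 and 3 lie in between
```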

After transposing, arr.T is no longer C contiguous: the elements stay at the same memory addresses, so elements that are adjacent within a row of arr.T are not adjacent in memory.

Instead, arr.T is in Fortran order, because elements that are adjacent within a column of arr.T are stored adjacently in memory.
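A short check of these statements, again assuming an int64 array so that the byte counts are predictable:

```python
import numpy as np

arr = np.arange(12, dtype=np.int64).reshape(3, 4)
arr_t = arr.T

# The transpose is a view: no data is moved, only the strides are swapped.
print(np.shares_memory(arr, arr_t))  # True
print(arr.strides, arr_t.strides)    # (32, 8) (8, 32)

print(arr_t.flags['C_CONTIGUOUS'])   # False
print(arr_t.flags['F_CONTIGUOUS'])   # True
```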

In terms of performance, reading adjacent memory addresses is much faster than reading non-adjacent ones: when a value is read from RAM, the whole block of neighboring addresses is fetched with it and kept in the cache. This means that operations on contiguous arrays can be much faster.

Since arr is C contiguous, row operations tend to be faster than column operations. Usually

np.sum(arr, axis=1) # sum along each row

is a little faster than

np.sum(arr, axis=0) # sum along each column

although the size of the gap depends on the array size and the NumPy version.
Similarly, on arr.T, column operations tend to be faster than row operations.
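One way to measure this on your own machine is a rough timing sketch like the one below. The array size and repeat count are arbitrary choices, not values from the original post, and the result varies with hardware and NumPy version, so no expected timings are shown.

```python
import timeit

import numpy as np

# A larger array than the 3 x 4 example, so that cache effects can show up.
big = np.arange(4_000_000, dtype=np.float64).reshape(2000, 2000)

# Time 20 repetitions of each reduction.
row_time = timeit.timeit(lambda: big.sum(axis=1), number=20)
col_time = timeit.timeit(lambda: big.sum(axis=0), number=20)

print(f"sum over axis=1 (along rows):    {row_time:.4f} s")
print(f"sum over axis=0 (along columns): {col_time:.4f} s")
```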


Use np.ascontiguousarray()

  • In NumPy, newly created arrays are C contiguous by default.

  • A slice that does not select one unbroken block of memory changes the contiguity: the result may be neither C contiguous nor Fortran contiguous.

  • You can check whether an array is C contiguous or Fortran contiguous through the array's .flags attribute:

    import numpy as np
    arr = np.arange(12).reshape(3, 4)
    arr.flags

      C_CONTIGUOUS:True
      F_CONTIGUOUS:False
      OWNDATA:False
      WRITEABLE:True
      ALIGNED:True
      WRITEBACKIFCOPY:False
      UPDATEIFCOPY:False

You can see from the output that the array arr is C contiguous.
Slicing off whole rows of arr, so that every remaining row is kept intact, leaves the result C contiguous:

>>> arr
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> arr1 = arr[:2, :]
>>> arr1
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
>>> arr1.flags
    C_CONTIGUOUS:True
    F_CONTIGUOUS:False
    OWNDATA:False
    WRITEABLE:True
    ALIGNED:True
    WRITEBACKIFCOPY:False
    UPDATEIFCOPY:False

If you instead slice within each row (keeping only some of the columns), the result is neither C contiguous nor Fortran contiguous:

>>> arr1 = arr[:, 1:3]
>>> arr1.flags
    C_CONTIGUOUS:False
    F_CONTIGUOUS:False
    OWNDATA:False
    WRITEABLE:True
    ALIGNED:True
    WRITEBACKIFCOPY:False
    UPDATEIFCOPY:False
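The reason becomes visible in the strides. In the sketch below (int64 is assumed so that each element is 8 bytes), the sliced view keeps the row stride of the original array, which is larger than a packed array with 2 columns would need.

```python
import numpy as np

arr = np.arange(12, dtype=np.int64).reshape(3, 4)
cols = arr[:, 1:3]   # keep columns 1 and 2 of every row

print(cols.shape)    # (3, 2)
print(cols.strides)  # (32, 8): still 32 bytes from one row to the next

# A C-contiguous array with 2 columns would need a row stride of 2 * 8 bytes.
expected_row_stride = cols.shape[1] * cols.itemsize
print(expected_row_stride)                     # 16
print(cols.strides[0] == expected_row_stride)  # False: there are gaps between rows
```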

Calling the ascontiguousarray function on such an array returns a C-contiguous copy:

>>> arr2 = np.ascontiguousarray(arr1)
>>> arr2.flags
    C_CONTIGUOUS:True
    F_CONTIGUOUS:False
    OWNDATA:True
    WRITEABLE:True
    ALIGNED:True
    WRITEBACKIFCOPY:False
    UPDATEIFCOPY:False
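Note that OWNDATA is now True: arr2 has its own buffer. The following sketch, which reuses the names arr, arr1 and arr2 from above and adds a hypothetical arr3, checks that the contiguous result no longer shares memory with arr, while an input that is already C contiguous is not copied.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
arr1 = arr[:, 1:3]                  # non-contiguous view of arr
arr2 = np.ascontiguousarray(arr1)   # contiguous copy

print(np.shares_memory(arr, arr1))  # True
print(np.shares_memory(arr, arr2))  # False

# Writing to the copy leaves the original untouched.
arr2[0, 0] = 100
print(arr[0, 1])                    # 1

# An input that is already C contiguous is not copied.
arr3 = np.ascontiguousarray(arr)
print(np.shares_memory(arr, arr3))  # True
```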
