PyTorch Internals: Scalar and TensorIterator


This blog touches briefly upon Scalar and TensorIterator.


Scalar is a data type in the PyTorch internal implementation. It represents a single scalar value, such as the value held by a 0-dimensional tensor. A 0-dimensional tensor can hold an integer, a float, a double, or any other primitive value. The type of a Scalar is denoted by the attribute ScalarType, which contains a mapping to C++ types: ScalarType stores the type while Scalar stores the value. Internally, Scalar stores a primitive C++ value. Scalar and ScalarType are implemented as C++ classes here and here respectively. Scalars also support a few useful operations, such as log and negation, and can be converted to their primitive C++ type using the to operator.

Sample code on using a Scalar:

  c10::Scalar a = 5.5;
  std::cout << "a=" << a << std::endl;
  std::cout << "element size = " << c10::elementSize(a.type()) << std::endl;

  // Converting scalar to primitive c++ type
  auto b = a.to<float>();
  auto c = a.toDouble();


Given an input tensor and an output tensor, TensorIterator can be used to iterate over them and perform pointwise operations. TensorIterator also provides additional functionality such as shape broadcasting and type promotion, which saves a lot of trouble by handling edge cases automatically.
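The broadcasting rule itself is simple: shapes are aligned at their trailing dimensions, and a dimension of size 1 is expanded to match the other. Here is a plain-C++ sketch of that rule (the helper name broadcast_shapes is my own, not a PyTorch API):

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Compute the broadcast result shape of two shapes, following the usual
// rule: align trailing dimensions, and size-1 dimensions expand.
std::vector<int64_t> broadcast_shapes(std::vector<int64_t> a,
                                      std::vector<int64_t> b) {
  if (a.size() < b.size()) std::swap(a, b);
  // Left-pad the shorter shape with 1s so both have the same rank.
  b.insert(b.begin(), a.size() - b.size(), 1);
  std::vector<int64_t> out(a.size());
  for (size_t i = 0; i < a.size(); i++) {
    if (a[i] == b[i] || b[i] == 1) {
      out[i] = a[i];
    } else if (a[i] == 1) {
      out[i] = b[i];
    } else {
      throw std::runtime_error("shapes are not broadcastable");
    }
  }
  return out;
}
```

For instance, shapes {2, 3} and {3} broadcast to {2, 3}, and {4, 1, 5} and {3, 1} broadcast to {4, 3, 5}.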

The example below shows TensorIterator in action. Though this is not the best way to use it, it serves a pedagogical purpose.

  torch::Tensor x = torch::ones({2, 3});
  // TensorIterator automatically handles broadcasting
  auto y = torch::randn({3});
  auto output = torch::empty_like(x);
  // A TensorIteratorConfig is used to create a TensorIterator.
  auto iter = torch::TensorIteratorConfig()
                  .add_output(output)
                  .add_input(x)
                  .add_input(y)
                  .build();

  iter.for_each([&](char** data, const int64_t* strides, int64_t n) {
    for (int64_t i = 0; i < n; i++) {
      float* out_data = reinterpret_cast<float*>(data[0] + strides[0] * i);
      float* in_data_x = reinterpret_cast<float*>(data[1] + strides[1] * i);
      float* in_data_y = reinterpret_cast<float*>(data[2] + strides[2] * i);
      *out_data = *in_data_x + *in_data_y;
    }
  });

To build a TensorIterator, we use TensorIteratorConfig to specify the input and output tensors along with any additional configuration, then call TensorIteratorConfig::build to create the TensorIterator. TensorIterator provides the for_each function, which can be used to iterate over the elements of the tensors in the iterator.

The for_each method accepts a loop function with the signature (char** data, const int64_t* strides, int64_t n). The first argument, char** data, contains a char* pointer for each tensor in the iterator. Using char* lets us treat the underlying data as an array of bytes that can be reinterpreted as any other data type; in this case, we reinterpret the bytes as float. The next argument, strides, specifies, for each tensor in the iterator, how many bytes we must move to reach the next element. Though tensors are multidimensional, they are laid out in memory as a contiguous 1-D array, and the stride of a tensor says how many bytes we have to move to reach the next element along that axis. Lastly, the argument n specifies the number of elements to process.

The for-loop iterates over the memory locations of the output and the two input tensors, stored in data[0], data[1], and data[2], with step sizes of strides[0], strides[1], and strides[2] respectively. At each location, it reinterprets the bytes as a float and performs the pointwise operation, here addition.

A working example of the above can be found here.

For more details about TensorIterator, I refer readers to this blog post and to this pytorch-dev podcast by ezyang. The TensorIteratorConfig class is defined in TensorIterator.h, and the struct TensorIterator is defined here.

In case you have any comments or questions, I would be glad to hear them. Feel free to reach out to me at arunppsg AT gmail DOT com.