@@ -357,3 +357,163 @@ pandas ``ExtensionArray``. This method should have the following signature::
357357
358358This way, you can control the conversion of a pyarrow ``Array`` of your pyarrow
359359extension type to a pandas ``ExtensionArray`` that can be stored in a DataFrame.
360+
361+
362+ Canonical extension types
363+ ~~~~~~~~~~~~~~~~~~~~~~~~~
364+
365+ You can find the official list of canonical extension types in the
366+ :ref:`format_canonical_extensions` section. The examples below show how to
367+ use them in pyarrow.
368+
369+ Fixed shape tensor
370+ """"""""""""""""""
371+
372+ To create an array of tensors that all have the same shape (a fixed shape tensor
373+ array), we first need to define a fixed shape tensor extension type with a value
374+ type and a shape:
375+
376+ .. code-block:: python
377+
378+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
379+
380+ Next we need a storage array of :func:`pyarrow.list_` type, where ``value_type``
381+ is the value type of the fixed shape tensor and the list size is the product of the
382+ ``tensor_type`` shape elements. We can then create an array of tensors with the
383+ ``pa.ExtensionArray.from_storage()`` method:
384+
385+ .. code-block:: python
386+
387+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
388+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
389+    >>> tensor_array = pa.ExtensionArray.from_storage(tensor_type, storage)
390+
391+ We can also create another array of tensors with a different value type:
392+
393+ .. code-block:: python
394+
395+    >>> tensor_type_2 = pa.fixed_shape_tensor(pa.float32(), (2, 2))
396+    >>> storage_2 = pa.array(arr, pa.list_(pa.float32(), 4))
397+    >>> tensor_array_2 = pa.ExtensionArray.from_storage(tensor_type_2, storage_2)
398+
399+ Extension arrays can be used as columns in ``pyarrow.Table`` or
400+ ``pyarrow.RecordBatch``:
401+
402+ .. code-block:: python
403+
404+    >>> data = [
405+    ...     pa.array([1, 2, 3]),
406+    ...     pa.array(['foo', 'bar', None]),
407+    ...     pa.array([True, None, True]),
408+    ...     tensor_array,
409+    ...     tensor_array_2
410+    ... ]
411+    >>> my_schema = pa.schema([('f0', pa.int8()),
412+    ...                        ('f1', pa.string()),
413+    ...                        ('f2', pa.bool_()),
414+    ...                        ('tensors_int', tensor_type),
415+    ...                        ('tensors_float', tensor_type_2)])
416+    >>> table = pa.Table.from_arrays(data, schema=my_schema)
417+    >>> table
418+    pyarrow.Table
419+    f0: int8
420+    f1: string
421+    f2: bool
422+    tensors_int: extension<arrow.fixed_shape_tensor>
423+    tensors_float: extension<arrow.fixed_shape_tensor>
424+    ----
425+    f0: [[1,2,3]]
426+    f1: [["foo","bar",null]]
427+    f2: [[true,null,true]]
428+    tensors_int: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]
429+    tensors_float: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]
430+
431+ We can also convert a tensor array to a single multi-dimensional NumPy ndarray.
432+ In this conversion, the length of the Arrow array becomes the first dimension
433+ of the ndarray:
434+
435+ .. code-block:: python
436+
437+    >>> numpy_tensor = tensor_array_2.to_numpy_ndarray()
438+    >>> numpy_tensor
439+    array([[[  1.,   2.],
440+            [  3.,   4.]],
441+           [[ 10.,  20.],
442+            [ 30.,  40.]],
443+           [[100., 200.],
444+            [300., 400.]]])
445+    >>> numpy_tensor.shape
446+    (3, 2, 2)
447+
448+ .. note::
449+
450+    Both optional parameters, ``permutation`` and ``dim_names``, are meant to provide the user
451+    with information about the logical layout of the data compared to the physical layout.
452+
453+    The conversion to a NumPy ndarray is only possible for trivial permutations (``None`` or
454+    ``[0, 1, ... N-1]``, where ``N`` is the number of tensor dimensions).
455+
456+ Conversely, we can convert a NumPy ndarray to a fixed shape tensor array:
457+
458+ .. code-block:: python
459+
460+    >>> pa.FixedShapeTensorArray.from_numpy_ndarray(numpy_tensor)
461+    <pyarrow.lib.FixedShapeTensorArray object at ...>
462+    [
463+      [
464+        1,
465+        2,
466+        3,
467+        4
468+      ],
469+      [
470+        10,
471+        20,
472+        30,
473+        40
474+      ],
475+      [
476+        100,
477+        200,
478+        300,
479+        400
480+      ]
481+    ]
482+
483+ In this conversion, the first dimension of the ndarray becomes the length of the pyarrow
484+ extension array. In the example, an ndarray of shape ``(3, 2, 2)`` becomes an Arrow array of
485+ length 3 with tensor elements of shape ``(2, 2)``.
486+
487+ .. code-block:: python
488+
489+    # ndarray of shape (3, 2, 2)
490+    >>> numpy_tensor.shape
491+    (3, 2, 2)
492+
493+    # arrow array of length 3 with tensor elements of shape (2, 2)
494+    >>> pyarrow_tensor_array = pa.FixedShapeTensorArray.from_numpy_ndarray(numpy_tensor)
495+    >>> len(pyarrow_tensor_array)
496+    3
497+    >>> pyarrow_tensor_array.type.shape
498+    [2, 2]
499+
500+ The extension type can also have ``permutation`` and ``dim_names`` defined. For
501+ example:
502+
503+ .. code-block:: python
504+
505+    >>> tensor_type = pa.fixed_shape_tensor(pa.float64(), [2, 2, 3], permutation=[0, 2, 1])
506+
507+ or
508+
509+ .. code-block:: python
510+
511+    >>> tensor_type = pa.fixed_shape_tensor(pa.bool_(), [2, 2, 3], dim_names=['C', 'H', 'W'])
512+
513+ for the ``NCHW`` format, where:
514+
515+ * N: number of images, which in our case is the length of the array and is always
516+   the first dimension
517+ * C: number of channels of the image
518+ * H: height of the image
519+ * W: width of the image