Commit 63d8d9d

Author: morelos
Update base for Update on "[ET-VK][Ops] affine quantization operators registration"
# Context

To enable dynamic quantization, especially for the source transform method using `Int8DynActInt4WeightQuantizer`, we need Vulkan versions of `quantize_affine`, `dequantize_affine`, and `choose_qparams_affine`. We do not currently have a shader that performs the block-based quantization these operators expect, so we delegate to the per_tensor variant just to get unblocked. This will likely be developed further at a later stage to ensure we don't lose too much accuracy.

# Changes

This creates a schema reference in the TorchAO library for the out variants of these operators. A VK_REGISTER_OP is then done on each of them so that they can be properly registered when lowering the ET model with Vulkan.

The vulkan_quantizer is also changed to enable a dynamic quantization config, so that we are no longer working purely with static quantization. We keep `_annotate_for_static_quantization_config` for parity/legacy reasons and simply create an equivalent dynamic quantization config method.

We also changed `Linear.cpp`, in particular to allow a passthrough for weight_data, since during dynamic quantization it may be a tensor_data rather than a tensor_ref.

Differential Revision: [D78035354](https://our.internmc.facebook.com/intern/diff/D78035354/)

[ghstack-poisoned]
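For readers unfamiliar with the per_tensor variant that the commit message says these operators delegate to, the arithmetic can be sketched in plain C++ as below. This is only an illustration under stated assumptions: the names `QParams`, `choose_qparams_sketch`, `quantize_sketch`, and `dequantize_sketch` are hypothetical and are not the ExecuTorch, TorchAO, or Vulkan shader API, and the rounding and range conventions of the real operators may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical container for one (scale, zero_point) pair shared by the
// whole tensor. Not a type from ExecuTorch or TorchAO.
struct QParams {
  float scale;
  int32_t zero_point;
};

// Pick a scale and zero point that map [min(x), max(x)], widened to
// include 0, onto [quant_min, quant_max].
QParams choose_qparams_sketch(
    const std::vector<float>& x, int64_t quant_min, int64_t quant_max) {
  assert(quant_min < quant_max);
  float lo = 0.0f;
  float hi = 0.0f;
  for (float v : x) {
    lo = std::min(lo, v);
    hi = std::max(hi, v);
  }
  float scale = (hi - lo) / static_cast<float>(quant_max - quant_min);
  if (scale == 0.0f) {
    scale = 1.0f; // all-zero input: avoid dividing by zero later
  }
  // The zero point is the integer that real value 0.0 maps to.
  float zp = std::nearbyint(static_cast<float>(quant_min) - lo / scale);
  zp = std::min(std::max(zp, static_cast<float>(quant_min)),
                static_cast<float>(quant_max));
  return QParams{scale, static_cast<int32_t>(zp)};
}

// q = clamp(round(x / scale) + zero_point, quant_min, quant_max)
int32_t quantize_sketch(
    float x, QParams p, int64_t quant_min, int64_t quant_max) {
  int64_t q =
      static_cast<int64_t>(std::nearbyint(x / p.scale)) + p.zero_point;
  q = std::min(std::max(q, quant_min), quant_max);
  return static_cast<int32_t>(q);
}

// x' = (q - zero_point) * scale
float dequantize_sketch(int32_t q, QParams p) {
  return static_cast<float>(q - p.zero_point) * p.scale;
}
```

Because a single `scale` and `zero_point` cover the whole tensor here, values in blocks with a small dynamic range are represented more coarsely than block-based parameters would allow, which is the accuracy concern the commit message raises.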
1 parent 1e309b4 commit 63d8d9d

1 file changed: +42 -0 lines changed

backends/vulkan/test/op_tests/quantize_test.cpp

Lines changed: 42 additions & 0 deletions
@@ -645,6 +645,48 @@ void test_vulkan_quantize_per_tensor_tensor(
       vkcompute::utils::kTexture3D);
 }
 
+// Wrapper function to test both buffer and texture storage types
+void test_vulkan_quantize_per_channel(
+    const std::vector<int>& input_sizes,
+    const std::vector<float>& scales,
+    const std::vector<int>& zero_points,
+    int64_t axis,
+    int64_t quant_min,
+    int64_t quant_max,
+    at::ScalarType in_dtype = at::kFloat,
+    at::ScalarType dtype = at::kInt) {
+  // Test with buffer storage
+  test_vulkan_quantize_per_channel_impl(
+      input_sizes,
+      scales,
+      zero_points,
+      axis,
+      quant_min,
+      quant_max,
+      in_dtype,
+      dtype,
+      vkcompute::utils::kBuffer,
+      vkcompute::utils::kBuffer);
+
+  // If the in_dtype is a double, convert to float for texture implementation
+  // since they don't support 64bit as inputs
+  if (in_dtype == at::kDouble) {
+    in_dtype = at::kFloat;
+  }
+
+  test_vulkan_quantize_per_channel_impl(
+      input_sizes,
+      scales,
+      zero_points,
+      axis,
+      quant_min,
+      quant_max,
+      in_dtype,
+      dtype,
+      vkcompute::utils::kTexture3D,
+      vkcompute::utils::kTexture3D);
+}
+
 void test_reference_quantize_per_tensor(
     const std::vector<int>& input_sizes,
     float scale,
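The wrapper above takes one `scales` entry and one `zero_points` entry per slice along `axis`. As a rough illustration of what a per-channel quantization check compares against, here is a minimal CPU sketch in plain C++. It is an assumption-laden stand-in, not the reference used in `quantize_test.cpp`: the function name `quantize_per_channel_sketch` is hypothetical, the input is assumed to be contiguous and row-major, and the rounding mode is the default round-half-to-even of `std::nearbyint`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-channel reference. Each index c along `axis` uses its
// own scales[c] and zero_points[c].
std::vector<int32_t> quantize_per_channel_sketch(
    const std::vector<float>& input, // contiguous, row-major (assumed)
    const std::vector<int>& sizes,
    const std::vector<float>& scales,
    const std::vector<int>& zero_points,
    int64_t axis,
    int64_t quant_min,
    int64_t quant_max) {
  assert(axis >= 0 && axis < static_cast<int64_t>(sizes.size()));
  const std::size_t channels = static_cast<std::size_t>(sizes[axis]);
  assert(scales.size() == channels);
  assert(zero_points.size() == channels);

  // Number of contiguous elements covered by one step along `axis`.
  std::size_t inner = 1;
  for (std::size_t d = static_cast<std::size_t>(axis) + 1; d < sizes.size();
       ++d) {
    inner *= static_cast<std::size_t>(sizes[d]);
  }

  std::vector<int32_t> out(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    // Recover the channel index of flat element i.
    const std::size_t c = (i / inner) % channels;
    int64_t q = static_cast<int64_t>(std::nearbyint(input[i] / scales[c])) +
        zero_points[c];
    q = std::min(std::max(q, quant_min), quant_max);
    out[i] = static_cast<int32_t>(q);
  }
  return out;
}
```

For a 2x3 input with `axis = 0`, the first row is quantized with `scales[0]` and `zero_points[0]` and the second row with `scales[1]` and `zero_points[1]`; with `axis = 1`, each of the three columns gets its own pair.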
