Commit 37ab037

Author: morelos
Update base for Update on "[ET-VK][Ops] affine quantization operators registration"
# Context

In order to enable dynamic quantization, especially for the source transform method using `Int8DynActInt4WeightQuantizer`, we need Vulkan versions of `quantize_affine`, `dequantize_affine`, and `choose_qparams_affine`. Currently we do not have a shader that performs the block-based quantization these operators expect, so we delegate to the per_tensor variant just to get unblocked. At a later stage, this will likely be developed further to ensure we don't lose too much accuracy.

# Changes

This creates a schema reference in the TorchAO library for the out variants of these operators. A `VK_REGISTER_OP` is then done on each of them so that they are properly registered when lowering the ET model with Vulkan.

The vulkan_quantizer is also changed to enable a dynamic quantization config, so that we aren't working purely with static quantization anymore. We keep `_annotate_for_static_quantization_config` for parity/legacy reasons and simply create an equivalent dynamic quantization config method.

We also changed `Linear.cpp` to allow a passthrough for weight_data, since during dynamic quantization it may be a tensor_data rather than a tensor_ref.

Differential Revision: [D78035354](https://our.internmc.facebook.com/intern/diff/D78035354/)

[ghstack-poisoned]
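For readers unfamiliar with the per_tensor variant that the commit message delegates to, the following is a minimal sketch of the standard per-tensor affine formula, `q = clamp(round(x / scale) + zero_point, quant_min, quant_max)`, and its inverse. The function names `quantize_per_tensor_ref` and `dequantize_per_tensor_ref` are hypothetical and are not ExecuTorch or TorchAO APIs; round-half-to-even via `std::nearbyint` is an assumption about the rounding mode.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not an ExecuTorch/TorchAO API).
// Quantizes every element with one shared scale and zero point.
std::vector<int32_t> quantize_per_tensor_ref(
    const std::vector<float>& input,
    float scale,
    int32_t zero_point,
    int64_t quant_min,
    int64_t quant_max) {
  std::vector<int32_t> out;
  out.reserve(input.size());
  for (float x : input) {
    // Assumption: round to nearest, ties to even (default FP rounding mode).
    int64_t q = static_cast<int64_t>(std::nearbyint(x / scale)) + zero_point;
    // Clamp into the representable quantized range.
    q = std::max(quant_min, std::min(quant_max, q));
    out.push_back(static_cast<int32_t>(q));
  }
  return out;
}

// Hypothetical inverse: maps quantized values back to floats.
std::vector<float> dequantize_per_tensor_ref(
    const std::vector<int32_t>& quantized,
    float scale,
    int32_t zero_point) {
  std::vector<float> out;
  out.reserve(quantized.size());
  for (int32_t q : quantized) {
    out.push_back(static_cast<float>(q - zero_point) * scale);
  }
  return out;
}
```

Because a single scale and zero point cover the whole tensor, values clamped at `quant_min` or `quant_max` do not round-trip. That is one way accuracy can be lost relative to a block-based scheme, which would use separate parameters per block.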
1 parent 63d8d9d commit 37ab037

1 file changed: +0 -42 lines changed
backends/vulkan/test/op_tests/quantize_test.cpp

Lines changed: 0 additions & 42 deletions
@@ -645,48 +645,6 @@ void test_vulkan_quantize_per_tensor_tensor(
       vkcompute::utils::kTexture3D);
 }
 
-// Wrapper function to test both buffer and texture storage types
-void test_vulkan_quantize_per_channel(
-    const std::vector<int>& input_sizes,
-    const std::vector<float>& scales,
-    const std::vector<int>& zero_points,
-    int64_t axis,
-    int64_t quant_min,
-    int64_t quant_max,
-    at::ScalarType in_dtype = at::kFloat,
-    at::ScalarType dtype = at::kInt) {
-  // Test with buffer storage
-  test_vulkan_quantize_per_channel_impl(
-      input_sizes,
-      scales,
-      zero_points,
-      axis,
-      quant_min,
-      quant_max,
-      in_dtype,
-      dtype,
-      vkcompute::utils::kBuffer,
-      vkcompute::utils::kBuffer);
-
-  // If the in_dtype is a double, convert to float for texture implementation
-  // since they don't support 64bit as inputs
-  if (in_dtype == at::kDouble) {
-    in_dtype = at::kFloat;
-  }
-
-  test_vulkan_quantize_per_channel_impl(
-      input_sizes,
-      scales,
-      zero_points,
-      axis,
-      quant_min,
-      quant_max,
-      in_dtype,
-      dtype,
-      vkcompute::utils::kTexture3D,
-      vkcompute::utils::kTexture3D);
-}
-
 void test_reference_quantize_per_tensor(
     const std::vector<int>& input_sizes,
     float scale,
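
The removed wrapper exercised per-channel quantization, where each slice along `axis` has its own entry in `scales` and `zero_points`. As a rough illustration of what such a test compares against, here is a minimal CPU sketch. The name `quantize_per_channel_ref` is hypothetical and does not appear in the test file, and the contiguous row-major layout and round-half-to-even rounding are assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical reference helper (not part of quantize_test.cpp).
// Assumes `input` is a contiguous, row-major tensor of shape `sizes`.
std::vector<int32_t> quantize_per_channel_ref(
    const std::vector<float>& input,
    const std::vector<int>& sizes,
    const std::vector<float>& scales,
    const std::vector<int>& zero_points,
    int64_t axis,
    int64_t quant_min,
    int64_t quant_max) {
  const int64_t ndim = static_cast<int64_t>(sizes.size());
  if (axis < 0 || axis >= ndim) {
    throw std::invalid_argument("axis out of range");
  }
  const int64_t channels = sizes[static_cast<std::size_t>(axis)];
  if (static_cast<int64_t>(scales.size()) != channels ||
      static_cast<int64_t>(zero_points.size()) != channels) {
    throw std::invalid_argument("scales/zero_points must match sizes[axis]");
  }
  // Number of elements and the stride of the quantization axis.
  int64_t numel = 1;
  int64_t inner = 1;
  for (int64_t d = 0; d < ndim; ++d) {
    numel *= sizes[static_cast<std::size_t>(d)];
    if (d > axis) {
      inner *= sizes[static_cast<std::size_t>(d)];
    }
  }
  if (static_cast<int64_t>(input.size()) != numel) {
    throw std::invalid_argument("input size does not match sizes");
  }
  std::vector<int32_t> out(input.size());
  for (int64_t i = 0; i < numel; ++i) {
    // Recover the channel index of flat element i along `axis`.
    const std::size_t c = static_cast<std::size_t>((i / inner) % channels);
    const std::size_t idx = static_cast<std::size_t>(i);
    // Assumption: round to nearest, ties to even.
    int64_t q =
        static_cast<int64_t>(std::nearbyint(input[idx] / scales[c])) +
        zero_points[c];
    q = std::max(quant_min, std::min(quant_max, q));
    out[idx] = static_cast<int32_t>(q);
  }
  return out;
}
```

For a `{2, 3}` input quantized along `axis = 1`, column 0 uses `scales[0]`, column 1 uses `scales[1]`, and so on, in both rows.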
