Description
Hi,
I want to try the C++ version of FlashInfer, but while testing the C++ API examples I found that all the test cases in test_single_prefill fail when run together, with the following message in the log:
Result correctness test failed.
However, each test case in test_single_prefill passes when run individually.
This issue was observed in version v0.2.5. It seems that in the latest version, there are some compilation issues with the C++ API.
To help reproduce the bug quickly, I modified config.cmake and test_single_prefill.cu. The changes and reproduction steps are as follows:
- Changes in config.cmake: disable the following build options:
    set(FLASHINFER_SAMPLING OFF)
    set(FLASHINFER_NORM OFF)
    set(FLASHINFER_DISTRIBUTED OFF)
- Changes in test_single_prefill.cu: remove all other test cases and keep only these two:
    TEST(FlashInferCorrectnessTest, TestSinglePrefillKernelLongContextCorrectnessFP16) {
      TestSinglePrefillKernelLongContextCorrectness<half, half>(false);
    }
    TEST(FlashInferCorrectnessTest, TestSinglePrefillKernelLongContextCorrectnessFP16QKHalfAccum) {
      TestSinglePrefillKernelLongContextCorrectness<half, half>(true);
    }
- Reproduction steps:
    mkdir build/
    cp ../config.cmake build/
    cd build/
    cmake .. -DCMAKE_CUDA_ARCHITECTURES="89" -G Ninja -DCMAKE_BUILD_TYPE=Release
    ninja
    ./test_single_prefill
- Final results:
    Expected: (result_accuracy) > (0.90), actual: 0.109159708 vs 0.9
    Result correctness test failed.
    [ FAILED ] FlashInferCorrectnessTest.TestSinglePrefillKernelLongContextCorrectnessFP16QKHalfAccum (319766 ms)
    [----------] 2 tests from FlashInferCorrectnessTest (319964 ms total)
    [----------] Global test environment tear-down
    [==========] 2 tests from 1 test suite ran. (319964 ms total)
    [ PASSED ] 0 tests.
    [ FAILED ] 2 tests, listed below:
    [ FAILED ] FlashInferCorrectnessTest.TestSinglePrefillKernelLongContextCorrectnessFP16
    [ FAILED ] FlashInferCorrectnessTest.TestSinglePrefillKernelLongContextCorrectnessFP16QKHalfAccum
BRs