
SHUSCT/ASC25-hisat-3n

HISAT-3N

Our optimization of HISAT-3N falls into two main areas: accelerating HISAT-3N sequence alignment and rewriting the HISAT-3N-table utility.

  1. Acceleration of HISAT-3N Sequence Alignment
    One optimization replaces SSE operations with AVX2. Because AVX2 registers are 256 bits wide rather than 128, the SwAligner method produces the same results with half as many instructions. Most of the other improvements target memory access. Beyond about 30 threads, we observed heavy contention on the HISAT-3N output lock: the original mutex performed poorly, and the severe contention caused excessive CPU context switching, which hurt cache locality. We initially used the standard library's lock but later, drawing on ideas from research papers, switched to an MCS lock implemented with TBB, which significantly improved performance. Combined with OpenMP thread affinity, this change keeps performance high even at high thread counts.

  2. Rewriting HISAT-3N-Table
    Our second optimization was a rewrite of HISAT-3N-table, a small utility that does not depend on the rest of the project and compiles from just one source file and three header files. On close inspection, we found that it lacked sufficient parallelism and had very poor serial performance. We redesigned it to pre-load the reference genome into memory so that any position can be accessed randomly, replaced the original binary search with a hash table, and applied several other optimizations. The rewritten HISAT-3N-table.cpp is well documented and readable, takes only 600 lines of code, and runs nearly ten times faster than the original. On personal laptops with high CPU frequencies, a run takes only about one minute. The current version is not perfect: it reaches its best performance only in a single-producer, single-consumer setting. Further optimization is possible, but we do not expect it to bring significant benefits for the competition.
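Why the SSE-to-AVX2 switch in item 1 roughly halves the instruction count: a 128-bit SSE register holds 8 signed 16-bit scores, while a 256-bit AVX2 register holds 16, so each vector instruction updates twice as many cells. The sketch below is a simplified cell update in the style of a striped Smith-Waterman inner loop, h = max(sat(h + s), e, f, 0). The function names are hypothetical and this is not SwAligner's actual code. It assumes GCC or Clang on x86 (for the `target` attribute and `__builtin_cpu_supports`) and falls back to the scalar loop on other platforms.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SW_X86 1
#endif

// Hypothetical cell update (illustrative only, not HISAT-3N's SwAligner):
// h[i] = max(saturate(h[i] + s[i]), e[i], f[i], 0) on signed 16-bit scores.

// Scalar reference: one cell per iteration.
void sw_update_scalar(int16_t* h, const int16_t* s, const int16_t* e,
                      const int16_t* f, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    int t = int(h[i]) + int(s[i]);
    t = std::min(32767, std::max(-32768, t));  // saturating add, like adds_epi16
    h[i] = static_cast<int16_t>(std::max({t, int(e[i]), int(f[i]), 0}));
  }
}

#ifdef SW_X86
// SSE2: one 128-bit register = 8 int16 lanes per instruction.
__attribute__((target("sse2")))
void sw_update_sse(int16_t* h, const int16_t* s, const int16_t* e,
                   const int16_t* f, std::size_t n) {
  const __m128i zero = _mm_setzero_si128();
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    __m128i v = _mm_adds_epi16(_mm_loadu_si128(reinterpret_cast<const __m128i*>(h + i)),
                               _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + i)));
    v = _mm_max_epi16(v, _mm_loadu_si128(reinterpret_cast<const __m128i*>(e + i)));
    v = _mm_max_epi16(v, _mm_loadu_si128(reinterpret_cast<const __m128i*>(f + i)));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(h + i), _mm_max_epi16(v, zero));
  }
  sw_update_scalar(h + i, s + i, e + i, f + i, n - i);  // leftover cells
}

// AVX2: the same operations on 256-bit registers = 16 int16 lanes, so the
// loop runs half as many iterations (and vector instructions) as the SSE one.
__attribute__((target("avx2")))
void sw_update_avx2(int16_t* h, const int16_t* s, const int16_t* e,
                    const int16_t* f, std::size_t n) {
  const __m256i zero = _mm256_setzero_si256();
  std::size_t i = 0;
  for (; i + 16 <= n; i += 16) {
    __m256i v = _mm256_adds_epi16(_mm256_loadu_si256(reinterpret_cast<const __m256i*>(h + i)),
                                  _mm256_loadu_si256(reinterpret_cast<const __m256i*>(s + i)));
    v = _mm256_max_epi16(v, _mm256_loadu_si256(reinterpret_cast<const __m256i*>(e + i)));
    v = _mm256_max_epi16(v, _mm256_loadu_si256(reinterpret_cast<const __m256i*>(f + i)));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(h + i), _mm256_max_epi16(v, zero));
  }
  sw_update_scalar(h + i, s + i, e + i, f + i, n - i);  // leftover cells
}

bool cpu_has_avx2() { return __builtin_cpu_supports("avx2") != 0; }
#else
// Non-x86 fallback so the sketch still compiles: both paths use the scalar loop.
void sw_update_sse(int16_t* h, const int16_t* s, const int16_t* e,
                   const int16_t* f, std::size_t n) { sw_update_scalar(h, s, e, f, n); }
void sw_update_avx2(int16_t* h, const int16_t* s, const int16_t* e,
                    const int16_t* f, std::size_t n) { sw_update_scalar(h, s, e, f, n); }
bool cpu_has_avx2() { return false; }
#endif
```

Both vector paths apply the same four operations per register; only the lane count differs, which is where the halving comes from. Call the AVX2 path only after checking `cpu_has_avx2()`.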
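The MCS lock mentioned in item 1 keeps waiting threads in a queue, and each waiter spins on a flag in its own node instead of on one shared word, so handing the lock to the next thread touches only that thread's cache line. The sketch below is a minimal standalone MCS lock built on `std::atomic`, not TBB's implementation. The README does not say which TBB type was used; oneTBB's `tbb::queuing_mutex` is a fair queuing lock of this kind, but that pairing is our assumption. `McsLock` and `McsGuard` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal MCS queue lock (illustrative sketch, not TBB's code).
class McsLock {
 public:
  struct Node {
    std::atomic<Node*> next{nullptr};
    std::atomic<bool> locked{false};
  };

  void lock(Node& me) {
    me.next.store(nullptr, std::memory_order_relaxed);
    me.locked.store(true, std::memory_order_relaxed);
    // Append this node to the tail of the waiting queue.
    Node* prev = tail_.exchange(&me, std::memory_order_acq_rel);
    if (prev != nullptr) {
      prev->next.store(&me, std::memory_order_release);
      // Wait on our own flag until the predecessor hands the lock over.
      while (me.locked.load(std::memory_order_acquire)) std::this_thread::yield();
    }
  }

  void unlock(Node& me) {
    Node* succ = me.next.load(std::memory_order_acquire);
    if (succ == nullptr) {
      // No visible successor: try to mark the queue empty.
      Node* expected = &me;
      if (tail_.compare_exchange_strong(expected, nullptr, std::memory_order_acq_rel)) return;
      // A successor is mid-enqueue; wait until it links itself to us.
      while ((succ = me.next.load(std::memory_order_acquire)) == nullptr) std::this_thread::yield();
    }
    succ->locked.store(false, std::memory_order_release);  // hand off the lock
  }

 private:
  std::atomic<Node*> tail_{nullptr};
};

// RAII guard: the queue node lives on the locking thread's stack.
class McsGuard {
 public:
  explicit McsGuard(McsLock& l) : lock_(l) { lock_.lock(node_); }
  ~McsGuard() { lock_.unlock(node_); }
  McsGuard(const McsGuard&) = delete;
  McsGuard& operator=(const McsGuard&) = delete;

 private:
  McsLock& lock_;
  McsLock::Node node_;
};
```

In an aligner, each worker would hold such a guard only while writing to the shared output (a hypothetical usage). For the thread-affinity part of the change, the standard OpenMP environment variables `OMP_PROC_BIND` and `OMP_PLACES` control how threads are pinned to cores; the README does not state which values were used.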
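The HISAT-3N-table redesign in item 2 pre-loads the reference genome and replaces binary search with a hash table. A minimal sketch of that data layout, assuming a C-to-T conversion experiment and using hypothetical names (`PositionTable`, `Counts`): the reference is held in memory as one string per chromosome, so reading any base is a constant-time index, and per-position counters live in an `std::unordered_map` keyed by position, giving average O(1) lookups where a binary search over sorted positions costs O(log n). This is not the rewritten HISAT-3N-table.cpp itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-position counters for a C->T conversion experiment.
struct Counts {
  uint32_t converted = 0;    // read shows T where the reference has C
  uint32_t unconverted = 0;  // read shows the unchanged reference C
};

class PositionTable {
 public:
  // Pre-load a whole chromosome so later lookups are random access.
  void load_chromosome(const std::string& name, std::string sequence) {
    reference_[name] = std::move(sequence);
  }

  char ref_base(const std::string& chrom, std::size_t pos) const {
    return reference_.at(chrom)[pos];  // O(1) index into the in-memory sequence
  }

  // Record one read base aligned to (chrom, pos); only reference C sites count.
  void add(const std::string& chrom, std::size_t pos, char read_base) {
    if (ref_base(chrom, pos) != 'C') return;
    Counts& c = counts_[chrom][pos];  // average O(1) hash lookup or insert
    if (read_base == 'T') ++c.converted;
    else if (read_base == 'C') ++c.unconverted;
  }

  // Returns nullptr if the position was never recorded.
  const Counts* find(const std::string& chrom, std::size_t pos) const {
    auto chrom_it = counts_.find(chrom);
    if (chrom_it == counts_.end()) return nullptr;
    auto it = chrom_it->second.find(pos);
    return it == chrom_it->second.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::string, std::string> reference_;
  std::unordered_map<std::string, std::unordered_map<std::size_t, Counts>> counts_;
};
```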
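Item 2 notes that the rewritten HISAT-3N-table performs best with a single producer and a single consumer. The README does not describe its queue, so the following is only an assumed illustration of that pattern: a bounded lock-free ring buffer through which one producer thread (for example, one parsing alignment records) hands items to one consumer thread (for example, one updating the table). Each index is written by exactly one thread, so no lock is needed; that same property is why the structure does not extend to multiple producers or consumers without changes. `SpscQueue` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Bounded single-producer/single-consumer ring buffer (illustrative sketch).
template <typename T>
class SpscQueue {
 public:
  // One extra slot stays empty so "full" and "empty" are distinguishable.
  explicit SpscQueue(std::size_t capacity) : buf_(capacity + 1) {}

  // Producer thread only. Returns false if the queue is full.
  bool try_push(const T& v) {
    std::size_t h = head_.load(std::memory_order_relaxed);
    std::size_t next = (h + 1) % buf_.size();
    if (next == tail_.load(std::memory_order_acquire)) return false;
    buf_[h] = v;
    head_.store(next, std::memory_order_release);  // publish the filled slot
    return true;
  }

  // Consumer thread only. Returns false if the queue is empty.
  bool try_pop(T& out) {
    std::size_t t = tail_.load(std::memory_order_relaxed);
    if (t == head_.load(std::memory_order_acquire)) return false;
    out = buf_[t];
    tail_.store((t + 1) % buf_.size(), std::memory_order_release);  // free the slot
    return true;
  }

 private:
  std::vector<T> buf_;
  std::atomic<std::size_t> head_{0};  // next slot to write (producer-owned)
  std::atomic<std::size_t> tail_{0};  // next slot to read (consumer-owned)
};
```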
