Abstract: For AI inference chips, quantization can effectively reduce both parameter size and computational cost. Additionally, the design of hardware acceleration modules can significantly ...