Difference with Block-wise Int8?
#1 by leo98xh - opened
Could you explain the difference with Block-wise Int8?
The main difference is the quantization granularity. In block-wise int8, all elements within each 128x128 block share one quantization scale. In channel-wise int8, all elements within a column share one quantization scale.
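To make the granularity concrete, here is a minimal pure-Python sketch. It is not this repository's actual kernel. It assumes symmetric absmax scaling (scale = max |x| / 127), which the thread does not specify, and the function names and the 256x256 matrix are made up for illustration.

```python
import random

def channelwise_scales(w):
    """One scale per column (assumed absmax scaling: max |x| / 127)."""
    rows, cols = len(w), len(w[0])
    return [max(abs(w[r][c]) for r in range(rows)) / 127.0 for c in range(cols)]

def blockwise_scales(w, block=128):
    """One scale per block x block tile (assumed absmax scaling: max |x| / 127)."""
    rows, cols = len(w), len(w[0])
    scales = []
    for r0 in range(0, rows, block):
        row_scales = []
        for c0 in range(0, cols, block):
            amax = max(abs(w[r][c])
                       for r in range(r0, min(r0 + block, rows))
                       for c in range(c0, min(c0 + block, cols)))
            row_scales.append(amax / 127.0)
        scales.append(row_scales)
    return scales

def quantize(value, scale):
    """Symmetric int8 quantization: round(value / scale), clamped to [-127, 127]."""
    if scale == 0.0:
        return 0
    return max(-127, min(127, round(value / scale)))

# Hypothetical 256x256 matrix, only for illustration.
random.seed(0)
W = [[random.uniform(-1.0, 1.0) for _ in range(256)] for _ in range(256)]

col_scales = channelwise_scales(W)  # 256 scales, one per column
blk_scales = blockwise_scales(W)    # 2 x 2 scales, one per 128x128 block

# The same element uses a different scale under each scheme.
r, c = 5, 200
q_channel = quantize(W[r][c], col_scales[c])
q_block = quantize(W[r][c], blk_scales[r // 128][c // 128])
```

For this 256x256 example, channel-wise stores 256 scales, one per column, while block-wise stores only 4, one per 128x128 tile.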
pkumc changed discussion status to closed