Mixed-precision quantization strategy in which most weights are converted to INT4, activations remain in FP16, and layers that are sensitive to quantization error are kept in FP16.
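A minimal sketch of the idea in plain Python, under several assumptions not stated in the source: symmetric per-group scaling, a group size of 4, and a caller-supplied set of sensitive layer names. The helper names (`to_fp16`, `quantize_int4`, `dequantize_int4`, `quantize_model`) and the layer names are hypothetical, and FP16 is only simulated by rounding through the half-precision format of the standard `struct` module.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_int4(weights, group_size=4):
    """Symmetric per-group quantization to signed 4-bit integers in [-8, 7]."""
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # One FP16 scale per group, chosen so the largest magnitude maps to about 7.
        scale = to_fp16(max(abs(w) for w in group) / 7)
        if scale == 0.0:  # all-zero (or underflowing) group: fall back to 1.0
            scale = 1.0
        scales.append(scale)
        # Round to the nearest integer and clamp to the INT4 range.
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales


def dequantize_int4(q, scales, group_size=4):
    """Recover approximate FP16 weights from INT4 codes and per-group scales."""
    return [to_fp16(v * scales[i // group_size]) for i, v in enumerate(q)]


def quantize_model(layers, sensitive, group_size=4):
    """Quantize every layer to INT4 except those named in `sensitive`."""
    out = {}
    for name, weights in layers.items():
        if name in sensitive:
            # Sensitive layers skip quantization and stay in (simulated) FP16.
            out[name] = {"dtype": "fp16", "weights": [to_fp16(w) for w in weights]}
        else:
            q, scales = quantize_int4(weights, group_size)
            out[name] = {"dtype": "int4", "q": q, "scales": scales}
    return out


# Hypothetical two-layer "model": one ordinary layer, one marked as sensitive.
layers = {
    "mlp.fc1": [0.7, -0.35, 0.1, 0.0],
    "lm_head": [0.5, -0.25, 0.125, 1.0],
}
model = quantize_model(layers, sensitive={"lm_head"})
for name, entry in model.items():
    print(name, entry)
```

Activations never appear in the sketch because they are left untouched: at inference time the INT4 weights would be dequantized with `dequantize_int4` and multiplied against FP16 activations. How the sensitive layers are chosen is outside the scope of this sketch.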