Fix the example onnx inference code.
The previous code uses attention_mask=np.ones_like(input_ids), which restricts the attention mask to shape (1, 1) during the decoding stage. This causes the model to get stuck in a loop, repeatedly outputting <fake_token_around_image><row_1_col_1>... after generating the word "the".
It should be updated to np.ones_like(np.concatenate((attention_mask, input_ids), axis=-1)) instead, so that the mask covers the cached positions as well and the KV cache length can grow properly as generation proceeds.

With the fix, decoding no longer loops; the example image now yields a coherent description: "The image depicts a large, historic statue of Liberty situated on a small island in a body of water. The statue is a green, cylindrical structure with a human figure at the top, which is the actual statue of Liberty. The statue is mounted on a pedestal that is supported by a cylindrical tower."
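A minimal sketch of the shape difference, using only NumPy (the concrete cache length and token id below are made up for illustration; the actual values come from the ONNX runtime session during decoding):

```python
import numpy as np

# Hypothetical decode step: assume the prompt so far covers 5 positions,
# so attention_mask is (1, 5), and the new token input_ids is (1, 1).
attention_mask = np.ones((1, 5), dtype=np.int64)
input_ids = np.array([[42]], dtype=np.int64)

# Buggy mask: its shape follows input_ids only, so it stays (1, 1)
# at every decode step and never accounts for the KV cache.
buggy_mask = np.ones_like(input_ids)

# Fixed mask: cover the cached positions plus the new token, (1, 6),
# so the effective mask length grows by one each step.
fixed_mask = np.ones_like(np.concatenate((attention_mask, input_ids), axis=-1))

print(buggy_mask.shape)  # (1, 1)
print(fixed_mask.shape)  # (1, 6)
```

Feeding fixed_mask back as the next step's attention_mask keeps the mask in sync with the growing KV cache.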