How to prevent repetitions?

#6
by BingoBird - opened

On llama.cpp (current build)

I tried Q4_K_M:
llama-server -m ./Marco-Mini-Instruct.Q4_K_M.gguf --no-webui --ctx-checkpoints 4 -b 512 -ub 1024 --ctx-size 16834 --cache-type-k q8_0 --cache-type-v q8_0 -t 4 --parallel 1 --spec-type ngram-mod --spec-ngram-mod-n-match 40 --spec-ngram-mod-n-min 0 --spec-ngram-mod-n-max 16 -fa on --no-mmproj --fit on --no-warmup

and Q6_K:
/usr/local/bin/llama-server -m /media/sda3/Models/Marco-Nano-Instruct.i1-Q6_K.gguf --no-webui --ctx-checkpoints 4 -b 512 -ub 1024 --ctx-size 16834 --cache-type-k q8_0 --cache-type-v q8_0 -t 4 --parallel 1 --spec-type ngram-mod --spec-ngram-mod-n-match 40 --spec-ngram-mod-n-min 0 --spec-ngram-mod-n-max 16 -fa on --no-mmproj --fit on --no-warmup

Very often I get repetition loops, for example:

    # Use a more robust method: use a color histogram and threshold
    # For simplicity, use a threshold on the L channel
    # But we need to use a color space that is perceptually uniform
    # Use a more robust method: use a color histogram and threshold
    # For simplicity, use a threshold on the L channel
    # But we need to use a color space that is perceptually uniform
    # Use a more robust method: use a color histogram and threshold
    # For simplicity, use a threshold on the L channel
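
One client-side workaround, independent of the sampler settings, is to watch the output as it streams and cancel (or retry) once its tail starts repeating. Below is a minimal sketch, assuming the loop repeats whole lines as in the output above. `find_line_loop` and its thresholds are hypothetical helpers, not part of llama.cpp or llama-server:

```python
def find_line_loop(text: str, min_repeats: int = 3, max_period: int = 10) -> int:
    """Return the shortest loop period (in lines) at the end of `text`, or 0.

    A loop of period p means the last p * min_repeats non-empty lines are one
    block of p lines repeated min_repeats times back to back.
    Leading/trailing whitespace on each line is ignored.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for period in range(1, max_period + 1):
        needed = period * min_repeats
        if len(lines) < needed:
            break  # longer periods need even more lines, so stop here
        tail = lines[-needed:]
        block = tail[:period]
        if all(tail[i] == block[i % period] for i in range(needed)):
            return period
    return 0


a = "# Use a more robust method: use a color histogram and threshold"
b = "# For simplicity, use a threshold on the L channel"
c = "# But we need to use a color space that is perceptually uniform"
sample = "\n".join([a, b, c, a, b, c, a, b])
print(find_line_loop(sample, min_repeats=2))  # prints 3
```

In a streaming client, one could call this on the accumulated text every few chunks and abort the request as soon as it returns a non-zero period. The thresholds trade false positives (legitimately repeated lines) against how much looping output is produced before the cutoff.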

I use these settings:

parameters="$(jq -nR '{
        temperature: 1,
        top_p: 1,
        top_k: 1,
        min_p: 0.0,
        repeat_penalty: 1.0,
        presence_penalty: 0.5,
        jinja: true,
        flash_attn: true,
        cont_batching: true,
        repeat_last_n: 0,
        penalize_nl: false,
        n_predict: -1,
        stream: true,
        cache_prompt: true,
        thinking_budget_tokens: 240,
}')"
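
To see how these settings interact, here is a simplified Python sketch of repetition penalties and top-k filtering. It assumes logits are given as a dict from token id to score. It is modeled on my reading of llama.cpp's penalty sampler, where the penalties only look at the last `repeat_last_n` tokens and `repeat_last_n = 0` disables them. It is not a copy of that code, and the function names are hypothetical. If that reading is right, `presence_penalty: 0.5` has no effect together with `repeat_last_n: 0`. Also, `top_k: 1` keeps a single candidate, which amounts to greedy decoding, so `temperature: 1` no longer matters.

```python
from collections import Counter


def apply_penalties(logits, history, last_n, repeat_penalty=1.0,
                    presence_penalty=0.0, frequency_penalty=0.0):
    """Return a penalized copy of `logits` (dict: token_id -> logit).

    Only the last `last_n` tokens of `history` count; last_n == 0 disables
    all penalties and last_n < 0 uses the whole history.
    """
    out = dict(logits)
    if last_n == 0:
        return out  # penalty window is empty: nothing is penalized
    window = history if last_n < 0 else history[-last_n:]
    for tok, count in Counter(window).items():
        if tok not in out:
            continue
        logit = out[tok]
        # Multiplicative repeat penalty, always pushing toward "less likely".
        logit = logit * repeat_penalty if logit <= 0 else logit / repeat_penalty
        # Additive penalties: frequency per occurrence, presence once.
        logit -= count * frequency_penalty + presence_penalty
        out[tok] = logit
    return out


def top_k_filter(logits, k):
    """Keep only the k highest-scoring tokens (k <= 0 keeps everything)."""
    if k <= 0:
        return dict(logits)
    kept = sorted(logits, key=logits.get, reverse=True)[:k]
    return {tok: logits[tok] for tok in kept}


logits = {0: 2.0, 1: 1.9, 2: 0.5}
history = [0, 0, 1]
print(apply_penalties(logits, history, last_n=0, presence_penalty=0.5) == logits)  # True
print(top_k_filter(logits, 1))  # {0: 2.0}
```

With `top_k` at 1, whatever token is currently the most likely wins every time. Once the model has produced a few lines, re-emitting them is often the most likely continuation, and nothing in this configuration pushes back against it.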

Is there a way to avoid the loops?

Thanks for the interesting model.

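I am now testing with fewer parameters, leaving temperature, top_p, top_k, min_p, repeat_penalty and presence_penalty at their defaults, and I seem to be having much better success: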

parameters="$(jq -nR '{
        jinja: true,
        flash_attn: true,
        cont_batching: true,
        repeat_last_n: 0,
        penalize_nl: false,
        n_predict: -1,
        stream: true,
        cache_prompt: true,
        thinking_budget_tokens: 240,
}')"

Performance on a Ryzen 5 3500U laptop (+ Vulkan on the Vega 8 iGPU) is approximately 44 t/s for prompt processing (pp) and 13.5 t/s for token generation (tg).
It is one of the smartest models to run at 10+ t/s on this laptop. Congratulations.
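
As a rough back-of-the-envelope check of what these rates mean in wall-clock time, here is a small sketch. It assumes pp and tg are tokens per second and stay constant; in practice generation usually slows as the context fills. `estimate_seconds` is a hypothetical helper:

```python
def estimate_seconds(prompt_tokens, gen_tokens, pp_tps=44.0, tg_tps=13.5):
    """Prompt-processing time plus generation time, at constant rates."""
    return prompt_tokens / pp_tps + gen_tokens / tg_tps


# 240 generated tokens (the thinking_budget_tokens value used above)
print(f"{estimate_seconds(0, 240):.1f}")     # 17.8
# a 1000-token prompt followed by those 240 tokens
print(f"{estimate_seconds(1000, 240):.1f}")  # 40.5
```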
