Could you check whether SLMs (smaller language models, e.g. <80B, <48B, <36B, <20B parameters) also show this metacognitive ability?
Please duplicate this Space:
https://huggingface.co/spaces/aiqtech/final-bench-Proprietary
and modify it so it runs with the SLM model path you want to test.
If you are not sure how to do this, just duplicate (clone) the Space first, then upload its app.py file to Claude, Gemini, or ChatGPT. In your prompt, say which model you want to use and ask it to update the code so you can run the test. It should handle this without trouble.