How were these models lowered?

Opened by saiskandadotin

Hi Team, I'm impressed by how much faster your models run in the Private Mind app compared to some Android examples from the react-native examples repo on GitHub.

I downloaded your model and ran it in the example app, and I still see slower speeds.

So I'm just curious: how exactly did you lower these models, what tooling did you use, and did you make any optimisations during lowering to work well with the Private Mind app / react-native-executorch library?

Would love to know more

Thanks

Software Mansion org

Hi! Inside Private Mind we use the same models as those available in our Hugging Face repo; there are no special tricks under the hood. My best guess is that you are testing a debug build, while Private Mind is a release build.

@kopcion - Thanks. I meant: what process was followed to export the model to the .pte file? If you could share your script / config, that would be helpful.

Software Mansion org

For LLMs we followed the scripts from the ExecuTorch GitHub repo, for example this one for Qwen3: https://github.com/pytorch/executorch/tree/main/examples/models/qwen3. Hope this helps :)
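For context, those scripts are wrappers around the standard ExecuTorch lowering pipeline. Below is a minimal sketch of that generic flow, assuming a toy module; the module, file name, and the XNNPACK delegation step are illustrative and not the exact configuration used for these models (the real LLM export scripts add KV-cache handling, quantization, and other options on top).

```python
# Minimal sketch of the generic ExecuTorch lowering flow:
# torch.export -> edge dialect -> (optional) backend delegation -> .pte
# TinyModel and "tiny_model.pte" are placeholders for illustration only.
import torch
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = TinyModel().eval()
example_inputs = (torch.randn(1, 16),)

# 1. Capture the model as an exported graph.
exported_program = torch.export.export(model, example_inputs)

# 2. Lower to the ExecuTorch edge dialect.
edge_program = to_edge(exported_program)

# 3. Delegate supported subgraphs to XNNPACK for fast CPU inference on mobile
#    (assumption: this is the backend used for on-device inference here).
edge_program = edge_program.to_backend(XnnpackPartitioner())

# 4. Serialize to a .pte file that the ExecuTorch runtime loads.
et_program = edge_program.to_executorch()
with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)
```

Running this produces a small .pte file that the ExecuTorch runtime can load; for actual LLMs, stick to the linked export scripts, since they configure quantization and the attention/KV-cache export correctly.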
