How were these models lowered?
Hi team, I'm impressed by how much faster your models run in the Private Mind app compared to some of the Android examples from the react-native-examples repo on GitHub.
I downloaded your model and ran it in the example app, and it is still noticeably slower.
So I'm curious: how exactly did you lower these models, what tooling did you use, and did you make any optimisations during lowering to work well with the Private Mind app / react-native-executorch library?
Would love to know more.
Thanks!
Hi! Inside Private Mind we use the same models as those available in our Hugging Face repo; there are no special tricks under the hood. My best guess is that you are testing on a debug build, while Private Mind is a release build. Debug builds of React Native apps carry extra development overhead, so inference looks much slower there; always benchmark on a release build.
@kopcion Thanks! I meant: what process was followed to export the models to .pte? If you could share your script/config, that would be helpful.
For LLMs we followed the scripts from the ExecuTorch GitHub repo, for example this one for Qwen3: https://github.com/pytorch/executorch/tree/main/examples/models/qwen3. Hope this helps :)
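In case it helps others landing on this thread: those scripts build on the standard ExecuTorch lowering path. Below is a minimal sketch of that general flow, assuming the `executorch` pip package with the XNNPACK backend is installed; `TinyModel` and the output file name are placeholders, and the Qwen3 script linked above layers LLM-specific options (quantization, KV cache, etc.) on top of this same idea.

```python
# Minimal sketch of the generic ExecuTorch lowering flow (not the exact
# Private Mind script). TinyModel and "tiny_model.pte" are placeholders.
import torch
from torch.export import export
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner


class TinyModel(torch.nn.Module):
    """Toy stand-in for whatever model you want to lower."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = TinyModel().eval()
sample_inputs = (torch.randn(1, 16),)

# Capture the graph with torch.export, convert it to the Edge dialect, and
# delegate supported subgraphs to the XNNPACK backend for fast CPU inference.
et_program = to_edge_transform_and_lower(
    export(model, sample_inputs),
    partitioner=[XnnpackPartitioner()],
).to_executorch()

# Serialize the lowered program to a .pte file, the format that
# react-native-executorch loads on device.
with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)
```

On device the app just loads the resulting .pte; consistent with the answer above, there is nothing app-specific beyond exporting with a suitable delegate such as XNNPACK.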