How were these models lowered?
Hi team, I'm impressed by how much faster your models run in the Private Mind app compared to some of the Android examples from the react-native-examples repo on GitHub.
I downloaded your model and ran it in the example app, and it is still noticeably slower.
So I'm curious: how exactly did you lower these models, what tooling did you use, and did you make any optimisations during lowering to work well with the Private Mind app / react-native-executorch library?
Would love to know more.
Thanks!
Hi! Inside Private Mind we use the same models as those available in our Hugging Face repo; there are no special tricks under the hood. My best guess is that you are testing on a debug build, while Private Mind is a release build. Debug builds of React Native apps carry extra development overhead, so inference looks much slower there; always benchmark on a release build.
@kopcion Thanks! I meant: what process was followed to export the models to .pte? If you could share your script/config, that would be helpful.
For LLMs we followed the scripts from the ExecuTorch GitHub repo, for example this one for Qwen3: https://github.com/pytorch/executorch/tree/main/examples/models/qwen3. Hope this helps :)
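In case it helps others landing on this thread: those scripts build on the standard ExecuTorch lowering path. Below is a minimal sketch of that general flow, assuming the `executorch` pip package with the XNNPACK backend is installed; `TinyModel` and the output file name are placeholders, and the Qwen3 script linked above layers LLM-specific options (quantization, KV cache, etc.) on top of this same idea.

```python
# Minimal sketch of the generic ExecuTorch lowering flow (not the exact
# Private Mind script). TinyModel and "tiny_model.pte" are placeholders.
import torch
from torch.export import export
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner


class TinyModel(torch.nn.Module):
    """Toy stand-in for whatever model you want to lower."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = TinyModel().eval()
sample_inputs = (torch.randn(1, 16),)

# Capture the graph with torch.export, convert it to the Edge dialect, and
# delegate supported subgraphs to the XNNPACK backend for fast CPU inference.
et_program = to_edge_transform_and_lower(
    export(model, sample_inputs),
    partitioner=[XnnpackPartitioner()],
).to_executorch()

# Serialize the lowered program to a .pte file, the format that
# react-native-executorch loads on device.
with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)
```

On device the app just loads the resulting .pte; consistent with the answer above, there is nothing app-specific beyond exporting with a suitable delegate such as XNNPACK.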