---
license: mit
language:
- en
- zh
base_model:
- OFA-Sys/chinese-clip-vit-large-patch14-336px
- AXERA-TECH/cnclip
tags:
- CLIP
- CN_CLIP
pipeline_tag: zero-shot-image-classification
---

# LibCLIP

This SDK enables efficient text-to-image retrieval using CLIP (Contrastive Language-Image Pretraining), optimized for Axera's NPU-based SoC platforms, including AX650, AX650C, AX8850, and AX650A, as well as Axera's dedicated AI accelerator cards.

With this SDK, you can:

- Perform semantic image search from natural language queries.
- Use CLIP to embed text queries and compare them against a pre-computed set of image embeddings.
- Run all inference directly on Axera NPUs for low-latency, high-throughput performance at the edge.

This solution is well suited for smart cameras, content filtering, AI-powered user interfaces, and other edge AI scenarios that require natural-language image retrieval.
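
The retrieval described above is a standard embedding-search loop: L2-normalize the text and image features, then rank images by dot product (cosine similarity). A minimal NumPy sketch with random stand-in features; the real 768-d vectors would come from the CLIP encoders, not this toy data:

```python
import numpy as np

# Stand-in for a pre-computed image feature database: 4 images x 768 dims.
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(4, 768))
image_feats /= np.linalg.norm(image_feats, axis=1, keepdims=True)

# Stand-in for a text-encoder output: a noisy copy of image 2,
# so image 2 should rank first.
text_feat = image_feats[2] + 0.01 * rng.normal(size=768)
text_feat /= np.linalg.norm(text_feat)

# On L2-normalized vectors, cosine similarity is just a dot product.
scores = image_feats @ text_feat
top_k = np.argsort(-scores)[:3]  # indices of the best-matching images
print(top_k)
```

The same ranking scales to thousands of images because the database side is computed once and reused for every query.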

## Reference links

For those interested in model conversion, you can export an axmodel via:

- [libclip open-source repository on GitHub](https://github.com/AXERA-TECH/libclip.axera)
- [Pulsar2 documentation: how to convert ONNX to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/pulsar2/introduction.html)
- [AXERA-TECH/cnclip on Hugging Face](https://huggingface.co/AXERA-TECH/cnclip)

## Support Platform

- AX650
  - [M4N-Dock (爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)

## Performance

| Model | Input Shape | Latency (ms) | CMM Usage (MB) |
| --- | --- | --- | --- |
| cnclip_vit_l14_336px_vision_u16u8.axmodel | 1 x 3 x 336 x 336 | 88.475 | 304 |
| cnclip_vit_l14_336px_text_u16.axmodel | 1 x 52 | 4.576 | 122 |
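
For rough capacity planning, the measured latencies invert into single-batch throughput (NPU time only; image decoding and pre/post-processing are extra):

```python
# Latencies from the table above, in milliseconds.
vision_ms = 88.475  # cnclip_vit_l14_336px_vision_u16u8.axmodel
text_ms = 4.576     # cnclip_vit_l14_336px_text_u16.axmodel

vision_ips = 1000.0 / vision_ms  # images encoded per second
text_qps = 1000.0 / text_ms      # text queries encoded per second
print(f"vision: {vision_ips:.1f} img/s, text: {text_qps:.1f} queries/s")
```

In other words, roughly 11 images or around 218 text queries per second at batch size 1; indexing an image folder is the expensive part, while interactive text queries stay fast.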

## How to use

Download all files from this repository to the device:

```
(base) axera@raspberrypi:~/samples/AXERA-TECH/libclip.axera $ tree -L 2
.
├── cnclip
│   ├── cnclip_vit_l14_336px_text_u16.axmodel
│   ├── cnclip_vit_l14_336px_vision_u16u8.axmodel
│   └── cn_vocab.txt
├── coco_1000.tar
├── config.json
├── gradio_01.png
├── install
│   ├── examples
│   ├── include
│   └── lib
├── pyclip
│   ├── example.py
│   ├── gradio_example.png
│   ├── gradio_example.py
│   ├── libclip.so
│   ├── __pycache__
│   ├── pyclip.py
│   └── requirements.txt
└── README.md

8 directories, 13 files
```

### Python env requirement

```
pip install -r pyclip/requirements.txt
```

#### Inference with AX650 host, such as M4N-Dock (爱芯派Pro)

```
root@ax650:~/sample/LibClip# cp ./install/lib/host_650/libclip.so ./pyclip/
root@ax650:~/sample/LibClip# tar -xf coco_1000.tar
root@ax650:~/sample/LibClip# python3 pyclip/gradio_example.py --ienc cnclip/cnclip_vit_l14_336px_vision_u16u8.axmodel --tenc cnclip/cnclip_vit_l14_336px_text_u16.axmodel --vocab cnclip/cn_vocab.txt --isCN 1 --db_path clip_feat_db_coco --image_folder coco_1000/ --dev_type host
Trying to load: /root/sample/LibClip/pyclip/aarch64/libclip.so

❌ Failed to load: /root/sample/LibClip/pyclip/aarch64/libclip.so
/root/sample/LibClip/pyclip/aarch64/libclip.so: cannot open shared object file: No such file or directory
File not found. Please verify that libclip.so exists and the path is correct.

Trying to load: /root/sample/LibClip/pyclip/libclip.so
open libaxcl_rt.so failed
unsupport axcl
✅ Successfully loaded: /root/sample/LibClip/pyclip/libclip.so
sh: line 1: axcl-smi: command not found
可用设备: {'host': {'available': True, 'version': 'V3.6.2_20250731140456', 'mem_info': {'remain': 9963, 'total': 10240}}, 'devices': {'host_version': '', 'dev_version': '', 'count': 0, 'devices_info': []}}

input size: 1
name: image [unknown] [unknown]
1 x 3 x 336 x 336

output size: 1
name: unnorm_image_features
1 x 768

[I][ load_image_encoder][ 50]: nchw 336 336
[I][ load_image_encoder][ 60]: image feature len 768

input size: 1
name: text [unknown] [unknown]
1 x 52

output size: 1
name: unnorm_text_features
1 x 768

[I][ load_text_encoder][ 44]: text feature len 768
[I][ load_tokenizer][ 60]: text token len 52
100%|████████████████████████████████████████| 1000/1000 [01:43<00:00, 9.70it/s]
* Running on local URL: http://0.0.0.0:7860
```

If the IP address of your M4N-Dock (爱芯派Pro) is 192.168.1.100, open `http://192.168.1.100:7860` in a browser to use the web app.

![](gradio_01.png)
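
In the logs above, both encoders output unnormalized 768-d features (`unnorm_image_features`, `unnorm_text_features`). CLIP-style zero-shot classification normalizes them, scales the cosine similarity by the model's learned logit scale (typically around 100), and applies a softmax over the candidate labels. A sketch of that post-processing with random stand-in features, not the SDK's actual API:

```python
import numpy as np

# Stand-in features: one image and three candidate text labels, 768-d each.
rng = np.random.default_rng(1)
image_feat = rng.normal(size=768)
label_feats = np.stack([
    image_feat + rng.normal(size=768),  # label 0 correlates with the image
    rng.normal(size=768),
    rng.normal(size=768),
])

# L2-normalize the raw ("unnorm") features.
image_feat /= np.linalg.norm(image_feat)
label_feats /= np.linalg.norm(label_feats, axis=1, keepdims=True)

# Scaled cosine similarities -> numerically stable softmax over labels.
logits = 100.0 * (label_feats @ image_feat)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.argmax())
```

The large logit scale sharpens the distribution, so the best-matching label dominates the probabilities.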

#### Inference with M.2 Accelerator card

[What is an M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo runs on a Raspberry Pi 5.

```
(py312) axera@raspberrypi:~/samples/AXERA-TECH/libclip.axera $ export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libstdc++.so.6
(py312) axera@raspberrypi:~/samples/AXERA-TECH/libclip.axera $ cp install/lib/aarch64/libclip.so pyclip/
(py312) axera@raspberrypi:~/samples/AXERA-TECH/libclip.axera $ tar xf coco_1000.tar
(py312) axera@raspberrypi:~/samples/AXERA-TECH/libclip.axera $ python pyclip/gradio_example.py --ienc cnclip/cnclip_vit_l14_336px_vision_u16u8.axmodel --tenc cnclip/cnclip_vit_l14_336px_text_u16.axmodel --vocab cnclip/cn_vocab.txt --isCN 1 --db_path clip_feat_db_coco --image_folder coco_1000/ --dev_type axcl
Trying to load: /home/axera/samples/AXERA-TECH/libclip.axera/pyclip/aarch64/libclip.so

❌ Failed to load: /home/axera/samples/AXERA-TECH/libclip.axera/pyclip/aarch64/libclip.so
/home/axera/samples/AXERA-TECH/libclip.axera/pyclip/aarch64/libclip.so: cannot open shared object file: No such file or directory
File not found. Please verify that libclip.so exists and the path is correct.

Trying to load: /home/axera/samples/AXERA-TECH/libclip.axera/pyclip/libclip.so
open libax_sys.so failed
open libax_engine.so failed
✅ Successfully loaded: /home/axera/samples/AXERA-TECH/libclip.axera/pyclip/libclip.so
可用设备: {'host': {'available': True, 'version': '', 'mem_info': {'remain': 0, 'total': 0}}, 'devices': {'host_version': 'V3.6.2_20250603154858', 'dev_version': 'V3.6.2_20250603154858', 'count': 1, 'devices_info': [{'temp': 37, 'cpu_usage': 1, 'npu_usage': 0, 'mem_info': {'remain': 7022, 'total': 7040}}]}}
[I][ run][ 31]: AXCLWorker start with devid 0

input size: 1
name: image [unknown] [unknown]
1 x 3 x 336 x 336

output size: 1
name: unnorm_image_features
1 x 768

[I][ load_image_encoder][ 50]: nchw 336 336
[I][ load_image_encoder][ 60]: image feature len 768

input size: 1
name: text [unknown] [unknown]
1 x 52

output size: 1
name: unnorm_text_features
1 x 768

[I][ load_text_encoder][ 44]: text feature len 768
[I][ load_tokenizer][ 60]: text token len 52
100%|████████████████████████████████████████| 1000/1000 [01:40<00:00, 9.93it/s]
* Running on local URL: http://0.0.0.0:7860
```

If the IP address of your Raspberry Pi 5 is 192.168.1.100, open `http://192.168.1.100:7860` in a browser to use the web app.

![](pyclip/gradio_example.png)