H. Aldhaheri
aenawi
AI & ML interests
LLMs, Agents
Recent Activity
Updated a collection 3 days ago: Speech-To-Text
Updated a collection 5 months ago: Token-Classification
Updated a collection 6 months ago: Models-Support-Arabic
Organizations
None yet
Text2Image LLMs
LLMs
Papers - Researches
Arabic Datasets
Embedding Models
- WhereIsAI/UAE-Large-V1
  Feature Extraction • Updated • 1.97M • 237
- intfloat/multilingual-e5-large
  Feature Extraction • 0.6B • Updated • 5.06M • 1.16k
- sentence-transformers/distiluse-base-multilingual-cased-v1
  Sentence Similarity • 0.1B • Updated • 1.04M • 129
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  Sentence Similarity • 0.1B • Updated • 25.6M • 1.17k
Datasets
- ahmedheakl/resume-atlas
  Viewer • Updated • 13.4k • 264 • 10
- FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
  Paper • 2506.20920 • Published • 77
- Infinite Dataset Hub
  Running • 282 • Search and save datasets generated with a LLM in real time
- IntrEx: A Dataset for Modeling Engagement in Educational Conversations
  Paper • 2509.06652 • Published • 26
Train-On-Datasets
Cybersecurity Models
DeepResearch Models
Translation-Models
- tencent/Hunyuan-MT-7B
  Translation • 8B • Updated • 3.86k • 553
- tencent/Hunyuan-MT-Chimera-7B
  Translation • 8B • Updated • 591 • 90
- swiss-ai/Apertus-8B-Instruct-2509
  Text Generation • Updated • 138k • 442
- Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale
  Paper • 2509.14008 • Published • 89
Speech-To-Text
Models-Support-Arabic
Speech-to-Speech
Token-Classification
- hatmimoha/arabic-ner
  Token Classification • 0.1B • Updated • 892k • 22
- Ammar-alhaj-ali/arabic-MARBERT-poetry-classification
  Text Classification • Updated • 952 • 3
- CAMeL-Lab/bert-base-arabic-camelbert-mix-ner
  Token Classification • Updated • 54.4k • 15
- SinaLab/ArabicNER-Wojood
  Token Classification • Updated • 29 • 10
Neo4j-Cypher
Coding
Animation