Shen-Pandi committed
Commit 2676d41 Β· verified Β· 1 Parent(s): cca5b74

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +43 -62
README.md CHANGED
@@ -1,8 +1,9 @@
  ---
  language:
  - en
- license: llama3.1
- base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  tags:
  - data-management
  - data-migration
@@ -39,11 +40,11 @@ model-index:

  # πŸš€ Agentic Data 1

- ### The First Open-Source LLM Purpose-Built for Data Operations

  **SQL Migration β€’ Schema Analysis β€’ Data Quality β€’ ETL Design β€’ Performance Tuning**

- [![License](https://img.shields.io/badge/License-Llama_3.1-blue.svg)](https://llama.meta.com/llama3/license/)
  [![Model Size](https://img.shields.io/badge/Parameters-8B-green.svg)]()
  [![Training](https://img.shields.io/badge/Training-SFT_+_GRPO-orange.svg)]()
  [![HuggingFace](https://img.shields.io/badge/πŸ€—-DataManagement--AI-yellow.svg)](https://huggingface.co/DataManagement-AI)
@@ -56,7 +57,7 @@ model-index:

  ## 🎯 What is Agentic Data 1?

- Agentic Data 1 is the **first open-source language model specifically designed for data management and migration tasks**. While general-purpose LLMs like GPT-4 or Claude treat data operations as just another coding task, Agentic Data 1 understands the unique challenges of enterprise data ecosystems β€” from legacy Oracle databases to modern cloud data warehouses.

  Built on DeepSeek-R1-Distill-Llama-8B and enhanced through a rigorous two-stage training pipeline (Supervised Fine-Tuning + GRPO Reinforcement Learning), it delivers **specialist-grade performance** at a fraction of the cost of frontier models.

@@ -69,7 +70,7 @@ Built on DeepSeek-R1-Distill-Llama-8B and enhanced through a rigorous two-stage
  | Data quality rules | Surface-level checks | **Comprehensive quality framework** (duplicates, PII, referential integrity) |
  | ETL pipeline design | Abstract descriptions | **Practical, implementable pipelines** with error handling and rollback |
  | Query performance tuning | Basic index suggestions | **Multi-strategy optimization** (partitioning, materialized views, query rewriting) |
- | Cost to operate | $3-30 per million tokens | **Near-zero** (self-hosted inference) |

  ---
 
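The "comprehensive quality framework" row in the table above refers to checks such as duplicate detection and referential integrity. A minimal stdlib-only sketch of those two checks (the tables and helper names here are hypothetical illustrations, not part of the model card):

```python
# Minimal sketch of two data-quality checks the comparison table mentions:
# duplicate primary keys and referential-integrity (orphaned foreign key) violations.
# The example tables below are made up for illustration.
from collections import Counter

employees = [
    {"id": 1, "dept_id": 10},
    {"id": 2, "dept_id": 20},
    {"id": 2, "dept_id": 30},  # duplicate primary key
    {"id": 3, "dept_id": 99},  # foreign key with no matching department
]
departments = [{"id": 10}, {"id": 20}, {"id": 30}]

def duplicate_keys(rows, key):
    # Key values that appear more than once violate uniqueness.
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

def orphaned_refs(rows, fk, parent_rows, pk):
    # Foreign-key values with no matching parent primary key.
    parents = {row[pk] for row in parent_rows}
    return sorted({row[fk] for row in rows} - parents)

print(duplicate_keys(employees, "id"))                         # [2]
print(orphaned_refs(employees, "dept_id", departments, "id"))  # [99]
```

The same two set-based checks translate directly to SQL (`GROUP BY ... HAVING COUNT(*) > 1` and an anti-join) on real tables.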
@@ -258,81 +259,61 @@ Design event-driven data pipelines with:

  ---

- ## ⚑ Quick Start

- ### Basic Usage

- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model = AutoModelForCausalLM.from_pretrained(
-     "DataManagement-AI/Agentic-Data-1",
-     device_map="auto",
-     torch_dtype="auto",
- )
- tokenizer = AutoTokenizer.from_pretrained("DataManagement-AI/Agentic-Data-1")
-
- prompt = "Convert this Oracle SQL to PostgreSQL: SELECT NVL(salary, 0) FROM employees WHERE ROWNUM <= 10;"
-
- messages = [{"role": "user", "content": prompt}]
- input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
- inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
-
- outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7)
- print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
- ```
-
- ### 4-Bit Quantized (Recommended for Production)

  ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
- import torch

- bnb_config = BitsAndBytesConfig(
-     load_in_4bit=True,
-     bnb_4bit_quant_type="nf4",
-     bnb_4bit_compute_dtype=torch.bfloat16,
  )

- model = AutoModelForCausalLM.from_pretrained(
-     "DataManagement-AI/Agentic-Data-1",
-     quantization_config=bnb_config,
-     device_map="auto",
  )
- tokenizer = AutoTokenizer.from_pretrained("DataManagement-AI/Agentic-Data-1")
  ```

- ### With vLLM (High-Throughput API Server)

- ```bash
- pip install vllm
- vllm serve DataManagement-AI/Agentic-Data-1 --dtype auto --max-model-len 4096
- ```

- ```python
- from openai import OpenAI

- client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
- response = client.chat.completions.create(
-     model="DataManagement-AI/Agentic-Data-1",
-     messages=[{"role": "user", "content": "Convert Oracle NVL to PostgreSQL equivalent"}],
- )
- ```

  ---

  ## πŸ’° Cost Comparison

- Running your own Agentic Data 1 vs using commercial LLM APIs:

- | Model | Input $/M tokens | Output $/M tokens | Monthly Cost (100 active users) |
  |---|---|---|---|
- | GPT-4 Turbo | $10.00 | $30.00 | **$11,500** |
- | Claude Sonnet 3.5 | $3.00 | $15.00 | **$1,015** |
- | Claude Haiku | $0.25 | $1.25 | **$440** |
- | **Agentic Data 1** (self-hosted) | **~$0.003** | **~$0.003** | **$330** (GPU only) |

- > **99.7% cost reduction** vs GPT-4 Turbo. **67% reduction** vs Claude Haiku. With better domain performance.

  ---
 
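The sample prompt in the removed Quick Start asks for an Oracle-to-PostgreSQL rewrite. The expected answer can be sanity-checked without loading the model; a hedged sketch handling only the two constructs that appear in that prompt (`NVL` maps to `COALESCE`, a sole `ROWNUM <= n` predicate maps to `LIMIT n`):

```python
import re

def oracle_to_postgres(sql: str) -> str:
    # NVL(a, b) and COALESCE(a, b) are equivalent for two arguments.
    sql = re.sub(r"\bNVL\s*\(", "COALESCE(", sql, flags=re.IGNORECASE)
    # "WHERE ROWNUM <= n" caps the row count; in PostgreSQL that is "LIMIT n".
    # Only safe when ROWNUM is the sole predicate, as in the sample prompt.
    sql = re.sub(r"\s+WHERE\s+ROWNUM\s*<=\s*(\d+)\s*;?", r" LIMIT \1;", sql,
                 flags=re.IGNORECASE)
    return sql

print(oracle_to_postgres(
    "SELECT NVL(salary, 0) FROM employees WHERE ROWNUM <= 10;"
))
# SELECT COALESCE(salary, 0) FROM employees LIMIT 10;
```

A real migration needs far more than regex (ROWNUM in combined predicates, ordering semantics), which is the gap the model itself is pitched at; this sketch only fixes what a correct answer to the sample prompt should contain.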
@@ -366,8 +347,8 @@ Agentic Data 1 powers the AI backbone of the [DataManagement.AI](https://dataman
  | **Base Model** | DeepSeek-R1-Distill-Llama-8B |
  | **Training Method** | SFT + GRPO (500 steps, NVIDIA H100) |
  | **Precision** | BFloat16 |
- | **License** | Llama 3.1 Community License |
- | **Model Size** | ~16 GB (FP16) / ~4 GB (4-bit quantized) |

  ---
 
  ---
  language:
  - en
+ license: other
+ license_name: datamanagement-ai-commercial
+ license_link: https://www.datamanagement.ai/contact-us
  tags:
  - data-management
  - data-migration
 

  # πŸš€ Agentic Data 1

+ ### The First Specialized Language Model Purpose-Built for Data Operations

  **SQL Migration β€’ Schema Analysis β€’ Data Quality β€’ ETL Design β€’ Performance Tuning**

+ [![License](https://img.shields.io/badge/License-Commercial-blue.svg)](https://www.datamanagement.ai/contact-us)
  [![Model Size](https://img.shields.io/badge/Parameters-8B-green.svg)]()
  [![Training](https://img.shields.io/badge/Training-SFT_+_GRPO-orange.svg)]()
  [![HuggingFace](https://img.shields.io/badge/πŸ€—-DataManagement--AI-yellow.svg)](https://huggingface.co/DataManagement-AI)
 

  ## 🎯 What is Agentic Data 1?

+ Agentic Data 1 is the **first specialized language model designed exclusively for data management and migration tasks**. While general-purpose LLMs like GPT-4 or Claude treat data operations as just another coding task, Agentic Data 1 understands the unique challenges of enterprise data ecosystems β€” from legacy Oracle databases to modern cloud data warehouses.

  Built on DeepSeek-R1-Distill-Llama-8B and enhanced through a rigorous two-stage training pipeline (Supervised Fine-Tuning + GRPO Reinforcement Learning), it delivers **specialist-grade performance** at a fraction of the cost of frontier models.

 
  | Data quality rules | Surface-level checks | **Comprehensive quality framework** (duplicates, PII, referential integrity) |
  | ETL pipeline design | Abstract descriptions | **Practical, implementable pipelines** with error handling and rollback |
  | Query performance tuning | Basic index suggestions | **Multi-strategy optimization** (partitioning, materialized views, query rewriting) |
+ | Cost to operate | $3-30 per million tokens | **Up to 90% lower** via DataManagement.AI API |

  ---
 

  ---

+ ## ⚑ Get Access

+ Agentic Data 1 is available through the **DataManagement.AI platform** and as a **dedicated API** for enterprise teams.

+ ### API Access

  ```python
+ from openai import OpenAI

+ # Use the Agentic Data 1 API (OpenAI-compatible)
+ client = OpenAI(
+     base_url="https://api.datamanagement.ai/v1",
+     api_key="your-api-key",
  )

+ response = client.chat.completions.create(
+     model="agentic-data-1",
+     messages=[{
+         "role": "user",
+         "content": "Convert this Oracle SQL to PostgreSQL: SELECT NVL(salary, 0) FROM employees WHERE ROWNUM <= 10;"
+     }],
  )
+ print(response.choices[0].message.content)
  ```

+ ### Deployment Options

+ | Option | Description | Best For |
+ |---|---|---|
+ | **Platform** | Use within DataManagement.AI workflows | Teams using our full platform |
+ | **API** | OpenAI-compatible REST API | Developers integrating into existing apps |
+ | **Dedicated** | Private instance on your infrastructure | Enterprise with data residency requirements |

+ <div align="center">

+ ### πŸ“¬ Ready to Get Started?
+
+ [**Request API Access**](https://www.datamanagement.ai/contact-us) β€’ [**Start Free Trial**](https://dmaife.datamanagement.ai/signup) β€’ [**Schedule a Demo**](https://www.datamanagement.ai/contact-us)
+
+ </div>

  ---

  ## πŸ’° Cost Comparison

+ Agentic Data 1 delivers **specialist-grade performance at a fraction of the cost** of general-purpose frontier models:

+ | Model | Input $/M tokens | Output $/M tokens | Data Domain Accuracy |
  |---|---|---|---|
+ | GPT-4 Turbo | $10.00 | $30.00 | General purpose |
+ | Claude Sonnet 3.5 | $3.00 | $15.00 | General purpose |
+ | Claude Haiku | $0.25 | $1.25 | General purpose |
+ | **Agentic Data 1** | **$0.50** | **$2.50** | **Domain-specialized** |

+ > **83% cheaper than Claude Sonnet** with **better performance on data tasks**. Purpose-built beats general-purpose.

  ---
 
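The "83% cheaper than Claude Sonnet" figure in the new Cost Comparison follows directly from the table's per-million-token prices; a quick check of the arithmetic:

```python
# Per-million-token prices taken from the new Cost Comparison table.
sonnet_in, sonnet_out = 3.00, 15.00    # Claude Sonnet 3.5
agentic_in, agentic_out = 0.50, 2.50   # Agentic Data 1

# The saving is identical on both sides: 1 - 0.50/3.00 = 1 - 2.50/15.00.
input_saving = 1 - agentic_in / sonnet_in
output_saving = 1 - agentic_out / sonnet_out

print(f"{input_saving:.0%}")   # 83%
print(f"{output_saving:.0%}")  # 83%
```

So the rounded 83% holds for both input and output pricing regardless of the input/output token mix.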
  | **Base Model** | DeepSeek-R1-Distill-Llama-8B |
  | **Training Method** | SFT + GRPO (500 steps, NVIDIA H100) |
  | **Precision** | BFloat16 |
+ | **License** | DataManagement-AI Commercial License |
+ | **Access** | API / Platform / Dedicated Deployment |

  ---