Upload README.md with huggingface_hub
README.md
CHANGED
@@ -1,8 +1,9 @@
 ---
 language:
 - en
-license:
-
 tags:
 - data-management
 - data-migration
@@ -39,11 +40,11 @@ model-index:
 
 # Agentic Data 1
 
-### The First
 
 **SQL Migration • Schema Analysis • Data Quality • ETL Design • Performance Tuning**
 
-[]()
 []()
 [](https://huggingface.co/DataManagement-AI)
@@ -56,7 +57,7 @@ model-index:
 
 ## What is Agentic Data 1?
 
-Agentic Data 1 is the **first
 
 Built on DeepSeek-R1-Distill-Llama-8B and enhanced through a rigorous two-stage training pipeline (Supervised Fine-Tuning + GRPO Reinforcement Learning), it delivers **specialist-grade performance** at a fraction of the cost of frontier models.
 
@@ -69,7 +70,7 @@ Built on DeepSeek-R1-Distill-Llama-8B and enhanced through a rigorous two-stage
 | Data quality rules | Surface-level checks | **Comprehensive quality framework** (duplicates, PII, referential integrity) |
 | ETL pipeline design | Abstract descriptions | **Practical, implementable pipelines** with error handling and rollback |
 | Query performance tuning | Basic index suggestions | **Multi-strategy optimization** (partitioning, materialized views, query rewriting) |
-| Cost to operate | $3-30 per million tokens | **
 
 ---
 
@@ -258,81 +259,61 @@ Design event-driven data pipelines with:
 
 ---
 
-##
 
-
 
-
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-model = AutoModelForCausalLM.from_pretrained(
-    "DataManagement-AI/Agentic-Data-1",
-    device_map="auto",
-    torch_dtype="auto",
-)
-tokenizer = AutoTokenizer.from_pretrained("DataManagement-AI/Agentic-Data-1")
-
-prompt = "Convert this Oracle SQL to PostgreSQL: SELECT NVL(salary, 0) FROM employees WHERE ROWNUM <= 10;"
-
-messages = [{"role": "user", "content": prompt}]
-input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
-
-outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7)
-print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
-```
-
-### 4-Bit Quantized (Recommended for Production)
 
 ```python
-from
-import torch
 
-
-
-
-
 )
 
-
-"
-
-
 )
-
 ```
 
-###
 
-
-
-
-
 
-
-from openai import OpenAI
 
-
-
-
-
-
-```
 
 ---
 
 ## Cost Comparison
 
-
 
-| Model | Input $/M tokens | Output $/M tokens |
 |---|---|---|---|
-| GPT-4 Turbo | $10.00 | $30.00 |
-| Claude Sonnet 3.5 | $3.00 | $15.00 |
-| Claude Haiku | $0.25 | $1.25 |
-| **Agentic Data 1**
 
-> **
 
 ---
 
@@ -366,8 +347,8 @@ Agentic Data 1 powers the AI backbone of the [DataManagement.AI](https://dataman
 | **Base Model** | DeepSeek-R1-Distill-Llama-8B |
 | **Training Method** | SFT + GRPO (500 steps, NVIDIA H100) |
 | **Precision** | BFloat16 |
-| **License** |
-| **
 
 ---
 
 ---
 language:
 - en
+license: other
+license_name: datamanagement-ai-commercial
+license_link: https://www.datamanagement.ai/contact-us
 tags:
 - data-management
 - data-migration
 
 # Agentic Data 1
 
+### The First Specialized Language Model Purpose-Built for Data Operations
 
 **SQL Migration • Schema Analysis • Data Quality • ETL Design • Performance Tuning**
 
+[](https://www.datamanagement.ai/contact-us)
 []()
 []()
 [](https://huggingface.co/DataManagement-AI)
 
 ## What is Agentic Data 1?
 
+Agentic Data 1 is the **first specialized language model designed exclusively for data management and migration tasks**. While general-purpose LLMs like GPT-4 or Claude treat data operations as just another coding task, Agentic Data 1 understands the unique challenges of enterprise data ecosystems, from legacy Oracle databases to modern cloud data warehouses.
 
 Built on DeepSeek-R1-Distill-Llama-8B and enhanced through a rigorous two-stage training pipeline (Supervised Fine-Tuning + GRPO Reinforcement Learning), it delivers **specialist-grade performance** at a fraction of the cost of frontier models.
 
 | Data quality rules | Surface-level checks | **Comprehensive quality framework** (duplicates, PII, referential integrity) |
 | ETL pipeline design | Abstract descriptions | **Practical, implementable pipelines** with error handling and rollback |
 | Query performance tuning | Basic index suggestions | **Multi-strategy optimization** (partitioning, materialized views, query rewriting) |
+| Cost to operate | $3-30 per million tokens | **Up to 90% lower** via DataManagement.AI API |
 
 ---
 
 
 ---
 
+## Get Access
 
+Agentic Data 1 is available through the **DataManagement.AI platform** and as a **dedicated API** for enterprise teams.
 
+### API Access
 
 ```python
+from openai import OpenAI
 
+# Use the Agentic Data 1 API (OpenAI-compatible)
+client = OpenAI(
+    base_url="https://api.datamanagement.ai/v1",
+    api_key="your-api-key",
 )
 
+response = client.chat.completions.create(
+    model="agentic-data-1",
+    messages=[{
+        "role": "user",
+        "content": "Convert this Oracle SQL to PostgreSQL: SELECT NVL(salary, 0) FROM employees WHERE ROWNUM <= 10;"
+    }],
 )
+print(response.choices[0].message.content)
 ```
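A hand-written reference translation is useful when checking what the API returns for the sample prompt. In PostgreSQL, Oracle's `NVL(salary, 0)` becomes `COALESCE(salary, 0)`, and the `ROWNUM <= 10` filter becomes `LIMIT 10`. The minimal sketch below runs that reference query against an in-memory SQLite database. SQLite is only a stand-in for PostgreSQL here; it is used because both treat `COALESCE` and `LIMIT` the same way for this query. The `employees` rows are made-up sample data, and `run_reference_query` is a hypothetical helper, not part of any DataManagement.AI API.

```python
import sqlite3

# Hand-written reference translation of the sample Oracle query:
#   NVL(salary, 0)  ->  COALESCE(salary, 0)
#   ROWNUM <= 10    ->  LIMIT 10
REFERENCE_QUERY = "SELECT COALESCE(salary, 0) FROM employees LIMIT 10;"


def run_reference_query() -> list:
    """Run REFERENCE_QUERY on a throwaway in-memory table (hypothetical helper).

    SQLite is only a stand-in for PostgreSQL: COALESCE and LIMIT behave the
    same in both for this query. The rows inserted below are made-up data.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER)")
        # 15 sample employees; every third one has a NULL salary.
        rows = [(i, None if i % 3 == 0 else 1000 * i) for i in range(1, 16)]
        conn.executemany("INSERT INTO employees (id, salary) VALUES (?, ?)", rows)
        # Each result row is a 1-tuple holding the COALESCEd salary.
        return [salary for (salary,) in conn.execute(REFERENCE_QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_reference_query())
```

Neither the Oracle original nor the translation has an `ORDER BY`, so which ten rows come back is unspecified in both databases. Add one if the result must be deterministic.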
 
+### Deployment Options
 
+| Option | Description | Best For |
+|---|---|---|
+| **Platform** | Use within DataManagement.AI workflows | Teams using our full platform |
+| **API** | OpenAI-compatible REST API | Developers integrating into existing apps |
+| **Dedicated** | Private instance on your infrastructure | Enterprises with data residency requirements |
 
+<div align="center">
 
+### Ready to Get Started?
+
+[**Request API Access**](https://www.datamanagement.ai/contact-us) • [**Start Free Trial**](https://dmaife.datamanagement.ai/signup) • [**Schedule a Demo**](https://www.datamanagement.ai/contact-us)
+
+</div>
 
 ---
 
 ## Cost Comparison
 
+Agentic Data 1 delivers **specialist-grade performance at a fraction of the cost** of general-purpose frontier models:
 
+| Model | Input $/M tokens | Output $/M tokens | Specialization |
 |---|---|---|---|
+| GPT-4 Turbo | $10.00 | $30.00 | General purpose |
+| Claude 3.5 Sonnet | $3.00 | $15.00 | General purpose |
+| Claude Haiku | $0.25 | $1.25 | General purpose |
+| **Agentic Data 1** | **$0.50** | **$2.50** | **Domain-specialized** |
 
+> **83% cheaper than Claude Sonnet** with **better performance on data tasks**. Purpose-built beats general-purpose.
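The 83% figure can be checked directly from the table's list prices. The minimal sketch below assumes simple linear per-token pricing with no volume discounts. `workload_cost` and `savings_vs` are hypothetical helpers, and the 20M-input / 5M-output token workload is an arbitrary example.

```python
# List prices in USD per million tokens (input, output), from the Cost Comparison table.
PRICES = {
    "GPT-4 Turbo": (10.00, 30.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude Haiku": (0.25, 1.25),
    "Agentic Data 1": (0.50, 2.50),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a workload, assuming linear per-token list pricing."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def savings_vs(baseline: str, input_tokens: int, output_tokens: int) -> float:
    """Percent saved by Agentic Data 1 versus `baseline` (negative means it costs more)."""
    base = workload_cost(baseline, input_tokens, output_tokens)
    ours = workload_cost("Agentic Data 1", input_tokens, output_tokens)
    return 100.0 * (1.0 - ours / base)


if __name__ == "__main__":
    # Arbitrary example workload: 20M input tokens, 5M output tokens.
    for name in PRICES:
        if name != "Agentic Data 1":
            print(f"{name}: {savings_vs(name, 20_000_000, 5_000_000):+.1f}%")
```

Both Agentic Data 1 prices are exactly one sixth of the listed Claude 3.5 Sonnet prices, so the saving works out to 83.3% for any input/output mix. Against the listed Claude Haiku prices, the same calculation comes out at -100%, meaning twice the cost. The savings claim therefore holds against the larger models in the table but not against Haiku.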
 
 ---
 
 | **Base Model** | DeepSeek-R1-Distill-Llama-8B |
 | **Training Method** | SFT + GRPO (500 steps, NVIDIA H100) |
 | **Precision** | BFloat16 |
+| **License** | DataManagement-AI Commercial License |
+| **Access** | API / Platform / Dedicated Deployment |
 
 ---
 
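For the Dedicated deployment option, the model details above (DeepSeek-R1-Distill-Llama-8B, BFloat16) are enough for a rough estimate of the GPU memory needed for the weights alone. This back-of-envelope sketch assumes about 8.03 billion parameters, the size of the Llama 3.1 8B architecture the base model uses. It ignores the KV cache, activations, and runtime overhead, all of which add to the real footprint. The int8 and 4-bit entries are shown only for comparison; they are not published variants of this model.

```python
# Back-of-envelope estimate of weight memory only (no KV cache,
# activations, or framework overhead).
# Assumption: ~8.03e9 parameters (Llama 3.1 8B architecture).
PARAMS = 8.03e9

# Storage cost per parameter. BFloat16 is the listed precision;
# int8 and 4-bit are included only for comparison.
BYTES_PER_PARAM = {"bfloat16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gib(dtype: str, params: float = PARAMS) -> float:
    """Approximate size of the model weights in GiB for a given storage type."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gib(dtype):5.1f} GiB")
```

At BFloat16 this comes to just under 15 GiB (about 16 GB in decimal units) before any runtime overhead.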