github-code

Activity Feed

nick007x
posted an update 5 months ago
👋 Hey everyone!

The response to my first datasets has been insane - thank you! 🚀

Your support made these go viral, and they're still trending on the Hugging Face datasets homepage:

🏆 Proven Performers:
- GitHub Code 2025 (12k+ downloads, 83+ likes) - Top 10 on HF Datasets
- ArXiv Papers (8k+ downloads, 51+ likes) - Top 20 on HF Datasets

Now I'm expanding from scientific papers and code into hardware, maker culture, and engineering wisdom with three new domain-specific datasets:

🔥 New Datasets Dropped

1. Phoronix Articles
- What is Phoronix? The definitive source for Linux, open-source, and hardware performance journalism since 2004. For more info visit: https://www.phoronix.com/
- Dataset contains: articles with full text, metadata, and comment counts
- Want a Linux & hardware news AI? Train models on 50K+ articles tracking 20 years of tech evolution

🔗 Link: nick007x/phoronix-articles

2. Hackaday Posts
- What is Hackaday? The epicenter of maker culture - DIY projects, hardware hacks, and engineering creativity. For more info visit: https://hackaday.com/
- Dataset contains: articles with nested comment threads and engagement metrics
- Want a maker community AI? Build assistants that understand electronics projects, 3D printing, and hardware innovation

🔗 Link: nick007x/hackaday-posts

3. EEVblog Posts
- What is EEVblog? The largest electronics engineering forum - a popular online platform and YouTube channel for electronics enthusiasts, hobbyists, and engineers. For more info visit: https://www.eevblog.com/forum/
- Dataset contains: forum posts with author expertise levels and technical discussions
- Want an electronics expert? Train AI mentors that explain circuits, troubleshoot designs, and guide hardware projects

🔗 Link: nick007x/eevblog-posts

nick007x
posted an update 5 months ago
👋 Hey, I have just uploaded 2 new datasets for code and scientific reasoning models:

1. ArXiv Papers (4.6 TB): A massive scientific corpus with papers and metadata across all domains. Perfect for training models on academic reasoning, literature review, and scientific knowledge mining. 🔗 Link: nick007x/arxiv-papers

2. GitHub Code 2025 (1 TB): A comprehensive code dataset for code generation and analysis tasks. It mostly contains code from GitHub's top 1 million high-quality repos with more than 2 stars. 🔗 Link: nick007x/github-code-2025
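
At 4.6 TB and 1 TB, these two corpora are probably too large to download in full just to take a look, so here is a minimal sketch of peeking at a few records in streaming mode with the Hugging Face `datasets` library. The repo IDs are the ones linked in these posts; the `"train"` split name and the helper names `hub_url` and `peek` are assumptions for illustration, not something the posts confirm, and the record fields are not documented here, so check the dataset card before relying on any column.

```python
from itertools import islice

# Repo IDs exactly as linked in the posts.
DATASET_IDS = [
    "nick007x/github-code-2025",
    "nick007x/arxiv-papers",
    "nick007x/phoronix-articles",
    "nick007x/hackaday-posts",
    "nick007x/eevblog-posts",
]


def hub_url(repo_id: str) -> str:
    """Return the dataset page URL on the Hugging Face Hub for a repo ID."""
    return f"https://huggingface.co/datasets/{repo_id}"


def peek(repo_id: str, n: int = 3, split: str = "train") -> list:
    """Fetch the first n records without downloading the whole dataset.

    Assumption: the dataset exposes a split called "train"; pass another
    split name if the dataset card says otherwise.
    """
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset  # pip install datasets

    # streaming=True yields records lazily instead of materializing files on disk.
    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek("nick007x/github-code-2025")` needs network access and returns a list of dicts; printing `records[0].keys()` is a quick way to discover the actual column names before writing any training code.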