Kris Bailey PRO

krisbailey
  • myfykris
  • krisbailey

AI & ML interests

quantization, optimization, novel model architectures, model architecture research and development, dataset construction, apple silicon optimizations

Recent Activity

posted an update 1 day ago
Across various projects I kept wanting smaller, representative samples of some of the current large SOTA datasets, so that I didn't need to worry about slicing or anything else at runtime. So I created sub-datasets, making sure to keep the same ratios of data sources as the originals. Each dataset card describes what's in it.

100M token datasets:
  • RedPajama v2 100M
  • Falcon RefinedWeb 100M
  • Cosmopedia 100M

1B token datasets:
  • Fineweb-edu 1B
  • RedPajama v1 1B
  • RedPajama v2 1B (use this one)
  • Cosmopedia 1B

10B token datasets:
  • RedPajama v1 10B
  • Cosmopedia 10B

Collection here: https://huggingface.co/collections/krisbailey/bite-size-data
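
For readers wondering what "keeping the same ratios of data sources" can look like in practice, here is a minimal sketch. It is not necessarily how these datasets were built; the dataset cards are the authority on that. The source names, token counts, and the `source` / `n_tokens` fields are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def proportional_budgets(source_tokens, target_tokens):
    """Split a target token budget across sources in proportion to their original sizes."""
    total = sum(source_tokens.values())
    # Integer division rounds each share down, so the budgets never exceed the target.
    return {name: target_tokens * count // total for name, count in source_tokens.items()}


def take_subset(docs, budgets):
    """Keep documents from each source until that source's token budget is used up."""
    used = defaultdict(int)
    kept = []
    for doc in docs:
        src, n = doc["source"], doc["n_tokens"]
        if used[src] + n <= budgets.get(src, 0):
            used[src] += n
            kept.append(doc)
    return kept


# Hypothetical per-source token counts for a full dataset (assumption, not real figures).
source_tokens = {"web": 800, "code": 150, "books": 50}
budgets = proportional_budgets(source_tokens, target_tokens=100)

# Hypothetical documents, each tagged with its source and token count.
docs = (
    [{"source": "web", "n_tokens": 40} for _ in range(3)]
    + [{"source": "code", "n_tokens": 10} for _ in range(2)]
    + [{"source": "books", "n_tokens": 5}]
)
subset = take_subset(docs, budgets)

tokens_by_source = defaultdict(int)
for doc in subset:
    tokens_by_source[doc["source"]] += doc["n_tokens"]
```

Because whole documents are kept or dropped, each source can land slightly under its budget; a real pipeline would typically shuffle first and might truncate the last document to hit the budget exactly.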
updated a dataset 24 days ago
krisbailey/fineweb-edu-1B
updated a dataset 24 days ago
krisbailey/RedPajama-Data-V2-1B

Organizations

None yet

krisbailey's models

None public yet