AI & ML interests

A one-year-long research workshop on large language models: the Summer of Language Models 21 🌸

Recent Activity

[SPAM] Deleted

3
#289 opened 4 days ago by
sarthak-saxena
christopher 
in bigscience/bloom 11 days ago

pretokenizer Regex issues?

8
#278 opened over 1 year ago by
hpcpony
christopher 
in bigscience/bloom 16 days ago

Test PR

#286 opened 16 days ago by
FIRSTACCOUNT69

Test discussion

#287 opened 16 days ago by
FIRSTACCOUNT69

Test discussion

#288 opened 16 days ago by
FIRSTACCOUNT69
monsoon-nlp 
posted an update 4 months ago

Bloom

#2 opened 4 months ago by
Raz-Test
monsoon-nlp 
posted an update 6 months ago
Bio LLMs train on many genomes, but can we encode differences within a species? TomatoTomato adds pangenome tokens to represent a domestic tomato and a wild tomato in one sequence 🍅 🧬
monsoon-nlp/tomatotomato-gLM2-150M-v0.1