---
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- PolyCom
- PolyNorm
- PolyReLU
---

# Introduction

This repository contains the checkpoints of the ICLR 2025 paper **[Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models](https://arxiv.org/pdf/2411.03884)**.
In this work, we introduce a novel activation function, **Polynomial Composition (PolyCom)**, which enhances the expressiveness of large language models (LLMs) through dynamic polynomial compositions. Our method significantly improves the performance of both dense and mixture-of-experts (MoE) models across a variety of downstream tasks without adding significant computational overhead.
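
For intuition, below is a minimal PyTorch sketch of a PolyNorm-style activation: a learnable combination of normalized element-wise powers of the input. The order, coefficient initialization, and normalization details here are illustrative assumptions; please refer to the paper and [the source code](https://github.com/BryceZhuo/PolyCom) for the exact definitions of PolyReLU and PolyNorm.

```python
import torch
import torch.nn as nn


class PolyNormSketch(nn.Module):
    """Illustrative PolyNorm-style activation (not the official implementation).

    Combines normalized element-wise powers of the input with learnable
    coefficients. Order, initialization, and the RMS-style normalization
    over the hidden dimension are assumptions for this sketch.
    """

    def __init__(self, order: int = 3, eps: float = 1e-6):
        super().__init__()
        self.order = order
        self.eps = eps
        # Learnable mixing coefficients for each power term, plus a bias term.
        self.weights = nn.Parameter(torch.full((order,), 1.0 / order))
        self.bias = nn.Parameter(torch.zeros(1))

    def _norm(self, x: torch.Tensor) -> torch.Tensor:
        # RMS-style normalization over the hidden dimension (assumption).
        return x / torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.bias
        for i in range(1, self.order + 1):
            out = out + self.weights[i - 1] * self._norm(x.pow(i))
        return out
```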

# Datasets and Training

We use the [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) dataset and pretrain the PolyCom model on 250B tokens. For more training details, please refer to [the source code](https://github.com/BryceZhuo/PolyCom).
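
As a rough sketch, the corpus can be inspected in streaming mode with the `datasets` library, avoiding a full download. The configuration name and field names below are assumptions; check the dataset card for the exact options.

```python
from datasets import load_dataset

# Stream RedPajama-Data-1T instead of downloading the full corpus.
# The "default" config name is an assumption; see the dataset card.
dataset = load_dataset(
    "togethercomputer/RedPajama-Data-1T",
    "default",
    streaming=True,
    trust_remote_code=True,
)

# Peek at a few raw text examples from the training split.
for i, example in enumerate(dataset["train"]):
    print(example["text"][:200])
    if i >= 2:
        break
```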

# Inference

Here is an example of how to use the PolyCom model for inference:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace with the local checkpoint directory or the Hub repo id of this model.
path_of_model = "path/to/PolyCom-checkpoint"

model = AutoModelForCausalLM.from_pretrained(path_of_model, device_map="cuda", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(path_of_model, padding_side="right", trust_remote_code=True)

prompt = "Hello, my name is"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("cuda")

# Greedy decoding with the default generation settings.
greedy_output = model.generate(input_ids)
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
```

# Citing this work

If you find this work helpful or use it in your research, please consider citing our paper:
```bibtex
@inproceedings{zhuo2025polycom,
  title={Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models},
  author={Zhijian Zhuo and Ya Wang and Yutao Zeng and Xiaoqing Li and Xun Zhou and Jinwen Ma},
  booktitle={ICLR 2025},
  year={2025}
}
```