Abstract
Research examines direct modeling of integer-labeled data using discrete probability distributions with continuous parameters suitable for neural network training, evaluating Bitwise and discrete Laplace distributions against traditional regression approaches.
We study the problem of predicting numeric labels that are constrained to the integers or to a subrange of the integers. For example, the number of up-votes on social media posts, or the number of bicycles available at a public rental station. While it is possible to model these as continuous values, and to apply traditional regression, this approach changes the underlying distribution on the labels from discrete to continuous. Discrete distributions have certain benefits, which leads us to the question whether such integer labels can be modeled directly by a discrete distribution, whose parameters are predicted from the features of a given instance. Moreover, we focus on the use case of output distributions of neural networks, which adds the requirement that the parameters of the distribution be continuous so that backpropagation and gradient descent may be used to learn the weights of the network. We investigate several options for such distributions, some existing and some novel, and test them on a range of tasks, including tabular learning, sequential prediction and image generation. We find that overall the best performance comes from two distributions: Bitwise, which represents the target integer in bits and places a Bernoulli distribution on each, and a discrete analogue of the Laplace distribution, which uses a distribution with exponentially decaying tails around a continuous mean.
Community
Output distributions for discrete, ordinal predictions.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Beyond the Mean: Distribution-Aware Loss Functions for Bimodal Regression (2026)
- Sequential Regression for Continuous Value Prediction using Residual Quantization (2026)
- Eliciting Numerical Predictive Distributions of LLMs Without Autoregression (2026)
- Beyond Mixtures and Products for Ensemble Aggregation: A Likelihood Perspective on Generalized Means (2026)
- Model Agreement via Anchoring (2026)
- Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification (2026)
- From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
Get this paper in your agent:
hf papers read 2602.10751 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper