Redistributor: Transforming Empirical Data Distributions

Pavol Harar, Dennis Elbrächter, Monika Dörfler, Kory D. Johnson

Description:

We present an algorithm and package, Redistributor, which forces a collection of scalar samples to follow a desired distribution. Given independent and identically distributed samples of a random variable S and the continuous cumulative distribution function of a desired target T, it provably produces a consistent estimator of the transformation R that satisfies R(S)=T in distribution, see Figure 1. Because the distribution of S or T may be unknown, we also include algorithms for efficiently estimating these distributions from samples. This enables a range of use cases in image processing, where Redistributor serves as a simple, easy-to-use tool capable of producing visually appealing results. The package is implemented in Python and optimized to handle large datasets efficiently, which also makes it suitable as a preprocessing step in machine learning, see Figure 6. In addition to the package and its algorithmic details, we present theoretical results supporting the use of the method, as well as a broad array of examples in which transforming distributions is beneficial. In the context of image processing, we can correct color issues in photography, see Figure 2, producing far better results than other automated procedures. We also consider further uses of matching the color distribution of an image to that of a reference image, such as translating it into a preferred color scheme for aesthetic purposes, see Figure 3, creating photographic mosaics, see Figure 4, and performing image data augmentation in color space, see Figure 5. Lastly, we discuss the use of Redistributor in signal processing and as a preprocessing step within a machine learning pipeline. The source code is available at https://gitlab.com/paloha/redistributor.
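To make the idea of a transformation R with R(S)=T in distribution concrete, the following is a minimal sketch, not the Redistributor package's API. It assumes the standard construction R = F_T^{-1} ∘ F_S, with F_S replaced by an interpolated empirical CDF of the source samples. The function name `estimate_transform`, the mid-rank probabilities, and the exponential source distribution are illustrative assumptions of ours.

```python
import numpy as np
from statistics import NormalDist


def estimate_transform(source_samples, target_inv_cdf):
    """Return a callable approximating R = F_T^{-1} o F_S (hypothetical helper)."""
    xs = np.sort(np.asarray(source_samples, dtype=float))
    n = xs.size
    # Mid-rank probabilities stay strictly inside (0, 1), so the target's
    # inverse CDF is always finite.
    probs = (np.arange(1, n + 1) - 0.5) / n

    def transform(values):
        values = np.asarray(values, dtype=float)
        # Interpolated empirical CDF of S; values outside the observed range
        # are clamped to the smallest or largest probability.
        u = np.interp(values.ravel(), xs, probs)
        # Push the probabilities through the inverse CDF of the target T.
        out = np.array([target_inv_cdf(float(p)) for p in u])
        return out.reshape(values.shape)

    return transform


# Assumed example: exponential source samples, standard Gaussian target.
rng = np.random.default_rng(0)
source = rng.exponential(scale=2.0, size=10_000)
to_gaussian = estimate_transform(source, NormalDist(0.0, 1.0).inv_cdf)
redistributed = to_gaussian(source)
```

The estimated transformation is monotone non-decreasing, so it preserves the ordering of the input values while changing their distribution.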

[1] Harar P, Elbrächter D, Dörfler M, Johnson KD. Redistributor: Transforming Empirical Data Distributions. arXiv preprint arXiv:2210.14219.

[2] Faridul HS, Pouli T, Chamaret C, Stauder J, Reinhard E, Kuzovkin D, Trémeau A. Colour mapping: A review of recent methods, extensions and applications. In Computer Graphics Forum 2016 Feb (Vol. 35, No. 1, pp. 59-88).

[3] Luan F, Paris S, Shechtman E, Bala K. Deep photo style transfer. In Proceedings of the IEEE conference on computer vision and pattern recognition 2017 (pp. 4990-4998).

[4] Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. Journal of Big Data. 2019 Dec;6(1):1-48.

[5] Box GE, Cox DR. An analysis of transformations. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1964 Jul;26(2):211-43.

Figure 1: Applying the estimated transformation R (b) to a sample from a Double Gamma distribution (a) in order to force the values to follow a Gaussian distribution (c). Subfigures (a) and (c) display density histograms.

Figure 2: Correcting exposure using a reference image.

Figure 3: Matching colors of a reference image.

Figure 4: Mosaic effect achieved by tiling redistributed images.

Figure 5: Augmenting image data in color space using distributions of other images within the same batch. The original images are on the diagonal and are highlighted using a red border.
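
The reference-image use cases in Figures 2 to 5 rely on both distributions being estimated from samples. The sketch below illustrates that idea with plain NumPy by matching each color channel of one image to the corresponding channel of a reference image. It is an assumption-laden illustration, not the package's implementation: the names `match_distribution` and `match_colors` are hypothetical, the images are synthetic floating-point arrays, and real 8-bit images contain many tied values that would need more careful handling.

```python
import numpy as np


def match_distribution(source, reference):
    """Map `source` values so their distribution follows that of `reference`."""
    source = np.asarray(source, dtype=float)
    src_sorted = np.sort(source.ravel())
    ref_sorted = np.sort(np.asarray(reference, dtype=float).ravel())
    # Mid-rank probabilities attached to the sorted samples of each array.
    src_probs = (np.arange(1, src_sorted.size + 1) - 0.5) / src_sorted.size
    ref_probs = (np.arange(1, ref_sorted.size + 1) - 0.5) / ref_sorted.size
    # Empirical CDF of the source, evaluated at every source value.
    u = np.interp(source.ravel(), src_sorted, src_probs)
    # Empirical quantile function of the reference, evaluated at those probabilities.
    return np.interp(u, ref_probs, ref_sorted).reshape(source.shape)


def match_colors(image, reference_image):
    """Apply the matching independently to each channel of an H x W x C image."""
    channels = [
        match_distribution(image[..., c], reference_image[..., c])
        for c in range(image.shape[-1])
    ]
    return np.stack(channels, axis=-1)


# Synthetic stand-ins for an underexposed image and a brighter reference.
rng = np.random.default_rng(1)
dark = rng.beta(2.0, 8.0, size=(32, 32, 3))
bright = rng.beta(8.0, 2.0, size=(48, 48, 3))
matched = match_colors(dark, bright)
```

The two images do not need to have the same size, since only the per-channel value distributions of the reference are used.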