home:Ledest:erlang:24
File 2967-Write-a-section-about-range-capping.patch of Package erlang
From 76686e648a85ba0e0795c33ef18dd8534d2bf7da Mon Sep 17 00:00:00 2001 From: Raimo Niskanen <raimo@erlang.org> Date: Wed, 11 May 2022 14:29:28 +0200 Subject: [PATCH 7/8] Write a section about range capping Describe different approaches for how to generate numbers in a range related to the Niche algorithms API, and point to that from the algorithm descriptions. --- lib/stdlib/doc/src/rand.xml | 225 +++++++++++++++++++++++++++++++++--- 1 file changed, 209 insertions(+), 16 deletions(-) diff --git a/lib/stdlib/doc/src/rand.xml b/lib/stdlib/doc/src/rand.xml index 8b9b924366..471a23f6b9 100644 --- a/lib/stdlib/doc/src/rand.xml +++ b/lib/stdlib/doc/src/rand.xml @@ -806,6 +806,175 @@ end.</pre> <fsdescription> <marker id="niche_algorithms"/> <title>Niche algorithms API</title> + <p> + This section contains special-purpose algorithms + that do not use the + <seeerl marker="#plug_in_api">plug-in framework API</seeerl>, + for example for speed reasons. + </p> + <p> + Since these algorithms lack the plug-in framework support, + generating numbers in a range other than the + generator's own generated range may become a problem. + </p> + <p> + There are at least 4 ways to do this, assuming that + the range is less than the generator's range: + </p> + <taglist> + <tag>Modulo</tag> + <item> + <p> + To generate a number <c>V</c> in the range 0..<c>Range</c>-1: + </p> + <list type="bulleted"> + <item>Generate a number <c>X</c>.</item> + <item> + Use <c>V = X rem Range</c> as your value. + </item> + </list> + <p> + This method uses <c>rem</c>, that is, the remainder of + an integer division, which is a slow operation. + </p> + <p> + Low bits from the generator propagate straight through + to the generated value, so if the generator has got + weaknesses in the low bits this method propagates + them too. + </p> + <p> + If <c>Range</c> is not a divisor of the generator range, + the generated numbers have a bias.
+ Example: + </p> + <p> + Say the generator generates a byte, that is, + the generator range is 0..255, + and the desired range is 0..99 (<c>Range=100</c>). + Then there are 3 generator outputs that produce the value 0, + namely 0, 100 and 200. But there are only + 2 generator outputs that produce the value 99, + namely 99 and 199. So the probability for + a value <c>V</c> in 0..55 is 3/2 times + the probability for the other values 56..99. + </p> + <p> + If <c>Range</c> is much smaller than the generator range, + then this bias gets hard to detect. The rule of thumb is + that if <c>Range</c> is smaller than the square root + of the generator range, the bias is small enough. + Example: + </p> + <p> + Take a byte generator and <c>Range=20</c>. + There are 12 (<c>256 div 20</c>) ways to generate + each of the high values 16..19, + and one more, 13, to generate each value + <c>V</c> < 16 (<c>256 rem 20</c>). + So a low value is 13/12 times as likely + as a high one. To detect that difference + with some confidence you would need to generate + a lot more numbers than the generator range, + 256 in this small example. + </p> + </item> + <tag>Truncated multiplication</tag> + <item> + <p> + To generate a number <c>V</c> in the range 0..<c>Range</c>-1, + when you have a generator with the range + 0..2^<c>Bits</c>-1: + </p> + <list type="bulleted"> + <item>Generate a number <c>X</c>.</item> + <item> + Use <c>V = X*Range bsr Bits</c> + as your value. + </item> + </list> + <p> + If the multiplication <c>X*Range</c> creates a bignum + this method becomes very slow. + </p> + <p> + High bits from the generator propagate through + to the generated value, so if the generator has got + weaknesses in the high bits this method propagates + them too. + </p> + <p> + If <c>Range</c> is not a divisor of the generator range, + the generated numbers have a bias, + pretty much as for the <em>Modulo</em> method above.
+ </p> + </item> + <tag>Shift or mask</tag> + <item> + <p> + To generate a number in the range 0..2^<c>RBits</c>-1, + when you have a generator with the range 0..2^<c>Bits</c>-1: + </p> + <list type="bulleted"> + <item>Generate a number <c>X</c>.</item> + <item> + Use <c>V = X band ((1 bsl RBits)-1)</c> + or <c>V = X bsr (Bits-RBits)</c> + as your value. + </item> + </list> + <p> + Masking with <c>band</c> preserves the low bits, + and right shifting with <c>bsr</c> preserves the high, + so if the generator has got weaknesses in high or low + bits, choose the operator that discards the weak ones. + </p> + <p> + If the generator has got a range that is not a power of 2 + and this method is used anyway, it introduces bias + in the same way as for the <em>Modulo</em> method above. + </p> + </item> + <tag>Rejection</tag> + <item> + <list type="bulleted"> + <item>Generate a number <c>X</c>.</item> + <item> + If <c>X</c> is in the range, use <c>V = X</c> + as your value, otherwise reject it and repeat. + </item> + </list> + <p> + In theory it is not certain that this method + will ever complete, but in practice you make sure + that the probability of rejection is low. + Then the probability for yet another iteration + decreases exponentially so the expected mean + number of iterations will often be between 1 and 2. + Also, since the base generator is a full-length generator, + a value that will break the loop must eventually + be generated. + </p> + </item> + </taglist> + <p> + These methods can be combined, for example by using the <em>Modulo</em> + method and, only when the generator value would create bias, + resorting to <em>Rejection</em>; or by using <em>Shift or mask</em> + to reduce the size of a generator value so that + <em>Truncated multiplication</em> will not create a bignum.
+ </p> + <p> + The recommended way to generate a floating point number + (IEEE 754 double, which has a 53-bit mantissa) + in the range 0..1, that is + 0.0 =< <c>V</c> < 1.0 + is to generate a 53-bit number <c>X</c> and then use + <c>V = X * (1.0/(1 bsl 53))</c> + as your value. This will create a value of the form + <c>N</c>*2^-53 with equal probability for every + possible <c>N</c> in the range. + </p> </fsdescription> <func> <name name="splitmix64_next" arity="1" since="OTP 25.0"/> @@ -861,6 +1030,11 @@ end.</pre> on a selected range, nor in generating a floating point number. It is easy to accidentally mess up the fairly good statistical properties of this generator when doing either. + See the recipes at the start of this + <seeerl marker="#niche_algorithms"> + Niche algorithms API + </seeerl> + description. Note also the caveat about weak low bits that this generator suffers from. The generator is exported in this form @@ -917,8 +1091,8 @@ end.</pre> the generator state. </p> <p> - To create an output value, the quality improves much - if the state is scrambled. + The quality of the output value improves considerably by using + a scrambler instead of just taking the low bits. Function <seemfa marker="#mwc59_value32/1"> <c>mwc59_value32</c> @@ -934,12 +1108,17 @@ end.</pre> </p> <p> The low bits of the base generator are surprisingly good, - so the lowest 16 bits actually passes fairly strict PRNG tests, - despite the generator's weaknesses that lies in the high + so the lowest 16 bits actually pass fairly strict PRNG tests, + despite the generator's weaknesses that lie in the high bits of the 32-bit MWC "digit". It is recommended to use <c>rem</c> on the the generator state, - or bit mask on the lowest bits to produce numbers + or bit mask extracting the lowest bits to produce numbers in a range 16 bits or less. + See the recipes at the start of this + <seeerl marker="#niche_algorithms"> + Niche algorithms API + </seeerl> + description.
+ </p> <p> On a typical 64 bit Erlang VM this generator executes @@ -993,14 +1172,25 @@ end.</pre> birthday spacing and collision tests show through. </p> <p> - To extract a power of two number it is recommended - to use the high bits which helps in hiding - the remaining base generator problems. + When using this scrambler it is in general better to use + the high bits of the value than the low. + The lowest 8 bits are of good quality and pass right through + from the base generator. They are combined with the next 8 + in the xorshift, making the low 16 bits of good quality, + but among bits 16..31 there are weaker bits + that you do not want to have as the high bits + of your generated values. + Therefore it is in general safer to shift out low bits. + See the recipes at the start of this + <seeerl marker="#niche_algorithms"> + Niche algorithms API + </seeerl> + description. </p> <p> - For a small arbitrary range less than about 16 bits + For a non-power-of-2 range less than about 16 bits (to not get too much bias and to avoid bignums) - multiply-and-shift can be used, + truncated multiplication can be used, which is much faster than using <c>rem</c>: <c>(Range*<anno>V</anno>) bsr 32</c>. </p> @@ -1024,20 +1214,23 @@ end.</pre> when handling the value <c><anno>V</anno></c>. </p> <p> - To extract a power of two number it is slightly better - to shift down the high bits than to mask the low. + It is in general better to use the high bits + from this scrambler than the low. + See the recipes at the start of this + <seeerl marker="#niche_algorithms"> + Niche algorithms API + </seeerl> + description. </p> <p> - For an arbitrary range less than about 29 bits + For a non-power-of-2 range less than about 29 bits (to not get too much bias and to avoid bignums) - multiply-and-shift can be used, + truncated multiplication can be used, which is much faster than using <c>rem</c>.
Example for range 1'000'000'000; the range is 30 bits, we use 29 bits from the generator, adding up to 59 bits, which is not a bignum: <c>(1000000000 * (<anno>V</anno> bsr (59-29))) bsr 29</c>. - <em> - </em> </p> </desc> </func> -- 2.35.3
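The bias figures in the patch's <em>Modulo</em> section can be checked by brute force. The following is a minimal Erlang sketch, assuming a byte generator (range 0..255). It counts how many generator outputs the Modulo method maps to each value. The module `modulo_bias` and its functions are hypothetical helpers, not part of OTP's `rand`.

```erlang
%% Hypothetical helper module, not part of OTP's rand.
-module(modulo_bias).
-export([modulo/2, preimages/2]).

%% Modulo method: map a generator output X to V in 0..Range-1.
modulo(X, Range) ->
    X rem Range.

%% For every V in 0..Range-1, count how many generator outputs
%% X in 0..GenRange-1 the Modulo method maps to V.
%% Element V+1 of the returned list is the count for V.
preimages(GenRange, Range) ->
    Counts =
        lists:foldl(
          fun(X, Acc) ->
                  maps:update_with(modulo(X, Range),
                                   fun(C) -> C + 1 end, 1, Acc)
          end,
          #{}, lists:seq(0, GenRange - 1)),
    [maps:get(V, Counts, 0) || V <- lists:seq(0, Range - 1)].
```

With `Range = 100`, values 0..55 get 3 preimages each and 56..99 get 2. With `Range = 20`, values below 16 get 13 and values 16..19 get 12. These counts agree with the numbers quoted in the text.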
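The <em>Truncated multiplication</em> and <em>Shift or mask</em> recipes from the patch can be written as small pure functions. Below is a sketch under the assumption that the generator output `X` is a non-negative integer of `Bits` bits. The module `range_map` and its function names are hypothetical.

```erlang
%% Hypothetical helper module, not part of OTP's rand.
-module(range_map).
-export([trunc_mul/3, mask_low/2, shift_high/3]).

%% Truncated multiplication: X in 0..2^Bits-1 -> V in 0..Range-1.
%% The high bits of X decide V; slow if X*Range becomes a bignum.
trunc_mul(X, Range, Bits) ->
    (X * Range) bsr Bits.

%% Mask: keep the RBits low bits of X, giving V in 0..2^RBits-1.
mask_low(X, RBits) ->
    X band ((1 bsl RBits) - 1).

%% Shift: keep the RBits high bits of a Bits-bit value X.
shift_high(X, Bits, RBits) ->
    X bsr (Bits - RBits).
```

The methods combine as the patch suggests. For example, `range_map:trunc_mul(range_map:shift_high(V, 59, 29), 1000000000, 29)` first shifts a 59-bit value down to 29 bits, so that the product with a 30-bit range stays within 59 bits and is not a bignum. This mirrors the patch's `(1000000000 * (V bsr (59-29))) bsr 29`.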
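The combination mentioned in the patch (Modulo, with Rejection only when the generator value would create bias) can be sketched as follows. The module `reject_range` is hypothetical. It assumes the generator is passed as a fun `State -> {X, NewState}` with `X` uniform in 0..GenRange-1. An output `X` is rejected when it falls in the incomplete last block of `Range` values above `GenRange - (GenRange rem Range)`, since those outputs are the source of the modulo bias.

```erlang
%% Hypothetical helper module, not part of OTP's rand.
-module(reject_range).
-export([uniform/4]).

%% Modulo combined with Rejection.
%% Next :: fun((State) -> {X, NewState}), X uniform in 0..GenRange-1.
uniform(Next, State0, GenRange, Range)
  when is_function(Next, 1), is_integer(Range),
       0 < Range, Range =< GenRange ->
    %% Outputs below Limit cover every V in 0..Range-1 equally often.
    Limit = GenRange - (GenRange rem Range),
    draw(Next, State0, Limit, Range).

draw(Next, State0, Limit, Range) ->
    {X, State1} = Next(State0),
    if
        X < Limit -> {X rem Range, State1};
        true      -> draw(Next, State1, Limit, Range) % reject, retry
    end.
```

For a byte generator and `Range = 100`, `Limit` is 200, so only the outputs 200..255 are rejected (56 out of 256), and each accepted value occurs exactly twice among 0..199. The list-based `Next` fun used in the tests is only a deterministic stand-in; any fun with the same `State -> {X, NewState}` shape can be passed.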
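The floating point recipe at the end of the new section can be sketched as below. The module `float53` is hypothetical. `from_uint64/1` assumes a generator that produces 64-bit integers, and it keeps the 53 high bits by shifting right 11 bits, in line with the patch's advice to prefer high bits.

```erlang
%% Hypothetical helper module, not part of OTP's rand.
-module(float53).
-export([to_float/1, from_uint64/1]).

%% Map a 53-bit integer X in 0..2^53-1 to N*2^-53 in [0.0, 1.0).
to_float(X) when is_integer(X), X >= 0, X < (1 bsl 53) ->
    X * (1.0 / (1 bsl 53)).

%% Use the 53 high bits of a 64-bit generator value.
from_uint64(X64) when is_integer(X64), X64 >= 0, X64 < (1 bsl 64) ->
    to_float(X64 bsr 11).
```

Every 53-bit integer converts to a double exactly, and multiplying by the power of two 2^-53 is also exact. Each possible `N` therefore yields a distinct value, and the largest input, 2^53-1, maps to the double just below 1.0.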