Generating random numbers manually


30

How can I manually generate a random number from a given distribution, for example, 10 realisations from the standard normal distribution?


10
Can you say why you want to do this, and what other constraints you are working under?
mdewey

2
You can get a table of random numbers (RAND used to publish them, among others).
Batman

4
@Batman: Yes indeed, and the RAND book of 10⁶ random numbers got 655 reviews on Amazon. All as predicted.
Xi'an

10
(I am surprised nobody has commented on this before.) Unless it is absolutely unavoidable, one should not use a custom implementation to generate random numbers. Yes, it is great to know how to do it, and it is probably the first thing you will be shown in an "MC methods" class (and that's great!), but do not do it in any real-life project. Specialised random number generation routines exist for a reason: implementing a core algorithm from scratch is a waste of time, a source of bugs, and shows poor awareness of the field.
usεr11852

2
@Xi'an: Yes, I agree that is a safe presumption. As mentioned this is just a comment to caution people from using their own RNG without realising that RNG design is a very serious business. To quote von Neumann: "Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin."
usεr11852 says Reinstate Monic

Answers:


46

If "manually" includes "mechanical" then you have many options available to you. To simulate a Bernoulli variable with probability half, we can toss a coin: 0 for tails, 1 for heads. To simulate a geometric distribution we can count how many coin tosses are needed before we obtain heads. To simulate a binomial distribution, we can toss our coin n times (or simply toss n coins) and count the heads. The "quincunx" or "bean machine" or "Galton box" is a more kinetic alternative — why not set one into action and see for yourself? It seems there is no such thing as a "weighted coin" but if we wish to vary the probability parameter of our Bernoulli or binomial variable to values other than p=0.5, the needle of Georges-Louis Leclerc, Comte de Buffon will allow us to do so. To simulate the discrete uniform distribution on {1,2,3,4,5,6} we roll a six-sided die. Fans of role-playing games will have encountered more exotic dice, for example tetrahedral dice to sample uniformly from {1,2,3,4}, while with a spinner or roulette wheel one can go further still. (Image credit)

Variety of dice
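If you want to check what these coin-based procedures produce without digging out a physical coin, here is a minimal R sketch of them; `toss`, `geom_by_coin` and `binom_by_coin` are just illustrative helper names, and `sample()` stands in for the physical coin, so this only mirrors the arithmetic of the manual experiments.

```r
# Minimal sketch: the coin-based generators described above, with sample()
# standing in for the physical coin (1 = heads, 0 = tails).
set.seed(1)
toss <- function() sample(c(0L, 1L), 1)                 # one fair coin toss

geom_by_coin  <- function() { n <- 1; while (toss() == 0) n <- n + 1; n }  # tosses until first head
binom_by_coin <- function(n) sum(replicate(n, toss()))                     # heads in n tosses

replicate(5, geom_by_coin())   # five Geometric(0.5) draws
binom_by_coin(10)              # one Binomial(10, 0.5) draw
```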

Would we have to be mad to generate random numbers in this manner today, when it is just one command away on a computer console — or, if we have a suitable table of random numbers available, one foray to the dustier corners of the bookshelf? Well perhaps, though there is something pleasingly tactile about a physical experiment. But for people working before the Computer Age, indeed before widely available large-scale random number tables (of which more later), simulating random variables manually had more practical importance. When Buffon investigated the St. Petersburg paradox — the famous coin-tossing game where the amount the player wins doubles every time a heads is tossed, the player loses upon the first tails, and whose expected pay-off is counter-intuitively infinite — he needed to simulate the geometric distribution with p=0.5. To do so, it seems he hired a child to toss a coin to simulate 2048 plays of the St. Petersburg game, recording how many tosses before the game ended. This simulated geometric distribution is reproduced in Stigler (1991):

Tosses Frequency
1      1061
2      494
3      232
4      137
5      56
6      29
7      25
8      8
9      6

In the same essay where he published this empirical investigation into the St. Petersburg paradox, Buffon also introduced the famous "Buffon's needle". If a plane is divided into strips by parallel lines a distance d apart, and a needle of length l ≤ d is dropped onto it, the probability that the needle crosses one of the lines is 2l/(πd).

Buffon's needle experiment

Buffon's needle can, therefore, be used to simulate a random variable X ∼ Bernoulli(2l/(πd)) or X ∼ Binomial(n, 2l/(πd)), and we can adjust the probability of success by altering the lengths of our needles or (perhaps more conveniently) the distance at which we rule the lines. An alternative use of Buffon's needles is as a terrifically inefficient way to find a probabilistic approximation for π. The image (credit) shows 17 matchsticks, of which 11 cross a line. When the distance between the ruled lines is set equal to the length of the matchstick, as here, the expected proportion of crossing matchsticks is 2/π, and hence we can estimate π̂ as twice the reciprocal of the observed fraction: here we obtain π̂ = 2 × (17/11) ≈ 3.1. In 1901 Mario Lazzarini claimed to have performed the experiment using 2.5 cm needles with lines 3 cm apart, and after 3408 tosses obtained π̂ = 355/113. This is a well-known rational approximation to π, accurate to six decimal places. Badger (1994) provides convincing evidence that this was fraudulent, not least that to be 95% confident of six decimal places of accuracy using Lazzarini's apparatus, a patience-sapping 134 trillion needles must be thrown! Certainly Buffon's needle is more useful as a random number generator than it is as a method for estimating π.
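If you would rather not throw thousands of physical matchsticks, the experiment is easy to mimic in R. This is only a sketch (the helper name `buffon` is mine, and of course it uses R's own RNG, so it checks the arithmetic rather than the manual procedure), under the standard reduction to the needle centre's distance from the nearest line and the needle's angle.

```r
# Sketch of Buffon's needle with line spacing d and needle length l <= d.
# A needle crosses a line when (l/2)*sin(theta) >= the distance from its
# centre to the nearest line, with theta uniform on (0, pi).
set.seed(42)
buffon <- function(n, l = 1, d = 1) {
  x     <- runif(n, 0, d / 2)        # distance of centre to nearest line
  theta <- runif(n, 0, pi)           # angle between needle and the lines
  mean((l / 2) * sin(theta) >= x)    # observed crossing proportion
}
p_hat <- buffon(1e5)                 # should be close to 2/pi = 0.6366
2 / p_hat                            # crude estimate of pi
```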


Our generators so far have been disappointingly discrete. What if we want to simulate a normal distribution? One option is to obtain random digits and use them to form good discrete approximations to a uniform distribution on [0,1], then perform some calculations to transform these into random normal deviates. A spinner or roulette wheel could give decimal digits from zero to nine; a die can generate binary digits; if our arithmetic skills can cope with a funkier base, even a standard set of dice would do. Other answers have covered this kind of transformation-based approach in more detail; I defer any further discussion of it until the end.

By the late nineteenth century the utility of the normal distribution was well-known, and so there were statisticians keen to simulate random normal deviates. Needless to say, lengthy hand calculations would not have been suitable except to set up the simulating process in the first place. Once that was established, the generation of the random numbers had to be relatively quick and easy. Stigler (1991) lists the methods employed by three statisticians of this era. All were researching smoothing techniques: random normal deviates were of obvious interest, e.g. to simulate measurement error that needed to be smoothed over.

The remarkable American statistician Erastus Lyman De Forest was interested in smoothing life tables, and encountered a problem that required the simulation of the absolute values of normal deviates. In what will prove a running theme, De Forest was really sampling from a half-normal distribution. Moreover, rather than using a standard deviation of one (the Z ∼ N(0, 1²) we are used to calling "standard"), De Forest wanted a "probable error" (median deviation) of one. This was the form given in the table of "Probability of Errors" in the appendices of "A Manual Of Spherical And Practical Astronomy, Volume II" by William Chauvenet. From this table, De Forest interpolated the quantiles of a half-normal distribution, from p = 0.005 to p = 0.995, which he deemed to be "errors of equal frequency".

De Forest table of errors of equal frequency

Should you wish to simulate the normal distribution, following De Forest, you can print this table out and cut it up. De Forest (1876) wrote that the errors "have been inscribed upon 100 bits of card-board of equal size, which were shaken up in a box and all drawn out one by one".
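A rough R analogue of De Forest's box of cards might look like the following. Since the exact values on his cards are not reproduced above, the 100 "errors of equal frequency" are reconstructed here as half-normal quantiles at p = 0.005, 0.015, ..., 0.995, rescaled so the probable error (median) is one; the drawing is done with replacement for simplicity, whereas De Forest drew his cards out one by one.

```r
# Rough sketch of De Forest's box of cards: 100 half-normal quantiles
# ("errors of equal frequency"), rescaled so the probable error (the median
# absolute deviation) equals one, then drawn from the "box" at random.
p     <- seq(0.005, 0.995, by = 0.01)        # the 100 probabilities De Forest used
cards <- qnorm((1 + p) / 2) / qnorm(0.75)    # half-normal quantiles, probable error = 1
sample(cards, 10, replace = TRUE)            # ten simulated absolute "errors"
```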

The astronomer and meteorologist Sir George Howard Darwin (son of the naturalist Charles) put a different spin on things, by developing what he called a "roulette" for generating random normal deviates. Darwin (1877) describes how:

A circular piece of card was graduated radially, so that a graduation marked x was (720/√π) ∫₀ˣ e^(−x²) dx degrees distant from a fixed radius. The card was made to spin round its centre close to a fixed index. It was then spun a number of times, and on stopping it the number opposite the index was read off. [Darwin adds in a footnote: It is better to stop the disk when it is spinning so fast that the graduations are invisible, rather than to let it run its course.] From the nature of the graduation the numbers thus obtained will occur in exactly the same way as errors of observation occur in practice; but they have no signs of addition or subtraction prefixed. Then by tossing up a coin over and over again and calling heads + and tails −, the signs + or − are assigned by chance to this series of errors.

"Index" should be read here as "pointer" or "indicator" (c.f. "index finger"). Stigler points out that Darwin, like De Forest, was using a half-normal cumulative distribution around the disk. Subsequently using a coin to attach a sign at random renders this a full normal distribution. Stigler notes that it is unclear how finely the scale was graduated, but presumes the instruction to manually arrest the disk mid-spin was "to diminish potential bias toward one section of the disk and to speed up the procedure".

Sir Francis Galton, incidentally a half-cousin to Charles Darwin, has already been mentioned in connection with his quincunx. While this mechanically simulates a binomial distribution that, by the De Moivre–Laplace theorem, bears a striking resemblance to the normal distribution (and is occasionally used as a teaching aid for that topic), Galton actually produced a far more elaborate scheme when he desired to sample from a normal distribution. Even more extraordinary than the unconventional examples at the top of this answer, Galton developed normally distributed dice — or more accurately, a set of dice that produce an excellent discrete approximation to a normal distribution with median deviation one. These dice, dating from 1890, are preserved in the Galton Collection at University College London.

Galton normal dice

In an 1890 article in Nature Galton wrote that:

As an instrument for selecting at random, I have found nothing superior to dice. It is most tedious to shuffle cards thoroughly between each successive draw, and the method of mixing and stirring up marked balls in a bag is more tedious still. A teetotum or some form of roulette is preferable to these, but dice are better than all. When they are shaken and tossed in a basket, they hurtle so variously against one another and against the ribs of the basket-work that they tumble wildly about, and their positions at the outset afford no perceptible clue to what they will be after even a single good shake and toss. The chances afforded by a die are more various than are commonly supposed; there are 24 equal possibilities, and not only 6, because each face has four edges that may be utilized, as I shall show.

It was important for Galton to be able to rapidly generate a sequence of normal deviates. After each roll Galton would line the dice up by touch alone, then record the scores along their front edges. He would initially roll several dice of type I, on whose edges were half-normal deviates, much like De Forest's cards but using 24 rather than 100 quantiles. For the largest deviates (actually marked as blanks on the type I dice) he would roll as many of the more sensitive type II dice (which showed large deviates only, at a finer graduation) as he needed to fill in the spaces in his sequence. To convert from half-normal to normal deviates, he would roll die III, which would allocate + or − signs to his sequence in blocks of three or four deviates at a time. The dice themselves were mahogany, of side 1¼ inches, and pasted with thin white paper for the markings to be written on. Galton recommended preparing three dice of type I, two of type II and one of type III.

Galton normal dice design

Raazesh Sainudiin's Laboratory for Mathematical Statistical Experiments includes a student project from the University of Canterbury, NZ, reproducing Galton's dice. The project includes empirical investigation from rolling the dice many times (including an empirical CDF that looks reassuringly "normal") and an adaptation of the dice scores so they follow the standard normal distribution. Using Galton's original scores, there is also a graph of the discretized normal distribution that the dice scores actually follow.

Galton dice discrete distribution


On a grand scale, if you are prepared to stretch the "mechanical" to the electrical, note that RAND's epic A Million Random Digits with 100,000 Normal Deviates was based on a kind of electronic simulation of a roulette wheel. From the technical report (by George W. Brown, originally June 1949) we find:

Thus motivated, the RAND people, with the assistance of Douglas Aircraft Company engineering personnel, designed an electro roulette wheel based on a variation of a proposal made by Cecil Hastings. For purposes of this talk a brief description will suffice. A random frequency pulse source was gated by a constant frequency pulse, about once a second, providing on the average about 100,000 pulses in one second. Pulse standardization circuits passed the pulses to a five place binary counter, so that in principle the machine is like a roulette wheel with 32 positions, making on the average about 3000 revolutions on each turn. A binary to decimal conversion was used, throwing away 12 of the 32 positions, and the resulting random digit was fed into an I.B.M. punch, yielding punched card tables of random digits. A detailed analysis of the randomness to be expected from such a machine was made by the designers and indicated that the machine should yield very high quality output.

However, before you too are tempted to assemble an electro roulette wheel, it would be a good idea to read the rest of the report! It transpired that the scheme "leaned heavily on the assumption of ideal pulse standardization to overcome natural preferences among the counter positions; later experience showed that this assumption was the weak point, and much of the later fussing with the machine was concerned with troubles originating at this point". Detailed statistical analysis revealed some problems with the output: for instance, χ² tests of the frequencies of odd and even digits revealed that some batches had a slight imbalance. This was worse in some batches than others, suggesting that "the machine had been running down in the month since its tune up ... The indications are that this machine required excessive maintenance to keep it in tip-top shape". However, a statistical way of resolving these issues was found:

At this point we had our original million digits, 20,000 I.B.M. cards with 50 digits to a card, with the small but perceptible odd-even bias disclosed by the statistical analysis. It was now decided to rerandomize the table, or at least alter it, by a little roulette playing with it, to remove the odd-even bias. We added (mod 10) the digits in each card, digit by digit, to the corresponding digits of the previous card. The derived table of one million digits was then subjected to the various standard tests, frequency tests, serial tests, poker tests, etc. These million digits have a clean bill of health and have been adopted as RAND's modern table of random digits.

There was, of course, good reason to believe that the addition process would do some good. In a general way, the underlying mechanism is the limiting approach of sums of random variables modulo the unit interval to the rectangular distribution, in the same way that unrestricted sums of random variables approach normality. This method has been used by Horton and Smith, of the Interstate Commerce Commission, to obtain some good batches of apparently random numbers from larger batches of badly non-random numbers.
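The rerandomization step the report describes (adding each card's digits, modulo 10, to the corresponding digits of the previous card) is simple to express in R. This is only a toy sketch on a handful of invented "cards" rather than RAND's 20,000, and the variable names are mine.

```r
# Sketch of RAND's rerandomization step: add each card's digits, modulo 10,
# to the corresponding digits of the previous card. Toy example with six
# "cards" of 50 digits each instead of RAND's 20,000.
set.seed(7)
raw   <- matrix(sample(0:9, 6 * 50, replace = TRUE), nrow = 6)   # rows = cards
mixed <- raw
for (i in 2:nrow(raw)) mixed[i, ] <- (raw[i, ] + raw[i - 1, ]) %% 10
mean(mixed %% 2)   # proportion of odd digits, should hover around 0.5
```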

Of course, this concerns the generation of random decimal digits, but it is easy to use these to produce random deviates sampled uniformly from [0,1], to as many decimal places as you saw fit to take digits. There are various lovely methods to generate deviates of other distributions from your uniform deviates, perhaps the most aesthetically pleasing of which is the ziggurat algorithm for probability distributions which are either monotone decreasing or unimodal symmetric, but conceptually the simplest and most widely applicable is the inverse CDF transform: given a deviate u from the uniform distribution on [0,1], and if your desired distribution has CDF F, then F⁻¹(u) will be a random deviate from your distribution. If you are interested specifically in random normal deviates then, computationally, the Box-Muller transform is more efficient than inverse transform sampling, the Marsaglia polar method is more efficient again, and the ziggurat (image credit for the animation below) better still. Some practical issues are discussed on this StackOverflow thread if you intend to implement one or more of these methods in code.

Ziggurat for half-normal
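As a concrete sketch of the inverse-CDF step in R: assuming we have already assembled uniform deviates from physical random digits (faked here with `runif()` for brevity), the quantile functions `qnorm` and `qexp` play the role of F⁻¹.

```r
# Inverse-CDF sketch: push uniforms through a quantile function. Here the
# uniforms are faked with runif(); in the manual setting they would come
# from dice, a wheel, or a table of random digits.
u <- runif(10)
z <- qnorm(u)           # ten standard normal deviates, F^{-1}(u) with F the normal CDF
x <- qexp(u, rate = 2)  # the same uniforms turned into Exponential(rate = 2) deviates
```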

References


In the very same journal is von Neumann's highly-cited paper Various Techniques Used in Connection with Random Digits, in which he considers the difficulties of generating random numbers for use in a computer. He rejects the idea of a physical device attached to a computer that generates random input on the fly, and considers whether some physical mechanism might be employed to generate random numbers which are then recorded for future use — essentially what RAND had done with their Million Digits. It also includes his famous quote about what we would describe as the difference between random and pseudo-random number generation: "Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin. For, as has been pointed out several times, there is no such thing as a random number — there are only methods to produce random numbers, and a strict arithmetic procedure of course is not such a method."


5
You must be kidding... (but, yeah, that's really manual +1)
nalzok

8
How much work went into this!
Richard Hardy

3
Notice that nonstandard dice can be unfair, so it is good to test them first, e.g. youtube.com/watch?v=VI3N4Qg-JZM
Tim

2
@RichardHardy Paradoxically, it's actually easier for me to get all this stuff written down while it's in front of me, and at least temporarily lodged in my memory, than to try to carry it all around in my head!
Silverfish

2
In any case, I find it impressive!
Richard Hardy

44

If you can get access to a very precise clock, you can extract the decimal part of the current time and turn it into a uniform, from which you can derive a normal simulation by the Box-Müller transform:

X = √(−2 log U₁) cos(2π U₂)
(and even two, since Y = √(−2 log U₁) sin(2π U₂) is another normal variate, independent of X).

For instance, on my Linux OS, I can check

$ date +%s.%N
1479733744.077762986
$ date +%s.%N
1479733980.615056616

hence set

U₁ = 0.077762986, U₂ = 0.615056616
and X as
> sqrt(-2*log(.077762986))*cos(2*pi*.615056616)
[1] -1.694815

Addendum: since computing logarithms and cosines may be deemed not manual enough, there exists a variant to Box-Müller that avoids using those transcendental functions (see Exercise 2.9 in our book Monte Carlo Statistical Methods):

a variant of Box-Muller

Now, one can argue against this version because of the Exponential variates. But there also exists a very clever way of simulating those variates without a call to transcendental functions, due to von Neumann, as summarised in this algorithm reproduced from Luc Devroye's Non-Uniform Random Variate Generation:

[Algorithm: von Neumann's method for simulating Exponential variates, reproduced from Devroye]

Admittedly, it requires the computation of 1/e, but only once.
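The algorithm in the figure is not reproduced here, but one common formulation of von Neumann's comparison method can be sketched in R as below. It needs nothing beyond uniform draws and comparisons (not even 1/e), so the flavour should match the algorithm above even if the details of Devroye's version differ; the function name `rexp_vonneumann` is mine.

```r
# A common formulation of von Neumann's exponential generator, using only
# uniform draws and comparisons: draw a non-increasing run U1 >= U2 >= ...,
# stop at the first rise; if the run length is odd accept U1 as the
# fractional part, otherwise add one to the integer part and start over.
# The result is Exponential(1).
rexp_vonneumann <- function() {
  k <- 0                               # integer part: number of rejected rounds
  repeat {
    u1 <- runif(1); u <- u1; len <- 1
    repeat {                           # extend the non-increasing run
      v <- runif(1)
      if (v > u) break
      u <- v; len <- len + 1
    }
    if (len %% 2 == 1) return(k + u1)  # odd run length: accept
    k <- k + 1                         # even run length: reject this round
  }
}
mean(replicate(1e4, rexp_vonneumann()))  # should be close to 1
```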

If you do not have access to this clock, you can replace this uniform generator by a mechanistic uniform generator, like throwing a dart at a surface tiled with a large number of unit squares (0,1)², or rolling a ball on a unit interval (0,1) with enough bounces [as in Thomas Bayes' conceptual billiard experiment], or throwing matches on a wooden floor with unit-width planks and counting the distance to the nearest separation on the left [as in Buffon's experiment], or, further still, spinning a roulette wheel with the number 1 at the bottom and turning the angle the 1 ends up making with its starting orientation into a uniform (0, 2π) draw.

Using the CLT to approximate normality is certainly not a method I would ever advise, as (1) you still need other variates to feed the average, so you may as well use uniforms in the Box-Müller algorithm, and (2) the accuracy grows quite slowly with the number of simulations, especially if using a discrete random variable like the outcome of a die, even one with more than six faces. To quote from Thomas et al. (2007), a survey on the pros and cons of Gaussian random generators:

The central limit theorem of course is an example of an “approximate” method—even if perfect arithmetic is used, for finite K the output will not be Gaussian.

Here is a quick experiment to illustrate the problem: I generated 100 times the average of 30 die outcomes:

dies=apply(matrix(sample(1:6,30*100,rep=TRUE),ncol=30),1,mean)

then normalised those averages into mean zero - variance one variates

stdies=(dies-3.5)/sqrt(35/12/30)

and looked at the normal fit [or lack thereof] of this sample:

qqnorm(stdies,col="gold2",pch=19);abline(a=0,b=1,col="steelblue",lwd=2,lty=2)

First, the fit is not great, especially in the tails, and second, rather obviously, the picture confirms that the number of values taken by the sample is embarrassingly finite. (In this particular experiment, there were only 34 different values taken by dies, between 76/30 and 122/30.) By comparison, if I exploit the very same 3000 die outcomes Di to create enough digits of a pseudo-uniform as

U = Σ_{i=1}^{k} (D_i − 1) 6^(−i)
with k = 15 (note that 6¹⁵ > 10¹¹, hence I generate more than 11 truly random digits), and then apply the above Box-Müller transform to turn pairs of uniforms into pairs of N(0,1) variates,
dies=matrix(apply(matrix(sample(0:5,15*200,rep=TRUE),nrow=15)/6^(1:15),2,sum),ncol=2) 
norma=sqrt(-2*log(dies[,1]))*c(cos(2*pi*dies[,2]),sin(2*pi*dies[,2]))

the fit is as good as can be expected for a Normal sample of size 200 (just plot another one for a true normal sample, norma=rnorm(200)):

[Normal QQ-plot of the 200 die-based Box-Müller variates]

as further shown by a Kolmogorov-Smirnov test:

> ks.test(norma,pnorm)

        One-sample Kolmogorov-Smirnov test

data:  norma
D = 0.06439, p-value = 0.3783
alternative hypothesis: two-sided

3

This is not exactly random, but it should be close enough, as you seem to want a rough experiment.

Use your phone to set up a stopwatch. After a good 10 seconds, stop it (the longer you wait, the closer you get to a truly "random" result, but 10 seconds is fine). Take the last digits (for instance, 10.67 sec gives you 67). Apply the percentile table for the normal distribution: in this example, you just look up 0.67 and read off the number. In this case, your value is about 0.44. This is not perfectly precise, but it gives you a solid estimate.

If you get a value below 50, just use 100 − [your result] and look that up in the table; your value will be the same with a minus sign, due to the symmetry of N(0,1).


3

Let us flip an unbiased coin n times. Starting at zero, we count +1 if heads, −1 if tails. After n coin flips, we divide the counter by √n. Using the central limit theorem, if n is sufficiently large, then we should have an "approximate realization" of the normalized Gaussian N(0,1).


Why? Let

X_k := +1 if the k-th coin flip is heads, −1 if the k-th coin flip is tails

be i.i.d. random variables with P(X_k = ±1) = 1/2. Hence,

E(X_k) = 0, Var(X_k) = 1

Let Y := X_1 + X_2 + ... + X_n. Hence,

E(Y) = 0, Var(Y) = n

Normalizing,

Z := Y / √n

we obtain a random variable with zero mean and unit variance:

E(Z) = 0, Var(Z) = 1
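A quick R sketch of this thought experiment, with `rbinom()` standing in for the physical coin and the helper name `one_deviate` being mine:

```r
# Sketch of the coin-flip CLT recipe: score n flips as +1/-1, sum them, and
# divide by sqrt(n); rbinom() stands in for the physical coin.
one_deviate <- function(n = 10000) {
  flips <- 2 * rbinom(n, size = 1, prob = 0.5) - 1   # +1 for heads, -1 for tails
  sum(flips) / sqrt(n)                               # approximately N(0, 1) for large n
}
z <- replicate(1000, one_deviate())
qqnorm(z); abline(0, 1)                              # the fit improves as n grows
```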

1
As discussed in my answer, I fear the CLT is not an efficient use of the randomness in the coin flips: they would be better exploited as binary digits of a pseudo-random U(0,1) variate with single or double precision.
Xi'an

@Xi'an I read your answer before posting mine, and your objection to the CLT seemed to be based on slow convergence. Since this is a thought experiment, flipping a coin 10 billion times does not cost anything. And it truly is a manual procedure that requires no computers, no computation of logarithms, square roots or cosines. Sure, one could use a slide rule, but that may be going too far.
Rodrigo de Azevedo

1
:}}: 10 billion coin flips does not sound very manual to me...!
Xi'an

Manual = by hand. Computing logarithms and cosines by hand also takes time.
Rodrigo de Azevedo

0

It's worth noting that once you can generate a uniform(0,1), you can generate any random variable whose inverse CDF can be calculated, simply by plugging the uniform random variable into the inverse CDF.

So how might one generate a uniform(0,1) manually? Well, as mentioned by @Silverfish, there is a variety of dice used by traditional RPG players. One of these is a ten-sided die. Assuming it is a fair die, we can now generate a discrete uniform(0, 9).

We can also use this uniform(0,9) to represent a single digit of a random variable. So if we use two dice, we get a uniform random variable that can take on values 0.01,0.02,...,0.99,1.00. With three dice, we can get a uniform distribution on 0.001,0.002,...,0.999,1.000.

So we can get very close to a continuous uniform(0,1) by approximating it with a finely gridded discrete uniform distribution with a few 10 sided dice. This can then be plugged into an inverse CDF to produce the random variable of interest.
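A small R sketch of this recipe, with `sample()` playing the role of the ten-sided dice. One small departure from the digit mapping described above: the half-step offset keeps u strictly inside (0,1), so qnorm() never returns ±Inf for an all-zeros roll.

```r
# Sketch of the d10 recipe: three ten-sided dice give a three-digit discrete
# uniform, which is then pushed through the inverse normal CDF.
rolls <- sample(0:9, 3, replace = TRUE)        # three ten-sided dice
u     <- (sum(rolls * 10^(2:0)) + 0.5) / 1000  # midpoint of the chosen 0.001-wide cell
qnorm(u)                                       # a coarsely discretised normal deviate
```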

Licensed under cc by-sa 3.0 with attribution required.