import random

# A list to store the generated random numbers
number_set = []

# Generate 100,000 random numbers
for x in range(100000):
    # Pick integers between 1 and 10,000 (inclusive)
    number_set.append(random.randint(1, 10000))
# A list to store the leading digits
first_digit_set = []

# A function to get the leading digit
def get_leading_digit(number):
    # Convert the number to a string, take the first character,
    # then convert it back to an integer and return the value
    return int(str(number)[0])

# Collect the leading digit of every generated number
for d in number_set:
    first_digit_set.append(get_leading_digit(d))

# Report how often each digit from 1 to 9 appears as a leading digit
for i in range(1, 10):
    print("There are " + str(first_digit_set.count(i)) + " leading " + str(i) + "'s")
There are 33513 leading 1's
There are 33181 leading 2's
There are 33140 leading 3's
There are 33707 leading 4's
There are 33461 leading 5's
There are 33133 leading 6's
There are 33286 leading 7's
There are 33419 leading 8's
There are 33170 leading 9's
The digits are uniformly distributed!! That can only mean one of two things:
- A genius mathematician got his own law wrong
- I did something wrong
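The real answer: the random module in Python's standard library generates uniformly distributed numbers. Remember Benford's law? It is the observation that in many real-life numerical datasets the leading digits are not uniformly distributed. So...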
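For reference, Benford's law is usually stated as P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. The sketch below just evaluates that formula; the function name `benford_probabilities` is made up for this illustration and is not part of the original code.

```python
import math

def benford_probabilities():
    """Return {digit: expected share} for leading digits 1-9 (hypothetical helper)."""
    # Benford's law: P(d) = log10(1 + 1/d)
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probabilities = benford_probabilities()
for digit, p in probabilities.items():
    print(f"{digit}: {p:.3f}")
```

The nine shares add up to 1, and a leading 1 is expected about 30% of the time, far from the uniform counts seen above.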
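How do you generate data with a predefined distribution (using Python 3)?
How do you generate data that follows the Benford's law distribution? Well, since Python 3.6 the random module has had a function called random.choices, which lets you specify weights and the number of items to generate...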
from random import choices
from collections import Counter

# Specify the list of values to generate occurrences of:
# these are the digits we want as leading digits
population = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Specify the weights (approximate Benford's law proportions)
weights = [0.301, 0.176, 0.124, 0.096, 0.079, 0.066, 0.057, 0.054, 0.047]

# Generate a sample set of first digits with a Benford distribution
# k = 10**6 generates 1 million values
first_digits = choices(population, weights, k=10**6)

# Use the standard library's Counter class to get the counts, most common first
values_in_order = Counter(first_digits).most_common()
for item in values_in_order:
    print(item)
(1, 301193)
(2, 175999)
(3, 123747)
(4, 95958)
(5, 79342)
(6, 65449)
(7, 57246)
(8, 53951)
(9, 47115)
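To check how closely a weighted sample tracks the weights it was given, the observed share of each digit can be compared with its weight. This is a minimal sketch, not part of the original post: the helper `observed_vs_expected`, the fixed seed, and the smaller sample size are all choices made for the illustration.

```python
from collections import Counter
from random import choices, seed

population = [1, 2, 3, 4, 5, 6, 7, 8, 9]
weights = [0.301, 0.176, 0.124, 0.096, 0.079, 0.066, 0.057, 0.054, 0.047]

seed(42)  # arbitrary fixed seed so the run is repeatable (an assumption)
sample_size = 10**5
counts = Counter(choices(population, weights, k=sample_size))

def observed_vs_expected(counts, population, weights, sample_size):
    """Return (digit, observed share, weight, absolute gap) rows (hypothetical helper)."""
    rows = []
    for digit, weight in zip(population, weights):
        observed = counts[digit] / sample_size
        rows.append((digit, observed, weight, abs(observed - weight)))
    return rows

rows = observed_vs_expected(counts, population, weights, sample_size)
for digit, observed, expected, gap in rows:
    print(f"{digit}: observed {observed:.4f}, expected {expected:.3f}, gap {gap:.4f}")
```

With a sample this large the gaps should be small, which matches the counts printed above.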
# Plot the result
import matplotlib.pyplot as plt
# For Jupyter notebooks, uncomment the line below; otherwise you will need to run the cell twice for the plot to appear
#%matplotlib inline

# Get the count for each digit, sorted by digit so the counts line up with population
count = []
for digit, digit_count in sorted(values_in_order):
    count.append(digit_count)

# Set the x positions for the bars, one per digit
y_pos = population

# Set the size of the whole chart
plt.figure(figsize=(10, 10))

# Label the axes and the chart
plt.xticks(y_pos, population)
plt.ylabel('Leading Digit Count')
plt.title('Benford\'s Law Distribution')

# Create bars and choose color
plt.bar(y_pos, count, color='pink')

# Limits for the Y axis
plt.ylim(0, int(max(count) * 1.1))

# Display the bar graph
plt.show()
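The code above samples leading digits directly. If whole numbers are wanted instead, one option (not from the original post) is to draw the exponent uniformly and raise 10 to it: numbers spread evenly on a logarithmic scale over whole decades have Benford-distributed leading digits. The function name `benford_like_numbers`, the four-decade range, and the seed are assumptions for this sketch.

```python
import math
import random
from collections import Counter

def benford_like_numbers(n, decades=4):
    """Return n integers in [1, 10**decades) drawn log-uniformly (hypothetical helper)."""
    # A uniform exponent means the numbers are uniform on a log scale
    return [int(10 ** random.uniform(0, decades)) for _ in range(n)]

random.seed(0)  # arbitrary fixed seed so the run is repeatable (an assumption)
numbers = benford_like_numbers(100_000)

# Count leading digits the same way as before: first character of the string
leading = Counter(int(str(n)[0]) for n in numbers)
for digit in range(1, 10):
    share = leading[digit] / len(numbers)
    print(f"{digit}: {share:.3f} (Benford: {math.log10(1 + 1 / digit):.3f})")
```

Unlike random.randint, which spreads numbers evenly on a linear scale, this produces many more numbers starting with 1 than with 9.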