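Using Python to check whether the body temperatures in a dataset follow a normal distribution

Dataset URL: http://jse.amstat.org/datasets/normtemp.dat.txt

Dataset description: three columns in total: body temperature (in degrees Fahrenheit), gender, and heart rate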

# Code
from scipy import stats as st
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd

# Use a font with CJK glyphs so that Chinese labels are not garbled
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# Read the data (whitespace-separated, no header row)
data = pd.read_csv('http://jse.amstat.org/datasets/normtemp.dat.txt', sep=r'\s+', header=None, names=['temperature', 'Gender', 'Heart rate'])

# Summary statistics
print(data['temperature'].describe())

Output:

count    130.000000
mean      98.249231
std        0.733183
min       96.300000
25%       97.800000
50%       98.300000
75%       98.700000
max      100.800000

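# Four ways to test for normality

# 1. Shapiro-Wilk test
print(st.shapiro(data['temperature']))
# (0.9865769743919373, 0.2331680953502655)
# The second value is the p-value; it is greater than 0.05.

# 2. normaltest (D'Agostino-Pearson test)
print(st.normaltest(data['temperature'], axis=None))
# NormaltestResult(statistic=2.703801433319236, pvalue=0.2587479863488212)
# The second value is the p-value; it is greater than 0.05.

# 3. Kolmogorov-Smirnov test against a normal distribution
#    with the sample mean and sample standard deviation
u = data['temperature'].mean()
std = data['temperature'].std()
print(st.kstest(data['temperature'], 'norm', (u, std)))
# KstestResult(statistic=0.06472685044046644, pvalue=0.645030731743997)
# The second value is the p-value; it is greater than 0.05.
# Because u and std are estimated from the same sample, this p-value is only approximate and tends to be too large.

# 4. Anderson-Darling test
print(st.anderson(data['temperature']))
# AndersonResult(statistic=0.5201038826714353, critical_values=array([0.56 , 0.637, 0.765, 0.892, 1.061]), significance_level=array([15. , 10. ,  5. ,  2.5,  1. ]))
# The significance levels are [15. , 10. ,  5. ,  2.5,  1. ] percent. The statistic is smaller than every critical value,
# so the test cannot reject normality at any of these levels. This is consistent with a normal distribution, but it does not prove one.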
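
The first three tests share one decision rule: compare the p-value with a chosen significance level. Below is a minimal sketch of that rule using the p-values printed above. The helper names `normality_decision` and `summarize` and the default threshold of 0.05 are assumptions made for illustration, not part of scipy.

```python
# p-values copied from the outputs shown above
P_VALUES = {
    "shapiro": 0.2331680953502655,
    "normaltest": 0.2587479863488212,
    "kstest": 0.645030731743997,
}


def normality_decision(pvalue, alpha=0.05):
    """Hypothetical helper: True means normality is rejected at level alpha."""
    return pvalue < alpha


def summarize(p_values, alpha=0.05):
    """Apply the same decision rule to every test in the dictionary."""
    return {name: normality_decision(p, alpha) for name, p in p_values.items()}


if __name__ == "__main__":
    for name, rejected in summarize(P_VALUES).items():
        verdict = "reject normality" if rejected else "cannot reject normality"
        print(f"{name}: {verdict}")
```

None of the three p-values falls below 0.05, so none of these tests rejects normality at that level. A larger p-value does not mean the data are "more normal"; it only means the test found no evidence against normality.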

Notes on the anderson method:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson.html

normal/exponential
15%, 10%, 5%, 2.5%, 1%
logistic
25%, 10%, 5%, 2.5%, 1%, 0.5%
Gumbel
25%, 10%, 5%, 2.5%, 1%

If the returned statistic is larger than these critical values then for the corresponding significance level, the null hypothesis that the data come from the chosen distribution can be rejected.
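
Unlike the other tests, anderson returns no p-value, so the rule quoted above has to be applied level by level. Here is a small sketch that does this with the numbers from the AndersonResult printed earlier; the helper name `anderson_decisions` is hypothetical and the values are hard-coded so that the example runs without downloading the data.

```python
# Values copied from the AndersonResult printed earlier in this post
STATISTIC = 0.5201038826714353
CRITICAL_VALUES = [0.56, 0.637, 0.765, 0.892, 1.061]
SIGNIFICANCE_LEVELS = [15.0, 10.0, 5.0, 2.5, 1.0]  # in percent


def anderson_decisions(statistic, critical_values, significance_levels):
    """Hypothetical helper: one (level, critical value, rejected) tuple per level.

    Normality is rejected at a level when the statistic exceeds
    the critical value that belongs to that level.
    """
    return [
        (level, crit, statistic > crit)
        for level, crit in zip(significance_levels, critical_values)
    ]


if __name__ == "__main__":
    rows = anderson_decisions(STATISTIC, CRITICAL_VALUES, SIGNIFICANCE_LEVELS)
    for level, crit, rejected in rows:
        verdict = "reject" if rejected else "cannot reject"
        print(f"{level}% level: critical value {crit}, {verdict} normality")
```

With the temperature data the statistic stays below all five critical values, so normality is not rejected at any listed level.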

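# Plot the fitted normal density

x = data['temperature']
x = x.sort_values()
loc, scale = st.norm.fit(x)  # maximum-likelihood estimates of mean and standard deviation
plt.plot(x, st.norm.pdf(x, loc, scale), 'b-', label='norm')
plt.legend()
plt.show()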
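
One detail worth noting: st.norm.fit returns maximum-likelihood estimates, so its scale is the standard deviation with divisor n, while pandas `.std()` (used for the kstest call) divides by n - 1. The sketch below shows the size of that difference and what the plotted density evaluates to, using only the numbers from the describe() output. The function names `mle_scale` and `normal_pdf` are made up for this illustration.

```python
import math

# Numbers copied from the describe() output earlier in this post
N = 130
SAMPLE_MEAN = 98.249231
SAMPLE_STD = 0.733183  # pandas .std() uses divisor n - 1


def mle_scale(sample_std, n):
    """Convert an n - 1 standard deviation into the divisor-n (maximum-likelihood) one."""
    return sample_std * math.sqrt((n - 1) / n)


def normal_pdf(x, loc, scale):
    """Density of a normal distribution with mean loc and standard deviation scale."""
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    scale = mle_scale(SAMPLE_STD, N)
    print(f"sample std (n - 1): {SAMPLE_STD:.6f}")
    print(f"MLE scale (n):      {scale:.6f}")
    print(f"density at the mean: {normal_pdf(SAMPLE_MEAN, SAMPLE_MEAN, scale):.4f}")
```

With 130 observations the two estimates differ by well under one percent, so the curve looks the same either way; the distinction matters more for small samples.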
