Introduction
This article shows how to compute, in Python, the usual one-variable descriptive statistics (mean, median, etc.) and how to draw the standard graphical representations. The example used throughout comes from a video course introducing descriptive statistics (see descriptive statistics).
- Download the data file: [attachment:203]
- Download the Python code: [attachment:204]
- Run the code: python DescriptiveStatistics_01.py
Intrinsic description
Mean
np.mean(Taille)
Median
np.median(Taille)
Mode
stats.mode(Taille,axis=0)
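The value returned by stats.mode has changed shape across SciPy releases, so recent versions may not print the pair of arrays shown in this article. As a minimal sketch of an alternative, the mode can be obtained with NumPy alone through np.unique. The `sample` array below is made up for illustration and is not the article's data file.

```python
import numpy as np

# Hypothetical sample (not the article's data.txt)
sample = np.array([160.0, 164.0, 164.0, 164.0, 170.0, 170.0, 182.0])

# Distinct values (sorted) and the number of times each one occurs
values, counts = np.unique(sample, return_counts=True)

# argmax returns the first maximum, so ties resolve to the smallest value
mode_value = values[np.argmax(counts)]
mode_count = counts.max()

print(mode_value, mode_count)
```

Resolving ties toward the smallest value is a choice made in this sketch; a multimodal sample may deserve to have all of its modes reported.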
Maximum and minimum
max(Taille), min(Taille)
Standard deviation (and variance)
np.std(Taille)
np.std(Taille, ddof=1)
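The two calls differ only in the divisor: with the default ddof=0 the sum of squared deviations is divided by n, and with ddof=1 it is divided by n - 1. The sketch below recomputes both by hand to make that explicit; the `sample` values are hypothetical, not the article's data.

```python
import numpy as np

# Hypothetical sample (not the article's data.txt)
sample = np.array([150.0, 160.0, 165.0, 170.0, 190.0])
n = sample.size

# Sum of squared deviations from the mean
sum_sq_dev = ((sample - sample.mean()) ** 2).sum()

std_population = np.sqrt(sum_sq_dev / n)        # same as np.std(sample)
std_sample = np.sqrt(sum_sq_dev / (n - 1))      # same as np.std(sample, ddof=1)
variance_population = sum_sq_dev / n            # same as np.var(sample)

print(std_population, np.std(sample))
print(std_sample, np.std(sample, ddof=1))
print(variance_population, np.var(sample))
```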
Quartiles
print('First quartile: ', stats.scoreatpercentile(Taille, 25))
print('Second quartile: ', stats.scoreatpercentile(Taille, 50))
print('Third quartile: ', stats.scoreatpercentile(Taille, 75))
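As a sketch of an alternative that avoids SciPy, np.percentile accepts several percentiles at once and, with its default settings, interpolates linearly between neighbouring data points. The `sample` values are hypothetical, chosen so that the interpolation is visible.

```python
import numpy as np

# Hypothetical sample (not the article's data.txt)
sample = np.array([150.0, 160.0, 170.0, 190.0])

# One call returns the three quartiles
q1, q2, q3 = np.percentile(sample, [25, 50, 75])

print('First quartile: ', q1)
print('Second quartile: ', q2)
print('Third quartile: ', q3)

# The second quartile is the median
print(q2 == np.median(sample))
```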
Example
- Mean: 169.7
- Standard deviation: 9.95540054443
- Unbiased standard deviation (ddof=1): 10.2140254449
- Median: 167.5
- Maximum and minimum: 190.0, 150.0
- Range: 40.0
- Mode: (array([ 164.]), array([ 3.]))
- First quartile: 163.75
- Second quartile: 167.5
- Third quartile: 175.5
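The two standard deviations in this output can be cross-checked against each other. Since they differ only by the divisor (n versus n - 1), the square of their ratio equals n / (n - 1), which can be solved for n. The short sketch below does this with the printed values; it is a consistency check, not part of the original script, and it suggests a sample of about 20 observations.

```python
# Values copied from the example output
std_population = 9.95540054443   # np.std(Taille)
std_sample = 10.2140254449       # np.std(Taille, ddof=1)

# (std_sample / std_population)^2 = n / (n - 1)
ratio_sq = (std_sample / std_population) ** 2

# Solve n / (n - 1) = ratio_sq for n
n_estimate = ratio_sq / (ratio_sq - 1)

print(round(n_estimate))
```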
Graphical representations
Histogram
fig = plt.figure()
plt.xticks(x_pos, LabelList, rotation=45)
plt.ylabel(r'Absolute Frequency $n_i$')
# align='edge' makes each bar start at its class boundary, matching the tick labels
bar1 = plt.bar(X, AbsoluteFrequency, width=1.0, bottom=0, align='edge', color='Green', alpha=0.65, label='Legend')
plt.savefig('Histogram.png', bbox_inches='tight')
plt.show()
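The plot relies on per-class counts (AbsoluteFrequency) that the full script builds with an explicit loop. As a sketch of a shorter route, np.histogram splits the range into equal-width classes and returns both the counts and the class boundaries. The `heights` array and the `nb_class` name are hypothetical stand-ins, not the article's data.

```python
import numpy as np

# Hypothetical sample (not the article's data.txt)
heights = np.array([150.0, 158.0, 163.0, 164.0, 164.0, 171.0, 176.0, 190.0])
nb_class = 4

# Equal-width classes between min and max; the last class includes the maximum
counts, edges = np.histogram(heights, bins=nb_class)

print(counts)   # number of observations per class
print(edges)    # nb_class + 1 class boundaries
```

The returned `edges` can serve as tick labels in place of a hand-built label list.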
Cumulative distribution function
fig = plt.figure()
for i in np.arange(NbClass):
    plt.plot([CumulativeFrequency_xStart[i], CumulativeFrequency_xEnd[i]],
             [CumulativeFrequency[i], CumulativeFrequency[i]], 'k--')
    if i < NbClass - 1:
        plt.scatter(CumulativeFrequency_xEnd[i], CumulativeFrequency[i],
                    s=80, facecolors='none', edgecolors='r')
    if i == 0:
        plt.plot([CumulativeFrequency_xStart[i], CumulativeFrequency_xEnd[i]],
                 [0, CumulativeFrequency[i]], 'r--')
    else:
        plt.plot([CumulativeFrequency_xStart[i], CumulativeFrequency_xEnd[i]],
                 [CumulativeFrequency[i-1], CumulativeFrequency[i]], 'r--')
plt.xlim(0, NbClass)
plt.ylim(0, 1)
plt.xticks(x_pos, LabelList, rotation=45)
plt.title("Cumulative Distribution Function")
plt.savefig('CumulativeDistributionFunction.png', bbox_inches='tight')
plt.show()
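The CumulativeFrequency array used in this plot is the running sum of the relative frequencies. A minimal sketch of that step with np.cumsum follows; the class counts are hypothetical, not taken from the article's data.

```python
import numpy as np

# Hypothetical class counts (not the article's data)
absolute_frequency = np.array([2, 3, 2, 1])

# Share of the observations in each class
relative_frequency = absolute_frequency / absolute_frequency.sum()

# Running total: share of observations up to and including each class
cumulative_frequency = np.cumsum(relative_frequency)

print(cumulative_frequency)
```

The last entry is always 1, which is why the plot fixes the y-axis to the interval from 0 to 1.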
Box plot
fig = plt.figure()
plt.boxplot(Taille)
# boxplot draws the box at x = 1 and resets the ticks, so the label is set afterwards
plt.xticks([1], ['Taille'])
plt.savefig('BoxPlot.png', bbox_inches='tight')
plt.show()
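With matplotlib's default setting (whis=1.5), the whiskers reach the most extreme data points lying within 1.5 times the interquartile range of the box, and anything beyond is drawn as an outlier. The sketch below applies that rule to the quartiles, minimum and maximum from the example output to check whether outliers should be expected; the variable names are illustrative.

```python
# Values copied from the example output
q1, q3 = 163.75, 175.5
minimum, maximum = 150.0, 190.0

# Interquartile range and the limits beyond which points count as outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# If both extremes lie inside the fences, every point does
has_outliers = minimum < lower_fence or maximum > upper_fence

print(iqr, lower_fence, upper_fence, has_outliers)
```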
Python code
Source code:

from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
import math

# ---------- Read Data ----------

# data.txt has one header row and two columns: Taille (height) and Poids (weight)
Taille, Poids = np.loadtxt("data.txt", unpack=True, skiprows=1)
Taille = np.sort(Taille)

# ---------- Print Descriptive statistics: Continuous Case ----------

print(Taille)
print('Taille Dim: ', Taille.shape)
print('mean', np.mean(Taille))
print('std', np.std(Taille))
print('std (unbiased): ', np.std(Taille, ddof=1))
print('Median: ', np.median(Taille))
print('Max and Min Value: ', max(Taille), min(Taille))
print('Range: ', max(Taille) - min(Taille))
print('Mode: ', stats.mode(Taille, axis=0))
print('First quartile: ', stats.scoreatpercentile(Taille, 25))
print('Second quartile: ', stats.scoreatpercentile(Taille, 50))
print('Third quartile: ', stats.scoreatpercentile(Taille, 75))

# ---------- Discrete Case ----------

NbData = Taille.shape[0]
NbClass = 4  # int( math.log(NbData,2) ) + 1
Range = max(Taille) - min(Taille)
ClassRange = float(Range) / NbClass

print('NbData: ', NbData)
print('NbClass: ', NbClass)
print('ClassRange: ', ClassRange)

X = np.arange(NbClass)

# Count the observations falling in each class
AbsoluteFrequency = np.zeros(NbClass)
for i in np.arange(NbData):
    # The maximum lands exactly on the upper boundary, so it is clamped into the last class
    c = min(int((Taille[i] - min(Taille)) / ClassRange), NbClass - 1)
    AbsoluteFrequency[c] = AbsoluteFrequency[c] + 1

# Class boundaries, used as tick labels
ClassLabel = []
j = round(min(Taille), 2)
for i in np.arange(NbClass + 1):
    ClassLabel.append(j)
    j = round(j + ClassRange, 2)

LabelList = ClassLabel
x_pos = np.arange(len(LabelList))

# ---------- Plot Absolute Frequency Histogram ----------

fig = plt.figure()
plt.xticks(x_pos, LabelList, rotation=45)
plt.ylabel(r'Absolute Frequency $n_i$')
# align='edge' makes each bar start at its class boundary, matching the tick labels
bar1 = plt.bar(X, AbsoluteFrequency,
               width=1.0, bottom=0, align='edge', color='Green', alpha=0.65, label='Legend')
plt.savefig('Histogram.png', bbox_inches='tight')
plt.show()

RelativeFrequency = AbsoluteFrequency / NbData

# ---------- Plot Cumulative distribution function ----------

CumulativeFrequency = np.zeros(NbClass)
CumulativeFrequency_xStart = np.zeros(NbClass)
CumulativeFrequency_xEnd = np.zeros(NbClass)

j = 0
k = 0
for i in np.arange(NbClass):
    CumulativeFrequency[i] = j + RelativeFrequency[i]
    j = j + RelativeFrequency[i]
    CumulativeFrequency_xStart[i] = k
    CumulativeFrequency_xEnd[i] = k + 1
    k += 1

fig = plt.figure()
for i in np.arange(NbClass):
    plt.plot([CumulativeFrequency_xStart[i], CumulativeFrequency_xEnd[i]],
             [CumulativeFrequency[i], CumulativeFrequency[i]], 'k--')
    if i < NbClass - 1:
        plt.scatter(CumulativeFrequency_xEnd[i], CumulativeFrequency[i],
                    s=80, facecolors='none', edgecolors='r')
    if i == 0:
        plt.plot([CumulativeFrequency_xStart[i], CumulativeFrequency_xEnd[i]],
                 [0, CumulativeFrequency[i]], 'r--')
    else:
        plt.plot([CumulativeFrequency_xStart[i], CumulativeFrequency_xEnd[i]],
                 [CumulativeFrequency[i-1], CumulativeFrequency[i]], 'r--')
plt.xlim(0, NbClass)
plt.ylim(0, 1)
plt.xticks(x_pos, LabelList, rotation=45)
plt.title("Cumulative Distribution Function")
plt.savefig('CumulativeDistributionFunction.png', bbox_inches='tight')
plt.show()

# ---------- Plot Box Plot ----------

fig = plt.figure()
plt.boxplot(Taille)
# boxplot draws the box at x = 1 and resets the ticks, so the label is set afterwards
plt.xticks([1], ['Taille'])
plt.savefig('BoxPlot.png', bbox_inches='tight')
plt.show()
References
Non-exhaustive list of the web pages consulted while writing this article:
| Main links | Description |
|---|---|
| How to do a scatter plot with empty circles in Python? | External link (Stack Overflow), matplotlib |
| Inconsistent standard deviation and variance implementation in scipy vs scipy stats | External link (forum) |
| Computing a standard deviation with numpy? | External link (numpy) |
| Computing a mean with numpy? | External link (numpy) |
| Find the most frequent number in a numpy vector | External link (Stack Overflow question) |
| Most efficient way to find mode in numpy array? | External link (Stack Overflow question) |


