Univariate Descriptive Statistics (Python)


Introduction

This article shows how to use Python to compute the main univariate descriptive statistics (mean, median, etc.) and to draw the usual charts. The example comes from a video course introducing descriptive statistics (see the descriptive statistics page).

  • Download the data file: [attachment:203]
  • Download the Python code: [attachment:204]
  • Run the code (requires NumPy, SciPy and Matplotlib): python DescriptiveStatistics_01.py

Numerical description

Mean

np.mean(Taille)
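The mean is the sum of the values divided by their count, x̄ = (1/n) Σ xᵢ. A minimal sketch comparing that definition with np.mean; the heights array is hypothetical illustration data, not the article's data file:

```python
import numpy as np

# Hypothetical heights in cm, for illustration only (not the article's data file)
heights = np.array([160.0, 165.0, 170.0, 175.0, 180.0])

# Arithmetic mean from its definition: sum of the values divided by their count
mean_by_hand = heights.sum() / heights.size
mean_numpy = np.mean(heights)

print(mean_by_hand, mean_numpy)  # 170.0 170.0
```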

Median

np.median(Taille)
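The median is the middle value of the sorted data, or the average of the two middle values when the number of values is even. The helper median_by_hand below is a hypothetical illustration of that rule, checked against np.median on made-up samples:

```python
import numpy as np

def median_by_hand(values):
    """Median from its definition: middle value of the sorted data,
    or the average of the two middle values when n is even."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    mid = n // 2
    if n % 2 == 1:
        return x[mid]
    return (x[mid - 1] + x[mid]) / 2.0

# Hypothetical samples, for illustration only
odd = [171.0, 160.0, 165.0]           # sorted: 160, 165, 171 -> median 165
even = [171.0, 160.0, 165.0, 180.0]   # sorted: 160, 165, 171, 180 -> median 168

print(median_by_hand(odd), np.median(odd))
print(median_by_hand(even), np.median(even))
```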

Mode

stats.mode(Taille,axis=0)
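scipy.stats.mode returns the most frequent value together with its count; the exact return format depends on the SciPy version (older releases return arrays such as (array([ 164.]), array([ 3.])), recent ones return scalars). A NumPy-only alternative, sketched here on hypothetical data, counts each distinct value with np.unique:

```python
import numpy as np

# Hypothetical heights in cm, for illustration only
heights = np.array([164.0, 170.0, 164.0, 158.0, 170.0, 164.0])

# np.unique returns the sorted distinct values and how many times each occurs
values, counts = np.unique(heights, return_counts=True)

# argmax picks the first maximum; since values are sorted, ties resolve to the
# smallest value, which is also what scipy.stats.mode documents for ties
mode_value = values[np.argmax(counts)]
mode_count = counts.max()

print(mode_value, mode_count)  # 164.0 3
```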

Maximum and minimum

max(Taille), min(Taille)
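The range is the difference between the maximum and the minimum. The sketch below, on hypothetical values, uses the ndarray methods .max() and .min(), which run in compiled code whereas the built-in max and min iterate element by element in Python, and cross-checks the range with np.ptp ("peak to peak"):

```python
import numpy as np

heights = np.array([172.0, 150.0, 190.0, 168.0])  # hypothetical values in cm

largest = heights.max()
smallest = heights.min()
# The range is the spread between the two extreme values
value_range = largest - smallest

print(largest, smallest, value_range, np.ptp(heights))  # 190.0 150.0 40.0 40.0
```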

Standard deviation (and variance)

np.std(Taille)          # divides by n (ddof=0, NumPy's default)
np.std(Taille, ddof=1)  # divides by n - 1 (sample estimate)
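Both calls sum the squared deviations from the mean; they differ only in the divisor, n with ddof=0 and n - 1 with ddof=1. np.var follows the same convention, and the standard deviation is the square root of the variance. A short check on hypothetical values:

```python
import numpy as np

heights = np.array([160.0, 165.0, 170.0, 175.0, 180.0])  # hypothetical values
n = heights.size
deviations = heights - heights.mean()

# Population variance: sum of squared deviations divided by n (ddof=0)
var_pop = np.sum(deviations ** 2) / n
# Sample variance: sum of squared deviations divided by n - 1 (ddof=1)
var_sample = np.sum(deviations ** 2) / (n - 1)

print(var_pop, np.var(heights))             # 50.0 50.0
print(var_sample, np.var(heights, ddof=1))  # 62.5 62.5

# The standard deviation is the square root of the variance
print(np.sqrt(var_pop), np.std(heights))
```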

Quartiles

print('First quartile: ', stats.scoreatpercentile(Taille, 25))
print('Second quartile: ', stats.scoreatpercentile(Taille, 50))
print('Third quartile: ', stats.scoreatpercentile(Taille, 75))
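The quartiles can also be obtained with np.percentile. scipy.stats.scoreatpercentile (default interpolation 'fraction') and np.percentile (default linear method) both interpolate linearly between neighbouring sorted values, so they should agree on data like this. A sketch on hypothetical data, which also computes the interquartile range:

```python
import numpy as np

heights = np.array([150.0, 160.0, 165.0, 170.0, 190.0])  # hypothetical values

# First, second and third quartiles with NumPy's default linear interpolation
q1, q2, q3 = np.percentile(heights, [25, 50, 75])
# Interquartile range: the height of the box in a box plot
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 160.0 165.0 170.0 10.0
```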

Example (output on the provided data file)

  • Mean: 169.7
  • Standard deviation: 9.95540054443
  • Standard deviation with ddof=1 (square root of the unbiased variance): 10.2140254449
  • Median: 167.5
  • Maximum and minimum: 190.0, 150.0
  • Range: 40.0
  • Mode: (array([ 164.]), array([ 3.])), i.e. the value 164.0, observed 3 times
  • First quartile: 163.75
  • Second quartile: 167.5
  • Third quartile: 175.5
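The two standard deviations in this output differ only in their divisor, so s = σ·sqrt(n / (n - 1)). Solving for n from the printed values suggests that the data file holds 20 heights; this count is inferred from the numbers, not stated in the article:

```python
import math

# Values copied from the example output
std_pop = 9.95540054443      # np.std(Taille), ddof=0
std_sample = 10.2140254449   # np.std(Taille, ddof=1)

# s = sigma * sqrt(n / (n - 1))  =>  (sigma / s)**2 = (n - 1) / n
# =>  n = 1 / (1 - (sigma / s)**2)
ratio_sq = (std_pop / std_sample) ** 2
n_estimated = 1.0 / (1.0 - ratio_sq)
print(round(n_estimated))  # 20

# Converting in the other direction with n = 20 gives back the ddof=1 value
print(std_pop * math.sqrt(20 / 19))
```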

Graphical representations

Histogram

fig = plt.figure()
plt.xticks(x_pos, LabelList, rotation=45)
plt.ylabel(r'Absolute Frequency $n_i$')
# align='edge' places each bar between two class boundaries (the ticks at x_pos)
bar1 = plt.bar(X, AbsoluteFrequency, width=1.0, bottom=0, align='edge',
               color='Green', alpha=0.65, label='Legend')
plt.savefig('Histogram.png', bbox_inches='tight')
plt.show()
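The script builds the class counts by hand. As a cross-check, np.histogram computes counts for equal-width classes between the minimum and the maximum, with the last class closed on the right so that the maximum is counted; plt.hist(heights, bins=4) would draw the same counts directly. A sketch on hypothetical data:

```python
import numpy as np

# Hypothetical heights in cm, for illustration only
heights = np.array([150.0, 158.0, 163.0, 164.0, 171.0, 176.0, 190.0])

# Four equal-width classes from min to max, as in the article's script
counts, edges = np.histogram(heights, bins=4)

print(counts)  # [2 2 2 1]
print(edges)   # [150. 160. 170. 180. 190.]
```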

Cumulative distribution function

fig = plt.figure()
for i in np.arange(NbClass):
    plt.plot([CumulativeFrequency_xStart[i],CumulativeFrequency_xEnd[i]], \
    [CumulativeFrequency[i],CumulativeFrequency[i]], 'k--')
    if i < NbClass - 1:
        plt.scatter(CumulativeFrequency_xEnd[i], CumulativeFrequency[i], \
        s=80, facecolors='none', edgecolors='r')
    if i == 0:
        plt.plot([CumulativeFrequency_xStart[i],CumulativeFrequency_xEnd[i]], \
        [0,CumulativeFrequency[i]], 'r--')
    else:
        plt.plot([CumulativeFrequency_xStart[i],CumulativeFrequency_xEnd[i]], \
        [CumulativeFrequency[i-1],CumulativeFrequency[i]], 'r--')    
plt.xlim(0,NbClass)
plt.ylim(0,1)
plt.xticks(x_pos, LabelList,rotation=45)
plt.title("Cumulative Distribution Function")
plt.savefig('CumulativeDistributionFunction.png', bbox_inches='tight')
plt.show()
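This plot is built from cumulative relative frequencies per class. For comparison, the empirical distribution function F(t), the share of observations less than or equal to t, can be evaluated directly from the sorted data. A sketch on hypothetical data; empirical_cdf is a helper name introduced for illustration:

```python
import numpy as np

def empirical_cdf(sample, t):
    """Share of observations less than or equal to t (empirical CDF)."""
    x = np.sort(np.asarray(sample, dtype=float))
    # With side="right", searchsorted returns how many sorted values are <= t
    return np.searchsorted(x, t, side="right") / x.size

# Hypothetical heights in cm, for illustration only
heights = [150.0, 158.0, 163.0, 164.0, 164.0, 171.0, 176.0, 190.0]

print(empirical_cdf(heights, 164.0))  # 0.625

# Class-based version, as in the article: cumulative sum of relative frequencies
counts, edges = np.histogram(heights, bins=4)
cumulative = np.cumsum(counts) / len(heights)
print(cumulative)
```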

Box plot

fig = plt.figure()
plt.boxplot(Taille)
plt.xticks([1], ['Taille'])  # boxplot draws the first box at x = 1
plt.savefig('BoxPlot.png', bbox_inches='tight')
plt.show()
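A box plot summarises the quartiles and whiskers. By default, matplotlib's boxplot draws the whiskers to the most extreme data points lying within 1.5 × IQR of the box and marks the points beyond as outliers. The hypothetical helper below computes these quantities with np.percentile, as a sketch for reading a box plot numerically; matplotlib's own computation may differ in details:

```python
import numpy as np

def box_plot_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers, Tukey style: whiskers reach the
    most extreme data points within whis * IQR of the box."""
    x = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_limit = q1 - whis * iqr
    high_limit = q3 + whis * iqr
    inside = x[(x >= low_limit) & (x <= high_limit)]
    outliers = x[(x < low_limit) | (x > high_limit)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": outliers,
    }

# Hypothetical heights in cm, with one unusually large value
heights = [150.0, 160.0, 162.0, 165.0, 168.0, 170.0, 172.0, 210.0]
summary = box_plot_stats(heights)
print(summary)
```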

Python code

Source code (it produces three figures: Histogram.png, CumulativeDistributionFunction.png and BoxPlot.png):

from scipy import stats

import numpy as np
import matplotlib.pyplot as plt
import math

# '---------- Read Data ----------'

Taille, Poids = np.loadtxt("data.txt", unpack=True, skiprows=1)

Taille = np.sort(Taille)

# '---------- Print Descriptive statistics: Continuous Case ----------'

print(Taille)
print('Taille Dim: ', Taille.shape)
print('mean', np.mean(Taille))
print('std', np.std(Taille))
print('std (unbiased): ', np.std(Taille, ddof=1))
print('Median: ', np.median(Taille))
print('Max and Min Value: ', max(Taille), min(Taille))
print('Range: ', max(Taille) - min(Taille))
print('Mode: ', stats.mode(Taille, axis=0))
print('First quartile: ', stats.scoreatpercentile(Taille, 25))
print('Second quartile: ', stats.scoreatpercentile(Taille, 50))
print('Third quartile: ', stats.scoreatpercentile(Taille, 75))

# '---------- Discrete Case ----------'

NbData = Taille.shape[0]
NbClass = 4  # fixed number of classes; Sturges' rule would be int(math.log(NbData, 2)) + 1
Range = max(Taille) - min(Taille)
ClassRange = float( Range ) / NbClass

print('NbData: ', NbData)
print('NbClass: ', NbClass)
print('ClassRange: ', ClassRange)

X = np.arange(NbClass)
AbsoluteFrequency = np.zeros(NbClass)
for i in np.arange(NbData):
    # Class index of the value; the maximum (and any repeat of it) would get
    # index NbClass, so it is clamped into the last class
    c = min(int((Taille[i] - min(Taille)) / ClassRange), NbClass - 1)
    AbsoluteFrequency[c] = AbsoluteFrequency[c] + 1

ClassLabel = []
j = round(min(Taille),2)
for i in np.arange(NbClass+1):
    ClassLabel.append(j)
    j = round(j + ClassRange,2)
LabelList = (ClassLabel)
x_pos = np.arange(len(LabelList))

# '---------- Plot Absolute Frequency Histogram ----------'

fig = plt.figure()
plt.xticks(x_pos, LabelList,rotation=45)
plt.ylabel(r'Absolute Frequency $n_i$')
# align='edge' places each bar between two class boundaries (the ticks at x_pos)
bar1 = plt.bar(X, AbsoluteFrequency, width=1.0, bottom=0, align='edge',
               color='Green', alpha=0.65, label='Legend')
plt.savefig('Histogram.png', bbox_inches='tight')
plt.show()

RelativeFrequency = AbsoluteFrequency / NbData

# '---------- Plot Cumulative distribution function ----------'

CumulativeFrequency = np.zeros(NbClass)
CumulativeFrequency_xStart = np.zeros(NbClass)
CumulativeFrequency_xEnd = np.zeros(NbClass)
j = 0 
k = 0
for i in np.arange(NbClass):
    CumulativeFrequency[i] = j + RelativeFrequency[i]
    j = j + RelativeFrequency[i]
    CumulativeFrequency_xStart[i] = k
    CumulativeFrequency_xEnd[i] = k + 1
    k += 1

fig = plt.figure()
for i in np.arange(NbClass):
    plt.plot([CumulativeFrequency_xStart[i],CumulativeFrequency_xEnd[i]], \
    [CumulativeFrequency[i],CumulativeFrequency[i]], 'k--')
    if i < NbClass - 1:
        plt.scatter(CumulativeFrequency_xEnd[i], CumulativeFrequency[i], \
        s=80, facecolors='none', edgecolors='r')
    if i == 0:
        plt.plot([CumulativeFrequency_xStart[i],CumulativeFrequency_xEnd[i]], \
        [0,CumulativeFrequency[i]], 'r--')
    else:
        plt.plot([CumulativeFrequency_xStart[i],CumulativeFrequency_xEnd[i]], \
        [CumulativeFrequency[i-1],CumulativeFrequency[i]], 'r--')    
plt.xlim(0,NbClass)
plt.ylim(0,1)
plt.xticks(x_pos, LabelList,rotation=45)
plt.title("Cumulative Distribution Function")
plt.savefig('CumulativeDistributionFunction.png', bbox_inches='tight')
plt.show()

# '---------- Plot Box Plot ----------'

fig = plt.figure()
plt.boxplot(Taille)
plt.xticks([1], ['Taille'])  # boxplot draws the first box at x = 1
plt.savefig('BoxPlot.png', bbox_inches='tight')
plt.show()
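The script fixes NbClass = 4 and keeps Sturges' rule in a comment. The sketch below, with hypothetical helper names and data, evaluates that rule as written in the comment and reuses the script's class index formula int((x - min) / ClassRange), clamping the maximum into the last class, then compares the counts with np.histogram. Values lying exactly on a class boundary could in principle be classified differently because of floating-point rounding:

```python
import math
import numpy as np

def sturges_classes(n):
    """Number of classes as in the script's commented-out rule: int(log2(n)) + 1."""
    return int(math.log(n, 2)) + 1

def class_counts(values, nb_class):
    """Count values per equal-width class with the script's index formula;
    the maximum (index nb_class) is clamped into the last class."""
    x = np.asarray(values, dtype=float)
    width = (x.max() - x.min()) / nb_class
    idx = ((x - x.min()) / width).astype(int)
    idx = np.minimum(idx, nb_class - 1)
    return np.bincount(idx, minlength=nb_class)

# Hypothetical heights in cm, with the maximum appearing twice
heights = [150.0, 158.0, 163.0, 164.0, 171.0, 176.0, 190.0, 190.0]

print(sturges_classes(20))               # 5
print(class_counts(heights, 4))          # [2 2 2 2]
print(np.histogram(heights, bins=4)[0])  # [2 2 2 2]
```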
