Examples of how to select one or more rows of data in a pandas DataFrame in Python:
Create a DataFrame with pandas
Consider, for example, the CSV file train.csv (which can be downloaded from Kaggle). The pandas function read_csv() reads the file into a DataFrame, and the shape attribute gives its number of rows and columns:
>>> import pandas as pd
>>> df = pd.read_csv('train.csv')
>>> df.shape
(1460, 81)
Preview the data with the head() method, here showing the first 10 rows:
>>> df.head(10)
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \
0 1 60 RL 65.0 8450 Pave NaN Reg
1 2 20 RL 80.0 9600 Pave NaN Reg
2 3 60 RL 68.0 11250 Pave NaN IR1
3 4 70 RL 60.0 9550 Pave NaN IR1
4 5 60 RL 84.0 14260 Pave NaN IR1
5 6 50 RL 85.0 14115 Pave NaN IR1
6 7 20 RL 75.0 10084 Pave NaN Reg
7 8 60 RL NaN 10382 Pave NaN IR1
8 9 50 RM 51.0 6120 Pave NaN Reg
9 10 190 RL 50.0 7420 Pave NaN Reg
LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal \
0 Lvl AllPub ... 0 NaN NaN NaN 0
1 Lvl AllPub ... 0 NaN NaN NaN 0
2 Lvl AllPub ... 0 NaN NaN NaN 0
3 Lvl AllPub ... 0 NaN NaN NaN 0
4 Lvl AllPub ... 0 NaN NaN NaN 0
5 Lvl AllPub ... 0 NaN MnPrv Shed 700
6 Lvl AllPub ... 0 NaN NaN NaN 0
7 Lvl AllPub ... 0 NaN NaN Shed 350
8 Lvl AllPub ... 0 NaN NaN NaN 0
9 Lvl AllPub ... 0 NaN NaN NaN 0
MoSold YrSold SaleType SaleCondition SalePrice
0 2 2008 WD Normal 208500
1 5 2007 WD Normal 181500
2 9 2008 WD Normal 223500
3 2 2006 WD Abnorml 140000
4 12 2008 WD Normal 250000
5 10 2009 WD Normal 143000
6 8 2007 WD Normal 307000
7 11 2009 WD Normal 200000
8 4 2008 WD Abnorml 129900
9 1 2008 WD Normal 118000
[10 rows x 81 columns]
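To try the selections in this article without downloading train.csv, a small stand-in DataFrame can be built by hand. The sketch below is only an illustration: the name df_small is hypothetical, it keeps just three of the 81 columns, and its values are copied from the first rows of the preview above.

```python
import pandas as pd

# Stand-in for train.csv: values copied from the preview above,
# restricted to three columns and the first eight rows.
df_small = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6, 7, 8],
    "LotArea": [8450, 9600, 11250, 9550, 14260, 14115, 10084, 10382],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000, 307000, 200000],
})

print(df_small.shape)    # (number of rows, number of columns)
print(df_small.head(3))  # first three rows only
```

Like the real file, this frame gets a default integer index starting at 0, so row positions and row labels coincide.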
Select a given row
>>> df.iloc[4,:]
Id 5
MSSubClass 60
MSZoning RL
LotFrontage 84
LotArea 14260
Street Pave
Alley NaN
LotShape IR1
LandContour Lvl
Utilities AllPub
LotConfig FR2
LandSlope Gtl
Neighborhood NoRidge
Condition1 Norm
Condition2 Norm
BldgType 1Fam
HouseStyle 2Story
OverallQual 8
OverallCond 5
YearBuilt 2000
YearRemodAdd 2000
RoofStyle Gable
RoofMatl CompShg
Exterior1st VinylSd
Exterior2nd VinylSd
MasVnrType BrkFace
MasVnrArea 350
ExterQual Gd
ExterCond TA
Foundation PConc
...
BedroomAbvGr 4
KitchenAbvGr 1
KitchenQual Gd
TotRmsAbvGrd 9
Functional Typ
Fireplaces 1
FireplaceQu TA
GarageType Attchd
GarageYrBlt 2000
GarageFinish RFn
GarageCars 3
GarageArea 836
GarageQual TA
GarageCond TA
PavedDrive Y
WoodDeckSF 192
OpenPorchSF 84
EnclosedPorch 0
3SsnPorch 0
ScreenPorch 0
PoolArea 0
PoolQC NaN
Fence NaN
MiscFeature NaN
MiscVal 0
MoSold 12
YrSold 2008
SaleType WD
SaleCondition Normal
SalePrice 250000
Name: 4, dtype: object
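iloc selects by integer position, counting from 0, so position 4 is the fifth row. Selecting a single position returns a Series whose index holds the column names; the dtype is object in the output above because the row mixes numbers and strings. The sketch below uses a hypothetical stand-in frame, df_small, with values taken from the preview, and also shows the label-based loc as a point of comparison.

```python
import pandas as pd

# Hypothetical stand-in for train.csv (values copied from the preview).
df_small = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6],
    "LotArea": [8450, 9600, 11250, 9550, 14260, 14115],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
})

# A single position returns a Series; the trailing ", :" is optional.
row = df_small.iloc[4]
print(type(row).__name__, row["SalePrice"])

# Wrapping the position in a list keeps the result a one-row DataFrame.
one_row_frame = df_small.iloc[[4]]
print(one_row_frame.shape)

# loc selects by label instead: once "Id" is the index,
# the label 5 designates the row found at position 4.
by_id = df_small.set_index("Id")
print(by_id.loc[5, "SalePrice"] == by_id.iloc[4]["SalePrice"])
```

With the default index, labels and positions are the same numbers, which is why df.loc[4] and df.iloc[4] would return the same row on the original file.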
Select several rows
>>> df.iloc[[3,5,7],:]
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \
3 4 70 RL 60.0 9550 Pave NaN IR1
5 6 50 RL 85.0 14115 Pave NaN IR1
7 8 60 RL NaN 10382 Pave NaN IR1
LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal \
3 Lvl AllPub ... 0 NaN NaN NaN 0
5 Lvl AllPub ... 0 NaN MnPrv Shed 700
7 Lvl AllPub ... 0 NaN NaN Shed 350
MoSold YrSold SaleType SaleCondition SalePrice
3 2 2006 WD Abnorml 140000
5 10 2009 WD Normal 143000
7 11 2009 WD Normal 200000
[3 rows x 81 columns]
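Passing a list of positions to iloc returns a DataFrame containing those rows, in the order given by the list. A minimal sketch on a hypothetical stand-in frame (df_small, values copied from the preview):

```python
import pandas as pd

# Hypothetical stand-in for train.csv (values copied from the preview).
df_small = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6, 7, 8],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000, 307000, 200000],
})

positions = [3, 5, 7]
subset = df_small.iloc[positions]   # equivalent to df_small.iloc[positions, :]
print(subset)

# The result follows the order of the list, not the order of the frame.
reordered = df_small.iloc[[7, 3]]
print(reordered.index.tolist())
```

The original index labels (3, 5, 7) are kept in the result; reset_index(drop=True) can renumber them from 0 if needed.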
Select consecutive rows
>>> df.iloc[2:5,:]
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \
2 3 60 RL 68.0 11250 Pave NaN IR1
3 4 70 RL 60.0 9550 Pave NaN IR1
4 5 60 RL 84.0 14260 Pave NaN IR1
LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal \
2 Lvl AllPub ... 0 NaN NaN NaN 0
3 Lvl AllPub ... 0 NaN NaN NaN 0
4 Lvl AllPub ... 0 NaN NaN NaN 0
MoSold YrSold SaleType SaleCondition SalePrice
2 9 2008 WD Normal 223500
3 2 2006 WD Abnorml 140000
4 12 2008 WD Normal 250000
[3 rows x 81 columns]
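Note that the slice 2:5 returned rows 2, 3 and 4: with iloc the end of a slice is excluded, as in ordinary Python slicing. With loc, which works on labels, both ends of the slice are included. The sketch below contrasts the two on a hypothetical stand-in frame (df_small, values copied from the preview).

```python
import pandas as pd

# Hypothetical stand-in for train.csv (values copied from the preview).
df_small = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6, 7, 8],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000, 307000, 200000],
})

# Position-based slice: end position 5 is excluded.
by_position = df_small.iloc[2:5]
print(by_position.index.tolist())

# Label-based slice: end label 5 is included.
by_label = df_small.loc[2:5]
print(by_label.index.tolist())
```

This difference is easy to overlook when the index is the default 0, 1, 2, ... sequence, since the same numbers serve as both positions and labels.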