Un exemple de comment obtenir la valeur maximale de deux colonnes ou plus dans un dataframe pandas
Créer une simple dataframe avec pandas
Créons une simple dataframe avec pandas
import pandas as pd
import numpy as np
import random
a = [random.choice(['a','b','c']) for i in range(20)]
c1 = np.random.uniform(0,1, size=20)
c2 = np.random.uniform(0,1, size=20)
c3 = np.random.uniform(0,1, size=20)
data = {'a':a,
'c1':c1,
'c2':c2,
'c3':c3}
df = pd.DataFrame(data)
print(df)
donne par exemple
a c1 c2 c3
0 b 0.480835 0.108703 0.062893
1 b 0.469707 0.441162 0.584776
2 a 0.942992 0.254551 0.750689
3 c 0.634768 0.048194 0.810934
4 b 0.185650 0.551316 0.126457
5 c 0.661850 0.178760 0.321801
6 c 0.739558 0.878052 0.196450
7 a 0.344361 0.267702 0.401820
8 a 0.430181 0.300013 0.674959
9 b 0.412139 0.295223 0.799033
10 a 0.101919 0.810593 0.665577
11 c 0.036506 0.705999 0.183860
12 a 0.832732 0.288746 0.994192
13 b 0.474459 0.574548 0.539348
14 b 0.322733 0.015939 0.142288
15 c 0.108360 0.447548 0.474686
16 c 0.111654 0.048010 0.651970
17 c 0.360656 0.155344 0.854249
18 a 0.404405 0.034052 0.618728
19 b 0.234892 0.052290 0.889191
Trouvez la valeur maximale pour chaque ligne
Pour trouvez la valeur maximale pour chaque ligne entre les colonnes ['c1','c2','c3'] une solution est d'utiliser pandas.DataFrame.max:
df[['c1','c2','c3']].max(axis=1)
donne
0 0.480835
1 0.584776
2 0.942992
3 0.810934
4 0.551316
5 0.661850
6 0.878052
7 0.401820
8 0.674959
9 0.799033
10 0.810593
11 0.705999
12 0.994192
13 0.574548
14 0.322733
15 0.474686
16 0.651970
17 0.854249
18 0.618728
19 0.889191
dtype: float64
Pour ajouter une colonne contenant les valeurs max:
df['max value'] = df[['c1','c2','c3']].max(axis=1)
donne
a c1 c2 c3 max value
0 b 0.480835 0.108703 0.062893 0.480835
1 b 0.469707 0.441162 0.584776 0.584776
2 a 0.942992 0.254551 0.750689 0.942992
3 c 0.634768 0.048194 0.810934 0.810934
4 b 0.185650 0.551316 0.126457 0.551316
5 c 0.661850 0.178760 0.321801 0.661850
6 c 0.739558 0.878052 0.196450 0.878052
7 a 0.344361 0.267702 0.401820 0.401820
8 a 0.430181 0.300013 0.674959 0.674959
9 b 0.412139 0.295223 0.799033 0.799033
10 a 0.101919 0.810593 0.665577 0.810593
11 c 0.036506 0.705999 0.183860 0.705999
12 a 0.832732 0.288746 0.994192 0.994192
13 b 0.474459 0.574548 0.539348 0.574548
14 b 0.322733 0.015939 0.142288 0.322733
15 c 0.108360 0.447548 0.474686 0.474686
16 c 0.111654 0.048010 0.651970 0.651970
17 c 0.360656 0.155344 0.854249 0.854249
18 a 0.404405 0.034052 0.618728 0.618728
19 b 0.234892 0.052290 0.889191 0.889191
Trouver le nom de la colonne contenant la valeur maximale
Pour trouver le nom de la colonne contenant la valeur maximale pandas.DataFrame.idxmax
df[['c1','c2','c3']].idxmax(axis=1)
donne
0 c1
1 c3
2 c1
3 c3
4 c2
5 c1
6 c2
7 c3
8 c3
9 c3
10 c2
11 c2
12 c3
13 c2
14 c1
15 c3
16 c3
17 c3
18 c3
19 c3
dtype: object
Créez une nouvelle colonne pour stocker les résultats:
df['col name'] = df[['c1','c2','c3']].idxmax(axis=1)
donne
a c1 c2 c3 max value col name
0 b 0.480835 0.108703 0.062893 0.480835 c1
1 b 0.469707 0.441162 0.584776 0.584776 c3
2 a 0.942992 0.254551 0.750689 0.942992 c1
3 c 0.634768 0.048194 0.810934 0.810934 c3
4 b 0.185650 0.551316 0.126457 0.551316 c2
5 c 0.661850 0.178760 0.321801 0.661850 c1
6 c 0.739558 0.878052 0.196450 0.878052 c2
7 a 0.344361 0.267702 0.401820 0.401820 c3
8 a 0.430181 0.300013 0.674959 0.674959 c3
9 b 0.412139 0.295223 0.799033 0.799033 c3
10 a 0.101919 0.810593 0.665577 0.810593 c2
11 c 0.036506 0.705999 0.183860 0.705999 c2
12 a 0.832732 0.288746 0.994192 0.994192 c3
13 b 0.474459 0.574548 0.539348 0.574548 c2
14 b 0.322733 0.015939 0.142288 0.322733 c1
15 c 0.108360 0.447548 0.474686 0.474686 c3
16 c 0.111654 0.048010 0.651970 0.651970 c3
17 c 0.360656 0.155344 0.854249 0.854249 c3
18 a 0.404405 0.034052 0.618728 0.618728 c3
19 b 0.234892 0.052290 0.889191 0.889191 c3