Un exemple de comment obtenir la valeur maximale de deux colonnes ou plus dans un dataframe pandas
Créer une simple dataframe avec pandas
Créons une simple dataframe avec pandas
import pandas as pdimport numpy as npimport randoma = [random.choice(['a','b','c']) for i in range(20)]c1 = np.random.uniform(0,1, size=20)c2 = np.random.uniform(0,1, size=20)c3 = np.random.uniform(0,1, size=20)data = {'a':a,'c1':c1,'c2':c2,'c3':c3}df = pd.DataFrame(data)print(df)
donne par exemple
a c1 c2 c30 b 0.480835 0.108703 0.0628931 b 0.469707 0.441162 0.5847762 a 0.942992 0.254551 0.7506893 c 0.634768 0.048194 0.8109344 b 0.185650 0.551316 0.1264575 c 0.661850 0.178760 0.3218016 c 0.739558 0.878052 0.1964507 a 0.344361 0.267702 0.4018208 a 0.430181 0.300013 0.6749599 b 0.412139 0.295223 0.79903310 a 0.101919 0.810593 0.66557711 c 0.036506 0.705999 0.18386012 a 0.832732 0.288746 0.99419213 b 0.474459 0.574548 0.53934814 b 0.322733 0.015939 0.14228815 c 0.108360 0.447548 0.47468616 c 0.111654 0.048010 0.65197017 c 0.360656 0.155344 0.85424918 a 0.404405 0.034052 0.61872819 b 0.234892 0.052290 0.889191
Trouvez la valeur maximale pour chaque ligne
Pour trouvez la valeur maximale pour chaque ligne entre les colonnes ['c1','c2','c3'] une solution est d'utiliser pandas.DataFrame.max:
df[['c1','c2','c3']].max(axis=1)
donne
0 0.4808351 0.5847762 0.9429923 0.8109344 0.5513165 0.6618506 0.8780527 0.4018208 0.6749599 0.79903310 0.81059311 0.70599912 0.99419213 0.57454814 0.32273315 0.47468616 0.65197017 0.85424918 0.61872819 0.889191dtype: float64
Pour ajouter une colonne contenant les valeurs max:
df['max value'] = df[['c1','c2','c3']].max(axis=1)
donne
a c1 c2 c3 max value0 b 0.480835 0.108703 0.062893 0.4808351 b 0.469707 0.441162 0.584776 0.5847762 a 0.942992 0.254551 0.750689 0.9429923 c 0.634768 0.048194 0.810934 0.8109344 b 0.185650 0.551316 0.126457 0.5513165 c 0.661850 0.178760 0.321801 0.6618506 c 0.739558 0.878052 0.196450 0.8780527 a 0.344361 0.267702 0.401820 0.4018208 a 0.430181 0.300013 0.674959 0.6749599 b 0.412139 0.295223 0.799033 0.79903310 a 0.101919 0.810593 0.665577 0.81059311 c 0.036506 0.705999 0.183860 0.70599912 a 0.832732 0.288746 0.994192 0.99419213 b 0.474459 0.574548 0.539348 0.57454814 b 0.322733 0.015939 0.142288 0.32273315 c 0.108360 0.447548 0.474686 0.47468616 c 0.111654 0.048010 0.651970 0.65197017 c 0.360656 0.155344 0.854249 0.85424918 a 0.404405 0.034052 0.618728 0.61872819 b 0.234892 0.052290 0.889191 0.889191
Trouver le nom de la colonne contenant la valeur maximale
Pour trouver le nom de la colonne contenant la valeur maximale pandas.DataFrame.idxmax
df[['c1','c2','c3']].idxmax(axis=1)
donne
0 c11 c32 c13 c34 c25 c16 c27 c38 c39 c310 c211 c212 c313 c214 c115 c316 c317 c318 c319 c3dtype: object
Créez une nouvelle colonne pour stocker les résultats:
df['col name'] = df[['c1','c2','c3']].idxmax(axis=1)
donne
a c1 c2 c3 max value col name0 b 0.480835 0.108703 0.062893 0.480835 c11 b 0.469707 0.441162 0.584776 0.584776 c32 a 0.942992 0.254551 0.750689 0.942992 c13 c 0.634768 0.048194 0.810934 0.810934 c34 b 0.185650 0.551316 0.126457 0.551316 c25 c 0.661850 0.178760 0.321801 0.661850 c16 c 0.739558 0.878052 0.196450 0.878052 c27 a 0.344361 0.267702 0.401820 0.401820 c38 a 0.430181 0.300013 0.674959 0.674959 c39 b 0.412139 0.295223 0.799033 0.799033 c310 a 0.101919 0.810593 0.665577 0.810593 c211 c 0.036506 0.705999 0.183860 0.705999 c212 a 0.832732 0.288746 0.994192 0.994192 c313 b 0.474459 0.574548 0.539348 0.574548 c214 b 0.322733 0.015939 0.142288 0.322733 c115 c 0.108360 0.447548 0.474686 0.474686 c316 c 0.111654 0.048010 0.651970 0.651970 c317 c 0.360656 0.155344 0.854249 0.854249 c318 a 0.404405 0.034052 0.618728 0.618728 c319 b 0.234892 0.052290 0.889191 0.889191 c3
