How to Sort Data Frames in Pandas
The sorting can be based on column labels.
The sort_values method has different parameters as shown in the syntax below.
Syntax
Let us show you examples of using “sort_values” with its parameters below.
An example of sorting results by a column label in ascending order
- In the Python program below, we created a data frame with three columns and five rows of data.
- Then we displayed the data frame in its original form.
- This is followed by using the sort_values method where we only specify the column label to sort the results.
Python program:
import pandas as pd #A list to be used for Data Frame emp_data = [ ['Emp1', "Mike", 5600], ['Emp2', "Michelle", 4750], ['Emp3', "Ben", 6500], ['Emp4', "Shabee", 3600], ['Emp5', "Mina", 3250] ] #Creating data frame df_emp = pd.DataFrame (emp_data, columns = ['ID', 'Name', 'Salary']) #Display data frame before sorting print("DF in Orginal Order") print(df_emp) #Display data frame after sorting print("===================") print("DF After Sorting") print(df_emp.sort_values(by='Name'))
Output:
DF in Orginal Order ID Name Salary 0 Emp1 Mike 5600 1 Emp2 Michelle 4750 2 Emp3 Ben 6500 3 Emp4 Shabee 3600 4 Emp5 Mina 3250 =================== DF After Sorting ID Name Salary 2 Emp3 Ben 6500 1 Emp2 Michelle 4750 0 Emp1 Mike 5600 4 Emp5 Mina 3250 3 Emp4 Shabee 3600
You can see, the results are sorted by Employee names in the data frame.
Sorting data frame in descending order
As it can be seen in the syntax, the default value for sorting order is ascending i.e.
By using the False value, you may get the data sorted in descending order.
See the example below where we used the same data frame and sorted results in descending order.
Program:
import pandas as pd #A list to be used for Data Frame emp_data = [ ['Emp1', "Mike", 5600], ['Emp2', "Michelle", 4750], ['Emp3', "Ben", 6500], ['Emp4', "Shabee", 3600], ['Emp5', "Mina", 3250] ] #Creating data frame df_emp = pd.DataFrame (emp_data, columns = ['ID', 'Name', 'Salary']) #Display data frame before sorting print("DF in Orginal Order") print(df_emp) #Display data frame after ascending=False print("===================") print("DF After Sorting in Descending Order") print(df_emp.sort_values(by='Name', ascending=False))
Output:
The example of sorting by salary column
The example below sorts the results in descending order based on the salary column:
import pandas as pd #A list to be used for Data Frame emp_data = [ ['Emp1', "Mike", 5600], ['Emp2', "Michelle", 4750], ['Emp3', "Ben", 6500], ['Emp4', "Shabee", 3600], ['Emp5', "Mina", 3250] ] #Creating data frame df_emp = pd.DataFrame (emp_data, columns = ['ID', 'Name', 'Salary']) #Sort results by Highest to loweset salary print(df_emp.sort_values(by='Salary', ascending=False))
Result:
ID Name Salary
2 Emp3 Ben 6500
0 Emp1 Mike 5600
1 Emp2 Michelle 4750
3 Emp4 Shabee 3600
4 Emp5 Mina 3250
Sort the result by two columns example
You may also sort the results by providing two or more columns.
The example below sorts the results by “Name” and “Salary” columns in our example data frame:
import pandas as pd #A list to be used for Data Frame emp_data = [ ['Emp1', "Mike", 5600], ['Emp2', "Michelle", 4750], ['Emp3', "Ben", 6500], ['Emp4', "Shabee", 3600], ['Emp5', "Mina", 3250] ] #Creating data frame df_emp = pd.DataFrame (emp_data, columns = ['ID', 'Name', 'Salary']) #Sort results by multiple columns print(df_emp.sort_values(by=['Name', 'Salary'], ascending=False))
Output:
ID Name Salary 3 Emp4 Shabee 3600 4 Emp5 Mina 3250 0 Emp1 Mike 5600 1 Emp2 Michelle 4750 2 Emp3 Ben 6500
Using ignore_index parameter example
By default, the ignore_index= False.
That means the index column numbers remain in place after sorting the results.
In the above example output, you can see 3, 4, 0, 1, 2, and the numbers that are for the original DF.
See the difference in the example below:
import pandas as pd #A list to be used for Data Frame emp_data = [ ['Emp1', "Mike", 5600], ['Emp2', "Michelle", 4750], ['Emp3', "Ben", 6500], ['Emp4', "Shabee", 3600], ['Emp5', "Mina", 3250] ] #Creating data frame df_emp = pd.DataFrame (emp_data, columns = ['ID', 'Name', 'Salary']) print("Sort results with ignore_index=False (default value)") print(df_emp.sort_values(by=['Name', 'Salary'], ignore_index=False)) print("Sort results with ignore_index=True") print(df_emp.sort_values(by=['Name', 'Salary'], ignore_index=True))
Output:
Reference:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html