A pivot table is a way of summarizing data in a DataFrame for a particular purpose. It makes heavy use of this aggregation function we've been talking about. A pivot table is in itself a DataFrame, where the rows represent one variable that you're interested in, the columns another, and then the cells some aggregate value. A pivot table also tends to include marginal values as well, which are sum for each column and row. This allows you to be able to see the relationship between two variables at just a glance. Let's take a look at pivot tables in Pandas. So we'll import Pandas as pd and import NumPy as np. So here we have the Times Higher Education World University Ranking dataset, one of the most influential dataset for universities. So let's import this dataset and see what it looks like. So DataFrame is pd.read_csv and we can find this in datasets/cwurData.csv. Let's look at the head of that. So here we can see that each institutions rank, country, quality of education and other metrics, and overall score are shown. Let's say we wanted to create a new column called Rank_Level, where institutions with world rankings 1-100 are categorized as first tier, and those with world rankings 101 to 200 are second tier, and ranking 201 to 300 are third tier. Then after 301, we'll just bucket those as other top universities. Now, you actually already have enough knowledge to do this. So why don't you pause the video and give it a try? So here's my solution. I'm going to create a function called create_category which will operate on the first column in the data frame, world_rank. So I'll def create_category with some and it'll take some ranking. So since the rank is just an integer, I'll just do a bunch of if/elif statements. So if it's greater than or equal to one or less than a 100, I will make it a first tier. For greater than or equal to 101 but less than 200, we'll say it's a second top tier, and we'll do our third tier here as well, and then the default will just be return other top university. So now we can apply this to a single column of data to create a new series. So df sub Rank_Level, remember this creates a new column, equals df sub world_rank.apply. Then we're going to pass in a Lambda function. The Lambda function will just call create_category on our particular row. Let's take a look at the results, so df.head. So a pivot table allows us to pivot out one of these columns into new column headers and compare it against another column as row indices. Let's say we want to compare the rank level versus country of the university. So we want to compare it in terms of overall score. So to do this, we tell Pandas we want the values to be score, and the index to be the country, and the columns to be the rank levels. So there's three things at play here. Then we specify the aggregation function. Here we use the NumPy mean to get an average rating for universities in that country. So df. pivot_table, it's top-level function right on the DataFrame. We say the values are going to be the score, the index is going to be country, and the columns we're going to use rank level. Then we pass in our actual aggregation function which is the NumPy mean function. Let's look at the head of this. So we can see a hierarchical DataFrame, where the index or rows are by country and the columns have two levels. The top level indicating that the mean value is being used and the second level being our ranks. In this example, we only have one variable, the mean, that we're looking at. So we don't really need a hierarchical index. We notice that there are some Not a Number values. For example, the first row for Argentina. The NaN values indicate that Argentina only has observations in the other top university category. Now, pivot tables aren't limited to one function that you might want to apply. You can pass a named parameter aggfunk, which is a list of the different functions to apply, and pandas will provide you with the result using hierarchical column names. Let's try that same query, but passing the max function too. So I'm just going to do df. pivot_table, values again, we're going to call that score, index is country, and columns is Rank_Level. But now we're going to pass in aggfunc is a list and we're going to pass in two values, np.mean which is a definition of a function. So remember were not invoking the function here, but passing a reference to it and np.max. So both of these in our list, let's look at the head of that. So now we see that we have both the mean and the max. As mentioned earlier, we can also summarize the values within a given top-level group. For instance, if we wanted to see an overall average for the country with the mean and we wanted to see the max of the max, we can indicate that we want Pandas to do this by providing marginal values. So our function looks basically the same. We pass in aggfunc with our two values, but then we just add an extra parameter to the end called margins equals true. Let's take a look at what that looks like. So a pivot table is just a multi-level dataframe, and we can access series or cells in that dataframe in a similar way as a regular dataframe. So let's create a new dataframe from our previous example. So I'm just going to take the previous example and assign that to a new variable called the new_df. Then let's take a look at the index and see what that looks like. So I'll just print out the index of this dataframe, and let's take a look at the columns and just print out the column values. So we can see that the columns are hierarchical. The top-level column indices have two categories, mean and max. The lower level column indices have four categories which are the four rank levels. So how would we query this, if we wanted to get the average scores of the first top tier university levels in each country? We would just need to make to dataframe projections. The first for the mean and the second for the top tier. So for instance, we could do new_df sub mean. So this is projecting just the mean that we want. Then from that, we get another dataframe and we just project the First Top Tier University column. Let's look at the head of that. So we can see that the output here is a series object which we can confirm by printing the type. Remember that when you project a single column of values, which is what we're doing here after we make the two projections, out of the DataFrame, you get a series object. So let's just look at the type of this. We see it's a panda core. series. series. So what if we wanted to find the country that has the maximum average score on First Tier Top University level? For this, we can use the idxmax function. So again, I'll take our new df, project mean, and then project our First Tier Top University, and on the series object, we'll just call idxmax. So now, the idxmax function isn't actually a special value for pivot tables. It's a built-in function to the series object. We don't have time to go over all the pandas functions and attributes in this course. I want to really encourage you to explore the API to learn more deeply what is available to you. If you wanted to achieve a different shape of your pivot table, you can do so with the stack and unstack functions. Stacking is pivoting the lowest column index to become the innermost row index, and unstacking is just the inverse of stacking, pivoting the innermost row index to become the lowermost column index. An example will help make this a little bit more clear. So let's look at our pivot table first to refresh what it looks like. So new_df.head. That's our pivot table. So now let's try stacking, and this should move the lowermost columns. So the tiers of the university rankings to the inner most row. So we'll just create a new DataFrame and new df.stack and let's look at the head of that. So here we can see that column just transposed essentially into the rows. In the original pivot table, the rank levels are the lowermost column. After stacking, rank levels become the innermost index, appearing to the right just after the country. Now let's try unstacking this. So new_df.unstack.head. So that seems to restore our DataFrame to its original shape. So what do you think would happen if we unstack twice in a row? Let's try it, new_df.unstack.unstack.head. So we actually end up unstacking all the way to just a single column. So we get a series object being returned. This column is just a value, the meaning of which is denoted by the hierarchical index of operation, rank, and country. So that's pivot tables. This has been a pretty short description, but they're incredibly useful in dealing with numeric data, especially if you're trying to summarize the data in some form. You'll regularly be creating new pivot tables on different slices of data, whether you're exploring the data yourself or preparing that data for others to report on. Of course, you can pass any functions you want to the aggregate function, including those that you define yourself.