Recently, a colleague of mine asked me how one might go about taking a dataset that is in wide format and converting it into long format so that you could then perform some groupby operations by quarter.

Here’s a quick example to illustrate one way to go about this using the Pandas melt function.

Getting Started



Let’s import the Pandas package

import pandas as pd

Load Some Data



First, we’ll create a fake dataframe that contains the name of a state and city along with some data for each month in the year 2000. For simplicity, imagine that the data are the number of Canadians spotted eating poutine.

df = pd.DataFrame([['NY', 'New York', 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], 
                   ['MI', 'Ann Arbor', 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
                   ['OR', 'Portland', 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
                  ],
                  columns=['state', 'city', 
                           '2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06',
                           '2000-07', '2000-08', '2000-09', '2000-10', '2000-11', '2000-12',
                           '2001-01'
                          ])
df



state city 2000-01 2000-02 2000-03 2000-04 2000-05 2000-06 2000-07 2000-08 2000-09 2000-10 2000-11 2000-12 2001-01
0 NY New York 1 2 3 4 5 6 7 8 9 10 11 12 13
1 MI Ann Arbor 5 6 7 8 9 10 11 12 13 14 15 16 17
2 OR Portland 9 10 11 12 13 14 15 16 17 18 19 20 21



The Goal



The problem was to take this dataset and try to find the average number of Canadians eating poutine in each city and state for each quarter in the year 2000.

First, we need to convert the original dataframe into long format and then make sure that the resulting months are understood to be a datetime column.

long_df = pd.melt(df, id_vars=['state', 'city'], var_name='month', value_name='number of poutine eaters')
long_df['month'] = pd.to_datetime(long_df['month'])
long_df



state city month number of poutine eaters
0 NY New York 2000-01-01 1
1 MI Ann Arbor 2000-01-01 5
2 OR Portland 2000-01-01 9
3 NY New York 2000-02-01 2
4 MI Ann Arbor 2000-02-01 6
5 OR Portland 2000-02-01 10
6 NY New York 2000-03-01 3
7 MI Ann Arbor 2000-03-01 7
8 OR Portland 2000-03-01 11
9 NY New York 2000-04-01 4
10 MI Ann Arbor 2000-04-01 8
11 OR Portland 2000-04-01 12
12 NY New York 2000-05-01 5
13 MI Ann Arbor 2000-05-01 9
14 OR Portland 2000-05-01 13
15 NY New York 2000-06-01 6
16 MI Ann Arbor 2000-06-01 10
17 OR Portland 2000-06-01 14
18 NY New York 2000-07-01 7
19 MI Ann Arbor 2000-07-01 11
20 OR Portland 2000-07-01 15
21 NY New York 2000-08-01 8
22 MI Ann Arbor 2000-08-01 12
23 OR Portland 2000-08-01 16
24 NY New York 2000-09-01 9
25 MI Ann Arbor 2000-09-01 13
26 OR Portland 2000-09-01 17
27 NY New York 2000-10-01 10
28 MI Ann Arbor 2000-10-01 14
29 OR Portland 2000-10-01 18
30 NY New York 2000-11-01 11
31 MI Ann Arbor 2000-11-01 15
32 OR Portland 2000-11-01 19
33 NY New York 2000-12-01 12
34 MI Ann Arbor 2000-12-01 16
35 OR Portland 2000-12-01 20
36 NY New York 2001-01-01 13
37 MI Ann Arbor 2001-01-01 17
38 OR Portland 2001-01-01 21



The First Attempt



Let’s see what happens when we simply groupby the city and state:

long_df.groupby(['state', 'city']).mean()



number of poutine eaters
state city
MI Ann Arbor 11
NY New York 7
OR Portland 15



But this only gives you the annual average by city and state. Instead, we want to be able to split the months out by quarter. Now, you can do some special tricks by mapping each month into a quarter (i.e., 01, 02, and 03 are mapped to Q1 while 10, 11, and 12 are mapped to Q4) and then using that new column to groupby. However, we can make use of the fantastic built in PeriodIndex function to do all of the work for us.

The Final Step



long_df.groupby(['state', 'city', pd.PeriodIndex(long_df.month, freq='Q')]).mean()



number of poutine eaters
state city month
MI Ann Arbor 2000Q1 6
2000Q2 9
2000Q3 12
2000Q4 15
2001Q1 17
NY New York 2000Q1 2
2000Q2 5
2000Q3 8
2000Q4 11
2001Q1 13
OR Portland 2000Q1 10
2000Q2 13
2000Q3 16
2000Q4 19
2001Q1 21



That’s it! Pretty Wicked, right? Let me know what you think in the comments below.


Published

Dec 27, 2017