Since I've been mostly living abroad for the past five years, I've had to deal with airline tickets somewhat regularly. Lately I've been especially interested in the Montreal-Frankfurt line. In general, when buying things, and especially when those things can easily cost upwards of a thousand dollars, I like being a critical and meticulous shopper. So far, however, the black magic of airline ticket prices has completely eluded me. This is an attempt at analyzing these prices over a longer period of time, to see if any logic can be squeezed out of their fluctuations.

Every day since December 3rd, 2007, a script of mine has been visiting aircanada.ca and fiddling with the flight search form to build a database of ticket prices. Because the database grew very quickly, I had to limit myself to Montreal-Frankfurt return tickets sold by Air Canada. Every day I query the site for every possible combination of departure and return dates, and record the offered prices.
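To give an idea of how many queries "every possible combination" means, here is a minimal sketch of the enumeration. It is not the actual script: the function name `date_pairs` and the 360-day search window are assumptions made for illustration.

```python
from datetime import date, timedelta


def date_pairs(today, horizon_days=360):
    """Yield every (departure, return) pair inside the search window.

    horizon_days is an assumed limit on how far ahead the search form
    allows; the real limit may differ slightly.
    """
    for dep_offset in range(1, horizon_days + 1):
        departure = today + timedelta(days=dep_offset)
        # The return date must fall after the departure date.
        for ret_offset in range(dep_offset + 1, horizon_days + 1):
            yield departure, today + timedelta(days=ret_offset)


pairs = list(date_pairs(date(2007, 12, 3)))
first_departure, first_return = pairs[0]
```

Under these assumptions a single day of collection already covers tens of thousands of date pairs, which would explain why the database grew so fast.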

Please note that this is a work in progress. Little time has been devoted to the analysis of the data so far.

You can read on for a quick analysis of the results so far, or go on and build your own graph out of the data already accumulated.

Perhaps the most telling graph so far is the one that maps departure date to average ticket price:

It quite clearly shows the two spikes of Christmas and New Year's, the smaller spike of the March holidays, and the very clear-cut summer vacation plateau in the middle.

Note that the data gets less reliable as we near the sides of the graph, because it is averaged over fewer days. For instance, since we started recording three weeks before Christmas, the only data we have for travel on that date are the prices that were offered during those three weeks. These tickets would have been much cheaper (I expect) in the months before December, but we don't have these numbers.

On the other hand, the price of June 1st tickets, for instance, has been sampled every day since recording started in December. This means that the indicated value is averaged over all the offers we've seen for that ticket since then. This is more interesting since it allows us to more reliably ignore factors not shown in this graph.
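The averaging described above can be sketched as follows. This is an illustration only: the record layout `(purchase_date, departure_date, price)` and the sample prices are invented, not taken from the database. Returning the sample count next to each average makes it easy to see which points of the curve rest on little data.

```python
from collections import defaultdict


def average_by_departure(records):
    """Map each departure date to (average price, number of samples).

    records is an iterable of (purchase_date, departure_date, price);
    this layout is an assumption, not the real database schema.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for _purchase, departure, price in records:
        totals[departure][0] += price
        totals[departure][1] += 1
    return {dep: (total / n, n) for dep, (total, n) in totals.items()}


# Invented sample data, for illustration only.
sample = [
    ("2007-12-03", "2007-12-24", 1200.0),
    ("2007-12-04", "2007-12-24", 1300.0),
    ("2007-12-03", "2008-06-01", 800.0),
    ("2007-12-04", "2008-06-01", 900.0),
    ("2007-12-05", "2008-06-01", 1000.0),
]
averages = average_by_departure(sample)
```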

Note also how the data seems to get erratic towards the right of the graph. I expect that curve to flatten out as data collection progresses.

***

Next we have the graph of average ticket price as a function of how long in advance the ticket is bought:

Clearly the price goes up in the last month before departure, rising sharply in the last two weeks. The optimal time for buying tickets, if all other parameters are ignored, thus seems to be about 33 days in advance. This is interesting, as I thought it was more around 3 months. But maybe I'm misinterpreting the data here; I haven't thought about it much yet.
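For reference, here is roughly how such a curve and its minimum can be computed. It is a sketch under assumptions: the record layout and the function names are hypothetical, and the sample prices are invented, so the result below says nothing about the real data.

```python
from collections import defaultdict
from datetime import date


def average_by_lead_time(records):
    """Map days-in-advance to average price.

    records is an iterable of (purchase_date, departure_date, price),
    an assumed layout.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for purchase, departure, price in records:
        lead = (departure - purchase).days
        sums[lead] += price
        counts[lead] += 1
    return {lead: sums[lead] / counts[lead] for lead in sums}


def cheapest_lead_time(averages):
    """Return the lead time (in days) with the lowest average price."""
    return min(averages, key=averages.get)


# Invented sample data, for illustration only.
sample = [
    (date(2008, 1, 1), date(2008, 1, 3), 1200.0),   # 2 days ahead
    (date(2008, 1, 2), date(2008, 1, 4), 1000.0),   # 2 days ahead
    (date(2008, 1, 1), date(2008, 1, 6), 900.0),    # 5 days ahead
    (date(2008, 1, 1), date(2008, 1, 11), 950.0),   # 10 days ahead
]
lead_averages = average_by_lead_time(sample)
best = cheapest_lead_time(lead_averages)
```

Note that this averages over every departure date at once, so a minimum found this way mixes peak-season and off-season tickets, which may be part of why the result looks surprising.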

Trip length doesn't seem to influence the price much, as long as it's not shorter than a week:

Note that since both departure and return dates can only be set to less than a year in the future when performing a search, I have much more data for short trips than for long trips. For each date of purchase, we have around 360 possible tickets for 1-day trips (one departing each day in the coming year) but only one 360-day trip (leaving tomorrow). This means that in our DB trip length is correlated with how long in advance we buy the ticket, which probably skews the data.
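The arithmetic behind this bias is easy to check. The sketch below assumes a search window of 361 days (a value picked so that the counts match the "around 360" above; the real limit may differ) and that the earliest departure is tomorrow.

```python
def tickets_per_trip_length(horizon_days=361):
    """Count the possible departures for each trip length.

    A trip of `length` days can leave on day 1 .. horizon_days - length,
    so that the return still falls inside the (assumed) search window.
    """
    return {length: horizon_days - length for length in range(1, horizon_days)}


def mean_days_in_advance(length, horizon_days=361):
    """Average lead time over all departures available for a trip length."""
    last_departure = horizon_days - length
    return (1 + last_departure) / 2


counts = tickets_per_trip_length()
short_trip_lead = mean_days_in_advance(1)
long_trip_lead = mean_days_in_advance(360)
```

With these assumptions a 1-day trip is bought about 180 days ahead on average, while the single 360-day trip is always bought one day ahead, so any average over trip length is also an average over very different lead times.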

Then we have the clear indication that it's cheaper to leave on a Tuesday and return on a Monday or Tuesday:

Here completely white cells are $761 tickets, and completely black corresponds to $981. Apparently you don't want to leave on a Friday and come back on a Saturday.
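A grid like this one can be built by grouping tickets by departure and return weekday. The sketch below is illustrative: the record layout is assumed, the sample prices are invented, and the linear mapping from price to darkness is my guess at a reasonable scale, with only the two endpoints ($761 and $981) taken from the graph.

```python
from datetime import date


def weekday_matrix(records):
    """Average price per (departure weekday, return weekday) cell.

    records is an iterable of (departure_date, return_date, price).
    Weekdays follow date.weekday(): 0 is Monday, 6 is Sunday.
    Cells without data are None.
    """
    sums = [[0.0] * 7 for _ in range(7)]
    counts = [[0] * 7 for _ in range(7)]
    for departure, return_date, price in records:
        d, r = departure.weekday(), return_date.weekday()
        sums[d][r] += price
        counts[d][r] += 1
    return [
        [sums[d][r] / counts[d][r] if counts[d][r] else None for r in range(7)]
        for d in range(7)
    ]


def darkness(price, low=761.0, high=981.0):
    """Linear shade: 0.0 is white (cheapest), 1.0 is black (priciest)."""
    return (price - low) / (high - low)


# Invented sample data: two Tuesday departures returning on a Monday.
sample = [
    (date(2008, 6, 3), date(2008, 6, 9), 780.0),
    (date(2008, 6, 10), date(2008, 6, 16), 800.0),
]
matrix = weekday_matrix(sample)
```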

Finally there's this graph, perhaps the most interesting but also perhaps the most surprising:

Here the X axis indicates the purchase date, and the Y axis how many days before departure the ticket was bought. The darkness of each pixel indicates the average price for that combination.

The two dark stripes at the bottom-left correspond to the two huge spikes of the first graph on this page. The two fainter streaks of black towards the center-bottom correspond to the two smaller peaks in March, and the thick gray band that goes down across the whole graph matches the summer plateau seen on the first graph. The white vertical stripes are simply missing data.
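To make the geometry of this graph concrete, here is a small sketch of how its cells can be computed. The record layout and the function name `price_grid` are assumptions, and the prices are invented. The point is that a given departure date moves down one row for every day of purchase, which is why each holiday shows up as a diagonal stripe rather than a horizontal one.

```python
from collections import defaultdict
from datetime import date


def price_grid(records):
    """Average price per (purchase date, days before departure) cell.

    records is an iterable of (purchase_date, departure_date, price),
    an assumed layout.
    """
    cells = defaultdict(list)
    for purchase, departure, price in records:
        cells[(purchase, (departure - purchase).days)].append(price)
    return {key: sum(prices) / len(prices) for key, prices in cells.items()}


# Invented sample data: the same departure date seen on two purchase days.
sample = [
    (date(2007, 12, 3), date(2007, 12, 24), 1200.0),
    (date(2007, 12, 4), date(2007, 12, 24), 1300.0),
]
grid = price_grid(sample)
```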

What strikes me, though, is how, some exceptions aside, this graph seems to show that price depends very much on when you're travelling, and is relatively independent of how far in advance you buy the ticket, as long as you don't buy it a full year in advance (darker patches along the top edge of the graph). I guess this was already shown by the second graph on this page.

***

A more thorough analysis will be conducted when I have more time and more data. In the meantime, do go and analyze the data yourself, and see if you can find the trend I haven't spotted.

*May 2008*