Introduction
------------

From September 2009 to August 2012, I used, as my everyday web search engine, an
app that merges results from several search engines. This page shows some stats
extracted from my tracked click data, to help answer the question that sparked
the whole project: "which engine gives the best results?"

For some background see here:

    https://saintamh.org/code/search.shtml.en


Clickthrough rates
------------------

  With engines hidden and fair blurb selection:
    google:       8458/11181 (75.65%) [4.46]
    duckduckgo:   1043/1600  (65.19%) [3.47]
    bing:         6869/11181 (61.43%) [4.08]
    yahoo:        3572/6528  (54.72%) [4.67]
    yandex:       2865/6648  (43.10%) [3.46]
    exalead:       834/3442  (24.23%) [4.02]

Each line shows, for each engine, the number of times one of its results was
clicked over the number of times the engine was invoked.

If an engine always returned off-topic results, I would never click on its
results, and so its clickthrough rate would be 0%. If an engine always returned
results so relevant that every result I ever clicked on had been returned by it
(possibly along with other engines), then that engine's clickthrough rate would
be 100%.

The number in square brackets is the average rank, within the engine's result
page, of the results that I clicked on. Smaller values mean I tend to click on
results near the top of that engine's result page, while higher values mean I
tend to dig deeper into its results.
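
As a rough illustration, here is a minimal Python sketch of how these figures
could be computed. The click-log structure (engines_invoked, clicked_results,
ranks) is hypothetical and not the app's actual data model:

    from collections import defaultdict

    def clickthrough_stats(click_log):
        # click_log: hypothetical list of per-search records, each holding the
        # set of engines invoked and, for every clicked result, the 1-based
        # rank at which each engine that returned it placed it.
        invocations = defaultdict(int)
        clicks = defaultdict(int)
        rank_sum = defaultdict(float)
        for record in click_log:
            for engine in record['engines_invoked']:
                invocations[engine] += 1
            for clicked in record['clicked_results']:
                for engine, rank in clicked['ranks'].items():
                    clicks[engine] += 1
                    rank_sum[engine] += rank
        for engine, n in sorted(invocations.items()):
            rate = 100.0 * clicks[engine] / n if n else 0.0
            avg_rank = rank_sum[engine] / clicks[engine] if clicks[engine] else 0.0
            print('%-12s %d/%d (%.2f%%) [%.2f]' % (
                engine + ':', clicks[engine], n, rate, avg_rank))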

As the heading above indicates, these numbers exclude the times that I opted to
see which engine had contributed which results, as my opinion about the engines
might have skewed my choices (I rarely choose to display the engine names,
except when debugging this app itself).

Some engines were invoked fewer times because after a while I stopped using
them.


User votes
----------

Since 2011-05-20, every search result has had two links next to it that allow
the user (that is, me) to flag results that stand out as significantly better
or worse than the rest of the lot.

Here are the tallies. The numbers next to each engine indicate the number of
upvotes ("+") and downvotes ("-") the engine has received. The percentages in
brackets express those counts relative to the number of times the engine was
invoked:

  google:   +454 (10.34%)  -142 ( 3.23%)
  bing:     +274 ( 6.24%)  -215 ( 4.90%)
  yahoo:     +18 ( 3.10%)   -34 ( 5.86%)
  yandex:   +182 ( 4.11%)  -203 ( 4.59%)
  exalead:   +71 ( 2.60%)  -143 ( 5.23%)
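
For illustration, here is a minimal Python sketch of how these per-invocation
rates could be derived. The votes mapping and the invocation counts are
hypothetical inputs, not the app's actual data structures:

    def vote_rates(votes, invocations):
        # votes: hypothetical mapping of engine -> list of +1/-1 votes;
        # invocations: per-engine invocation counts, as in the clickthrough
        # table (assumed non-zero for any engine that has received votes).
        for engine, engine_votes in sorted(votes.items()):
            up = sum(1 for v in engine_votes if v > 0)
            down = len(engine_votes) - up
            n = invocations[engine]
            print('%-9s +%d (%5.2f%%)  -%d (%5.2f%%)' % (
                engine + ':', up, 100.0 * up / n, down, 100.0 * down / n))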

An alternative way of scoring these votes is to give more weight to results
that are ranked higher by the engines. This seems intuitive: if, for instance,
I downvote a link that was returned by two engines, and the downvoted link
occupied the top position in engine A's results but only the 30th position in
engine B's results, then it sounds reasonable that more of the blame should go
to engine A than to engine B: returning an irrelevant link in 30th position is
not as bad as returning it in the top position.

In the table that follows, each value is the sum, for all votes on links
returned by that engine, of the inverse of the rank that the engine gave to the
link. So for instance in the previous example, the downvote would add a full
negative point to the tally for engine A, but only 1/30th of a point for
engine B.

   google:   +266.9057   -29.9074
     bing:   +154.1656   -84.9945
   yandex:   +105.1405   -88.6634
    yahoo:    +10.0413    -9.4730
  exalead:    +37.3939   -69.5285
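
The weighting itself is simple; here is a hedged Python sketch, assuming a
hypothetical list of (engine, rank, vote) tuples where rank is the 1-based
position at which the engine returned the voted link and vote is +1 or -1:

    from collections import defaultdict

    def weighted_tallies(weighted_votes):
        plus = defaultdict(float)
        minus = defaultdict(float)
        for engine, rank, vote in weighted_votes:
            # each vote counts for 1/rank: a vote on a top result weighs a
            # full point, a vote on the 30th result only 1/30th of a point
            if vote > 0:
                plus[engine] += 1.0 / rank
            else:
                minus[engine] += 1.0 / rank
        for engine in sorted(set(plus) | set(minus), key=lambda e: -plus[e]):
            print('%10s  +%.4f   -%.4f' % (engine + ':', plus[engine], minus[engine]))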


Engine correlation matrix
-------------------------

             |               ask |                bing |         duckduckgo |            exalead |              google |               yahoo |              yandex
         ask |                 - |   [204/350]  58.29% |      [1/1] 100.00% |      [0/0]   0.00% |   [294/350]  84.00% |   [185/350]  52.86% |       [0/0]   0.00%
        bing | [204/532]  38.35% |                   - | [945/1131]  83.55% | [472/1858]  25.40% | [5754/7440]  77.34% | [3010/4877]  61.72% | [1910/3809]  50.14%
  duckduckgo |     [1/6]  16.67% |  [945/1053]  89.74% |                  - |      [0/0]   0.00% |  [778/1053]  73.88% |  [729/1053]  69.23% |     [25/60]  41.67%
     exalead |     [0/0]   0.00% |   [472/839]  56.26% |      [0/0]   0.00% |                  - |   [568/839]  67.70% |       [0/0]   0.00% |   [378/839]  45.05%
      google | [294/677]  43.43% | [5754/9186]  62.64% | [778/1199]  64.89% | [568/2719]  20.89% |                   - | [3107/5539]  56.09% | [2237/5181]  43.18%
       yahoo | [185/537]  34.45% | [3010/4125]  72.97% | [729/1001]  72.83% |      [0/0]   0.00% | [3107/4125]  75.32% |                   - |  [645/1058]  60.96%
      yandex |     [0/0]   0.00% | [1910/2884]  66.23% |    [25/38]  65.79% | [378/1217]  31.06% | [2237/2884]  77.57% |  [645/1073]  60.11% |                   -

The value in the cell at the intersection of row R and column C answers the
question "among all search results returned by search engine R that were ever
clicked, what percentage had also been returned by engine C?".

This means that if a cell has the value 100%, the clicked results returned by
the engine in that row are a subset of the results returned by the engine in
that column (and, assuming the clicked results are representative of the rest,
the same would hold for all of that engine's results).
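
Here is a hedged Python sketch of how each cell could be computed. The input
format is hypothetical: one (engines_invoked, engines_returning) pair per
clicked result. The varying denominators across each row suggest that a cell
only counts clicks from searches where the column engine was invoked at all,
and that is what this sketch assumes:

    from collections import defaultdict

    def correlation_matrix(clicks, engines):
        overlap = defaultdict(int)  # (R, C) -> clicks on R's results also returned by C
        totals = defaultdict(int)   # (R, C) -> clicks on R's results where C was invoked
        for invoked, returned_by in clicks:
            for r in returned_by:
                for c in invoked:
                    if c == r:
                        continue
                    totals[r, c] += 1
                    if c in returned_by:
                        overlap[r, c] += 1
        for r in engines:
            cells = []
            for c in engines:
                if c == r:
                    cells.append('-')
                else:
                    k, n = overlap[r, c], totals[r, c]
                    cells.append('[%d/%d] %.2f%%' % (k, n, 100.0 * k / n if n else 0.0))
            print('%-10s | %s' % (r, ' | '.join(cells)))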


Conclusion
----------

The results have been pretty consistent over time: Google emerges as the clear
winner no matter how you measure it. Their results are more relevant and
complete than any other engine's.

The engines complement each other quite well, however -- Bing often has
relevant results that Google doesn't have, for instance. And about 1 in 4
results that I clicked didn't appear in Google's results at all.

Another interesting finding is that the quality of Yandex and Google results
has been slowly but steadily improving over time, while the other engines seem
more stable.


Last recorded click: Fri Feb  1 11:30:39 2013 UTC