I got to thinking about this topic after yet another debate about browser statistics: whether Opera is underrepresented or not, and how good the statistics programs are at measuring actual use. The one thing that is certain is that the statistics say something about the traffic to the sites they are collected from, and may give an indication about the traffic to other sites. But which sites are the numbers collected from?
I don’t know what sites are used as the basis. I don’t know where they are located either: they could be spread evenly around the world, or mainly in one part of it. Does it matter? It may.
It’s said that while Opera doesn’t have that many users in the USA, it’s popular in Europe, Russia and Japan. If the web sites that the statistics are collected from are evenly spread around the world, this doesn’t matter. However, if the majority of the sites are based in the USA, with mainly visitors from the USA, then the numbers are skewed.
I set up a small, simple case, with numbers chosen arbitrarily:
There are two areas, A and B, with 100.000 users (of browsers) each.
There are 100 websites, 10 in area A and 90 in area B, each with 10.000 visits.
People visit only the sites in the area where they’re based.
There are 3 browsers: Speeder, Skimmer and Stumbler.
The share of users of each browser differs between the areas.
I set up a table:
| | Area A | Area B | Total |
|---|---|---|---|
| Users | 100.000 | 100.000 | 200.000 |
| Speeder | 10,0% | 1,0% | 5,5% |
| Skimmer | 15,0% | 10,0% | 12,5% |
| Stumbler | 75,0% | 89,0% | 82,0% |
| Web sites | 10 | 90 | 100 |
| Number of visits | | | |
| Speeder | 10.000 (10,0%) | 9.000 (1,0%) | 19.000 (1,9%) |
| Skimmer | 15.000 (15,0%) | 90.000 (10,0%) | 105.000 (10,5%) |
| Stumbler | 75.000 (75,0%) | 801.000 (89,0%) | 876.000 (87,6%) |
Now we see that the statistics collected from the sites show different browser shares than the actual user numbers: Speeder has 5,5% of the users, but only 1,9% of the visits. Thus, it is necessary to know more than just that one number to tell what the statistics actually say.
Had the websites used to collect the statistics been evenly spread, with 50 in each area, the statistics would have shown the real usage, since both areas have the same number of users.
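For anyone who wants to check the arithmetic or try other numbers, here is a minimal Python sketch of the example. The function and variable names are my own, and it assumes, as the example does, that visits within an area split between the browsers exactly as the users do.

```python
# Share of users per browser in each area (figures from the example).
user_share = {
    "A": {"Speeder": 0.10, "Skimmer": 0.15, "Stumbler": 0.75},
    "B": {"Speeder": 0.01, "Skimmer": 0.10, "Stumbler": 0.89},
}
users = {"A": 100_000, "B": 100_000}
VISITS_PER_SITE = 10_000

# Two ways of spreading the 100 measured sites over the areas.
skewed = {"A": 10, "B": 90}
even = {"A": 50, "B": 50}


def actual_share(browser):
    """Share of all users, across both areas, who use the browser."""
    total_users = sum(users.values())
    return sum(users[a] * user_share[a][browser] for a in users) / total_users


def measured_share(browser, sites):
    """Share of all recorded visits made with the browser.

    `sites` maps each area to the number of measured sites based there.
    Assumption: visits in an area follow that area's user shares.
    """
    visits = {a: n * VISITS_PER_SITE for a, n in sites.items()}
    total_visits = sum(visits.values())
    return sum(visits[a] * user_share[a][browser] for a in visits) / total_visits


for browser in ("Speeder", "Skimmer", "Stumbler"):
    print(
        f"{browser:9s}"
        f" actual {actual_share(browser):6.1%}"
        f" | 10/90 sites {measured_share(browser, skewed):6.1%}"
        f" | 50/50 sites {measured_share(browser, even):6.1%}"
    )
```

The 10/90 column should reproduce the totals from the table, while the 50/50 column should match the actual shares. Changing `users` so that the areas differ in size shows that an even split of sites is no longer enough on its own.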
Back in the real world, the interesting question isn’t whether Opera is undercounted for some reason, or Firefox overcounted, or something like that. The interesting questions are: What sites are used to collect the data for the statistics? Where are they based, and where are their visitors based? Are users from certain areas more likely to use certain browsers than visitors from other areas?
That’s a lot of questions, but the answers are needed to tell whether the figures are skewed or not.
Interesting questions. Another thing to consider is that Opera can present itself as another browser, for example, mine is impersonating IE. Correct me if I’m wrong.
Opera does present itself as IE by default, to get past badly written browser checks. Admittedly not completely, as it’s still possible to see that it’s Opera. There is also another mode, where Opera can be disguised completely, as either IE or Netscape. This is rarely used though, and most statistics software does manage to figure out when Opera is used, as long as it’s using the default disguise.
Still, there are other possibilities that may cause Opera to be somewhat undercounted (like the aggressive cache), but how big a role these play is, I guess, impossible to know.
But I do wonder if the situation shares some similarities with that of “Speeder” in the example above…