I got to thinking about this topic after yet another debate about browser statistics: whether Opera is underrepresented, and how good the statistics programs are at measuring actual use. The one thing that is certain is that the collected statistics say something about the traffic to the sites they were collected from, and may give an indication of the traffic to other sites. But which sites are the numbers collected from?
I don’t know which sites are used as the basis. I don’t know where they are located either. They could be spread evenly around the world, or mainly in one part of the world. Does it matter? It may.
It’s said that while Opera doesn’t have that many users in the USA, it’s popular in Europe, Russia and Japan. If the web sites that the statistics are collected from are evenly spread around the world, this doesn’t matter. However, if the majority of the sites are based in the USA, with mainly visitors from the USA, then the numbers are skewed.
I set up a small, simple case with arbitrary numbers:
There are two areas, A and B, with 100.000 users (of browsers) each.
There are 100 websites, each with 10.000 visits: 10 of the sites are in area A, 90 in area B.
People visit only the sites in the area where they’re based.
There are 3 browsers: Speeder, Skimmer and Stumbler.
The percentage of users of each browser differs by area.
I set up a table:
| | Area: A | Area: B | Total |
|---|---|---|---|
| Users | 100.000 | 100.000 | 200.000 |
| Speeder | 10,0% | 1,0% | 5,5% |
| Skimmer | 15,0% | 10,0% | 12,5% |
| Stumbler | 75,0% | 89,0% | 82,0% |
| Web sites | 10 | 90 | 100 |
| Number of visits | | | |
| Speeder | 10.000 (10,0%) | 9.000 (1,0%) | 19.000 (1,9%) |
| Skimmer | 15.000 (15,0%) | 90.000 (10,0%) | 105.000 (10,5%) |
| Stumbler | 75.000 (75,0%) | 801.000 (89,0%) | 876.000 (87,6%) |
As we see, the statistics collected from the sites show different browser shares than the actual user base: Speeder has 5,5% of the users but only 1,9% of the visits. So we need to know more than just the measured number to tell what the statistics actually say.
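For anyone who wants to check the arithmetic or try other numbers, here is a minimal Python sketch of the same case. The variable and function names are my own, and it assumes exactly what the setup says: every site gets the same number of visits, and people only visit sites in their own area.

```python
# Example data from the made-up case in this post (not real statistics).
VISITS_PER_SITE = 10_000

users = {"A": 100_000, "B": 100_000}   # browser users per area
sites = {"A": 10, "B": 90}             # measured web sites per area
share = {                              # browser share of users per area
    "Speeder":  {"A": 0.10, "B": 0.01},
    "Skimmer":  {"A": 0.15, "B": 0.10},
    "Stumbler": {"A": 0.75, "B": 0.89},
}


def weighted_share(browser_share, weights):
    """Average a browser's per-area share, weighted by the given per-area counts."""
    total = sum(weights.values())
    return sum(browser_share[area] * weights[area] for area in weights) / total


# People only visit sites in their own area, so visits per area follow the sites.
visits = {area: n * VISITS_PER_SITE for area, n in sites.items()}

# Actual usage weights each area by its users; the statistics weight it by visits.
actual = {b: weighted_share(s, users) for b, s in share.items()}
measured = {b: weighted_share(s, visits) for b, s in share.items()}

for b in share:
    print(f"{b:<9} actual {actual[b]:6.1%}   measured {measured[b]:6.1%}")
```

The "actual" column should match the Total column for the users, and the "measured" column the Total column for the visits. Changing `sites` is an easy way to see how the skew follows the location of the measured sites.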
Had the websites used to collect the statistics been evenly spread, with 50 in each area, the statistics would have shown the real usage.
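This can be checked with a short calculation. The notation is my own: $s_A$ and $s_B$ are the number of measured sites in each area, and $p_A$ and $p_B$ are a browser's share of the users there. Since every site gets the same number of visits and people only visit sites in their own area, the measured share $m$ is the site-weighted average:

$$ m = \frac{s_A \, p_A + s_B \, p_B}{s_A + s_B} $$

For Speeder, with 50 sites in each area and then with the 10/90 split from the example:

$$ m_{50/50} = \frac{50 \cdot 10\% + 50 \cdot 1\%}{100} = 5{,}5\% \qquad m_{10/90} = \frac{10 \cdot 10\% + 90 \cdot 1\%}{100} = 1{,}9\% $$

Note that the even split gives the real usage only because the two areas have the same number of users. In general, the visits from each area would have to be proportional to the number of users there.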
Back in the real world, the interesting question isn’t whether Opera is undercounted for some reason, or Firefox overcounted, or something like that. The interesting questions are: What sites are used to collect the data for the statistics? Where are they based, and where are their visitors based? Are users from certain areas more likely to use certain browsers than users from other areas?
That’s a lot of questions, but the answers are necessary to see whether the figures are skewed or not.