A brief note: I’ve just updated my script for graphing referrals from Google, Bing and Yahoo!. It is meant to reset on the first day of each month, so one can see whether each month is “better” than the one before. However, it didn’t handle stats with no referrals at all from those search engines, which is likely to happen every time the month changes. It’s fixed now.
Tag: AWStats
AWStats tip: removing ‘chromebar’ from search keyphrases
With this blog still less than a month old, it’s normal that I keep looking at its stats quite often, and one of the most interesting bits (especially if you care about SEO) is the top search keyphrases list. I use both AWStats and Google Analytics for this site, and, concerning the former, I had been curious for a while about why the top search keyphrase leading people to my blog was apparently “chromebar”.
Now, I’m pretty sure I had never mentioned that term on this blog, so I thought AWStats might have been doing something wrong. A quick grep on my logs, and where “chromebar” came from became obvious: it’s from the StumbleUpon add-on for Google Chrome. The whole combination was necessary; StumbleUpon users with other browsers weren’t triggering anything like this.
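A search along these lines is enough to spot the pattern. It is only a sketch, not the exact command used for this post: count_hits is a made-up helper name, and the log path in the usage line is an assumption, since it depends on your web server.

```shell
# Count the log lines that contain a given fixed string,
# e.g. a fragment of a referrer URL.
# count_hits is a hypothetical helper name.
count_hits() {
    # $1 = string to look for, $2 = access log file
    grep -F -c -- "$1" "$2"
}
```

Usage (the log path is an assumption): count_hits 'device=chromebar' /var/log/httpd-access.log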
Now, here’s what the “referrer” part of a hit from StumbleUpon, using Chrome with the SU add-on, looks like:
"https://www.stumbleupon.com/toolbar/litebar.php?device=chromebar&version=chromebar%202.9.8.1&ts=1301274701"
While a hit’s referrer from SU with another browser looks like this:
"https://www.stumbleupon.com/refer.php?url=http%3A%2F%2Fwinterdrake.com%2Fbad-comic-panels-3-its-a-membership-card-in-a-subversive-communist-front-organization%2F"
See the difference? The Chrome version’s query string is ?device=chromebar&version=chromebar%202.9.8.1&ts=1301274701, with “chromebar” right at the beginning, while the other one carries only a url= parameter pointing at the page that was visited.
Now, StumbleUpon is listed as a “search engine” in AWStats, in a file called lib/search_engines.pm, and that file optionally specifies which part of a query string (e.g. “q=”) from the referrer is the actual search. For some search engines, however, no such part is specified, meaning that they don’t provide it in the “referrer” part of the hit. Such is the case with StumbleUpon, where the query string part is an empty string.
But there seems to be a bug (or maybe it’s an intended feature?) in AWStats here: if no part of a query string is specified, and yet there is a query string, AWStats seems to use the first part it catches. As you can see above, that was ?device=, and it was always set to chromebar.
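To make the observed behaviour concrete, here is a small shell illustration that takes the value of the first query-string parameter of a referrer. It only mimics what the stats showed; it is not AWStats code, and first_param_value is a hypothetical name.

```shell
# Illustration only: print the value of the first query-string parameter
# of a URL, which is what AWStats appeared to pick up as the "keyphrase".
# first_param_value is a hypothetical name; this is not AWStats code.
first_param_value() {
    query=${1#*\?}                 # drop everything up to the first "?"
    if [ "$query" = "$1" ]; then   # no "?" at all: nothing to extract
        return 0
    fi
    first=${query%%&*}             # keep only the first name=value pair
    printf '%s\n' "${first#*=}"    # strip the "name=" part
}

first_param_value 'https://www.stumbleupon.com/toolbar/litebar.php?device=chromebar&version=chromebar%202.9.8.1&ts=1301274701'
# prints: chromebar
```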
The easiest way around this problem (I don’t know if the AWStats authors will consider this a bug or not; I’ll try to report the problem in the near future) is to edit lib/search_engines.pm and add a nonexistent query string part to StumbleUpon. Open that file with a text editor and look for this line:
'stumbleupon','',
and change it to:
'stumbleupon','qwerty=',
Presto! No more “chromebar” entries in your stats in the future. (It won’t delete current entries, though, unless you clear your AWStats cache files for that month and generate new stats.)
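If you’d rather script the change than open an editor, something like the following should work. It is a sketch under assumptions: patch_stumbleupon is a made-up helper name, and it assumes the entry appears in the file exactly as shown above. It prints the patched text instead of editing in place, so you can check the result before replacing anything.

```shell
# Rewrite the StumbleUpon entry so that it names a query parameter that
# never occurs. patch_stumbleupon is a hypothetical helper; it writes the
# patched text to stdout and leaves the original file untouched.
patch_stumbleupon() {
    # $1 = path to search_engines.pm
    sed "s/'stumbleupon','',/'stumbleupon','qwerty=',/" "$1"
}
```

Usage: patch_stumbleupon lib/search_engines.pm > search_engines.pm.new, then compare both files with diff and, if only that one line changed, move the new file into place.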
Graphing search engine referrals with AWStats and MRTG
From my previous post, AWStats tip: creating static pages (and why it’s a good idea):
for instance, I’m currently using MRTG to plot a graph of Google and Bing referrals, using the AWStats-generated static pages as input
So… anyone curious? 🙂
Note: again, beware of word wrapping below. Each assignment (from the variable name to the closing backquote) is supposed to be a single line.
cat /root/bin/winterdrake-se.sh
#!/usr/local/bin/bash
# Google referrals from the static AWStats page; 0 if there is no such row yet
GOOGLE=`grep '>Google</a></td><td>' /var/www/htdocs/AWstats/awstats.winterdrake.com.html | awk -F"<td>" '{print $2}' | cut -d "<" -f 1`
if [ -z "$GOOGLE" ]; then
GOOGLE=0
fi
# First value for MRTG ("I")
echo $GOOGLE
# Bing and Yahoo! referrals, handled the same way
BING=`grep '>Microsoft Bing</a></td><td>' /var/www/htdocs/AWstats/awstats.winterdrake.com.html | awk -F"<td>" '{print $2}' | cut -d "<" -f 1`
if [ -z "$BING" ]; then
BING=0
fi
YAHOO=`grep '>Yahoo!</a></td><td>' /var/www/htdocs/AWstats/awstats.winterdrake.com.html | awk -F"<td>" '{print $2}' | cut -d "<" -f 1`
if [ -z "$YAHOO" ]; then
YAHOO=0
fi
# Second value for MRTG ("O"): Bing and Yahoo! put together
echo $(($BING + $YAHOO))
# MRTG also expects an uptime line and a target name
uptime | awk -F"up " '{print $2}'
uname -n
EDIT on April 1, 2011: fixed the script so that it deals with nonexistent entries (typically on the first day of the month).
By the way, this is a FreeBSD box; that’s why bash is in /usr/local/bin. In Linux, it’s probably in /bin.
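The three near-identical blocks in the script could also be folded into one helper that includes the fall-back to zero. This is just a sketch of that idea, not the script actually running on the server: engine_hits is a made-up name, and the sample table row in the comment is an assumption about what the static page contains.

```shell
# engine_hits LABEL FILE
# Print the number that follows LABEL's link in the static AWStats page,
# or 0 when the page has no row for that engine (e.g. on the first day
# of the month). engine_hits is a hypothetical name.
# Assumed row format:
#   <tr><td class="aws"><a href="...">Google</a></td><td>42</td>...</tr>
engine_hits() {
    hits=$(grep -F ">$1</a></td><td>" "$2" | awk -F"<td>" '{print $2}' | cut -d "<" -f 1)
    echo "${hits:-0}"
}
```

With that in place, the script body boils down to calls such as engine_hits 'Google' followed by the path of the static page, plus the same arithmetic for Bing and Yahoo! as before.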
The mrtg.cfg file is mostly standard: it invokes the script shown above, and labels “I” as hits from Google, and “O” as hits from Bing and Yahoo! put together.
If you want to see what the result looks like, look here.
If you’ve been paying attention, yes, both values will reset to zero the first day of every month. That’s exactly how I want it: I want to see if every month it “rises” faster than the month before, or slower, or whatever. 🙂
AWStats tip: creating static pages (and why it’s a good idea)
AWStats is probably the most popular free statistics package for self-hosted sites (if we don’t count “external” ones such as Google Analytics), and, as any decent Unix sysadmin probably knows, there are several ways of configuring it.
One of them is by having the CGI accessible on the web and having it analyze the logs and generate the statistics on demand. I don’t think many people use it that way, though: not only is it the slowest method, but it could theoretically be used for DoS attacks. Yes, you could put it somewhere private (and it’s probably still a good idea to do so, no matter what method you use), either by using a web server that isn’t world-accessible or by adding authentication. Still, this method has no real advantage other than guaranteeing you the absolutely most recent stats, and stats from 5 minutes ago or less are, in most cases, more than good enough.
In my experience, most people use an intermediate method: the CGI is still accessible, but it isn’t allowed to analyze logs; it just generates the stats page. The logs themselves are analyzed by the same CGI file, but run from a local crontab.
And this is what I had been using until today. Yes, much like in the case of the “cd back” trick, I had been using AWStats for years… and only today did I switch to using fully static pages. A few more of these and someday I may have to turn in my geek card. 🙂
It’s pretty easy to configure AWStats this way. Here’s my old crontab line:
*/5 * * * * /usr/local/cgi-bin/awstats.pl -config=winterdrake.com -update >/dev/null 2>&1
And here’s my new one (if it word wraps, it’s supposed to be a single line):
*/5 * * * * /usr/local/bin/awstats_buildstaticpages.pl -config=winterdrake.com -update -awstatsprog=/usr/local/cgi-bin/awstats.pl -dir=/var/www/htdocs/awstats
Before that, I had to put awstats_buildstaticpages.pl (included in the AWStats /tools directory) in the /usr/local/bin directory (you may prefer it somewhere else, of course), and create the /var/www/htdocs/awstats directory so that the static files could be put there. And now, they’re accessible on https://myserver/awstats/awstats.winterdrake.com.html. They look exactly the same as if I accessed the CGI directly (which I can still do, in order to see yearly reports, for instance, though I do that very rarely), but let’s do a little benchmarking, shall we?
CGI version:
Requests per second: 2.33 [#/sec] (mean)
Static version:
Requests per second: 4557.69 [#/sec] (mean)
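Those “Requests per second” lines are in the format printed by ApacheBench (ab); the exact invocation isn’t given here, so the request count in the usage line below is an assumption. As a rough sketch, the two helpers below (rps and speedup, both made-up names) pull the number out of such a line and work out how many times faster the static page is.

```shell
# rps: read benchmark output on stdin and print the number from the
# "Requests per second" line.
rps() {
    awk '/^Requests per second:/ { print $4 }'
}

# speedup SLOW FAST: how many times faster FAST is than SLOW, rounded.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.0f\n", fast / slow }'
}

speedup 2.33 4557.69
# prints: 1956
```

Usage (the -n value is an assumption): ab -n 100 https://myserver/awstats/awstats.winterdrake.com.html | rps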
Now, you may be thinking: “yes, the speed is in a completely different order of magnitude, but I don’t look at my stats all the time, and they’re private, so nobody else does… isn’t taking half a second good enough?” Yes, that’s true… but getting rid of limits is always a good thing, because you can then do so much more. Suppose you don’t have half a dozen sites on that server, but a thousand, with statistics for all of them? Suppose you want to use the AWStats stats to generate other stats (for instance, I’m currently using MRTG to plot a graph of Google and Bing referrals, using the AWStats-generated static pages as input)? Here, another advantage becomes obvious: I can now do this through a trivial combination of grep and awk on a static HTML file. In both these examples (and I’m sure there are many more), having stats accessible almost instantly and taking up virtually no processing power at all is obviously a Very Good Thing™.
Other advantages: you can move your stats to a (virtual) server that only serves static files, since that’s what they’ll be. Alternatively, if you had CGI processing enabled just for AWStats, you can now simply turn it off on your web server, improving its security.