Spam needed. Yes, really.

(first, yes, long time no post, yada yada, I really hope to change that in the near future, but no promises, as always.)

Lovely spam, wonderful spam
Lovely spam, wonderful spam

Remember how, many many ears ago, it was recommended that you should never post your email address on a web page, as spammers crawled them and added that address to their lists? Well, I need to receive some spam on my email server (among other things, to train my filters), so here’s one address for you, dear spammers: abelha@dehumanizer.com 1. Want it as a mailto: link? Here it goes: Abelha .

AWStats tip: removing ‘chromebar’ from search keyphrases

Being still less than a month old, it’s normal that I keep looking at this blog‘s stats quite often, and one of the most interesting bits — especially if you care about SEO — is the top search keyphrases list. I use both AWStats and Google Analytics for this site, and, concerning the former, I had been curious for a while about why the top search leading people to my blog was apparently for “chromebar“.

Now, I’m pretty sure I had never mentioned that term on this blog, so I thought AWStats might have been doing something wrong. A quick grep on my logs, and where “chromebar” came from became obvious: it’s from the StumbleUpon add-on for Google Chrome. The whole combination was necessary; StumbleUpon users with other browsers weren’t triggering anything like this.

Now, here’s what the “referrer” part of a hit from StumbleUpon, using Chrome with the SU add-on, looks like:

"http://www.stumbleupon.com/toolbar/litebar.php?device=chromebar&version=chromebar%202.9.8.1&ts=1301274701"

While a hit’s referrer from SU with another browser looks like this:

"http://www.stumbleupon.com/refer.php?url=http%3A%2F%2Fwinterdrake.com%2Fbad-comic-panels-3-its-a-membership-card-in-a-subversive-communist-front-organization%2F"

See the difference? The Chrome version has a query string: ?device=chromebar&version=chromebar%202.9.8.1&ts=1301274701 , while the other one doesn’t. And “chromebar” appears at the beginning of that query string.

Now, StumbleUpon is listed as a “search engine” in AWStats, in a file called lib/search_engines.pm , and that file optionally specifies which part of a query string (e.g. “q=“) from the referrer is the actual search. For some search engines, however, no such part is specified — meaning that they don’t provide it in the “referrer” part of the hit. Such is the case with StumbleUpon, where the query string part is an empty string.

But there seems to be a bug (or maybe it’s an intended feature?) in AWStats here: if no part of a query string is specified, and yet there is a query string, AWStats seems to use the first part it catches. As you can see above, that was ?device= , and it was always set to chromebar.

The easiest way around this problem (I don’t know if the AWStats authors will consider this a bug or not; I’ll try to report the problem in the near future) is to edit lib/search_engines.pm and add a non-existing query string part to StumbleUpon. Open that file with a text editor and look for this line:

'stumbleupon','',

and change it to:

'stumbleupon','qwerty=',

Presto! No more “chromebar” entries in your stats in the future. (It won’t delete current entries, though, unless you clear your AWStats cache files for that month and generate new stats.)

Graphing search engine referrals with AWStats and MRTG

From my previous post, AWStats tip: creating static pages (and why it’s a good idea):

for instance, I’m currently using MRTG to plot a graph of Google and Bing referrals, using the AWstats-generated static pages as input

So… anyone curious? :)

Note: again, beware of word wrapping below. I’ve added an empty line between any “true” line of text; if you see two or more lines together, it’s supposed to be just one.

cat /root/bin/winterdrake-se.sh

 

#!/usr/local/bin/bash

GOOGLE=`grep \>Google\</a\>\</td\>\<td\>
/var/www/htdocs/AWstats/awstats.winterdrake.com.html | awk -F"<td>" '{print $2}' | cut -d "<" -f 1`

if [ -z $GOOGLE ]; then

GOOGLE=0

fi

echo $GOOGLE

BING=`grep \>Microsoft\ Bing\</a\>\</td\>\<td\>
/var/www/htdocs/AWstats/awstats.winterdrake.com.html | awk -F"<td>" '{print $2}' | cut -d "<" -f 1`

if [ -z $BING ]; then

BING=0

fi

YAHOO=`grep \>Yahoo\!\</a\>\</td\>\<td\>
/var/www/htdocs/AWstats/awstats.winterdrake.com.html | awk -F"<td>" '{print $2}' | cut -d "<" -f 1`

if [ -z $YAHOO ]; then

YAHOO=0

fi

echo $(($BING + $YAHOO))

uptime | awk -F"up " '{print $2}'

uname -n

EDIT in April 1, 2011: fixed the script so that it deals with non-existing entries (typically on the first day of the month).

By the way, this is a FreeBSD box; that’s the reason why bash is in /usr/local/bin. In Linux, it’s probably in /bin.

The mrtg.cfg file is mostly standard: it invokes the script shown above, and labels “I” as hits from Google, and “O” as hits from Bing and Yahoo! put together.

If you want to see what the result looks like, look here.

If you’ve been paying attention, yes, both values will reset to zero the first day of every month. That’s exactly how I want it: I want to see if every month it “rises” faster than the month before, or slower, or whatever. :)

AWStats tip: creating static pages (and why it’s a good idea)

AWStats is probably the most popular free statistics package for self-hosted sites (if we don’t count “external” ones such as Google Analytics), and, as any decent Unix sysadmin probably knows, there are several ways of configuring it.

One of them is by having the CGI accessible on the web and having it analyze the logs and generate the statistics on demand. I don’t think many people use it that way, though — not only is it the slowest method, but it could theoretically be used for DOS attacks. Yes, you could put it somewhere private (and it’s probably still a good idea to do so, no matter what method you use), either by using a non-world accessible web server, or by adding authentication. But, still, there are no real advantages to this method, other than being sure you have the absolutely most recent stats. But having the stats of, say, 5 minutes or less ago is, in most cases, more than good enough.

In my experience, most people use an intermediate method: the CGI is still accessible, but is isn’t capable of analyzing logs; it just generates the stats page. The logs themselves are analyzed by the same CGI file, but through a local crontab.

And this is what I had been using until today. Yes, much like in the case of the “cd back” trick, I had been using AWStats for years… and only today did I switch to using fully static pages. A few more of these and someday I may have to turn in my geek card. :)

It’s pretty easy to configure AWStats this way: Here’s my old crontab line:

*/5 * * * * /usr/local/cgi-bin/awstats.pl -config=winterdrake.com -update >/dev/null 2>&1

And here’s my new one (if it word wraps, it’s supposed to be a single line):

*/5 * * * * /usr/local/bin/awstats_buildstaticpages.pl -config=winterdrake.com -update -awstatsprog=/usr/local/cgi-bin/awstats.pl -dir=/var/www/htdocs/awstats

Before that, I had to put awstats_buildstaticpages.pl (included in the AWStats /tools directory) in the /usr/local/bin directory (you may prefer it somewhere else, of course), and create the /var/www/htdocs/awstats directory so that the static files could be put there. And now, they’re accessible on http://myserver/awstats/awstats.winterdrake.com.html . They look exactly the same as if I accessed the CGI directly (which I can still do, in order to see yearly reports, for instance — but I do that very rarely), but let’s do a little benchmarking, shall we?

CGI version:
Requests per second: 2.33 [#/sec] (mean)

Static version:
Requests per second: 4557.69 [#/sec] (mean)

Now, you may be thinking: “yes, the speed is in a completely different order of magnitude, but I don’t look at my stats all the time, and they’re private, so nobody else does… isn’t taking half a second good enough?” Yes, that’s true… but getting rid of limits is always a good thing, because you can then do so much more. Suppose you don’t have half a dozen sites on that server, but a thousand, with statistics for all of them? Suppose you want to use the AWStats stats to generate other stats (for instance, I’m currently using MRTG to plot a graph of Google and Bing referrals, using the AWstats-generated static pages as input) ((here, another advantage becomes obvious: I can now do this through a trivial combination of grep and awk on a static HTML file.))? In both these examples (and I’m sure there are many more), having stats accessible almost instantly and taking up virtually no processing power at all is obviously a Very Good Thing™.

Other advantages: you can move your stats to a (virtual) server that only serves static files, since that what they’ll be. Alternatively, if you had CGI processing enabled just for AWStats, you now can simply turn it off on your web server, improving its security.

Unix/Linux trick: ‘cd’ back to the previous directory

You know when you’re in a very “deep” directory, such as
/usr/local/src/this/that/thatother, and you type “cd” and press enter by mistake ((happens a lot when you think there’s only one directory where you are, so you type "cd <tab><enter>“, but there was in fact more than one, and the tab key didn’t add anything)) and go back to your home directory, and would love to go back to where you were before, without having to type all the path again (or copy and paste it, or do a CTRL-R back search)?

Just type “cd -” (that’s a single dash, or “minus” sign). Believe it or not, I didn’t know that until today, and I’ve used Linux since 1994 or so. Slightly embarrassing, I know. :)

If you enter the command a second time, you will return to where you were before typing the first “cd -“. In other words, the command can be used to toggle between the previous directory and the current one.

It’s also not just a bash thing; I’ve tried it on FreeBSD’s sh and OpenBSD’s default ksh, and it works there as well.

(Found here, after someone asked me and I didn’t know the answer.)

P.S: – Welcome to the first technical post on Winterdrake. (Hey, it’s geeky stuff, too!)