New ‘disposable email’ service: Mail60

Mail60 is a ‘disposable email service’, perfect for receiving confirmation emails from places you don’t trust not to spam you in the future. Mail60 mailboxes are automatically erased after 60 minutes, so you can create one, use that email address somewhere, receive the email(s) you’re expecting, and then forget about the mailbox. For more detail, see the FAQ.

The idea came about a week ago, while I was reading the comments on PZ Myers’ wonderful blog Pharyngula. People there were talking about an internet poll they wanted to vote in, but the site required registration, and it was a right-wing paranoia site, not a place they really wanted to be members of. One commenter suggested using a “disposable email service” such as Mailinator. That was the first time I had heard of those. I found the idea intriguing, and thought about how I could implement such a thing. It looked doable, so I started programming it in my free time, and Mail60 is the result.

I wanted to keep this simple, so I didn’t go for features such as “create a mailbox automatically when receiving mail on a non-existent address” or “forward email to a real mailbox for X days and then stop”. Also, since mailboxes are so ephemeral, features such as filters, address books or folders don’t really make sense. And, of course, the only way to allow instant creation of mailboxes with no verification whatsoever and yet prevent abuse was to disallow email sending. But for the most common use I foresee (receiving confirmation emails), that’s not a problem.

For the techies out there, I’m using PHP, MySQL, Postfix, DBMail, and Hastymail for accessing mailboxes. The (virtual) server runs FreeBSD.

P.S. – yes, this is the “new project” I mentioned a few days ago. 🙂

Upgraded to Ubuntu Natty, and nginx troubles

Just upgraded my home server (where most of my sites are, though this blog is not among them) from Ubuntu Maverick (10.10) to Natty (11.04). The upgrade itself went without any trouble (this server started out with Jaunty, which means it has now been successfully upgraded four times; try doing that in Windows… 🙂), but after the new OS version came up, most of my sites were down: they just showed blank pages. And, oddly, there was nothing in any logs. I’m using nginx and php-fpm.

After pulling my hair out for a while, I noticed that:

  • the problems were restricted to PHP pages, and…
  • … those didn’t work on any of my sites except for the default one (www.dehumanizer.com).

A little googling turned up this post, whose author had run into the same problem and was able to spot the solution. In the /etc/nginx/fastcgi_params file, the upgrade had added (silently, since I had never modified that file) the following line:

fastcgi_param       SCRIPT_FILENAME         $document_root$fastcgi_script_name;

Commenting it out solved everything. My guess is that, since the file in question is included only in my default web site configuration, the $document_root variable never changes, so all my sites (except the default) were pointing the wrong way. Anyway, that line is apparently unnecessary, though I’ll check whether including /etc/nginx/fastcgi_params in every virtual host (maybe you are supposed to do that) and uncommenting that line works as well.
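For anyone who would rather script the change than edit the file by hand, here is a minimal sketch. The function name comment_out_script_filename is made up for this example, and it assumes the offending line looks like the one quoted above. It prints the modified file to standard output instead of editing it in place, so nothing changes until you have reviewed the result.

```shell
# Hypothetical helper (not part of nginx): print the given fastcgi_params
# file with any active SCRIPT_FILENAME line commented out.
comment_out_script_filename() {
    # Prefix '#' to lines that start (after optional blanks) with
    # "fastcgi_param SCRIPT_FILENAME"; all other lines pass through unchanged.
    sed 's/^\([[:space:]]*fastcgi_param[[:space:]][[:space:]]*SCRIPT_FILENAME\)/#\1/' "$1"
}
```

Something like `comment_out_script_filename /etc/nginx/fastcgi_params > /tmp/fastcgi_params.new` would then leave a copy to compare against the original before replacing it.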

AWStats tip: removing ‘chromebar’ from search keyphrases

With this blog still less than a month old, it’s normal that I keep looking at its stats quite often, and one of the most interesting bits (especially if you care about SEO) is the top search keyphrases list. I use both AWStats and Google Analytics for this site, and, concerning the former, I had been curious for a while about why the top search leading people to my blog was apparently for “chromebar”.

Now, I’m pretty sure I had never mentioned that term on this blog, so I thought AWStats might have been doing something wrong. A quick grep on my logs, and where “chromebar” came from became obvious: it’s from the StumbleUpon add-on for Google Chrome. The whole combination was necessary; StumbleUpon users with other browsers weren’t triggering anything like this.

Now, here’s what the “referrer” part of a hit from StumbleUpon, using Chrome with the SU add-on, looks like:

"https://www.stumbleupon.com/toolbar/litebar.php?device=chromebar&version=chromebar%202.9.8.1&ts=1301274701"

While a hit’s referrer from SU with another browser looks like this:

"https://www.stumbleupon.com/refer.php?url=http%3A%2F%2Fwinterdrake.com%2Fbad-comic-panels-3-its-a-membership-card-in-a-subversive-communist-front-organization%2F"

See the difference? The Chrome version’s query string is ?device=chromebar&version=chromebar%202.9.8.1&ts=1301274701 , so “chromebar” appears right at the beginning of it, while the other one only carries a url= parameter holding the address of the page.

Now, StumbleUpon is listed as a “search engine” in AWStats, in a file called lib/search_engines.pm , and that file optionally specifies which part of a query string (e.g. “q=”) from the referrer is the actual search. For some search engines, however, no such part is specified, meaning that they don’t provide it in the “referrer” part of the hit. Such is the case with StumbleUpon, where the query string part is an empty string.

But there seems to be a bug (or maybe it’s an intended feature?) in AWStats here: if no part of a query string is specified, and yet there is a query string, AWStats seems to use the first part it catches. As you can see above, that was ?device= , and it was always set to chromebar.
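To make that guess concrete, here is a small sketch that pulls the value of the first query-string parameter out of a URL. It only mimics the behaviour described above; it is not AWStats code, and the function name first_query_value is invented for the illustration.

```shell
# Illustration only: prints the value of the first query-string parameter
# of the URL given as $1, or returns 1 if the URL has no query string.
first_query_value() {
    case "$1" in
        *\?*) ;;                 # URL has a query string, carry on
        *) return 1 ;;           # no query string, nothing to print
    esac
    query=${1#*\?}               # drop everything up to the first '?'
    first=${query%%&*}           # keep only the first key=value pair
    printf '%s\n' "${first#*=}"  # strip the "key=" part
}
```

Fed the Chrome referrer shown earlier, it prints chromebar, which matches what was turning up in the keyphrase list.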

The easiest way around this problem (I don’t know if the AWStats authors will consider this a bug or not; I’ll try to report the problem in the near future) is to edit lib/search_engines.pm and add a non-existing query string part to StumbleUpon. Open that file with a text editor and look for this line:

'stumbleupon','',

and change it to:

'stumbleupon','qwerty=',

Presto! No more “chromebar” entries in your stats in the future. (It won’t delete current entries, though, unless you clear your AWStats cache files for that month and generate new stats.)
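If you prefer not to open an editor, the same substitution can be sketched with sed. The function name patch_stumbleupon is hypothetical, and the sketch assumes the entry appears exactly as quoted above; it writes to standard output so you can check the result before replacing the real file.

```shell
# Hypothetical helper: print search_engines.pm with the StumbleUpon entry
# given the dummy query-string key "qwerty=" used in the manual edit above.
patch_stumbleupon() {
    sed "s/'stumbleupon','',/'stumbleupon','qwerty=',/" "$1"
}
```

For example, `patch_stumbleupon lib/search_engines.pm > /tmp/search_engines.pm.new` leaves the original untouched until you copy the new version over it.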

Ah, spammers, spammers…

I’m a bit torn. I’ve been looking at the pathetic attempts at automated spam in this blog, and my “I like an intellectual / technical challenge” side just wants to write a post detailing what they’re doing wrong and how they could easily make their tools much more effective.

On the other hand… we’re talking about spammers. The scum of the earth. The only creatures in the world that make lawyers and politicians look like decent, lovable human beings. Now, some may argue that what I could suggest in a couple of minutes wouldn’t be exactly rocket science, that they’d surely have thought of it already… but my point is that they haven’t. Both their methods and the comments themselves that they try to post are terrible and ridiculously easy to detect; anyone with half a brain could do a lot better.

In fact, if they’d just… ah, crap. I just can’t. It’d be like handing a loaded gun to a kid. The biggest asshole-ish jerk of a kid, and a brain-damaged, glue-eating one as well, but still, in some ways, a kid. Who could do a lot of damage with it. But, to get some idea, my “brilliant” suggestion for making comment spam tools much more effective — which would not only hugely improve the odds of beating Akismet, but would also be accepted by many less attentive blog owners — would take… a line and a half of text. This is not because I’m some sort of genius… it’s because they’re morons.

Interestingly, email spam, while still typically primitive, is actually much more advanced than comment spam. There’s actually a bit of thought put into it. Comment spam tools, on the other hand, are the equivalent of a burglar wearing a Beagle Boys-like mask and a stereotypical black and white striped prison suit, and going in broad daylight from house to house, knocking on each door. 🙂

Graphing search engine referrals with AWStats and MRTG

From my previous post, AWStats tip: creating static pages (and why it’s a good idea):

for instance, I’m currently using MRTG to plot a graph of Google and Bing referrals, using the AWstats-generated static pages as input

So… anyone curious? 🙂

Note: again, beware of word wrapping below. Each of the three long grep lines is supposed to be a single line.

cat /root/bin/winterdrake-se.sh

#!/usr/local/bin/bash

# Static AWStats page to read the numbers from
PAGE=/var/www/htdocs/AWstats/awstats.winterdrake.com.html

# First MRTG value ("I"): hits from Google, or 0 if there is no entry yet
GOOGLE=$(grep '>Google</a></td><td>' "$PAGE" | awk -F"<td>" '{print $2}' | cut -d "<" -f 1)
if [ -z "$GOOGLE" ]; then
    GOOGLE=0
fi
echo "$GOOGLE"

# Second MRTG value ("O"): hits from Bing and Yahoo! added together
BING=$(grep '>Microsoft Bing</a></td><td>' "$PAGE" | awk -F"<td>" '{print $2}' | cut -d "<" -f 1)
if [ -z "$BING" ]; then
    BING=0
fi
YAHOO=$(grep '>Yahoo!</a></td><td>' "$PAGE" | awk -F"<td>" '{print $2}' | cut -d "<" -f 1)
if [ -z "$YAHOO" ]; then
    YAHOO=0
fi
echo $((BING + YAHOO))

# MRTG also expects the uptime and the host name
uptime | awk -F"up " '{print $2}'
uname -n

EDIT on April 1, 2011: fixed the script so that it deals with non-existent entries (typically on the first day of the month).

By the way, this is a FreeBSD box; that’s the reason why bash is in /usr/local/bin. In Linux, it’s probably in /bin.
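The same grep, awk and cut pipeline is repeated once per search engine, so it could be folded into one small function. What follows is only a sketch under that assumption: engine_hits is a name made up here, and it relies on the static page having the same table layout the script above depends on.

```shell
# Sketch only: engine_hits is a hypothetical helper, not part of AWStats or MRTG.
# Usage: engine_hits "Engine name" /path/to/awstats-page.html
# Prints the first <td> value that follows the engine's link, or 0 if the
# engine has no row yet (typically on the first day of the month).
engine_hits() {
    hits=$(grep -F ">$1</a></td><td>" "$2" | awk -F"<td>" '{print $2}' | cut -d "<" -f 1)
    echo "${hits:-0}"
}
```

With that in place, the second MRTG value would be something like `echo $(( $(engine_hits 'Microsoft Bing' "$PAGE") + $(engine_hits 'Yahoo!' "$PAGE") ))`, where PAGE holds the path to the static page.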

The mrtg.cfg file is mostly standard: it invokes the script shown above, and labels “I” as hits from Google, and “O” as hits from Bing and Yahoo! put together.

If you want to see what the result looks like, look here.

If you’ve been paying attention, yes, both values will reset to zero the first day of every month. That’s exactly how I want it: I want to see if every month it “rises” faster than the month before, or slower, or whatever. 🙂

AWStats tip: creating static pages (and why it’s a good idea)

AWStats is probably the most popular free statistics package for self-hosted sites (if we don’t count “external” ones such as Google Analytics), and, as any decent Unix sysadmin probably knows, there are several ways of configuring it.

One of them is by having the CGI accessible on the web and having it analyze the logs and generate the statistics on demand. I don’t think many people use it that way, though: not only is it the slowest method, but it could theoretically be used for DoS attacks. Yes, you could put it somewhere private (and it’s probably still a good idea to do so, no matter what method you use), either by using a non-world-accessible web server, or by adding authentication. Still, there are no real advantages to this method, other than being sure you have the absolutely most recent stats, and having the stats of, say, 5 minutes or less ago is, in most cases, more than good enough.

In my experience, most people use an intermediate method: the CGI is still accessible, but it isn’t capable of analyzing logs; it just generates the stats page. The logs themselves are analyzed by the same CGI file, but through a local crontab.

And this is what I had been using until today. Yes, much like in the case of the “cd back” trick, I had been using AWStats for years… and only today did I switch to using fully static pages. A few more of these and someday I may have to turn in my geek card. 🙂

It’s pretty easy to configure AWStats this way. Here’s my old crontab line:

*/5 * * * * /usr/local/cgi-bin/awstats.pl -config=winterdrake.com -update >/dev/null 2>&1

And here’s my new one (if it word wraps, it’s supposed to be a single line):

*/5 * * * * /usr/local/bin/awstats_buildstaticpages.pl -config=winterdrake.com -update -awstatsprog=/usr/local/cgi-bin/awstats.pl -dir=/var/www/htdocs/awstats

Before that, I had to put awstats_buildstaticpages.pl (included in the AWStats /tools directory) in the /usr/local/bin directory (you may prefer it somewhere else, of course), and create the /var/www/htdocs/awstats directory so that the static files could be put there. And now, they’re accessible on https://myserver/awstats/awstats.winterdrake.com.html . They look exactly the same as if I accessed the CGI directly (which I can still do, in order to see yearly reports, for instance — but I do that very rarely), but let’s do a little benchmarking, shall we?

CGI version:
Requests per second: 2.33 [#/sec] (mean)

Static version:
Requests per second: 4557.69 [#/sec] (mean)
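To put those two figures side by side, here is a quick bit of arithmetic done with awk. The two rates are copied from the benchmark output above; the function names are just labels chosen for this sketch.

```shell
# Requests-per-second figures copied from the benchmark output above.
CGI_RPS=2.33
STATIC_RPS=4557.69

# Print how many times faster the second rate is than the first, rounded.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.0f\n", fast / slow }'
}

# Print the mean time per request, in milliseconds, for a given rate.
ms_per_request() {
    awk -v rps="$1" 'BEGIN { printf "%.2f\n", 1000 / rps }'
}
```

Running `speedup "$CGI_RPS" "$STATIC_RPS"` gives 1956, and `ms_per_request` gives about 429 ms per request for the CGI version against roughly 0.22 ms for the static page.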

Now, you may be thinking: “yes, the speed is in a completely different order of magnitude, but I don’t look at my stats all the time, and they’re private, so nobody else does… isn’t taking half a second good enough?” Yes, that’s true… but getting rid of limits is always a good thing, because you can then do so much more. Suppose you don’t have half a dozen sites on that server, but a thousand, with statistics for all of them? Suppose you want to use the AWStats stats to generate other stats (for instance, I’m currently using MRTG to plot a graph of Google and Bing referrals, using the AWstats-generated static pages as input)? Here, another advantage becomes obvious: I can now do this through a trivial combination of grep and awk on a static HTML file. In both these examples (and I’m sure there are many more), having stats accessible almost instantly and taking up virtually no processing power at all is obviously a Very Good Thing™.

Other advantages: you can move your stats to a (virtual) server that only serves static files, since that’s what they’ll be. Alternatively, if you had CGI processing enabled just for AWStats, you can now simply turn it off on your web server, improving its security.

Unix/Linux trick: ‘cd’ back to the previous directory

You know when you’re in a very “deep” directory, such as /usr/local/src/this/that/thatother, and you type “cd” and press enter by mistake (happens a lot when you think there’s only one directory where you are, so you type “cd <tab><enter>”, but there was in fact more than one, and the tab key didn’t add anything) and go back to your home directory, and would love to go back to where you were before, without having to type the whole path again (or copy and paste it, or do a CTRL-R back search)?

Just type “cd -” (that’s a single dash, or “minus” sign). Believe it or not, I didn’t know that until today, and I’ve used Linux since 1994 or so. Slightly embarrassing, I know. 🙂

If you enter the command a second time, you will return to where you were before typing the first “cd -”. In other words, the command can be used to toggle between the previous directory and the current one.

It’s also not just a bash thing; I’ve tried it on FreeBSD’s sh and OpenBSD’s default ksh, and it works there as well.
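If you want to try it without wandering around your real directories, here is a small self-contained demonstration. It is only a sketch: the function name cd_back_demo is invented, and it works inside a throwaway directory created with mktemp, which it removes afterwards.

```shell
# Demonstration of "cd -" using throwaway directories, so nothing outside
# a temporary directory is touched. The body runs in a subshell, which
# means your own shell's current directory is left alone.
cd_back_demo() (
    base=$(mktemp -d) || exit 1
    mkdir -p "$base/this/that/thatother"
    cd "$base/this/that/thatother" || exit 1
    cd "$base" || exit 1          # the "oops" step: we left the deep directory
    cd - >/dev/null || exit 1     # jump back; cd - prints the directory, silenced here
    pwd                           # prints the deep directory again
    cd / && rm -rf "$base"
)
```

Calling `cd_back_demo` prints a path ending in /this/that/thatother, showing that the single “cd -” brought the shell back to the deep directory.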

(Found here, after someone asked me and I didn’t know the answer.)

P.S. – Welcome to the first technical post on Winterdrake. (Hey, it’s geeky stuff, too!)