Search engine hits: Google vs. Bing (part 2: positions)

In my previous post, Search engine hits: Google vs. Bing (part 1), I showed how three of my sites (a recent US-hosted blog in English, a popular Portugal-hosted forum in Portuguese, and a search engine-weak US-hosted aggregator in English) had a much greater proportion of search engine hits from Google than from Bing: about 100 Google hits for each 2-3 Bing hits. Naturally, the next step would be to investigate whether this is just a matter of quantity (in other words, Bing has 2-3% of the users Google has), or whether there are other factors affecting this, such as the three sites having better rankings in Google than in Bing. This, then, is the subject of this post.

The method I’ll be using is the following: for each of the same three sites used in the last post, I’ll be searching for a number of phrases that I think should, reasonably, lead a user to the site. I won’t look at my stats to get the top phrases from there; as most hits come from Google, I’d just be using phrases that I already know work for that engine. And, for each phrase, I’ll note the position my site came in, if at all.

Remember that a lower number is better. Also, a caveat: I am not really testing how “good” each search engine is, only how good it is at sending people to my sites from appropriate queries. I have, however, avoided “gaming the system”; the queries are reasonable and not “custom-designed” to lead people to where I want, and the pages / sites are relevant to them.

Note: all queries were as written below; i.e. without quotes.
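A minimal sketch of how this bookkeeping could be automated, assuming each results page has been saved by hand as a plain text file with one result URL per line, in ranking order. That file format and the function name `rank_of` are assumptions made for illustration; nothing here queries a search engine.

```sh
#!/bin/sh
# rank_of DOMAIN FILE
# Prints the 1-based position of the first line in FILE that contains DOMAIN,
# or "-" if no line matches. Lower numbers are better.
rank_of() {
    awk -v d="$1" '
        index($0, d) { print NR; found = 1; exit }
        END { if (!found) print "-" }
    ' "$2"
}
```

For example, `rank_of winterdrake.com google-results.txt` (a hypothetical file name) would print the position of the first result pointing at that site, or `-` if it is not in the saved list.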

Results

Winterdrake:

For this very blog, I searched for things I know I posted about — but not the exact titles. Instead, I tried strings that I myself would use if searching for those subjects:

[table id=1 /]

DragonBall-PT:

For that MyBB forum with mostly Brazilian and Portuguese users, I didn’t go for particular posts, just some strings that a Portuguese-speaking person could possibly use, and I counted the first forum post that appeared. I tried to diversify; hence the “dragonball”, “dragon ball” and “dbz”. I also tried both Google.com and Google.pt, which seem to prioritize pages differently (perhaps because that site is hosted in Portugal); Bing.pt, however, was exactly the same as Bing.com in terms of results, only with the user interface in Portuguese.

[table id=2 /]

Planet Atheism:

And, finally, my FeedWordPress-based blog aggregator. As I explained in the previous post, PA doesn’t ever show posts individually; there are only the main page (and, page 2, page 3, etc.), author pages, and archives by date. On all of these, the post links go to the original blog posts; therefore, search engines treat all of PA as non-original content (and this is by design); because of that, searching for something in a particular post is mostly useless. My alternative was to search for more general terms:

[table id=3 /]

(1) category pages that included the post were indexed, but not the post itself
(2) not in the first 100 results, at least

Conclusions

For Winterdrake, Bing was a pleasant surprise, winning 5.5-0.5. Not only did it apparently index every post in the site, even some very recent ones, but it also returned a first page result for everything I threw at it — including two “number ones” that Google doesn’t even seem to currently index, for some reason. Therefore, a Bing user actually has a greater chance of getting to Winterdrake — assuming he or she is interested in something on it — than a Google user. In other words, Bing wins on “quality” — but still only sends me 3% of the users Google does. Are there really so few people using Bing? I had figured something like a 10-20% market share, not 2-3%…

As for the few non-indexed posts on Google, when the category pages have been indexed after the posts were written, I assume the reason for that is the fact that Winterdrake is still very new, and has almost no incoming links. In a couple of months, I expect that new posts will be quickly indexed by Google, as Bing already does. Still, there’s currently an advantage for Bing here.

DragonBall-PT gets mixed results. Compared to Google.com, Bing wins 3-2. Compared to Google.pt, however, Bing loses 1.5-3.5, meaning that “which” of the Googles people use is an important factor. Still, if we assume that most people use “their” national Google (I don’t, but I’m weird…), Google wins this time, and that was reflected in the results on the last post: Bing sent me only 2% of the people Google (all of them put together) did.

Finally, for Planet Atheism, Bing is a big disappointment, losing 0-3 to Google. 🙁 Either because it “hates” non-original content to a much greater degree than Google does, or because it is less “intelligent” in adapting to / understanding similar terms, it failed to show PA in the first 100 results for two thirds of the search strings I used, while Google did great with two of them, and even had a reasonably decent result for an incredibly competitive, 1-word term such as “atheism”.

Another thing I noticed is that most of the first couple of results pages in Bing for my atheism-related queries actually showed anti-atheism blogs (“atheism sucks”, “stop atheism”, and several others). That’s like searching for “democratic party” and getting mostly Republican propaganda pages. 🙂 I don’t want to sound paranoid… it’s probably just an algorithmic thing, instead of there being a less-than-honest religious programmer at Microsoft… 😉

EDIT: I’ve just noticed that the three tables above look quite bad on the RSS feed. If you want them more readable, please see the original blog post. I apologize for the inconvenience.

Search engine hits: Google vs. Bing (part 1)

Having been interested in SEO for a long time, one of the things I naturally do is look at how many people are coming from the several search engines. While many people seem to care only about hits from Google, I think that 1) monopolies are bad, and 2) precisely because so many people optimize only for Google, there may be an “untapped market” of people who use other search engines. So, I’ve been doing a little investigating, and here’s what I found. Today, I will only be comparing Google with Microsoft’s Bing; I may look at Yahoo! or Ask.com some other time.

So, let’s begin with this very blog, Winterdrake. Hosted in the US, it’s a very young site, having only launched on March 3, 2011… which means that it’s just 22 days old today. Naturally, one shouldn’t expect a lot of search engine hits on such a recent site (and there aren’t), but it allows us to look at yet another interesting factor: which search engine is quicker to index and send hits to new sites? Let’s look at a (terrible-looking; I’m no graphic designer, and it shows) chart:

Winterdrake - Google vs. Bing hits, March 2011
Bing hits: 3.36% of Google hits

Not very good in terms of Bing hits, is it? Let’s look at another, more popular and established site: DragonBall-PT, a Portuguese-language forum, hosted in Portugal, launched in 2007, with mostly Brazilian and Portuguese users, which has a few thousand hits per day:

DragonBall-PT - Google vs. Bing hits, March 2011
Bing hits: 2.12% of Google hits

As you can see, while, as an absolute number, there are a lot more hits from Bing (a little below 2000), the proportion in relation to Google is even worse.

This can be explained in several ways. First, Bing seems to be more optimized for the US and/or English language sites, while Google appears to be more “international”. Another possibility, of course, is that almost no one in Portugal or Brazil uses Bing. But, to discount the “optimized for the US/English” factor, let’s look at a third site, one that is 1) relatively old, and 2) in English, and hosted in the US.

Planet Atheism is the world’s top aggregator of atheism-related blogs. It’s (by design) not good for search engines, as all its content is duplicated from the member blogs (and every member either asked to join or accepted an invitation; no one is aggregated there without permission), and PA doesn’t show individual posts ever. Click on a post title, and you are taken to the post on the original blog. In other words, PA is by design “below” every single one of its blogs in terms of search engine positions; most hits come from people who actually search for PA itself, or for “atheist blog(s)” or something like that.

On the other hand, it’s an established site (launched in 2006), hosted in the US, and in English. Let’s see what it looks like:

Planet Atheism - Google vs. Bing hits, March 2011
Bing hits: 2.20% of Google hits

Ouch. Not very good for Bing, is it?
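The percentages under the three charts are simply Bing hits divided by Google hits. A minimal sketch of that arithmetic in shell; the function name `pct_of` and the numbers in the usage example are made up for illustration, since the raw hit counts are not listed here.

```sh
#!/bin/sh
# pct_of PART WHOLE
# Prints PART as a percentage of WHOLE with two decimals,
# or "n/a" when WHOLE is zero (e.g. no Google hits yet).
pct_of() {
    awk -v part="$1" -v whole="$2" 'BEGIN {
        if (whole == 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * part / whole
    }'
}
```

For example, `pct_of 212 10000` prints `2.12`.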

So, we’ve shown that Bing is sending about 2-3 hits for every 100 sent by Google, a very small percentage. But now for the million dollar question: why? Logically, there are two possibilities:

  1. Bing is performing “worse”: that is, it either doesn’t index sites as well as Google does, or it gives them worse positions (possibly off the first page for queries where Google shows them in the first 10 results); or
  2. Bing is actually performing “as well” as Google (or possibly even better), but it has only about 2-3% of the users Google has.

Note that, in either case, Google has a lot more users; the question is whether, for the same number of users, Bing is performing worse, better, or the same. And I think we can find out… but this post is getting a bit long, so I’ll leave that for Part 2.

When is nudity “art”, and when is it porn?

“Yeah, but…” Fred Colon hesitated here. He knew in his heart that spinning upside down around a pole wearing a costume you could floss with definitely was not Art, and being painted lying on a bed wearing nothing but a smile and a small bunch of grapes was good solid Art, but putting your finger on why this was the case was a bit tricky.

“No urns,” he said at last.

“What urns?” said Nobby.

“Nude women are only Art if there’s an urn in it,” said Fred Colon. This sounded weak even to him, so he added: “or a plinth ((no, I didn’t know that one either until today.)). Both is best, o’course. It’s a secret sign, see, that they put in to say that it’s Art and okay to look at.”

“What about a potted plant?”

“That’s okay if it’s in an urn.”

“What about if it’s not got an urn or a plinth or a potted plant?” said Nobby.

“Have you got one in mind, Nobby?” said Colon suspiciously.

“Yes, The Goddess Anoia ((Anoia is the Ankh-Morpork Goddess of Things That Get Stuck in Drawers.)) Arising from the Cutlery,” said Nobby. “They’ve got it here. It was painted by a bloke with three i’s in his name, which sounds pretty artistic to me.”

“The number of i’s is important, Nobby,” said Sergeant Colon gravely, “but in these situations you have to ask yourself: ‘Where’s the cherub? If there’s a little pink fat kid holding a mirror or a fan or similar, then it’s still okay. Even if he’s grinning. Obviously you can’t get urns everywhere.”

— Terry Pratchett, Thud!, 2005

So, you see, it’s easy. 🙂

Ah, spammers, spammers…

I’m a bit torn. I’ve been looking at the pathetic attempts at automated spam in this blog, and my “I like an intellectual / technical challenge” side just wants to write a post detailing what they’re doing wrong and how they could easily make their tools much more effective.

On the other hand… we’re talking about spammers. The scum of the earth. The only creatures in the world that make lawyers and politicians look like decent, lovable human beings. Now, some may argue that what I could suggest in a couple of minutes wouldn’t be exactly rocket science, that they’d surely have thought of it already… but my point is that they haven’t. Both their methods and the comments themselves that they try to post are terrible and ridiculously easy to detect; anyone with half a brain could do a lot better.

In fact, if they’d just… ah, crap. I just can’t. It’d be like handing a loaded gun to a kid. The biggest asshole-ish jerk of a kid, and a brain-damaged, glue-eating one as well, but still, in some ways, a kid. Who could do a lot of damage with it. But, to get some idea, my “brilliant” suggestion for making comment spam tools much more effective — which would not only hugely improve the odds of beating Akismet, but would also be accepted by many less attentive blog owners — would take… a line and a half of text. This is not because I’m some sort of genius… it’s because they’re morons.

Interestingly, email spam, while still typically primitive, is actually much more advanced than comment spam tools. There’s actually a bit of thought put into it. Comment spam tools, on the other hand, are the equivalent of a burglar wearing a Beagle Boys-like mask and a stereotypical black and white striped prison suit, and going in broad daylight from house to house, knocking on each door. 🙂

Medieval: Total War (PC, 2002)

Note: this post is unchanged from one from 2005 in my old blog, The Games of My Life. But please see the new section at the end.

This game has a big problem. The load times. For some reason, in my Athlon XP 2000 with 1 GB of RAM and a fast hard drive, they’re huge – not “read a book”-like, but, still, 30-60 seconds to load a battle and 30-60 seconds to come back to the main map are, IMO, too much. Especially since Rome: Total War, their more recent and even more detailed game, actually has shorter load times.

That’s the problem. In almost every other respect, Medieval: Total War is virtually perfect.

Medieval: Total War - campaign map
M:TW - campaign map

M:TW, like its predecessor Shogun: Total War and its successor Rome: Total War, is a historical turn-based strategy game with fantastic real-time battles. These are really wonderful – no other game, except perhaps Close Combat, simulates a battle so well – and that one was squad-based. This one, though, can have armies of 10,000 men. On each side. And they all move, shout, fight and, possibly, die.


What’s so funny about ‘The Human Top’?

Some of you may recall how, in my recent post about the Wasp’s useful and insightful contribution to a discussion in a room otherwise full of men, as I mentioned that they were trying to capture the Human Top, I included this aside:

a villain whom nobody could take seriously until he later changed his name to Whirlwind, and got himself a new costume that didn’t look like he had a giant onion for a head…

Some, however, may not immediately see what’s funny — and dumb — about a supposedly “serious” villain (i.e. not one simply played for laughs) calling himself “The Human Top”. Especially in the case of non-English native speakers (not that I’m one myself, but…). Mainly because “top“, in this context, is a term whose meaning many people won’t know, mostly because 1) it’s already a common word, as the opposite of “bottom“, and 2) its meaning here refers to something that, while centuries old (if not millennia — I’ve just investigated, and it isn’t known), is relatively unseen these days — most people probably grow up without ever seeing one or even hearing it mentioned except perhaps when their parents or grandparents reminisce about the “good old days” and what they played with when they were kids, instead of these new-fangled Nintendos and Playstations.

This, then, is a top, also known as a spinning top:

Top

And, since the guy’s power was to spin around very fast, that’s naturally what Stan Lee named him after. 🙂 Lee was a great creator, but from time to time he came up with very dubious names for characters or teams: did you know that his original name for the X-Men, overruled by his publisher, was “the Merry Mutants”? The prosecution rests. 🙂

And, naturally, if a character was called “the Human Top”, it made sense that he looked like a top, right? So, here’s the guy:

The Human Top
Source: Tales to Astonish #50, 1964

Yup. He wore a helmet in the shape of a spinning top. Though, to me, it looks more like an onion. 🙂

What’s interesting is that, as I said, this character was supposed to be serious; indeed, he was Giant-Man’s first foe after he added the “Gi” part to his name (before that, he was simply Ant-Man). And his ability (spinning around at an incredible speed) was actually very powerful and effective: he could move extremely fast, was virtually impossible to grab or hit, could “fly” up simply by spinning very fast in one place, and in one later story, he actually managed to beat Quicksilver, Marvel’s equivalent of the Flash. Not only that, he was one of the most intelligent villains at the time, being both a good planner capable of subtlety, and a quick thinker. But who could ever take seriously a guy named after an old children’s toy and who looked like an onion bulb? 🙂 So, nobody can blame him for later changing his name to Whirlwind, and donning a new costume.

Now, I could tell you about the Trapster, formerly known as Paste-Pot Pete… 🙂

Graphing search engine referrals with AWStats and MRTG

From my previous post, AWStats tip: creating static pages (and why it’s a good idea):

for instance, I’m currently using MRTG to plot a graph of Google and Bing referrals, using the AWstats-generated static pages as input

So… anyone curious? 🙂

Note: again, beware of word wrapping below. I’ve added an empty line between any “true” line of text; if you see two or more lines together, it’s supposed to be just one.

cat /root/bin/winterdrake-se.sh

 

#!/usr/local/bin/bash

GOOGLE=`grep \>Google\</a\>\</td\>\<td\>
/var/www/htdocs/AWstats/awstats.winterdrake.com.html | awk -F"<td>" '{print $2}' | cut -d "<" -f 1`

if [ -z "$GOOGLE" ]; then

GOOGLE=0

fi

echo $GOOGLE

BING=`grep \>Microsoft\ Bing\</a\>\</td\>\<td\>
/var/www/htdocs/AWstats/awstats.winterdrake.com.html | awk -F"<td>" '{print $2}' | cut -d "<" -f 1`

if [ -z "$BING" ]; then

BING=0

fi

YAHOO=`grep \>Yahoo\!\</a\>\</td\>\<td\>
/var/www/htdocs/AWstats/awstats.winterdrake.com.html | awk -F"<td>" '{print $2}' | cut -d "<" -f 1`

if [ -z "$YAHOO" ]; then

YAHOO=0

fi

echo $(($BING + $YAHOO))

uptime | awk -F"up " '{print $2}'

uname -n

EDIT on April 1, 2011: fixed the script so that it deals with non-existing entries (typically on the first day of the month).

By the way, this is a FreeBSD box; that’s the reason why bash is in /usr/local/bin. In Linux, it’s probably in /bin.
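The three near-identical grep/awk/cut pipelines above could be folded into one helper. This is a sketch rather than a drop-in replacement: the function name `se_hits` is made up, it sticks to plain `sh` syntax so it does not depend on where bash lives, and it assumes the static page keeps each engine on a single line of the form `>Name</a></td><td>hits</td>`, which is what the grep patterns above rely on.

```sh
#!/bin/sh
# se_hits NAME FILE
# Prints the hit count shown for search engine NAME in the AWStats static
# page FILE, or 0 when the engine has no entry (or FILE cannot be read).
se_hits() {
    hits=$(grep -F ">$1</a></td><td>" "$2" 2>/dev/null \
        | awk -F"<td>" '{print $2}' \
        | cut -d "<" -f 1 \
        | head -n 1)
    # Fall back to 0 so that MRTG always receives a number.
    echo "${hits:-0}"
}
```

With that, the first MRTG value would be `se_hits Google "$PAGE"` and the second `echo $(( $(se_hits "Microsoft Bing" "$PAGE") + $(se_hits "Yahoo!" "$PAGE") ))`, where `PAGE` is a variable holding the path to the static HTML file.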

The mrtg.cfg file is mostly standard: it invokes the script shown above, and labels “I” as hits from Google, and “O” as hits from Bing and Yahoo! put together.
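The actual mrtg.cfg is not reproduced here, so the following is only a guess at what a matching target could look like. The target name `winterdrake-se`, the `MaxBytes` ceiling, the titles, and the choice of options are all assumptions; the keywords themselves are standard MRTG ones.

```
# Hypothetical target; the name, MaxBytes value and texts are illustrative only.
Target[winterdrake-se]: `/root/bin/winterdrake-se.sh`
MaxBytes[winterdrake-se]: 1000000
Options[winterdrake-se]: gauge, nopercent, growright
Title[winterdrake-se]: Winterdrake - search engine referrals
PageTop[winterdrake-se]: <h1>Winterdrake - search engine referrals</h1>
YLegend[winterdrake-se]: Hits this month
ShortLegend[winterdrake-se]: hits
LegendI[winterdrake-se]: Google:
LegendO[winterdrake-se]: Bing + Yahoo!:
```

The `gauge` option tells MRTG to plot the two numbers as they are instead of treating them as ever-increasing counters, which suits values that are expected to drop back to zero.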

If you want to see what the result looks like, look here.

If you’ve been paying attention, yes, both values will reset to zero on the first day of every month. That’s exactly how I want it: I want to see if every month it “rises” faster than the month before, or slower, or whatever. 🙂

AWStats tip: creating static pages (and why it’s a good idea)

AWStats is probably the most popular free statistics package for self-hosted sites (if we don’t count “external” ones such as Google Analytics), and, as any decent Unix sysadmin probably knows, there are several ways of configuring it.

One of them is by having the CGI accessible on the web and having it analyze the logs and generate the statistics on demand. I don’t think many people use it that way, though — not only is it the slowest method, but it could theoretically be used for DOS attacks. Yes, you could put it somewhere private (and it’s probably still a good idea to do so, no matter what method you use), either by using a non-world accessible web server, or by adding authentication. But, still, there are no real advantages to this method, other than being sure you have the absolutely most recent stats. But having the stats of, say, 5 minutes or less ago is, in most cases, more than good enough.

In my experience, most people use an intermediate method: the CGI is still accessible, but it isn’t capable of analyzing logs; it just generates the stats page. The logs themselves are analyzed by the same CGI file, but through a local crontab.

And this is what I had been using until today. Yes, much like in the case of the “cd back” trick, I had been using AWStats for years… and only today did I switch to using fully static pages. A few more of these and someday I may have to turn in my geek card. 🙂

It’s pretty easy to configure AWStats this way. Here’s my old crontab line:

*/5 * * * * /usr/local/cgi-bin/awstats.pl -config=winterdrake.com -update >/dev/null 2>&1

And here’s my new one (if it word wraps, it’s supposed to be a single line):

*/5 * * * * /usr/local/bin/awstats_buildstaticpages.pl -config=winterdrake.com -update -awstatsprog=/usr/local/cgi-bin/awstats.pl -dir=/var/www/htdocs/awstats

Before that, I had to put awstats_buildstaticpages.pl (included in the AWStats /tools directory) in the /usr/local/bin directory (you may prefer it somewhere else, of course), and create the /var/www/htdocs/awstats directory so that the static files could be put there. And now, they’re accessible on https://myserver/awstats/awstats.winterdrake.com.html . They look exactly the same as if I accessed the CGI directly (which I can still do, in order to see yearly reports, for instance — but I do that very rarely), but let’s do a little benchmarking, shall we?

CGI version:
Requests per second: 2.33 [#/sec] (mean)

Static version:
Requests per second: 4557.69 [#/sec] (mean)
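The “Requests per second … (mean)” lines have the format printed by ApacheBench (`ab`), although the exact command is not shown. A sketch of how such a comparison could be run and summarized; the request count, the concurrency, and the URL in the commented-out line are placeholders, and `speedup` is a made-up helper name.

```sh
#!/bin/sh
# Example ApacheBench run (commented out; it needs a reachable server):
#   ab -n 100 -c 1 https://myserver/awstats/awstats.winterdrake.com.html

# speedup FAST SLOW
# Prints how many times larger FAST is than SLOW, rounded to a whole number.
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.0f\n", fast / slow }'
}

speedup 4557.69 2.33
```

With the two figures above, this prints 1956: the static page is served roughly two thousand times faster than the CGI version.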

Now, you may be thinking: “yes, the speed is in a completely different order of magnitude, but I don’t look at my stats all the time, and they’re private, so nobody else does… isn’t taking half a second good enough?” Yes, that’s true… but getting rid of limits is always a good thing, because you can then do so much more. Suppose you don’t have half a dozen sites on that server, but a thousand, with statistics for all of them? Suppose you want to use the AWStats stats to generate other stats (for instance, I’m currently using MRTG to plot a graph of Google and Bing referrals, using the AWstats-generated static pages as input) ((here, another advantage becomes obvious: I can now do this through a trivial combination of grep and awk on a static HTML file.))? In both these examples (and I’m sure there are many more), having stats accessible almost instantly and taking up virtually no processing power at all is obviously a Very Good Thing™.

Other advantages: you can move your stats to a (virtual) server that only serves static files, since that’s what they’ll be. Alternatively, if you had CGI processing enabled just for AWStats, you can now simply turn it off on your web server, improving its security.

Bad Comic Panels #2: “If there’s one thing I like, it’s being in a room full of men!”

Wasp: "If there's one thing I like, it's being in a room full of MEN!"
Source: Tales to Astonish #51, 1964

Sometimes, things that were common and acceptable at one time become unintentionally funny decades later. A great example is the panel above. Giant-Man (Hank Pym) and several government types are discussing how they’ll frustrate the plans of the Human Top ((a villain whom nobody could take seriously until he later changed his name to Whirlwind, and got himself a new costume that didn’t look like he had a giant onion for a head…)). So what does the Wasp, a.k.a. Janet Van Dyne, Giant-Man’s girlfriend and sidekick, co-founder of the Avengers, who in the future would get to be one of the most successful leaders of that group ((in Roger Stern’s excellent run)), think of the entire situation, and what insight will she add to the discussion?

"Mmmm, if there's one thing I like, it's being in a room full of MEN!"

Yup. 🙂

And this was in a comic by the top creative team at the time, Stan Lee and Jack Kirby.

Now, stuff like this was actually common at the time, and nobody batted an eye at it, or even saw any possible implications in a young woman claiming to… ahem… love being in a room full of men. 🙂 Those were indeed sexist times, and that included comics; a woman’s goal was, basically, to get married and settle down, and a “proper” woman looked up to men, depended on them, and remained silent while the males discussed the “important stuff”. Even some earlier, innovative female characters weren’t much better: remember that, when Wonder Woman joined the Justice Society, she was the secretary of the group (though, of course, that’s been retconned since then). There wouldn’t be real independent women in mainstream comics until the 70s, with Ms. Marvel (Carol Danvers) being an early example; she was actually billed at the time as Marvel’s first feminist heroine.

Deathstalker II: Duel of the Titans

In my teen years (read “the Eighties”), I remember watching a movie, rented on a VHS tape (remember those?) from a video club (remember those?). What I remembered from it was that it was a fantasy movie, made with a low budget, with some female nudity, and which didn’t take itself too seriously. Its name was Deathstalker II: Duel of the Titans.

After more than 20 years, I found it again (not that I kept looking for it for all this time; I remember looking for it on DVD on Amazon.com some years ago, but I think at the time it wasn’t available, and that was it), re-watched it, and, yes, it’s better than I remembered. 🙂 From what I learned by reading Wikipedia and TV Tropes, this is the only movie from this low-budget series (there are 4 in total) where they didn’t take it too seriously, but instead made fun of the fantasy / “Sandals & Sorcery / Conan-like” genre, and in general had a good time. And it shows: the actors are clearly having fun. 🙂

It’s not a “deep” or brilliant movie, of course, but it’s funny, entertaining, and it’s a perfect “beer with friends” film, in my opinion. Try to find it; if you liked the description so far, you won’t regret it.

And it includes this scene, which makes me laugh every time I see it, and which, incidentally, is just my second upload to YouTube ever: