The War on Spam: Google Fights Back

Google is engaged in a war. It is a war on spam. With new strategies and filters ready to put into place, the search engine is adding new firepower to its arsenal almost daily. Webmasters and SEO consultants alike are terrified, fearing what the future holds for them. But for those of us who believe in the cause, the future isn't scary. In fact, the future looks very bright.

My ten-year-old son is fascinated with war. He has a dozen buckets full of army men, and makes everything a battlefield: the kitchen, my bedroom, and even the bathroom. He has a new bicycle helmet that's army green. For Halloween, when other kids were Spiderman and Batman, he was a soldier. He constantly plays computer games like Soldiers of WWII and Battlefield 1942; he even turns brooms and mops into weapons to combat the invisible enemy. War is all he talks about. He loves movies like Saving Private Ryan, Pearl Harbor, and Platoon. He knows more about both World Wars and Vietnam than I'll ever hope to, or care to, know. His obsession with war got me thinking about how it applies to what I do every day. What do SEO and war have in common? More to the point, how does Google implement strategies that declare war on spam?

SEO is a constant struggle to get our clients' websites to the top. We combat lousy SEO companies that give us a bad name, flagrant ads that claim they can do what we do for only $29 by submitting your site to a thousand search engines, and other little annoyances that pop up every day. Even so, my small battles are really nothing when you compare them to the war that Google is waging. Google's number one goal is to bring the visitor the most relevant results possible in a search engine. This means filtering and sorting through all the junk out there, so that you, the visitor, don't have to.

"It's an arms race," Steve Linford, director of the London-based SpamHaus Project, said. "The more we lock (spammers) down, the more techniques they try to get around us." The SpamHaus Project is a nonprofit organization that posts information about the groups behind the majority of unsolicited email, and maintains a "black hole" list of domains from which spammers operate. Spam accounted for at least one in four email messages a business received in 2002. The U.S. Attorney General's website has an entire page on the subject. "Almost 45 percent of all email is now spam and that number is growing each year. Nearly three trillion spam messages are sent each year - 13 times the total snail mail delivered by the U.S. Postal Service. The average wired American is hit with nearly 2,200 spam messages annually - this after most ISPs have filtered 80-90 percent of the junk messages. Some reports indicate that these numbers could increase by five times in the near future."

Market research firm Gartner Inc. estimates that a company of over 10,000 employees suffers more than $13 million worth of lost productivity because of internally generated spam. This is just email spam. Throw in the spam on the internet, and it's a huge productivity drain. It causes companies financial losses because they have to purchase more high-tech software like spam blockers and spyware removers, and it's a strain on system servers and bandwidth.

Google defines Internet Spam as any unwanted information or propaganda that may have been received through deceptive measures on the part of the sender. To a search engine, spam is hyperlinked pages that are intent on misleading the search engine. It is estimated that 80% of search results for any keyword phrase entered into a search engine are considered spam.

During World War II, the term propaganda earned its negative connotation because of the intended deceptions Nazi Germany used to dispirit those on the front lines. Soldiers and citizens were constantly bombarded with this new psychological weapon. Most propaganda in Germany was produced by the Ministry for Public Enlightenment and Propaganda, or PROMI. Joseph Goebbels was placed in charge of this ministry shortly after Adolf Hitler took power in 1933. Hitler was impressed by the power of Allied propaganda during World War I and believed that it had been a primary cause of the collapse of morale and revolts in the German home front and Navy in 1918. The Nazis had no moral qualms about spreading propaganda which they themselves knew to be false, and indeed spreading deliberately false information was part of a doctrine known as the "Big Lie", the theory Hitler wrote about in his book, Mein Kampf. In Mein Kampf, Hitler wrote that people came to believe that Germany was defeated in the First World War in the field due to a propaganda technique used by Jews who were influential in the German press.

"British and Allied fliers were depicted as cowardly murderers and Americans in particular as gangsters in the style of Al Capone. At the same time, German propaganda sought to alienate Americans and British from each other, and both these Western belligerents from the Soviets." -- World War 2 Propaganda

The propaganda was effective to a degree; however, it was repudiated by the Allied Powers' own positive and truthful doctrine.

Now, the term propaganda has come to mean "information that is spread for the purpose of promoting some cause, such as a doctrine in a war." It's ironic that Google used this word when it defined Internet Spam.

Google trademarked the term "TrustRank" and is working on a new spam-removing model, explained in what forum posters are referring to as the Stanford White Paper. "Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine's results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites." This comes from the abstract of a 12-page paper, "Combating Web Spam with TrustRank", on Stanford University's website, which outlines the methodology of TrustRank.

In summary, TrustRank is a way to cut down on spam and filter out content that is not relevant to searchers, in order to bring them the results they really want. It does this by branding good sites with a high trust rating and by stamping spam sites as untrustworthy, including any site that links to those spam sites. Google's abstract says, "Human editors help search engines combat search engine spam, but reviewing all content is impractical. TrustRank places a core vote of trust on a seed set of reviewed sites to help search engines identify pages that would be considered useful from pages that would be considered spam. This trust is attenuated to other sites through links from the seed sites." Google's famous PageRank seems to have lost meaning, as sites are easily able to produce back links or purchase them, which defeats the purpose of PageRank. In my opinion, TrustRank makes more sense. It makes a webmaster more careful about whom he or she links to in the first place, making back links harder to get, but well worth the reward once they are earned.

Another way Google is fighting Internet spam is called the "Sandbox Effect". The Sandbox Effect is essentially a delay of a few months between the time a site is spidered and the time it is indexed. Sometimes, a new site may initially receive a high ranking in the search engines, and then drop into search engine obscurity. It may receive no PageRank, and can be virtually invisible in the search engines for up to 120 days. While this may seem like a penalty to new website owners, especially if they are unaware of the new filters or how and why they work, it is Google's way of fighting spam. The reasoning is that in the "sandbox" (so named for the analogy of a bunch of new kids playing in the sandbox together, away from the grownups), spammers won't see the results of their efforts in the search engine, and may be fooled into thinking they've either been caught or their efforts have been futile. Google hopes the spammers will then simply give up and go away. In war, we call this technique flanking: hoping to catch the enemy off guard by coming around behind their line, causing them to panic or withdraw. The desired result of the Sandbox Effect is that the spammers will most likely do both, panic and withdraw, or better yet, surrender. Flanking is one of the most effective plans of attack, and the most difficult to achieve, as it requires finesse, secrecy, and being able to know your enemy's moves before they do.

As with any war, this one can be long and bloody, and both sides can sustain heavy casualties. While spammers are filtered out, some legitimate websites can be annihilated as well, due to inadequate SEO, mistakes in their pages (like broken links), or just simple ignorance of the way search engines work. It is the responsibility of your five-star General to guide you and develop your strategy. Your SEO consultant can lead you through the minefield of search engine optimization techniques without triggering any of the mines, keeping you safe. If you inadvertently set off a mine, you lose your hard-earned ranking, the traffic that goes with it, and the resulting sales from that traffic. You will then fall into the multitudes of spam casualties, possibly earning a permanent Google ban. Will the casual observer see these casualties? No. On the surface, everything feels peaceful. In fact, the war only helps average citizens and the relevance of their search results, and in the end, brings a better search environment for all. This is, after all, what Google really wants. Peace.

Jennifer E. Sullivan is an Internet Business Consultant who specializes in search engine optimization and web marketing. Her emphasis is on small to medium business marketing. She has written several web marketing articles, including "Hiring an SEO Consultant: 10 Reasons Why You Should", "Let's Not Forget About the Little Guy", and "PageRank for Websites: Is There More To the Web?".