Archive for the ‘Lessons Learned’ Category

The Buzz Effect

A couple of days ago, an article featuring TasteKid made it up to Digg’s main page. This triggered an “emergency landing” on Delicious’s main page, too. Around the same time, this happened. Then this. Then, others followed.

Pending Update

A couple of weeks ago I started making some changes to Emmy’s learning processes. What I am trying to do is basically to increase Emmy’s knowledge base and provide better quality recommendations.

This newly redesigned engine is currently in testing, and it will take some time until its knowledge is used on the public version of TasteKid. If everything goes well, I expect this to happen in about a month or so.

I will describe these updates in more depth when that time comes. Until then, Emmy continues to improve herself using the existing engine. Some of the time constraints are related to the limitations of the available hardware, so I think I will open myself a little bit more to sponsorship opportunities in the near future.

Conclusions on AdSense and Amazon Experiments

About a week ago I started experimenting with Google AdSense and the Amazon Affiliate program. Today the experiment ends, and I have decided to turn off both programs, at least for now.

TasteKid is an unusual website from Google’s point of view. The AdSense targeting algorithms focus on analyzing the content of the page on which the ads are delivered, but TasteKid’s content is just a list of seemingly random internal links (the suggested items). Despite this, Google has managed to provide related ads in some cases (e.g. concert tickets for one of the suggested bands), but, unfortunately, it often fails to come up with relevant ads. This is hardly Google’s fault, as it only tries to match TasteKid’s content with items from its pool of ads, and not many advertisers set band names or movie titles as keywords for their ads.

The consequence of displaying non-relevant ads is a very low click-through rate, which not only reflects negatively on AdSense earnings, but also suggests that users are not interested in these ads. This in turn has a negative impact on user experience, like any other element on a web page that is displayed even though the user is not interested in it. In conclusion, I have decided that, at least for now, TasteKid is better off without AdSense.

As for the Amazon Affiliate program, things are a little different. The ads were presented within the tooltip (the one that appears for each resource) and the advertised product was a direct result of the parameters sent to Amazon (i.e. band name or movie title). This meant that, for known resources (and the resources for which the ads were triggered were quite popular, because I was displaying them only for resources for which Emmy has a Wikipedia description), the relevancy of the advertised product was pretty good. Even so, few users seemed interested in following these ads. The reason is that it’s quite unlikely for somebody to rush to buy, through Amazon, a whole album by a band that they’ve just discovered: they would probably first try to learn more about the band, or listen to more of its songs that may be freely available on the web. As for movies, I think this option is actually quite useful, but, despite this, the overall usefulness of the Amazon Affiliate program isn’t very clear to me just yet. I have decided to stop this program together with Google AdSense for now.

Nevertheless, this was an interesting experience for me, one that I will probably draw on in the future, when I am more preoccupied with monetizing TasteKid’s traffic while maintaining a pleasant user experience.

37signals’s “Getting Real”

Getting Real is one of the best resources out there for anyone involved in designing, implementing, launching and marketing a new web-based product or service, and in pretty much everything else related to it.

The book (which is free to read online and is divided into small, easy-to-grasp chapters) dates back to 2006, but it is probably more relevant now than ever. To give just one quote:

“The first priority of many startups is acquiring funding from investors. But remember, if you turn to outsiders for funding, you’ll have to answer to them too. Expectations are raised. Investors want their money back — and quickly. The sad fact is cashing in often begins to trump building a quality product.

These days it doesn’t take much to get rolling. Hardware is cheap and plenty of great infrastructure software is open source and free. And passion doesn’t come with a price tag.”

37signals has become a cult company, and, although some may consider that they broke their own rules when they took external funding, much of their advice and philosophy is valuable Internet business wisdom.

Webstock Awards

Today, TasteKid won third prize at Webstock Awards 2008, a Romanian Web 2.0 contest, within the “Utility” category. Thank you, Cristian Manafu, for suggesting that I participate in this contest. It was a pleasant experience :)

Rewriting the URLs

Whether we call it rewriting the URLs or changing the permalink structure, it is a common practice to use “nice” URLs on dynamic content websites, instead of the default ones. For example, this post has the URL http://tastekid.com/blog/?p=75, and if I were rewriting the URLs it would probably have looked like this: “http://tastekid.com/blog/2008/10/rewriting-the-urls”.
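To make the idea concrete, here is a minimal Python sketch of how a “nice” URL like the one above could be derived from a post title. The function names `slugify` and `nice_url` are made up for this illustration, and this is not how WordPress or any particular blog engine actually does it; it only shows the general technique of turning a title into a URL-friendly slug.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a post title into a lowercase, hyphen-separated URL slug."""
    # Decompose accented characters, then drop whatever is not plain ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    return slug.strip("-")


def nice_url(base: str, year: int, month: int, title: str) -> str:
    """Build a date-based permalink (hypothetical layout: base/YYYY/MM/slug)."""
    return f"{base.rstrip('/')}/{year:04d}/{month:02d}/{slugify(title)}"


print(nice_url("http://tastekid.com/blog", 2008, 10, "Rewriting the URLs"))
```

The web server then has to map such a URL back to the internal post ID on every request, which is where the actual “rewriting” happens.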

Nowadays, noble arguments are used in favor of rewriting the URLs, like “improving the aesthetics, usability, and forward-compatibility of your links” [Wordpress]. But let’s face it, these weren’t the most important arguments back when this whole practice began. Instead, this was (and still is) one of the basic SEO techniques used to improve the SERP performance of web pages.

It is a known fact that one of the measures used by search engines when evaluating a page is the URL relevancy. For example, “http://example.com/how-to-rewrite-urls.html” will be seen as more relevant than “http://example.com/?articleID=2435”, for a search performed on “how to rewrite URLs”, given the same article content, page title, inbound links etc.

Besides the lack of relevancy, there was (and still is, to a certain point) a belief that search engines have something against dynamic URLs, that is, URLs that use GET variables (e.g. “?x=1&y=2”).

A long time ago, a large part of the web consisted of static pages. That meant that if the URL of a page was “http://www.example.com/how-to-rewrite-urls.html”, a real physical HTML file called “how-to-rewrite-urls.html” presumably existed on the disk in the root of the example.com domain. The idea was that these static pages were considered to be more relevant than dynamic pages, which can be artificially generated in large numbers and don’t necessarily contain relevant information. This idea is obviously obsolete now, and I am sure that for some time now search engines haven’t even bothered to consider whether a “nice” URL addresses a real physical file.

I don’t think that today’s search engines have anything against dynamic URLs. The only (but important) factor that contributes to SERP performance is the URL relevancy. To explain my point of view with an example, I do think that “http://example.com/how-to-rewrite-urls.html” will perform better than “http://example.com/?articleID=2435”, but I don’t think search engines consider it any better than “http://example.com/?q=how-to-rewrite-urls”.

Rewriting the URLs is a good practice in most cases. Besides improving SERP performance, it also provides a sort of teaser for the page in the URL (if you give the URL to a friend through a messaging system, he or she will have an idea of what it is about). Rewriting the URLs for this blog would be a good idea. For TasteKid though, I have decided not to do it. One of the reasons is that I consider a link like http://www.tastekid.com/ask?q=Radiohead to be sufficiently relevant. Another reason is to make a sort of statement against rewriting the URLs when it is not necessary, and to prove that search engines are not prejudiced against dynamic URLs (Google has indexed most of Taste Kid’s pages). Also, rewriting the URLs comes with a (small but greater than 0) processing overhead, and for a search discovery engine, I think classic dynamic URLs are more appropriate.

Google PageRank Updates

To mark TasteKid’s recent increase to PageRank 5 at the last Google PageRank update a couple of days ago, I have decided to write this short article about my opinion on what these PageRank updates really are.

While most of us are familiar with the concept of PageRank, there is a certain degree of uncertainty surrounding the actual meaning of the 0 to 10 value displayed by the Google Toolbar. What does this value reflect? Why doesn’t an update of this value necessarily show up in user traffic? And, probably most important, why is it that in such a dynamic environment as the web, where it is clear that the importance of many pages (viewed as the quantity and the quality of inbound links) can change substantially in short periods of time, the Google PageRank updates are performed only once every few months?

To answer these questions, we first have to understand what PageRank is and how it is computed. The basic concept is probably best explained by Google itself:

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important”.

This basic concept, translated into a formula, looks like this:

$$PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)}$$

meaning that the PageRank value for a page u depends on the PageRank value of each page v in the set B_u (this set contains all pages linking to page u), divided by the number L(v) of links from page v.

The way this formula works makes it clear that the PageRank of a particular web page is computed using the PageRank of all the pages that link to that specific page. Now, the PageRank of each of those pages has to be already computed in order to perform such a task, but the structure of the web doesn’t make it possible to order the pages in such a way that, for every evaluated page, all the pages that link to it have already been evaluated. In other words, the web has cycles (and it has lots of them).

For that reason, the PageRank computations require several passes, called “iterations”, through the collection to adjust approximate PageRank values to more closely reflect the theoretical true value.

These steps can be viewed like this:

Step 0: all pages have equal PageRank.

Step 1: every page gets its PageRank computed. Note that at this point, all inbound links have the same quality, because all pages still have the same PageRank determined at step 0.

Step 2: every page gets its PageRank computed again. This time, the inbound links have different qualities, the ones determined at step 1.

These steps follow one another indefinitely, thus creating a better and better approximation of the real (actually, theoretical) PageRank value.

There is also a damping factor involved that prevents the inflation of PageRank. An interesting (and beautifully simple) fact is that, despite the billions of pages on the web and all the rises and falls of thousands of websites each day, “the sum of all PageRanks is 1” [The Anatomy of a Large-Scale Hypertextual Web Search Engine, Brin, S.; Page, L.].
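The iterative steps described above can be sketched in a few lines of Python. This is only a toy illustration, certainly not Google’s implementation. It assumes the normalized variant of the formula, in which each page receives (1 - d)/N plus d times the sum from the basic formula, with a damping factor d of 0.85 (the value suggested in the Brin and Page paper). The four-page graph and the handling of pages without outbound links are assumptions made for this example.

```python
def pagerank(
    links: dict[str, list[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> dict[str, float]:
    """Approximate PageRank by repeated passes over a small link graph.

    `links` maps every page to the list of pages it links to; every
    target must itself be a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    # Step 0: all pages start with an equal PageRank.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets the same small base value ...
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        # ... plus a share of the rank of each page that links to it.
        for source, targets in links.items():
            # Assumption: a page without outbound links shares its rank
            # with all pages, so that no rank leaks out of the system.
            receivers = targets if targets else pages
            share = damping * ranks[source] / len(receivers)
            for target in receivers:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical graph with a cycle (a -> b -> c -> a) and an outsider d.
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a", "c"]}
ranks = pagerank(graph)
for page, rank in sorted(ranks.items()):
    print(page, round(rank, 4))
print("sum:", round(sum(ranks.values()), 6))
```

Note that the cycle does not cause any trouble: each pass only uses the values from the previous pass, and page d, which nobody links to, ends up with just the base value.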

Considering these facts, I’ll go back to my initial questions and try to answer them based on my limited knowledge regarding this process.

Why is it that the PageRank updates are performed only once every few months?

First of all, I would like to emphasize my belief that, even though PageRank updates are published every few months, they are actually happening all the time. Google constantly crawls the web, and, while doing so, also extracts links and computes PageRank using the basic idea of the algorithm described above. This process takes time though. While I’m sure Google puts a lot of effort into computing relevant PageRanks, a lot of web sites, sometimes situated in the more suburban areas of the Internet, are prone to sudden apparent shifts in PageRank caused by spikes of popularity, spamdexing or other events. Regardless of the computing power available, a certain amount of time is necessary in order to dampen the effects of such events and establish a more timeless, and thus more relevant, measure of the importance given by the web to one of its members, that is, that particular web page.

So what I’m saying is that I don’t think that the published PageRanks are a snapshot of the actual instant PageRanks of all web sites. In order to prevent, as much as possible, the abnormalities that happen all the time in the more juvenile areas of the web from interfering with the relevancy of the published values, I think Google has established that it has to perform an evaluation over a longer period of time and then publish values that reflect, in a way, the average behavior of those websites in the given window.
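To illustrate why an evaluation over a longer window would be more stable than a snapshot, here is a tiny Python sketch. It uses a plain average, which is only my simplification of the idea; the daily values and the 90-day window are invented for the example and say nothing about what Google really measures.

```python
from statistics import mean


def published_value(instant_values: list[float]) -> float:
    """Average the instant values observed over the evaluation window."""
    return mean(instant_values)


# Hypothetical window: 89 quiet days followed by a one-day popularity spike.
window = [0.10] * 89 + [0.90]

print(f"snapshot on the last day: {window[-1]:.2f}")
print(f"value over the window:    {published_value(window):.3f}")
```

A snapshot taken on the day of the spike would report a value nine times higher than usual, while the value computed over the whole window barely moves.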

In order to keep the provided values consistent, this analysis has to be performed on the same data set. Even though this data set (Internet states) extends over a longer period of time, it has to be the same for all the web sites involved, and this is why I think that PageRank updates are performed the way they are, once every few months, for all the websites at once.

What does the toolbar PageRank value reflect?

As I was saying, in theory, the sum of all PageRanks is 1. That means that the value of real PageRank for a specific web page is most often incredibly small, and I would suspect that in practice a much greater value than 1 is used in order to save all those exponent bits of the floating point representation of real numbers (it is probably possible to automatically determine and adjust this number to an optimal value). Regardless of the actual scale used for real PageRank, these values are then rescaled using (what is thought to be) a logarithmic scale, between 0 and 10. This logarithmic scale basically means that it requires many more incoming links to get from PageRank 4 to PageRank 5 than to get from PageRank 3 to PageRank 4 (where 5, 4 and 3 are values displayed by the toolbar).

Why the logarithmic scale? Well, just think that without it, in order to have a PageRank of 1, your site would need 1/10 of the PageRank of google.com (google.com has a PageRank of 10). This would mean that the majority of web sites would have a displayed PageRank of 0, and this would make the option of displaying the PageRank useless.
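The difference between the two scales can be shown with a short Python sketch. The real scaling used by the toolbar is not public, so the base of 10 and both function names below are assumptions chosen purely for illustration.

```python
import math


def linear_toolbar(raw: float, top: float) -> int:
    """Toolbar value if raw PageRank were scaled linearly against the top page."""
    return math.floor(10 * raw / top)


def log_toolbar(raw: float, top: float, base: float = 10.0) -> int:
    """Toolbar value on a logarithmic scale (the base is a pure assumption).

    Every factor of `base` below the top page costs one toolbar point.
    """
    steps_below_top = math.log(top / raw) / math.log(base)
    return max(0, 10 - math.ceil(steps_below_top))


top = 1.0  # hypothetical raw score of the highest ranked page
for ratio in (1.0, 0.5, 0.002):
    raw = top * ratio
    print(ratio, linear_toolbar(raw, top), log_toolbar(raw, top))
```

With these assumptions, a page holding 1/500 of the top page’s raw score is shown as 0 on the linear scale, but still gets a 7 on the logarithmic one.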

So, in my opinion, the toolbar PageRank reflects the logarithmically scaled average (actually, measurement techniques more complex than a simple average are probably used) of the instant PageRank values for a given period of time in the past.

Why doesn’t an update of this value necessarily show up in user traffic?

Although the SERP performance of a web page is obviously driven by PageRank, there is no reason why publishing the historical PageRank behavior of that particular page over the last period of time should influence this performance. The Google PageRank updates are just passive reports, and, if a site receives an increase in PageRank on such an update, it has most probably already gradually felt that increase in SERP terms and thus in traffic.

Major Change in Taste Kid’s Results

Sixteen-year-old Ian wrote to Emmy a few hours ago:

What happened with the recommendations? They used to be so much better. Whenever I would look for Forrest Gump recommendations my 4 other favorite movies came up (The Shawshank Redemption, Gladiator, Braveheart, The Green Mile) on the list, proving that the system you were using before was working since those are my 5 favorite movies. Now only The Green Mile came up and movies like The Terminal and Cast Away are up top. Did you change how you did this? If so, the old way worked much better.

I would like to publicly answer this feedback and explain a little of what is happening.

Dear Ian, first of all I would like to thank you for using this service and, moreover, for providing me with this feedback. Yes, you are right, a major change happened a couple of days ago. You see, Taste Kid’s main goal is to be a discovery engine, to help people explore their taste by finding out about new bands, artists, movies and books. Many people who use Emmy (including myself) felt that the suggestions were becoming more and more oriented towards popular stuff. Your personal example is great for that matter: you were searching for “Forrest Gump”, and you were given recommendations like The Shawshank Redemption, Gladiator, Braveheart and The Green Mile. Although these are all great movies, it is very unlikely that you haven’t already seen them. I was myself searching for some of my favorite bands or movies, and even though the recommendations were good, I was rarely discovering something new.

To give an example, check the results for Metallica:

http://www.tastekid.com/ask?q=metallica (new/current way)

http://www.tastekid.com/ask?q=metallica&old=1 (old way)

As you can see, using the old approach, the second recommendation for “Metallica” was “Nirvana”. Now, I’m sure people trying to discover new bands somewhat similar to “Metallica” aren’t looking for “Nirvana”; whether they like this band or not, they have most certainly already heard of it.

Given all this, I have decided to make a change. I have changed the formula that determines the relevancy of each result in a way that encourages less popular items to achieve good scores. This is a big gamble for Taste Kid. Up until now, people found it hard to disagree with the results, but, at the same time, they were rarely discovering something new. Now, by promoting less popular items, there is a much bigger chance to screw up and to get reactions like yours. But I think it’s worth it. Since I made this change, I have personally discovered several interesting bands and movies that I hadn’t heard of until now. I’m sure there are even better ways of computing this relevancy, but I feel that the new formula is a step forward.
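To give a feeling for what “encouraging less popular items” can mean, here is a small Python sketch. It is not Emmy’s actual formula, which is not described in this post; it is just one common way of doing it, dividing the raw score by the item’s popularity raised to a penalty exponent. All the names and numbers below are invented.

```python
def relevancy(shared_fans: int, popularity: int, penalty: float) -> float:
    """Score an item by shared fans, discounted by its overall popularity.

    penalty = 0 ignores popularity; larger values favor obscure items.
    """
    return shared_fans / (popularity ** penalty)


def rank(candidates: dict[str, tuple[int, int]], penalty: float) -> list[str]:
    """Order candidate names from the most to the least relevant."""
    return sorted(
        candidates,
        key=lambda name: relevancy(*candidates[name], penalty),
        reverse=True,
    )


# Hypothetical candidates: name -> (fans shared with the query, total fans).
candidates = {
    "Well-known item": (900, 10_000),
    "Lesser-known item": (300, 500),
}

print(rank(candidates, penalty=0.0))  # popularity ignored
print(rank(candidates, penalty=0.5))  # popular items discounted
```

Without the penalty the well-known item comes first simply because more people know it; with a penalty of 0.5 the lesser-known item, liked by a much larger share of its own audience, moves to the top.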

So have some faith in it and play with it a little. While the new results have a bigger chance of containing things that you don’t like, at the same time there is a bigger chance of finding a few things that you will like and that you haven’t heard of before. And this is ultimately Emmy’s goal :)

Google’s Perception

In terms of usual website content, the kind that Google appreciates, Taste Kid is a disaster. Not only does it have tens of thousands of pages that all look alike, but the content of these pages is nothing more than a list of internal links (the suggested items). I can’t blame Google if it finds that suspicious, as I’m sure that its bots find it hard to determine the value of these pages. One of my biggest fears was that Google would permanently consider Taste Kid a sort of link farm, trying to gain page rank by having lots of pages that link randomly to each other (yes, I do think that a page never has a page rank value of 0, and, to a certain extent, having many pages that link to one another will increase your overall page rank, but that’s another discussion).

Luckily for me, Google hasn’t been that drastic. Despite the lack of classic original content, it constantly crawls and indexes Taste Kid’s pages. I suppose, after all, the very enumeration of resources (bands, movies, books), which is unique for every page, can be seen as a type of original content, and I’m glad Google perceives it that way. I just hope it won’t change its opinion one day.

Google AdWords II: Simple Advice

After experimenting with Google AdWords I’ve learned some simple lessons, which all of you out there who have at least some experience with this program are probably already aware of. Nevertheless, because my second campaign was actually very successful in terms of high ad relevancy, high click rates and low costs per click, I’ll post here some simple advice for those who don’t have experience working with AdWords.

First of all, in order to obtain good ad performance, keep in mind you’ll need a strong connection between your ad keywords (or key phrases), the ad itself and the landing page. To be more specific, one of my ad-triggering key phrases was “similar music”. The ad itself had the title “find similar music” and the body of the ad contained the word “music”. The landing page, that is, the main page of Taste Kid, has a title that contains the phrase “find similar music”, and these words are to be found in the page’s description and content, as well.

To explain a little, Google has to establish how relevant your ad is. The better this relevancy, the better your campaign’s performance. The only data Google has in order to do that is your ad and your landing page. So, the keywords you define should be relevant, word-wise, to the ad that you have defined, and the ad should be relevant to the page that is targeted. So there should always be a strong connection between these 3: the keywords, the ad, the landing page.

Considering this, the steps for defining an ad group would be:

1. Establish which keywords best describe the page you are about to promote.
2. If necessary, do a little SEO to optimize that page for those keywords (setting a relevant page title, page description, etc.).
3. Define an ad that contains those keywords. Of course, it should also be coherent and appealing. Considering an ad has a limited number of characters, concentrate on a maximum of 3 keywords (two, or even one, may be enough). Try to have one keyword in the ad title.
4. Then, set as keywords in Google AdWords the keywords you have chosen at step 1.
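The connection between the keywords, the ad and the landing page can be checked with a few lines of Python. This is only a rough word-matching sketch and has nothing to do with how Google really computes ad relevancy; the function names, the ad body and the full page title are made up, while the key phrase and the ad title are the ones from my campaign.

```python
import re


def words(text: str) -> set[str]:
    """Split a text into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def connection_report(
    key_phrase: str, ad_title: str, ad_body: str, page_title: str
) -> dict[str, bool]:
    """Check where the words of a key phrase show up."""
    key = words(key_phrase)
    return {
        # The whole key phrase should be present in the ad title ...
        "ad_title": key <= words(ad_title),
        # ... at least one of its words in the ad body ...
        "ad_body": bool(key & words(ad_body)),
        # ... and the whole key phrase in the landing page title.
        "page_title": key <= words(page_title),
    }


ad_title = "Find similar music"
ad_body = "Discover new music you will like"  # hypothetical ad text
page_title = "TasteKid - find similar music"  # hypothetical page title

print(connection_report("similar music", ad_title, ad_body, page_title))
print(connection_report("music recommender system", ad_title, ad_body, page_title))
```

The first key phrase is connected to all three elements, while the second one only matches the ad body, which hints that it would need its own ad and a differently optimized landing page.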

Maybe the most important thing is to find the right keywords. A few simple rules for that would be:

1. The keywords you choose should describe or be related to what your page is all about. Moreover, the page should be optimized for those keywords.
2. The keywords should not be too generic. For example, in my successful campaign I’ve used key phrases like “similar music”, as opposed to “music” or “find music”, which would have been too generic. Many other sites are most probably already using those keywords and are much more relevant for them.
3. The keywords should not be too particular. You have to come up with keywords or key phrases that many people are searching for on Google. This is why I’ve used “similar music”, as opposed to “music recommender system”, for example. I made the educated guess that many more people will search for things like “similar music metallica” than “music recommender system website”.

Given all these things I’ve learned, I’ve managed to define a campaign with very good performance. Due to financial constraints though, I had to pause this campaign for a while. Actually, I still have an outstanding balance with Google of about 100 euros, which I’m planning to pay as soon as I have the money.

Note: This post has been written in retrospect and posted on September 13th, 2008.