Archive for the ‘Lessons Learned’ Category

The Buzz Effect

Posted on October 30th, 2008 by Andrei Oghina

A couple of days ago, an article featuring TasteKid made it to Digg’s main page. This triggered an “emergency landing” on Delicious’s main page, too. Around the same time, this happened. Then this. Then others followed.

Pending Update

Posted on October 26th, 2008 by Andrei Oghina

A couple of weeks ago I started making some changes to Emmy’s learning processes. What I am trying to do is, basically, increase Emmy’s knowledge base and provide better-quality recommendations.

This redesigned engine is currently being tested, and it will take some time before its knowledge is used in the public version of TasteKid. If everything goes well, I expect this to happen in about a month or so.

I will describe these updates in more depth when that time comes. Until then, Emmy continues to improve herself using the existing engine. Some of the time constraints are due to the limitations of the available hardware, so I will probably be a little more open to sponsorship opportunities in the near future.

37signals's "Getting Real"

Posted on October 14th, 2008 by Andrei Oghina

Getting Real is one of the best resources out there for anyone involved in designing, implementing, launching, marketing and pretty much everything else related to a new web-based product or service.

The book (which is free to read online and is divided into small, easy-to-grasp chapters) dates back to 2006, but it is probably more relevant now than ever. To give just one quote:

“The first priority of many startups is acquiring funding from investors. But remember, if you turn to outsiders for funding, you’ll have to answer to them too. Expectations are raised. Investors want their money back — and quickly. The sad fact is cashing in often begins to trump building a quality product.

These days it doesn’t take much to get rolling. Hardware is cheap and plenty of great infrastructure software is open source and free. And passion doesn’t come with a price tag.”

37signals has become a cult company, and, although some may consider that they broke their own rules when they took external funding, much of their advice and many of their philosophies are valuable pieces of Internet business wisdom.

Webstock Awards

Posted on October 3rd, 2008 by Andrei Oghina

Today, TasteKid won third prize at Webstock Awards 2008, a Romanian Web 2.0 contest, in the “Utility” category. Thank you, Cristian Manafu, for suggesting that I participate in this contest. It was a pleasant experience :)

Rewriting the URLs

Posted on October 2nd, 2008 by Andrei Oghina

Whether we call it rewriting the URLs or changing the permalink structure, it is a common practice to use “nice” URLs on dynamic content websites, instead of the default ones. For example, this post has the URL http://tastekid.com/blog/?p=75, and if I were rewriting the URLs, it would probably look like this: “http://tastekid.com/blog/2008/10/rewriting-the-urls”.

Nowadays, noble arguments are used in favor of rewriting the URLs, like “improving the aesthetics, usability, and forward-compatibility of your links” [WordPress]. But let’s face it, these weren’t the most important arguments back when this whole practice began. Instead, this was (and still is) one of the basic SEO techniques used to improve the SERP performance of web pages.

It is a known fact that one of the measures used by search engines when evaluating a page is URL relevancy. For example, for a search on “how to rewrite URLs”, “http://example.com/how-to-rewrite-urls.html” will be seen as more relevant than “http://example.com/?articleID=2435”, given the same article content, page title, inbound links, etc.

Besides the lack of relevancy, there was (and still is, to a certain extent) a belief that search engines have something against dynamic URLs, that is, URLs that use GET variables (e.g. “?x=1&y=2”).

A long time ago, a large part of the web consisted of static pages. That meant that if the URL of a page was “http://www.example.com/how-to-rewrite-urls.html”, a real physical HTML file called “how-to-rewrite-urls.html” presumably existed on disk, in the document root of the example.com domain. The idea was that these static pages were more relevant than dynamic pages, which can be artificially generated in large numbers and don’t necessarily contain relevant information. This idea is obviously obsolete now, and I am sure that for some time search engines haven’t even bothered to consider whether a “nice” URL points to a real physical file.

I don’t think that search engines today have anything against dynamic URLs. As far as the URL itself is concerned, the only (but important) factor that contributes to SERP performance is URL relevancy. For example, I do think that “http://example.com/how-to-rewrite-urls.html” will perform better than “http://example.com/?articleID=2435”, but I don’t think search engines consider it any better than “http://example.com/?q=how-to-rewrite-urls”.

Rewriting the URLs is a good practice in most cases. Besides improving SERP performance, it also provides a sort of teaser for the page in the URL itself (if you give the URL to a friend through a messaging system, he or she will have an idea of what it is about). Rewriting the URLs for this blog would be a good idea. For TasteKid, though, I have decided not to do it. One of the reasons is that I consider a link like http://www.tastekid.com/ask?q=Radiohead to be sufficiently relevant. Another reason is to make a sort of statement against rewriting URLs when it is not necessary, and to show that search engines have no prejudice against dynamic URLs (Google has indexed most of Taste Kid’s pages). Also, rewriting the URLs comes with a small (but greater than zero) processing overhead, and for a search-driven discovery engine, I think classic dynamic URLs are more appropriate.
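Mechanically, a rewrite rule maps the “nice” form back onto the dynamic one before the application sees the request. The sketch below is only an illustration in Python, not how WordPress or Apache’s mod_rewrite actually do it; the slug-to-ID table `SLUG_TO_POST_ID` and the function name `rewrite` are hypothetical.

```python
import re

# Hypothetical lookup table; a real blog engine would query its database instead.
SLUG_TO_POST_ID = {"rewriting-the-urls": 75}

# Matches "nice" URLs of the form /blog/YYYY/MM/slug (optional trailing slash).
NICE_URL = re.compile(r"^/blog/(\d{4})/(\d{2})/([a-z0-9-]+)/?$")


def rewrite(path):
    """Translate a "nice" path into the dynamic URL the application understands.

    Returns None when the path is not a nice URL or the slug is unknown.
    The year and month groups are matched but not used for the lookup.
    """
    match = NICE_URL.match(path)
    if match is None:
        return None
    post_id = SLUG_TO_POST_ID.get(match.group(3))
    if post_id is None:
        return None
    return f"/blog/?p={post_id}"


print(rewrite("/blog/2008/10/rewriting-the-urls"))  # /blog/?p=75
```

The extra pattern match and lookup on every request is the kind of small processing overhead that rewriting adds compared with serving the dynamic URL directly.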
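To make the comparison concrete: if URL relevancy comes down to which words appear in the URL, then words in a query-string value carry the same information as words in the path. The hypothetical `url_terms` helper below is a deliberately naive tokenizer (search engines do not publish how they weigh URL terms); it only illustrates the argument.

```python
import re
from urllib.parse import parse_qsl, urlsplit


def url_terms(url):
    """Collect the lowercase words found in a URL's path and query-string values.

    Query-string keys (like "q" or "articleID") are ignored; a trailing
    .htm/.html extension is dropped from the path.
    """
    parts = urlsplit(url)
    path = re.sub(r"\.html?$", "", parts.path)
    values = [value for _, value in parse_qsl(parts.query)]
    text = " ".join([path] + values).lower()
    return set(re.findall(r"[a-z0-9]+", text))


print(url_terms("http://example.com/how-to-rewrite-urls.html") ==
      url_terms("http://example.com/?q=how-to-rewrite-urls"))  # True
print(url_terms("http://example.com/?articleID=2435"))  # {'2435'}
```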

Google PageRank Updates

Posted on September 28th, 2008 by Andrei Oghina

To mark TasteKid’s recent increase to PageRank 5 in the Google PageRank update a couple of days ago, I have decided to write this short article about what I think these PageRank updates really are.

While most of us are familiar with the concept of PageRank, there is a certain degree of uncertainty surrounding the actual meaning of the 0 to 10 value displayed by the Google Toolbar. What does this value reflect? Why doesn’t an update of this value necessarily show up in user traffic? And, probably most important, why is it that in an environment as dynamic as the web, where the importance of many pages (viewed as the quantity and quality of their inbound links) can clearly change substantially in short periods of time, Google PageRank updates are performed only once every few months?

To answer these questions, we first have to understand what PageRank is and how it is computed. The basic concept is probably best explained by Google itself:

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important”.

This basic concept, translated into a formula, looks like this:

$$PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)}$$

meaning that the PageRank value of a page u depends on the PageRank values of each page v in the set B_u (the set of all pages linking to page u), each divided by the number L(v) of outbound links from page v.

The way this formula works makes it clear that the PageRank of a particular web page is computed from the PageRanks of all the pages that link to it. The PageRank of each of those pages would already have to be computed in order to do that, but the structure of the web doesn’t allow us to order pages so that, for every evaluated page, all the pages linking to it have been evaluated before. In other words, the web has cycles (and it has lots of them).

For that reason, PageRank computation requires several passes, called “iterations”, through the collection, adjusting the approximate PageRank values so that they more closely reflect the theoretical true value.

These steps can be viewed like this:

Step 0: all pages have equal PageRank.

Step 1: every page gets its PageRank computed. Note that at this point, all inbound links have the same quality, because all pages still have the same PageRank determined at step 0.

Step 2: every page gets its PageRank computed again. This time, the inbound links have different qualities, the ones determined at step 1.

These steps repeat, in theory indefinitely, each pass producing a better approximation of the real (actually, theoretical) PageRank value.

There is also a damping factor involved that prevents PageRank from inflating. An interesting (and beautifully simple) fact is that, despite the billions of pages on the web and all the rises and falls of thousands of websites each day, “the sum of all PageRanks is 1” [The Anatomy of a Large-Scale Hypertextual Web Search Engine, Brin, S.; Page, L.].
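The iterative steps can be sketched in a few lines of Python. This is a minimal illustration on a toy graph, not Google’s implementation: it uses the plain formula without a damping factor and assumes every page has at least one outbound link and that the link graph is strongly connected and aperiodic, so the passes converge. The function name `pagerank_iterations`, the tolerance and the toy graph are illustrative choices.

```python
def pagerank_iterations(links, tol=1e-10, max_iter=1000):
    """Iterate PR(u) = sum over v in B_u of PR(v) / L(v) until the values stabilize.

    `links` maps each page to the list of pages it links to.
    Assumes every page has at least one outbound link.
    """
    pages = list(links)
    # Build B_u: for every page u, the pages linking to it.
    inbound = {page: [] for page in pages}
    for v, targets in links.items():
        for u in targets:
            inbound[u].append(v)

    # Step 0: all pages have equal PageRank.
    pr = {page: 1.0 / len(pages) for page in pages}

    # Steps 1, 2, ...: recompute every page from the previous pass's values.
    for _ in range(max_iter):
        new_pr = {u: sum(pr[v] / len(links[v]) for v in inbound[u]) for u in pages}
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:
            break
    return pr


# A toy web with cycles: A -> B, A -> C, B -> C, C -> A
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank_iterations(toy_web))  # converges to roughly A=0.4, B=0.2, C=0.4
```

Because every page passes on all of its current value, the total stays at 1 from one pass to the next.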
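The post does not spell out how the damping factor enters the formula. A commonly used normalized form, written with the same symbols as in the definition above plus a damping factor d (0 < d < 1; Brin and Page report usually setting d to 0.85) and the total number of pages N, is shown below, together with the short argument for why the PageRanks then sum to 1. The argument assumes every page has at least one outbound link and no page links to the same target twice, so each page v appears in exactly L(v) of the sets B_u. The 1998 paper itself writes the first term as (1 - d) rather than (1 - d)/N.

```latex
\begin{aligned}
PR(u) &= \frac{1-d}{N} + d \sum_{v \in B_u} \frac{PR(v)}{L(v)} \\
\sum_{u} PR(u) &= N \cdot \frac{1-d}{N} + d \sum_{v} L(v) \cdot \frac{PR(v)}{L(v)}
              = (1-d) + d \sum_{v} PR(v) \\
S &= (1-d) + d\,S \quad\Longrightarrow\quad S = 1,
   \qquad \text{where } S = \sum_{u} PR(u)
\end{aligned}
```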

Considering these facts, I’ll go back to my initial questions and try to answer them based on my limited knowledge regarding this process.

Why is it that PageRank updates are performed only once every few months?

First of all, I would like to emphasize my belief that, even though PageRank updates are published every few months, the underlying computation is actually happening all the time. Google constantly crawls the web and, while doing so, also extracts links and computes PageRank using the basic idea of the algorithm described above. This process takes time, though. While I’m sure Google puts a lot of effort into computing relevant PageRanks, a lot of web sites, sometimes situated in the more suburban areas of the Internet, are prone to sudden apparent shifts in PageRank caused by spikes of popularity, spamdexing or other events. Regardless of the computing power available, a certain amount of time is necessary to dampen the effects of such events and establish a more lasting, and thus more relevant, measure of the importance the web gives to one of its members, that is, to that particular web page.

So what I’m saying is that I don’t think the published PageRanks are a snapshot of the actual instant PageRanks of all web sites. To prevent, as much as possible, the anomalies that constantly occur in the younger areas of the web from interfering with the relevance of the published values, I think Google decided to perform an evaluation over a longer period of time and then publish values that reflect, in some way, the average behavior of those websites in the given window.

In order to keep the provided values consistent, this analysis has to be performed on the same data set. Even though this data set (the states of the Internet) extends over a longer period of time, it has to be the same for all the web sites involved, and this is why I think that PageRank updates are performed the way they are, once every few months, for all websites at once.
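The averaging suspected above can be illustrated with a toy sketch. Everything here is hypothetical: the snapshot structure, the use of a plain mean (more complex techniques are probably used), and the names `published_values` and `window`. It shows the two points made above: every site is evaluated over the same set of snapshots, and a short spike is diluted over the window.

```python
from statistics import mean


def published_values(snapshots):
    """Average each site's raw PageRank over the same window of snapshots.

    `snapshots` is a list of dicts, one per crawl state, mapping site -> raw PageRank.
    Every snapshot must cover the same sites, so all sites share one data set.
    """
    sites = set(snapshots[0])
    if any(set(snap) != sites for snap in snapshots):
        raise ValueError("all snapshots must cover the same sites")
    return {site: mean(snap[site] for snap in snapshots) for site in sites}


# Hypothetical window of four crawl states: site "b" has a one-off popularity spike.
window = [
    {"a": 0.010, "b": 0.002},
    {"a": 0.011, "b": 0.020},
    {"a": 0.009, "b": 0.002},
    {"a": 0.010, "b": 0.002},
]
print(published_values(window))
```

Site "b" ends up with an average of 0.0065, far below its 0.020 peak, which is the dampening effect described above.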

What does the toolbar PageRank value reflect?

As I was saying, in theory, the sum of all PageRanks is 1. That means that the real PageRank value of a specific web page is most often incredibly small, and I suspect that in practice a total much greater than 1 is used, to avoid spending all those exponent bits of the floating-point representation on tiny numbers (it is probably possible to determine and adjust this total automatically to an optimal value). Regardless of the actual scale used for real PageRank, these values are then rescaled, using what is thought to be a logarithmic scale, to a value between 0 and 10. A logarithmic scale means that it takes many more incoming links to get from PageRank 4 to PageRank 5 than to get from PageRank 3 to PageRank 4 (where 5, 4 and 3 are values displayed by the toolbar).

Why the logarithmic scale? Well, consider that without it, in order to have a PageRank of 1, your site would need 1/10 of the PageRank of google.com (google.com has a PageRank of 10). The vast majority of web sites would then show a PageRank of 0, which would make displaying the PageRank useless.

So, in my opinion, the toolbar PageRank reflects a log-scaled average (more complex measurement techniques than a simple average are probably used) of the instant PageRank values over a given period of time in the past.

Why doesn’t an update of this value necessarily show up in user traffic?

Although the SERP performance of a web page is obviously driven by PageRank, there is no reason why publishing that page’s historical PageRank behavior over the last period should itself influence this performance. Google PageRank updates are just passive reports, and if a site receives an increase in PageRank in such an update, it has most probably already felt that increase gradually in SERP terms, and thus in traffic.
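The exact mapping from raw PageRank to the toolbar’s 0 to 10 value has never been published, so the sketch below is purely hypothetical: it assumes the top page maps to 10 and that each toolbar point corresponds to a fixed factor `base` of raw PageRank (8 is an arbitrary illustrative choice). It shows why, on such a scale, each additional toolbar point requires several times more raw PageRank than the previous one.

```python
import math


def toolbar_score(raw_pr, max_pr, base=8.0):
    """Map a raw PageRank onto a hypothetical 0-10 logarithmic toolbar scale.

    The page with `max_pr` scores 10; every `base`-fold drop in raw PageRank
    costs one toolbar point. Results are clamped to the 0-10 range.
    """
    if raw_pr <= 0:
        return 0
    score = 10 + math.log(raw_pr / max_pr, base)
    return max(0, min(10, math.floor(score)))


def raw_threshold(toolbar_value, max_pr, base=8.0):
    """Smallest raw PageRank that reaches `toolbar_value` on the same scale."""
    return max_pr * base ** (toolbar_value - 10)


# Going from 4 to 5 needs `base` times more extra raw PageRank than going from 3 to 4.
gap_3_to_4 = raw_threshold(4, 1.0) - raw_threshold(3, 1.0)
gap_4_to_5 = raw_threshold(5, 1.0) - raw_threshold(4, 1.0)
print(gap_4_to_5 / gap_3_to_4)  # 8.0 (up to floating-point rounding)
```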

Major Change in Taste Kid's Results

Posted on September 26th, 2008 by Andrei Oghina

Sixteen-year-old Ian wrote to Emmy a few hours ago:

What happened with the recommendations? They used to be so much better. Whenever I would look for Forrest Gump recommendations my 4 other favorite movies came up (The Shawshank Redemption, Gladiator, Braveheart, The Green Mile) on the list, proving that the system you were using before was working since those are my 5 favorite movies. Now only The Green Mile came up and movies like The Terminal and Cast Away are up top. Did you change how you did this? If so, the old way worked much better.

I would like to answer this feedback publicly and explain a little of what is happening.

Dear Ian, first of all I would like to thank you for using this service and, moreover, for providing me with this feedback. Yes, you are right, a major change happened a couple of days ago. You see, Taste Kid’s main goal is to be a discovery engine, to help people explore their taste by finding out about new bands, artists, movies and books. Many people using Emmy (including myself) felt that the suggestions were becoming more and more oriented towards popular stuff. Your own example illustrates this well: you were searching for “Forrest Gump”, and you were given recommendations like The Shawshank Redemption, Gladiator, Braveheart and The Green Mile. Although these are all great movies, it is very unlikely that you haven’t already seen them. When I searched for some of my own favorite bands or movies, even though the recommendations were good, I rarely discovered something new.

To give an example, check the results for Metallica:

http://www.tastekid.com/ask?q=metallica (new/current way)

http://www.tastekid.com/ask?q=metallica&old=1 (old way)

As you can see, using the old approach, the second recommendation for “Metallica” was “Nirvana”. Now, I’m sure people trying to discover new bands somewhat similar to “Metallica” aren’t looking for “Nirvana”; whether they like this band or not, they have most certainly already heard of it.

Given all this, I have decided to make a change. I have changed the formula that determines the relevancy of each result in a way that helps less popular items achieve good scores. This is a big gamble for Taste Kid. Until now, people found it hard to disagree with the results, but, at the same time, they were rarely discovering something new. Now, by promoting less popular items, there is a much bigger chance of screwing up and getting reactions like yours. But I think it’s worth it. Since making this change, I have personally discovered several interesting bands and movies that I had never heard of before. I’m sure there are even better ways of computing this relevancy, but I feel that the new formula is a step forward.

So give it some faith and play with it a little. While the new results are more likely to contain things you don’t like, at the same time there is a bigger chance of finding a few things that you will like and haven’t heard of before. And this is ultimately Emmy’s goal :)
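The post does not disclose the new formula. As a purely hypothetical illustration of the idea, here is one common way to favor less popular items: divide each candidate’s similarity score by its popularity raised to a tunable exponent `alpha` (`alpha = 0` reproduces a pure similarity ranking). The function name, the inputs and the numbers below are all invented for the example.

```python
def rerank(candidates, alpha=0.5):
    """Rank candidates by similarity / popularity**alpha (higher is better).

    `candidates` is a list of (name, similarity, popularity) tuples; popularity
    is assumed to be a count >= 1. alpha = 0 ignores popularity entirely, while
    larger alpha values push well-known items further down.
    """
    scored = [(name, similarity / popularity ** alpha)
              for name, similarity, popularity in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored]


# Invented data: a very popular, very similar band vs. a niche, fairly similar one.
candidates = [("famous band", 0.9, 10_000), ("niche band", 0.6, 100)]
print(rerank(candidates, alpha=0.0))  # ['famous band', 'niche band']
print(rerank(candidates, alpha=0.5))  # ['niche band', 'famous band']
```

The gamble described above maps directly onto `alpha`: the larger it is, the more unfamiliar items surface, and the higher the chance of a recommendation that misses.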

Google's Perception

Posted on September 14th, 2008 by Andrei Oghina

In terms of the usual website content, the kind that Google appreciates, Taste Kid is a disaster. Not only does it have tens of thousands of pages that all look alike, but the content of these pages is nothing more than a list of internal links (the suggested items). I can’t blame Google if it finds that suspicious, as I’m sure its bots find it hard to determine the value of these pages. One of my biggest fears was that Google would permanently consider Taste Kid a sort of link farm, trying to gain PageRank by having lots of pages that link randomly to each other (yes, I do think that a page never has a PageRank value of 0, and, to a certain extent, having many pages that link to one another will increase your overall PageRank, but that’s another discussion).

Luckily for me, Google hasn’t been that drastic. Despite the lack of classic original content, it constantly crawls and indexes Taste Kid’s pages. I suppose that, after all, the very enumeration of resources (bands, movies, books), which is unique to every page, can be seen as a type of original content, and I’m glad Google perceives it that way. I just hope it won’t change its opinion one day.

Google AdWords II: Simple Advice

Posted on February 15th, 2008 by Andrei Oghina

After experimenting with Google AdWords, I’ve learned some simple lessons that probably everyone with at least some experience with this program already knows. Nevertheless, because my second campaign was actually very successful in terms of high ad relevancy, high click-through rates and low cost-per-click, I’ll post some simple advice here for those who haven’t worked with AdWords before.

First of all, in order to obtain good ad performance, keep in mind that you’ll need a strong connection between your ad keywords (or key phrases), the ad itself and the landing page. To be more specific, one of my ad-triggering key phrases was “similar music”. The ad itself had the title “find similar music”, and the body of the ad contained the word “music”. The landing page, that is, the main page of Taste Kid, has a title that contains the phrase “find similar music”, and these words also appear in the page’s description and content.

To explain a little, Google has to establish how relevant your ad is. The better this relevancy, the better your campaign’s performance. The only data Google has for doing that are your ad and your landing page. So the keywords you define should be relevant, word for word, to the ad you have defined, and the ad should be relevant to the page it targets. There should always be a strong connection between these three: the keywords, the ad and the landing page.

Considering this, the steps for defining an ad group would be:

1. Establish which keywords best describe the page you are about to promote.
2. If necessary, do a little SEO to optimize that page for those keywords (setting a relevant page title, page description, etc.).
3. Define an ad that contains those keywords. Of course, it should also be coherent and appealing. Since an ad has a limited number of characters, concentrate on a maximum of three keywords (two, or even one, may be enough). Try to have one keyword in the ad title.
4. Finally, in Google AdWords, set the keywords you chose in step 1 as the ad group’s keywords.
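The three-way connection between keywords, ad and landing page can be checked mechanically before launching an ad group. The sketch below is a hypothetical helper, not part of any AdWords tool, and Google’s actual relevancy scoring is not public: it simply reports which keywords are missing from the ad text or from the landing page’s title and description, using case-insensitive phrase matching. The ad body and page description in the example are invented.

```python
def missing_keywords(keywords, ad_title, ad_body, page_title, page_description):
    """Return, for each keyword, where it is missing ("ad", "landing page").

    Matching is a simple case-insensitive substring test on whole phrases.
    An empty result means every keyword appears in both the ad and the page.
    """
    ad_text = f"{ad_title} {ad_body}".lower()
    page_text = f"{page_title} {page_description}".lower()
    report = {}
    for keyword in keywords:
        phrase = keyword.lower()
        gaps = []
        if phrase not in ad_text:
            gaps.append("ad")
        if phrase not in page_text:
            gaps.append("landing page")
        if gaps:
            report[keyword] = gaps
    return report


print(missing_keywords(
    ["similar music", "music recommender"],
    ad_title="Find Similar Music",
    ad_body="Discover new music you will like",
    page_title="TasteKid - find similar music",
    page_description="Find similar music, movies and books",
))  # {'music recommender': ['ad', 'landing page']}
```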

Maybe the most important thing is to find the right keywords. A few simple rules for that would be:

1. The keywords you choose should describe, or be related to, what your page is all about. Moreover, the page should be optimized for those keywords.
2. The keywords should not be too generic. For example, in my successful campaign I used key phrases like “similar music”, as opposed to “music” or “find music”, which would have been too generic. Many other sites are most probably already using those keywords and are much more relevant for them.
3. The keywords should not be too specific. You have to come up with keywords or key phrases that many people are searching for on Google. This is why I used “similar music” as opposed to, for example, “music recommender system”. I made the educated guess that many more people would search for things like “similar music metallica” than “music recommender system website”.

Given all these lessons, I managed to define a campaign with very good performance. Due to financial constraints, though, I had to pause this campaign for a while. Actually, I still have an outstanding balance with Google of about 100 euros, which I’m planning to pay as soon as I have the money.

Note: This post has been written in retrospect and posted on September 13th, 2008.

Finding a Domain Name

Posted on January 10th, 2008 by Andrei Oghina

The next step was to buy a domain name. Although many would probably have done this long before going public, or even before developing the application, and most certainly before starting an AdWords campaign, I thought getting the application up and running was more important. While online marketing and branding are paramount for any product or service, I think that nowadays there is too much buzz surrounding these aspects, almost as if they were more important than the product or service itself.

This view also applies to the domain name market. People buy domains only because they are cheap, not because they are committed to doing something with them. Specialized companies trade large numbers of domains, making a business of reserving domains containing certain keywords and then selling them at a higher price to other companies and individuals who, most often, end up doing nothing with them. Although I have quite liberal views and I believe in the open market, I think that this trade layer between registration authorities and truly interested clients has many of the characteristics of a parasite.

Even though many interesting domain names are already reserved, usually by companies that buy them only in the hope of selling them at a profit, I think there are still a lot of free catchy domains out there. After a couple of days of searching, I came up with TasteKid.com for this project, and I must say I am very pleased with it.

Although I’m far from being an expert, and there are many places where you can find suggestions about choosing a good domain name, here is some of my advice:

1. Unless you are keen on buying a specific domain, try to find one that is still available, that is, not already registered; it will be much cheaper, you won’t feed the companies that make a living from buying and selling domains, and you will have the satisfaction of finding a domain that no one thought of before. Search for available domains using one of the many available tools, like http://www.checkdomain.com/.

2. Try to find a short domain. I know, many short domains are already taken, but there are still a lot of them out there. What does short mean? Obviously it’s very hard to find anything unregistered below 6 characters on the .com market. I would aim for 6 to 10 characters, but that is not a rule. Keep in mind that a shorter domain means people will remember it more easily, type it faster in the address bar and, maybe most important, subconsciously trust it more. Think of a Google result page for some music-related query: which would you choose, a spinner.com result or a freemusicforeverybody.com result? I think many of us would choose spinner.com, even though the second domain name is more descriptive.

3. Be careful when considering TLDs other than .com. Even though .net, .biz and other TLDs may offer many more available domain names, people are more reluctant to trust them and have become used to thinking that quality web sites usually have .com domain names. There are a lot of successful exceptions, one of them being Last.fm, and hopefully people’s attitudes will change over time, but still think twice before going for one. Obviously this doesn’t apply to regional (country) TLDs, which are trusted by people living in that particular country. Also, think twice before using numbers or hyphens in your domain name. These may be catchy in some instances, but then again, they usually drive people away.

4. Think outside the box. If you sell cheap cars, your domain name doesn’t have to contain “cars”, “buy” and “cheap”. You can later optimize your site for these keywords, but in my opinion it is more important for a domain to be short and catchy than descriptive. Of course, if you manage to get a domain that contains a keyword related to what the site is about, that is very good. But don’t become frustrated if you can’t find something very descriptive. Your domain is your brand, and your brand should be, first of all, original. buy-cheap-cars.com has nothing original in it. crazycars.com, well, that would be an improvement, don’t you think?

5. Play with words. Think of a few words, at least one of which you would like your domain name to contain. Then add other words to them, maybe less related to what you do. There are keyword suggestion tools available online that may help you with this game.

6. Search and ask for advice. Read articles like this one and use online domain name suggestion tools. Tell your friends, business partners or family about the domain name you are thinking of buying. Take their feedback into consideration, but remember that first reactions in situations like these are not always very relevant. Just imagine a friend asking your opinion on the domain “google.com” before Google emerged. I bet you would have said something like “But it doesn’t even mean anything!”. So don’t overdo it.

7. When you find a domain name that you like, don’t hesitate too long before buying it. There is always the thought that “maybe I can find an even better one”, but don’t let this thought take over. Remember, as important as the domain name is, there are many other things that are much more important, like the web site you will host on that domain. And, most often, after you buy the domain, you get attached to it and like it even more.
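Tips 2 and 3 are mechanical enough to turn into a quick filter for a list of candidate names. The sketch below is a hypothetical helper encoding only the rules of thumb above (a .com name of 6 to 10 letters, with no digits or hyphens); the thresholds are suggestions rather than hard rules, and the helper does not check availability.

```python
import re


def passes_rules_of_thumb(domain, min_len=6, max_len=10):
    """Check a candidate against the rules of thumb for domain names.

    The name part (before ".com") must be 6-10 letters, with no digits or hyphens.
    Returns False for any other TLD, even though exceptions such as Last.fm
    and country TLDs exist.
    """
    domain = domain.lower()
    if not domain.endswith(".com"):
        return False
    name = domain[: -len(".com")]
    return bool(re.fullmatch(r"[a-z]+", name)) and min_len <= len(name) <= max_len


for candidate in ["tastekid.com", "spinner.com", "freemusicforeverybody.com",
                  "buy-cheap-cars.com", "last.fm"]:
    print(candidate, passes_rules_of_thumb(candidate))
```

Only tastekid.com and spinner.com pass; the filter is a starting point for tip 6’s advice-gathering, not a substitute for it.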

Good luck!

Note: This post has been written in retrospect and posted on September 10th, 2008.