Archive for the ‘Milestones’ Category

Big Update: Better Recommendations, New Features

Posted on August 11th, 2009 by Andrei Oghina  |  1 Comment »

We are happy to announce that we finally deployed the update we have been (silently) working on for the last couple of months.

The most important aspect of this update is that Emmy is now even smarter: she knows about more artists, bands, movies, authors and books than ever before. She gives better recommendations for a large variety of inputs, and the suggestions are generally cleaner and more to the point. Of course, there is still a lot to do in order to improve her awareness, but this has been an important step forward.

Also, the Wikipedia teasers and the YouTube clips are more accurate now, and the number of situations in which the wrong info is shown for a particular resource has decreased drastically. Because such situations may still occur, we have introduced a new feature: user reporting. This feature allows users to easily report such a mistake and recommend a more suitable Wikipedia article or YouTube clip for that particular resource. Using this valuable feedback, which we have already begun receiving, we hope to increase the accuracy of Emmy’s information even more.

Just so you won’t have to type “Lock Stock and Two Smoking Barrels” or “Fear and Loathing in Las Vegas”, we have introduced the auto-complete: each time you start typing a new resource as input for Emmy’s recommendations, the auto-complete will suggest some of the more popular names that begin with those letters.
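The behaviour described above (suggest the more popular names that begin with the typed letters) can be sketched roughly as follows. This is only an illustration, not the code running on the site: the `AutoComplete` class, the popularity scores and the catalog entries are all made up for the example.

```python
import bisect


class AutoComplete:
    """Hypothetical prefix lookup over a catalog of (name, popularity) pairs."""

    def __init__(self, popularity):
        # Map a case-insensitive key to the display name and its score.
        self._entries = {name.casefold(): (name, score)
                         for name, score in popularity.items()}
        # A sorted key list lets us jump to the first candidate with bisect.
        self._keys = sorted(self._entries)

    def suggest(self, prefix, limit=5):
        prefix = prefix.casefold()
        if not prefix:
            return []
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            matches.append(self._entries[key])
        # Most popular first; ties are broken alphabetically.
        matches.sort(key=lambda entry: (-entry[1], entry[0]))
        return [name for name, _ in matches[:limit]]


# Made-up popularity scores, for illustration only.
catalog = AutoComplete({
    "Fear and Loathing in Las Vegas": 90,
    "Fearless": 40,
    "Fight Club": 120,
    "Lock Stock and Two Smoking Barrels": 70,
})
print(catalog.suggest("fe"))
```

Sorting the keys once and cutting the scan short at the first non-matching key keeps each lookup cheap, which matters when a suggestion is requested on every keystroke.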

All these, along with some other more subtle front-end and back-end improvements, are part of our efforts to provide you with the best recommendation engine for music, movies and books that you can find. And we’ll continue to do that.

Improved Knowledge Base, Better Recommendations

Posted on March 13th, 2009 by Andrei Oghina  |  Comments Off

I have just deployed the latest version of Emmy’s knowledge base. Built on top of more than double the information that the previous version was using, it allows Emmy to give better, more accurate, and also more up-to-date recommendations for a larger variety of bands, artists, movies and books. There is, and there always will be, room for improvement, but this is an important step forward. Hoping even more people will discover interesting resources using this service, I continue to work in order to provide you with more pleasant surprises and improvements.

Oh, and thanks again for all the encouraging feedback you’re sending!

Extensive Update: New Knowledge Base and More

Posted on November 17th, 2008 by Andrei Oghina  |  Comments Off

A couple of days ago I uploaded a series of updates that I had been working on for a while.

The most important update is probably Emmy’s knowledge base itself, which I have rebuilt from scratch. It is definitely more comprehensive and accurate now, and I truly hope this is reflected in the quality of Emmy’s recommendations. I am aware that some users may still be disappointed with her results, but I am making efforts to improve her ability to provide relevant recommendations for more and more different inquiries.

Besides giving better suggestions, Emmy is now able to provide further information for the majority of the stuff she recommends. Until this update, only a limited number of resources had a Wikipedia description available. Now, for almost all the bands and movies, and for the first time for books as well, Emmy is happy to present you with the beginning of the Wikipedia article for that particular resource. This way, I hope you will be able to form a better first impression of the things suggested to you.

Some less visible updates have been made as well. For example, you can now search for up to 20 items (given as input to Emmy), separated by commas. Up until now, the limit was 10 items, and the few of you who actually tried to give her 10 bands, movies and/or books to better describe your taste would probably have experienced a longer execution time. Now, Emmy is able to grasp, process and make her recommendations based on your input much faster, and for double the number of resources given as input.

Another less visible update is that now Emmy, besides improving her “Did you mean” feature, has also learned some common spelling mistakes and abbreviations.

To give you just a few examples: beatles, bjork, alanis morriset, linkinpark, rhcp, amélie, lotr, soad, beethoven.
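To give an idea of how inputs like these could be resolved, here is a minimal sketch that combines an alias table for abbreviations, accent folding, and fuzzy matching from Python’s standard `difflib` module. It is not Emmy’s actual implementation: the `ALIASES` and `CATALOG` contents, the `resolve` function and the 0.75 similarity cutoff are assumptions chosen for this example.

```python
import difflib
import unicodedata

# Hypothetical alias table for common abbreviations.
ALIASES = {
    "rhcp": "Red Hot Chili Peppers",
    "lotr": "The Lord of the Rings",
    "soad": "System of a Down",
}

# Hypothetical slice of the catalog of known resources.
CATALOG = ["The Beatles", "Björk", "Alanis Morissette", "Linkin Park", "Amélie"]


def fold(text):
    """Lower-case the text and strip accents, so "Björk" becomes "bjork"."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def resolve(query):
    """Return the catalog name the query most likely refers to, or None."""
    key = fold(query.strip())
    if key in ALIASES:
        return ALIASES[key]
    folded = {fold(name): name for name in CATALOG}
    if key in folded:
        return folded[key]
    # Fall back to approximate matching for misspellings.
    close = difflib.get_close_matches(key, list(folded), n=1, cutoff=0.75)
    return folded[close[0]] if close else None


for query in ["rhcp", "bjork", "alanis morriset", "linkinpark"]:
    print(query, "->", resolve(query))
```

Checking exact aliases first keeps short abbreviations such as "rhcp" from being fuzzy-matched against unrelated names.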

A useful update is that you can now specify the type of your input, and also request a type for the recommended items. There are cases when the same name can stand for a band name and a movie title at the same time, for example. Now, if Emmy guesses wrong about what you are trying to tell her you like, you can tell her the type by using the “band:”, “movie:” or “book:” operators.

For example: band:underworld, movie:harry potter, book:trainspotting.

Also, you can specify what type of stuff you want to receive as recommendations, that is, bands and artists, movies or books. This can be useful in many cases, and interesting and fun to play with.

For example: the beatles//movies, fight club//music, pulp fiction//books.
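Put together, the input syntax from this post (comma-separated items, optional “band:”, “movie:” or “book:” operators, an optional “//” suffix for the requested type, and the limit of 20 items) could be parsed along these lines. This is a sketch under assumptions, not the site’s parser: the function name, the return shape and the error handling are invented for illustration, and the accepted output types are only the three that appear in the examples above.

```python
MAX_ITEMS = 20                               # limit mentioned in this post
INPUT_TYPES = {"band", "movie", "book"}      # the "type:" operators
OUTPUT_TYPES = {"music", "movies", "books"}  # taken from the "//" examples


def parse_query(raw):
    """Split a raw query into ([(type_or_None, name), ...], wanted_type_or_None)."""
    items_part, separator, wanted = raw.partition("//")
    wanted = wanted.strip().lower() if separator else None
    if wanted is not None and wanted not in OUTPUT_TYPES:
        raise ValueError(f"unknown result type: {wanted!r}")

    items = []
    for chunk in items_part.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        prefix, colon, rest = chunk.partition(":")
        if colon and prefix.strip().lower() in INPUT_TYPES:
            items.append((prefix.strip().lower(), rest.strip()))
        else:
            # No recognised operator: keep the whole chunk, colon included.
            items.append((None, chunk))

    if len(items) > MAX_ITEMS:
        raise ValueError(f"at most {MAX_ITEMS} items are accepted")
    return items, wanted


print(parse_query("band:underworld, movie:harry potter, book:trainspotting"))
print(parse_query("fight club//music"))
```

Only the three known operators are treated as type prefixes, so a title that happens to contain a colon is left intact.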

There is still a lot of work to do in order to make Emmy wise enough to provide the vast majority of her users with interesting, relevant recommendations they haven’t heard of before, but I hope these updates will prove to be a step forward in accomplishing this mission.

TasteKid Becomes a Google Word

Posted on November 4th, 2008 by Andrei Oghina  |  Comments Off

Searching for TasteKid on Google no longer triggers a “Did you mean: Taste Kid” message (with the blank space between “taste” and “kid”).

The Buzz Effect

Posted on October 30th, 2008 by Andrei Oghina  |  3 Comments »

A couple of days ago, an article featuring TasteKid made it up to Digg’s main page. This triggered an “emergency landing” on Delicious’s main page, too. Around the same time, this happened. Then this. Then, others followed.

Google PageRank Updates

Posted on September 28th, 2008 by Andrei Oghina  |  Comments Off

As a way of marking TasteKid’s recent increase to PageRank 5 at the last Google PageRank update a couple of days ago, I have decided to write this short article about my opinion on what these PageRank updates really are.

While most of us are familiar with the concept of PageRank, there is a certain degree of uncertainty surrounding the actual meaning of the 0 to 10 value displayed by the Google Toolbar. What does this value reflect? Why doesn’t an update of this value necessarily show up in user traffic? And, probably most important, why is it that in such a dynamic environment as the web, where it is clear that the importance of many pages (viewed as the quantity and quality of inbound links) can change substantially in short periods of time, the Google PageRank updates are performed only once every few months?

To answer these questions, we first have to understand what PageRank is and how it is computed. The basic concept is probably best explained by Google itself:

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important”.

This basic concept, translated into a formula, looks like this:

$$PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)}$$

meaning that the PageRank value for a page u depends on the PageRank value of each page v in the set B_u (the set of all pages linking to page u), divided by the number L(v) of links from page v.

The way this formula works makes it clear that the PageRank of a particular web page is computed using the PageRank of all the pages that link towards that specific page. Now, the PageRank of each of those pages has to be already computed in order to perform such a task, but the structure of the web doesn’t provide the possibility to establish a sequence of pages in such a way that, for every evaluated page, all the pages that link towards it have already been evaluated before. In other words, the web has cycles (and it has lots of them).

For that reason, the PageRank computations require several passes, called “iterations”, through the collection to adjust approximate PageRank values to more closely reflect the theoretical true value.

These steps can be viewed like this:

Step 0: all pages have equal PageRank.

Step 1: every page gets its PageRank computed. Note that at this point, all inbound links have the same quality, because all pages still have the same PageRank determined at step 0.

Step 2: every page gets its PageRank computed again. This time, the inbound links have different qualities, the ones determined at step 1.

These steps succeed each other indefinitely, thus creating a better and better approximation of the real (actually, theoretical) PageRank value.

There is also a damping factor involved that prevents the inflation of PageRank. An interesting (and beautifully simple) fact is that, despite the billions of pages on the web and all the rises and falls of thousands of websites each day, “the sum of all PageRanks is 1” [The Anatomy of a Large-Scale Hypertextual Web Search Engine, Brin, S.; Page, L.].
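The iteration steps and the damping factor can be illustrated with a small sketch. It uses the normalized variant of the formula, in which every page also receives a base share of (1 - d) / N so that the ranks keep summing to 1. The damping value of 0.85 is the one suggested in the Brin and Page paper; the three-page link graph, the function name and the even redistribution of rank from pages without outbound links are assumptions made for this example, not a description of what Google actually runs.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively approximate PageRank for a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)

    # Step 0: all pages start with the same PageRank.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the same small base share of (1 - d) / N ...
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # ... plus a share of the rank of each page linking to it.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# A tiny graph with a cycle: a -> b, a -> c, b -> c, c -> a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(ranks, sum(ranks.values()))
```

Page c ends up with the highest value because it receives links from both other pages, and the total stays at 1 after every pass.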

Considering these facts, I’ll go back to my initial questions and try to answer them based on my limited knowledge regarding this process.

Why is it that the PageRank updates are performed only once every few months?

First of all, I would like to emphasize my belief that, even though PageRank updates are published every few months, they are actually happening all the time. Google constantly crawls the web, and, while doing so, also extracts links and computes PageRank using the basic idea of the algorithm described above. This process takes time though. While I’m sure Google puts a lot of effort into computing relevant PageRanks, a lot of web sites, sometimes situated in the more suburban areas of the Internet, are prone to sudden apparent shifts in PageRank caused by spikes of popularity, spamdexing or other events. Regardless of the computing power available, a certain amount of time is necessary in order to dampen the effects of such events and establish a more timeless, and thus more relevant, measure of the importance given by the web to one of its members, that is, that particular web page.

So what I’m saying is that I don’t think that the published PageRanks are a snapshot of the actual instant PageRanks of all web sites. To prevent, as much as possible, the abnormalities that happen all the time in the more juvenile areas of the web from interfering with the relevancy of the published values, I think Google established that it has to perform an evaluation over a longer period of time and then publish values that reflect, in a way, the average behavior of those websites in the given window.

In order to keep the provided values consistent, this analysis has to be performed on the same data set. Even though this data set (Internet states) extends over a longer period of time, it has to be the same for all the web sites involved, and this is why I think that PageRank updates are performed the way they are, once every few months, for all the websites at once.

What does the toolbar PageRank value reflect?

As I was saying, in theory, the sum of all PageRanks is 1. That means that the real PageRank value of a specific web page is most often incredibly small, and I would suspect that in practice a total much greater than 1 is used in order to save all those exponent bits of the floating point representation of real numbers (it is probably possible to automatically determine and adjust this number to an optimal value). Regardless of the actual scale used for real PageRank, these values are then rescaled to a range between 0 and 10 using (what is thought to be) a logarithmic scale. This logarithmic scale basically means that it requires many more incoming links to get from PageRank 4 to PageRank 5 than to get from PageRank 3 to PageRank 4 (where 5, 4 and 3 are values displayed by the toolbar).

Why the logarithmic scale? Well, just think that without it, in order to have a PageRank of 1, your site would need 1/10 of the PageRank of google.com (google.com has a PageRank of 10). This would mean that the majority of web sites would have a displayed PageRank of 0, which would make the option of displaying the PageRank useless.
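The difference between the two scales is easy to see with a bit of arithmetic. The sketch below is purely illustrative: Google has not published how the toolbar value is derived, so the base of 10, the function names and the sample scores are all assumptions made for this example.

```python
import math


def linear_toolbar(score, top_score):
    """Toolbar value if the 0 to 10 range were a plain linear rescaling."""
    return math.floor(10 * score / top_score)


def log_toolbar(score, top_score, base=10.0):
    """Toolbar value on a logarithmic scale; the base is an assumed value."""
    if score <= 0:
        return 0
    value = 10 + math.log(score / top_score, base)
    return max(0, min(10, math.floor(value)))


# Made-up scores, expressed relative to the best-ranked site (score 1.0).
for score in (1.0, 0.5, 0.002):
    print(score, linear_toolbar(score, 1.0), log_toolbar(score, 1.0))
```

A site with 0.2% of the top score would show 0 on the linear scale but 7 on this assumed logarithmic one, which is why a linear toolbar would be uninformative for almost every site.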

So, in my opinion, the toolbar PageRank reflects the logarithmically scaled average (actually, more complex measurement techniques than a simple average are probably used) of the instant PageRank values for a given period of time in the past.

Why doesn’t an update of this value necessarily show up in user traffic?

Although the SERP performance of a web page is obviously driven by PageRank, there is no reason for the publication of that page’s historical PageRank behavior over the last period of time to influence this performance. The Google PageRank updates are just passive reports, and, if a site receives an increase in PageRank in such an update, it has most probably already gradually felt that increase in SERP terms and thus in traffic.

Major Change in Taste Kid's Results

Posted on September 26th, 2008 by Andrei Oghina  |  Comments Off

Sixteen-year-old Ian wrote to Emmy a few hours ago:

What happened with the recommendations? They used to be so much better. Whenever I would look for Forrest Gump recommendations my 4 other favorite movies came up (The Shawshank Redemption, Gladiator, Braveheart, The Green Mile) on the list, proving that the system you were using before was working since those are my 5 favorite movies. Now only The Green Mile came up and movies like The Terminal and Cast Away are up top. Did you change how you did this? If so, the old way worked much better.

I would like to publicly answer this feedback and explain a little bit about what is happening.

Dear Ian, first of all I would like to thank you for using this service and, moreover, for providing me with this feedback. Yes, you are right, a major change happened a couple of days ago. You see, Taste Kid’s main goal is to be a discovery engine, to help people explore their taste by finding out about new bands, artists, movies and books. Many people who are using Emmy (including myself) felt that the suggestions were becoming more and more oriented towards popular stuff. Your personal example is great for that matter: you were searching for “Forrest Gump”, and you were given recommendations like The Shawshank Redemption, Gladiator, Braveheart and The Green Mile. Although these are all great movies, it is very unlikely that you haven’t already seen them. I was myself searching for some of my favorite bands or movies, and even though the recommendations were good, I was rarely discovering something new.

To give an example, check the results for Metallica:

http://www.tastekid.com/ask?q=metallica (new/current way)

http://www.tastekid.com/ask?q=metallica&old=1 (old way)

As you can see, using the old approach, the second recommendation for “Metallica” was “Nirvana”. Now, I’m sure people trying to discover new bands somewhat similar to “Metallica” aren’t looking for “Nirvana”; whether they like this band or not, they have most certainly already heard of it.

Given all this, I have decided to make a change. I have changed the formula that determines the relevancy of each result in a way that encourages less popular items to achieve good scores. This is a big gamble for Taste Kid. Up until now, people found it hard to disagree with the results, but, at the same time, they were rarely discovering something new. Now, by promoting less popular items, there is a much bigger chance to screw up and to get reactions like yours. But I think it’s worth it. Since I made this change, I have personally discovered several interesting bands and movies that I hadn’t heard of before. I’m sure there are even better ways of computing this relevancy, but I feel that the new formula is a step forward.
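The actual formula is not given in this post, so the following is only a hypothetical illustration of the general idea: divide an item’s raw association score by a power of its overall popularity, so that niche items can outrank well-known ones. The scoring function, the `penalty` exponent, the item names and all the numbers are invented for the example.

```python
def discovery_score(co_likes, popularity, penalty):
    """Raw association count, discounted by the item's overall popularity."""
    return co_likes / (popularity ** penalty)


def rank(candidates, penalty):
    """Order candidate names by descending discovery score."""
    return sorted(candidates,
                  key=lambda name: -discovery_score(*candidates[name], penalty))


# Made-up data: name -> (users who like it together with the query, total fans).
candidates = {
    "well_known_band": (900, 100_000),
    "niche_band": (300, 4_000),
}

print(rank(candidates, penalty=0.0))  # no discount: the popular item wins
print(rank(candidates, penalty=0.5))  # with a discount: the niche item wins
```

With a penalty of 0 the ranking is driven by raw counts alone; raising the penalty trades agreement with mainstream taste for a higher chance of surfacing something unfamiliar, which is the gamble described above.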

So have some faith in it and play with it a little. While the new results have a bigger chance of containing things that you don’t like, at the same time there is a bigger chance of finding a few things that you will like and haven’t heard of before. And this is ultimately Emmy’s goal :)

"Official" launch

Posted on January 21st, 2008 by Andrei Oghina  |  Comments Off

Well, actually it wasn’t all that official, but on the 21st of January 2008 I finally got the tastekid.com domain up and running and uploaded the new version (with the new design) of TasteKid. And that was pretty much it.

Note: This post has been written in retrospect and posted on September 10th, 2008.

Getting a Design

Posted on January 15th, 2008 by Andrei Oghina  |  Comments Off

This might be an unusual sequence of events for a web startup, but, after implementing the application, getting it online in a test environment, playing with Google AdWords and buying a domain, I was finally starting to think about getting a web design.

Even though I am an advocate of the importance of the true value of a product or service, the looks are certainly very important. I was satisfied with the functionality of my initial design, but in order to attract people I knew I had to come up with something much more eye-catching than that. One of the reasons I postponed getting a professional web design was that it was the only thing I actually had to pay for (besides the domain name).

I talked with Romi @ AdWorks Media, who, for a reasonable amount of money, came up with the look and feel that Emmy has today. I thank him for that, it is truly great work.

I also want to mention here xX-Faith-Xx, who drew Emmy’s first sketch without knowing it (I later asked permission to use the sketch, but it wasn’t drawn for this project; to quote her, it was “just a crap drawing, drawn in the car”).

Last, but not least, I’ll mention my brother Felix here, who helped me with some wicked tweaks, like getting the tooltip (that appears when mouse-overing the “?” icons) done and working properly.

Note: This post has been written in retrospect and posted on September 10th, 2008.

Finding a Domain Name

Posted on January 10th, 2008 by Andrei Oghina  |  2 Comments »

The next step was to buy a domain name. Although many would probably have done this long before going public, or even before developing the application, and most certainly before starting an AdWords campaign, I thought getting the application up and running was more important. While online marketing and branding are paramount to any product and service, I think nowadays there is too much buzz surrounding these aspects, almost as if they were more important than the product or service itself.

This type of view also applies to the domain name market. People are buying domains only because they are cheap, not because they are committed to doing something with them. Specialized companies are trading large numbers of domains, making a business out of reserving domains containing certain keywords, then selling them at a higher price to other companies and individuals that, most often, end up doing nothing with them. Although I have quite liberal views and I believe in the open market, I think that this trade layer between registration authorities and truly interested clients has many of the characteristics of a parasite.

Even though many interesting domain names are already reserved, usually by such companies that buy them only hoping to sell for profit, I think there are still a lot of free catchy domains out there. After a couple of days of searches, I came up with TasteKid.com for this project, and I must say I am very pleased with it.

Although I’m far from being an expert, and there are many places where you can find suggestions about finding a good domain name, here is some of my advice:

1. Unless you are keen on buying a precise domain, try to find one that is still available – that is, not already registered; it will be much cheaper, you won’t feed the companies that make a living from selling and buying domains and you will have the satisfaction of finding a domain that no one thought about before. Search for available domains using one of the many available tools, like http://www.checkdomain.com/.

2. Try to find a short domain. I know, many short domains are already taken, but there are still a lot of them out there. What does short mean? Obviously it’s very hard to find anything unregistered below 6 characters on the .com market. I would suggest 6 to 10 characters, but that is not a rule. Keep in mind though that a shorter domain means people will remember it more easily, type it faster in the address bar, and, maybe most important, subconsciously trust it more. Think of a Google result page for some music-related query: which would you choose between a spinner.com result and a freemusicforeverybody.com result? I think many of us would choose spinner.com, even though the second domain name is more descriptive.

3. Be careful when thinking about going for TLDs other than .com. Even though .net, .biz and other TLDs may offer a lot more available domain names, people are more reluctant to trust these and have become used to thinking that quality web sites usually have .com domain names. There are a lot of successful exceptions, one of them being Last.fm, and hopefully people’s attitudes will change over time, but still think twice before going for it. Obviously this doesn’t apply to regional (country) TLDs, which are trusted by people living in that particular country. Also, think twice before using numbers or hyphens in your domain name. These may be catchy in some instances, but then again, they usually drive people away.

4. Think outside the box. If you sell cheap cars, your domain name doesn’t have to contain “cars”, “buy” and “cheap” in it. You can later optimize your site for these keywords, but in my opinion it is more important for a domain to be short and catchy than descriptive. Of course, if you manage to get a domain that contains a keyword related to what the site is about, that is very good. But don’t become frustrated if you can’t find something very descriptive. Your domain is your brand, and your brand should be, first of all, original. buy-cheap-cars.com has nothing original in it. crazycars.com, well, that would be an improvement, don’t you think?

5. Play with words. Think of a few words, at least one of which you would like your domain name to contain. Then add other words to them, maybe less related to what you do. There are keyword suggestion tools available online that may help you with this game.

6. Search and ask for advice. Read articles like this one, and use online domain name suggestion tools. Tell your friends, business partners or family about the domain name you are thinking about buying. Take their feedback into consideration, but remember that first reactions in situations like these are not always very relevant. Just think about a friend asking for your opinion on the domain “google.com”, before Google emerged. I bet you would have said something like “But it doesn’t even mean anything!”. So don’t overdo it.

7. When you find a domain name that you like, don’t think too much about buying it. There is always that thought that “maybe I can find an even better one”, but don’t let this thought take over. Remember, as important as the domain name is, there are many other things that are much more important, like the web site you will host on that domain. And, most often, after you buy the domain, you start to get attached to it and to like it even more.
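Some of the more mechanical points in the list above (length, TLD, numbers and hyphens) can be turned into a quick check when you are going through many candidate names. This is just a toy sketch of those rules of thumb: the `review_domain` function is made up for the example, the length threshold comes from the “6 to 10 characters” suggestion, and the TLD check deliberately ignores the exception for country TLDs.

```python
def review_domain(domain):
    """Return a list of warnings based on the rules of thumb in this post."""
    name, _, tld = domain.lower().rpartition(".")
    notes = []
    if tld != "com":
        notes.append("TLD is not .com (fine for country TLDs, otherwise think twice)")
    if len(name) > 10:
        notes.append("name is longer than the suggested 6 to 10 characters")
    if any(ch.isdigit() for ch in name):
        notes.append("name contains numbers")
    if "-" in name:
        notes.append("name contains hyphens")
    return notes


for candidate in ["tastekid.com", "buy-cheap-cars.com", "freemusicforeverybody.com"]:
    print(candidate, review_domain(candidate))
```

An empty list only means that none of these simple checks fired; whether the name is catchy and original is still a judgment call.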

Good luck!

Note: This post has been written in retrospect and posted on September 10th, 2008.