Paul Lamere tests the algorithmic playlist generators from iTunes, Echo Nest, and the new Google Music. I like his metric, the “WTF Test”:
Evaluating playlists is hard. However, there is something that we can do that is fairly easy to give us an idea of how well a playlisting engine works compared to others. I call it the WTF test. It is really quite simple. You generate a playlist, and just count the number of head-scratchers in the list. If you look at a song in a playlist and say to yourself ‘How the heck did this song get in this playlist’ you bump the counter for the playlist. The higher the WTF count the worse the playlist. As a first order quality metric, I really like the WTF Test. It is easy to apply, and focuses on a critical aspect of playlist quality. If a playlist is filled with jarring transitions, leaving the listener with iPod whiplash as they are jerked through songs of vastly different styles, it is a bad playlist.
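The scoring side of the WTF Test is just a tally and a comparison, so it fits in a few lines. The sketch below is a hypothetical illustration, not Lamere's tooling: it assumes a human listener has already marked which songs made them scratch their heads (the test itself depends on that judgment), then counts those entries per playlist and ranks engines from fewest WTFs to most. The `Playlist` class, the function names, and the sample engines and songs are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Playlist:
    engine: str
    songs: list[str]
    # Songs a human listener flagged with "How the heck did this song get in this playlist?"
    # (hypothetical input: the WTF Test relies on a person making this call)
    head_scratchers: set[str]


def wtf_count(playlist: Playlist) -> int:
    """Count playlist entries the listener flagged; higher means a worse playlist."""
    return sum(1 for song in playlist.songs if song in playlist.head_scratchers)


def rank_by_wtf(playlists: list[Playlist]) -> list[tuple[str, int]]:
    """Return (engine, WTF count) pairs, best (fewest WTFs) first."""
    scored = [(p.engine, wtf_count(p)) for p in playlists]
    return sorted(scored, key=lambda pair: pair[1])


# Illustrative data only, not results from any real playlist engine.
a = Playlist("engine A", ["Song 1", "Song 2", "Song 3", "Song 4"], {"Song 3"})
b = Playlist("engine B", ["Song 1", "Song 5", "Song 6", "Song 7"], {"Song 5", "Song 7"})
print(rank_by_wtf([b, a]))  # [('engine A', 1), ('engine B', 2)]
```

Because Python's `sorted` is stable, engines that tie on WTF count keep the order they were passed in.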
Spoiler: Google’s Instant Mix did terribly on these tests. I’ll play devil’s advocate and say that maybe this is the sort of thing that needs more time and more users before the algorithm and song database are properly tuned.