
Thoughts about Tagging

For those of you who don't know, tagging is a bit of a phenomenon sweeping the web right now. The idea is that any content can be "tagged" with a series of words that describe it, and you can then get back to that content through those words. Tagging leads to tag clouds, where the tags for a piece of content or across a community of users are displayed in a paragraph and the font size of each tag corresponds to its popularity. Tagging also leads to tag search engines, community categorization and loads more interesting applications.
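To make the tag cloud idea concrete, here is a minimal sketch in Python. The function name `tag_cloud_sizes`, the pixel range and the linear scaling are my own assumptions for illustration; real tag clouds often use other scalings.

```python
from collections import Counter

def tag_cloud_sizes(tags, min_px=10, max_px=32):
    """Map each tag to a font size that grows linearly with its count.

    min_px and max_px are hypothetical bounds chosen for this sketch.
    """
    counts = Counter(tags)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in counts.items():
        # If every tag is equally popular, span is 0, so use the minimum size.
        scale = (n - lo) / span if span else 0.0
        sizes[tag] = round(min_px + scale * (max_px - min_px))
    return sizes

# Made-up tag stream: "ebay" is the most popular, "tagging" the least.
sample = ["ebay", "json", "ebay", "tagging", "ebay", "json"]
sizes = tag_cloud_sizes(sample)
print(sizes)
```

The most used tag gets the largest font and the least used gets the smallest, with everything else spread in between.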

To my mind, however, tagging is a very old thing in web terms. Tagging is (IMHO) a way of attaching the most relevant keyword or keywords to a piece of content. Effectively, anchor text ("<a href='someurlorother'>THIS IS ANCHOR TEXT</a>") is just a form of "tagging" by webmasters, and titles for webpages are just strings of tags the owners have put together.

Putting together these existing "tags" and the concept of tagging has, however, been giving me some interesting ideas. Auction sites, classified sites and personal sites effectively receive thousands of tags a day. Really powerful self-learning category structures could be built from this stream of user-generated tags. Since these millions of users want their products or items to sell well, it is in their interest to make the tags as relevant and accurate as possible to the ideal (if not always the actual) product. All this data is exposed to any API programmer, and someone with the skills and interest could create a product search more powerful than any of the Kelkoos out there right now, with millions of small businesses and individuals automatically helping with accuracy without even realizing it.

I suspect I may post more on this later since I am now planning to give it a go in a small category of listings on eBay. My previous post on temporal perturbation definitely has relevance in this area too.

Temporal Perturbation

I had a really interesting chat with one of my colleagues last night about keyword detection, keyword relevance algorithms and, more generally, self-learning computer systems for the web. These are growing more and more important on the web today. Great examples are Google search results, Amazon recommendations and all tagging technologies.

The problem that is becoming evident at a low level is described in natural search circles as "the rich get richer". Since links exist forever, older links may describe a site's content less relevantly than newer links, and yet these older links are still used heavily in determining keyword relevance for a page in Google.

In recommendation services, if I have browsed both Good to Great and Built to Last recently, Amazon will happily recommend the two together. Unsurprisingly, with that combo, thousands of others will do the same over time, and the affinity between those books will grow strong in the database. Here is where I enter fiction, since I don't know how Amazon really works, but some recommendation services certainly work in this way. What if Chris Anderson's upcoming book on the long tail has a stronger affinity with Good to Great (not implausible) and more people start visiting those two books in the same session? Many will still visit Built to Last and Good to Great, and due to the massive weight that affinity already has, the more recent change could take months or years to appear.
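As a rough illustration of the kind of affinity counting I have in mind, here is a small Python sketch. It is not how Amazon works (I don't know that); `affinity_counts` and the session data are made up for the example.

```python
from collections import Counter
from itertools import combinations

def affinity_counts(sessions):
    """Count how often each unordered pair of items shares a session."""
    pairs = Counter()
    for session in sessions:
        # sorted(set(...)) gives each pair one canonical order and
        # counts it at most once per session.
        for a, b in combinations(sorted(set(session)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical browsing sessions: the older pairing dominates the raw counts.
sessions = [
    ["Good to Great", "Built to Last"],
    ["Good to Great", "Built to Last"],
    ["Good to Great", "The Long Tail"],
]
counts = affinity_counts(sessions)
print(counts.most_common())
```

Because the counts only ever go up, a pairing with a long history keeps its lead until the newer pairing has accumulated just as many sessions.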

Advanced direct response paid search strategies optimize the keywords they buy based on historic ROI data, and brand campaigns based on historic brand uplift or tweaks in user behaviour. At what point do those historic models harm you, and when do you need to downweight history in order to upweight changing trends in user behaviour?

A suggestion I have heard is to assign all data sets a relevance score and downweight that relevance logarithmically with time. I love that suggestion. There are certainly other possibilities out there, but don't be surprised if the internet back end needs another tweak within the next two years, as the big sites begin to realize their clever technology isn't quite as clever as it used to be and is getting a little too tied to past results and a little slow to pick up on present trends. Interestingly close to a human being as they age, really!
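Here is one way the time-based downweighting could look in Python. The formula `1 / (1 + ln(1 + age_days))` is only my reading of "logarithmically with time", and the function names and numbers are assumptions for the sketch, not anyone's production algorithm.

```python
import math

def decayed_weight(age_days):
    """Weight of one observation: 1.0 when fresh, shrinking with the log of its age.

    The exact formula is an assumption made for this sketch.
    """
    return 1.0 / (1.0 + math.log1p(age_days))

def decayed_score(observation_ages):
    """Sum the decayed weights of all observations of one affinity."""
    return sum(decayed_weight(age) for age in observation_ages)

# Made-up history: an old affinity seen 1000 times about two years ago,
# and a newer affinity seen 400 times in the last few days.
old_score = decayed_score([700] * 1000)
new_score = decayed_score([5] * 400)
print(round(old_score, 1), round(new_score, 1))
```

With raw counts the old affinity wins 1000 to 400, but once age is taken into account the newer one comes out slightly ahead, which is the behaviour the suggestion is after.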

Thoughts about JSON

JSON is a really interesting new trend in web services; it stands for JavaScript Object Notation. For those of you who use AJAX, you should know that the key call, XMLHttpRequest, will only work within the same domain. So if the XML output lives on a 3rd party server, you have to host an interface on your own server to call it and re-render it for your AJAX (or DHTML if you are old fashioned like me) interface.

JSON is different, since a JSON service returns a series of JavaScript arrays and elements in the same way a JavaScript include would. This means a programmer doesn't need to know the complicated XML DOM processing in JavaScript to do dynamic work; instead, the programmer only needs to know how to handle variables.
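The difference shows up in any language, not just JavaScript. Here is a small sketch in Python using only the standard library; the payloads and field names are invented for the example and do not come from any real service.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical responses carrying the same single listing.
xml_payload = "<items><item><title>Red bike</title><price>45.00</price></item></items>"
json_payload = '{"items": [{"title": "Red bike", "price": 45.0}]}'

# XML: parse a tree, walk it, and convert text nodes by hand.
root = ET.fromstring(xml_payload)
xml_items = [
    {"title": item.findtext("title"), "price": float(item.findtext("price"))}
    for item in root.findall("item")
]

# JSON: one call returns ordinary lists and dictionaries.
json_items = json.loads(json_payload)["items"]

print(xml_items == json_items)
```

Both routes end up with the same data, but the JSON one gets there without any knowledge of tree traversal.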

JSON also gives bandwidth savings in the majority of cases, since the JSON sent generally contains fewer characters than XML for specifying the same elements and variables.
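A quick way to see the size difference is to serialize the same records both ways and compare lengths. This is a sketch with made-up records and helper names (`to_xml`, `to_json`); the exact saving depends entirely on the shape of the data.

```python
import json
import xml.etree.ElementTree as ET

# Made-up records; values are kept as strings so both formats carry the same text.
records = [
    {"title": "Red bike", "price": "45.00"},
    {"title": "Blue kite", "price": "9.50"},
]

def to_xml(rows):
    """Serialize rows as <items><item><key>value</key>...</item></items>."""
    root = ET.Element("items")
    for row in rows:
        item = ET.SubElement(root, "item")
        for key, value in row.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")

def to_json(rows):
    """Serialize rows as compact JSON with no extra whitespace."""
    return json.dumps({"items": rows}, separators=(",", ":"))

xml_text = to_xml(records)
json_text = to_json(records)
print(len(xml_text), len(json_text))
```

XML repeats every element name in a closing tag, which is where most of the extra characters come from in this example.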

So with this double upside, a lot of companies are starting to offer JSON. For example, the excellent Yahoo geocoding service now offers JSON, as described in this O'Reilly XML.com post.

I feel, however, that the strongest JSON offering (if you understand XSL) is the eBay REST XSLT service. This service allows you to upload a stylesheet of your own design to the eBay servers and have the response to any REST GetSearchResults call converted instantly into your favourite flavour (which could be JSON). This doesn't tend to notably slow down the REST call, and it saves time and coding on the developer side of the equation. I am really excited to see how JSON expands and am throwing together a few tests myself, but I wanted to shout about what a great opportunity it is and how everyone should be using it :)
