In my last post I explained that I was investigating how keywords (or tags) in eBay titles relate to one another, to see whether they could help me produce some kind of dynamic categorization for the eBay site. I completed my research yesterday and wanted to share it here. The answer definitely seems to be yes, and that really excites me for a whole host of reasons.
The experiment involved pulling the titles for all 77750 listings related to the word iPod on eBay.com (siteid:0) and saving those titles into a database. I then split each title into individual words, entered them into a MySQL database (I love MySQL, but that's a whole other story), and ran an SQL query to give me the DISTINCT keywords and the COUNT of how many times each occurred in the titles.
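For anyone who wants to try the split-and-count step, here is a minimal sketch of the idea rather than the query I actually ran. It assumes a hypothetical table called title_words, uses Python's built-in sqlite3 module as a stand-in for MySQL, and runs on three made-up titles instead of real eBay data.

```python
import sqlite3

def keyword_counts(titles):
    """Split titles into words, store them in a table, and count each keyword."""
    # Assumption: an in-memory SQLite database stands in for MySQL here.
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per word occurrence in a title.
    conn.execute("CREATE TABLE title_words (title_id INTEGER, word TEXT)")
    rows = [
        (title_id, word)
        for title_id, title in enumerate(titles)
        for word in title.lower().split()  # naive whitespace tokenizer
    ]
    conn.executemany("INSERT INTO title_words VALUES (?, ?)", rows)
    # GROUP BY yields each distinct keyword once, with its occurrence count.
    result = conn.execute(
        "SELECT word, COUNT(*) AS occurrences "
        "FROM title_words GROUP BY word "
        "ORDER BY occurrences DESC, word ASC"
    ).fetchall()
    conn.close()
    return result

# Made-up sample titles, not real listings.
sample_titles = [
    "Apple iPod Nano 4GB",
    "Case for iPod Nano",
    "iPod Video 30GB Apple",
]
counts = keyword_counts(sample_titles)
for word, occurrences in counts:
    print(word, occurrences)
```

Lowercasing before the split is a design choice so that iPod, IPOD and ipod are counted as the same keyword. A real run would probably want a smarter tokenizer for punctuation.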
What I found was absolutely fantastic. The keyword ipod had a 31% probability of association with the word Nano, 28% with case and Apple, 27% with video, and so on. It also had a 42% association with the word "for"; however, out of the top 100 words that was the only irrelevant word (the symbols -, + and & also turned up). Thinking about how to remove this irrelevance, I also ran the keyword tyre on eBay.co.uk and, guess what, all the irrelevant words for ipod turned up in the top 100 for tyre as well, so deduping the two lists simply obliterated the irrelevant words. I realize there is a lot further to go to make this scalable and useful, and my article on temporal perturbation is important here too, but this research tells me there is gold in them there hills as far as the eBay API is concerned, when it is used for research rather than just for tools. Someone needs to mine it more than they have already.
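The association percentages and the deduping trick can be sketched roughly as follows. This is an illustration under assumptions, not my original analysis: it treats "association" as the share of titles containing the seed keyword that also contain the other word, it reads "deduping" as dropping any word that shows up in both top lists, and the function names and sample titles are made up.

```python
from collections import Counter

def association(titles, seed):
    """Share of seed-containing titles in which each other word appears."""
    seed = seed.lower()
    word_sets = [set(title.lower().split()) for title in titles]
    with_seed = [words for words in word_sets if seed in words]
    if not with_seed:
        return {}
    # Count each word at most once per title, excluding the seed itself.
    counts = Counter(w for words in with_seed for w in words if w != seed)
    return {w: n / len(with_seed) for w, n in counts.items()}

def top_words(assoc, n=100):
    """The n most strongly associated words, ties broken alphabetically."""
    ranked = sorted(assoc.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]

def drop_shared(primary, unrelated):
    """Remove words that also appear in the list for an unrelated keyword."""
    shared = set(unrelated)
    return [w for w in primary if w not in shared]

# Made-up sample titles, not real listings.
ipod_titles = [
    "Apple iPod Nano 4GB",
    "Case for iPod Nano",
    "iPod Video case for Apple",
    "Charger for iPod",
]
tyre_titles = [
    "Tyre for Ford Focus",
    "Winter tyre for BMW",
    "Tyre + wheel",
]

ipod_assoc = association(ipod_titles, "ipod")
tyre_assoc = association(tyre_titles, "tyre")
cleaned = drop_shared(top_words(ipod_assoc), top_words(tyre_assoc))
print(cleaned)
```

The unrelated keyword acts as a crude, automatically generated stop-word list: words like "for" show up for almost any query, so they appear in both lists and get dropped. The obvious risk is that a genuinely relevant word that happens to be shared would be dropped too.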
This does, however, leave me with my one irritation with phpMyAdmin, which is otherwise an awesome tool: unlike Teradata QueryMan, it does not save your last n queries. I often want to go back to a query that I forgot to save and rerun it, and instead I have to rewrite it from scratch. That happened while posting this article (I didn't automate the final analysis... my bad!) and left me a tiny bit grumpy.
Anyway: dynamic categorization and keyword discovery, carefully optimized by millions of sellers over tens of thousands of items daily, with billions of dollars of GMV, in any category of item you could want, exposed for free!!! That's gotta be worth something to someone :)