Today has been quite challenging in project cocktail. The main issue is working out how much information to store and at what granularity to summarize it. Data is pouring in rapidly: I have defined over 5,000 directional cocktail relationships, and within one day probably 10% of my cocktails have at least one relationship I would define as relevant (I shall work on statistical significance fairly heavily later on).
The issue is that every row in my database currently takes up 46 bytes and I am adding c. 7.5k rows a day. The index then adds another 25% on top, and there is little I can do about that since I am required to create a primary key. So each day I am creating roughly 0.4MB of data in unsummarized form. The machine I am using has 100MB of storage, so I could store somewhere around 230 to 250 days of raw (unsummarized) data. Ignoring the index, that same 100MB holds about 2.17 million rows (handy, because right now my cocktail DB could theoretically create 2.17 million different directional relationships with different likelihood factors).
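For anyone who wants to check the sums, here is a rough back-of-envelope sketch in Python. The row size, daily row count and index overhead are the figures quoted above; treating 100MB as 100,000,000 bytes and the function names are my own assumptions. Unrounded, the daily figure comes out nearer 0.43MB, which gives about 232 days rather than a round 250.

```python
# Back-of-envelope storage estimate using the figures quoted in the post.
ROW_BYTES = 46                     # bytes per row (from the post)
ROWS_PER_DAY = 7_500               # approximate daily inserts (from the post)
INDEX_OVERHEAD = 0.25              # primary-key index adds ~25% (from the post)
CAPACITY_BYTES = 100 * 1_000_000   # 100MB, assuming decimal megabytes


def daily_bytes(row_bytes=ROW_BYTES, rows_per_day=ROWS_PER_DAY,
                index_overhead=INDEX_OVERHEAD):
    """Bytes written per day, including the index overhead."""
    return row_bytes * rows_per_day * (1 + index_overhead)


def days_until_full(capacity_bytes=CAPACITY_BYTES):
    """How many days of raw data fit before the storage is full."""
    return capacity_bytes / daily_bytes()


def rows_without_index(capacity_bytes=CAPACITY_BYTES, row_bytes=ROW_BYTES):
    """How many rows fit if the index overhead is ignored."""
    return capacity_bytes // row_bytes


if __name__ == "__main__":
    print(f"per day: {daily_bytes() / 1_000_000:.2f} MB")
    print(f"days until full: {days_until_full():.0f}")
    print(f"rows ignoring index: {rows_without_index():,}")
```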
In the short term I am going to do nothing more than summarize on a monthly basis and depreciate the value of prior months over time. In the long term, though, I want to know whether recommendations should differ significantly by various user-related variables, so I want to store those variables now and have a significant dataset to query later when working out their impact. My target is to make this engine slightly scary at predicting what cocktail you might want to see next, and hitting the scary threshold will take a little more than delivering the #1 most likely relationship. So, all things considered, I have 0.5TB of data storage going spare right now in 5 MySQL databases... let's go for the big data :) and see what comes out!
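To make the "summarize by month and depreciate older months" idea concrete, here is a minimal sketch in Python. It is only one way this could work, not what the engine actually does: the row shape (date, from-cocktail, to-cocktail), the function names and the decay factor of 0.8 per month are all hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical: each month further back counts for 80% of the month after it.
DECAY = 0.8


def summarize_by_month(rows):
    """Collapse raw rows into monthly counts per directional relationship.

    rows: iterable of (date 'YYYY-MM-DD', from_cocktail, to_cocktail).
    Returns {(month 'YYYY-MM', from_cocktail, to_cocktail): count}.
    """
    totals = defaultdict(int)
    for date_str, src, dst in rows:
        totals[(date_str[:7], src, dst)] += 1
    return dict(totals)


def months_between(earlier, later):
    """Whole months from 'YYYY-MM' earlier to 'YYYY-MM' later."""
    ey, em = map(int, earlier.split("-"))
    ly, lm = map(int, later.split("-"))
    return (ly - ey) * 12 + (lm - em)


def decayed_scores(monthly, current_month, decay=DECAY):
    """Score each (from, to) pair, discounting older months geometrically."""
    scores = defaultdict(float)
    for (month, src, dst), count in monthly.items():
        age = months_between(month, current_month)
        scores[(src, dst)] += count * decay ** age
    return dict(scores)


if __name__ == "__main__":
    # Made-up sample rows, purely for illustration.
    sample = [
        ("2012-01-05", "mojito", "daiquiri"),
        ("2012-01-20", "mojito", "daiquiri"),
        ("2012-03-02", "mojito", "daiquiri"),
        ("2012-03-03", "daiquiri", "mojito"),
    ]
    monthly = summarize_by_month(sample)
    print(decayed_scores(monthly, "2012-03"))
```

Note that the pair (mojito, daiquiri) is kept separate from (daiquiri, mojito), since the relationships are directional. Storing only the monthly counts means the decay factor can be changed later without going back to the raw rows.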
Tomorrow I will be helping move 100 rowing boats so that my new boat club can undergo an awesome renovation over the next three weeks, which means I will have to take a break (and let the data gather into an even higher volume). Hopefully on Sunday I will be able to do some analysis and maybe even build a mock-up that shows what the output could start to look like. For now, goodnight!