mysql copy table

Continuing my chain of blatantly obvious mysql queries that I nonetheless didn't know at one stage :) copying a table isn't an obvious function in mySQL. So here's how to do a mysql copy table

Say you have a table called "tbl_theoriginal" which has the perfect, ideal, wonderful table structure for a new table you want "tbl_thenewone" and you would like to create it the query you would create is as follows:

CREATE tbl_thenewone SELECT * FROM tbl_the original

Simple huh! You can put in any WHERE clause you like and even a limit 1,1 or whatever if you don't want to transfer any content but just the table structure.

mySQL rename table

One of the (few) downsides of  phpMyAdmin (at least the version I use) is that it doesn't support the mySQL rename table function.

It is very simple to use this function through the SQL tab in phpMyAdmin (or any other direct interface with mysql). Say you want to change the name of the table from "tbl_oldtable" to "tbl_newtable" then this would be the code:

RENAME TABLE tbl_oldtable TO tbl_newtable

Simple huh! Right with that duty to my readers and hopefully my natural search traffic... (these mySQL posts tend to do pretty well) I am off to drink more cabernet sauvignon :)

Code Performance is Important

The puzzles that a fair few geeky companies post on their jobs pages to try and vet/encourage great coders to complete them and apply for jobs are quite cool (I definitely don't fit in the great coder bucket). Recently I noticed a fair few of them are focussed on code performance and so thought I would share my experience of code performance harming my site.

The graph above is the exit rate for users visiting my display cocktail pages on my cocktail site (according to Google analytics which has pretty graphics but some data issues) against time. The initial decrease in exit rate is awesome and primarily due to implementing the cocktail recommendation engine. Suddenly in mid July there is a massive jump in exits, this coincided with a process I was running for my facebook app really screwing up and slowing down every query run on my database :(. I shut that query down (although didn't realize what was happening for a month) and (for the moment) have given up on my facebook app to concentrate on the main site. After shutting down the process performance and exit rates improved instantly.

I don't monitor the performance of the scripts on my site anywhere near enough. Last weekend I built tracking to monitor the ranking of cocktails on my site weekly, pull out fast moving cocktails and so on. I have started logging keyword searches on my site too as of last week. This weekend a key focus for me is producing a performance dashboard summarized by hour of day and day of year. I am going to start monitoring the efficiency of all changes I make and see where I need to make improvements.

Cocktailmaking.co.uk has grown 3x since last year. The site also gets 25% of it's visits in December with 10% of those on New Years' Eve. I need to make the site much much more efficient before that date comes or my servers will be a smoking mess this year and I'll capture none of the revenue that I should.

Code efficiency is really really important.

Cocktail Recommendation: Part 4 - the results

Wow did it work or what! The cocktail recommendation engine I produced earlier this year has been a fabulous success. I set up an AB test of the cocktail recommendation by randomly assigning my visitors to either see the cocktail recommendation engine or not and the main target was to reduce the bounce rates of visitors to my cocktail recipe pages (both buckets contain c. 1million page views).

The above graph shows you the overall impact of the test across all my cocktails with 100days of data. This impact is diluted because of the volume of cocktails for which I don't have enough data to produce cocktail recommendations so below is a list broken out by cocktail id (I have removed the names since work people see this blog and some are naughty). The cocktails in the list below are essentially the cocktails found as the first 10 here (you are warned... that page contains rude words) under the tab "Graph of cocktail recipes in order".

On average for the top 10 cocktails the recommendation engine reduced the bounce rate by 16% with some cases (esp. slippery nipple) having an impact as high as a 31% reduction in bounce rate for the page.

Understanding Cocktail Rating Distribution

Something interesting with my facebook application is that in the past week I have gathered >1000 ratings on the cocktails in my database from 745 unique users of my cocktail making application. This is a fairly small number for data analysis but 3 things stood out:

The most votes were given in the bucket of "10" - people seem to tend to vote on the cocktails they like.

The cocktails that were rated 10 tended to get more votes in general i.e. 3 votes per cocktail and not the average 1.5

When looking just at the users who gave a ten and what % of each voting bucket they made up it became clear that these users were pretty much all or nothing guys relative to the crowd, either they gave a cocktail a 10 or they gave it nothing.

More to come on user voting behaviour over time I am sure and also on the most popular cocktails, cocktail ingredients, favourite cocktails lists and so on... I now have far too much data to work through and that is great!

Cocktail Recommentation Engine: Part 3 - The First Results

The following table is a subset of the output results from the cocktail recommendation engine (be warned some rude words in some of the cocktail names). I have tried to drill the list down to just the best matches so bear in mind there are probably 30% more matches thrown out which appear to have statistical significance but I can find no reason for the relationship (beyond sometimes they are next to each other in the large cocktail A-Z list).

Cocktail One - the cocktail someone is on, Cocktail Two - the cocktail I recommend, Probability - the probability that if someone is on cocktail one and visits another cocktail cocktail two will be that cocktail. There had to be at least 10 relationships in total recorded between cocktail one and another cocktail in my db for a cocktail to make this table.

I am hoping that as I get more data this table will grow more accurate and I will have fewer "wtf?" moments looking at the list. I also want to add two more probabilities to the calculation:

  1. Probability if they visit 1 they will like 2
  2. Probability if I display 2 on 1 they will click on it

Anyway for now enjoy looking at the cocktail relationships and expect more data soon.

Cocktail One Cocktail two Prob.
The Refresher Tom Collins 0.76
throat cramp Toxic Waste 0.67
Raunchy island sex Red Headed Slut 0.5
three wise men Three Wisemen 0.5
Tomate Momisette 0.43
Wicked one Screwdriver 0.35
Amaretto Rose Amaretto Dream 0.33
Purple Haze Radiation 0.33
Strawberry shortcake StrawBerry Shortcake 0.33
Toxic Waste Titanic Sinkers 0.32
martini spiller Niks Shot of Aussie 0.32
The Orville three wise men 0.31
3 Headed Parrot A-Bomb 0.3
BABY GUINNESS Baby Guinness 0.3
Flaming Shooter Flatliner 0.29
tequila slammer!!!!!!!!!!! TEQUILA SUICIDE ! ! ! ! ! ! ! ! ! ! ! 0.28
see you at the hospital Tequila Sunrise 0.28
Absolut Nut Absolut Quaalude 0.27
Sex on the beach Sex On The Beach 0.27
suicide surfer on acid 0.26
Tom Collins Orange Fizz 0.24
Kick in the Willy martini spiller 0.24
Slippery Dick Slippery Nipple 0.24
Multiple Orgasm Multiple Screaming Orgasm 0.24
Absolut Stress absolutly mistafying 0.22
death cock sucking cowboy 0.22
A strawberry with milk A-Bomb 0.22
Cheeky Vimto Layered cheeky Vimto 0.21
Tequila Sunrise throat cramp 0.21
Sex On The Beach Sex on the beach 0.19
Screwdriver Bay Breeze 0.19
Arsonist awamawama ding dong 0.19
Absinthe Drinker Absinthe Minded 0.19
nipple twister Nymphomaniac 0.19
Jolly Green Giant Flaming Lamborghini 0.17
Flatliner Flaming Lamborghini 0.17
Zipper Zombie 0.17
Absolut Nut Level 28 0.17
Absinthe Drinker Royal Absinthe Fizz 0.17
Long island ice tea Long Island Iced tea 0.16
Slippery Dick cock sucking cowboy 0.14
blowjob Blow Job 0.14
Blow Job Clit Licking Cowgirl 0.14
Wet Pussy Wet Dream 0.13
Pina Colada Pina Colada (Virgin) 0.13
Screaming Orgasm Sex on the beach 0.13
Orgasm Cocktail Screaming Orgasm 0.13
B52 Blow Job 0.13
Slippery Nipple cock sucking cowboy 0.13
Sex on the beach (shooter) Sex on the beach 0.13
Black Russian Black Russian - proper 0.13
Brain Damage brain haemorage 0.12
Alabama Slamma Alabama Slammer 0.12
cock sucking cowboy Clit Licking Cowgirl 0.12
Mai Tai Mojito 0.12
Sex on the beach Slippery Nipple 0.12
Pornstar Purple rain 0.12
Pornstar Quick Fuck 0.12
Baby Guinness Bad Babysitter 0.11
Arsonist red , white & Blue 0.11
Baby Guinness A-Bomb 0.11
Baby Guinness Billy Badass 0.11
Screwdriver Wicked one 0.11
Angel's Kiss B52 0.1
Blow Job blowjob 0.1
Clit Licking Cowgirl A-Bomb 0.1
Clit Licking Cowgirl cock sucking cowboy 0.1
Flaming Lamborghini Deep Throat 0.09
A-Bomb Abbey cocktail 0.09
Sex On The Beach Sex on the beach (shooter) 0.09
Blue Lagoon blue lagoon special 0.08
Slippery Nipple Sex on the beach 0.08
Clit Licking Cowgirl Orgasm Cocktail 0.08
Clit Licking Cowgirl Deep Throat 0.08
Screaming Orgasm Sex On The Beach 0.08
B52 Baby Guinness 0.08
Screaming Orgasm Blow Job 0.07
Blow Job Blue Lagoon 0.07

Huge steaming piles of data (Project Cocktail Part 2)

Today has been quite challenging in project cocktail. The main issue is working out how much information to store and what granularity to summarize it on. Data is pouring in rapidly, I have defined over 5000 directional cocktail relationships and probably 10% of my cocktails within one day have at least one relationship I would define as relevant (I shall work on statistical significance fairly heavily later on).

The issue is that every row in my database is currently taking up 46bytes and I am adding c. 7.5k rows a day. The index on the database is then adding another 25% to this and there is little I can do here since I am required to create a primary key. Therefore daily I am creating 0.4MB of data in unsummarized form. The machine I am using has 100MB of storage and so I could store up to 250 days of raw (unsummarized) data or 2.17million rows (handy cos right now my cocktail DB could theoretically create 2.17million different directional relationships with different likelihood factors).

In the short term I am going to do nothing more than summarize on a month basis and depreciate the value of prior months over time but in the long term I want to know if recommendations should differ significantly by various user related variables and hence I want to store those variables so that in the future I can have a significant dataset to query and work out the impact of those variables. My target is to make this engine slightly scary at predicting what cocktail you might want to see next and to hit the scary threshold will take a little more than the #1 most likely relationship to deliver. So all things considered I have 0.5TB of data storage going spare right now in 5 mySQL databases... let's go for the big data :) and see what comes out!!!

Tomorrow I will be helping move 100 rowing boats so that my new boatclub can undergo an awesome renovation over the next 3wks which means I will have to take a break (and let the data gather into an even higher volume). Hopefully Sunday I will be able to do some analysis and maybe even build a mock up that shows what the output could start to look like. For now goodnight!

Recommendation Engine: Project Cocktail Part 1

Ok so how does one build a recommendation engine?

For the last year I have been tweaking improving and revising my cocktail recipes website with the prime goal of driving more traffic to the site. I have certainly acheived some of that with >1500 visitors daily on most days at the moment (still not huge but really massive growth). What is noticeable though compared to my paper airplanes site is a distinct lack of loyalty in my user base. With paperairplanes.co.uk ~32% of the visits daily are from returning users but with cocktailmaking.co.uk only ~9% are.

I need to make the site better and so my first attempt is to produce a cocktail recommendation engine. Phase one of the project just released to the site and I wanted to chat about that a little bit here.

I am logging every chain of two cocktails viewed by a visitor to my site within 2hrs. My thinking is that time spent and number of clicks browsing both degrade the correlation between cocktails in a user's browse path and so two cocktails viewed back to back in a short time period are likely to be very closely related. If I observe this interaction over millions of such pairs (data I can gather in just a few months) the most correlated cocktail combos should stand out from the mix at least for the top 10% of cocktails viewed on my site. That is my thinking and I will start to share the results on here as the data gathering proceeds.

The wonders of having a website with enough traffic to make analysis interesting :)

SELECT ... CASE expressions rock!!!

CASE expressions are a huge timesaving SQL feature that I have just recently fallen in love with. The specific example was trying to find how the activity of a class of users has declined over a period of twelve months. Without CASE this would have required writing and executing 12 different queries and then joining them into a table either by joining all those queries together or by pasting them into (in this case) excel. That's hard enough but what about 24hrs in a day for day parting analysis across 100000 keywords (far too big for excel <2007 that one ;) )

CASE selects each of those different queries I described into a column using a clause something like:

CASE month WHEN 2006M10 THEN revenue ELSE 0 END

For a table something like:

month userid revenue
2006M10 alex $500
2006M10 beth $402
2006M11 alex $220
2006M11 beth $170
2006M12 alex $52
2006M12 beth $50
2007M01 beth $1000

 The select may look like:

SELECT

      sum( CASE month WHEN 2006M10 THEN revenue ELSE 0 END) as oct06,

      sum( CASE month WHEN 2006M11 THEN revenue ELSE 0 END) as nov06,

      sum( CASE month WHEN 2006M12 THEN revenue ELSE 0 END) as dec06,

FROM

      tbl_userrevenue

WHERE

      userid IN ('alex', 'beth')

 

and the output would look like:

oct06 nov06 dec06
$902 $390 $102

 

Which in my mind is a pretty cool tool for analyzing anything where you want to look at discrete time intervals or users classes or almost anything else individually.

PHP mySQL close connection

So I have been creating lots of new features for me to maintain my cocktail making website. One feature has involved me setting up a new mySQL 5.0 database (with cool stored procedures I am only starting to understand now) in order to do some tracking of what is happening on the site and with my widget. This has required me to use two databases at once for the first time. One to check if a user is signed in with admin rights and the other to check the tracking out of the other database (php security is really tricky so I am slightly nervous posting even this here :s ).

However I did want to note how you close a connection with a mysql database using PHP in order to open one with another mysql database since it's really simple but I found it hard to get info. Here is an example (with $hostname, $username, $password and $databasename containing what they are named):

<?php$link = mysql_pconnect("$hosturl", "$username", "$password")

               or die("Could not connect: " . mysql_error());

$db = mysql_select_db($databasename,$link)

               or die ("Couldn't select database");

mysql_close($link);

?>

So when you open databases try closing them, I know I will in future for good practice, it makes everything much simpler if you add another database at a later stage.

My Photo

Recent Posts

Blog powered by TypePad

my flickr tag cloud