How is sample size calculated for A/B tests?
I haven't been able to find detailed information on Google's recommendations for when it's OK to end a test.
In my most recent test I felt the sample size was too small, so I left the test running for a few more days. During that time the status changed from 'a clear leader has been found' back to 'we're still collecting data', and then back again to 'a clear leader has been found'. I want to know how reliable this result is, and to judge that I need to know how the required sample size is calculated (i.e. the power calculation). Which parameters are used? Knowing this would also help users estimate how long a test will need to run given their daily traffic.
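For reference, here is a minimal sketch of the textbook fixed-horizon power calculation for comparing two conversion rates (two-sided two-proportion z-test). It is an assumption that this is a useful approximation. It is not necessarily what Google's tool does internally. The parameters are the baseline conversion rate, the smallest conversion rate you want to be able to detect, the significance level `alpha`, and the desired `power`. The function names and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p_baseline, p_variant, alpha=0.05, power=0.80):
    """Visitors needed in EACH variant for a two-sided two-proportion z-test.

    Classical fixed-horizon formula; an assumption, not Google's documented method.
    """
    if p_baseline == p_variant:
        raise ValueError("p_baseline and p_variant must differ")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)  # quantile corresponding to the desired power
    p_bar = (p_baseline + p_variant) / 2  # pooled rate under the null hypothesis
    term_null = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z_beta * math.sqrt(
        p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    )
    n = (term_null + term_alt) ** 2 / (p_baseline - p_variant) ** 2
    return math.ceil(n)


def days_needed(n_per_variant, n_variants, daily_visitors):
    """Days to reach the sample size, assuming traffic is split evenly across variants."""
    return math.ceil(n_per_variant * n_variants / daily_visitors)


# Hypothetical inputs: 5% baseline conversion, hoping to detect a lift to 6%.
n = sample_size_per_variant(0.05, 0.06)
print(n)
print(days_needed(n, n_variants=2, daily_visitors=1000))
```

With these made-up numbers the formula asks for roughly 8,200 visitors per variant, which is about 17 days at 1,000 visitors a day. The calculation assumes the result is evaluated once, at the planned sample size. Checking repeatedly and stopping as soon as a "leader" appears inflates the false-positive rate, and that is consistent with a status flipping back and forth as data accumulates.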