
Introductory Problem: Spreadsheet SolutionsI've attacked this problem numerically and analytically. The analytical solution to some of the questions is here. The numerical solutions are contained in a spreadsheet in Microsoft Excel format. I've tried to be as transparent as possible about the layout and formulas in the spreadsheet, but I've put together a brief overview here. The questions which have numerical solutions are: Smarties® come in six different colors and there are fifteen candies in each roll.

In the spreadsheet, formulas which are copied into a range are colored darker than the range into which they are copied. So, for example, A8 is copied into A8:A117 and B8 is copied into B8:F117. Similarly, T7:W7 is copied into T7:W116. I think this convention should help others to understand how the spreadsheet works.
I took two approaches to the numerical solutions. The first these was combinatorical. In this approach, I asked how the candies could be colored, without respect to the order of the colors, or to exchange of colors. For example, all fifteen candies could be the same color, or there could be three colors with three candies each and three colors with two candies each. In colums A–F I generated the possible distributions. In each line, I consider the previous line and find the next possible solution, such that the solutions are sorted according to the largest number of candies of the same color, then by the second largest number of the same color, etc.
This means that I was generating all i, j, k, l, m, n where i + j + k + l + m + n = 15, where i is greater than or equal to j and j is greater than or equal to k, etc. To generate the next distribution in the list, I figure out which column will be decremented by one. It will be the rightmost column which has at least one valid combination for the columns further to the right.
There is no way to decrement the rightmost column by one and keep the columns to the left of that unchanged, because that would change the sum of all columns (which is fixed at 15). The second rightmost column can be decremented by one if the rightmost column can be incremented by one and still be less than or equal to the second rightmost column. Specifically if m – n >1, then the next solution is i, j, k, l, m–1, n+1.
In general, you can never decrement a column unless it is greater than the next column to the right, and you can always decrement a column if it is two or more greater than the column to the right. If the column is exactly one greater than the column to the right, then it can be decremented if and only if there is some place further to the right, which can be incremented. This is tracked in Columns H–L. Column L is equal to 1 if Column E can be decremented. This signals that Columns A–D will not change. Similarly, Column K is equal to 1 if Column D is to be decremented, signalling that Columns A–C will not change. If a column in H–L is set equal to 1, then all columns to the left of that column are set equal to 2.
If Column E cannot be decremented, then Column L is set to zero or one. It is equal to one if Column F is less than Column E and is equal to zero if Column F is equal to Column E. If Column L is equal to one, it indicates that Column F can be incremented if, for example, Column D is one more than Column E. Similarly if Column K is equal to one, it indicates that Column E or F could be incremented.
Column T calculates the possible color combinations for the distribution in Columns AF. For example, 15, 0, 0, 0, 0, 0 could mean 15 red or 15 orange or 15 yellow, etc., so there are six different color combinations for that arrangement. If all six numbers were different, then there would be 6! different color combinations, since you could assign any color to any number. This only happens for one of the arrangements (5, 4, 3, 2, 1, 0). In all of the other arrangements, there is at least one duplicated number in Columns AF. Since switching the colors associated with identical counts doesn't change the solution, we have to divide 6! by the ways that the colors can be rearranged without creating new solutions.
If there are k colors with the same count, then there are k! different ways to rearrange those colors without changing the overall arrangement. The factorials are calculated using the workspace in Columns N–S. It is probably easier to look at how it is done than it is to explain it in more detail.
Column U counts the number of different ways that the fifteen candies can be ordered. For example, if all fifteen candies are the same color, then there is only one way to order them (RRRRRRRRRRRRRRR, for example.) If, however, there are fourteen red candies and one orange candy, then there are fifteen different ways to order them. (RRRRRRRRRRRRRRO, RRRRRRRRRRRRROR, etc.)
Column V is the total number of combinations (out of 15!) which correspond to the distribution in Columns A–F. Column W counts the total number of different colors (nonzero) in the distribution, and Columns X–AC simply count them up for different values of k from one through six. The total number of combinations for each value of k are added up in Row 118 and the percentages are calculated in Rows 120 and 5.
The combinations where there are at least two candies of each color are added up in Column AD.
This approach didn't seem to offer an easy way to determine the probability of getting six candies in a row where each candy was a different color. As a result, I came up with a different approach, in which I consider each of the fifteen positions in turn and maintain certain stats along the way. Basically, I calculate the probability that the positions I have already examined will have certain characteristics. To answer the remaining question about the probability of getting six different colors in a row, I only needed to track the number of different candies in a row at each point, but this approachi also offered a relatively easy way to check the earlier work.
I chose to track both the number of colors which had been seen (k) and the number of different candies in the current sequence (p). For example ROYOG would have four different colors (k) and a current sequence (p) of three. The number of colors in the current sequence must obviously be no greater than the total number of colors. Furthermore, I'm trying to calculate the probability of getting six in a row, so I let pk = 66 correspond to any sequence where there have been six different candies in a row. For example RROYGVWY would be categorized as pk = 66, not pk = 46.
There are 21 possible states, which are listed in Cells AG6:BA6. As you step through the roll, you have a (6 – k)/6 chance of going from k to k + 1. Similarly, you have a (6 – p)/6 chance of going from p to p + 1. If you don't go from k to k + 1, then you stay at k, so the chance of staying at k is k/6. But if you don't go from p to p + 1, then you have an 1/6 chance of going to any value from one through p. The exception is for p = 6 which always stays at p = 6. These rules are coded into the formulas in AG8:BA8 and copied into AG8:BA21.
The initial distribution is entered as a percentage by putting 100 into Cell AG7, but it could be entered as a total number of combinations by starting with 15! instead. In any case, Cells AH7:BA7 are zero.
I didn't come up with any easy way to use this second approach to determine the probability that there are at least two candies of each color, but it can be done manually, since there are so few combinations. Cell BA gives the probability (relative to Cell AG7) that there are at least six different candies in a row somewhere in the roll.
If you have a more elegant approach, I'd love to hear about it. Send all responses to
Please send responses to
Thanks,
Steve