Google Code Jam Archive — Code Jam to I/O for Women 2018 problems

Overview

Our 2018 I/O contest was our biggest yet! Over 800 contestants solved at least one dataset and made it onto the scoreboard.

Burger Optimization looked complex but turned out to have a relatively straightforward solution. CEO Search could be solved in various ways, but all of those required a key "greedy" insight and a sensible way to apply it. Centrists could be solved by thinking through a small number of possibilities. Our D problem is often much tougher than the others, and Tricky Trios was no exception; it required probabilistic thinking and careful casework. Only 10 contestants solved its Small dataset, which was indeed tricky despite presenting only two new test cases, and only 6 solved the Large!

limli was the first to reach a perfect score, around the halfway mark of the contest. mkirsche and crashethereum followed up not too long after that to take second and third place.

Most contestants who made it into the top 150 did so by solving all of A, B, and C with a penalty time less than 2 hours and 30 minutes. Congratulations to our prize winners, and thanks to everyone for participating!


Cast

Problem A (Burger Optimization): Written by Pablo Heiber. Prepared by Jonathan Irvin Gunawan.

Problem B (CEO Search): Written by Ian Tullis. Prepared by Josef Ziegler.

Problem C (Centrists): Written by Pablo Heiber. Prepared by Jonathan Irvin Gunawan.

Problem D (Tricky Trios): Written by Pablo Heiber. Prepared by Ian Tullis.

Solutions and other problem preparation and review by Liang Bai, Lauren Mancino Gallagher, Md Mahbubul Hasan, Emily Miller, Trung Thanh Nguyen, Ray Robinson, and Mary Streetzel.

Analysis authors:

  • Burger Optimization: Jonathan Irvin Gunawan
  • CEO Search, Centrists, Tricky Trios: Ian Tullis

A. Burger Optimization

Problem

In 2017, Google learned about a serious Android bug: in the burger emoji, the cheese was directly on top of the lower bun, rather than on the patty itself. Really, who makes a burger that way? Sundar, our CEO, vowed to drop everything and address this issue immediately.

To prevent this sort of situation in the future, the Code Jam team has constructed a mathematical model for understanding burgers. A burger consists of a stack of K ingredients between two buns, with each ingredient appearing exactly once. We are interested in the distance-to-bun value of each ingredient. The distance-to-bun value of an ingredient is the minimum number of other ingredients between that ingredient and a bun:

  • If K is even, then the distance-to-bun values for the ingredients (starting with the ingredient at the top of the stack) are: 0, 1, ..., K/2 - 1, K/2 - 1, ..., 1, 0.
  • If K is odd, then they are: 0, 1, ..., ((K - 1) / 2) - 1, (K - 1) / 2, ((K - 1) / 2) - 1, ..., 1, 0.

After carrying out a lot of focus group testing (and eating a lot of burgers), we have determined that the i-th of each of our K ingredients has an optimal distance-to-bun value of Di. We think our burger emoji users will be happiest if we choose an ordering for our ingredients that minimizes the error value, which we define as the sum of the squared differences between each ingredient's optimal and actual distance-to-bun values.

For example, if we have five ingredients A, B, C, D, and E with optimal distance-to-bun values of 0, 2, 1, 1, and 2, and we place them between the buns in that order, then the error is (0-0)² + (2-1)² + (1-2)² + (1-1)² + (2-0)² = 6. If we instead place them in the order C, E, B, D, A, then the error is (1-0)² + (2-1)² + (2-2)² + (1-1)² + (0-0)² = 2, which turns out to be the minimum possible error for these ingredients.

Given the list of optimal distance-to-bun values for our ingredients, can you help us determine the smallest possible error?

Input

The first line of the input gives the number of test cases, T; T test cases follow. Each begins with one line containing an integer K: the number of ingredients in our burger. Then, there is one more line containing K integers Di, the optimal distance-to-bun values of our ingredients.

Output

For each test case, output one line containing Case #x: y, where x is the test case number (starting from 1) and y is the smallest possible error, as described above.

Limits

1 ≤ T ≤ 100.
Time limit: 20 seconds per test set.
Memory limit: 1GB.
0 ≤ Di ≤ floor((K-1)/2), for all i. (Each optimal distance-to-bun value is within the range of attainable distance-to-bun values.)

Small dataset (Test set 1 - Visible)

1 ≤ K ≤ 8.

Large dataset (Test set 2 - Hidden)

1 ≤ K ≤ 100.

Sample


Input

3
5
0 2 1 1 2
1
0
6
2 2 2 2 2 2

Output

Case #1: 2
Case #2: 0
Case #3: 10

Sample Case #1 is the one illustrated in the problem statement.

In Sample Case #2, there is only one ingredient in the burger; that is not much of a burger, but our model has to be able to handle this base case! There is no confusion over how to place the one ingredient, and the error is 0.

In Sample Case #3, there are six ingredients, but all of them have an optimal distance-to-bun value of 2. Any way of placing them is equivalent, and the error is 2² + 1² + 0² + 0² + 1² + 2² = 10.

B. CEO Search

Problem

The CEO of Code Jam has just retired to spend more time with The Art of Computer Programming, so we need your help finding a new one!

Every Code Jam employee has an experience level that is a nonnegative integer. When we hire our new CEO, we must organize the Code Jam team as follows:

  • Every employee other than the CEO must have a single direct manager who is another employee with an experience level greater than that employee's own experience level. (The CEO cannot have a direct manager.)
  • An employee (including the CEO) with experience level E can be a direct manager for between 0 and E other employees, inclusive. Note that if employee A is the direct manager of employee B, and B is the direct manager of C, A is not also a direct manager of C.
  • Because of office politics, the new CEO cannot be one of the existing employees, no other new employees can be added, and no existing employees can be removed.

Of course, hiring a more experienced CEO is more expensive! What is the minimum possible experience level for the new CEO such that the Code Jam team can be organized according to the rules above?

Input

The first line of the input gives the number of test cases, T. T test cases follow. Each begins with one line with an integer L: the number of different experience levels present among the existing employees. Then, L lines follow; the i-th of these contains two integers Ni and Ei, and indicates that there are Ni existing employees that have the experience level Ei.

Output

For each test case, output one line containing Case #x: y, where x is the test case number (starting from 1) and y is the minimum possible experience level for the new CEO, as described above.

Limits

1 ≤ T ≤ 100.
Time limit: 20 seconds per test set.
Memory limit: 1GB.
For all i < j, Ei < Ej.

Small dataset (Test set 1 - Visible)

1 ≤ L ≤ 10.
1 ≤ Ni ≤ 10.
The sum of all Ni ≤ 10.
0 ≤ Ei ≤ 10.

Large dataset (Test set 2 - Hidden)

1 ≤ L ≤ 1000.
1 ≤ Ni ≤ 10¹².
0 ≤ Ei ≤ 1000.

Sample


Input

2
3
2 0
2 2
1 3
1
5 0

Output

Case #1: 4
Case #2: 5

In Sample Case #1, there are five existing employees: one with an experience level of 3, two with an experience level of 2, and two with an experience level of 0. We can hire a new CEO with an experience level of 4; then, for example, we can have the new CEO directly manage the level 3 and one level 0, and have the level 3 directly manage the two level 2s and the other level 0. (Other valid arrangements are possible.) Moreover, we know that the new CEO must be at least level 4, or else there would be nobody who could directly manage the existing level 3. So 4 is both an upper and lower bound, and must be the correct answer.

In Sample Case #2, all five of the existing employees have an experience level of 0 and cannot directly manage other employees. The new CEO must personally directly manage all of them, which requires an experience level of at least 5.

C. Centrists

Problem

Exactly three candidates are running for office. Each candidate's name is a single string of length L that is made up only of uppercase letters of the English alphabet, and no two candidates have the same name.

This area has a law to ensure that candidates will not consistently be at an advantage or disadvantage based on their names. Before each election, an ordering of the English alphabet will be chosen from among all possible orderings, and that ordering will be used to alphabetize the names before listing them on the ballot. (To determine which of two different names comes earlier in the list on the ballot, start by comparing the first letter of each name. If they are different, then the name with the first letter that comes earlier in the chosen ordering is the one that comes earlier in the list on the ballot. If they are the same, move on to comparing the second letter of each name, and so on.)

Each of the three candidates wants to adopt a "middle-of-the-road" image, and so they think that being in the middle of the list of three names on the ballot will be advantageous. For each candidate, determine whether there is at least one ordering of the English alphabet for which their name would be the second of the three names on the ballot. Note that we are considering each candidate independently.

For example, suppose that our candidates are named BCB, CAB, and CBC. Since the letters D through Z are not used in these names, we will only consider the relative ordering of the letters A, B, and C. If the ordering A, B, C is chosen, for example, then the candidates will be listed in the order BCB, CAB, CBC; this demonstrates that it is possible for CAB to be in the middle. If the ordering A, C, B is chosen, then the candidates will be listed in the order CAB, CBC, BCB; this demonstrates that it is possible for CBC to be in the middle. However, it turns out that none of the other four possible orderings puts BCB in the middle, so BCB can never be in the middle.

Input

The first line of the input gives the number of test cases, T; T test cases follow. Each begins with one line containing one integer L: the length of each candidate's name. Then, there is one line with three different strings Ni of uppercase letters of the English alphabet; the i-th of these is the name of the i-th candidate.

Output

For each test case, output one line containing Case #x: y1 y2 y3, where x is the test case number (starting from 1) and each yi is YES if it is possible for the i-th candidate to be in the middle, as described in the problem statement, and NO otherwise.

Limits

1 ≤ T ≤ 100.
Time limit: 20 seconds per test set.
Memory limit: 1GB.
1 ≤ L ≤ 100.
Ni is of length L, for all i.
Ni ≠ Nj, for all i ≠ j.

Small dataset (Test set 1 - Visible)

Ni contains only uppercase letters from the set {A, B, C}, for all i.

Large dataset (Test set 2 - Hidden)

Ni contains only uppercase letters from the English alphabet, for all i.

Sample


Input

3
3
BCB CAB CBC
2
CC CA AA
6
MEDIAN MEDIAL MEDIAS

Output

Case #1: NO YES YES
Case #2: NO YES NO
Case #3: YES YES YES

Note that the last sample case would not appear in the Small dataset.

Sample Case #1 is the one described in the problem statement.

In Sample Case #2, no matter which of the two possible relative orderings of A and C is chosen, CA will be in the middle.

In Sample Case #3, any of the names can end up in the middle, depending on which of L, N, and S comes second in the relative order of those three letters in the ordering.

D. Tricky Trios

Problem

The game of Tricky Trios is played using a deck of 3N cards consisting of three identical cards labeled 1, three identical cards labeled 2, and so on, up to three identical cards labeled N. The cards are shuffled (such that all possible card orderings have an equal probability of appearing), and then dealt out onto a table, face down, so that all the numbers are hidden.

Each round of the game proceeds as follows:

  • Choose one of the cards and flip it over to reveal its number.
  • Choose a second card and flip it over to reveal its number. If that number is not equal to the revealed number on the first card, the round is over and you may not flip a third card. Otherwise:
    • Choose a third card and flip it over to reveal its number. If that number is not equal to the revealed number on the second card, the round is over. Otherwise, you have found a trio, and you can remove all three cards from the game; then the round is over.
  • Once the round is over, if there are no more cards remaining, you have won the game. Otherwise, before beginning the next round, you must flip all revealed cards back over to hide their numbers, but you have an amazing memory and you can remember where they are for the rest of the game.

Note that you may choose to flip a card even if you already know its number. Also, even if you know the locations of all of the cards in a trio, you must actually flip all three cards in the trio on the same round in order to remove it.

You would like to win as quickly as possible, so you will use a strategy that minimizes the expected number of rounds needed to end the game. What is that expected number of rounds?

Input

The first line of the input gives the number of test cases, T; T test cases follow. Each consists of one line with an integer N, as described above.

Output

For each test case, output one line containing Case #x: y, where x is the test case number (starting from 1) and y is a rational: the minimal expected number of rounds needed to end the game, as described above. y will be considered correct if it is within an absolute or relative error of 10⁻⁶ of the correct answer. See the FAQ for an explanation of what that means, and what formats of real numbers we accept.

Limits

Time limit: 20 seconds per test set.
Memory limit: 1GB.

Small dataset (Test set 1 - Visible)

1 ≤ T ≤ 5.
1 ≤ N ≤ 5.

Large dataset (Test set 2 - Hidden)

1 ≤ T ≤ 100.
1 ≤ N ≤ 100.

Sample


Input

3
1
2
5

Output

Case #1: 1.000000
Case #2: 3.400000
Case #3: 9.842024
  

In Sample Case #1, all three cards have the same number, so flipping them over in any order will end the game in one round.

In Sample Case #2:

  • If the first two cards we flip over are different, our round ends and we cannot flip over a third card. Then, we can use the next round to flip over two more of the unknown cards.
    • If they match, then we already know where the remaining third card is, and then we can use one more round to flip over the remaining trio, taking three rounds total. The chances of this scenario are 3/5 × 1/3 = 1/5.
    • Otherwise, our second round ends, but once we have flipped over another unknown card on the third round, we know how to finish building both trios, taking four rounds total. The chances of this scenario are 3/5 × 2/3 = 2/5.
  • If the first two cards we flip over are the same, we leave the details as an exercise for the solver.

The answer is 3 × 1/5 + 4 × 2/5 + ... = 17/5.

Analysis — A. Burger Optimization

Burger Optimization: Analysis

Small dataset

This dataset can be solved using complete search. Since K ≤ 8, there are only at most 8! = 40320 different orderings of ingredients.

Therefore, we can try all K! different orderings of ingredients and find which ordering of ingredients has the minimum sum of squared differences between each ingredient's optimal and actual distance-to-bun values. Since we can calculate the sum of the squared differences between each ingredient's optimal and actual distance-to-bun values for one particular ordering of ingredients in O(K) time, this solution runs in O(K! × K) time.
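As a sketch, the complete search can be written in a few lines of Python (the function name is just for illustration):

```python
from itertools import permutations

def min_error_brute(d):
    """Try all K! orderings of the optimal values d and return the
    smallest sum of squared differences against the actual values."""
    k = len(d)
    # Actual distance-to-bun value of each position, top to bottom.
    actual = [min(i, k - 1 - i) for i in range(k)]
    return min(sum((a - b) ** 2 for a, b in zip(perm, actual))
               for perm in permutations(d))
```

For example, `min_error_brute([0, 2, 1, 1, 2])` returns 2, matching the first sample case.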

Large dataset

We need an observation to solve this dataset. Suppose we have two ingredients A and B to be placed at two positions X and Y. Let ingredients A and B have optimal distance-to-bun values of DA and DB, respectively, and let positions X and Y have actual distance-to-bun values of DX and DY, respectively. Suppose DA ≤ DB and DX ≤ DY. We claim that it is no less optimal to put ingredient A at X and to put ingredient B at Y.

Why? Let us assume that DB = DA + K1 and DY = DX + K2 for some non-negative integers K1 and K2. Therefore, the cost of putting ingredient A at X and putting ingredient B at Y is (DA - DX)² + (DB - DY)², and the cost of putting ingredient A at Y and putting ingredient B at X is (DA - DY)² + (DB - DX)².

We can derive that

(DA - DX)² + (DB - DY)²
= [DA² - 2DADX + DX²] + [DB² - 2DBDY + DY²]
= [DA² - 2DADX + DX²] + [DB² - 2(DA + K1)(DX + K2) + DY²]
= [DA² - 2DADX + DX²] + [DB² - 2DADX - 2DAK2 - 2K1DX - 2K1K2 + DY²]
= [DA² - 2DADX - 2DAK2 + DY²] + [DB² - 2DADX - 2K1DX + DX²] - 2K1K2
= [DA² - 2DA(DX + K2) + DY²] + [DB² - 2(DA + K1)DX + DX²] - 2K1K2
= [DA² - 2DADY + DY²] + [DB² - 2DBDX + DX²] - 2K1K2
= (DA - DY)² + (DB - DX)² - 2K1K2

Therefore, (DA - DX)² + (DB - DY)² ≤ (DA - DY)² + (DB - DX)², since 2K1K2 is non-negative. This proves that it is no less optimal to put ingredient A at X and ingredient B at Y. In other words, we should put an ingredient with a smaller optimal distance-to-bun value at a position with a smaller actual distance-to-bun value.

Based on the above observation, the solution to this problem becomes simple. Let D' be the list of actual distance-to-bun values of the positions, sorted in non-decreasing order (e.g. if K is 7, then D' = {0, 0, 1, 1, 2, 2, 3}), and let D be the list of optimal distance-to-bun values of the ingredients, sorted in non-decreasing order. By the observation above, it is optimal to place the i-th ingredient in D at a position whose actual distance-to-bun value is D'[i]. Therefore, the answer is simply Σ (D'[i] - D[i])².

We can sort the ingredients in O(K log (K)) time and calculate the sum of the squared differences between each ingredient's optimal and actual distance-to-bun values in O(K) time. Therefore, this solution runs in O(K log (K)) time.

(For a similar approach, you might enjoy reading about the rearrangement inequality).
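A minimal Python sketch of this sorting solution (the function name is illustrative):

```python
def min_error(d):
    """Pair sorted optimal values with sorted actual values and
    return the resulting minimum error."""
    k = len(d)
    # Sorted actual distance-to-bun values of the k positions.
    actual = sorted(min(i, k - 1 - i) for i in range(k))
    return sum((x - y) ** 2 for x, y in zip(sorted(d), actual))
```

This reproduces the sample answers, e.g. `min_error([2, 2, 2, 2, 2, 2])` returns 10.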

Analysis — B. CEO Search

CEO Search: Analysis

Small dataset

In the Small dataset, there are at most 10 existing employees, so one viable approach is as follows. We will start by supposing that the new CEO has the minimum experience level possible: one level higher than the highest level among all existing employees. Then, we can check all possible assignments of employees to potential managers of higher levels than those employees. (Throughout this analysis, for convenience, we will use "manager" to mean "direct manager"). If any of the assignments is valid — that is, nobody is managing too many employees — then we have solved the problem. Otherwise, we try again with a CEO of one level higher, and so on. This process is bounded, since the answer cannot possibly be larger than 11; the existing employees are all level 10 or lower, so a level 11 CEO could manage all of them personally.

At first, it may seem that the search space is too large. When we have one employee of each level between 1 and 10, for example, and we try to add a new CEO of level 11, there are 10! possible sets of assignments. (The level 1 employee has 10 potential managers, the level 2 employee has 9 potential managers, and so on.) But the cases like this one with large search spaces are also easy to solve; note that any assignment is valid in our example, since it is impossible to give a manager too many employees. On the other hand, when we have a case like five level 0s, four level 1s, and one level 2, and we try to add a new CEO of level 3, it turns out that no set of assignments is valid... but there are so few possible sets of assignments that it is easy to check them all and move on to trying a new CEO of level 4.

Large dataset

Suppose that we have chosen a new CEO of some level (which may or may not turn out to work), and we have started trying to assign managers in some way; not everyone necessarily has a manager yet. Suppose that an employee A of level LA is managing an employee B of level LB, and there is some other employee C of level LC (not managed by A) such that LA > LC > LB. Then we can safely have A manage C instead of B. If employee C previously had a manager D, that manager is of a high enough level to manage employee B, so we can have D manage B and we are no worse off than before. If employee C did not yet have a manager, then we are better off than before. The same logic holds if employee B stands for an empty space rather than an employee.

This observation suggests a top-down approach to the problem. We can work from the highest-level employees to the lowest-level ones, and apply a simple rule: each employee should manage as many other employees as they can, and those employees should be of the highest levels available among all remaining unmanaged employees.

We can simplify this approach by thinking in terms of levels instead of employees. We will start at the top level, which has the new CEO of level X; the new CEO creates X management "slots". Then, let us consider the next highest level, which has NL employees of level EL. If NL > X, then our plan does not work and we need a CEO with a higher level. Otherwise, we have used up NL of our X slots, but the employees from this level have added NL × EL more slots. We can continue in this way, abandoning a plan whenever we do not have enough slots to manage all of the employees at a given level.

For the Large dataset, the CEO level may need to be as large as the largest possible Ni value, so we may not have time to check every candidate level one by one. We can speed up the process by binary searching on the level of our new CEO, with an inclusive lower bound equal to EL + 1 (one more than the highest existing level), and an inclusive upper bound equal to the lower bound or the maximum Ni allowed by the problem, whichever is greater. (If there are 10¹² level 0 employees, for example, the answer could be as large as 10¹².) If we try a possible value and find that it works, we can make that value our new upper bound; if a value does not work, we can set the lower bound to that value plus 1. Each check takes O(L) time, and the binary search adds another factor of log(max(Ni)), so the running time is O(L × log(max(Ni))).

This is fast enough to solve the Large dataset, but we can do even better by cutting out the binary search entirely! We can proceed through the levels as before, but without including the new CEO or trying to guess their level. Whenever we encounter employees with no possible managers, we add them to a count and then move on. (This is definitely the case for every employee at the highest level EL, and it may be true of employees at other levels as well.) The final answer is then either that count (since all of those unmanaged employees must be personally managed by the new CEO), or EL + 1, whichever is greater.
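The final linear-time idea can be sketched as follows (the function name and input representation are our own):

```python
def min_ceo_level(levels):
    """levels: list of (count, experience) pairs for the existing
    employees. Returns the minimum experience level for the new CEO."""
    levels = sorted(levels, key=lambda t: t[1], reverse=True)
    slots = 0      # open management slots from higher-level employees
    unmanaged = 0  # employees with no possible manager: the CEO's job
    for n, e in levels:
        placed = min(n, slots)   # give these employees managers
        slots -= placed
        unmanaged += n - placed  # nobody above can manage these
        slots += n * e           # this level's own management slots
    highest = levels[0][1]
    # The CEO must outrank everyone and manage all leftover employees.
    return max(unmanaged, highest + 1)
```

On the samples, `min_ceo_level([(2, 0), (2, 2), (1, 3)])` returns 4 and `min_ceo_level([(5, 0)])` returns 5.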

It is also possible to work from the bottom up instead, using the same idea, or to apply Hall's theorem to get a closed-form expression of the solution; we leave those as exercises. In any case, our running time is now O(L).

Analysis — C. Centrists

Centrists: Analysis

Small dataset

In the Small dataset, the candidates' names can only contain the letters A, B, and C, so we can effectively disregard the rest of the alphabet. That is, since there are no Ds anywhere in any of the names, for example, we do not care whether A comes before or after D in an ordering of the alphabet. Only the relative order of A, B, and C matters, and there are only 3! = 6 such orders, so we can check each one of them and see how the three names get sorted in each case. Then, we can report YES for any name that appeared in the middle under at least one ordering, and NO for any name that did not.
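A Python sketch of this brute force over the relative orderings of the used letters (names are illustrative):

```python
from itertools import permutations

def centrists_small(names):
    """For each of three names, check whether some ordering of the
    letters actually used puts it in the middle of the sorted list."""
    letters = sorted(set("".join(names)))
    can_be_middle = [False, False, False]
    for order in permutations(letters):
        rank = {c: i for i, c in enumerate(order)}
        # Sort the names under this alphabet ordering; take the middle.
        middle = sorted(names, key=lambda s: [rank[c] for c in s])[1]
        can_be_middle[names.index(middle)] = True
    return ["YES" if ok else "NO" for ok in can_be_middle]
```

For example, `centrists_small(["BCB", "CAB", "CBC"])` returns `["NO", "YES", "YES"]`.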

Large dataset

The English alphabet has 26! possible orderings; this is close to 4 × 10²⁶, which is far too many for us to check individually. We will take another approach. Let us examine the first letter of each name. If all three of those letters are the same, then they cannot influence the order in which the names get sorted, so we can move on to looking at the second letters, and so on. Eventually, we will find an index at which the letters are not all the same; there are two ways in which this can happen.

The first of these — all three are different — is the easiest to deal with. Let Li denote the letter at that index in the i-th name. Then, if we choose an alphabet ordering in which Li falls somewhere between the other two letters, the i-th name will be in the middle. So the answer is YES for all three names.

Otherwise, two of the letters at that index are the same, and one is different. Call the shared letter Ls and the different letter Ld, and call the name with Ld at that index Nd. We can already see that Nd can never end up in the middle. If Ls < Ld (where < means "comes before in the alphabet ordering"), then the other two names will come before Nd. (Other differences that come later in the names do not influence this.) Otherwise, Nd will come before the other two names.

What about the order of the other two names? Let us scan through them, starting just after our aforementioned index, looking for the earliest index at which the letters of these names are different. Call the differing letters at that index L1 and L2, and call the names containing those letters (respectively) at that index N1 and N2. Let us consider the various possible identities of L1 and L2.

Suppose that L1 = Ls and L2 = Ld. Then, if Ls < Ld, our name order will be N1, N2, Nd. Otherwise, it will be Nd, N2, N1. So N1 cannot be in the middle, but N2 can. A similar situation holds for L1 = Ld and L2 = Ls.

Otherwise, regardless of the identities of L1 and L2, either of N1 and N2 could be in the middle:

  • Suppose that L1 (without loss of generality) = Ld. We can choose an ordering in which Ld comes first to get the order Nd, N1, N2, or we can choose an ordering in which Ls < Ld < L2 to get the order N1, N2, Nd.
  • Suppose that L1 (without loss of generality) = Ls. We can choose an ordering in which Ls comes first to get the order N1, N2, Nd, or we can choose an ordering in which Ld < Ls < L2 to get the order Nd, N1, N2.
  • Otherwise, we can choose an ordering that begins with Ld (which puts Nd first), and has L1 and L2 in the order in which we want N1 and N2 to appear.
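Putting the casework together, a possible implementation looks like this (function and variable names are our own):

```python
def centrists(names):
    """Apply the casework: find the first index where the three
    letters are not all equal, then classify the names."""
    n = len(names[0])
    for i in range(n):
        col = [name[i] for name in names]
        if len(set(col)) == 1:
            continue  # identical letters cannot affect the order
        if len(set(col)) == 3:
            return ["YES"] * 3  # any letter can be placed in the middle
        # Two names share Ls; the odd one out (with Ld) is never middle.
        ld = next(c for c in col if col.count(c) == 1)
        ls = next(c for c in col if c != ld)
        d = col.index(ld)
        a, b = [k for k in range(3) if k != d]
        answer = ["NO"] * 3
        # First later index where the two remaining names differ.
        j = next(j for j in range(i + 1, n) if names[a][j] != names[b][j])
        la, lb = names[a][j], names[b][j]
        if {la, lb} == {ls, ld}:
            # Only the name holding Ld at index j can be in the middle.
            answer[a if la == ld else b] = "YES"
        else:
            answer[a] = answer[b] = "YES"
        return answer
    # Unreachable: the names are guaranteed to be distinct.
```

This reproduces all three sample cases, including `centrists(["MEDIAN", "MEDIAL", "MEDIAS"])` returning `["YES", "YES", "YES"]`.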

Analysis — D. Tricky Trios

Tricky Trios: Analysis

Playing the game

Before we can approach either of the datasets, we need to figure out an optimal strategy for the game. One important insight is that we should not begin a round by flipping over a known card, unless we know where all three cards in a trio are (in which case we can flip them all to remove that trio).

To see why this is, suppose that we begin by flipping the one or two known cards with a certain number. Then, when we move on to flipping unknown cards, we either find the one or two remaining cards we need, or we fail and learn the identity of only one more card. But if we instead begin by flipping an unknown card, we have exactly the same chance of encountering the cards needed to remove the aforementioned trio, in which case we can end the round by flipping the known ones. We also have some other advantages: we have a chance of removing any of the other trios, and if we fail to do that, we will learn the identities of two more cards. So, beginning with an unknown card is strictly better than beginning with a known card; it leaves more options open and gets us more valuable information.

What about the parts of a round other than the beginning? There is one situation in which we have some flexibility: if the unknown card revealed on our first flip matches one known card, there is no harm in flipping over that known card as our second action before flipping over an unknown card as our third action. But it is no worse to flip over an unknown card as our second action instead, since we need to find the last member of that trio either way.

So, the following simple rule is optimal: only flip known cards when they are the last ones needed to remove a trio.

Small dataset

This is an unusual dataset for Code Jam: there are only five possible test cases, and three of them are given away as samples, so we can hardcode those in and only focus on solving the N = 3 and N = 4 cases.

Notice that whenever we flip an unknown card, we have no basis for choosing any particular unknown card. Without loss of generality, we can choose to reveal them from left to right. With this in mind, one tempting approach is to repeatedly simulate playing the game, each time choosing a random left-to-right order in which to deal with the cards, and then take the average number of rounds. The tight requirement of an absolute or relative error of 10⁻⁶ makes this challenging, though; even millions of simulations might not get us close enough! We can let a simulation run for quite some time and get answers before starting the 4-minute submission window, but even optimized code may not be fast enough, especially in an interpreted language.

Instead of simulating random left-to-right orders, can we enumerate them all, play through each one, and then take the average number of rounds? For our N = 4 case, there are a total of (12 choose 3) × (9 choose 3) × (6 choose 3) × (3 choose 3) = 369600 orders, which is a tractably small amount. Although it is not necessary, we can cut this down by another factor of N! by noting that the numbers on the cards are essentially interchangeable; an order like 111222333444 will yield the same result as an order like 222333444111. This drops the number of cases to 15400 for N = 4, and 1401400 for N = 5. Beyond that point, though, there are just too many orders to consider individually.

Large dataset

At the start of each round of the game, for each trio that we have not already removed, we know the locations of three, two, one, or zero of the cards, depending on what we have flipped over on earlier rounds. Let us classify trios accordingly as Three-Known, Two-Known, One-Known, or Zero-Known. Then we can consider the number of trios of each type — (K3, K2, K1, K0) — as a state, and think of a round as a transition from one state to another state. We begin the game at (0, 0, 0, N), and we want to get to (0, 0, 0, 0) as efficiently as possible, but our path will depend on how lucky we get when flipping over unknown cards.

Let us use f(state) to denote the question "how many rounds will it take, on average, to finish the game when starting from this state?" To calculate f(state), we must consider all the states in which the round can end, and the probabilities of reaching each of those destination states. Then f(state) is a sum of the f()s for those destination states, weighted by the probability of reaching each of them.

We can simplify this model at the outset by noting that if we play according to our strategy above, K3 will never be larger than 1, and if it is 1, we should spend the next round to immediately remove the Three-Known trio. That is, f(K3, K2, K1, K0) = 1 + f(K3 - 1, K2, K1, K0). So we can remove the K3 term and work with states of the form (K2, K1, K0), and add in an extra round as needed to reflect removing a Three-Known trio.

For example, suppose that we start a round in a state (1, 1, 1); this means that there are six unknown cards remaining, with one from a Two-Known trio, two from a One-Known trio, and three from a Zero-Known trio. (Remember that there are other known cards around as well — for instance, the other two cards from the Two-Known trio — but we are focusing on the unknown cards.) Let us consider one possible scenario: our first draw turns out to be from the One-Known trio, and then our second draw turns out to be from the Two-Known trio, which ends the round. This converts our Two-Known trio into a Three-Known trio and our One-Known trio into a Two-Known trio. We should immediately spend the next round to remove the Three-Known trio.

The probability of the above happening is 2/6 (the odds of drawing one of the unknown cards from the One-Known trio) × 1/5 (the conditional odds of drawing the unknown card from the Two-Known trio). So, when we write the expression for f(1, 1, 1), it should include the term 2/6 × 1/5 × (2 + f(1, 0, 1)). The 2 represents the current round plus the extra round spent removing the Three-Known trio. Note that the state (1, 0, 1) reflects that we have converted our One-Known trio into a Two-Known trio and removed the original Two-Known trio (after it briefly became Three-Known).

The above illustrates the calculation of just one term, and our solution needs to be able to handle any state; the more general version of the term above would be ((2 × K1) / (K2 + 2 × K1 + 3 × K0)) × (K2 / (K2 + 2 × K1 - 1 + 3 × K0)) × (2 + f(K2, K1 - 1, K0)). Setting up terms like this is the heart of the problem, and there are various possible pitfalls, e.g.:

  • If our first flip on a round reveals a card from a Zero-Known trio, our second flip might reveal a card from any of the following: a Two-Known trio, a One-Known trio, the same Zero-Known trio, or a different Zero-Known trio. It is important to distinguish between the latter two possibilities.
  • We must avoid considering terms that represent impossible situations; we cannot draw a card from a different Zero-Known trio if there is only one Zero-Known trio in our starting state.

Once we have all this nailed down, we need a way to avoid computing f() for the same state more than once. We can store each result and then look it up again later if we need it for a future calculation, instead of doing redundant work. This is memoization, a form of dynamic programming. This optimization allows the entire Large dataset to be solved in seconds.
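A sketch of this memoized computation, using exact fractions; the transition terms follow the casework described above, and all names are illustrative:

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def f(k2, k1, k0):
    """Expected rounds to finish from a state with k2 Two-Known,
    k1 One-Known, and k0 Zero-Known trios remaining."""
    if k2 == k1 == k0 == 0:
        return Fraction(0)
    u = k2 + 2 * k1 + 3 * k0  # unknown cards on the table
    exp = Fraction(0)
    # First flip hits the unknown card of a Two-Known trio:
    # flip its two known cards and remove it this round.
    if k2:
        exp += Fraction(k2, u) * (1 + f(k2 - 1, k1, k0))
    # First flip hits a One-Known trio T; second flip is unknown.
    if k1:
        p1 = Fraction(2 * k1, u)
        # T's other unknown card -> finish T with its known card.
        exp += p1 * Fraction(1, u - 1) * (1 + f(k2, k1 - 1, k0))
        # Last unknown of a Two-Known trio -> it becomes Three-Known;
        # spend an extra round removing it. T becomes Two-Known.
        if k2:
            exp += p1 * Fraction(k2, u - 1) * (2 + f(k2, k1 - 1, k0))
        # A different One-Known trio -> both become Two-Known.
        if k1 > 1:
            exp += p1 * Fraction(2 * (k1 - 1), u - 1) * (1 + f(k2 + 2, k1 - 2, k0))
        # A Zero-Known trio -> T Two-Known, it One-Known.
        if k0:
            exp += p1 * Fraction(3 * k0, u - 1) * (1 + f(k2 + 1, k1, k0 - 1))
    # First flip hits a Zero-Known trio Z.
    if k0:
        p1 = Fraction(3 * k0, u)
        # Second flip matches Z; try to complete Z on the third flip.
        p2 = p1 * Fraction(2, u - 1)
        exp += p2 * Fraction(1, u - 2) * (1 + f(k2, k1, k0 - 1))
        if k2:  # third flip completes a Two-Known trio instead
            exp += p2 * Fraction(k2, u - 2) * (2 + f(k2, k1, k0 - 1))
        if k1:  # third flip hits a One-Known trio
            exp += p2 * Fraction(2 * k1, u - 2) * (1 + f(k2 + 2, k1 - 1, k0 - 1))
        if k0 > 1:  # third flip hits a different Zero-Known trio
            exp += p2 * Fraction(3 * (k0 - 1), u - 2) * (1 + f(k2 + 1, k1 + 1, k0 - 2))
        # Second flip misses: Z becomes One-Known.
        if k2:
            exp += p1 * Fraction(k2, u - 1) * (2 + f(k2 - 1, k1 + 1, k0 - 1))
        if k1:
            exp += p1 * Fraction(2 * k1, u - 1) * (1 + f(k2 + 1, k1, k0 - 1))
        if k0 > 1:
            exp += p1 * Fraction(3 * (k0 - 1), u - 1) * (1 + f(k2, k1 + 2, k0 - 2))
    return exp

def expected_rounds(n):
    """Minimal expected number of rounds for a deck of 3n cards."""
    return float(f(0, 0, n))
```

On the samples, `expected_rounds(2)` evaluates to exactly 17/5 = 3.4.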

Statistics — A. Burger Optimization

Test set 1: 786 correct solutions (97.0% solve rate)

First
limli 3:59
mkirsche 5:21
msh97 6:22
mnf 7:41
Ghada.. 7:52

Test set 2: 676 correct solutions (83.5% solve rate)

First
limli 3:59
mkirsche 5:21
msh97 6:22
mnf 7:41
Ghada.. 7:52

Statistics — B. CEO Search

Test set 1: 448 correct solutions (55.3% solve rate)

First
alexwice 13:47
msh97 14:56
mkirsche 16:59
limli 17:50
Taube 18:27

Test set 2: 306 correct solutions (37.8% solve rate)

First
alexwice 13:47
msh97 14:56
mkirsche 16:59
limli 17:50
Taube 18:27

Statistics — C. Centrists

Test set 1: 302 correct solutions (37.3% solve rate)

First
alexwice 21:25
mkirsche 32:31
msh97 33:43
toolate 34:16
limli 35:46

Test set 2: 214 correct solutions (26.4% solve rate)

First
alexwice 21:25
mkirsche 32:31
msh97 33:43
toolate 34:16
limli 35:46

Statistics — D. Tricky Trios

Test set 1: 10 correct solutions (1.2% solve rate)

First
limli 76:07
mkirsche 90:20
alexwice 97:14
ishu05 125:26
Limetka 126:06

Test set 2: 6 correct solutions (0.7% solve rate)

First
limli 76:07
mkirsche 90:20
alexwice 97:14
zhanaratanir 143:05
stacy992 144:12