Sunday, July 18, 2010

Practicing Theory

I'm (hopefully) starting to practice for the upcoming ACM regional (East Central North America), "upcoming" being 3 months away. After reading Outliers and an interview on the Freakonomics blog, I want to change how I practice a little bit. The big change is the concept of "deliberate practice": rather than just doing something, you focus on improving specific parts of it. One problem is that I don't really know what I need to get better at, so I decided to start collecting data and see what comes out of it.

I made a program which lets me record what I am doing while I solve problems. I'll probably refine what data I collect as I get a better idea of what is going on. I started with the Arab and North Africa regional from 2009, which ended up being a really good problem set: interesting problems that were not horrible to code.
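The logging program itself is nothing fancy. A minimal sketch of the idea in Java (this isn't my actual tool, just the general shape: type an event and it gets timestamped into a file):

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

// Minimal practice logger: type an event ("start coding F", "first submit F",
// "found bug", ...) and it gets appended to a log file with a timestamp.
public class PracticeLog {
    public static void main(String[] args) throws IOException {
        Scanner in = new Scanner(System.in);
        try (PrintWriter log = new PrintWriter(new FileWriter("practice.log", true))) {
            while (in.hasNextLine()) {
                String event = in.nextLine().trim();
                if (event.isEmpty()) continue;
                log.printf("%d\t%s%n", System.currentTimeMillis(), event);
                log.flush();
            }
        }
    }
}
```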

A-E were very straightforward. I should have had B way faster, but I couldn't code GCD and had my formula way wrong. F should have been easy, but I missed a trivial case which took a while to find. G was actually a fairly hard problem which I may have only solved due to weak data. H was an interesting, somewhat greedy problem: you can reduce it from each player having tons of moves to only having 3 possible optimal moves, and then brute force the entire game. Problem I was just coding up matrix inversion and multiplication; it should have been way faster, but I had some silly errors. J was a good max flow problem; I ended up having a bug in how I created the flow graph.
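For the record, the GCD I blanked on in B is just Euclid's algorithm, which is only a few lines:

```java
public class Gcd {
    // Euclid's algorithm: gcd(a, b) = gcd(b, a mod b), with gcd(a, 0) = a.
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(gcd(252, 105)); // prints 21
    }
}
```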
I think a key metric for ACM competitions is the time from when you start coding a problem to when you get your first submission. Overall I was decent but not great: H, I, and J should all have been 5-15 minutes faster. What is bad is the extra debug time on F and J, where simple mistakes caused me to double the amount of time spent on the problem. On F I missed a case (where all of the points were in the second disk), and on J I screwed up where each artifact's critical points are.

These kinds of mistakes are hard to track down because I'm much more focused on the hard parts of the code than on the easy parts. My first pass of debugging should be focused on the hard part, but after that I need to look at everything else.

This is still a small dataset so I need to do many more problems before I get a good idea of what I'm doing wrong.

Thursday, November 5, 2009

The Fumbling Bumbling ACM ECNA Regional

The test data for the regional was finally published. One test case for problem B violated the input spec. My team (CMU Tartans) solved that case "correctly" (whatever that means for an invalid input) while Waterloo Black and CMU Dragons solved it "incorrectly". Those runs will have to be rejudged and the scoreboard recalculated.

To preface this I will say that I am pissed: after being a regional winner for 5 days, there is a good chance we won't go to world finals. Right now it looks like Waterloo got 1st, CMU Dragons got 2nd, CMU Tartans got 3rd, and Michigan got 4th.

The first big mistake was not checking the test data. I helped create the problem set for the CMU Invitational. For all of the problems I wrote I made my solution check the input format and verify that all of the test cases were valid. It is a simple sanity check to make sure the problem specification matches the test cases.
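A checker doesn't have to be elaborate. Something along these lines catches most spec violations (the input format and bounds here are made up for illustration, not from any real problem):

```java
import java.util.Scanner;

// Toy input validator: reads a test file from stdin and asserts that every
// value respects the (made-up) bounds from the problem statement.
public class ValidateInput {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int cases = in.nextInt();
        check(1 <= cases && cases <= 100, "case count out of range: " + cases);
        for (int tc = 0; tc < cases; tc++) {
            int n = in.nextInt();
            check(1 <= n && n <= 1000, "n out of range: " + n);
            for (int i = 0; i < n; i++) {
                int x = in.nextInt();
                check(-1000000 <= x && x <= 1000000, "value out of range: " + x);
            }
        }
        check(!in.hasNext(), "trailing garbage at end of file");
        System.out.println("input looks valid");
    }

    static void check(boolean ok, String msg) {
        if (!ok) throw new RuntimeException(msg);
    }
}
```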

The second big mistake was not checking the test data right after the contest. The problem decided world finals spots, so it's probably worth checking. If they had asked either of the two top teams that missed the problem with many resubmits, they would have been told that there was a decent chance of a bug in the test data. If they had looked at the one test case which everyone was failing, they probably would have seen the bug. Instead they just glanced over things and missed it completely. They should also have realized that they screwed up test data the last two years, which means there is a decent probability they did it again.

It is hard to write a completely correct problem set, but it isn't that hard. More importantly the contestants deserve a better contest. We practice countless hours to do well in this one contest, and they can't put in the time to make sure that their problems are correct?

There was another, smaller, error in the problem set. On problem H the number of queries is not limited, so a valid input file can be arbitrarily large. When we first read it we thought that you might need some fancy data structure because there could be a huge number of queries. It turns out that nothing special is needed. The problem is that for every solution judged correct there is a huge input file that makes it time out. That didn't cost us the problem, but it definitely cost us time penalty.


Wednesday, November 4, 2009

ACM ECNA Regional 2009 Part 2

We found out some more information about the 8 minute error from the regional. All sites but Youngstown started on time. The contest was essentially 5 hours and 8 minutes long, and Youngstown started 8 minutes late. The solution itself is not great; I would have preferred running 5 hours at all sites. Youngstown would have an advantage because at any point in their contest they could see the scoreboard from 8 minutes further into everyone else's contest, but it is a very small advantage.

But what I really don't like about the solution is that the judges were okay with it because no top teams were from Youngstown. I think that one of the key parts of a competition like this is that it doesn't matter where you start from; as long as you show up to the competition you are on even ground. Sure, it's nice to get preferential treatment, but fundamentally it isn't fair. I realize that when there is some sort of mistake there isn't always a fair way to fix it, but I don't think top teams should be given anything extra just because of past performance.

At this point we aren't sure whether or not there is a problem with B, because the test data has not been published. For now I'm willing to give the judges the benefit of the doubt, but I'll post more about it once we figure out exactly what happened on the problem.

Tuesday, November 3, 2009

ACM ECNA Regional 2009

The final standings are here. As usual the contest report doesn't mention any technical problems. I thought I would mention the ones I heard of here.

When the contest started, the McMaster site had not yet handed out problems. So after PC^2 was started it was stopped again, then started over 8 minutes later. At least at my site we started and were never told to stop, and we didn't check PC^2; I believe the other sites except for McMaster did the same thing. Essentially McMaster was 8 minutes late to a 5:08 long competition. It's not fair, but it shouldn't change which teams advance to the world finals.

The time left on PC^2 was wrong for about the last hour of the contest: it showed < 2 minutes left the whole time. We were done at that point so we were unaffected, but this is a huge problem for teams still working because it makes it much harder to manage time. The other CMU team was printing every few minutes because the printout showed the current time.

The last issue may or may not be an issue; at this point I'm willing to give the benefit of the doubt to the judges. Most of the teams who got B incorrect (Waterloo, CMU Dragons, Purdue) weren't sure why they got it wrong. There are rumblings that the test data was wrong, or that there was floating point error. I haven't thoroughly checked our 4 wrong answers on that problem, but at this point I believe all 4 of them really were incorrect. After the floating point debacle of last year I think the judges probably checked the problem very thoroughly before finalizing the results.
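I have no idea yet whether it was the data or the solutions, but the usual defense against floating point trouble is to never compare doubles exactly. A minimal sketch of the idiom (the epsilon here is just a typical choice, not something from problem B):

```java
// Tolerant comparisons for floating point heavy problems. 1e-9 is just a
// common epsilon; it has to be tuned to the precision a problem demands.
public class Eps {
    static final double EPS = 1e-9;

    static boolean eq(double a, double b)  { return Math.abs(a - b) < EPS; }
    static boolean lt(double a, double b)  { return a < b - EPS; }
    static boolean leq(double a, double b) { return a < b + EPS; }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2 == 0.3);   // false: exact comparison bites
        System.out.println(eq(0.1 + 0.2, 0.3)); // true
    }
}
```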

When the test data is released for B we'll be able to go back and figure out why teams were getting wrong answers. The problem was hard to code, and hard to debug so it is plausible that teams just had subtle bugs in their code.

The final results were CMU Dragons in 4th, Waterloo in 3rd, University of Michigan in 2nd, and my team, CMU Tartans, in 1st. We had plenty of good luck on our side: we weren't affected by any of the issues, and we potentially got B right despite rounding errors in the test data. Either way, we are headed to Harbin, China in February.

Tuesday, October 20, 2009

The TopCoder Easy

The easy division one problem on TopCoder is a weird type of problem that I'm not sure exactly how to tackle. The problem almost always has a very short (~15 lines of code) solution, but normally you need at least one insight to get to that solution. My problem has always been either finding the insight or being able to prove its correctness.

For the last two SRMs I have solved the easy using a more complicated approach, but one that doesn't require much insight at all. Take OrderedNim (http://www.topcoder.com/stat?c=problem_solution&rm=302474&rd=13904&pm=10190&cr=22657324) for example. The answer is directly related to whether each pile has one stone or more than one, but I couldn't work out the exact solution. Instead I just used DP to solve the problem. Sure, it took me longer to code, but it avoided a leap of faith.
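My contest code isn't shown here, but the shape of the no-insight route is just a memoized game search. A rough sketch, assuming the rules are take-at-least-one-stone-from-the-leftmost-non-empty-pile with the last stone winning, and assuming the piles are small enough to enumerate states directly (with huge piles you would have to collapse the state, which is basically the insight again):

```java
import java.util.HashMap;
import java.util.Map;

// Memoized win/lose search for an OrderedNim-style game: players alternate
// taking at least one stone from the leftmost non-empty pile, and whoever
// takes the last stone wins. State = (current pile, stones left in it).
public class OrderedNimDP {
    static int[] piles;
    static Map<Long, Boolean> memo = new HashMap<>();

    // True if the player to move wins when pile i has `left` stones
    // and piles i+1.. are still untouched.
    static boolean wins(int i, int left) {
        if (i == piles.length - 1) return true; // take everything and win
        long key = ((long) i << 32) | left;
        Boolean cached = memo.get(key);
        if (cached != null) return cached;
        boolean win = false;
        for (int take = 1; take <= left && !win; take++) {
            boolean opponentWins = (take == left)
                    ? wins(i + 1, piles[i + 1]) // pile emptied, move to next pile
                    : wins(i, left - take);     // same pile, fewer stones
            if (!opponentWins) win = true;
        }
        memo.put(key, win);
        return win;
    }

    public static void main(String[] args) {
        piles = new int[]{1, 1, 3};
        System.out.println(wins(0, piles[0]) ? "first player wins" : "second player wins");
    }
}
```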

I'm still not sure if it is optimal to spend more time looking for the elegant solution or to just dive into the longer one. At this point I am confident enough in my coding that I'm not going to lose much time by coding the longer solution. On the other hand, I think I am also running away from the elegant code because I don't have the insight. It is hard to force yourself to think of a better way when you have found a method that works.

I am probably doing something wrong, because my times for the easy solutions have not been great. They haven't been bad, but for the last two SRMs my mediums have carried a majority of the weight. I'm very happy to be able to get fast times and still be very confident that my code is correct: I got the fastest medium overall in SRM 450 and the fastest Java solution in 451 (~10th overall).

Saturday, October 17, 2009

The Programming Contest Version of Chicken

In the last few practices we have had a couple of problems that have induced an interesting phenomenon. The best example is one problem from world finals last year. It was questionable whether the brute force solution would time out or not. My team decided to wait and just look at the scoreboard in 15 minutes. When we checked the scoreboard, a bunch of teams had a wrong answer for the problem and no one had solved it. We had our answer without wasting any of our own time on the problem.

Assuming all parties recognize that the problem may or may not be brute-forceable, it seems optimal to always wait until someone else attempts it. If you attempt it yourself and the brute force works, you get a smaller time penalty; if it doesn't, you waste console time. In general I think that console time is more important.

At world finals I think it is much more crucial not to spend time on problems you won't solve than to worry about time penalty. All but a few teams there will solve no more than 6 of the 10 problems. You can waste a ton of time by attempting one of the harder problems and never solving it. This creates an even better argument for not attempting the questionably brute-forceable problem.

At world finals there will probably always be a couple of teams who are too optimistic and try the brute force solution, but at regionals that may not be the case. My region (East Central North America) had only 5-6 top teams last year. In theory this could lead to a phenomenon where a lot of the top teams don't solve a brute-forceable problem until very late in a contest.

Saturday, May 9, 2009

Learning while Coding

There has been a phenomenon bugging me for a while that I just couldn't put my finger on. A couple of days ago my brother (a non-programmer) asked which programming language to use for a little project of his. I asked what the project was and didn't get more than some vague statements. He said that he wanted to start coding and figure out how to do it along the way. I have been guilty of this many times (including the AI project I am currently having trouble with). I think it is one of the most inefficient ways of going about programming.

For some reason, when you code you aren't able to focus on the big picture of the program. I have a very hard time anticipating problems I am going to have 20 lines in the future until I actually get there. There are plenty of TopCoder problems where, halfway through, I realize that my algorithm doesn't work. I have had the same thing happen at my internship over the summer. In a good case I figure out the problem before I am finished coding; occasionally it takes until I start debugging to figure things out.

I think the problem comes from the fact that it is hard to spend the time to really understand your algorithm. It's very easy for me to just dive in and start coding. For some reason I am scared of going through the work to really figure out exactly how things are going to work. I think that being able to plan things out in detail (to the point where what you code doesn't change much between the first draft and the final version) is a crucial skill both for competitive programming and for programming in the real world.

I think you can really see the difference when it comes to debug time. Look at the last problem we solved at the ACM world finals: Alan and I talked it over thoroughly and we really knew exactly what we wanted to do. The code got written in about 15 minutes and the debug time was just a couple of minutes. Contrast that with my performance on the medium problem in TopCoder Open round 4: I recognized it as DP (without grasping its full complexity), jumped in, and started coding. After an hour I was still debugging a flawed algorithm.

Another example is coding malloc. My friend just jumped in and started coding it. He was well on his way to finishing the code while I was still figuring out how I wanted to set up my #defines. I debugged mine for a couple of hours; it took him days to finally get his code correct. Planning is especially important for things like malloc (basically a ton of bit operations, where problems show up way after they occur). The more complicated the code, the more you need to plan.

When I started TopCoder I always wanted to be really fast. Speed was a huge focus for me, as it is for many other competitors. I think that focusing on speed early on doesn't help the learning process at all. The real focus should be on correctness. Plan things out on paper in detail and get things correct (in practice and in the contest). After doing that for a while it is easy to recognize the problems you can solve quickly without planning and the ones you need to spend a half hour coding.