The problem was to perform a bunch of line segment casts through objects made out of "triangle soup".
Submissions were scored on speed.
Chapters:
00:00 Intro
00:28 Baseline (10991 ms)
00:59 Slowest Submission (21714 ms)
03:24 Local Copy vs Reference (13572 ms)
07:35 More Local Copies (9949 ms)
09:16 Don't Make Unnecessary Copies of Data
10:12 Optimizing Matrix Inverse for Affine Matrix (9541 ms)
15:44 Profiling (VS Performance Profiler)
18:45 Cache Redundant Calculations (4519 ms)
22:36 Cache More Calculations (3134 ms)
25:00 Broad Phase Test with AABBs
26:35 Multithreading (390 ms)
32:58 Summary & Other Ideas
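
To illustrate the broad phase idea from the 25:00 chapter: a segment can be tested against an axis-aligned bounding box (AABB) first, and the per-triangle tests skipped whenever it misses the box. The following is a minimal C++ sketch of the standard slab test, not the code shown in the video. Vec3, AABB, and segmentHitsAABB are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <utility>

// Hypothetical minimal types for this sketch (not taken from the video).
struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

// Slab test: returns true if the segment p0 -> p1 overlaps the box.
// The segment is parameterized as p0 + t * (p1 - p0) with t in [0, 1].
bool segmentHitsAABB(const Vec3& p0, const Vec3& p1, const AABB& box) {
    const float start[3] = {p0.x, p0.y, p0.z};
    const float end[3]   = {p1.x, p1.y, p1.z};
    const float lo[3]    = {box.min.x, box.min.y, box.min.z};
    const float hi[3]    = {box.max.x, box.max.y, box.max.z};

    float tMin = 0.0f;  // latest entry into any slab so far
    float tMax = 1.0f;  // earliest exit from any slab so far

    for (int axis = 0; axis < 3; ++axis) {
        const float d = end[axis] - start[axis];
        if (d == 0.0f) {
            // Segment is parallel to this slab: it must start inside it.
            if (start[axis] < lo[axis] || start[axis] > hi[axis]) return false;
            continue;
        }
        float t0 = (lo[axis] - start[axis]) / d;
        float t1 = (hi[axis] - start[axis]) / d;
        if (t0 > t1) std::swap(t0, t1);
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMin > tMax) return false;  // slab intervals no longer overlap
    }
    return true;
}
```

In a setup like this, the box for each object would be computed once up front, so most segment/object pairs are rejected after a handful of comparisons instead of a loop over every triangle.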