Recursion as a performance boost!

A common misconception some (many?) have about recursive solutions is that they are not fast. There are many reasons to use recursion, both in code and in data, mainly that solutions based on recursion are easy to implement, understand, and maintain. For an introduction see Recursion (http://www.tushar-mehta.com/publish_train/book_vba/07_recursion.htm).

What is interesting is that recursive solutions can also provide an improvement over a non-recursive solution. This happened recently when I finally decided to tackle Project Euler problem 14 (http://projecteuler.net/index.php?section=problems&id=14). In the problem, one had to find the number under 1,000,000 that had the longest Collatz chain.

I had postponed addressing the problem because until earlier today I could think of only a “brute force” approach1. But, then, I realized something. If one started from 1 and worked their way up to 999,999, one could build a database of answers for all the numbers encountered in earlier calculations. Then, whenever one encountered a number already in the database, a simple look up of the associated would short circuit the need for any further computations.

For the recursive implementation as well as the intuitive approach see Project Euler - Problem 14 (http://www.tushar-mehta.com/misc_tutorials/project_euler/euler014.html)

1 A approach that should work but that I consider as somewhat unethical is to assume that the solution is a high number. So, work backwards from 999,999 to 900,000. Check this solution with the Project Euler system. If it rejects your answer, try the range 899,999 to 800,000. If that isn’t OK, try the next range down. Sooner or later you will find the correct answer. Just keep in mind that the PE website has a 20 minute wait before one can resubmit an answer.

5 Comments

  1. Doug Jenkins says:

    Was the link to problem 14 supposed to be recursive? :)

    I used a simple brute force solution that solved in about 11 seconds on my computer, compared with about 70 seconds for your recursive solution, and about twice as long for the non-recursive routine. One reason was that I only checked odd starting numbers, but checking every number only doubles the time to about 22 seconds.

    I modified your non-recursive code as shown below, which reduced the solution time to about 7 seconds! It seems that the CDec() or mod functions were taking a lot of time. I still haven’t worked out why your code is so much faster than mine (which was essentially the same).

    [VB]
    Sub Euler014s()
    Dim I As Long
    Dim ProcTime As Single, MaxRslt As Integer, MaxStartNbr As Long
    ProcTime = Timer
    MaxRslt = 1: MaxStartNbr = 1
    For I = 3 To 1000000 - 1 Step 2
    Dim Rslt As Double, ThisSeqLen As Long
    ‘ Rslt = CDec(I): ThisSeqLen = 1
    Rslt = I: ThisSeqLen = 1
    Do While Rslt 1
    ‘If Right(Rslt, 1) Mod 2 = 0 Then Rslt = Rslt / 2 _

    If Rslt / 2 - Int(Rslt / 2) = 0 Then Rslt = Rslt / 2 _
    Else: Rslt = 3 * Rslt + 1
    ThisSeqLen = ThisSeqLen + 1
    Loop
    If MaxRslt

  2. Doug Jenkins says:

    The dreaded greater/less than strikes again.

    The rest of the code is as shown at Tushar’s link.

    I also meant to comment that making similar changes to the recursive routine does not seem to help significantly. I’m not sure why not.

  3. fzz says:

    Recursion isn’t the same thing as accumulating previous results.

    Tushar Mehta’s lookup approach is slow due to the function calls to CheckOneNbr as well as error trapping accesses to the SeqCount collection. While it takes a reference in the VBA Project, WSH Dictionary objects fly compared to plain vanilla VBA collection objects.

    My results agree with Doug Jenkins’s: brute force takes about 11 seconds.

    The point about cheating by interating from 999999 to 2 is foolish. Unless one has a mathematical proof that the answer would be towards the high end of the range, one must try all values in the range either largest to smallest or smallest to largest.

    Finally, Right(Rslt, 1) Mod 2 to avoid overflow is losing tactic when the goal is performance. For that matter,

    If Rslt / 2 - Int(Rslt / 2) = 0 Then Rslt = Rslt / 2 _
    Else: Rslt = 3 * Rslt + 1

    (why the : after Else?) performs the Rslt / 2 test more often than needed. Better something like

    Rslt = Rslt / 2#
    If Int(Rslt) Rslt Then Rslt = 6# * Rslt + 1#

    1 divide rather than 2 in every loop, no additional divide when Rslt starts off even, and the same number of multiply and add operations when Rslt starts off odd.

  4. Jonah Feld says:

    I think the move to make is to solve the problem in reverse. The recursive solution still does tons of calculations to come up with the answer.

    This is a big hint:
    Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

    So, start at 1 and work backwards trying to find the longest chain.

    The functions:
    F1: n → n/2 (n is even)
    F2: n → 3n + 1 (n is odd)

    The functions in reverse are:
    R1: n → n*2
    R2: n → (n-1) / 3

    The trick is to know when to choose R2 instead of R1. You don’t have to make this choice very often. I’ll work on this and post again later.

  5. Jonah Feld says:

    After playing around with this, my plan doesn’t work. Even with the correct decision path in hand, backing up from 1, I can only get the first eighteen numbers correct with simple decision criteria. My criteria for choosing R2 was: n mod 2 = 0 and R2(n) mod 3 = 2.

    Even if there’s a reliable way to choose when to use R2 instead of R1, there’s not a great way of knowing at what point n will never drop back below 1,000,000.

Leave a Reply