Project Euler: Problem 1 Constant Time

2010/01/06

Ok, so at this point we’ve solved problem 1 (YAY!) but we’ve done it in a naive way. We used what’s called a brute force approach to solve. And we came up with a linear time algorithm. Meaning as we make our target number bigger (10,000,000 for example instead of 1,000) the time taken to find the solution also increases linearly.

What if we actually think for a second about what it means to find the sum of these multiples. Is there a relationship between these numbers we can exploit to arrive at a more efficient algorithm?

Let’s step back a bit and remember our good friend Euler. He showed us that we can find the sum of a sequence of integers by simple induction. The sum of consecutive numbers up to n can be found with n*(n+1)/2. Let’s try this out in our lovely IDLE shell…

>>> 1+2+3+4+5
15
>>> 5 * (5 + 1) // 2
15
>>> sum(range(6)) # remember range stops at n-1
15
>>> # WHOA! It works! How about 10?
>>> 1+2+3+4+5+6+7+8+9+10
55
>>> sum(range(11)) # remember range stops at n-1
55
>>> 10 * (10 + 1) // 2
55

Is there some way we can use this with factors? Well how many factors of 3 are there in 1000? Well 1000 doesn’t divide by 3, so let’s go lower to 999. Hey look! 999 / 3 = 333. So there are 333 factors of 3 in between 1 and 1000. Could we sum a sequence of those numbers somehow? You bet.

>>> 333 * (333 + 1) // 2
55611
>>> #But wait a second, isn't that just the sum of integers from 1 to 333? Yep. But thanks to the rules of multiplication it works to just multiply that sum by the factor we were originally looking for.
>>> (333 * (333 + 1) // 2) * 3
166833
>>>

So now we have a reliable way to find the sum of a range, or the sum of a range of factors. But I don’t want to type all of that again for a different factor. Can’t I use my awesome variable friends? Of course, and we can go one better than that, by introducing the next epic element of programming. Our friends…

functions

A function is a block of code that has been abstracted enough to work on various variables. Almost no programming is done without the use of functions (also known as “subroutines” or “methods” in other contexts) In python you define a function with the “def” keyword. This instructs python to define the function for later use, but not run the code yet. The “return” keyword is the output of the function. You can save the value of a function’s return by assigning a variable to it. Here’s a simple function that squares numbers that are sent to it (entering functions in IDLE is a little weird, just double check your indentation and press enter on the blank line)

>>> def square(num):
    return num * num # this line is indented since it's part of the function's body
    # notice how there is no output after defining the function, it's defined now and we can call it as often as we like
>>> square(5)
25
>>> square(19)
361
>>> # let's save those return values
>>> a = square(4)
>>> b = square(5)
>>> a + b
41

So there you have it. We know that the operation of squaring a number doesn’t depend on what the number is, it’s just whatever number we pass into the function times itself. So this type of thing lends itself well to being a function.

Let’s get back to our summing procedure. Based on our formula, we should be able to make a function out of this without issue. Let’s open up a new editor from IDLE (File -> New Window) and start writing our new script.

""" A constant-time algorithm for problem 1"""
def sum_to_n(n):
return n * (n + 1) // 2

def sum_of_factors(factor, n):
    total_factors = n // factor
    return sum_to_n(total_factors) * factor

limit = 999 # we're looking for values *under* 1000
threes = sum_of_factors(3, limit)
fives = sum_of_factors(5, limit)
print(threes + fives)

So when we save and run this program we get 266333. That’s not quite right? What did we do wrong? Well there are some numbers that have factors of both 3 and 5. So they must be getting counted twice. Once when we compute threes and once when we compute fives. How can we get rid of those? Well what’s the lowest common multiple? 15 you say? Excellent, let’s fix it up.

""" A constant-time algorithm for problem 1"""
def sum_to_n(n):
    return n * (n + 1) // 2

def sum_of_factors(factor, n):
    total_factors = n // factor
    return sum_to_n(total_factors) * factor

limit = 999
threes = sum_of_factors(3, limit)
fives = sum_of_factors(5, limit)
"""some of the numbers have been counted twice
so let's remove them from the total"""
fifteens = sum_of_factors(15, limit)
print(threes + fives - fifteens)

When we save and run this program we get 233168 which is the output of our original program. WE WIN!

Now let’s figure out why this is actually a better solution algorithmically. First we’ll time the two solutions (We’ll even include doc’s that is posted on page 2 for fun).

Here’s a big file I wrote that contains all three solutions along with some timing information…

def first_solution(limit):
    print(sum([i for i in range(1, limit) if i % 5 == 0 or i % 3 == 0]))

def docs_solution(limit):
    print(sum(list(range(0, limit, 3)) + list(range(0, limit, 5)) + list(range(0, -limit, -15))))

def constant_time_solution(limit):
    limit -= 1
    def sum_to_n(n):
        return n * (n + 1) // 2
    def sum_of_factors(factor, n):
        total_factors = n // factor
        return sum_to_n(total_factors) * factor
    threes = sum_of_factors(3, limit)
    fives = sum_of_factors(5, limit)

    """some of the numbers have been counted twice
    so let's remove them from the total"""
    fifteens = sum_of_factors(15, limit)
    print(threes + fives - fifteens)
import time
limits = [1000, 1000000, 100000000]
for limit in limits:
    print("Finding multiples of 3 or 5 up to", limit)
    start = time.time()
    first_solution(limit)
    print("1st solution: %0.5f" % (time.time() - start), "seconds")
    start = time.time()
    docs_solution(limit)
    print("2nd solution: %0.5f" % (time.time() - start), "seconds")
    start = time.time()
    constant_time_solution(limit)
    print("3rd solution: %0.5f" % (time.time() - start), "seconds")

which outputs the following…

Finding multiples of 3 or 5 up to 1000
233168
1st solution: 0.00400 seconds
233168
2nd solution: 0.00500 seconds
233168
3rd solution: 0.00400 seconds
Finding multiples of 3 or 5 up to 1000000
233333166668
1st solution: 0.33500 seconds
233333166668
2nd solution: 0.12400 seconds
233333166668
3rd solution: 0.00400 seconds
Finding multiples of 3 or 5 up to 100000000
2333333316666668
1st solution: 29.62200 seconds
2333333316666668
2nd solution: 9.80800 seconds
2333333316666668
3rd solution: 0.00400 seconds

Hmm so doc’s solution is clearly faster than our first, just by virtue of doing less expensive operations (not doing modulus or any division). However you can see that as we increase the limit, both the 1st and 2nd algorithms take linearly more time to solve. Only the third solution is constant time. Notice it takes 4 milliseconds to solve for any given number. If you keep increasing the limit over 1 billion, the first two will actually run a 2GB machine out of memory before arriving at a solution. So not only does the third solution not use much memory at all, but it always solves in the same amount of time!

This sort of knowledge is what separates your John Carmacks from your average Java hacker out of Northwestern Southern Florida State University. I hope we’ve all learned something here today.

Advertisement

2 Responses to “Project Euler: Problem 1 Constant Time”


  1. [...] way to solve this? Thanks for following along. I hope you learned something and had a good time. In the next section I’ll cover how brute force approaches like this aren’t always the be… Possibly related posts: (automatically generated)Project EulerBored? Want some programming [...]


  2. Hello Sir,

    I was going through your blog and found the problem interesting.

    An alternative solution to your constant time algorithm could be the use of sum of an arithmetic progression series. I think its pretty similar to the solution you suggested.

    Finding the sum is like series of two numbers:

    A (multiples of 3)= 3,6,9,12,15………An; where n is the number of terms in the series

    B (multiples of 5)= 5,10,15,20,25,30…….Bn

    where Sum of a finite series= (n/2)*(2*first_term+(n-1)diff.)

    So, considering an example;

    Case 1: Series A(multiples of 3 between 1-10): 3,6,9

    No. of terms=3

    Sum = (3/2)* [2*3+(3-1)*3]
    = 18

    Case 2: Series B: 5, 10, 15, 20

    No. of terms = 4

    Sum = (4/2) * [2*5+(4-1)*5]
    = 2 * 25
    = 50

    This way, the modification looks like;

    limit=999
    threes=sum_ap(limit,3);
    fives=sum_ap(limit,5);

    where
    int sum_ap(limit, n)
    { fact=limit/n;
    Sum=(fact/2)* [2*n + (fact-1)*n];
    return Sum;
    }

    And, the final output would be
    threes+fives-fifteen

    Thanks,

    Shirish


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.