I like coding. I hate shipping software.
2010/02/23
- translating
- documenting
- testing
- DLL hell
- install scripts
- customers
- marketing
- Talking with other developers
- white boards
- new tech
- compilers
- crazy features
- jokes in comments
- feelings of accomplishment.
- satisfying diff emails
When to give up your Open Source Project?
2010/01/21
After recently reading Jeff Atwood’s plea to maintainers to not be douche bags. I have come to the realization that I am a horrible Software Parent. I have been writing open source code since 2000, and some has even enjoyed moderate success. But where I have been dropping the ball for 10 years is in responsibly letting them go.
Exhibit A:
My first project, IPM was just a simple multi-user todo list I needed for the first company I started. I put it up on sourceforge in 2000 hoping that someone way better than me at PHP would help me fix it. But instead I got my first dose of internet popularity. People actually liked it, and they wanted new features. NO! THEY NEEDED NEW FEATURES! It was all very exciting, and the project benefited from contributions to the code and documentation. It was great for everyone. Fast forward about a year. IPM is now being used by some groups at CERN, which makes my nerd gland swell, however I could care less about adding another task display feature. I had just stopped caring about it.
I would get emails from concerned users asking when I was going to fix a bug, or release a new version, and I found myself just ignoring the emails. Not maliciously, but just out of apathy and a deluded thought that I would get to them eventually. It turns out that wasn’t very healthy, and the project sits there on several websites now, neglected and sad. Untouched for almost a decade.
Exhibit B:
My next project to have any success was a PERL monstrosity meant to help out with the administration of a game server than I ran. The server was for Return to Castle Wolfenstein. It was a great server with about 30 people hanging out at any given time on any night. Of course being the internet, some people were fond of showing up and causing trouble. The native commands for banning someone or kicking them off the server were clumsy and unreliable. So I wrote a simple interface to the server that would read what I typed in chat and execute the commands directly via a socket to the server. It worked very well and eventually ended up on several hundred other servers. Eventually I created a statistics affiliate network out of that code base, and as a player community we shared our abilities with a thompson submachine gun with every player that had been on the network. This pre-dated stuff like Valve’s stats by about 7 years.
But then… I stopped caring about that too. I ported the bot to Call of Duty, but quickly lost interest there as well. And it still sits, waiting like a good dog for me to come home and love it again. But it’s just not going to happen.
Exhibit R:
I skipped a whole bunch of other projects over the last decade that you don’t care about. This brings us to my current project: Dwarf Therapist. I hacked on it for most of the end of last year, and it has come a long way. But the telltale signs of not caring are showing their ugly heads.
But this time, I’m still engaged enough to care about its fate. There are other contributors who would willingly take it over. I must decide how and when to do it. And what role I want in the future. I know I can’t bear to see another of my projects just slowly decay. How do other open-source authors handle this stage of a project?
Project Euler: Problem 1 Constant Time
2010/01/06
Ok, so at this point we’ve solved problem 1 (YAY!) but we’ve done it in a naive way. We used what’s called a brute force approach to solve. And we came up with a linear time algorithm. Meaning as we make our target number bigger (10,000,000 for example instead of 1,000) the time taken to find the solution also increases linearly.
What if we actually think for a second about what it means to find the sum of these multiples. Is there a relationship between these numbers we can exploit to arrive at a more efficient algorithm?
Let’s step back a bit and remember our good friend Euler. He showed us that we can find the sum of a sequence of integers by simple induction. The sum of consecutive numbers up to n can be found with n*(n+1)/2. Let’s try this out in our lovely IDLE shell…
>>> 1+2+3+4+5 15 >>> 5 * (5 + 1) // 2 15 >>> sum(range(6)) # remember range stops at n-1 15 >>> # WHOA! It works! How about 10? >>> 1+2+3+4+5+6+7+8+9+10 55 >>> sum(range(11)) # remember range stops at n-1 55 >>> 10 * (10 + 1) // 2 55
Is there some way we can use this with factors? Well how many factors of 3 are there in 1000? Well 1000 doesn’t divide by 3, so let’s go lower to 999. Hey look! 999 / 3 = 333. So there are 333 factors of 3 in between 1 and 1000. Could we sum a sequence of those numbers somehow? You bet.
>>> 333 * (333 + 1) // 2 55611 >>> #But wait a second, isn't that just the sum of integers from 1 to 333? Yep. But thanks to the rules of multiplication it works to just multiply that sum by the factor we were originally looking for. >>> (333 * (333 + 1) // 2) * 3 166833 >>>
So now we have a reliable way to find the sum of a range, or the sum of a range of factors. But I don’t want to type all of that again for a different factor. Can’t I use my awesome variable friends? Of course, and we can go one better than that, by introducing the next epic element of programming. Our friends…
functions
A function is a block of code that has been abstracted enough to work on various variables. Almost no programming is done without the use of functions (also known as “subroutines” or “methods” in other contexts) In python you define a function with the “def” keyword. This instructs python to define the function for later use, but not run the code yet. The “return” keyword is the output of the function. You can save the value of a function’s return by assigning a variable to it. Here’s a simple function that squares numbers that are sent to it (entering functions in IDLE is a little weird, just double check your indentation and press enter on the blank line)
>>> def square(num):
return num * num # this line is indented since it's part of the function's body
# notice how there is no output after defining the function, it's defined now and we can call it as often as we like
>>> square(5)
25
>>> square(19)
361
>>> # let's save those return values
>>> a = square(4)
>>> b = square(5)
>>> a + b
41
So there you have it. We know that the operation of squaring a number doesn’t depend on what the number is, it’s just whatever number we pass into the function times itself. So this type of thing lends itself well to being a function.
Let’s get back to our summing procedure. Based on our formula, we should be able to make a function out of this without issue. Let’s open up a new editor from IDLE (File -> New Window) and start writing our new script.
""" A constant-time algorithm for problem 1"""
def sum_to_n(n):
return n * (n + 1) // 2
def sum_of_factors(factor, n):
total_factors = n // factor
return sum_to_n(total_factors) * factor
limit = 999 # we're looking for values *under* 1000
threes = sum_of_factors(3, limit)
fives = sum_of_factors(5, limit)
print(threes + fives)
So when we save and run this program we get 266333. That’s not quite right? What did we do wrong? Well there are some numbers that have factors of both 3 and 5. So they must be getting counted twice. Once when we compute threes and once when we compute fives. How can we get rid of those? Well what’s the lowest common multiple? 15 you say? Excellent, let’s fix it up.
""" A constant-time algorithm for problem 1"""
def sum_to_n(n):
return n * (n + 1) // 2
def sum_of_factors(factor, n):
total_factors = n // factor
return sum_to_n(total_factors) * factor
limit = 999
threes = sum_of_factors(3, limit)
fives = sum_of_factors(5, limit)
"""some of the numbers have been counted twice
so let's remove them from the total"""
fifteens = sum_of_factors(15, limit)
print(threes + fives - fifteens)
When we save and run this program we get 233168 which is the output of our original program. WE WIN!
Now let’s figure out why this is actually a better solution algorithmically. First we’ll time the two solutions (We’ll even include doc’s that is posted on page 2 for fun).
Here’s a big file I wrote that contains all three solutions along with some timing information…
def first_solution(limit):
print(sum([i for i in range(1, limit) if i % 5 == 0 or i % 3 == 0]))
def docs_solution(limit):
print(sum(list(range(0, limit, 3)) + list(range(0, limit, 5)) + list(range(0, -limit, -15))))
def constant_time_solution(limit):
limit -= 1
def sum_to_n(n):
return n * (n + 1) // 2
def sum_of_factors(factor, n):
total_factors = n // factor
return sum_to_n(total_factors) * factor
threes = sum_of_factors(3, limit)
fives = sum_of_factors(5, limit)
"""some of the numbers have been counted twice
so let's remove them from the total"""
fifteens = sum_of_factors(15, limit)
print(threes + fives - fifteens)
import time
limits = [1000, 1000000, 100000000]
for limit in limits:
print("Finding multiples of 3 or 5 up to", limit)
start = time.time()
first_solution(limit)
print("1st solution: %0.5f" % (time.time() - start), "seconds")
start = time.time()
docs_solution(limit)
print("2nd solution: %0.5f" % (time.time() - start), "seconds")
start = time.time()
constant_time_solution(limit)
print("3rd solution: %0.5f" % (time.time() - start), "seconds")
which outputs the following…
Finding multiples of 3 or 5 up to 1000 233168 1st solution: 0.00400 seconds 233168 2nd solution: 0.00500 seconds 233168 3rd solution: 0.00400 seconds Finding multiples of 3 or 5 up to 1000000 233333166668 1st solution: 0.33500 seconds 233333166668 2nd solution: 0.12400 seconds 233333166668 3rd solution: 0.00400 seconds Finding multiples of 3 or 5 up to 100000000 2333333316666668 1st solution: 29.62200 seconds 2333333316666668 2nd solution: 9.80800 seconds 2333333316666668 3rd solution: 0.00400 seconds
Hmm so doc’s solution is clearly faster than our first, just by virtue of doing less expensive operations (not doing modulus or any division). However you can see that as we increase the limit, both the 1st and 2nd algorithms take linearly more time to solve. Only the third solution is constant time. Notice it takes 4 milliseconds to solve for any given number. If you keep increasing the limit over 1 billion, the first two will actually run a 2GB machine out of memory before arriving at a solution. So not only does the third solution not use much memory at all, but it always solves in the same amount of time!
This sort of knowledge is what separates your John Carmacks from your average Java hacker out of Northwestern Southern Florida State University. I hope we’ve all learned something here today.
Project Euler: Problem 1
2010/01/04
Alright folks. Let’s call this programming for non-programmers. I’m always looking for more folks to get into what I think is one of the coolest things we can do with our brains. A program is essentially a proof. Not quite a pure-mathematical proof, but still a proof.
Who cares? Programming teaches your brain to assess problems in a very structured way. A lot of the time this leads to a deeper understanding of your problem, simply because you had to think about all the facets of it. I could talk for days about how awesome I think programming is, but let’s jump in with a few math problems. Most of what I want to cover can be found at Project Euler, but I will guide us through the first few problems there with Python.
What you need:
- Python 3.1 (http://www.python.org/ftp/python/3.1.1/python-3.1.1.msi for Windows32)
- about 45 minutes
- your brain
Problem 1: If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000.
Let’s break down what this problem is all about. It concerns factoring numbers and summing numbers. If you know how to do addition and division we’re ready to rock. Since the problem gives us a nice example for a small number (10) we should write our code to solve that. If the code works for a small set, it should work for a larger set.
In your start menu (You installed python right?) you should have an entry under Python 3.1 called “IDLE (Python GUI)”. Click that and you should see an ugly white window show up with a “>>>” prompt. This is the python interactive shell. It’s effectively a running python interpreter, and you can just type code into it. So to start out let’s see how we do arithmetic in Python. We know we’re going to need addition and division, so let’s get comfy with those.
Enter the following into IDLE:
>>> 5 + 9
Then press enter.
The shell should return the result (14), and hand you another blank prompt. Well that’s all well and good, but my calculator can do that too. Let’s cover division next…
>>> 13 // 5 2 >>> 13 / 5 2.6 >>> 13 % 5 3
WTF? What does that mean? The first operation (13 // 5) told python to execute “floor division” meaning only an integer can result, and it will be rounded down (hence “floor”). The second operation (13 / 5) told python to execute “true division” which returns the best decimal approximation your machine can give for a ratio (2.6 in this case). The third operation (13 % 5) told python to execute “modulus” or what 4th graders call the remainder. Meaning what number is left over after 5 evenly divides into 13 as many times as it can.
So now we know how to add numbers, and 3 different ways to do division, but we’re missing one key element. Variables. What if we want to keep the result of a calculation for later use? We can store it in RAM and use it as much as we like. Guys with PhDs have been arguing about variable design for about 70 years, and strong vs weak types, static vs dynamic types etc… are beyond the scope of this post. So let’s just pretend a variable is like a bucket. A bucket that can hold just about anything. You can put a “2″ in a bucket, or the whole text of War and Peace. Or an email address, etc… Let’s see an example:
>>> a = 2 >>> b = 3 >>> a + b 5
Here we’ve told python to store the integer 2 as a, the integer 3 as b and then add the the values of a and b. Guess what we get? Exactly what you’d expect (5). There is a lot more discussion we could have concerning variables, RAM usage, and types, but we’ll skip that for now. We’re almost at a spot where we can start tackling the original problem.
Up til now we’ve been using the interactive part of IDLE. However we haven’t written a program yet. If you close IDLE now, you’ll lose everything we’ve typed so far. So let’s start writing a program we can keep.
- In IDLE, select File -> New Window
- You will see a new blank window show up without the “>>>” prompt.
- Enter the following into the window
"""If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000."""
The triple double quotes around this message are important, you’ll notice IDLE has colored this block of text green. The triple quotes tell python that this is a multi-line comment, and not to execute it as code. This is how programmers make notes for themselves inside a program. It’s generally good to write comments like this in your code so you don’t have to keep alt-tabbing to a browser to remember what you were doing, and for later reference when you find this code 3 years from now and forgot what it does.
- In the menu at the top select File -> Save
- Save the file as “problem1.py” on your desktop.
Congratulations, you just wrote your first python program. Well, not really, since it doesn’t do anything yet. So now let’s set up some variables so we can solve the first part of the problem. We know we want to find multiples of 3 and 5 for all numbers from 1 to 10. So let’s store 10 as a variable.
"""If we list all the natural numbers below 10 that are multiples
of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000."""
limit = 10
print("Starting to find multiples up to", limit)
Save the file. Then in the menu select Run->Run Module or press F5. You will see the results of your program in the other IDLE window. We stored our upper limit (10) in the variable called “limit”, we then told python to print the string “Starting to find multiples up to” and then print the value of limit after that. When you run this program you should see “Starting to find multiples up to 10″ in the output window.
How does one find multiples of a number? Well if you divide number a by number b and there is no remainder, that means b is a multiple of a. 12 has the multiples 1, 2, 3, 4, 6, and 12. Because you can divide 12 by any of those numbers without a remainder. The problem states we only care about numbers that 3 or 5 divide evenly into. So we don’t need to find every multiple of a given number, we just need to test for 3 and 5.
Which leads us to the most powerful statement in any programming language. The one statement that makes computers interesting at all. The most powerful statement in the world that some may argue makes our brains quantum machines…
if
An “if statement” is just that. It’s a branching of logic where you can alter the flow of a program based on variables or expressions. Our brains handle thousands of these a day, and programming is basically identifying “ifs” and reacting accordingly. You have to pee, you walk to the bathroom. There are two doors. If you’re a male, you go into the men’s room, otherwise you go into the women’s room. Let’s start out by finding out if 10 has 3 or 5 as multiples. Your program should now look like this…
"""If we list all the natural numbers below 10 that are multiples
of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000."""
limit = 10
print("Starting to find multiples up to", limit)
if limit % 5 == 0:
print("5 is a multiple of", limit)
if limit % 3 == 0:
print("3 is a multiple of", limit)
Save and run the program and you should get
Starting to find multiples up to 10 5 is a multiple of 10
as the output. But what about the third print statement? Why did the output not show “3 is a multiple of 10″? Well because it’s not. Our if statements were basically saying if you can modulus 5 into 10 and the result is 0, then 5 is a multiple of 10, so print that out. We then did the same thing for 3. But 3 goes into 10, 3 times with a remainder of 1. And 1 does not equal 0, so our print statement didn’t execute. This is the power of if. If you notice in our if statements when were wanted to test if something equaled another something, we used “==” instead of “=”. This is very important as “=” is the “assignment operator”. You use “=” when you want to set something as equal to another thing. When you want to test if two things are equal you use “==”.
So this is great and all, but the problem states we should be finding all the numbers from 1 up to 1000. Not just 10. Man it would suck to copy paste those tests 9 more times, let alone 999 more times. So how do we tell the machine to test those numbers for us without wearing out our keyboards? The second most powerful idea in programming:
loops
Anytime you need a machine to do something over and over, you use a loop. There are many different kind of loops, including the famous infinite loop (which is why our keyboards ever had a “Break” key). For our purposes here we want to loop over the integers from 1 to 10 and test if each one has 3 or 5 as multiples.
"""If we list all the natural numbers below 10 that are multiples
of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000."""
limit = 10
print("Starting to find multiples up to", limit)
for i in range(1, limit):
print("Checking", i)
if i % 5 == 0:
print("5 is a multiple of", i)
if i % 3 == 0:
print("3 is a multiple of", i)
Hey check it out, we made a loop. The line “for i in range(1, limit):” tells the program to start a loop over the range of integers from 1, to limit (10). Now notice the code under the “for” line is indented. This is called the “loop body”. The “for” line states the boundaries of the loop, or how the loop should iterate. The indented code under it says what to do on each step. On each step of the loop, python will set the variable “i” for us, to whatever is next in the loop. So in plain english we’re saying:
- Do the following 9 times starting with the number 9
- Set i to the current loop step (1)
- Print out which number the loop is currently on (i)
- Check if i modulus 5 is zero, if it is, then print that out
- Check if i modulus 3 is zero, if it is, then print that out
- Set i to the next value and go back to step 2, unless “i” was already set to the last value of the loop
When you run this program you should see the following…
Starting to find multiples up to 10 Checking 1 Checking 2 Checking 3 3 is a multiple of 3 Checking 4 Checking 5 5 is a multiple of 5 Checking 6 3 is a multiple of 6 Checking 7 Checking 8 Checking 9 3 is a multiple of 9
Hey it works! We just checked all the numbers from 1 to 10 without having to copy/paste a bunch of code. It’s a little wordy. It was nice seeing that it checked all the numbers, but we don’t need to see that every time, so let’s comment that part out. You can turn any single line into a comment by putting a “#” at the start of the line. This will make our output little more concise. Also, the problem stated that we wanted the sum of these numbers, not just the numbers. So let’s change it a bit to be more in line with the original problem.
"""If we list all the natural numbers below 10 that are multiples
of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000."""
limit = 10
print("Starting to find multiples up to", limit)
total = 0
for i in range(1, limit):
#print("Checking", i)
if i % 5 == 0 or i % 3 == 0:
total += i
print(i, "has either 3 or 5 as a multiple. Total is now:", total)
print("total: ", total)
There’s actually a lot that changed here.
Line: 9 Before the loop, we set a new variable called “total” we will use this to track the total of numbers that we find. It starts out at 0 because before the loop starts we haven’t found any numbers.
Line: 11 On the first line inside the loop you can see we’ve commented out the line that prints “Checking…” to keep our output from being too wordy. You comment out code in this manner instead of deleting it in case you want to put it back at some point. All you’d have to do to get it back is remove the “#”.
Line: 12 We combined the two if statements into a single compound statement. The if statement now reads “if the current loop value modulus 5 is zero *OR* the current loop value modulus 3 is zero“. If a number i satisfies *either one* of these statements then we execute the code below the if statement. If i matches, then we add it to the total with “+=”. This means add it to the value that total already has. If we had used just “=” it wouldn’t accumulate a sum, total would just hold the last value of i that the loop ran. On each step we also print out the number that matched, and what the running total is on that step.
Line: 16 We un-indent from the for loop, meaning Line 16 will run after the for loop is finished. As the last line of the program, it will print out the total that we’ve accumulated. When you run this program it should now look like this…
Starting to find multiples up to 10 3 has either 3 or 5 as a multiple. Total is now: 3 5 has either 3 or 5 as a multiple. Total is now: 8 6 has either 3 or 5 as a multiple. Total is now: 14 9 has either 3 or 5 as a multiple. Total is now: 23 total: 23
Wicked! As per our comment at the top of the file, this is the answer we were looking for! Maybe we can solve the whole problem now! As you’ll recall we’re supposed to be doing this for numbers from 1 to 1000. And since we’re such awesome programmers this doesn’t require many changes. Since we used a variable for the limit, we can just add a couple of more zeros to change 10 to 1000! Let’s make a couple of changes:
"""If we list all the natural numbers below 10 that are multiples
of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000."""
limit = 1000
print("Starting to find multiples up to", limit)
total = 0
for i in range(1, limit):
#print("Checking", i)
if i % 5 == 0 or i % 3 == 0:
total += i
#print(i, "has either 3 or 5 as a multiple. Total is now:", total)
print("total: ", total)
Line: 6 We upped the limit to 1000. If you had just used 10 everywhere limit is written you’d have to change it in several places, meaning more work, and more opportunity to miss a spot and introduce a nasty bug into the program. Variables are your friend.
Line: 14 We commented out this line so that we don’t get so much spam in our output. After running this program you should see this in the output window…
Starting to find multiples up to 1000 total: 233168
Which is the correct answer! It may comfort or frustrate you to know you can do this whole program in a single line of python. It’s a bit more obfuscated, but it works nonetheless.
# complete program
print("total: ", sum([i for i in range(1, 1000) if i % 5 == 0 or i % 3 == 0]))
See if you can figure out what this one line is doing.
Bonus Points
How efficient is this algorithm? Did you notice a speed decrease in changing limit from 10 to 1000? What about 1000000 or 1000000000? How long does it take your machine to compute? Is there a better way to solve this? Thanks for following along. I hope you learned something and had a good time. In the next section I’ll cover how brute force approaches like this aren’t always the best idea. And how a little bit of math knowledge puts you leaps and bounds ahead of most other programmers out there.