Wednesday, April 4, 2012

Further adventures in sequence counting

So I grabbed a file with the 1st billion digits of pi to play with that as well. Using standard c types I ran into overflow issues when I got to around the 48 millionth digit. Luckily there is the GNU MP library. So converting things to using the mpz_t type I was able to scan the billion digits and get an answer. The call overhead for using mpz_t and its related functions only added 2 seconds of runtime.

The answer for 1000000000 digits is:

snits@cantor:~/proj/c/pi=>time ./nseqpi pi-1000000000.txt
sequence 1, 2, 3: 1000000000 166644616079693012548675 1323986657

real 0m18.796s
user 0m18.191s
sys 0m0.186s

Tuesday, April 3, 2012

An Improvement on Counting Subsequences in Pi

So I was thinking about this problem some more this evening while watching Dora the Explorer with my daughter, and I realized that I could use the same technique I was using to add up occurrences of the subsequence to track occurrences of the earlier portion of the subsequence and I would require only 3 counters and it would run in linear time. This solution should also work for subsequences of other lengths and remain O(n) bound.

Here is the solution:

1. While ! EOF, scan a digit
2. If it is the first digit in the subsequence, increment a counter.
3. If it is the second digit in the subsequence, increment second counter by current value of counter tracked in 2
4. If it is the third digit in the subsequence, increment occurrences counter by the current value of the counter tracked in 3.
5. goto 1


Here is the output from the old version:

sequence 1, 2, 3: 1000000 167568126814676 10031551523

real 0m41.618s
user 0m41.436s
sys 0m0.004s

Here is the output from the new version:

sequence 1, 2, 3: 1000000 167568126814676 1300014

real 0m0.019s
user 0m0.018s
sys 0m0.001s


As stated in the previous post on this, the first number is the number of digits scanned, the 2nd is the number of occurrences of the subsequence, and the 3rd is the number of logical operations.