Euler 2

The second Project Euler problem.

Given the Fibonacci sequence, calculate all terms below 4 million. What is the sum of the even terms?

We are going to use the library “numbers” a lot!
It includes a function, that allows us to generate the Fibonacci-sequence. But how many terms should we get?
By inspection we find that the 34th term is:

library(numbers)
fibonacci(34)
## [1] 5702887

Therefore we only need the 33 first terms:

(f <- fibonacci(33, sequence = TRUE))
##  [1]       1       1       2       3       5       8      13      21
##  [9]      34      55      89     144     233     377     610     987
## [17]    1597    2584    4181    6765   10946   17711   28657   46368
## [25]   75025  121393  196418  317811  514229  832040 1346269 2178309
## [33] 3524578

We only have to sum the even terms:

sum(f[!f%%2])
## [1] 4613732

Lets split that up a bit. %% i modulus. It returns what is “left” in the division.
4%%2 returns 0, because there is no remainder – because 4 is an equal number.
When we do that on all the terms in the sequence, we get:

f%%2
##  [1] 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0

0 for the equal terms, 1 for the unequal terms.
0 is equivalent to FALSE, and 1 to TRUE.
And ! negates the logical value:

!f%%2
##  [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
## [12]  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
## [23] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE

Compare that with the original sequence – now the value is FALSE for the unequal numbers, and true for the equal.
We can use that to pick out the equal terms in the sequence:

f[!f%%2]
##  [1]       2       8      34     144     610    2584   10946   46368
##  [9]  196418  832040 3524578

And then it is simple enough to just sum them:

sum(f[!f%%2])
## [1] Censored

Lessons learned:
1. You should always save your Euler solutions, otherwise you will have to reconstruct them when you decide to put them all on your website three years later.
2. Numbers is a really usefull library.

Merging PDF-files

You may need to merge several pdf-files into one.

There are several tools. Some of them you have to pay for.

If you are in a linux environment, try this:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=final.pdf filea.pdf fileb.pdf

continue with filec.pdf etc.

Some say it is slow.

Others say, yes it is slow, but it produces smaller final files, and are able to handle “difficult pdfs”.

I say it is nice that it can be done from the command line.

Euler 1

The very first Euler problem. And a relatively simple one.

Find all natural numbers below 1000, that are multiples of 3 or 5. What is the sum?

threes <- seq(3,999,by=3)
fives <- seq(5,999,by=5)
both <- unique(c(threes,fives))
sum(both)
## [1] Censored

seq(3, 999, by=3) gives us all numbers below 1000 that are a multiple of 3. Fives contains the numbers that are multiples of 5. Both is the union – unique removes duplicate elements. And sum finally gives the result.

Lessons learned:

  1. Make sure you always save your work. Otherwise you will have to reproduce it.

Euler 3

Problem 3 in Project Euler

What is the largest prime factor in 600851475143

Super simple. The library numbers have a function primeFactors(). It simply returns all prime factors.
Put that inside a max(), and you have the result.

library(numbers)
max(primeFactors(600851475143))
## [1] Censored

Using the library purrr, it can be written like this, and this is a way of coding that I need a bit more practice in, so lets do it that way as well.

library(purrr)
600851475143 %>%
  primeFactors() %>%
  max()
## [1] Censored

The %>% are a pipe function. It takes whats on the left, and passes it to whats on the right. In this case the number is piped to the function primeFactors(), and the result of that function is piped to the max() function. And you get the result.

Lesson learned – nope. I already knew that numbers is a pretty useful library.

Euler 49

There is a sequence 1487, 4817 and 8147 that has three properties:
1. All numbers are primes
2. They are all permutations of each other
3. Each term increases by a certain number (in this case 3330)

Project Euler claims that there are two such sequences with four-digit numbers. Our task is to find the other.
The answer is the three primes with the required properties, pasted together in ascending order.

As usual I’ll load the numbers library. And the first task must be to find all four-digit primes.

library(numbers)
cand <- Primes(1000,9999)

But we will not need these prime. We already know that answer:

cand <- cand[-which(cand %in% c(1487, 4817, 8147))]

If one of these four-digit primes is part of such a sequence, there must exist at least two other primes, which are permutations of it.
Lets begin by generating that list
The library gtools provides the function permutations()

library(gtools)

listing <- function(x){
  b <- unlist(strsplit(as.character(x),""))
  c <- permutations(4,4,v=b, set=FALSE)
  d <- apply(c, 1, paste, collapse="")
  d <- as.numeric(d)
  sort(unique(d[d %in% cand])) 
}

We take a four-digit number as input, converts it to characters via as.character(), and uses strsplit() to get a character vector with for numbers.
Then permutations. The first “4” indicates that the source vector (b) has 4 elements. The second “4” that we want results that contains 4 elements. That we take “b” as the source vector – i.e. the elements from which the permuations should be made. And set=FALSE is a lgical flag. Default is TRUE, and will remove duplicates. That is not what we want, so it is set to FALSE.
Then all the permutations are collapsed to four-digit strings, converted to numeric, and duplicates are removed with unique(), returning a list of all permuations of the digits in the input, that are in the list of candidate primes.

Thats nice. But we know that there should be three numbers in the sequence. This function simply tests if there are at least three numbers in the listing:

poss <- function(x){
  length(listing(x)) >= 3
}

That was the simple part. We can now take all the four-digit primes, excluding the three we already know about, make permutations of them, and see if there are at least three permutations among them, that are prime.

Now it gets a bit more complicated. Given a set of four-digit numbers, we need to locate a subset of three, A, B and C. The differences C-B and B-A should be equal.
There are probably more elegant ways to do that, but this is the way I found.
Given a number x, get all possible permutations of that:
listing(x)
Get all the combinations of three of these numbers:
comb(listing(x),3)
Now we have a matrix, with all the possible combinations in the columns.
Find the differences between the first and the second row, and the second and the third row, for all columns:
apply(apply(combn(listing(x), 3),2,diff)
If the two differences are equal, they fullfil the third property these numbers should have. Therefor we need to find the differende between the two differences. It that is 0, we have our result:
apply(apply(combn(listing(x), 3),2,diff),2,diff)==0])
That gives a logical vector, that is TRUE whenever the difference between the three numbers is the same. We get the numbers by:
combn(listing(x), 3)[,apply(apply(combn(listing(x), 3),2,diff),2,diff)==0]
I’ll define that as a function:

corrComb <- function(x){
  combn(listing(x), 3)[,apply(apply(combn(listing(x), 3),2,diff),2,diff)==0]  
}

It would be nice to have that as a logical test as well. I’m running out of ideas for reasonable function names:

test <- function(x){
  as.logical(sum(corrComb(x)))
}

Very simple – if there is a combination fulfilling the requirements, return TRUE.

Now, lets put it all together. I’ll use the library purrr, which allows me to use these nifty pipes.

Taking the candidate primes, we keep the ones that have three or more permutations that are prime, and in our candidate list. Of those, we keep the ones that have fulfill the requirement that the pairwise differences between three of them are equal.
Those are piped to corrComb, that returns the three numbers. They are piped to sort, so we get them in ascending order, and then I paste them together.

library(purrr)
answer <- cand %>%
  keep(poss) %>%
  keep(test) %>%
  corrComb%>%
  sort %>%
  paste(.,collapse="")

Lessons learned.

  1. Those pipes are pretty neat.
  2. The apply family is also pretty cool
  3. Read the documentation. I spent some time being confused about the results from permuations() until I discovered the set-flag

Euler 40

We construct a decimal fraction, by concatenating all positive integers:
0.123456789101112 etc.

We define Dx as the digit on position x. Eg D1 = 1, D2 = 2 etc.

What is the product:
D1 * D10 * D100 * D1000 * D10000 * D100000 * D1000000

Just looking at the fraction, we can see that D1 = 1 and D10 = 1.
Let’s quickly save those variables:

D1 <- 1
D10 <- 1

But what about D100?

Adding all one-digit integers gives us 9 digits in the fraction.
Adding all two-digit integers gives us 90 * 2 digits in the fraction.

Given that we have the first 9 digits from the one-digit integers, we need to add 90 two-digit integers to get to D99.
That is 45 integers. The first is 10, the second is 11 etc. So number 45 is 54.

If you want convincing:

9 + length(10:54)*2
## [1] 99

D99 is equal to 4. 54 was the two-digit integer we had to add to get to a total of 99 digits, and the last digit in 54 is 4.
The next two-digit integer we can add is 55. Therefore D100 = 5. Lets save that:

D100 <- 5

Now we’ll find D1000.

Adding all two-digit integers brings us to a total of:

9 + 90*2
## [1] 189

The last integer we add is 99. So D189 = 9.

We now have to add:

999-189
## [1] 810

digits to the fraction. Doing that with three-digit integers, means that we should add:

810/3
## [1] 270

Integer number 270 we add is 369. That will give us D999 = 9.
The next integer we add to the fraction will be 370. So:

D1000 <- 3

Next – D10000

Adding all three-digit integers gives at total of:

9*1 + 90*2 + 900*3
## [1] 2889

We have to at a total of:

10000-2889
## [1] 7111

digits, and do it by adding four-digit integers.

7111/4
## [1] 1777.75

of them to be precise.

Adding 1777 four digit integers brings us to:

2889 + 1777*4
## [1] 9997

The four-digit integer we add to get to that is 2776. D9997 = 6
The next integer we add to it is 2777. Leading to:

D10000 <- 7

D100000

All four-digit integers gets us to:

9*1 + 90*2 + 900*3 + 9000*4
## [1] 38889

We need to add

100000-38889
## [1] 61111

digits. We are now getting to five-digit integers, so we need to add

61111/5
## [1] 12222.2

of those.
Når vi har tilføjet 12222 tal er vi nået op på:

9 + 90*2 + 900*3 + 9000*4 + 12222*5
## [1] 99999

Integer number 12222 we add (of the five-digit variety) is 22221. D99999 = 1
The next we add is 22222 and:

D100000 <- 2

Finally D1000000

All integers up to and including five-digit integers:

9+90*2+900*3+9000*4 + 90000*5
## [1] 488889

We need to add:

1000000-488889
## [1] 511111

And do it by adding six-digit integers.

511111/6
## [1] 85185.17

The final six-digit integer we concatenate to the fraction before reaching the million is 185184.

When we have done that we have:

9 + 90*2 + 900*3 + 9000*4 + 90000*5 + 85185*6
## [1] 999999

digits (D999999=5).
The next six-digit integer we add is 185185. And:

D1000000 <- 1

Giving the result:

D1 * D10 * D100 * D1000 * D10000 * D100000 * D1000000
## [1] Censored

Lessons learned:
1. Sometimes you don’t have to code all that much.
2. I should practise explaining these sort of things without using quite as many words.

Command line tools

Linux have a lot of small tools, that only does one thing. But do it really well (compare to Windows, that has a lot of large tools that does everything rather badly).

This is really just a note to myself. These tools are really useful but they are not (yet) second nature to me. I often find myself in the situation, where I know there is a tool for something, that I have used several times before. But simply can’t remember what it was.

grep. Searches files for lines matching a regular expression. Useful parameters (or at least parameters I have a regular use for):

-c returns a count of the lines matching.
-n returns the linenumber (in the file) of the matching line.

tail. returns the last part of a file

-n 6 returns the last 6 lines of the file (standard 10)

head. Returns the first part of a file

-n 6 returns the first 6 lines of the file (standard 10)

cut. Removes sections of lines in a file (or other input)

-d x. Splits the line at x. Use ‘ ‘ for space
-f 1. Select the first field.

wc. Counts stuff in files.

-l. counts the lines in the file (or other input)
-w count the words in the file (or other input)

|. Piping. Takes the result of the command in front of it, and pass it to the command after it (and that is the direction. If you find examples on Stackoverflow that will only give the desired result if the direction is reversed, don’t be surprised if it does not work…)

cat. Prints one or more files to standard output (your screen).

But if we print to another output, eg with “> file.name”, we can concatenate several files.

find. Searches for files. “find .” finds everything. Pipe it to grep to search for something specific. eg “find . | grep ‘acta'” to find all files containing the string “acta”.

-print prints the complete filename.
-print0 prints the complete filename even if it includes a newline.

csvsql. Commandline tool for analysing csv-files using sql-syntax

csvstat. Commandline tool for analysing csv-files

 

Copy rows to another sheet – based on cell-values

And handling images while you’re at it.

Given: We have some data in a sheet – lets call it Source. Based on some values in another sheet – lets call that Condition – we want to copy rows from Source to a third sheet. We’ll call that Target.
To complicate things, we want to copy images as well.

Set some variables to Target, Source and Condition.
Delete the content of the existing target sheet. First alle the images, and then the rest. Note that I’m not deleting everything, just from row 6 and down.
Then for each something (d) in column B (adjust ranges – here I’m only looking at the rows from 2 to 9), check if the relevant row in Source matches, then copy to Target.

There’s a small detail here, I needed to insert an identifier in Target, defined by a value in Condition. Instead of trying to insert in Column B, I’m just searching and replacing a placeholder – “£$”, a string I was pretty certain would not show up anywhere.


Sub CopyYes()
Dim c As Range
Dim j As Integer
Dim Source As Worksheet
Dim Target As Worksheet
Dim Condition As Worksheet
Dim k As String
Dim fnd As Variant
Dim rplc As Variant
fnd = "£$"

Set Source = ActiveWorkbook.Worksheets("Ark4") 'Note that ranges in Souce and Condition below should be adjusted. We're not quite there yet.
Set Target = ActiveWorkbook.Worksheets("Ark3")
Set Condition = ActiveWorkbook.Worksheets("Ark1")

' Start by clearing target sheet
' begin with images
Target.Pictures.Delete
' Then we'll delete the rest
'
With Target
.Rows(6 & ":" & .Rows.Count).Delete
End With

j = 7 'This will start copying data to Target sheet at row 1
For Each d In Condition.Range("B2:B9") 'Ark1
k = d.Offset(0, -1)
rplc = k
For Each c In Source.Range("B2:B52") 'Ark2
If d = c Then
Source.Rows(c.row).Copy Target.Rows(j)
j = j + 1
End If
Next c
Target.Cells.Replace what:=fnd, Replacement:=rplc

Next d
'we'll end by hiding some columns
Target.Columns("A:E").Hidden = True
End Sub

Changing class on columns i a dataframe

Given a dataframe with some columns that has a “wrong” class, eg character, that should be numeric. How to make that change.

i <- as.character(1:10)
c <- rep("a",10)
c
##  [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
df <- data.frame(a=i,b=i, c=c, d=i, stringsAsFactors = FALSE)
sapply(df,class)
##           a           b           c           d 
## "character" "character" "character" "character"

The dataframe has four columns. The columns a,b and d are characters, but should be numeric.

df[,c("a","b","d")] <- sapply(df[,c("a","b","d")], as.numeric)
sapply(df,class)
##           a           b           c           d 
##   "numeric"   "numeric" "character"   "numeric"

An important thing to note here is that apply functions a bit different from the other members of the apply-family:

apply(df,2,class)
##           a           b           c           d 
## "character" "character" "character" "character"
sapply(df,class)
##           a           b           c           d 
##   "numeric"   "numeric" "character"   "numeric"

What happens is that apply coerces df to an array. And all content in an array have to be of the same class. The lowest common denominator for that is character.

if (!require('knitr')) {
  install.packages("knitr")
}
if (!require('devtools')) {
  install.packages("devtools")
}
if (!require('RWordPress')) {
  devtools::install_github(c("duncantl/XMLRPC", "duncantl/RWordPress"))
}
library(RWordPress)
library(knitr)

options(WordPressLogin=c(ChristianKnudsen="2ZKwcFS6vD*fqWMQ"),
        WordPressURL="https://christianknudsen.info/xmlrpc.php")
knit2wp('classchange.Rmd', title = 'Changing class on columns i a dataframe',publish = FALSE)
## 
## 
## processing file: classchange.Rmd
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |.......                                                          |  11%
##   ordinary text without R code
## 
## 
  |                                                                       
  |..............                                                   |  22%
## label: unnamed-chunk-55
## 
  |                                                                       
  |......................                                           |  33%
##   ordinary text without R code
## 
## 
  |                                                                       
  |.............................                                    |  44%
## label: unnamed-chunk-56
## 
  |                                                                       
  |....................................                             |  56%
##   ordinary text without R code
## 
## 
  |                                                                       
  |...........................................                      |  67%
## label: unnamed-chunk-57
## 
  |                                                                       
  |...................................................              |  78%
##   ordinary text without R code
## 
## 
  |                                                                       
  |..........................................................       |  89%
## label: unnamed-chunk-58
## 
  |                                                                       
  |.................................................................| 100%
##   ordinary text without R code
## output file: classchange.md
## [1] "152"
## attr(,"class")
## [1] "WordpressPostId"
options(WordPressLogin=c(admin="7g0G9!27"),
        WordPressURL="https://nusse.dk/xmlrpc.php")
knit2wp('classchange.Rmd', title = 'Changing class on columns i a dataframe',publish = FALSE)
## 
## 
## processing file: classchange.Rmd
## Error in parse_block(g[-1], g[1], params.src): duplicate label 'unnamed-chunk-3'

Euler 80

Problem 80 from Project Euler.

The problem tells us that if the square root of a natural number is not an integer, it is irrational.
Project Euler also claims that it is well known. I did not know it.

We are then told that the square root of 2 is 1.4142135623730950… And that the digital sum of the first 100 digits is 475.

The task is now to take the first 100 natural numbers. And find the total of the digital sums for the first 100 digits for all the irrational square roots.

Lets begin by figuring out how to handle that many digits. R does not support more than around 15 places after the decimal point.

The library Rmpfr can handle arbitrary precision:

library(Rmpfr)
a <- sqrt(mpfr(2,500))

The variable a now contains the square root of 2 with a precision of 500 bytes. I’m not quite sure how many decimal places that actually translates to. But testing the following code allows me to conclude with confidence that it is at least 100.

A thing to note here is, that

a <- mpfr(sqrt(2),500)

and

a <- sqrt(mpfr(2,500))

are not equal. In the first exampel sqrt(2) is evaluated before saving the value with the high precision. Start by converting the number 2 to a high precision representation, before doing math on it.

Next is writing a function that will return the digital sum of the first 100 digits of a number.

digitsum <- function(x){
  s <- 0
  for(i in 1:100){
    s <- s + floor(x)
    x <- (x - floor(x))*10
  }
  s
}

First s is initialized to 0. Then floor(x) gives us the first digit in x. We add that to s, and subtract it from x, and multiply by 10. Repeat that 100 times, and you get the sum of the first 100 digits in x.

Let us test that it works. Project Euler told us what the result for sqrt(2) is:

digitsum(a)
## 1 'mpfr' number of precision  500   bits 
## [1] 475

Nice, the correct result (not that that guarantees that I’ve done everything correctly).

Now, lets find all the irrational square roots we need to look at:

library(purrr)
t <- 1:100
s <- t %>%
  keep(~as.logical(sqrt(.x)%%1))

I need to practice this way of coding a bit more. t contains the first 100 natural numbers. I pass that to the keep()-function, and the predicate function takes the square root of each number, take the modulus 1, and convert it to a logical value. If the modulus of the square root is 0, the square root is an integer, and 0 i false. So we’re keeping all the non-integer squareroots.

Now I’ll convert all the natural numbers to the mpfr-class. The next line takes the square root. The third line calculate the digitalsum. And the final line gives us the result.

s <- mpfr(s,500)
r <- sqrt(s)
r <- digitsum(r)
sum(r)
## 1 'mpfr' number of precision  500   bits 
## [1] censored

Lessons learned:
Rmpfr allows us to work with (more or less) arbitrary precision.
But we need to convert numbers to the relevant class before doing math on it.