Make GitHub R Code Available within R
After gradually migrating most of my workflow from Subversion to GitHub I discovered an itty, bitty, tiny, huge freakin’ problem. Part of my old workflow involved me storing code I would use again and again in a public repository then source-ing the code directly into R as needed. It also made this code easy for me to share with others, especially students and collaborators. No problem.
GitHub is superior to Subversion in notable ways, but that’s not our topic here. GitHub does make it easy to read source code directly from the site as plain text. Here’s an example of an address for a bit of code I use almost daily to give me a clean R session.
https://raw.github.com/shaptonstahl/R/master/Decruft/Decruft.R
Anyone see the problem? Two things: (1) The plain-text version of code hosted on GitHub is served over a secure connection, and (2) R can’t source via https without the use of an external library. I’m a big fan of R’s external libraries, but requiring one doesn’t fit the purpose of this code. This code usually sits at the top of just about every .R file I write:
## Start fresh!
source("http://address.of/Decruft.R")
Isn’t that pretty? Short, sweet, easy to remember. This is how I used to do business. Unfortunately, this is the most concise way that I could find for doing it with GitHub:
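### Fugly code
# Install devtools only if it is not already installed
if( !any("devtools" == installed.packages()[,"Package"]) ) install.packages("devtools")
library(devtools)
# source_url() is the devtools function that can read from an https address
source_url("https://github.com/crikey/thats/as/long/as/the/code/Im/sourcing.R")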
I checked the Google and such but nobody seemed to be asking precisely what I was: how can I read code stored on GitHub in plain text using http, not https? We will not be discussing how long it took me to come up with a satisfactory solution. Let’s just say it took long enough that I really don’t want anyone else to have to go through it.
Here’s my solution:
- Have a Web server running PHP that allows you to create and use .htaccess files.
- Choose a URL for the stem of where your code will appear to be.
- Use .htaccess to point 404 Not Found errors to a custom error page.
- The PHP-based error page uses https to get the live file from GitHub and feeds it to the person requesting it.
It could be worse. I decided that I would request pages from (nonexistent) subfolders of http://www.haptonstahl.org/R/ in order to read code stored under https://raw.github.com/shaptonstahl/R/. So I put two little files in the document root of my site. The first is named .htaccess and contains this:
ErrorDocument 404 /404.php
The other file is the 404.php file mentioned in .htaccess. You can download the PHP file here. This is a copy of the actual one I am using. Now to get a clean R session I just type the following:
source("http://www.haptonstahl.org/R/Decruft/Decruft.R")
Easy peasy. Perhaps the best part is that I’m done. I never have to update or modify this if I want to source other public code in that GitHub repository. For example, without changes I can source
http://www.haptonstahl.org/R/RoundBoundsNicely/RoundBoundsNicely.R
to get
https://raw.github.com/shaptonstahl/R/master/RoundBoundsNicely/RoundBoundsNicely.R
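The translation the error page performs amounts to swapping one URL stem for another. Here is a rough sketch of that mapping in R, just to make the idea concrete. The function name `to_raw_github_url` and its arguments are made up for illustration; the real work happens in 404.php on the server, and this is not a copy of that file. It also assumes everything lives on the `master` branch, as in the examples above.

```r
# Hypothetical helper: translate a proxy URL into the raw GitHub URL it stands for.
# Assumes the "master" branch; both stems are taken from the examples in this post.
to_raw_github_url <- function(proxy_url,
                              proxy_stem = "http://www.haptonstahl.org/R/",
                              raw_stem = "https://raw.github.com/shaptonstahl/R/master/") {
  # Refuse anything that is not under the proxy stem.
  if (substr(proxy_url, 1, nchar(proxy_stem)) != proxy_stem) {
    stop("URL does not start with the proxy stem: ", proxy_url)
  }
  # Keep the path after the stem and graft it onto the GitHub stem.
  relative_path <- substring(proxy_url, nchar(proxy_stem) + 1)
  paste0(raw_stem, relative_path)
}

to_raw_github_url("http://www.haptonstahl.org/R/RoundBoundsNicely/RoundBoundsNicely.R")
```

Because only the stem changes, any new file added to the repository is reachable through the proxy address without touching the server again.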
Lesson to take home:
- Putting code you reuse up on teh webz makes it easy for you to use it over and over instead of writing it over and over.
- GitHub rocks, now even more since I can source my R code from the live GitHub versions.
- Safety first, kids!
Hey, nice post. I have tried to solve this problem by creating an installable R package (called sacbox: https://github.com/SChamberlain/sacbox) that has all the functions I commonly use. I then just load this package when I start R by putting library(sacbox) in my .Rprofile file. This is a nice solution because I can add roxygen documentation, include examples, etc., that I can browse in the R help.
Thanks, Scott
I started by making a package to do this but I realized that I wanted to be able to source GitHub code in a single line. That precludes first installing and then loading a specialized package.