A new home for pifreak


pifreak is my Twitter bot. It started tweeting the digits of pi in April 2012 and has tweeted the next 140 digits at 3:14 pm GMT every day since. It’s not especially useful or popular (only 48 followers), but I’ve grown fond of her/him/it.

I was hosting her on an AWS EC2 micro instance, but my year of free-tier usage ran out and it has become a little too expensive to keep that box running.

So I’ve been looking at alternatives and have settled on Google App Engine, which I’m hoping will work out pretty close to free hosting.

So here are a few notes for anyone else who might be thinking of using Google App Engine for automated posting to Twitter.

It was reasonably simple to set up

  1. Download the GAE Python SDK. This provides a GUI both for testing your code locally and for deploying it to the cloud when you are happy with it.
  2. Create a new folder for your app and place your Python modules in it, together with an app.yaml file and a cron.yaml file, which configure the application and schedule your task respectively. It’s all very well documented here, and the cron scheduling here.
  3. Open the App Engine Launcher (which is effectively the SDK), add your folder, then either hit Run to test locally or Deploy to push to the cloud (you’ll be taken through some forms to register your app if you’ve not already done so).
  4. Finally, if you click on Dashboard from the launcher, you’ll get lots of useful information about your deployed app, including error logs and the schedule for your tasks.
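For concreteness, here is a minimal sketch of what the two files might look like for a daily scheduled task. The application name, script module and schedule below are illustrative assumptions, not pifreak’s actual configuration:

```yaml
# app.yaml — application config (the app ID and module name are hypothetical)
application: my-twitter-bot
version: 1
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /
  script: main.app
```

```yaml
# cron.yaml — task schedule (times default to UTC)
cron:
- description: daily tweet
  url: /
  schedule: every day 15:14
```

The cron entry hits the same URL that the handler in app.yaml serves, which is why the two url fields need to agree.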

The things that caught me out were:

  1. Make sure that the application name in your app.yaml file is the same as the one you register with Google (it takes you through to the registration form the first time you deploy).
  2. There wasn’t a lot in the documentation about the use of the URL field in the cron and app YAML files. I ended up just putting a forward slash in both, since in my very simple app the Python module is in the root.
  3. Don’t forget that module names are case sensitive, so when you add your Python module in the script section of the app file you’ll need to get this right.
  4. YAML files follow an indentation convention similar to Python’s. You’ll need to ensure it’s all lined up correctly.
  5. Any third-party libraries you need that are not included in this list will need to be included in your app folder. For example, I had to include tweepy and some of its dependencies.
  6. Where a third-party library that you need is included in the GAE runtime environment, you add it to the app file using the following syntax:

    libraries:
    - name: ssl
      version: "latest"

And here finally is a link to the code.

Are there any good five letter dot com domains?


Here’s a python script I wrote when tearing my hair out over domain names.

It creates all five-letter permutations, ranks them by how pronounceable they are (in a rough and ready kind of way), uses Wiktionary to check that they are a word in some language, then finally checks whether the domain is free.
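The post doesn’t show the ranking heuristic, so here is a minimal sketch of one rough-and-ready approach — rewarding consonant/vowel alternation — which is my own assumption, not the actual script:

```python
VOWELS = set("aeiou")

def pronounceability(word):
    """Crude heuristic: +1 for each consonant/vowel alternation,
    -1 for each run of three same-class letters."""
    classes = ["v" if c in VOWELS else "c" for c in word]
    score = 0
    for a, b in zip(classes, classes[1:]):
        if a != b:
            score += 1
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        if a == b == c:
            score -= 1
    return score

# rank a few candidate five-letter strings, best first
candidates = ["zxkqt", "banjo", "aeiou", "lumar"]
ranked = sorted(candidates, key=pronounceability, reverse=True)
```

Something alternating like “lumar” scores well, while an all-consonant string like “zxkqt” sinks to the bottom.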

I didn’t like any of them.

Lazy D3 on some astronomical data


I can’t claim to be anything near an expert on D3 (a JavaScript library for data visualisation) but, being both greedy and lazy, I wondered if I could get some nice results with minimum effort. In any case, the hardest thing about D3 for a novice to the world of web design seems to be getting started at all, so perhaps this post will be useful for getting people up and running.

A Reingold–Tilford Tree of the IVOA Astronomical Object Ontology using D3

The images above and below are visualisations using D3 of a classification hierarchy for astronomical objects provided by the IVOA (International Virtual Observatory Alliance). I take no credit for the layout. The designs are taken straight from the D3 examples gallery but I will show you how I got the environment set up and my data into the graphs. The process should be replicable for any hierarchical dataset stored in a similar fashion.

A dendrogram of the same data

Even better than the static images are the various interactive versions, such as the rotating Reingold–Tilford Tree, the collapsible dendrogram and the collapsible indented tree. These were all created fairly easily by substituting the astronomical object data for the data in the original examples. (I say fairly easily because you need to get the hierarchy into the right format, but more on that later.)

First the environment.

If you are an R user with little experience of building web pages then you’ll probably find yourself squinting at the D3 documentation wondering how you get from the code to the output. With just a browser to read the JavaScript and an editor (Notepad if you like, but preferably a specialist HTML editor) you can get through most of the tutorials referred to in Mark’s previous post. Doing this locally on your own PC works OK because the data visualised in the tutorials is mostly hard-coded into the JavaScript. However, once you want to start referring to data contained in external files, some sensible security restrictions on what files your browser can access will block your attempts. The options are to turn these off (not advised) or to switch to working on a web server.

You can either opt for one of the many free hosting services or, if you are feeling more adventurous, you can follow Mark’s posts on setting up a Linux instance on Amazon Web Services and then follow Libby Hemphill’s instructions for starting up Apache and opening the port. Going with the latter, I use FileZilla to transfer any data I need over to my Linux instance. See this post for getting it to work with your authentication key.

This should leave you in the following situation:

  • You have a working web server and an IP address that will take you to an HTML index page from which you can provide links to your D3 documents.
  • You have a way of transferring work that you do locally over to your web server.
  • You have a whole gallery of examples to play around with.

Rather than creating my own D3 scripts I’m just going to substitute my own data into the examples. The issue here is that the hierarchical examples take as input a nested JSON object. This means something like:

{
 "name": "Astronomical Object",
 "children": [
  {
   "name": "Planet",
   "children": [
    {
     "name": "Type",
     "children": [
      {"name": "Terrestrial"},
      {"name": "Gas Giant"}
     ]
    }
   ]
  }
 ]
}

The issue is that our data looks like this:

1. Planet
1.1. [Type]
1.1.1. Terrestrial
1.1.2. Gas Giant
1.2. [Feature]
1.2.1. Surface
1.2.1.1. Mountain
1.2.1.2. Canyon
1.2.1.3. Volcanic
1.2.1.4. Impact
1.2.1.5. Erosion
1.2.1.6. Liquid
1.2.1.7. Ice
1.2.2. Atmosphere
1.2.2.1. Cloud
1.2.2.2. Storm
1.2.2.3. Belt
1.2.2.4. Aurora

To put this into the right format I’ve used Python to read the file in as CSV (with the dot as delimiter) and construct the nested JSON object. Here’s the full Python code:

import csv
import re

LETTER = re.compile("[a-zA-Z]")

def readLev(row):
    """Depth of a row = index of the column containing letters (the name)."""
    for i, col in enumerate(row):
        if LETTER.search(col) is not None:
            return i
    return 0

# Open the file and read it as dot-delimited
with open(r'B:\PythonFiles\PythonInOut\AstroObject\Astro.csv', newline='') as csvfile:
    astroReader = csv.reader(csvfile, delimiter='.')
    astroJson = '{"name": "Astronomical Object"'
    prevLev = 0
    for row in astroReader:
        # Identify the depth and the node name
        currentLev = readLev(row)
        name = row[currentLev].strip()
        if currentLev > prevLev:
            # Going deeper: open a new children array
            astroJson += ', "children": [{"name": "' + name + '"'
        elif currentLev < prevLev:
            # Coming back up: close the leaf, then one ]} per level climbed
            jump = prevLev - currentLev
            astroJson += ', "size": 1}' + ']}' * jump + ', {"name": "' + name + '"'
        else:
            # Sibling at the same depth
            astroJson += ', "size": 1}, {"name": "' + name + '"'
        prevLev = currentLev
    # Close the final leaf and every open level back up to the root
    astroJson += ', "size": 1}' + ']}' * prevLev
    print(astroJson)

Since only the level of indentation matters to this process, it could be repeated on any data that has the form:

* Planet
** [Type]
*** Terrestrial
*** Gas Giant
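Only the depth of the markers matters, so a small helper could read either style. This is a sketch of the idea rather than part of the original script:

```python
import re

def depth(line):
    """Nesting depth from leading markers: '***' -> 3, '1.2.1.' -> 3."""
    stars = re.match(r"(\*+)\s", line)
    if stars:
        return len(stars.group(1))
    dots = re.match(r"((?:\d+\.)+)\s", line)
    if dots:
        return dots.group(1).count(".")
    return 0
```

With a function like this in place of readLev, the rest of the JSON-building loop works unchanged on either input format.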

So that’s it. You’ll see that in the source code for the hierarchical examples there is a reference to flare.json. Substitute in a reference to your own file containing the output JSON object, and be sure to include that file in the directory on your web server.

Of course this is a poor substitute for learning the language itself, which would enable you to construct your own innovative visualisations, but it gets you started.

#sherlock & the power of the retweet


Much has been made over the last few days of Sherlock writer Steven Moffat’s views on people who tweet whilst watching TV. Whilst watching it last night, I kept an eye on the tweets during the show and there was clearly a lot of volume going through the Twittersphere.

Interested to find out a bit more about the volumes, I used this excellent (and well used) Python script to pull the set of tweets from the beginning of the show through to 9.30am GMT this morning.

First off, a few headline figures:

  • Between 8pm and midnight there were more than 93,000 tweets and retweets.
  • Tweets per minute peaked at 2,608 at 10.30pm (excluding “RT” retweets). That’s more than 43 tweets per second on average.
  • There were more than 10 retweets per second at 10.45pm of @steven_moffat’s “#sherlock Yes of course there’s going to be a third series – it was commissioned at the same time as the second. Gotcha!” tweet.
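The per-minute figures fall out of a simple bucketing pass over the timestamps. Here is a minimal sketch of that counting step (the tuple format and sample data are assumptions for illustration, not the actual script’s output):

```python
from collections import Counter
from datetime import datetime

def per_minute(tweets):
    """Count tweets per minute, excluding "RT" retweets."""
    counts = Counter()
    for ts, text in tweets:
        if text.startswith("RT "):
            continue  # skip old-style retweets
        minute = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%H:%M")
        counts[minute] += 1
    return counts

# a tiny illustrative sample (timestamps are made up)
tweets = [
    ("2012-01-15 22:30:01", "RT @steven_moffat: #sherlock ..."),
    ("2012-01-15 22:30:30", "#sherlock how did he do it?!"),
    ("2012-01-15 22:31:05", "#sherlock wow"),
]
counts = per_minute(tweets)
```

Dropping the retweet filter gives the combined tweet-and-retweet totals instead.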

The data, along with a little Tableau time series visualisation are below.

Update: Tableau Public doesn’t seem to play nicely with a WordPress-hosted blog, so click here to open in a new tab.


Machine Learning and Analytics based in London, UK