Problems worthy of attack prove their worth by hitting back. —Piet Hein

Monday, 31 December 2012

How far away is the sea?

I wanted an app to answer this question, so I wrote one:

You can try it out at It works well on phones too, so you can use it when you are out and about.

How does it work?

I used the dataset of land polygons from Natural Earth, which as the name suggests covers the whole world. The scale is 1:10 million, so inevitably there is some inaccuracy near the coast, particularly where it's wiggly.

The app uses your current location (or a location you selected by clicking on the map) and computes the closest point in the set of land polygons. This calculation is performed using the JTS Topology Suite, a library for 2D spatial work, and it runs as a Java webapp hosted on Google App Engine.

Originally I used Geotools to perform the geospatial calculations, but unfortunately it doesn't run on GAE, so I wrote an offline tool to convert the Natural Earth shapefiles to a JTS binary format. JTS works fine on GAE, but it lacks a distance calculator. Luckily spatial4j has the requisite distance functions, and it too works on GAE.

The webapp exposes a simple query endpoint, so a request for the following URL, for example:

will return a JSON document with the closest point on the coast, whether the (origin) location is on land or at sea, and the distance in metres to the coast:


The page that the user sees is a simple static HTML page that uses the Google Maps API (v3) to render the map and the markers, and jQuery to query the Java webapp.

The complete source code is on Github at

Further ideas

Some of the polygons are a poor approximation to the coastline, so it would be nice to get a higher-resolution dataset. There are likely many potential sources, such as this one for the UK.

It would be interesting to use the dataset to answer the question: "which is the furthest point from the sea [in the UK/in X/in the world]?". I'd like to find time to do that sometime. Adding in spatial indexes might be helpful too.

If you liked this app then you might like...

Is it day or night?

Sunday, 16 December 2012


IFTTT, pronounced "ift", and which stands for "if this then that", is a great service for wiring bits of the internet together. The idea is that you create rules for performing actions, based on triggers.

If this [trigger] occurs then perform that [action].

There are lots of triggers and actions, provided by channels. For example, the Weather Channel provides a trigger which fires at sunset. And the Google Talk Channel provides an action to send a chat message. I combined the trigger and action into a recipe called "Did you put the chickens to bed?" which will remind me (and Eliane) to close the chicken shed in the evening.

I love the simplicity of the whole thing. I quickly added a recipe to send a weekly SMS to remind me to put the rubbish out. And one to send an email to Lottie when there is a full moon. Emilia created a recipe to send her an email when a friend of hers posts something on his blog. I fear the recipe that tells me when it has started raining will be deleted soon due to email overload.

When you start thinking in this way, the more interesting uses invariably involve the the physical world in some way. I want to have a recipe that says "if we're running out of coffee beans then order some more", or "if I'm on Skype light up a lamp outside my office so the kids know not to come in" (this one is close with the blink(1) device), or even "it's actually dark now and you still haven't closed the chicken shed door".

Sunday, 9 December 2012


[I wrote this in July, but never got round to posting it.]

Last weekend I visited the U.S. Capitol in Washington, D.C., with my family, and I learned that the House of Representatives has 435 seats which are appointed so that each state has a number of seats that is proportional to its population. It sounded simple when the tour guide said it, but I wondered how are fractions handled fairly? Simply rounding off quotas doesn't work—firstly because some states could get no seats, which would be unfair, and secondly, how do you make sure that the rounding is both fair and assigns all 435 seats?

When I got home I read about the apportionment problem, as it is known, which has a long and interesting history. Wikipedia [1] is a good read, as usual; and [2] goes into the history and mathematics of different apportionment algorithms in depth, at least one of which suffers causes a paradox. Here I'm interested in looking at the algorithm that is used today to calculate apportionments for the House of Representatives, and why it is considered to be the fairest.

The Algorithm

The algorithm in use today for apportioning seats is due to Huntington and Hill and is known as the Huntington-Hill method, or the method of equal proportions. It's best understood as a dynamic process, which works as follows:

To start, each state is given one seat. (This ensures that states with relatively small populations, like Wyoming, get at least one seat.) Then, each remaining seat is allocated in turn to the state is allocated to the state with the highest priority, where the priority of a state of population \(P\) and \(n\) previously-allocated seats is defined as

\begin{align} \frac {P} {\sqrt{n(n+1)}}\label{pri} \end{align}

We'll see why the priority is defined as it is below, but for now notice that it is approximately \(P/n\), so the seat is given to the state that has the least number of representatives per person, roughly speaking.

Results for the 2010 Census

Running the algorithm for the state populations from the 2010 Census (using a program I wrote [5]) gives the following apportionment, which agrees with the U.S. Census Bureau [3]. (The quota column is the percentage of the population for each state.)

State Seats Population Quota People per representative
Alabama 7 4802982 6.76 686140
Alaska 1 721523 1.02 721523
Arizona 9 6412700 9.02 712522
Arkansas 4 2926229 4.12 731557
California 53 37341989 52.54 704565
Colorado 7 5044930 7.10 720704
Connecticut 5 3581628 5.04 716325
Delaware 1 900877 1.27 900877
Florida 27 18900773 26.59 700028
Georgia 14 9727566 13.69 694826
Hawaii 2 1366862 1.92 683431
Idaho 2 1573499 2.21 786749
Illinois 18 12864380 18.10 714687
Indiana 9 6501582 9.15 722398
Iowa 4 3053787 4.30 763446
Kansas 4 2863813 4.03 715953
Kentucky 6 4350606 6.12 725101
Louisiana 6 4553962 6.41 758993
Maine 2 1333074 1.88 666537
Maryland 8 5789929 8.15 723741
Massachusetts 9 6559644 9.23 728849
Michigan 14 9911626 13.94 707973
Minnesota 8 5314879 7.48 664359
Mississippi 4 2978240 4.19 744560
Missouri 8 6011478 8.46 751434
Montana 1 994416 1.40 994416
Nebraska 3 1831825 2.58 610608
Nevada 4 2709432 3.81 677358
New Hampshire 2 1321445 1.86 660722
New Jersey 12 8807501 12.39 733958
New Mexico 3 2067273 2.91 689091
New York 27 19421055 27.32 719298
North Carolina 13 9565781 13.46 735829
North Dakota 1 675905 0.95 675905
Ohio 16 11568495 16.28 723030
Oklahoma 5 3764882 5.30 752976
Oregon 5 3848606 5.41 769721
Pennsylvania 18 12734905 17.92 707494
Rhode Island 2 1055247 1.48 527623
South Carolina 7 4645975 6.54 663710
South Dakota 1 819761 1.15 819761
Tennessee 9 6375431 8.97 708381
Texas 36 25268418 35.55 701900
Utah 4 2770765 3.90 692691
Vermont 1 630337 0.89 630337
Virginia 11 8037736 11.31 730703
Washington 10 6753369 9.50 675336
West Virginia 3 1859815 2.62 619938
Wisconsin 8 5698230 8.02 712278
Wyoming 1 568300 0.80 568300

The Mathematics

The algorithm finally settled on by Congress was chosen because it was thought to be the fairest. There are different ways of defining what "fair" means, and so it cannot be settled mathematically. In this context "fair" is taken to mean "minimizes the relative difference in representatives per person between states".

To see how the algorithm meets this definition of fairness, let's see what happens when we examine any two states to see if transferring one seat between them would improve the apportionment. This is the argument published by E. V. Huntington in [4].

Suppose after the apportionment, state \(A\) has received \(x+1\) seats, and state \(B\) has received \(y\) seats. Furthermore, also suppose that \(A\) is over-represented because the number of people per representative is less than for \(B\):

\begin{align} \frac {A} {x+1} &\lt \frac {B} {y}\label{Aover} \end{align}

We can check this in the case of California and New York:

\begin{align} \frac {37,341,989} {53} &\lt \frac {19,421,055} {27}\nonumber \end{align}


\begin{align} 704565.83 &\lt 719298.33 \nonumber \end{align}

Now let's see what happens if we try to transfer one seat from \(A\) to \(B\)—does that make things fairer?

In the round when \(A\) won its last seat (number \(x+1\)), we know that its priority (defined by (\ref{pri})) was higher than \(B\)'s. That is,

\begin{align} \frac {A^2} {x(x+1)} &\gt \frac {B^2} {y(y+1)}\label{priority} \end{align}

(Note that even if \(B\) hadn't won its last seat (number \(y\)) at that point, the inequality still holds, since the number of seats it had would be less than \(y\).)

Again we can check this in the case of California and New York:

\begin{align} \frac {37,341,989^2} {52 \times 53} &\gt \frac {19,421,055^2} {27 \times 28}\nonumber \end{align}

Which is true. (The numbers also tally with the U.S. Census Bureau [6], and my program to calculate apportionments [5], where the priority value for California's last seat is \(711,308\), which is \(37,341,989/\sqrt{52 \times 53}\).)

Dividing (\ref{priority}) by (\ref{Aover}) we get

\begin{align} \frac {A} {x} &\gt \frac {B} {y+1}\label{Bover} \end{align}

which we can interpret as saying that \(B\) would be over-represented if one seat were transferred to it from \(A\). For our example of California and New York, this becomes

\begin{align} 718115.17 &\gt 693609.11 \nonumber \end{align}

The question now is, which over-representation is the smallest? That is, which is fairer, and therefore, to be preferred?

Using (\ref{Aover}), we calculate the relative difference before the transfer as

\begin{align} \newcommand{\slfrac}[2]{\left.#1\middle/#2\right.} \slfrac{ \left( \frac {B} {y} - \frac {A} {x+1} \right) } {\frac {A} {x+1}} = \frac {B(x+1)} {Ay} - 1 \label{Adiff} \end{align}

And, using (\ref{Bover}), the relative difference after the transfer is

\begin{align} \newcommand{\slfrac}[2]{\left.#1\middle/#2\right.} \slfrac{ \left( \frac {A} {x} - \frac {B} {y+1} \right) } {\frac {B} {y+1}} = \frac {A(y+1)} {Bx} - 1 \label{Bdiff} \end{align}

To compare these relative differences, note that we can rewrite (\ref{priority}) as

\begin{align} \frac {A(y+1)} {Bx} &\gt \frac {B(x+1)} {Ay} \end{align}


\begin{align} \frac {A(y+1)} {Bx} - 1 &\gt \frac {B(x+1)} {Ay} - 1 \end{align}

and the relative difference is smaller before the seat transfer (using (\ref{Adiff}) and (\ref{Bdiff})). So the original apportionment is optimal. There was nothing special about the choice of \(A\) and \(B\), so we can conclude that the apportionment is optimal overall.

Again, this checks out for our example. The relative difference for 53 seats for California and 27 for New York is \(0.021\), versus \(0.035\) for 52 for California and 28 for New York.


[1] United States congressional apportionment, Wikipedia.

[2] Apportionment: Introduction, American Mathematical Society.


[4] The Apportionment of Representatives in Congress, E. V. Huntington, Transactions of the American Mathematical Society, Vol. 30, No. 1. (Jan., 1928), pp. 85-110.

[5] A program to calculate apportionments, Tom White, July 2012.

[6] PRIORITY VALUES FOR 2010 CENSUS, U.S. Census Bureau.

Monday, 3 December 2012


I wrote a visualization of the populations of the largest cities in the US over the years. I got the idea when I was in Detroit in the summer, and read about the huge decline in Detroit's city population since the 1950s when the automotive industry was at its peak. According to Wikipedia's Shrinking cities in the USA page, Detroit has declined by 61.4% from its peak population. This is a decline of over 1 million, which makes it the largest decline among US cities in absolute numbers, but not in percentage terms since St. Louis has a slightly higher percentage decline (62.7%, or 537,502 people).

These figures are all for city populations, not for the greater urban or metropolitan areas, which, in the case of Detroit, have both significantly increased in size since the 1950s. Detroit has therefore seen one of the largest population shifts to the suburbs since the middle of the 20th century. Much has been written on the dramatic decline of Detroit's city population, and the impact on life there. These are some pieces that I found interesting:

How I wrote the visualization

I used the wonderful d3 library to write the visualization, combining elements of the Population Pyramid example with an updating bar chart. The documentation on Object Constancy was critical to getting the visualization to work.

The population data is from the US Census Bureau, although I found the actual files linked from Wikipedia's Largest cities in the United States by population by decade. I had to write some simple scripts to turn the source data into CSV files that d3 could read.

Thursday, 10 May 2012


I've just finished reading "Super Volcano: The Ticking Time Bomb Beneath Yellowstone National Park" by Greg Breinin. Despite the hyperbolic title, it's a really good introduction to the subject. Actually, the title is entirely appropriate, since the previous Yellowstone eruption around 600,000 years ago was one thousand times as powerful as the 1980 Mount St. Helens eruption. And it's likely to erupt again, but no one knows when.

We've been on a bit of a volcano tour recently. First we visited Lassen Volcanic National Park in October (climbing the Cinder Cone was a highlight), and we stopped in on Mount St. Helens visitor center on our way to Seattle last month. Yesterday we ventured into the Yellowstone caldera (the bit that blew out in the last eruption).

Before reading the book I hadn't appreciated how recent our understanding of Yellowstone's geology is. It was only in the 1960s that scientists combined new empirical data about the ages of different rock formations in the park with the then emerging theory of plate tectonics. One of the scientists was Robert Christiansen of the U.S. Geological Survey, who, with Richard Blank, collected samples from all over Yellowstone and pieced together the puzzle of how Yellowstone formed. (He also wrote the definitive account of Yellowstone's geology in 2001.)

They realized that the series of calderas between Oregon and Wyoming were all eruptions caused by what is now known as the Yellowstone hotspot over the last 16 million years. The continental plate is moving south west, which makes the newer volcanoes appear in the north east.

This diagram from Wikipedia summarizes it nicely: