## Monday, May 03, 2010

### Saying Things with Graphs

Some of you may be familiar with sites like the always clever Indexed. But long before the internets, people were already condensing lots of information into graphs and other visual representations.

The graph above shows the relationship between the width and height of a collection of *Oreohelix* shells. One might look at the data, see a closely linear relationship among the individual points, and conclude that they represent one species. We could test this hypothesis by looking at the pattern of other dimensions of these shells. We might become very confident in our conspecific hypothesis if we could obtain genetic or soft-tissue data in addition to the shell measurements. Unfortunately for paleontologists, soft tissues and genetic sequences are hard to come by.

Here we see a relationship between the number of pirates and the amount of treasure they can obtain in a day. In this case, the trend appears to be linear for one to three pirates, but it quickly approaches a maximum. The booty that ten pirates can get is not much more than what six pirates get. This graph does not go above ten pirates, however. It may be that the ship is the limiting factor, capping the haul at about 6,000 units of booty per day. If we had enough pirates and boats, perhaps we would see another "plateau" at around 20 or 30 pirates. Again, we don't know for sure from these data, but they allow us to make testable hypotheses about the system.

Perhaps we have a hypothesis that about 20% of all pirates (1 in 5) have eyepatches. This graph shows that as the number of observed pirates increases, the number of eyepatches also increases. The slope of the line is almost 0.2, which is what we would expect if our 20% hypothesis were correct. But the "# of Eyepatches" does NOT tell us which pirates are wearing them. If it is just an independent count of eyepatches, it tells us nothing about whether any actual pirate is wearing one. We must be careful, both in how we interpret the data and in how we display it. Perhaps the eyepatch count IS tied to particular pirates. By not telling the reader this, we have actually made our findings harder to understand.

Finally, what about complex systems or behaviors? If we know that pirates like rum, we might be able to infer pirate behavior from the amount of rum. In this graph, we can see that the amount of rum available decreases with time. If there are only a few pirates, the rum decreases in a slow, linear pattern. If there are many pirates, the rum decreases rapidly in a non-linear fashion. Perhaps the more pirates there are, the more intense the drinking. And if there is a ninja (a well-known enemy of pirates) nearby, the amount of rum barely changes at first, but then it decreases in a non-linear pattern, as if there were more pirates. It may be that the ninja forces the pirates to do battle, distracting them from any drinking. But after the battle, the remaining pirates celebrate more intensely than they would otherwise.

Keep in mind, these conclusions are not proven by these data. But they could be tested with further observations and data. In many ways, that's what good science does. It formulates hypotheses and interpretations based on existing information, interpretations that can be supported or falsified with additional information. If there is no way to corroborate or reject an interpretation, it is not science. It might be a good story, but it does nothing to further our understanding.