Adding depth to flat data with Needle
Written by Chuck McCallum Thursday, 09 September 2010 20:19
Much of the data on the web is in tables, and for the most part these tables work pretty well. We've argued that a graph structure (like Needle's) is often a better way of representing data, but does Needle oblige you to use anything more complicated than a table? No. Are there advantages to using a more complex structure, even when the data comes from a flat table? Often, yes.
Take our Aviation Accidents domain as an example. On the NTSB website, the data is available in a table like this:
| Accident Number | Aircraft Damage | Date | Location | Event ID | Aircraft Registration Number | Model | Make |
|---|---|---|---|---|---|---|---|
| DCA91MA010A | Destroyed | 12/3/1990 | Romulus, MI | 20001212X24751 | N3313L | DC-9-14 | McDonnell Douglas |
| DCA91MA010B | Substantial | 12/3/1990 | Romulus, MI | 20001212X24751 | N278US | 727-251 | Boeing |
| CHI90FA278 | Minor | 9/28/1990 | Detroit, MI | 20001212X24215 | N278US | 727-251 | Boeing |
The simple model mirrors the table, with one property per column:
- Accident Number
- Aircraft Damage
- Date
- Location
- Event ID
- Aircraft Registration Number
- Model
- Make
Could this model be improved? Certainly. Note that "Location" and "Date" are identical in rows that share an "Event ID" (the first two rows of the table): these rows represent a single accident involving multiple aircraft, so the location and date are necessarily duplicated across them. Likewise, when rows share an "Aircraft Registration Number" (the last two rows), the "Make" and "Model" are repeated: in this case a single aircraft was involved in multiple accidents.
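To make the redundancy concrete, here is a minimal Python sketch that checks the three sample rows from the table. It is not part of Needle; the helper names `is_determined_by` and `repeated_keys` are made up for this illustration, and the rows are simply retyped from the table as dictionaries.

```python
from collections import defaultdict

# The three sample rows from the NTSB table, one dictionary per row.
COLUMNS = ["Accident Number", "Aircraft Damage", "Date", "Location",
           "Event ID", "Aircraft Registration Number", "Model", "Make"]
ROWS = [dict(zip(COLUMNS, values)) for values in [
    ("DCA91MA010A", "Destroyed", "12/3/1990", "Romulus, MI",
     "20001212X24751", "N3313L", "DC-9-14", "McDonnell Douglas"),
    ("DCA91MA010B", "Substantial", "12/3/1990", "Romulus, MI",
     "20001212X24751", "N278US", "727-251", "Boeing"),
    ("CHI90FA278", "Minor", "9/28/1990", "Detroit, MI",
     "20001212X24215", "N278US", "727-251", "Boeing"),
]]


def is_determined_by(rows, key, dependents):
    """True if rows with the same `key` value always agree on `dependents`."""
    seen = {}
    for row in rows:
        values = tuple(row[name] for name in dependents)
        # Remember the first values seen for this key; later rows must match.
        if seen.setdefault(row[key], values) != values:
            return False
    return True


def repeated_keys(rows, key):
    """Sorted list of `key` values that appear in more than one row."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    return sorted(value for value, n in counts.items() if n > 1)


# Date and Location depend only on the event, so they repeat per event.
print(is_determined_by(ROWS, "Event ID", ["Date", "Location"]))
# Make and Model depend only on the aircraft, so they repeat per aircraft.
print(is_determined_by(ROWS, "Aircraft Registration Number", ["Make", "Model"]))
# Aircraft Damage differs between aircraft in the same event.
print(is_determined_by(ROWS, "Event ID", ["Aircraft Damage"]))
```

In these sample rows the first two checks hold and the third does not, which is a hint that "Aircraft Damage" belongs to neither the event alone nor the aircraft alone.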
With that in mind, here's a simplified view of the model the Aviation Accidents domain actually uses:
- Event ID
- Date
- Location
  - Accident Number
  - Aircraft Damage
    - Aircraft Registration Number
    - Model
    - Make
(The full version can be reviewed here.) The properties of the event, "Date" and "Location", are distinguished from the property of a particular aircraft in the event, "Aircraft Damage". That in turn is distinguished from the enduring characteristics of an aircraft, its "Model" and "Make".

Finding the best model for your domain can be a little tricky, but once that's done, bringing in your data is just as easy as with the simple model. And once your data is in Needle, the richer structure makes it easy to ask hard questions. For instance, over the length of the record, which model of aircraft has been involved in the most accidents? Or, what is the total number of uninjured passengers involved in accidents for each carrier?
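The three levels described above, and the first of those questions, can be sketched in plain Python. This is only an illustration of the idea, not Needle's schema or query language: the class names `Event`, `Involvement` and `Aircraft` and the functions `build_graph` and `accidents_per_model` are hypothetical, and the data is just the three sample rows from the table.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Aircraft:
    """Enduring characteristics of one airframe."""
    registration: str
    make: str
    model: str


@dataclass
class Involvement:
    """One aircraft's part in one event (hypothetical name)."""
    accident_number: str
    damage: str
    aircraft: Aircraft


@dataclass
class Event:
    """Facts shared by every aircraft involved in the event."""
    event_id: str
    date: str
    location: str
    involvements: list = field(default_factory=list)


# Column order: accident number, damage, date, location,
# event ID, registration number, model, make.
ROWS = [
    ("DCA91MA010A", "Destroyed", "12/3/1990", "Romulus, MI",
     "20001212X24751", "N3313L", "DC-9-14", "McDonnell Douglas"),
    ("DCA91MA010B", "Substantial", "12/3/1990", "Romulus, MI",
     "20001212X24751", "N278US", "727-251", "Boeing"),
    ("CHI90FA278", "Minor", "9/28/1990", "Detroit, MI",
     "20001212X24215", "N278US", "727-251", "Boeing"),
]


def build_graph(rows):
    """Fold flat rows into events and aircraft, storing each shared fact once."""
    events, aircraft = {}, {}
    for acc_no, damage, date, location, event_id, reg, model, make in rows:
        # Reuse the existing event or aircraft if this key was seen before.
        event = events.setdefault(event_id, Event(event_id, date, location))
        plane = aircraft.setdefault(reg, Aircraft(reg, make, model))
        event.involvements.append(Involvement(acc_no, damage, plane))
    return events, aircraft


def accidents_per_model(events):
    """Count accident records per (make, model) by walking the links."""
    counts = Counter()
    for event in events.values():
        for inv in event.involvements:
            counts[(inv.aircraft.make, inv.aircraft.model)] += 1
    return counts


events, aircraft = build_graph(ROWS)
top_model, top_count = accidents_per_model(events).most_common(1)[0]
print(top_model, top_count)  # ('Boeing', '727-251') 2
```

Because each aircraft is stored once and linked from every event it took part in, the question becomes a simple walk from events to aircraft rather than a hunt for duplicated rows.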
Feel free to explore all our sample domains, or sign up for a Needlebase account and explore your own data.



