Adding depth to flat data with Needle

Much of the data on the web lives in tables, and for the most part these tables work pretty well. We've argued that a graph structure (like Needle's) is often a better way of representing data, but does Needle oblige you to use anything more complicated than a table? No. Are there advantages to using a more complex structure, even when the data comes from a flat table? Often, yes.

Take our Aviation Accidents domain as an example. On the NTSB website, the data is available in a table like this:

Accident Number  Aircraft Damage  Date       Location     Event ID        Aircraft Registration Number  Model    Make
DCA91MA010A      Destroyed        12/3/1990  Romulus, MI  20001212X24751  N3313L                        DC-9-14  McDonnell Douglas
DCA91MA010B      Substantial      12/3/1990  Romulus, MI  20001212X24751  N278US                        727-251  Boeing
CHI90FA278       Minor            9/28/1990  Detroit, MI  20001212X24215  N278US                        727-251  Boeing

(There are actually many more columns available.) The simplest model to accommodate the data would just be:
  • Accident Number
    • Aircraft Damage
    • Date
    • Location
    • Event ID
    • Aircraft Registration Number
    • Model
    • Make

Could this model be improved? Certainly. Note that "Location" and "Date" are equal in rows that have the same "Event ID": These rows represent an accident which involved multiple aircraft, and the location and date are necessarily duplicated across these rows. On the other hand, when rows have the same "Aircraft Registration Number", the "Make" and "Model" are repeated: In this case a single aircraft was involved in multiple accidents.
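
The duplication described above can be checked mechanically. The following Python sketch uses only the three sample rows from the table; the `ROWS` list and the `depends_on` helper are illustrative names made up for this example and are not part of Needle.

```python
# The three sample rows from the NTSB table, as plain dictionaries.
COLUMNS = ["Accident Number", "Aircraft Damage", "Date", "Location",
           "Event ID", "Aircraft Registration Number", "Model", "Make"]
ROWS = [dict(zip(COLUMNS, values)) for values in [
    ("DCA91MA010A", "Destroyed", "12/3/1990", "Romulus, MI",
     "20001212X24751", "N3313L", "DC-9-14", "McDonnell Douglas"),
    ("DCA91MA010B", "Substantial", "12/3/1990", "Romulus, MI",
     "20001212X24751", "N278US", "727-251", "Boeing"),
    ("CHI90FA278", "Minor", "9/28/1990", "Detroit, MI",
     "20001212X24215", "N278US", "727-251", "Boeing"),
]]


def depends_on(rows, key, columns):
    """Return True if rows that share a value of `key` always share `columns`."""
    seen = {}
    for row in rows:
        values = tuple(row[c] for c in columns)
        # Remember the first values seen for this key; any later mismatch
        # means the columns are not determined by the key.
        if seen.setdefault(row[key], values) != values:
            return False
    return True


# Date and Location are fixed by the event.
print(depends_on(ROWS, "Event ID", ["Date", "Location"]))                    # True
# Model and Make are fixed by the aircraft.
print(depends_on(ROWS, "Aircraft Registration Number", ["Model", "Make"]))  # True
# Damage differs between aircraft in the same event, so it belongs elsewhere.
print(depends_on(ROWS, "Event ID", ["Aircraft Damage"]))                     # False
```

A column that depends on a key in this sense is a natural candidate to hang off that key in the model, rather than off the row as a whole.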

With that in mind, here's a simplified view of the model the Aviation Accidents domain actually uses:

  • Event ID
    • Date
    • Location
    • Accident Number
      • Aircraft Damage
      • Aircraft Registration Number
        • Model
          • Make

(The full version can be reviewed here.) The properties of the event, "Date" and "Location", are distinguished from the property of a particular aircraft in the event, "Aircraft Damage". That in turn is distinguished from the enduring characteristics of an aircraft, its "Model" and "Make". Finding the best model for your domain can be a little tricky, but once that's done, actually bringing in your data is just as easy as with the simple model. Once your data is in Needle, this complex structure actually makes it easy to ask hard questions: For instance, over the length of the record, what model of aircraft has been involved in the most accidents? Or, what are the totals of uninjured passengers involved in accidents for each carrier?
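
To make the payoff concrete, here is a minimal Python sketch that regroups the three sample rows along the lines of the simplified model and then answers the first question. It is not Needle's import mechanism or query language; plain dictionaries stand in for the graph, and the names `nest`, `events`, `aircraft`, and `models` are assumptions made for this example.

```python
from collections import Counter

COLUMNS = ["Accident Number", "Aircraft Damage", "Date", "Location",
           "Event ID", "Aircraft Registration Number", "Model", "Make"]
ROWS = [dict(zip(COLUMNS, values)) for values in [
    ("DCA91MA010A", "Destroyed", "12/3/1990", "Romulus, MI",
     "20001212X24751", "N3313L", "DC-9-14", "McDonnell Douglas"),
    ("DCA91MA010B", "Substantial", "12/3/1990", "Romulus, MI",
     "20001212X24751", "N278US", "727-251", "Boeing"),
    ("CHI90FA278", "Minor", "9/28/1990", "Detroit, MI",
     "20001212X24215", "N278US", "727-251", "Boeing"),
]]


def nest(rows):
    """Split flat rows into events, aircraft, and models, each stored once."""
    events, aircraft, models = {}, {}, {}
    for row in rows:
        registration = row["Aircraft Registration Number"]
        models[row["Model"]] = row["Make"]       # Model -> Make
        aircraft[registration] = row["Model"]    # Registration -> Model
        event = events.setdefault(row["Event ID"], {
            "Date": row["Date"],
            "Location": row["Location"],
            "Accidents": {},
        })
        event["Accidents"][row["Accident Number"]] = {
            "Aircraft Damage": row["Aircraft Damage"],
            "Aircraft Registration Number": registration,
        }
    return events, aircraft, models


events, aircraft, models = nest(ROWS)

# Count accident records per aircraft model by walking the nested structure.
per_model = Counter(
    aircraft[accident["Aircraft Registration Number"]]
    for event in events.values()
    for accident in event["Accidents"].values()
)
print(per_model.most_common(1))  # [('727-251', 2)]
```

With only three rows the answer is trivial, but the same walk works unchanged on the full record, because each event, aircraft, and model appears exactly once.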

Feel free to explore all our sample domains, or sign up for a Needlebase account and explore your own data.

 

Copyright © 2010-2011 ITA Software, Inc.