Thread: An Introduction
Needle's data exploration and analysis run, behind the scenes, on the system's query language, called Thread. A Thread query describes a path through the data, much like the one you would follow when browsing a web site: you start somewhere and then do something, such as following paths, diving further down the ones that seem promising, and backtracking out of the ones that don't lead where you want. Here's an introduction to how it works.
Starting Somewhere
All Thread queries begin their paths somewhere in the data. One of the most common starting points is everything of a particular type. A query to get all the songs in the Pazz & Jop dataset would be simply:
Song
"Song" is the name of a type, and thus a complete query in itself.
A query can also start at one particular piece of data by writing "@" followed by the data's node ID. For example, here's a query to get us directly to the song "Stillness Is the Move", by the Dirty Projectors:
Every piece of data in Needle is a node, with its own unique numeric ID, so a query can start at a song, an artist, a voter, a ballot, a vote, a point-value, etc.
Each node has two parts: a value and some arcs. The value is the text you see: the name of the album or artist or voter, the year, a point-value, etc. The "arcs" are the relationships or paths between the nodes. So a song has an "Artist" arc to its artist, a "Label" arc to its label(s), etc. All arcs go both directions, so for each song with an artist, the artist has a corresponding "Song" arc to all their songs, etc. Arcs (and queries) always lead to ordered lists of nodes, whether there are many, just one, or none in any particular case.
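To make the node model concrete, here is a minimal Python sketch of it. This is not how Needle stores data; the `Node` class and the `link` helper are hypothetical names invented for illustration. It only shows the three properties described above: a node has a value and some arcs, every arc has a reverse, and an arc always leads to an ordered list.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based equality: every node is its own unique thing
class Node:
    value: str
    arcs: dict = field(default_factory=dict, repr=False)  # arc name -> ordered list of Nodes

def link(source, arc, target, reverse_arc):
    """Connect two nodes in both directions, since every arc has a reverse."""
    source.arcs.setdefault(arc, []).append(target)
    target.arcs.setdefault(reverse_arc, []).append(source)

song = Node("Stillness Is the Move")
artist = Node("Dirty Projectors")
link(song, "Artist", artist, "Song")

print([n.value for n in song.arcs["Artist"]])  # ['Dirty Projectors']
print([n.value for n in artist.arcs["Song"]])  # ['Stillness Is the Move']
print(song.arcs.get("Label", []))              # [] (no targets is still a list)
```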
Doing Something
To any such list of nodes, then, a Thread query can apply any series of operations. All Thread operations work the same way: they take a list of nodes, do something, and return a new list of nodes, potentially larger or smaller or empty. The operations can be chained in any order, at any length.
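The "list in, list out" rule is what makes chaining work. A rough Python sketch of the idea, using plain strings as stand-ins for nodes and two made-up operations (`only_long` and `reversed_order` are hypothetical, not Thread operators):

```python
from functools import reduce

def run(start, *operations):
    """Apply each operation in turn; every operation maps a list to a new list."""
    return reduce(lambda nodes, op: op(nodes), operations, list(start))

# Hypothetical stand-in operations over plain strings instead of real nodes.
def only_long(nodes):
    return [n for n in nodes if len(n) > 3]

def reversed_order(nodes):
    return list(reversed(nodes))

print(run(["song", "ep", "album"], only_long, reversed_order))  # ['album', 'song']
print(run(["ep"], only_long, reversed_order))                   # []
```

Because an empty list is still a list, a chain never has to special-case "no results"; later operations simply have nothing to do.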
Following Paths
The most basic operation in Thread is following. This takes the incoming list of nodes, follows the specified arcs for all of them, and returns the list of unique targets. Syntactically Thread is optimized for compactness, with operators represented by punctuation. Following is done with the period (.). So to get the artist for the song above we could extend the ID query to be:
Or to get the artists for all songs, we could add the same thing to the all-song query:
Song.Artist
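The following operation can be sketched in Python like this. The `Node`, `link`, and `follow` names and the placeholder songs and artists are assumptions for illustration, not Needle's internals. The point is the "unique targets" behavior: two songs that share an artist yield that artist once.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based equality and hashing, so nodes can be dict keys
class Node:
    value: str
    arcs: dict = field(default_factory=dict, repr=False)

def link(source, arc, target, reverse_arc):
    source.arcs.setdefault(arc, []).append(target)
    target.arcs.setdefault(reverse_arc, []).append(source)

def follow(nodes, arc):
    """Follow `arc` from every node and return each target once, in first-seen order."""
    targets = (t for node in nodes for t in node.arcs.get(arc, []))
    return list(dict.fromkeys(targets))

# Placeholder data: two songs share one artist, a third has its own.
song_a, song_b, song_c = Node("Song A"), Node("Song B"), Node("Song C")
artist_x, artist_y = Node("Artist X"), Node("Artist Y")
link(song_a, "Artist", artist_x, "Song")
link(song_b, "Artist", artist_x, "Song")
link(song_c, "Artist", artist_y, "Song")

songs = [song_a, song_b, song_c]
artists = follow(songs, "Artist")  # follow the Artist arc from all songs
print([n.value for n in artists])                  # ['Artist X', 'Artist Y']
print([n.value for n in follow(artists, "Song")])  # ['Song A', 'Song B', 'Song C']
```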
Filtering
Following paths is only really interesting if you can follow some paths and not others. The second core Thread operation is filtering, which is done with the colon (:). If we didn't already know the ID for "Stillness Is the Move", we could look for it this way:
Song:=Stillness Is the Move
This is a complete two-operation Thread query. The first (start) operation returns the list of all songs, the second (filter) takes this list and returns the ones whose titles match the string "Stillness Is the Move".
This query says nothing about artists, so if different bands had recorded songs called "Stillness Is the Move", it would return all of them. And, in fact, there turn out to be two. To find out which artists recorded them, we could do:
Song:=Stillness Is the Move.Artist
Aha. The title is no coincidence: Solange Knowles did a cover of this Dirty Projectors song. But if what we really wanted was just the one by the Dirty Projectors themselves, we could filter using a nested subquery:
Song:=Stillness Is the Move:(.Artist:=Dirty Projectors)
The parenthesized subquery is applied to each of the two "Stillness Is the Move" nodes produced by the query up to that point, and the filter passes those for which it finds (any) results. In this case the first one is Solange's version, so the ".Artist" follow-operation produces the one-node list [Solange], and the ":=Dirty Projectors" filter-operation produces an empty list, so the filter rejects that song. The second is the Dirty Projectors' own version, so the same subquery produces a non-empty list and the filter passes that song.
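Here is the same nested filter as a Python sketch. All the helper names (`Node`, `link`, `follow`, `value_is`, `keep_if`) are hypothetical, and treating ":=" as an exact string comparison is an assumption; Thread's actual matching rules may differ.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    value: str
    arcs: dict = field(default_factory=dict, repr=False)

def link(source, arc, target, reverse_arc):
    source.arcs.setdefault(arc, []).append(target)
    target.arcs.setdefault(reverse_arc, []).append(source)

def follow(nodes, arc):
    return list(dict.fromkeys(t for n in nodes for t in n.arcs.get(arc, [])))

def value_is(text):
    """Operation keeping nodes whose value equals `text` (exact match is an assumption)."""
    return lambda nodes: [n for n in nodes if n.value == text]

def keep_if(subquery):
    """Operation keeping each node for which `subquery`, run on it alone, finds anything."""
    return lambda nodes: [n for n in nodes if subquery([n])]

solange, projectors = Node("Solange"), Node("Dirty Projectors")
cover, original = Node("Stillness Is the Move"), Node("Stillness Is the Move")
link(cover, "Artist", solange, "Song")
link(original, "Artist", projectors, "Song")
all_songs = [cover, original]

# Roughly: Song:=Stillness Is the Move:(.Artist:=Dirty Projectors)
titled = value_is("Stillness Is the Move")(all_songs)
by_projectors = keep_if(
    lambda ns: value_is("Dirty Projectors")(follow(ns, "Artist"))
)(titled)

print(len(titled))                  # 2
print(by_projectors == [original])  # True
```

The subquery runs once per candidate node, on a one-node list, and only its emptiness matters to the outer filter.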
Queries can be nested arbitrarily as well as chained, so we could ask for all songs by any artists who had songs with the word "Stillness" in them like this:
Song:(.Artist:(.Song:Stillness))
But since all Thread arcs are bidirectional, all operations can be chained, and every piece of data is a node, we could also ask the same question this way:
Grouping
The grouping operator is the slash (/). If artists have songs, and songs have labels, we could group artists by their song-labels like this:
Artist/(.Song.Label)
This produces a group node for each label, with a "Key" arc to the label, and a "Nodes" arc to the artists who have songs on that label. Artists with songs on multiple labels appear in multiple groups. Groups in Thread are nodes themselves, not part of some external reporting scheme, so a grouping query can perform further operations on them. To get only the label-groups with at least 5 artists, we could do:
Artist/(.Song.Label):(.Size:>=5)
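A Python sketch of grouping, under the same caveats as before: the helper names and the placeholder artists and labels are invented, and giving each group node the key's value as its own value is an assumption. What it does show is that a group is just another node with "Key" and "Nodes" arcs, so it can be filtered like anything else. The size threshold here is 2 rather than 5 only because the toy data is tiny.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    value: str
    arcs: dict = field(default_factory=dict, repr=False)

def link(source, arc, target, reverse_arc):
    source.arcs.setdefault(arc, []).append(target)
    target.arcs.setdefault(reverse_arc, []).append(source)

def follow(nodes, arc):
    return list(dict.fromkeys(t for n in nodes for t in n.arcs.get(arc, [])))

def group(nodes, key_query):
    """Return one group node per key, each with a "Key" arc and a "Nodes" arc."""
    groups = {}  # key node -> group node; dicts preserve insertion order
    for node in nodes:
        for key in key_query([node]):
            if key not in groups:
                groups[key] = Node(key.value, {"Key": [key], "Nodes": []})
            groups[key].arcs["Nodes"].append(node)
    return list(groups.values())

# Placeholder data: Artist X has songs on two labels, Artist Y on one.
artist_x, artist_y = Node("Artist X"), Node("Artist Y")
label_a, label_b = Node("Label A"), Node("Label B")
for title, artist, label in [("Song 1", artist_x, label_a),
                             ("Song 2", artist_x, label_b),
                             ("Song 3", artist_y, label_a)]:
    song = Node(title)
    link(artist, "Song", song, "Artist")
    link(song, "Label", label, "Song")

by_label = group([artist_x, artist_y], lambda ns: follow(follow(ns, "Song"), "Label"))
print({g.value: [n.value for n in g.arcs["Nodes"]] for g in by_label})
# {'Label A': ['Artist X', 'Artist Y'], 'Label B': ['Artist X']}

big = [g for g in by_label if len(g.arcs["Nodes"]) >= 2]
print([g.value for g in big])  # ['Label A']
```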
Sorting
The sorting operator is the caret (^). By itself, this sorts a list of nodes by their own values, so to get a list of all artists in alphabetical order, we could do:
Artist^
Given an arc, or arcs, the sort operation orders the list of nodes by the values of those arcs. So to sort songs by 2009 votes, sub-sorted by artist name, we could do:
Check the four-way tie at 27 to see that those are subsorted by artist.
All operations actually allow multiple pieces like this, so you can start from multiple nodes and/or multiple types, follow multiple arcs at once, filter on several "or" conditions (or chain multiple filters for "and"), group by combinations of keys, etc.
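Sorting by one thing and sub-sorting by another is the familiar compound-key sort. A small Python sketch, with made-up songs and vote counts, and assuming votes sort highest-first (the direction is an assumption, as is the `Song` tuple used here in place of real nodes):

```python
from typing import NamedTuple

class Song(NamedTuple):
    title: str
    artist: str
    votes: int

# Placeholder data with a three-way tie at 27 votes.
songs = [
    Song("Song A", "Artist Z", 27),
    Song("Song B", "Artist M", 27),
    Song("Song C", "Artist K", 30),
    Song("Song D", "Artist B", 27),
]

# Primary key: votes, highest first (descending order is an assumption here).
# Secondary key: artist name, ascending, which only matters within a tie.
ranked = sorted(songs, key=lambda s: (-s.votes, s.artist))
for s in ranked:
    print(s.votes, s.artist, s.title)
# 30 Artist K Song C
# 27 Artist B Song D
# 27 Artist M Song B
# 27 Artist Z Song A
```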
Statistics
The basic statistical functions (count, sum, average, variance, deviation, etc.) appear in Thread as arcs that apply to a list of nodes as a whole, not each node individually. To count voters we could just do:
And because _Count is an arc, it can also be used inside nested subqueries, so to get artists with more than 2 songs (that got votes in the poll) we could do:
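The idea of a statistic as an arc on the whole list can be sketched in Python as an operation that takes a list and returns a one-node list holding the result. The helper names, the placeholder artists, and the choice to wrap the count in a node are all assumptions made for illustration; only the `_Count` name itself comes from Thread.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    value: object  # text for ordinary nodes, a number for a count
    arcs: dict = field(default_factory=dict, repr=False)

def link(source, arc, target, reverse_arc):
    source.arcs.setdefault(arc, []).append(target)
    target.arcs.setdefault(reverse_arc, []).append(source)

def follow(nodes, arc):
    return list(dict.fromkeys(t for n in nodes for t in n.arcs.get(arc, [])))

def count(nodes):
    """Statistic over the list as a whole: a one-node list holding its size."""
    return [Node(len(nodes))]

def greater_than(limit):
    return lambda nodes: [n for n in nodes if n.value > limit]

def keep_if(subquery):
    return lambda nodes: [n for n in nodes if subquery([n])]

# Placeholder data: Artist X has three songs, Artist Y has one.
artist_x, artist_y = Node("Artist X"), Node("Artist Y")
for title, artist in [("Song 1", artist_x), ("Song 2", artist_x),
                      ("Song 3", artist_x), ("Song 4", artist_y)]:
    link(artist, "Song", Node(title), "Artist")
artists = [artist_x, artist_y]

print(count(artists)[0].value)  # 2

# Artists with more than 2 songs: count each artist's songs inside the subquery.
prolific = keep_if(lambda ns: greater_than(2)(count(follow(ns, "Song"))))(artists)
print([a.value for a in prolific])  # ['Artist X']
```

Because the count comes back as a list like everything else, it slots into a nested filter with no special handling.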
Etc.
Those are the basics. Everything you see and click through in Needle is found or constructed this way, and on any page you can click the "advanced query" link in the top toolbar to see its underlying query. You can edit these queries and see what happens, or even write your own.
For example, say you wanted to know who would have won the album-poll in an alternate universe where you disregarded any ballot that had (winner) Merriweather Post Pavilion on it. This advanced query sorts the album list by non-MPP counts and then gets the top 10:
Album^(.Vote.Ballot:(.Year:=2009):!(.Album Vote.Album:merriweather post pavilion)._Count):#<=10
If you come up with any interesting new queries for this poll data, let us know. Or if you're interested in being able to use the system to ask these kinds of questions in your own data, we're taking beta customers!

