Frequently Asked Questions

Needlebase is ITA Software’s innovative new platform for acquiring, integrating, cleansing, analyzing and publishing data on the web.

Google's acquisition of ITA Software, including ITA's Needlebase project, was completed on April 12, 2011. We are continuing to operate and innovate as usual. To learn more about the acquisition, read Google's blog entry and ITA's press release.

Why the name? Well, the Internet is large and chaotic, and finding the right data can be like finding a needle in a haystack. Moreover, a needle is used to stitch things together, and stitching together multiple data sources with different formats, schemas, and vocabularies into a unified tapestry of data is just what Needlebase does best.

The Needlebase service is free to anyone with a Google account. If you have a Google account, you can log in now; otherwise, visit needle-login for more information about creating one.

ITA Software's main business is indeed serving the travel industry. Our flagship product, QPX, revolutionized airfare search, and now powers the shopping path on most major airline and travel search websites. Now, with Needlebase, we're aiming to revolutionize other forms of vertical search, too. By dramatically lowering the time and cost required to aggregate a comprehensive database for any domain, we hope to extend our impact beyond air travel to any field where the data is currently too disaggregated, unpolished, or difficult to use efficiently.

Most web-scraping tools are brittle and have severe problems dealing with web pages that are even slightly irregular in format. Needlebase uses advanced machine-learning techniques to extract data from a wide range of web pages and other data sources quickly, reliably and robustly, without requiring any specialized knowledge of programming concepts, HTML structure, or regular expressions.

Moreover, other web-scraping tools consider their work done once they've extracted each site's structured content. When it comes to the rest of your job (transforming each site's content into your target database schema, finding and reconciling duplicates, correcting mistakes, and publishing a unified view of the data from all sources), you have been on your own. Until now.

Yes: if you create a Needlebase database, your data can be accessed only by those you designate, ranging from a fixed number of individuals to everyone on the public Internet, as you see fit. Needlebase servers are maintained in the same proven datacenters, and by the same proven operations staff, that have run ITA Software's airfare search engine for years.

Needlebase operates as a hosted online service, so there is nothing to download. All Needlebase operations can be controlled through web-based interfaces, and you can connect your own applications and systems to Needlebase using web-services APIs.

Needlebase honors standard robots.txt restrictions. Needlebase's user agent is called ITABot, so you can block Needlebase from accessing your website by putting the following lines into a robots.txt file at the top level of your website:

User-agent: ITABot
Disallow: /

You can learn more about robots.txt restrictions from robotstxt.org; for example, you may wish to exclude robots from only parts of your website.
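If you want to check your rules before publishing them, the following minimal sketch uses Python's standard urllib.robotparser module to show how a standards-following parser would read them. It says nothing about how Needlebase's own crawler is implemented. The is_allowed helper, the example.com URLs, and the /private/ path are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blocks ITABot from the whole site (the rule shown above).
BLOCK_ALL = """\
User-agent: ITABot
Disallow: /
"""

# Blocks ITABot from one directory only (hypothetical path).
BLOCK_PRIVATE_ONLY = """\
User-agent: ITABot
Disallow: /private/
"""

if __name__ == "__main__":
    checks = [
        (BLOCK_ALL, "ITABot", "https://example.com/index.html"),
        (BLOCK_ALL, "OtherBot", "https://example.com/index.html"),
        (BLOCK_PRIVATE_ONLY, "ITABot", "https://example.com/index.html"),
        (BLOCK_PRIVATE_ONLY, "ITABot", "https://example.com/private/data.html"),
    ]
    for rules, agent, url in checks:
        print(agent, url, "allowed" if is_allowed(rules, agent, url) else "blocked")
```

Rules listed under User-agent: ITABot apply only to that agent, so other robots are unaffected unless you add entries for them as well.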

Mass Technology Leadership Council - 2010 Finalist

Copyright © 2010-2011 ITA Software, Inc.