by Audrey Watters, O’Reilly Radar
June 2, 2011
What makes the endeavor challenging, if not the size of the archive, is its composition: billions and billions and billions of tweets. When the donation was announced last year, users were creating about 50 million tweets per day. By Twitter’s fifth anniversary several months ago, that number had increased to about 140 million tweets per day. The data keeps coming, too: the Library of Congress has access to the Twitter stream via Gnip for both real-time and historical tweet data.
Each tweet is a JSON document containing an immense amount of metadata in addition to the contents of the tweet itself: date and time, number of followers, account creation date, geodata, and so on. To add another layer of complexity, many tweets contain shortened URLs, and the Library of Congress is in discussions with many of the URL-shortening providers, as well as with the Internet Archive and its 301works project, to help resolve and map the links.
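To make the metadata description concrete, here is a minimal Python sketch of reading one tweet record and pulling out the fields the excerpt lists. The sample record and its field names (`text`, `created_at`, `user`, `followers_count`, `geo`) are assumptions chosen for illustration, not a confirmed schema for the Library of Congress archive or the Gnip feed, and `summarize_tweet` is a hypothetical helper.

```python
import json

# Hypothetical sample record. The field names are assumptions modeled on the
# metadata the excerpt mentions; the real archive schema may differ.
SAMPLE_TWEET = """
{
  "text": "Reading about the Library of Congress archive http://example.com/abc",
  "created_at": "2011-06-02T14:30:00Z",
  "user": {"followers_count": 120, "created_at": "2009-03-21T08:00:00Z"},
  "geo": null
}
"""


def summarize_tweet(raw):
    """Parse one tweet's JSON and return the metadata named in the excerpt."""
    tweet = json.loads(raw)
    user = tweet.get("user") or {}
    text = tweet.get("text", "")
    # Collect URL-like tokens; shortened links would still need to be
    # resolved to their targets in a separate step.
    links = [w for w in text.split() if w.startswith(("http://", "https://"))]
    return {
        "text": text,
        "posted": tweet.get("created_at"),
        "followers": user.get("followers_count"),
        "account_created": user.get("created_at"),
        "has_geodata": tweet.get("geo") is not None,
        "links": links,
    }


summary = summarize_tweet(SAMPLE_TWEET)
print(summary["followers"], summary["links"])
```

The sketch only extracts the links as they appear in the tweet text. Mapping each shortened URL to its destination is the part that depends on the shortening providers, which is why the excerpt mentions discussions with them.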
Read more at radar.oreilly.com
by Jack Loechner, MediaPost
June 2, 2011
Today’s e-book power buyer, someone who buys an e-book at least once a week, is a 44-year-old woman who loves romance and is spending more on books now than in the past. She uses a dedicated e-reader like a Kindle rather than reading on her computer.
Read more at mediapost.com