The Economist had an interesting article this week on the data deluge, in which it argued that, to help users feel like they retain control over their online data, sites need to make more data available to their users:
First, users should be given greater access to and control over the information held about them, including whom it is shared with.
I totally agree that sites should provide greater transparency with respect to tracking and data collection / storage. The Economist highlights Google, which allows its users to see what information Google holds about them, and lets them delete search histories or modify the targeting of advertising.
Other sites are increasingly doing this too. For instance, I really like how the Newstogram technology has been implemented on DailyMe.com with a dedicated “My Newstogram” page which shows me what data the site has stored about me, explains how the data will and will not be used, and gives me the ability to correct the data or to opt out of tracking altogether.
Yahoo has similar functionality available through its Ad Interest Manager page (although Yahoo is either tracking a lot less about me or is not as good at determining my interests as they only have me pegged as a generic sports fan).
December is a tough time of the year to get anything done and I now realize that my plan to develop stacked graph visualizations of Newstogram data (which required learning a new programming language) was overly optimistic.
I’ve pushed this out to 2010 but to keep me motivated I have added two stacked graph visualizations to my data wall:
1. Visualization of my Last.fm listening history (full PDF)
The feedback on my tree map visualization was very insightful. Colleagues pointed out that, while interesting, the tree map suffered from the same problem as my other attempts to visualize the Newstogram data: namely it doesn’t address the time dimension. While tree maps and other ‘static’ visualizations (such as bar charts) can display data over a number of different time periods, they don’t really show how the data is changing over time. In the case of many data sets, including the ‘news interest’ data we are tracking through Newstogram, this is the most interesting aspect of the data.
My first visualization effort is a treemap showing the popularity of sub-categories within DailyMe.com based on Newstogram data for October 2009 (built upon the Protovis treemap example).
The colors represent primary categories, while the size of each sub-category corresponds to its popularity as measured by the ‘Digital News Affinity’ (DNA) score for October 2009.
The search field at the bottom of the treemap highlights certain categories / sub-categories (e.g. searching for “sports” highlights the 14 sports sub-categories).
Check out the working demo (requires a modern browser e.g. Firefox, Safari).
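For anyone curious about the logic behind a treemap like this, here is a minimal sketch in Python rather than Protovis. It is not the code behind the demo: the category names, the scores and the helper functions `tile_shares` and `search` are all made up for illustration. It shows the two ideas described above: each tile's area is its score as a share of the total, and the search box simply matches the term against category and sub-category names.

```python
# Hypothetical data: (primary category, sub-category, score).
# Names and numbers are invented for illustration, not real Newstogram data.
scores = [
    ("Sports", "Baseball", 40.0),
    ("Sports", "Soccer", 20.0),
    ("Business", "Markets", 30.0),
    ("Technology", "Gadgets", 10.0),
]


def tile_shares(rows):
    """Return each sub-category's fraction of the total treemap area."""
    total = sum(score for _, _, score in rows)
    return {(cat, sub): score / total for cat, sub, score in rows}


def search(rows, term):
    """Return the tiles whose category or sub-category contains the term."""
    term = term.lower()
    return [
        (cat, sub)
        for cat, sub, _ in rows
        if term in cat.lower() or term in sub.lower()
    ]


shares = tile_shares(scores)
highlighted = search(scores, "sports")
```

In a real treemap the layout algorithm then turns those shares into rectangles; the color would be looked up from the primary category of each tile.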
The title isn’t a metaphor for information overload or filter failure… the walls of my office are literally covered in print-outs or drawings of histograms, pie charts, bar charts, treemaps, mindmaps and various other types of data display.
As DailyMe gets access to more and more data through the Newstogram platform, I am becoming increasingly focused on data visualization and specifically how to make our data visually appealing, easy to understand and (most importantly) useful for our clients.
One of my tasks for next week is to check out the open-sourced Protovis SVG Visualization Library and learn how to make my own treemaps (inspired by the team at Google Analytics who just released the video below showing how to create treemap visualizations of data extracted through their APIs) – I foresee more print-outs getting pinned to the walls….
I wrote a piece titled “Hey Microsoft…. backup much?” over on DailyMe.com about the data loss experienced by T-Mobile Sidekick customers this weekend as a result of a “server failure” by Danger, the Microsoft-owned subsidiary that makes the Sidekick.
Danger/Microsoft’s failure to have a backup (or multiple backups) of their customers’ data is absolutely mind-blowing and is certainly a wakeup call to the ever-increasing number of businesses whose products/services rely on such data.
Techcrunch’s Nik Cubrilovic wrote a (lengthy) opinion about the incident in which he argues that Sidekick customers who lost their data may only have themselves to blame since:
if you didn’t care enough to take care of it yourself, then you didn’t really need it.
Cubrilovic’s advice for those Sidekick customers who find themselves without their contacts, photos, calendars and to-do lists:
The solution may be to do nothing, certainly not to panic. The biggest problem is that we hoard data. We produce more data and information than we ever have, and we are all vain enough to believe that the data we create is so fantastic that it should live on for eternity. Losing the contact list on your phone shouldn’t be a problem – you should know who your friends are anyway. If you are losing sleep because you can’t find an old email you wrote, you likely have deeper issues to address.
However, as Dave Winer and others point out in the comments, in the case of Sidekick customers this kind of misses the point.
Companies who charge customers to have their data stored and available via ‘the cloud’, as was the case with Sidekick customers, have an obligation to protect that data regardless of its ‘value’. And, given all the PR that Danger/Microsoft and T-Mobile have received over the last few days, it is pretty clear there is a strong business incentive for companies who are trusted with such data to make sure they don’t lose it.
Next New Networks published some interesting data last week about viewership of their family of online video channels, which include Indy Mogul, Barely Digital (home of Obama Girl) and new addition Hungry Nation. The study, conducted with the help of web video measurement firm Visible Measures, showed that the peak period for video viewership was the six hours between 12pm ET and 3pm PT, when many North Americans are presumably looking for a short distraction from work.
Source: Visible Measures via Silicon Alley Insider
This trend is hardly surprising given the type of content that Next New Networks specializes in…. short-form entertainment videos. I recall from my time at Channel 4 that short-form videos were popular during the day and long-form videos were popular during the evening (and I bet if you looked at data for Hulu you’d see a similar trend).
By contrast, the general trend for most online news sites is still a morning peak (for instance, DailyMe.com has a readership peak most days between 7am ET and 10am PT). However, I suspect this general trend masks differences between different types of content on online news sites, some of which may provide a similar lunchtime ‘outlet’ to the Next New Networks videos. Newstogram, our soon-to-launch analytics / intelligence platform, will provide an easy way for online news sites to drill down and find the popularity of different categories, topics, people etc. throughout the day in order to identify the types of content where Lunchtime is the new Primetime.
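To make the 'drill down' idea concrete, here is a rough sketch of the kind of analysis described above. It is not how Newstogram works internally: the record format, the sample numbers and the functions `peak_hours` and `lunchtime_categories` are assumptions made up for this example. It totals page views per category per hour and then picks out the categories whose busiest hour falls in a lunchtime window.

```python
from collections import defaultdict

# Hypothetical records: (hour of day in ET, category, page views).
# The values are invented for illustration only.
views = [
    (8, "Politics", 120), (13, "Politics", 80),
    (8, "Entertainment", 30), (13, "Entertainment", 90),
    (9, "Politics", 100), (12, "Entertainment", 70),
]


def peak_hours(records):
    """Return the hour with the most total views for each category."""
    totals = defaultdict(lambda: defaultdict(int))
    for hour, category, count in records:
        totals[category][hour] += count
    return {cat: max(hours, key=hours.get) for cat, hours in totals.items()}


def lunchtime_categories(records, start=12, end=14):
    """Return categories whose peak hour falls within [start, end]."""
    peaks = peak_hours(records)
    return sorted(cat for cat, hour in peaks.items() if start <= hour <= end)


peaks = peak_hours(views)
lunch = lunchtime_categories(views)
```

With this sample data the site-wide peak is in the morning, driven by Politics, while Entertainment peaks at lunchtime, which is exactly the kind of difference a single site-wide chart would hide.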
It was a big weekend in the online news field with the Online News Association’s annual conference (“ONA09”) and the Online Journalism Awards (congrats to all the winners, especially the awesome folks at Publish2 who won the Gannett Foundation Award for Technical Innovation in the Service of Digital Journalism).
While I didn’t go this year (DailyMe was represented by President / Chief Product Officer — and ONA Vice President — Neil Budde), I followed proceedings closely via the #ONA09 hashtag. I was particularly interested in the various sessions on the use of data and metrics by online news organizations (#ONAmtrx and #ONAdata). As Dana Chinn of USC’s Annenberg School of Journalism (and author of the NewsNumbers blog) pointed out, web analytics is a complex area for online news sites and measuring user engagement requires looking at a number of different metrics (Dana’s presentation is available here).
One of the hits of the #ONAdata strand seemed to be the alpha demo of DocumentCloud, the Knight News Challenge-funded investigative journalism tool that plans to “turn documents into data”. The ‘data’ part of DocumentCloud is powered by OpenCalais. The semantic processing for Newstogram, DailyMe’s analytics / intelligence platform for online news sites, is also powered in part by OpenCalais, so I was glad to see it getting exposure at ONA09. OpenCalais is a great open-source resource and I’m sure it won’t be long until OpenCalais-powered functionality is widespread in the online news industry.