Over one month ago, we launched a product called the SocialRank Index. The Index tracks the Twitter activity of the world’s biggest brands, and our goal with it is to build a tool that monitors the “pulse” of how people are engaging with brands online.
Since the Index launched, we’ve learned myriad lessons about what makes for a truly compelling and useful data product (hint: it’s a lot harder than just pulling some graphs together).
Here are four big things we’ve learned after getting feedback from users, marketers, and data scientists/statisticians.
1. Tweet Annotations: There’s a story to uncover in the data
When we got the first working version of the Index up and running, we were really excited to see the Engagement graph in action. This graph showed a particular industry’s hourly flow of retweets, mentions, and replies. Our first “Wow, this is really interesting” moment happened the morning after Obama’s State of the Union address. We took a look at the Tech Media Index (which consists of big hitters like the New York Times, Wall Street Journal, and BuzzFeed, among others):
Around 9pm on Tuesday night, there was an unexpected spike in activity in the Tech Media Index. After some deep deliberation, we concluded the spike was due to the State of the Union, which had begun … at 9pm.
But what about situations where there is no immediately recognizable reason for a spike in Twitter activity? Why leave this kind of “aha!” to guesswork and inference? The data is clearly telling a story, so we should do our best to uncover what that story is.
Thanks to some technical wizardry by our co-founder Michael, the Index now automatically locates and annotates the largest peaks in the engagement graph. Can you guess when the Oscars were?
If you hover over these annotations, you can see the “highest velocity Tweet” at that particular hour. This is our best guess at the Tweet that got shared/retweeted/faved most frequently within that given hour.
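To make the idea concrete, here is a minimal sketch of how peak annotation along these lines could work. It is not the Index’s actual implementation: the z-score threshold, the function names, the `shares` field, and all of the sample numbers are assumptions made up for illustration.

```python
from statistics import mean, pstdev


def find_peaks(hourly_counts, z_threshold=2.0):
    """Return indexes of hours that are local maxima and far above the average."""
    avg = mean(hourly_counts)
    spread = pstdev(hourly_counts)
    if spread == 0:
        return []  # a flat series has no peaks
    peaks = []
    for i in range(1, len(hourly_counts) - 1):
        value = hourly_counts[i]
        is_local_max = value > hourly_counts[i - 1] and value >= hourly_counts[i + 1]
        if is_local_max and (value - avg) / spread >= z_threshold:
            peaks.append(i)
    return peaks


def highest_velocity_tweet(tweets, hour_index):
    """Return the tweet with the most shares in the given hour, or None."""
    in_hour = [t for t in tweets if t["hour_index"] == hour_index]
    return max(in_hour, key=lambda t: t["shares"], default=None)


# Made-up hourly engagement counts with one obvious spike at index 5.
hourly_counts = [120, 130, 125, 140, 135, 900, 300, 150, 140, 130, 125, 120]
# Made-up tweets; "shares" stands in for retweets plus favorites in that hour.
tweets = [
    {"id": "a", "hour_index": 5, "shares": 40},
    {"id": "b", "hour_index": 5, "shares": 310},
    {"id": "c", "hour_index": 6, "shares": 500},
]

for peak in find_peaks(hourly_counts):
    top = highest_velocity_tweet(tweets, peak)
    print(peak, top["id"] if top else None)
```

A threshold like this only catches spikes that tower over the whole series, so smaller peaks go unannotated unless the threshold is lowered.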
According to the Tech Media Index, BuzzFeed and Lady Gaga won the Oscars. This is the Tweet that we captured at the peak of the graph:
Data isn’t very useful without context (see Jen Lowe’s great talk “Data Needs Memory”). Being able to correctly identify which events contributed to an anomalous piece of the data is crucial. Continually looking deeper at the data and trying to articulate which stories are being told (or not told) makes the data itself more insightful and valuable.
We still have a lot more work to do in this regard: our current system isn’t foolproof, doesn’t identify every single peak, and doesn’t answer every relevant question we might have.
For example: looking at the graph above, I notice that the mini-peaks throughout the week tend to fall at around the same time (around 11am). What’s the insight from that? I could deduce from anecdotal evidence that this is the time many tech media outlets push out new stories in order to maximize attention time (when people are about to break for lunch or take a mid-morning break). But of course, that is just me guessing; I would love to have something more to support this inkling.
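One way to start testing a hunch like this is to count how often peaks land in each hour of the day. The sketch below assumes we already have a list of peak timestamps; the function name and the dates are hypothetical and only illustrate the approach.

```python
from collections import Counter
from datetime import datetime


def peak_hours(peak_timestamps):
    """Count how many peaks fall in each hour of the day (0-23)."""
    return Counter(ts.hour for ts in peak_timestamps)


# Hypothetical peak timestamps for one week: four near 11am, one at 9pm.
sample_peaks = [
    datetime(2015, 3, 2, 11),
    datetime(2015, 3, 3, 11),
    datetime(2015, 3, 3, 21),
    datetime(2015, 3, 4, 11),
    datetime(2015, 3, 5, 11),
]

counts = peak_hours(sample_peaks)
# The most common hour and how many peaks fell in it.
print(counts.most_common(1))
```

A count like this would show whether the 11am pattern is real, but not why it happens; that still takes knowledge of how the outlets publish.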
Lesson learned: keep asking what the data is really telling or not telling us.
2. List View: Most “Big Data problems” are actually “Display problems”
One of the first things you learn the hard way when shipping product is that not everything is as obvious as you think it is. One common piece of feedback we get on the Index is “Wait, what exactly am I looking at?” To us, the answer is obvious: the graphs and charts show you what the average company in an index looks like on Twitter. But early rounds of feedback showed that this wasn’t crystal clear to everyone else.
We’ve focused a lot of our efforts on the specific metrics to track, the specific types of graphs to plot, and the specific brands and industries to monitor. But the overarching issue of usability remains a sore spot. Our “Big Data” problem is a display problem. It isn’t that we aren’t pulling in enough data. Rather, we aren’t being as clear as we should be with how we show all of this data.
While this is still a major work in progress for us, our feedback from users told us something important about display problems: they happen when people are required to jump through too many cognitive hoops to figure out what’s going on. Our friends would first ask us “What exactly am I looking at?” and then follow up with “Wait, so which companies are in this index? Why can’t I just see how Adidas is doing?”
Now you can not only see which companies are in an index, but also what their specific stats are. We still need to tweak and retool the rest of the Index from a UX/UI standpoint to make everything more obvious. But we feel the “List View” will go a long way in helping people get a more intuitive understanding of what’s going on.
Lesson learned: it’s not that you don’t have enough data, it’s that you’re showing it all wrong.
3. Mean vs. Median: Hunt for a less misleading way to show data
A quick look at the two numbers above should raise some eyebrows. The mean (or average) number of followers for companies in the Tech Media Index is over 1 million. The median number of followers for companies in this index is just under 245,000. The difference between the mean and the median is over 700,000 followers — that is a lot of followers.
When we first began building the Index, the way we processed data made it much more practical to calculate averages (or means), and in the spirit of shipping things fast, we settled on the mean for all of our stats.
But the Index is supposed to display data for the “average company” in an index. And the average company in the Tech Media Index definitely doesn’t have over a million followers. In fact, only 24 out of the 95 brands in this index have over 1 million followers. Due to outliers such as the New York Times and the Wall Street Journal, using the mean to represent what an average company looks like was terribly misleading.
Here is the distribution of followers for brands in the Tech Media Index:
@Medium has 1.09 million followers, which is right around the mean we calculated. Only 23 other companies in this index have more followers than Medium. Meanwhile, @PCWorld has some 244,000 followers, which also happens to be the median here. Notice how much more representative PCWorld is of the companies in the Tech Media Index than Medium is.
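The effect is easy to reproduce with a toy example. The follower counts below are invented for illustration and are not the actual Index data; the point is only to show how a single very large account drags the mean far away from the typical member.

```python
from statistics import mean, median

# Invented follower counts: four mid-sized accounts and one giant outlier.
followers = [50_000, 120_000, 244_000, 400_000, 16_000_000]

avg = mean(followers)
mid = median(followers)
above_avg = [count for count in followers if count > avg]

print(f"mean:   {avg:,.0f}")
print(f"median: {mid:,.0f}")
print(f"accounts above the mean: {len(above_avg)} of {len(followers)}")
```

In this toy set, only the outlier itself sits above the mean, while the median lands on an account that looks like most of the others.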
There’s too much misleading and lazy data out there that goes viral and gets morphed into “truth.” And when certain numbers get repeated enough, the desire to check if they actually represent reality grows stale.
We don’t want to contribute to that.
So we’ve switched all of the numbers in the Index to median measurements. We could’ve opted for more advanced statistical maneuvers, but we highly value simplicity, and we also recognize the difference between accuracy and precision.
Obviously this “mean vs. median” discussion is very basic compared to some of the more challenging problems other analytics products might be struggling with. But the lesson holds all the way through, regardless of the type of data problem.
Things to further consider: the size of each index. Right now, each index has about 100 members, but maybe this arbitrarily determined total is skewing the data. For example, does the Tech Media Index need 100 members, or would looking at just the top 50 or top 25 be more useful?
Lesson learned: Be simple, be useful, don’t mislead.
4. Retail & Music Index Launch: You’re building this for customers, not yourself
At first, our process for determining which indexes to launch was internally determined — which ones did we think were cool and awesome and completely relevant to marketers?
And so we launched with indexes for Global Brands, Tech Media, Tech Companies, and Tech Startups. Each of these has a lot of pop culture value, and journalists for the most part loved seeing them.
But when we started showing these indexes to existing customers, they kept asking whether there would be an index coming out for music or for retail or for their specific industry. That is when we realized that we should’ve asked the market before building the product in the first place. While our initial indexes displayed data on the world’s “hottest brands,” marketers and strategists are more interested in relevant brands in their own specific industries.
Today, we are listening to our users and launching the Retail Index and the Music Index. The Retail Index consists of companies in the National Retail Federation’s annual top 100 list (think Amazon, Apple, Walmart, and the rest of the gang). The Music Index is composed of artists from the Billboard 200. These indexes are equipped with all the updates listed above (List View, Median, and Annotations).
All of the lessons we’ve learned so far are some version of DJ Patil’s advice to “put the human back in the equation.” We’re excited to keep developing the Index into a place where marketers, brand strategists, and community managers can get real with data and start using it more meaningfully. If you are interested in what we are building, please don’t hesitate to shoot us an email at [email protected]