Stop waiting – Start analyzing big data yourself

Over the past several months, the industry has been abuzz with the importance of big data analytics and how it will shape market research over the next several years.  The big question on the minds of most market research executives is where to turn to drive this aspect of their business: which company is going to provide the technology answers that market research so sorely needs right now?  A few companies aim to help with just that.

A couple of start-ups are competing neck-and-neck in the business intelligence sphere of big data analytics at the moment, both with several big-name investors and partners who are watching closely to see how they develop.  Datameer, according to its own marketing, does on its own what would previously have taken three separate vendors, processes, and teams: it combines data integration (joining disjointed and disconnected data sets), dynamic data management (creating relationships among that data), and data analysis into a single package.  The result is an easy-to-use tool that lets a data analyst quickly integrate and visualize data from multiple sources and find correlations between datasets that would otherwise be very difficult and time-consuming to uncover.

In one of their easiest hands-on examples, they show you how to upload a sample file containing data about several individuals (name, age, location, etc.), strip away the fields you don’t need, and create a bar graph showing the ages of the individuals in Chicago by name.  This example in and of itself is not impressive; anybody with intermediate Excel skills could accomplish the same result.  What’s impressive is where you can go from here, and how each of the simple features along the way can be scaled up to make the product live up to its billing as “big data” analytics.  The results from the previous example need not be static: if you had a constantly updating data source, like an online database of users, you could connect it to Datameer to continuously poll for new names, locations, and ages and update the graphs accordingly, so you always know how your data is changing.
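To make the example concrete, the same strip-and-filter steps can be sketched in a few lines of plain Python.  This is purely an illustration of the workflow, not Datameer’s actual interface, and the member records are invented:

```python
# Hypothetical member records, mimicking the name/age/location example
members = [
    {"name": "Alice", "age": 34, "location": "Chicago", "email": "a@x.com"},
    {"name": "Bob",   "age": 29, "location": "Boston",  "email": "b@x.com"},
    {"name": "Carol", "age": 41, "location": "Chicago", "email": "c@x.com"},
    {"name": "Dave",  "age": 52, "location": "Chicago", "email": "d@x.com"},
]

# Step 1: strip away the fields we don't need (drop the email column)
trimmed = [{"name": m["name"], "age": m["age"], "location": m["location"]}
           for m in members]

# Step 2: filter to Chicago and pull out name -> age, the data behind the bar graph
chicago_ages = {m["name"]: m["age"] for m in trimmed
                if m["location"] == "Chicago"}

# A charting library would render `chicago_ages` as the bar graph;
# here we just print the underlying numbers
for name, age in sorted(chicago_ages.items()):
    print(f"{name}: {age}")
```

The point of the platform, of course, is that `members` would be a live feed rather than a hard-coded list, with the chart re-deriving itself as the feed changes.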
For a panel company, this could be a truly fantastic tool for keeping panel numbers up to date, so you know exactly what kind of demographic split you have: where your members are located, how old they are, which industries they work in.  Go one step further, and you have dynamic graphs illustrating, for example, the precise age breakdown of all your panel members who work in the mining industry, all without having to re-run your queries.  As soon as a new member signs up and completes registration, your graphs update automatically, and all you do is send your clients a link to the breakdowns so they can see in real time how your panel looks and, if you allow it, drill down into specific metrics.  No more updating and sending PDFs with millions of different numbers on them.

There are also features for combining the data sources under your control with social data.  Using the Twitter integration, you can pull all the tweets for a particular user and use that data together with your other sources.  In Datameer’s example, you pull filmmaker Michael Moore’s Twitter handle, retrieve everything he has ever posted, and check it against the FBI “monitored words” list to see how frequently he mentions hot-button words.  You can even graph that over time, to see which words are suddenly on the rise, or which topics were more or less popular during a particular period.  Think about this on a bigger scale: delivering “small data” survey results to your clients alongside “big data” general trends to form an overall picture.  Your yearly tracker survey shows that 5% of the population is no longer drinking sugary carbonated drinks, and you want to know why.  Your survey respondents said “they don’t like the taste anymore”, “it’s too unhealthy”, and “I prefer coconut water now”.  You have a lot of little responses, and you can put them into a nice pie chart saying that 31% of respondents find it too unhealthy, 24% prefer other drinks, and so on.  But does that really give your client actionable data?
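The “check a stream of tweets against a word list and trend it over time” step described above is simple enough to sketch without any platform at all.  The watch list and tweets below are invented for illustration:

```python
from collections import Counter
from datetime import date
import re

# Hypothetical watch list and timestamped tweets
WATCH_WORDS = {"unhealthy", "sugar", "taste"}

tweets = [
    (date(2012, 1, 5), "Giving up soda, way too much sugar"),
    (date(2012, 1, 9), "The taste just isn't worth it"),
    (date(2012, 6, 2), "Soda is so unhealthy, switching to coconut water"),
    (date(2012, 6, 8), "Sugar crash again... never learning"),
]

def monthly_counts(tweets, watch_words):
    """Count watch-word mentions per (year, month) bucket."""
    counts = Counter()
    for when, text in tweets:
        words = set(re.findall(r"[a-z']+", text.lower()))
        counts[(when.year, when.month)] += len(words & watch_words)
    return counts

trend = monthly_counts(tweets, WATCH_WORDS)
```

Plotting `trend` month by month is exactly the “which words are on the rise” graph; at platform scale the tweet list is replaced by a continuously refreshed feed.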

What if, when you looked at your “big data” analysis of the Twitter universe, you saw a timeline of tweets hashtagged “#healthyliving” and found that, compared to last year, most of this year’s comments about sugary carbonated drinks were negative rather than positive?  And what if you could pinpoint the top ten “influencers” (the people with the largest followings) making these comments?  Wouldn’t you then have a more constructive game plan to give your client, with concrete goals and suggestions on who should be engaged to potentially reverse these damaging trends?

There is a whole realm of possibility here, and this is just scratching the surface from an initial analysis of the platform.  The barrier to entry is not that high: a personal account starts at a very affordable $299/year, limiting the buyer mainly in the number of users allowed (just one) and the volume of data used in the system (100 GB/year).  Datameer uses the spreadsheet model of data presentation, so anybody familiar with Excel should feel quite at home.

Just about all of these big data analytics solutions are built on the open-source framework Hadoop, a white-hot topic in many industries right now.  It features an easily scalable, commodity-hardware-based data management and work distribution system that is very well suited to large, distributed datasets that are not easily related using the traditional relational database model.  This makes it especially suited to analyzing things like social media traffic, where data is “all over the place” and may not be logically joined by any keys or identifiers.  Hadoop, which grew out of papers Google published on its internal MapReduce and file systems, is used as a data backbone by internet giants like Twitter, Yahoo!, and Facebook.
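Hadoop’s MapReduce model is simple at its core: a mapper emits key/value pairs, the framework groups them by key, and a reducer aggregates each group.  Here is the classic word-count example as a pair of plain Python functions, the same logic a Hadoop Streaming job would run, minus the distributed machinery:

```python
from itertools import groupby

def mapper(line):
    """Emit a (word, 1) pair for every word in a line of input."""
    for word in line.lower().split():
        yield (word, 1)

def reducer(pairs):
    """Sum the counts for each word; Hadoop delivers pairs grouped by key,
    which we simulate here by sorting first."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

lines = ["big data big ideas", "big plans"]
pairs = [kv for line in lines for kv in mapper(line)]
counts = dict(reducer(pairs))
```

On a real cluster the mapper and reducer run on many machines at once over chunks of a huge dataset, which is exactly why the model scales on commodity hardware.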

Sentiment analysis with Sentiment140

Here’s an interesting tool to check out: the Sentiment140 Twitter sentiment analysis tool.  Made by three Stanford University students, Sentiment140 employs machine learning and natural language processing techniques to analyze the sentiment of tweets on a particular keyword or topic.  That is, it analyzes large quantities of Twitter data on the term you type in and gives you back a statistic, positive or negative, for that term based on the Twitter data it obtained.  There are other services doing sentiment analysis, often as part of a larger social media monitoring package, but Sentiment140’s implementation is a little unique, as described in their whitepaper.  For example, while Sentiment140 uses several machine learning algorithms to gain greater accuracy over time, similar services like Twitrratr rely on a list of keywords designated in advance as either positive or negative.
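The keyword-list approach used by Twitrratr-style tools is easy to picture: score a piece of text by the positive and negative words it contains.  A toy sketch (the word lists are invented; Sentiment140’s machine-learning approach is considerably more sophisticated):

```python
# Hand-picked sentiment lexicons, for illustration only
POSITIVE = {"love", "great", "awesome", "delicious"}
NEGATIVE = {"hate", "awful", "terrible", "gross"}

def keyword_sentiment(text):
    """Classify text as positive/negative/neutral by naive word counting."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The weakness is obvious once you try it on “not great” or sarcasm: there is no context, which is exactly why trained classifiers that improve with data tend to win out over static word lists.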

Their entire implementation runs in the cloud: they use Amazon EC2 for hosting, and various Google services and APIs to arrange and visualize their data.  Being the academics they are, the team is gracious enough to provide an API for would-be developers to tap into their system and integrate it with their own applications (I’m looking at you, market research companies!).
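Integrating with an API like this usually means POSTing a batch of texts as JSON and reading classifications back.  The sketch below only assembles such a request; the field names are assumptions for illustration, so check the current Sentiment140 API documentation before relying on them:

```python
import json

def build_bulk_request(texts, query=None):
    """Assemble a bulk-classify payload in a JSON style typical of
    sentiment APIs. Field names ("data", "text", "query") are
    illustrative assumptions, not a documented contract."""
    payload = {"data": [{"text": t} for t in texts]}
    if query:
        for item in payload["data"]:
            item["query"] = query  # keyword context for the classifier
    return json.dumps(payload)

body = build_bulk_request(["I love my new phone", "worst service ever"],
                          query="phone")
# An HTTP client would then POST `body` to the service's bulk-classify
# endpoint and parse the per-item sentiment labels from the response.
```

For a market research company, the interesting part is the loop around this: feed in open-ended survey responses or brand mentions and get sentiment labels back at scale.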

Sentiment140 also has a premium subscription service called Sentiment140 Alerts.  For a very modest $18/month, users can receive notifications by email, SMS, or voice when largely negative sentiment is detected for a particular keyword, useful for those monitoring a brand or making sure their service is running smoothly.

We’re hoping to have the Sentiment140 team do an interview or guest blog for us soon.

Google is missing the boat on monetizing mobile

Readers may recall from recent news headlines that mobile devices are an incredibly fast-growing computing segment, and that mobile data consumption has been increasing very quickly for the last several years.

This presents a big problem for the established internet advertising giants, who for over a decade have relied on a fairly consistent sort of internet user: one who searches, retrieves, and inputs information using a browser on a desktop or laptop computer.  In recent years, largely because of the huge rise in popularity of smartphones, that expectation no longer holds.  Users are on their tablets, smartphones, and other devices while on the move, and they’re doing different sorts of things with them than they do on their desktops.  Take a look at the infographic below, with data from the Pew Research Center:

Interestingly, but perhaps unsurprisingly, text messaging and taking photos lead the pack, done by nearly all users at 92%, followed by a bevy of other uses such as internet browsing, gaming, social networking, and tweeting.  While all of these tasks, with the exception of device-specific ones like texting and taking photos, may be done just as well on a desktop as on a phone, the mobile device is an entirely different beast, with a different input method, a different screen size and configuration, different usage scenarios, and different expectations.  Although the users are the same, what they’re doing and what they expect from these devices may be very different now than it once was, a shift our sponsors at mobileZEN specialize in.

What’s interesting is that search giant Google is seemingly so well positioned to take advantage of this shift.  After all, they own the #1 mobile operating system worldwide.  They passively collect vast amounts of data about their users through these devices: locations, browsing and purchasing habits, age, sex, and so on.  They presumably use all of this information to help advertisers position their products and display them to the right customers.  But there’s a problem: last week, Google’s share price dropped 9% in a single trading day, enough that trading in the stock was halted, after a leaked earnings report painted a less-than-great picture.  Google CEO Larry Page blamed the leak, which became public before the top brass could hold their quarterly conference call and pre-empt the news, for the stock’s tumble.

Still, the numbers paint an interesting picture.  During their April 2012 conference call, there was already a chill in the room that Larry Page and Sergey Brin fumbled to paint over optimistically: aggregate cost-per-click (CPC) was down 12% year-over-year and 6% quarter-over-quarter, a decline largely blamed on mobile devices.  Page had nothing concrete to say about it beyond being “bullish on the future”; a cohesive, actual growth strategy was nowhere in sight.

Now, six months later, the numbers look even worse.  While in April 2012 aggregate cost-per-click (that is, what an advertiser is willing to pay Google for an average ad click) was down 12% year-over-year and 6% quarter-over-quarter, by October 2012 it had fallen 15% year-over-year and 3% quarter-over-quarter, even as the number of paid clicks increased by 33%: a likely reflection of the continuing growth of these devices and the increase in ads being shown.
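It is worth checking how those two opposing trends net out.  A quick back-of-the-envelope, assuming the 15% CPC decline and 33% click growth apply over the same period:

```python
cpc_change = -0.15   # aggregate cost-per-click change
click_growth = 0.33  # paid-click volume change over the same period

# Ad revenue moves with the product of price and volume
revenue_factor = (1 + cpc_change) * (1 + click_growth)
revenue_growth = revenue_factor - 1  # roughly +13%
```

So revenue still grows, but only because volume is outrunning the price decline, which is exactly the dynamic the analysts are worried about.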

“All of these mobile devices are generating clicks that are just less valuable to advertisers,” said Colin Gillis, an analyst at BGC Partners, who said mobile ad clicks cost half of what clicks on desktop Web ads cost. “The supply part is doing so well, but the supply’s going to continue and continue to grow and they could devalue their inventory.” (NY Times)

Google is certainly not giving up: they are working as diligently as they can to keep their ads valuable to advertisers, with all sorts of new ideas and iterative improvements.  But one has to wonder whether what’s needed to derive value from the growing mobile user segment is not an iterative improvement, but a completely new way of thinking about mobile users.

Announcing B2B Verification Services

Market Research Technology is now offering “B2B Verification” services to agencies and registered individuals.

With B2B Verification, companies gain an additional level of confidence that respondents really are who they say they are.  Online B2B work is notoriously difficult to verify; agencies are often forced to take the respondent’s word as truth.  Market Research Technology, through its partnership with LinkedIn, is able to process large batches of respondents either before or after the survey, using minimal PII.  Even if a panel member truthfully filled out their profile, their situation may have changed since then and their data may not have been updated.  Using the B2B Verification service, you quickly receive a report of which respondents passed verification and which failed, based on their most current LinkedIn data.
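Conceptually, the batch step is a reconciliation: compare what each respondent claimed against the freshest profile record available and emit pass/fail.  The sketch below is schematic only; the records and the matching rule are invented, and the real service’s matching logic is not public:

```python
def verify_batch(claims, profiles):
    """Mark each respondent 'pass' if their claimed company and title
    match the current profile record, 'fail' if not, 'not_found' if no
    record exists. Matching on exact lowercase strings is a toy rule."""
    report = {}
    for respondent_id, claim in claims.items():
        profile = profiles.get(respondent_id)
        if profile is None:
            report[respondent_id] = "not_found"
        elif (claim["company"].lower() == profile["company"].lower()
              and claim["title"].lower() == profile["title"].lower()):
            report[respondent_id] = "pass"
        else:
            report[respondent_id] = "fail"
    return report

claims = {
    "r1": {"company": "Acme Corp", "title": "Procurement Manager"},
    "r2": {"company": "Acme Corp", "title": "CTO"},
}
profiles = {  # hypothetical current records from a verification partner
    "r1": {"company": "Acme Corp", "title": "Procurement Manager"},
    "r2": {"company": "Beta LLC", "title": "CTO"},
}
report = verify_batch(claims, profiles)
```

Respondent r2 fails here because the current record shows a different employer than the one claimed, the “situation may have changed” case described above.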

Automated sentiment analysis is critical to your business

Sentiment analysis is getting more and more attention in many different areas of business around the world.  What was previously the realm of market researchers alone now directly affects areas such as finance and investing.  While sentiment analysis in and of itself is certainly not new for researchers, and has been a central pillar of their value offering for decades, the ways in which it can be measured and the accuracy of that data are evolving quickly.  In the past, and still most commonly, when a market research company receives open-ended data, it is sent for manual or semi-automated processing to identify the important keywords and the overall sentiment of the response.  While it can be costly and time-consuming, open-ended coding is a very important part of research that most companies try to minimize but cannot do away with.  After all, while ranking-style questions and precise numerical inputs are much easier to quantify and deal with, the double-edged sword is that the structured data you obtain is only as accurate as the researcher’s foresight when creating the answer codes or subquestions.

Certainly open-ended responses are not going anywhere, nor should they be.  The data obtained from them is incredibly valuable simply because the respondent is far more able to give a free and candid response, without the constraints of a closed-ended question type.  For the researcher, that data is far more problematic precisely because of its freedom: it is not easily quantified into a precise number, it is more prone to garbage data, and responses must be processed in a secondary step.  The problem with closed-ended, ranking-style questions, however, is that they may try to quantify a respondent’s sentiment too precisely, into the way the researcher wants them to think.  A respondent asked “On a 1 to 5 scale, where 1 is not at all tasty and 5 is completely delicious, how would you rate our chocolate?” is forced to give a response that fits the researcher’s expected scale.  The obtained data is much easier to deal with simply because of its constrained nature, but it is an easy and dangerous trap to believe that a respondent can accurately convert his feelings into the precise input type demanded by a closed-ended question.  It’s for this reason that greater use of open-ended responses should be embraced; but to do so, identifying their information and sentiment in real time becomes a great necessity.

Another fantastic benefit to automated sentiment analysis is the ability to examine sentiment at the macro level (the general social media trends), the micro level (your specific survey), or both.  For example, if you ran a survey with a nationally representative sample to find out how well your company’s chocolate is being received, you may wish to compare that with the sentiment data you obtain from social media about your company’s chocolate.  If the results are greatly incongruent, it may be a good idea to figure out why. 
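That micro-versus-macro comparison can be made concrete by scoring both datasets on the same scale and flagging a gap.  A toy sketch with invented, hand-labelled scores (a real pipeline would plug classifier output in place of the hand labels):

```python
def mean_sentiment(scores):
    """Average a list of sentiment scores in [-1, 1]."""
    return sum(scores) / len(scores)

# Hand-labelled illustration: +1 positive, 0 neutral, -1 negative
survey_open_ends = [1, 1, 0, 1, -1, 1]     # micro level: your survey
social_mentions = [-1, -1, 0, -1, 1, -1]   # macro level: social chatter

gap = mean_sentiment(survey_open_ends) - mean_sentiment(social_mentions)
incongruent = abs(gap) > 0.5  # threshold is arbitrary; flags "figure out why"
```

Here the survey skews positive while social media skews negative, precisely the kind of incongruence that should prompt a closer look at sampling, wording, or what is happening in the market.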

Whether you are looking at the specific data of your survey or the data you see in the social media universe, with the right sentiment analysis tools it’s possible not only to see and identify the trends and influencers, but to actually use that information to produce actionable results.  At the micro level, combining sentiment analysis with the quantifiable information in the rest of your survey produces a fuller spectrum of data in your final deliverable.  When this micro-level data is combined with macro social media data, a “bigger picture” emerges.

Researchers need to begin filling in that “bigger picture” in order to stay relevant: combining the micro view with the macro view paints a much broader portrait than either one by itself.  Of course, this is not relevant for every type of survey; very specific topics or niche discussions are not necessarily well represented in the social media universe.  But more efficient sentiment analysis, at the survey level as well as across the public internet on a set of topics or keywords, is absolutely critical to remaining competitive and offering as broad a data deliverable as possible.  Researchers should allow respondents as much flexibility as possible, but they must also be confident in the efficiency and accuracy of the data they collect.  The industry as a whole should embrace the new technological capabilities that can complement or upgrade existing data collection methods; but as with all new things, it is better to adapt step by step and verify each new addition to be confident in the final data.