Last October I wrote about emerging data analytic services that could be offered under the term Insight as a Service. More recently Mark Suster wrote about the need for data as a service, and Gordon Ritter in his post added his thoughts on Insight as a Service. In my post I had proposed three types of data that can be used for the creation of such insights: internal company data, web usage data, and third-party syndicated data. That same month I invested in Exelate, a company that provides a marketplace for online data and a Data Management Platform (DMP), both of which can be used to improve the targeting effectiveness of display advertising.
Online advertising and financial analytics are two areas where data marketplaces are succeeding. Exelate and other marketplaces demonstrate that significant companies can be built in this area. Companies like Thomson Reuters and Bloomberg, and startups like cloud-based Xignite, have also been successful in selling financial data to application vendors. More recently, startups like Gnip, Factual, Infochimps, and WebServius, as well as Microsoft with its Azure DataMarket, have introduced cloud-based marketplaces offering a broad variety of commercial and open source data sets.
The success of data-driven application companies such as Zillow, Recorded Future, Payscale, and a few others provides proof that innovative solutions can be developed from third-party data. But the cost of the base data on top of which these solutions are developed is minuscule compared to the cost of processing and augmenting the data to provide unique value to the users of these solutions. So, outside the online marketing and financial services areas, under what conditions can venture investors make money investing in companies that offer data marketplaces?
Large quantities of data are made available daily. Even though Hadoop and other emerging open source Big Data management technologies are significantly reducing the cost of storing, managing and processing such data, I claim that creating a data marketplace from scratch is still an expensive proposition. For data offered through a marketplace to be valuable, it must be unique and complete. For the marketplace to be successful, in addition to having data with such characteristics, it must solve the data distribution and monetization problem. This is actually a chicken-and-egg problem: the marketplace must contain enough variety of unique and complete data to attract buyers, but it needs data buyers to attract such data sources. Companies like Factual and Infochimps address this problem by investing heavily in marketing, offering free onboarding of data regardless of the data’s revenue potential, and focusing primarily on open source data, which typically comes from governments.
Uniqueness is often created by augmenting a data set in a variety of ways. For example, Zillow processes Google Earth data. Exelate’s analytics group does the same to data contributed by online publishers, increasing its value and information content for the demand-side platforms (DSPs) that use it by identifying trends, attributes that are predictive of specific desired outcomes, etc. It is not clear whether general-purpose data marketplaces will be able to provide such augmentation, because doing so would require them to become experts in several different application areas. This kind of processing costs money and is proprietary, as is the resulting data. As a result, the processed data may never find its way to a data marketplace.
Completeness of a data set is also very important. For example, if I want to compare the room prices of major hotel chains across the US in order to determine whether one chain is consistently more expensive than the others, I will need to obtain prices from every US city where the hotel chains being compared have properties. If I want to be comprehensive, I can’t be satisfied with prices for only 60% of the cities and 50% of each hotel chain’s properties. I need a complete set.
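To make the completeness test concrete, here is a minimal Python sketch of how a buyer might measure coverage before relying on such a data set. Everything in it is an assumption for illustration: the `Property` record, the `coverage` and `is_complete` helpers, and the chain, city, and property names are hypothetical and do not come from any real marketplace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """A hotel property we expect to have a price for (hypothetical schema)."""
    chain: str
    city: str
    property_id: str


def coverage(expected, priced_ids):
    """Return {chain: (city_coverage, property_coverage)} as fractions of 1."""
    report = {}
    for chain in sorted({p.chain for p in expected}):
        props = [p for p in expected if p.chain == chain]
        priced = [p for p in props if p.property_id in priced_ids]
        cities = {p.city for p in props}
        priced_cities = {p.city for p in priced}
        report[chain] = (len(priced_cities) / len(cities),
                         len(priced) / len(props))
    return report


def is_complete(report):
    """A data set is complete only if every chain has full coverage."""
    return all(c == 1.0 and p == 1.0 for c, p in report.values())


# Illustrative data only: the full list of properties we need prices for.
expected = [
    Property("ChainA", "Boston", "a1"),
    Property("ChainA", "Boston", "a2"),
    Property("ChainA", "Denver", "a3"),
    Property("ChainA", "Austin", "a4"),
    Property("ChainB", "Boston", "b1"),
    Property("ChainB", "Denver", "b2"),
]
# Property ids for which the purchased data set actually contains a price.
priced_ids = {"a1", "a3", "b1", "b2"}

report = coverage(expected, priced_ids)
for chain, (city_cov, prop_cov) in report.items():
    print(f"{chain}: {city_cov:.0%} of cities, {prop_cov:.0%} of properties")
print("complete:", is_complete(report))
```

In this sketch the hypothetical ChainA is priced in two of its three cities and half of its properties, so the comparison would fail the completeness test even though ChainB is fully covered.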
In addition to distribution, monetization, uniqueness and completeness, data marketplaces must address other issues, including privacy, data ownership and data location. The marketplace must have clearly articulated policies about who owns the data and what operations can be performed on the licensed data. For example, I give my data to Facebook and feel relatively comfortable with their data privacy policies, or at least the settings I can control. We are now seeing applications being built on top of Facebook with completely undefined privacy policies. What data are these applications capturing in addition to what is provided by Facebook? How are they using it? Are they selling the data they capture? Contributors to data marketplaces must also understand where their data is stored and what can be done with it. For example, is their data stored in a country where it can be easily subpoenaed, or distributed with no control?
If these are indeed the necessary conditions for creating a financially viable data marketplace, then rather than focusing on open source data, marketplaces must focus on licensing and distributing premium data that may come from companies like Zillow, Experian, Nielsen, etc. However, focusing on such premium data providers could imply a vertical, industry-specific approach to building marketplaces rather than a horizontal, general-purpose approach.
We are generating ever-increasing quantities of data that can be used in a new generation of innovative solutions. In the presence of all this data, data marketplaces offer data owners an attractive model for distributing and monetizing such data. However, early indications from companies that have built valuable data-driven solutions, as well as from application-specific data marketplaces, show that the type of processing and augmentation that needs to be performed on data before it can be effectively used by such solutions necessitates a vertical marketplace approach rather than a horizontal one.