Companies that provide on-premise BI solutions, such as Business Objects and MicroStrategy, connect to data warehouses that are built on top of third-party databases like Oracle, DB2, SQL Server, PostgreSQL and MySQL. These database management systems, as well as data warehouse engines such as Teradata’s, provide the functionality and utilities for the creation, access, and management of a data warehouse or data mart. The current generation of on-demand BI solutions, like those offered by Pivotlink, Lucidera and Birst, departs from this model by integrating a query application with a database management system, both of which reside in the cloud. More recently, data warehouse appliance vendors such as Aster Data and Vertica have started offering cloud-based options for their appliances. Through these options they allow companies to develop data warehouses and data marts completely in the cloud and access them via on-premise or on-demand query applications.
Cloud-based BI solutions that integrate a query application with a data mart are not entirely new. Data marts containing web analytics data have been offered almost exclusively as cloud-based solutions by companies like Omniture and Google. Corporations using these SaaS solutions are storing increasingly complex and mission-critical data in these data marts, including customer data. Should on-demand BI vendors continue offering solutions that integrate the query application with their own data management system, or should they start partnering with the vendors that offer cloud-based data management systems, which can be used as standalone data warehouses in the cloud? The answer to this question depends on your point of view regarding on-demand BI and data warehousing. I’ll consider three perspectives: the on-demand BI solution vendor’s, the end user’s, and the IT user’s.
The On-Demand BI Solution Vendor
For the on-demand BI solution vendor this is not an easy question to answer. Having full control of both the application layer and the database layer offers its advantages but also has drawbacks. The major advantage comes from the vendor’s ability to better control how data is loaded to the data warehouse, how it is organized and how it is queried.
The ability to load data quickly allows the vendor to work with larger data sets within the shorter time windows dictated by the customer, as well as to serve more customers at any one time. While smaller companies may only refresh the data in their data mart weekly, larger customers typically refresh data daily. Moreover, the transactional databases of such larger customers can only stay offline for short periods, during which data must be transferred to the data marts. For example, a retailer may only provide a 3-4 hour window daily during which sales data can be uploaded to a data mart. If such an operation has to be performed for hundreds of the SaaS BI vendor’s customers, then one can appreciate the importance of fast data loading.
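To make the load-window arithmetic concrete, here is a minimal sketch in Python. The function names and all of the figures (extract sizes, loader capacity) are hypothetical and chosen only for illustration; they do not describe any particular vendor.

```python
def required_throughput_gb_per_hour(data_gb: float, window_hours: float) -> float:
    """Sustained load rate needed to finish one customer's upload inside its window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return data_gb / window_hours


def customers_per_window(loader_gb_per_hour: float,
                         window_hours: float,
                         avg_extract_gb: float) -> int:
    """How many average-sized extracts a shared loader can absorb in one window.

    Assumes (hypothetically) that all customers share the same window and
    that the loader's capacity is divided evenly among them.
    """
    if avg_extract_gb <= 0:
        raise ValueError("avg_extract_gb must be positive")
    return int(loader_gb_per_hour * window_hours // avg_extract_gb)


# Hypothetical retailer: 60 GB of daily sales data, 3-hour window.
rate = required_throughput_gb_per_hour(60, 3)

# Hypothetical vendor: loader sustaining 500 GB/hour, 10 GB average extract.
capacity = customers_per_window(500, 3, 10)
```

With these made-up numbers the retailer needs a sustained 20 GB/hour, and the vendor's loader could handle 150 such average customers in the same 3-hour window. Doubling the loader's speed doubles the number of customers served, which is why load performance matters so much at the scale of hundreds of customers.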
Data loaded into the data warehouse has to be organized so that queries execute as efficiently as possible. The managers of on-premise data warehouses and marts constantly examine the queries executed against their databases to determine how to organize the data for the best query execution times. Some of these optimizations can be performed automatically by the database management system, but most require manual intervention. Having full control of the database management system enables the SaaS BI vendor to better optimize the organization of the stored data.
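As an illustration of the kind of query-log review described above, the following Python sketch ranks columns by how often they appear in query filters, weighted by execution count. The log format (a list of dictionaries with `filter_columns` and `executions` keys) and the column names are assumptions made for this example; a real database exposes its query history in its own way.

```python
from collections import Counter


def rank_filter_columns(query_log):
    """Rank columns by how often queries filter on them.

    Each log entry is a dict with (hypothetical) keys:
      "filter_columns": columns used in the query's filter conditions
      "executions":     how many times the query ran (defaults to 1)
    Frequently filtered columns are candidates for indexing, sorting,
    or partitioning.
    """
    counts = Counter()
    for entry in query_log:
        # Count each column once per query, weighted by executions.
        for column in set(entry["filter_columns"]):
            counts[column] += entry.get("executions", 1)
    return counts.most_common()


# Hypothetical log for a retail sales data mart.
log = [
    {"filter_columns": ["region", "sale_date"], "executions": 40},
    {"filter_columns": ["sale_date"], "executions": 25},
    {"filter_columns": ["sku"], "executions": 5},
]
ranking = rank_filter_columns(log)
```

In this made-up log, `sale_date` comes out on top, which would suggest organizing the sales data by date first. This is only a first-pass heuristic; real tuning also weighs data volume, join patterns, and load cost.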
Finally, the expressiveness of the language used to query the data mart determines the range of analyses that can be performed and reports that can be created by the BI application, and also contributes to the speed with which queries are executed. Even though SQL is a standard, all database management vendors implement their own extensions, including several that are specific to data warehousing, in order to add to the language’s expressiveness.
Creating a strong database management system to support data warehouses and data marts for a variety of analytical applications is hard. The areas mentioned above such as data loading, query optimization, query management, etc. are difficult to address and require specialized knowledge. As many database management vendors have shown over the years (from Teradata to Red Brick, Netezza and Aster Data) good database implementations require significant R&D investments. By decoupling the database from the application, the on-demand BI vendor will be able to focus on improving the application’s functionality without having to worry about the underlying database system. However, despite the higher investment necessary, I think that in the short term (at least the next 2-3 years) it will be difficult for on-demand BI vendors to decouple their applications from the databases and go with a third-party offering. In order to provide strong ROI to their customers, on-demand BI vendors will need full control of the stack as they learn about the application areas where on-demand BI will provide a strong alternative to its on-premise equivalent.
The End User
An end user subscribes to on-demand BI applications either because the organization doesn’t have an in-house BI solution and developing one internally is deemed too expensive and time-consuming, or because it has one but the solution is hard to access. Such a user doesn’t care whether the data is stored in the cloud-based data warehouse of the vendor who provides the on-demand solution, or in a third party’s cloud-based data warehouse.
The end user cares about convenience and analysis effectiveness. “Convenience” means that the end user would prefer an on-demand vendor that provides a one-stop shop for BI. “Effectiveness” means that the user is interested in quickly analyzing data to achieve a specific goal, e.g., determining which merchandise to discount in which regions and for what time period. During the analysis task the user is concerned about securely loading the right data into the data warehouse, quickly and easily interacting with that data to produce the desired results and reports, and keeping the data stored safely for as long as necessary.
The on-demand BI solutions available to date have been developed with such a user in mind.
The IT User
The IT user has only recently started entering the conversation around on-demand BI solutions. This is happening as more of the larger companies that actually have IT organizations have started using SaaS BI solutions in conjunction with (or sometimes instead of) their on-premise ones. IT organizations worry about three things: the security of the data that moves from their companies’ internal databases to the on-demand vendor’s data warehouse; the integrity of the data sent to the vendor, since they must guarantee that the data stored in a company’s internal databases is always synchronized with the corporate data stored in the SaaS vendor’s data warehouse; and their ability to take back the data when the need arises.
IT must feel confident that the on-demand BI vendor uses the strongest possible security measures to safeguard the data in the warehouse. The SaaS vendor is expected to perform at a higher standard than the IT organization itself when it comes to security. CIOs are starting to fear that as SaaS vendors become better established with growing client rosters that include prominent corporations, they will become prime targets for hackers (individuals or groups, independent or government-sponsored), who would want to exploit the existence of large databases with prized corporate data in a single place.
Today every on-demand BI vendor requires either behind-the-firewall access to corporate databases, so that it can extract data as necessary, or the appropriate data sets to load into its cloud-based data warehouse. Most corporations provide the SaaS vendors with data extracts because it’s safer than giving them access to the corporate databases. However, this approach, while safer, results in multiple copies of the corporate data. Keeping this data updated and synchronized, so that the analyses resulting from the extracts remain meaningful, can become a hard and expensive task that further taxes IT resources.
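One simple way to check whether an extract in the cloud still matches the corporate source is to compare per-row fingerprints. The Python sketch below is an assumption-laden illustration, not a description of how any vendor does it: it assumes each row is a dictionary with a unique `id` key and that both copies fit in memory.

```python
import hashlib


def row_fingerprint(row: dict) -> str:
    """SHA-256 hash of a row's fields in a canonical (sorted-key) order."""
    canonical = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_copies(source_rows, cloud_rows, key="id"):
    """Compare the internal copy with the cloud copy, row by row.

    Returns the keys that are missing from the cloud copy, present in both
    but with different contents (stale), or present only in the cloud copy.
    The "id" key name is a hypothetical choice for this example.
    """
    source = {row[key]: row_fingerprint(row) for row in source_rows}
    cloud = {row[key]: row_fingerprint(row) for row in cloud_rows}
    shared = source.keys() & cloud.keys()
    return {
        "missing_in_cloud": sorted(source.keys() - cloud.keys()),
        "stale_in_cloud": sorted(k for k in shared if source[k] != cloud[k]),
        "extra_in_cloud": sorted(cloud.keys() - source.keys()),
    }


# Hypothetical data: row 2 changed after the extract, row 3 was never sent,
# and row 4 was deleted internally but still lives in the cloud copy.
internal = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
in_cloud = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 5}]
report = diff_copies(internal, in_cloud)
```

Comparing fingerprints rather than full rows means only hashes need to cross the firewall for the check itself, though at warehouse scale one would likely hash partitions or batches instead of individual rows.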
Finally, IT must ensure that the corporate data will not become hostage to the on-demand BI vendor. To this end, the vendor must make it easy for the customer to take back its entire data set, whether because the customer decides to switch on-demand BI vendors or because the vendor is going out of business. An approach to addressing this problem may be provided by Cloudkick. In addition, the on-demand BI vendor must make it easy for the customer to obtain subsets of the data stored in the cloud-based data mart for use with other applications.
I wouldn’t be surprised if the majority of IT users preferred a cloud-based BI application that accesses data stored behind the corporate firewall and doesn’t move the data permanently to the cloud.
The growing success of on-demand BI applications will lead to a lively dialog among these three constituencies on how such applications should be partitioned among the various types of cloud-based vendors and the clients themselves.