Companies that provide
on-premise BI solutions, such as Business Objects and MicroStrategy, connect to
data warehouses that are built on top of third-party databases like Oracle,
DB2, SQL Server, PostgreSQL and MySQL. These database management
systems, as well as data warehouse engines such as Teradata’s, provide the
functionality and utilities for the creation, access, and management of a data
warehouse or data mart. The current generation of on-demand BI solutions
like those offered by Pivotlink, Lucidera and Birst depart from this model by
integrating a query application with a database management system, both of which
reside in the cloud. More recently, data warehouse appliance vendors such
as Aster Data and Vertica have started offering cloud-based versions of their
appliances. Through these options they allow companies to develop data
warehouses and data marts completely in the cloud and access them via
on-premise or on-demand query applications.
Cloud-based BI
solutions that integrate a query application with a data mart are not entirely
new. Data marts containing web analytics data have been offered almost
exclusively as cloud-based solutions by companies like Omniture and
Google. Corporations using these SaaS solutions are storing increasingly
complex and mission-critical data in these data marts, including customer
data. Should on-demand BI vendors continue offering solutions that integrate
the query application with their own data management system, or should they
start partnering with the vendors that offer cloud-based data management
systems that can serve as standalone data warehouses in the cloud?
The answer to this question depends on your point of view regarding on-demand
BI and data warehousing. I’ll consider three perspectives: the on-demand
BI solution vendor’s, the end user’s, and the IT user’s.
The On-Demand BI Solution Vendor
For the on-demand BI
solution vendor this is not an easy question to answer. Having full
control of both the application layer and the database layer offers its
advantages but also has drawbacks. The major advantage comes from the
vendor’s ability to better control how data is loaded into the data warehouse,
how it is organized and how it is queried.
The ability to load
data quickly allows the vendor to work with larger data sets within the shorter
time windows dictated by the customer, and to serve more customers at
any one time. While smaller companies may only refresh the data in their
data mart weekly, larger customers typically refresh data daily.
Moreover, the transactional database of such larger customers can only stay
offline for short time periods during which data must be transferred to the
data marts. For example, a retailer may only provide a 3-4 hour window
daily during which sales data can be uploaded to a data mart. If such an
operation has to be performed for hundreds of the SaaS BI vendor’s customers,
then one can appreciate the importance of fast data loading.
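To make the arithmetic behind this point concrete, here is a minimal sketch. The customer count and extract sizes are hypothetical figures chosen for illustration, not numbers from any vendor; the function name is likewise an assumption.

```python
def required_throughput_gb_per_hour(extract_sizes_gb, window_hours):
    """Aggregate load rate needed to finish every extract inside the window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return sum(extract_sizes_gb) / window_hours


# Hypothetical workload: 200 customers, each uploading a 5 GB daily extract,
# all sharing the same 3-hour window (the worst case for the vendor).
sizes = [5.0] * 200
rate = required_throughput_gb_per_hour(sizes, window_hours=3)
print(f"Sustained load rate needed: {rate:.1f} GB/hour")
```

In practice customers' windows rarely overlap completely, so the real requirement depends on how the windows are distributed across the day. The sketch only shows why load speed scales with the size of the customer base.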
Data loaded in the data
warehouse has to be organized in the best possible way to enable the optimal
execution of queries. The managers of on-premise data warehouses and
marts constantly look at the queries executed against their databases to
determine how best to organize the data to achieve the best query execution
times. Some of these optimizations can be performed automatically by the
database management system but most require manual intervention. Having
full control of the database management system enables the SaaS BI vendor to
better optimize the organization of the stored data.
Finally, the
expressiveness of the language used to query the data mart determines the range
of analyses that can be performed and reports that can be created by the BI
application, and also contributes to the speed with which queries are
executed. Even though SQL is a standard, all database management vendors
implement their own extensions, including several that are specific to data
warehousing, in order to add to the language’s expressiveness.
Creating a strong
database management system to support data warehouses and data marts for a
variety of analytical applications is hard. The areas mentioned above,
such as data loading, query optimization and query management, are difficult
to address and require specialized knowledge. As many database management
vendors have shown over the years (from Teradata to Red Brick, Netezza and
Aster Data), good database implementations require significant R&D
investments. By decoupling the database from the application, the
on-demand BI vendor will be able to focus on improving the application’s
functionality without having to worry about the underlying database
system. However, despite the higher investment necessary, I think that in
the short term (at least the next 2-3 years) it will be difficult for on-demand
BI vendors to decouple their applications from the databases and go with a
third-party offering. In order to provide strong ROI to their customers,
on-demand BI vendors will need full control of the stack as they learn about
the application areas where on-demand BI will provide a strong alternative to
its on-premise equivalent.
The End User
An end user subscribes
to on-demand BI applications either because his organization doesn’t have an
in-house BI solution and developing one internally is deemed too
expensive and time-consuming, or because it has one but the solution is hard to
access. Such a user doesn’t care whether the data is stored in the
cloud-based data warehouse of the vendor who provides the on-demand solution,
or in a third-party’s cloud-based data warehouse.
The end user cares
about convenience and analysis effectiveness. “Convenience” means that
the end user would prefer an on-demand vendor that provides a one-stop shop for
BI. “Effectiveness” means that the user is interested in quickly
analyzing data to achieve a specific goal, e.g., determining which merchandise
to discount in which regions and for what time period. During the
analysis task the user is concerned about securely loading the right data into
the data warehouse, quickly and easily interacting with that data to produce
the desired results and reports, and keeping the data stored safely for as long
as necessary.
The on-demand BI
solutions available to date have been developed with such a user in mind.
The IT User
The
IT user has only recently started entering the conversation around on-demand BI
solutions. This is happening as more of the larger companies that
actually have IT organizations have started using SaaS BI solutions in
conjunction with (or sometimes instead of) their on-premise ones. IT
organizations worry about three things: the security of the data that moves from their
companies’ internal databases to the on-demand vendor’s data warehouse; the
integrity of the data sent to the vendor, since they must guarantee that the
data stored in a company’s internal databases is always synchronized with the
corporate data stored in the SaaS vendor’s data warehouse; and their ability to
take back the data when the need arises.
IT
must feel confident that the on-demand BI vendor uses the strongest possible
security measures to safeguard the data in the warehouse. The SaaS vendor
is expected to meet a higher standard than the IT organization itself
when it comes to security. CIOs are starting to fear that as SaaS vendors
become better established with growing client rosters that include prominent
corporations, they will become prime targets for hackers (individuals or
groups, independent or government-sponsored), who would want to exploit the
existence of large databases with prized corporate data in a single place.
Today
every on-demand BI vendor requires either behind-the-firewall
access to corporate databases, so that it can extract data as necessary,
or the appropriate data sets to load into its cloud-based data
warehouse. Most corporations provide the SaaS vendors with data extracts
because it’s safer than giving them access to the corporate databases.
However, this approach, while safer, results in multiple copies of the
corporate data. Keeping these copies updated and synchronized, so that the
analyses resulting from the extracts remain useful, can become a hard and
expensive task that further taxes IT resources.
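One common way to detect drift between an internal database and the copy held by the vendor is to compare per-row digests. The sketch below illustrates that general idea only; it is not a description of how any particular on-demand BI vendor works. The function names and the sample rows are hypothetical, and both copies are assumed to be available as dictionaries keyed by primary key.

```python
import hashlib


def row_digest(row):
    """Stable SHA-256 digest of one row, given as a tuple of values."""
    # repr() keeps None, '' and numeric values distinguishable;
    # the unit-separator character keeps adjacent fields from merging.
    joined = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def find_out_of_sync(source_rows, warehouse_rows):
    """Return the keys whose rows are missing or differ between the two copies.

    Both arguments map a primary key to a row tuple.
    """
    src = {key: row_digest(row) for key, row in source_rows.items()}
    dst = {key: row_digest(row) for key, row in warehouse_rows.items()}
    return sorted(k for k in src.keys() | dst.keys() if src.get(k) != dst.get(k))


# Hypothetical sample data: row 2 has changed and row 3 was never uploaded.
source = {1: ("widget", 9.99), 2: ("gadget", 19.5), 3: ("gizmo", 4.0)}
warehouse = {1: ("widget", 9.99), 2: ("gadget", 18.0)}
print(find_out_of_sync(source, warehouse))
```

Comparing digests rather than full rows means only keys and hashes need to cross the firewall for the check, though a real deployment would also have to handle data type differences between the two systems.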
Finally,
IT must ensure that the corporate data will not be held hostage by the on-demand
BI vendor. To this end, the vendor must make it easy for the customer to
take back its entire data set, whether because the customer decides to switch on-demand BI
vendors or because the vendor is going out of business. An approach to
addressing this problem may be provided by Cloudkick.
In addition, the on-demand BI vendor must make it easy for the customer to
obtain subsets of the data stored in the cloud-based data mart for use with
other applications.
I
wouldn’t be surprised if the majority of IT users preferred a cloud-based BI
application that accesses data stored behind the corporate firewall and doesn’t
move the data permanently to the cloud.
The
growing success of on-demand BI applications will lead to a lively dialog among
these three constituencies on how such applications should be partitioned among
the various types of cloud-based vendors and the clients themselves.