Big Data Analytics: Trends to Watch For in 2012
Over the last several years, there has been a massive surge of interest in Big Data Analytics and the groundbreaking opportunities it provides for enterprise information management and decision making. Big Data Analytics is no longer a specialized solution for cutting-edge technology companies — it is evolving into a viable, cost-effective way to store and analyze large volumes of data across many industries. But how will this translate to adoption of these new technologies? How will companies incorporate Big Data into their existing business intelligence and data warehouse (BI/DW) infrastructure? How can end users take advantage of the power Big Data has to offer?
What is Big Data?
Big Data technologies like Apache Hadoop provide a framework for large-scale, distributed data storage and processing across clusters of hundreds or even thousands of networked computers. Hadoop pairs a distributed file system (HDFS), which splits files into blocks that are replicated across the cluster, with the MapReduce programming model, which runs processing tasks on the nodes where the data already lives. The overall goal is to provide a scalable solution for vast quantities of data (terabytes, petabytes, even exabytes) while keeping processing times reasonable. These systems are highly effective for storing and analyzing large volumes of structured data as well as unstructured or semi-structured data such as text, web or application logs, email, web pages, documents, and images.
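To make the processing side more concrete, here is a toy, single-process sketch of the MapReduce model in Python. It is not Hadoop's API: the input splits, the function names (`map_words`, `partition`, `shuffle`, `reduce_counts`, `word_count`), and the use of CRC32 for partitioning are illustrative assumptions. It walks through the three phases of a job: map tasks emit key/value pairs, the shuffle routes every pair with the same key to the same reducer, and each reducer aggregates the values for its keys.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input "splits": on a real cluster each split would be a block
# of a file stored in HDFS, processed by a map task on a different node.
SPLITS = [
    "big data needs big clusters",
    "hadoop stores data across clusters",
]

def map_words(split: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1

def partition(key: str, num_reducers: int) -> int:
    """Choose a reducer for a key; crc32 keeps the choice deterministic across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(pairs: Iterable[Tuple[str, int]], num_reducers: int) -> List[Dict[str, List[int]]]:
    """Shuffle phase: route each pair to its reducer and group values by key."""
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets

def reduce_counts(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: sum the partial counts for every key this reducer owns."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(splits: List[str], num_reducers: int = 2) -> Dict[str, int]:
    # On a cluster the splits would be mapped in parallel; here they run in sequence.
    intermediate = (pair for split in splits for pair in map_words(split))
    result: Dict[str, int] = {}
    for bucket in shuffle(intermediate, num_reducers):
        result.update(reduce_counts(bucket))
    return result

if __name__ == "__main__":
    print(word_count(SPLITS))
```

Because each reducer owns a disjoint set of keys, the reducers' outputs can simply be merged; that independence is what allows a real cluster to spread map and reduce tasks across many machines.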
Big Data in the Enterprise
Companies are capturing and digitizing more information than ever before. According to IDC, the world produced one zettabyte (1,000,000,000,000 gigabytes) of data in 2010. Fueling this data explosion are over five billion mobile phones, 30 billion pieces of content shared on Facebook per month, 20 billion Internet searches per month, and millions of networked sensors connected to mobile phones, energy meters, automobiles, shipping containers, retail packaging and more. Big Data technologies provide a platform for turning all of this data into actionable information for business decision making.
The barriers to entry for Big Data analytics are rapidly shrinking. Big Data cloud services like Amazon Elastic MapReduce and Microsoft’s Hadoop distribution for Windows Azure let companies spin up Big Data projects without upfront infrastructure costs and respond quickly to scale-out requirements. Commercial vendor support from companies like Cloudera can speed development and deliver more value from Big Data projects. Bundled server options such as Oracle’s Big Data Appliance offer fast setup and scale-out solutions. Finally, modular data center designs are emerging as a way to manage hardware efficiently and to scale out rapidly and cost-effectively.
Companies in the following industries are likely to get the most out of Big Data analytics:
- Supply chain, logistics, and manufacturing: With RFID sensors, handheld scanners, and on-board GPS vehicle and shipment tracking, logistics and manufacturing operations produce vast quantities of information that offer significant insight into route optimization, cost savings and operational efficiency.
- Online services and web analytics: Internet companies developed Big Data technologies specifically to process information at Internet scale. These analytical platforms are now within reach of smaller online services companies, giving them an edge over competitors in advertising, customer intelligence, capacity planning and more. Companies that don’t offer online services but do have an e-commerce or other online presence will benefit greatly from understanding customer behavior and buying patterns through clickstream analysis, cohort analysis and other advanced analytics.
- Financial services: Financial markets and banks generate immense quantities of trading and transaction data that can help companies maximize trading opportunities or identify potentially fraudulent charges, among other uses. New regulations also require detailed financial records to be retained for longer periods.
- Energy and utilities: Smart instrumentation such as “smart grids” and electronic sensors attached to machinery, oil pipelines and other equipment generate streams of incoming data that must be stored and analyzed quickly to uncover and fix potential problems before they result in costly or even disastrous failures.
- Media and telecommunications: Data from streaming media, smartphones, tablets, browsing sessions and text messages is captured at ever-increasing rates all over the world, representing a potential treasure trove of knowledge about user behavior and tastes.
- Health care and life sciences: Electronic medical records systems are some of the most data-intensive systems in the world, and making sense of this data to identify patient treatment options and to support clinical studies can have a dramatic effect on both individual patients and public health management and policy.
- Retail and consumer products: Retailers can analyze vast quantities of sales transaction data to unearth patterns in customer behavior, and can monitor brand awareness and sentiment using social networking data.
Data Warehouse Integration
To apply this new technology effectively, it is important to understand its role, and when and how to integrate Big Data with the other components of the data warehouse environment. In the vast majority of cases, Big Data does not replace the data warehouse. Hadoop is built for throughput and flexibility across huge sets of often unstructured data, but it is best used for fairly simple batch workloads such as sorting, aggregating, converting, and filtering. Hadoop is also not intended to manage schema structure, referential integrity, or security. Database management systems therefore remain a vital part of the overall solution architecture. So how will Big Data analytics be incorporated into existing BI/DW investments?
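The simple filter, convert, and aggregate workloads described above fit naturally into Hadoop Streaming, which runs any executable that reads lines from standard input and writes tab-separated key/value lines to standard output. The sketch below assumes a hypothetical space-separated log format (`timestamp host status`) and counts HTTP 5xx errors per host; the function names are illustrative. In an actual Streaming job the mapper and reducer would typically be supplied as separate scripts, and Hadoop, not Python's `sorted()`, would sort the mapper output by key before the reducer sees it.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple

def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Filter and convert: keep server errors, emit 'host<TAB>1' lines."""
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip malformed records instead of failing the whole job
        _timestamp, host, status = fields
        if status.startswith("5"):
            yield f"{host}\t1"

def parse(line: str) -> Tuple[str, int]:
    """Split one 'key<TAB>value' line into its key and integer value."""
    key, value = line.rstrip("\n").split("\t", 1)
    return key, int(value)

def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Aggregate: input is sorted by key, so equal keys arrive consecutively."""
    pairs = (parse(line) for line in sorted_lines)
    for host, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{host}\t{sum(count for _, count in group)}"

if __name__ == "__main__":
    # Local dry run on sample records (hypothetical data); on a cluster,
    # Hadoop feeds each script its input on stdin and sorts between phases.
    sample = [
        "2012-01-01T00:00 web1 200",
        "2012-01-01T00:01 web1 503",
        "2012-01-01T00:02 web2 500",
    ]
    for out in reducer(sorted(mapper(sample))):
        print(out)
```

Keeping the logic in plain generator functions means the same code can be dry-run locally on a sample file before it is submitted to a cluster.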
Hadoop provides an adaptable and robust solution for storing large data volumes and for aggregating them and applying business rules, supporting on-the-fly analysis that blurs the boundary between traditional ETL and ad-hoc analysis. It is also common to automate Big Data processing jobs and load their results into the data warehouse for further transformation, integration, and analysis. This allows Big Data to be integrated with data from other sources and exposed to users via BI tools, dashboards, and reports. Several options are available for extracting data from Hadoop into the data warehouse: IBM, Informatica, Microsoft, Oracle, and SAP have released or announced tools that interface between Hadoop and relational database management systems.
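The load step can be sketched in a few lines. This example assumes the job wrote `key<TAB>value` lines (the default layout of Hadoop Streaming output) and uses Python's built-in `sqlite3` module purely as a stand-in for a relational data warehouse; the table name `host_error_counts`, its columns, and the helper functions are hypothetical. In production, vendor connectors like those mentioned above are intended to handle this step.

```python
import sqlite3
from typing import Iterable, List, Tuple

def parse_job_output(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse 'key<TAB>value' lines, skipping blank ones."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, value = line.split("\t", 1)
        rows.append((key, int(value)))
    return rows

def load_into_warehouse(conn: sqlite3.Connection, rows: List[Tuple[str, int]]) -> None:
    """Load job results; rerunning replaces existing rows instead of duplicating them."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS host_error_counts ("
            " host TEXT PRIMARY KEY, error_count INTEGER NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO host_error_counts (host, error_count) VALUES (?, ?)",
            rows,
        )

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_into_warehouse(conn, parse_job_output(["web1\t2\n", "web2\t1\n"]))
    # prints [('web1', 2), ('web2', 1)]
    print(conn.execute(
        "SELECT host, error_count FROM host_error_counts ORDER BY host"
    ).fetchall())
```

Using `INSERT OR REPLACE` against a primary key makes the load idempotent, so a scheduled job can be rerun after a failure without duplicating rows in the warehouse.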
User-Friendly Tools for Big Data
Tools like Apache Hive, with its SQL-like query language (HiveQL), and Apache Pig, with its high-level data-flow language (Pig Latin), let advanced data analysts run queries directly against data stored in Hadoop. This is an effective way to do targeted, one-time analysis, perform exploratory data mining, or develop queries whose results may later be loaded into a data warehouse on an automated schedule. However, these tools require technical expertise and do not cater to end users.
Luckily, there are some exciting end-user tools coming in 2012. Tableau has drag-and-drop Hadoop reporting support currently in beta, and Microsoft recently announced the Hive ODBC driver and the Hive add-in for Excel, which will give end users access to data stored in Hadoop through Excel, PowerPivot, and Analysis Services. Tools that enable end users to slice, dice, and visualize data in Hadoop will become increasingly important components of a company’s Big Data analytics arsenal over the coming years.
Big Data adoption will continue to be driven by the large and/or rapidly growing volumes of data captured by automated and digitized business processes. Successful adoption of this technology requires turning this raw information into usable knowledge throughout the enterprise. To accomplish this, companies will need to intelligently incorporate Big Data into their existing information management systems and take advantage of the developing ecosystem of integration and analysis tools. As we move into the age of Big Data, companies that are able to put this technology to work are likely to find significant revenue-generating and cost-saving opportunities that differentiate them from their competitors and drive success well into the next decade.
Harlan Smith is a Manager in the Business Intelligence and Performance Management practice at Hitachi Consulting, specializing in business intelligence engineering, architecture and project/program management. Harlan is a graduate of the University of Puget Sound in Tacoma, WA, and currently lives in Seattle where he has been a consultant since 2005. Follow him @smithharlan on Twitter.