Hadoop is set to expand massively, with ambitious predictions suggesting it might hold up to half the data in the enterprise by 2020. Whether or not that comes to pass, users and developers were busy applying and extending the protean big data platform at the Hadoop Summit in Brussels.
“Fifty percent of enterprise data will be in Hadoop by 2020,” said Hortonworks CEO Rob Bearden from the stage. Forrester analyst Mike Gualtieri added some detail to that prediction, while users including British Gas and JustGiving explained how they are using the system. Meanwhile, the universe of Hadoop projects, both inside and outside the Apache Software Foundation, continued to expand.
The positive news for the platform quickly drowned out the arguments about Hadoop standardization.
An operating system for data
“I think of Hadoop as an operating system for data,” said Gualtieri. Hortonworks architect Arun Murthy explained that Hadoop democratizes access to data, just as in the early days of computing popular operating systems broke the monopoly on access to computers.
Changing the role of data is unleashing creativity, said Murthy: if data is held by individual applications, it can be used only in limited ways, or may even become “dark data”, stored away and never used. By contrast, “a data operating system makes a lot of sense in a world where data is prevalent.”
What Gualtieri called “Hadooponomics” makes adoption necessary, he said. Hadoop clusters bring real-time processing to large pools of storage and scale using open source economics, delivering a cheap and fast alternative to vastly more expensive and slower data warehouse products.
Dee Mitra, head of big data services at British Gas, said the energy utility is adopting Hadoop to manage the vast amounts of data that will be created by the UK’s smart meter program. Continuous information from these meters will enable “real-time customer service”, said Mitra.
British Gas has been using Hadoop from Hortonworks for eighteen months, and even in that time the ease of use and readiness of the ecosystem have improved greatly, said Mitra: “A year ago, British Gas had to do a lot of work on Hadoop. Now it is pretty much there.”
Meanwhile, online fundraising site JustGiving has built a “GiveGraph” relating users, good causes and influencers, said Mike Bugembe, its chief analytics officer.
“It’s not as simple as just showing people causes,” said Bugembe. Relationships between people affect their giving, he explained: “We can’t just do gross analytics; we need a fine-grained understanding of influencers.”
The GiveGraph runs on Hadoop via Microsoft’s HDInsight service on the Azure cloud. Around 22 million people have raised $3 billion through JustGiving, said Bugembe.
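To make the idea of such a graph concrete, here is a minimal sketch of how donors, causes and influence relationships might be related. This is purely illustrative: the class name, edge types and the crude in-degree “influence” measure are assumptions for the example, not JustGiving’s actual model.

```python
# Illustrative sketch of a giving graph: people, causes, and influence edges.
# All names and the influence measure are assumptions, not JustGiving's model.
from collections import defaultdict


class GiveGraph:
    def __init__(self):
        self.donations = defaultdict(set)   # person -> set of causes donated to
        self.influences = defaultdict(set)  # person -> set of people they influenced

    def add_donation(self, person, cause):
        self.donations[person].add(cause)

    def add_influence(self, influencer, person):
        self.influences[influencer].add(person)

    def influence_score(self, person):
        # A crude fine-grained measure: how many donors this person influenced.
        return len(self.influences[person])


g = GiveGraph()
g.add_donation("alice", "clean_water")
g.add_influence("alice", "bob")
g.add_influence("alice", "carol")
g.add_donation("bob", "clean_water")
print(g.influence_score("alice"))  # 2
```

In practice a graph like this would be distributed across a Hadoop cluster and queried with far richer analytics, but the shape of the data, people connected to causes and to each other, is the point Bugembe was making.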
Management and governance
Handling and controlling Hadoop clusters was another main theme, along with ensuring that what the platform does with data can be tracked in ways that satisfy compliance rules. Several different frameworks exist within the Hadoop universe, and Apache Spark was among those receiving attention.
Janos Matyas of Hortonworks’ recent acquisition SequenceIQ demonstrated how his outfit’s technology can set up Hadoop clusters on the Amazon Web Services cloud, while rivals Cloudera and MapR both described technology to build on public clouds, with MapR covering all the major cloud services and Cloudera operating on Amazon.
MapR also used the event to announce a deal that takes Hadoop into the territory of traditional data warehouses: a partnership agreement with business intelligence long-timer Information Builders.