It is the second day of Oracle OpenWorld and it's really exciting to be here. No surprise that a big focus of the sessions is big data, with Hadoop receiving a good bit of attention. There are 924 sessions of the 2013 total that mention Big Data. Most of the ones I've been to have two themes: data explosion/usage and infrastructure components.
First, there is tremendous talk in these sessions, as well as in the keynotes, about the explosion of data. For example, 90% of the world's data was created in the last two years. A growing segment of this comes from sensors, whether they are on mobile phones, jet engines or a slew of newer devices that are built with more sensors of greater sensitivity. Today this is obvious with a smartphone, which has a thermometer, accelerometer, GPS, gyroscope, multiple network connections, fingerprint reader, two cameras, microphone, light meter and I suspect more that I don’t know about. Ironically, all of this data has very distinct structure but is sometimes referred to as unstructured data. Those who leverage the capabilities of Oracle, SQL Server or other robust, industrial-strength databases to structure this data will get the most value. Those who just dump it into the data lake will find themselves treading water at best. Conversely, those putting in the time upfront will ride above the waves, getting the advantage.
The sessions covering Hadoop infrastructure describe different components with some interesting names such as YARN, Sqoop, Flume and more. The Sqoop part is particularly relevant for Oracle users. According to some on the web (it must be true if it is on the internet ;-0 ), “Sqoop” is short for “SQL to Hadoop”. Some of the folks at Dell, through an acquisition (Quest Software), produced something called OraOop, which moves data from Oracle to Hadoop. A Google search for “oraoop” while writing this brought me to: http://archive.cloudera.com/cdh/3/adapters/oraoopuserguide.pdf. This is the Quest Software user guide for OraOop on Cloudera’s website.
Here at Dell, we’ve built on this earlier work, and I’m also excited about SharePlex 8.5.5, which increases our capabilities with its SharePlex Connector for Hadoop. This not only contains OraOop but also combines that instantiation capability with replication to keep Hadoop up to date. This is not our first entry into replication from Oracle to Hadoop. This newest version, which went to general availability on September 18th, just in time for Oracle OpenWorld, includes a number of enhancements to provide a comprehensive solution for replicating the most current data into Hadoop.
Some of the new features in SharePlex 8.5.5’s Connector for Hadoop include:
- Ability to write to partitions defined as integer or date values when partitioned by a value or range. This can substantially reduce the cost of an “update”, where otherwise the whole file has to be rewritten. With partitioning, only the affected partition file is written.
- Support for both sequence and Avro file types.
- Additional platform support: IBM’s distribution, BigInsights, as well as Cloudera CDH 5.1.0.
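
To make the partitioning point concrete, here is a minimal sketch of why writing to partitions shrinks an “update”. It is only an illustration, not how the SharePlex Connector for Hadoop is implemented: in-memory dictionaries stand in for files in Hadoop, the monthly date partitioning and the column names (`id`, `reading_date`, `temp_c`) are hypothetical, and it assumes an update never changes the partitioning column.

```python
from collections import defaultdict
from datetime import date


def partition_key(row):
    """Hypothetical scheme: partition by the year and month of the row's date."""
    d = row["reading_date"]
    return f"{d.year:04d}-{d.month:02d}"


def build_partitions(rows):
    """Group rows into partitions; each partition stands in for one file."""
    parts = defaultdict(dict)
    for row in rows:
        parts[partition_key(row)][row["id"]] = row
    return parts


def apply_update(parts, new_row):
    """Replace one row and return the set of partitions that must be rewritten.

    Assumes the partitioning column (reading_date) is unchanged by the update.
    """
    key = partition_key(new_row)
    parts[key][new_row["id"]] = new_row
    return {key}


rows = [
    {"id": 1, "reading_date": date(2014, 7, 3), "temp_c": 21.5},
    {"id": 2, "reading_date": date(2014, 8, 15), "temp_c": 23.0},
    {"id": 3, "reading_date": date(2014, 8, 20), "temp_c": 22.1},
    {"id": 4, "reading_date": date(2014, 9, 1), "temp_c": 19.4},
]

parts = build_partitions(rows)

# Update row 3: only the 2014-08 partition is touched, not the whole data set.
touched = apply_update(
    parts, {"id": 3, "reading_date": date(2014, 8, 20), "temp_c": 22.6}
)
print(sorted(touched), "of", len(parts), "partitions rewritten")
```

Without partitioning, the same update would mean rewriting the single file that holds all four rows; here only one of the three partitions is rewritten.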
As always, SharePlex is available as a trial download from our website.