To run a proof of concept of how our data warehouse data would perform on Hadoop, I decided to try out the Hortonworks Hadoop Sandbox for VMware, which can be downloaded from the Hortonworks Sandbox page.
Downloading the .ova file and importing it into VMware Fusion is straightforward. During startup, you get a glimpse of what’s installed on this sandbox VM.
For me, the most interesting piece of the Hadoop stack is Hive, since it is the component that translates SQL-like statements into MapReduce jobs for the Hadoop core.
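To make that idea a bit more concrete, here is a minimal sketch in plain Python of how a query such as `SELECT region, SUM(amount) FROM sales GROUP BY region` could be split into a map phase and a reduce phase. The `sales` table, its rows, and the function names are made up for illustration; this is not how Hive is implemented, and its real query planner is far more involved.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a table sales(region, amount).
rows = [
    ("north", 10),
    ("south", 5),
    ("north", 7),
    ("east", 3),
    ("south", 1),
]


def map_phase(rows):
    """Emit one (key, value) pair per row, keyed by the GROUP BY column."""
    for region, amount in rows:
        yield region, amount


def shuffle(pairs):
    """Collect all values that share a key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Apply the aggregate (here SUM) to each key's list of values."""
    return {key: sum(values) for key, values in grouped.items()}


result = reduce_phase(shuffle(map_phase(rows)))
print(result)
```

On a real cluster the map and reduce steps run in parallel across many nodes; the sketch only shows the shape of the computation on a single machine.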
After the VM has started, the following screen is presented:
Let's log into Ambari.
Next I want to perform the following steps:
- create a data model
- load data into this model
- run some queries
- compare performance and features with our current database server, PostgreSQL 9.5
- load streaming data
These steps will be covered in future posts.