Connect to HDFS using a proxy

As mentioned in yesterday’s blog post, I am currently working with a Hadoop cluster running in the AWS cloud. I’m still not happy with the decision to run it in the cloud, but that’s a different story. In addition, our company requires all internet access to go through a proxy server, with no exceptions. As our Hadoop cluster is mainly used for development tests at the moment, it would be a great benefit to connect to it directly from our local developer machines, and that’s what I describe in this article.
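One possible approach, assuming the proxy (or an SSH tunnel through it, e.g. ssh -D 1080) exposes a SOCKS endpoint, is Hadoop's built-in SOCKS socket factory. The excerpt does not say which technique the full article uses, so treat this as a sketch: the property names hadoop.rpc.socket.factory.class.default and hadoop.socks.server and the class org.apache.hadoop.net.SocksSocketFactory come from Hadoop itself, while localhost:1080 and the helper class HdfsSocksConfig are hypothetical placeholders. The stdlib-only Java below simply renders the client-side core-site.xml entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the core-site.xml properties that make a Hadoop
// client open its sockets through a SOCKS proxy. Host and port are
// placeholders (assumption), e.g. the local end of "ssh -D 1080 gateway".
final class HdfsSocksConfig {

    // Hadoop's stock socket factory that routes connections through SOCKS.
    static final String SOCKS_FACTORY = "org.apache.hadoop.net.SocksSocketFactory";

    static Map<String, String> socksProperties(String proxyHost, int proxyPort) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("hadoop.rpc.socket.factory.class.default", SOCKS_FACTORY);
        props.put("hadoop.socks.server", proxyHost + ":" + proxyPort);
        return props;
    }

    // Renders the properties as <property> elements for core-site.xml.
    static String toCoreSiteXml(Map<String, String> props) {
        StringBuilder xml = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            xml.append("  <property>\n")
               .append("    <name>").append(e.getKey()).append("</name>\n")
               .append("    <value>").append(e.getValue()).append("</value>\n")
               .append("  </property>\n");
        }
        return xml.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toCoreSiteXml(socksProperties("localhost", 1080)));
    }
}
```

Keep in mind that the SOCKS server, not the developer machine, then has to be able to reach the NameNode and DataNodes at the addresses the NameNode hands out.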


Connect to HDFS running in EC2 using public IP addresses

Recently I ran into an issue connecting from a server in our corporate network directly to a Hadoop cluster running in the Amazon cloud. Let’s not discuss whether it makes sense to run a permanent Hadoop cluster on EC2 instances; I may come back to that in a later blog post with more detailed performance comparisons.

Since HADOOP-985, the NameNode identifies DataNodes by ip:port instead of the hostname:port used previously. Inside EC2 these are private IP addresses, which a client outside the cloud cannot reach.
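The consequence can be illustrated with a small Java sketch. This is a hypothetical illustration, not the article's fix: DataNodeAddressMapper and its private-to-public lookup table are assumptions (in practice the mapping might come from a hosts file or the EC2 console), and 50010 is simply the classic DataNode data-transfer port. Later Hadoop releases also added the client setting dfs.client.use.datanode.hostname, which makes clients connect to DataNodes by hostname so that local name resolution can point them at public addresses.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: translates the "privateIp:port" DataNode addresses the
// NameNode hands out into publicly reachable "host:port" pairs. The lookup
// table is an assumption supplied by the caller.
final class DataNodeAddressMapper {

    private final Map<String, String> privateToPublic;

    DataNodeAddressMapper(Map<String, String> privateToPublic) {
        this.privateToPublic = Collections.unmodifiableMap(new HashMap<>(privateToPublic));
    }

    // Rewrites "ip:port" to "publicHost:port"; IPs missing from the table are kept as-is.
    String toPublic(String ipAndPort) {
        int colon = ipAndPort.lastIndexOf(':');
        if (colon <= 0 || colon == ipAndPort.length() - 1) {
            throw new IllegalArgumentException("expected ip:port but got " + ipAndPort);
        }
        String ip = ipAndPort.substring(0, colon);
        String port = ipAndPort.substring(colon + 1);
        return privateToPublic.getOrDefault(ip, ip) + ":" + port;
    }
}
```

For example, with 10.0.0.5 mapped to a public EC2 hostname, toPublic("10.0.0.5:50010") returns that hostname with port 50010, while an unmapped address passes through unchanged.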

DropzoneJS with JAX-RS, JSF2 and Bootstrap

Recently I had to build a drag-and-drop file upload for a JSF2 application. Instead of building everything from scratch, I combined a few components that have been around for a while, namely:

  • DropzoneJS (for the drag&drop)
  • JAX-RS (for the REST service accepting the file upload)
  • PrimeFaces JSF2 (to refresh the existing JSF image gallery after a file upload)
  • Bootstrap

This is what it looks like:

Gallery Screenshot
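DropzoneJS posts each file as a multipart/form-data part, by default under the field name "file". JAX-RS itself has no standard multipart API, so the upload resource relies on an extension such as Jersey's multipart module, which usually parses the part headers for you. As a stdlib-only sketch of what happens underneath, the hypothetical helper UploadFileNames below extracts the filename from a part's Content-Disposition header and strips directory components before the file is stored; it is not the article's code.

```java
import java.util.Locale;

// Hypothetical helper: reads the filename parameter of a multipart part's
// Content-Disposition header, e.g.
//   form-data; name="file"; filename="photo.jpg"
final class UploadFileNames {

    // Returns the sanitized filename, or null if there is none (or it is unusable).
    static String filenameFrom(String contentDisposition) {
        for (String param : contentDisposition.split(";")) {
            String trimmed = param.trim();
            if (trimmed.toLowerCase(Locale.ROOT).startsWith("filename=")) {
                String value = trimmed.substring("filename=".length()).trim();
                // Drop surrounding quotes if present.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return baseName(value);
            }
        }
        return null;
    }

    // Keeps only the last path segment (some browsers send full Windows paths)
    // and rejects empty, "." and ".." names so the file stays in the upload directory.
    static String baseName(String name) {
        String normalized = name.replace('\\', '/');
        String base = normalized.substring(normalized.lastIndexOf('/') + 1);
        return (base.isEmpty() || base.equals(".") || base.equals("..")) ? null : base;
    }
}
```

The parser is deliberately simple: it splits on semicolons, so it does not handle semicolons inside quoted names or the RFC 5987 filename* form.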

Value Generator Plugin for Datanucleus HBase will become part of Datanucleus 3

As promised in my last blog post (by the way, I hope you liked it), I’ve released an enhanced version on GitHub so everyone can download the source code and play around with it. I also included a Maven build file, so it should not be too difficult to build it locally and resolve all the dependencies.

The GitHub version contains the following enhancements:

  • better table naming
  • use of the fully qualified field name instead of the name argument
  • enhanced logging
  • Maven build file

The second piece of news I wanted to share is that Andy Jefferson from Datanucleus suggested making this plugin code part of Datanucleus version 3. As I like this idea very much, I gave him permission to do so and created a JIRA issue (NUCHBASE-26). So from Datanucleus version 3 on, everyone will be able to use the increment value generation strategy natively, without a separate plugin. Check out the Datanucleus blog to see what other features they are working on for version 3.
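The idea behind an increment value generator can be sketched in a few lines of Java. The in-memory ConcurrentHashMap is an assumption standing in for a counter stored in HBase (where an atomic operation such as incrementColumnValue would take the place of incrementAndGet), and IncrementGenerator is a hypothetical name, not the plugin's or Datanucleus's API. Each sequence is keyed on the fully qualified field name, matching the enhancement listed above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for a table-backed increment generator:
// one monotonically increasing sequence per fully qualified field name.
final class IncrementGenerator {

    private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Keys on e.g. "com.example.Person.id", so two classes that both have
    // an "id" field never share a sequence. Thread-safe: computeIfAbsent
    // creates each counter once, incrementAndGet is atomic.
    long next(Class<?> entityClass, String fieldName) {
        String key = entityClass.getName() + "." + fieldName;
        return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }
}
```

Values start at 1 for each field; a persistent implementation would additionally have to survive restarts, which is exactly what storing the counter in HBase provides.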

HBase with JPA and Spring Roo

Inspired by Matthias Wessendorf’s blog entry “Apache Hadoop HBase plays nice with JPA”, I started playing around with integrating HBase into Spring Roo.

Spring Roo is a lightweight Java development tool that uses convention-over-configuration principles to provide rapid application development of Java-based enterprise software. It provides all the nice things like auto-generated getters, setters, unit tests and persistence methods, scaffolding and so on. To do this, it makes heavy use of AspectJ, putting all auto-generated code into separate files (with the extension .aj) so you can safely change your Java files without interfering with Roo’s code generation engine. One of the best things about Roo (compared to Grails) is that everything generated is plain Java, so if at some point you no longer want to use Roo, you can easily merge all the aspects into your Java files and continue without it (even though I wouldn’t recommend that). If you haven’t worked with Roo yet, check out the quick tutorial in the Roo Reference Documentation.

HBase is an open source, non-relational, distributed database modeled after Google’s BigTable and written in Java. It is developed as part of the Apache Software Foundation’s Hadoop project, providing BigTable-like capabilities for Hadoop. This makes it a good alternative for anyone who wants BigTable-style storage without hosting their application on Google App Engine.

The installation of Hadoop and HBase is relatively straightforward and well documented in the HBase wiki. According to the documentation, it can also be set up on Windows using Cygwin, but I haven’t tried that. I did my test installation in a VMware virtual machine running Ubuntu Server 10.04 LTS.

The Datanucleus guys offer JPA and JDO integration for HBase and many other databases under the Apache 2 open source license. Even though the Datanucleus HBase plugin still has some limitations (such as no auto-generated IDs), you can work around them either by using JPA in a slightly different way or by writing a simple plugin for one of the dozens of plugin points Datanucleus offers. In my next blog post I’ll show you how to auto-generate IDs with the JPA annotation @GeneratedValue by writing your own simple Datanucleus plugin.
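One way to "use JPA in a slightly different way" without auto-generated IDs is to let the application assign identifiers itself, for example random UUIDs. The sketch below assumes that approach and is not necessarily the workaround the article has in mind; Customer is a hypothetical entity, and the JPA annotations (@Entity, @Id) are left out so the code compiles with the JDK alone.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical entity using an application-assigned key instead of
// @GeneratedValue. In a real JPA entity the class would carry @Entity and
// the id field @Id (omitted here so the sketch needs only the JDK).
class Customer {

    private String id;   // not final: JPA providers set fields when loading
    private String name;

    // JPA providers require a no-arg constructor; they fill in the fields.
    protected Customer() { }

    // New instances get their key up front, so no database sequence or
    // auto-increment support is needed.
    Customer(String name) {
        this.id = UUID.randomUUID().toString();
        this.name = Objects.requireNonNull(name, "name");
    }

    String getId() { return id; }

    String getName() { return name; }

    void setName(String name) { this.name = Objects.requireNonNull(name, "name"); }
}
```

A random UUID avoids any central counter at the cost of longer, unordered row keys; in HBase, unordered keys also spread writes across regions instead of concentrating them on one.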