HBase with JPA and Spring Roo

Inspired by Matthias Wessendorf’s blog entry “Apache Hadoop HBase plays nice with JPA” I started playing around with integrating HBase into Spring Roo.

Spring Roo is a lightweight Java development tool which uses convention-over-configuration principles to provide rapid application development of Java-based enterprise software. It provides all the nice things like auto-generating getters, setters, unit tests and persistence methods, scaffolding and so on. To do so, it makes heavy use of AspectJ, putting all the auto-generated code into separate files (with extension .aj) so you can safely change all Java files without interfering with Roo’s code generation engine. One of the best things in Roo (compared to Grails) is that everything generated is plain Java, so if at some point you don’t want to use Roo anymore you can easily merge all the aspects into your Java files and continue without it (even though I wouldn’t recommend that). If you have not worked with Roo up to now, check out the quick tutorial in the Roo Reference Documentation.

HBase is an open source, non-relational, distributed database modeled after Google’s BigTable and written in Java. It is developed as part of the Apache Software Foundation’s Hadoop project, providing BigTable-like capabilities for Hadoop. Therefore it’s a good alternative for all folks who do not want to host their application on the Google App Engine.

The installation of Hadoop and HBase is relatively straightforward and well documented in the HBase wiki. According to the documentation there is also a way to set it up on Windows using Cygwin, but I’ve not tried it. I did my test installation in a VMware virtual machine running Ubuntu Server 10.04 LTS.

The Datanucleus guys offer a JPA and JDO integration for HBase and many other databases under the Apache 2 open source license. Even though the HBase plugin from Datanucleus still has some limitations (like no auto-generated IDs), you can work around those restrictions either by using JPA in a slightly different way or by writing a simple plugin for one of the dozens of plugin points offered by Datanucleus. In my next blog I’ll show you how to auto-generate IDs using the JPA annotation @GeneratedValue by writing your own simple Datanucleus plugin.
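
As a rough illustration of the “slightly different way” of using JPA, the ID can simply be assigned in application code before the entity is persisted. The following is only a minimal sketch: the class RandomIdGenerator and its nextId() method are made up for this example and are not part of Roo, JPA, or Datanucleus. It clears the sign bit so the IDs are never negative; collisions are very unlikely with 63 random bits, but they are not impossible.

```java
import java.security.SecureRandom;

/** Hypothetical helper (not part of Roo or Datanucleus) that hands out random, non-negative IDs. */
public final class RandomIdGenerator {

    private static final SecureRandom RANDOM = new SecureRandom();

    private RandomIdGenerator() {
        // utility class, no instances
    }

    /** Returns a random ID between 0 and Long.MAX_VALUE by clearing the sign bit. */
    public static long nextId() {
        return RANDOM.nextLong() & Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Print one sample ID
        System.out.println(nextId());
    }
}
```

An entity could then call RandomIdGenerator.nextId() in its persist() method before handing itself to the entity manager.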

Note: I discovered an issue in the Datanucleus code that prevented any Roo classes from being persisted. I temporarily fixed the bug locally, after which everything worked fine, and reported it to the Datanucleus guys. Update: I just got confirmation from Datanucleus that the bug is fixed, and according to my tests everything seems to work fine. Their response time was amazing: they fixed it less than 12 hours after I submitted a test case for the issue.

I’m using the following tools and versions:

  • SpringSource Tool Suite (STS) 2.5.0 RELEASE
  • Spring Roo 1.1.0.RELEASE
  • hbase 0.89.20100924+28 (Cloudera CDH3 Beta)
  • hadoop 0.20.2+737 (Cloudera CDH3 Beta)
  • Datanucleus Core 2.2.0-release
  • Datanucleus JPA 2.1.4
  • Datanucleus HBase 2.0.2-SNAPSHOT 2.0.1 (Only available via sourceforge SVN – I downloaded the tarball and built it myself)
  • Datanucleus Enhancer 2.1.2
  • ASM 3.3.1
  • JDO API 3.1
  • JPA 2 (javax.persistence 2.0.1.v201006031150 – I took this version out of EclipseLink)

Ok, so let’s get started:

  1. Create a Spring Roo project using Spring STS
  2. Create a simple entity (e.g. Person) by executing the following command in the Roo shell (if you are using STS, the Roo shell is embedded directly in the IDE)
    entity --class ~.domain.Person --testAutomatically
  3. Add Datanucleus support
    This can be done either by adding Datanucleus support for a different database and adjusting it to HBase (in this case you can skip sub-steps 2 and 3 below), or by manually adding the following information:

    1. Adding all the dependencies to the Maven build file pom.xml
    2. Adding the following plugin to the Maven build file pom.xml
      <plugin>
        <groupId>org.datanucleus</groupId>
        <artifactId>maven-datanucleus-plugin</artifactId>
        <version>2.1.0-release</version>
        <configuration>
          <fork>false</fork>
          <log4jConfiguration>${basedir}/src/main/resources/log4j.properties</log4jConfiguration>
          <mappingIncludes>**/*.class, **/*Roo_Entity.class</mappingIncludes>
          <verbose>true</verbose>
          <enhancerName>ASM</enhancerName>
          <api>JPA</api>
        </configuration>
        <executions>
          <execution>
            <!-- run the byte-code enhancer as part of the compile phase -->
            <phase>compile</phase>
            <goals>
              <goal>enhance</goal>
            </goals>
          </execution>
        </executions>
        <dependencies>
          <dependency>
            <groupId>org.datanucleus</groupId>
            <artifactId>datanucleus-core</artifactId>
            <version>${datanucleus.version}</version>
            <exclusions>
              <exclusion>
                <groupId>javax.transaction</groupId>
                <artifactId>transaction-api</artifactId>
              </exclusion>
            </exclusions>
          </dependency>
          <dependency>
            <groupId>org.datanucleus</groupId>
            <artifactId>datanucleus-hbase</artifactId>
            <version>2.0.0-release</version>
          </dependency>
          <dependency>
            <groupId>org.datanucleus</groupId>
            <artifactId>datanucleus-enhancer</artifactId>
            <version>2.1.3</version>
          </dependency>
        </dependencies>
      </plugin>
      
    3. If it does not yet exist, create a file called applicationContext.xml in the src/main/resources/META-INF/spring/ folder with the following content
      <?xml version="1.0" encoding="UTF-8" standalone="no"?>
      <beans xmlns="http://www.springframework.org/schema/beans"
             xmlns:aop="http://www.springframework.org/schema/aop"
             xmlns:context="http://www.springframework.org/schema/context"
             xmlns:jee="http://www.springframework.org/schema/jee"
             xmlns:tx="http://www.springframework.org/schema/tx"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
               http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
               http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
               http://www.springframework.org/schema/jee http://www.springframework.org/schema/jee/spring-jee-3.0.xsd
               http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd">
        <context:property-placeholder location="classpath*:META-INF/spring/*.properties"/>
        <context:spring-configured/>
        <context:component-scan base-package="org.my.test">
          <context:exclude-filter expression=".*_Roo_.*" type="regex"/>
          <context:exclude-filter expression="org.springframework.stereotype.Controller" type="annotation"/>
        </context:component-scan>
        <!-- class attribute is an assumption: Spring's standard JPA transaction manager -->
        <bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManager">
          <property name="entityManagerFactory" ref="entityManagerFactory"/>
        </bean>
        <tx:annotation-driven mode="aspectj" transaction-manager="transactionManager"/>
        <!-- class attribute is an assumption: a Spring factory bean that accepts persistenceUnitName -->
        <bean class="org.springframework.orm.jpa.LocalEntityManagerFactoryBean" id="entityManagerFactory">
          <property name="persistenceUnitName" value="persistenceUnit"/>
        </bean>
      </beans>
      
    4. If it does not yet exist, create a file called persistence.xml in the src/main/resources/META-INF/ folder with the following content
      <?xml version="1.0" encoding="UTF-8" standalone="no"?>
      <persistence xmlns="http://java.sun.com/xml/ns/persistence"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   version="2.0"
                   xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd">
        <persistence-unit name="persistenceUnit" transaction-type="RESOURCE_LOCAL">
          <provider>org.datanucleus.jpa.PersistenceProviderImpl</provider>
          <properties>
            <property name="datanucleus.jpa.addClassTransformer" value="false"/>
            <property name="datanucleus.managedRuntime" value="false"/>
            <property name="datanucleus.ConnectionURL" value="hbase"/>
            <property name="datanucleus.ConnectionUserName" value=""/>
            <property name="datanucleus.ConnectionPassword" value=""/>
            <property name="datanucleus.autoCreateSchema" value="true"/>
            <property name="datanucleus.validateTables" value="false"/>
            <property name="datanucleus.Optimistic" value="false"/>
            <property name="datanucleus.validateConstraints" value="false"/>
          </properties>
        </persistence-unit>
      </persistence>
      
    5. If it does not yet exist, create a file called hbase-site.xml in the src/main/resources/ folder with the following content (replacing the IP address and port numbers with the ones of your HBase server)
      <configuration>
        <property>
          <name>hbase.zookeeper.quorum</name>
          <value>192.168.88.128</value>
        </property>
        <property>
          <name>hbase.zookeeper.property.clientPort</name>
          <value>2181</value>
        </property>
        <property>
          <name>hbase.master</name>
          <value>192.168.88.128:60000</value>
        </property>
      </configuration>
      
  4. Now let’s open our domain object “Person.java”, add the following code, and save the file so that Roo can update the AspectJ files. As mentioned above, Datanucleus HBase does not support the @GeneratedValue annotation out of the box, so the ID is assigned by hand in persist(). In my next blog I’ll show you how to write your own Datanucleus plugin to support this annotation.
    @Id
    @Column(name = "id")
    private Long id;

    private String firstName;
    private String lastName;

    public Long getId() {
        return this.id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    // Replaces the persist() that Roo would otherwise generate:
    // the ID is assigned by hand because Datanucleus HBase cannot generate one.
    @Transactional
    public void persist() {
        this.id = new java.util.Random().nextLong();
        if (this.entityManager == null) this.entityManager = entityManager();
        this.entityManager.persist(this);
    }
    
  5. Run the JUnit test cases
    Note1: If you are facing application context issues telling you that files from your resources directory cannot be found, please verify in the Eclipse project settings (Java Build Path > Source) that the resources folder has no excludes set.
    Note2: If you are getting a “java.lang.IllegalArgumentException: out of field index :-1” exception, you are using an old version of Datanucleus HBase which does not include this bugfix.
    Note3: If it’s not working at all, please post a comment here; as I wrote this blog after my own tests, I might have missed something.
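
For Note1, a quick way to see whether hbase-site.xml really ends up on the classpath (and which values it carries) is a small stand-alone check. The following is a minimal sketch using only the JDK. The class name HBaseSiteCheck and its readProperties method are made up for this illustration and are not part of HBase, Roo, or Datanucleus, and it assumes every property element has both a name and a value child, as in the file from step 3.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical diagnostic helper: reads the name/value pairs of an hbase-site.xml stream. */
public final class HBaseSiteCheck {

    private HBaseSiteCheck() {
        // utility class, no instances
    }

    /** Parses all property elements (name + value children) into an ordered map. */
    public static Map<String, String> readProperties(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        NodeList properties = doc.getElementsByTagName("property");
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
            result.put(name, value);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Files from src/main/resources are expected at the root of the classpath.
        InputStream in = HBaseSiteCheck.class.getResourceAsStream("/hbase-site.xml");
        if (in == null) {
            System.out.println("hbase-site.xml is NOT on the classpath");
            return;
        }
        try {
            System.out.println(readProperties(in));
        } finally {
            in.close();
        }
    }
}
```

If the file is reported as missing, the excludes on the resources folder are the first thing to look at; if it is found, the printed quorum and port should match your HBase server.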

As always any comments and/or feedback are welcome and I’m open for any ideas how to do things better, easier and faster.

Thanks for reading my blog,

Peter

9 thoughts on “HBase with JPA and Spring Roo”

  1. Stefan Schmidt

    Hi Peter,

    This looks great! I was not aware that HBase had support for JPA.

    Of course it would be much easier for a Spring Roo user to simply install your add-on and get all of these configuration steps done with a single command ;).

    I am part of the Spring Roo team and have a bit of experience around add-on development so feel free to contact us via the forum or directly if you need help getting started. It should be quite easy to develop an add-on if you use the ‘addon create simple’ command (http://static.springsource.org/spring-roo/reference/html-single/index.html#internals). Once your add-on does what it is supposed to do you can easily make it available to the Roo shell by publishing it to RooBot (see the docs).

    Not sure if you are aware but SpringSource also has the newly created Spring Data project which offers a level of abstraction for NoSQL databases (https://github.com/springsource).

    Anyway, good work!

    Cheers,
    Stefan Schmidt
    Spring Roo team

    1. Peter Rainer Post author

      Hi Stefan,

      as add-ons for Roo are anyway on my list to look into, I’ll take a look at the documentation link you provided and play around a little bit. But before that I need to package my Datanucleus plugin in order to support auto generated IDs in HBase.

      In the meantime I got confirmation from Datanucleus that they found the bug and fixed it, so I’ll try that one out this evening as well.

      I’ve already heard about Spring Data project, but I’ve not had time up to now to look into it, but it’s definitely on my watch list.

      Cheers,
      Peter

  2. Goli

    Hi Peter,

    Great job. Thank you for this article.
    Meanwhile, have you had a chance to write the Roo add-on yet? What about packaging your Datanucleus plugin?

    Cheers,
    Goli

    1. Peter Rainer Post author

      Hi Goli,

      No, I’ve not been able to write the Roo add-on yet, because it was really busy the last couple of weeks (I’ve been sick for nearly 1 full week, my current employer is moving all the software development to a different city, …) – but it’s still on my list to work on. But before doing so I’ll work on Datanucleus Bug NUCCORE-595, which causes objects to be newly created within HBase on an update (if performed the way Spring Roo does it).

      Regarding the Datanucleus plugin: no, I’m not planning on packaging it, as Datanucleus has included it in version 3 anyway; the only thing I did was put it on GitHub.

      Cheers,
      Peter

  3. Alex McLintock

    I’m keen to access Hadoop data (probably hBase) through Spring (preferably set up through Roo).

    If I can help with a spring roo plugin just ask.

    1. Peter Rainer Post author

      Hi Alex,

      I’ve not had any time in the last couple of weeks/months to step further into hBase and Spring Roo. So at the moment I’m rather busy and won’t have much time to work on such a plugin; in any case, there are a few things which need to be fixed in Datanucleus first. At the moment this integration is nice for playing around and proving that it actually works, but nothing more than that.

      If you want to start writing such a plugin, please feel free to do so; you may also reuse any piece of code I’ve posted in my blog for that. But I’m afraid I’ll not have time to continue working on this topic over the next couple of weeks either.

  4. technojab

    I followed your directions, chose HYPERSONIC_IN_MEMORY as the db initially. Changed it to Hbase as per your directions.

    I know everything is fine as it can connect to HBase fine. But I get this when I run the tests:

    java.lang.NullPointerException
    at org.datanucleus.query.evaluator.JavaQueryEvaluator.execute(JavaQueryEvaluator.java:293)
    at org.datanucleus.store.hbase.query.JPQLQuery.performExecute(JPQLQuery.java:279)
    at org.datanucleus.store.query.Query.executeQuery(Query.java:1791)
    at org.datanucleus.store.query.Query.executeWithMap(Query.java:1694)
    at org.datanucleus.api.jpa.JPAQuery.getResultList(JPAQuery.java:185)

    If I look at that line in the source, it looks like result[0] is null. But if I look at the resultSet, it’s giving me the 10 entries (which means the data was retrieved fine).

    So I’m now stumped trying to debug some issue in the Java Query Evaluator in data nucleus.

    HELP?!

  5. Andy

    @technojab, I’d hardly think it the best place to post a problem report about some query on a third party blog. Posting on the DataNucleus forum with a testcase that shows how such a thing is reproduced makes way more sense.
