Wednesday, March 24, 2010

Hibernate/JPA Best Practices

Here are some Hibernate/JPA best practices.
Please note that this guide is not an introduction to Hibernate/JPA; it only collects some best practices. Personally, I learned Hibernate from Hibernate in Action.

I would really appreciate comments on what you think of these practices, why they may be wrong, and which additional practices should be added.

So let's start:

Override hashCode() and equals()

The Hibernate reference states that "It is recommended that you implement equals() and hashCode() to compare the natural key properties of the entity."
The reason is simple: different instances of the class may represent the same record in the database. When you compare two such instances, you expect them to be equal, but the default equals() implementation returns false because they are not the same instance.

This becomes really important when working with collections, especially Sets. You don't want the same object to appear twice in a set, right?

Overriding hashCode() and equals() is not very complex, but you must be very careful:
1. Remember that two equal objects must return the same hash code, and that the hash code should not change while the object sits in a hash-based collection. Therefore, you cannot use the auto-generated Hibernate id in hashCode(): this value is not assigned for newly created objects. Once the object is persisted, the id is assigned, so the hash code changes even though the object itself was not really changed!
2. Changing the fields that participate in hashCode() changes the hash code value. So if your object is stored in a Set (or is a key of a Map), you won't be able to find it there anymore: one hash was used for the insert and another one is used for the lookup.
So basically you need to remember not to change objects that are stored in sets! And this is really important!
So you may ask: how will I know who stored my object in a set?
My answer is simple: you cannot know this unless you never give your objects out. If you are the only one using these objects, you know how they are kept, right?
Keeping the objects to yourself is not such a weird idea: keep the persistence layer away from the business logic and return a copy of the object when required.
Another option is to return immutable objects to the business tier, so that the business tier cannot change them. When a change is required, provide a special API. Thus the objects won't change accidentally.
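
To make this more concrete, here is a minimal sketch of equals() and hashCode() based on a natural key. The User class and its email property are hypothetical, and the sketch assumes that email is unique and never updated. The mapping annotations are left out so that the example runs without Hibernate on the classpath, and setId() only stands in for Hibernate assigning the generated id.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical entity for illustration. In a real mapping it would carry
// @Entity, and id would be annotated with @Id and @GeneratedValue.
class User {
    private Long id;       // auto-generated surrogate key; NOT used in equals/hashCode
    private String email;  // assumed natural key: unique and never updated

    protected User() { }   // no-argument constructor, as Hibernate expects

    User(String email) {
        if (email == null) throw new IllegalArgumentException("email is required");
        this.email = email;
    }

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }  // stands in for Hibernate assigning the id
    String getEmail() { return email; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        // instanceof (rather than getClass()) also accepts subclasses of User
        if (!(other instanceof User)) return false;
        return email.equals(((User) other).getEmail());
    }

    @Override
    public int hashCode() {
        return email.hashCode();
    }
}

class UserDemo {
    public static void main(String[] args) {
        Set<User> users = new HashSet<>();
        User first = new User("ann@example.com");
        users.add(first);
        first.setId(1L);  // "persisting" does not change the hash code
        System.out.println(users.contains(first));                   // true
        System.out.println(users.add(new User("ann@example.com")));  // false: already present
    }
}
```

Because the id plays no part in the hash code, the object can still be found in the set after it has been "persisted".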

Try to Make All Objects Immutable

This may sound weird: how exactly can persistent objects be immutable? But the previous part described why it's important, and it can actually be achieved quite easily:
1. Make all setters private, so that it is impossible to call them without reflection. (Hibernate uses reflection to populate the properties when the object is retrieved.)
2. When returning collections, wrap them with one of the Collections.unmodifiable* methods (for example, Collections.unmodifiableSet), so that the user cannot modify your collections.
3. Allow changes only via special methods.
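
The three points above could look roughly like this. Account, owner, tags and addTag are hypothetical names, and the mapping annotations are omitted so that the class compiles on its own. The sketch assumes field access; with property access you would probably need a separate private getter that returns the raw collection to Hibernate.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical "semi-immutable" persistent class (mapping annotations omitted).
class Account {
    private String owner;
    private Set<String> tags = new HashSet<>();

    protected Account() { }  // no-argument constructor, as Hibernate expects

    Account(String owner) {
        this.owner = owner;
    }

    String getOwner() { return owner; }

    // 1. Private setter: callers cannot reach it without reflection.
    private void setOwner(String owner) { this.owner = owner; }

    // 2. Read-only view: callers can iterate but not modify.
    Set<String> getTags() {
        return Collections.unmodifiableSet(tags);
    }

    // 3. The only way to change the collection from outside.
    void addTag(String tag) {
        if (tag == null || tag.isEmpty()) {
            throw new IllegalArgumentException("tag must not be empty");
        }
        tags.add(tag);
    }
}
```

Calling getTags().add(...) on such an object throws UnsupportedOperationException, so an accidental change shows up immediately instead of corrupting state silently.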

Return Copy of Persistent Objects to the Business Tier

This way no accidental change in the hash code can occur.
Additionally, filling the business objects resolves a lot of potential problems. Suppose the persistent object contains a lazy collection. If the object is returned as is to the business tier, the lazy elements of the collection may be accessed after the transaction (and its session) has been closed; the lazy load then fails and the user gets an exception (in Hibernate, a LazyInitializationException).
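
A minimal sketch of such a copy, assuming a hypothetical OrderEntity persistent class and an OrderView class for the business tier (neither name comes from Hibernate). The copy is meant to be made while the transaction is still open, because copying the collection iterates it and so forces any lazy loading to happen at that point.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical persistent class; in a real mapping "lines" could be a lazy collection.
class OrderEntity {
    private String number;
    private List<String> lines = new ArrayList<>();

    OrderEntity(String number) { this.number = number; }

    String getNumber() { return number; }
    List<String> getLines() { return lines; }
    void addLine(String line) { lines.add(line); }
}

// Plain immutable copy that is handed to the business tier.
final class OrderView {
    private final String number;
    private final List<String> lines;

    private OrderView(String number, List<String> lines) {
        this.number = number;
        this.lines = lines;
    }

    // Call this inside the transaction: new ArrayList<>(...) reads every element,
    // so nothing is left to be loaded lazily afterwards.
    static OrderView from(OrderEntity entity) {
        List<String> copy = new ArrayList<>(entity.getLines());
        return new OrderView(entity.getNumber(), Collections.unmodifiableList(copy));
    }

    String getNumber() { return number; }
    List<String> getLines() { return lines; }
}
```

The view shares no mutable state with the entity, so later changes to the entity do not leak into objects the business tier already holds.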

Change Data in Collections Only via Special Methods

When you have associations, manage each association via a special method: for example, the Parent class will have the method addChild(Child child).
When returning the contents of collections, wrap them with the Collections.unmodifiable* methods to prevent accidental changes.
This is useful both to handle bidirectional associations correctly and to prevent accidental changes in hash code.
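
Here is one way addChild could keep both sides of a bidirectional association in sync. Only Parent, Child and addChild come from the text above; removeChild, the name field and the package-private setParent are assumptions made for this sketch, and the mapping annotations are omitted.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class Parent {
    private final Set<Child> children = new HashSet<>();

    // Read-only view; all changes go through addChild/removeChild.
    Set<Child> getChildren() {
        return Collections.unmodifiableSet(children);
    }

    void addChild(Child child) {
        Parent previous = child.getParent();
        if (previous != null) {
            previous.children.remove(child);  // detach from the old parent first
        }
        child.setParent(this);                // keep the child's side in sync
        children.add(child);
    }

    void removeChild(Child child) {
        if (children.remove(child)) {
            child.setParent(null);
        }
    }
}

class Child {
    private final String name;
    private Parent parent;

    Child(String name) { this.name = name; }

    String getName() { return name; }
    Parent getParent() { return parent; }

    // Package-private on purpose: only Parent is supposed to call it.
    void setParent(Parent parent) { this.parent = parent; }
}
```

Child keeps the default identity-based equals() and hashCode() here only to keep the sketch short.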

Summary

1. Override hashCode() and equals() of the entities using the natural key properties of the entity.
2. Don't use the auto-generated id in hashCode().
3. Don't change the properties that participate in the hashCode() for objects stored in collections that use hash code (especially Sets or Maps). If such a change must occur, reinsert the object into the collection. Remember that the removal must occur before the property is updated.
4. Keep the persistence tier away from the business logic as much as possible. Don't pass the persistence objects to the business tier at all.
5. Make your objects immutable (or semi-immutable): make all setters private, return collection values only wrapped with the Collections.unmodifiable* methods, make changes in collections only via special methods.
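
The remove-before-update order from point 3 can be sketched as follows. Tag and TagRenamer are hypothetical names used only for this illustration; the demo also shows what happens when the property is changed while the object is still in the set.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class whose name participates in equals/hashCode.
class Tag {
    private String name;

    Tag(String name) { this.name = name; }

    String getName() { return name; }
    void rename(String newName) { this.name = newName; }

    @Override
    public boolean equals(Object other) {
        return other instanceof Tag && name.equals(((Tag) other).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

final class TagRenamer {
    // Removes the tag while its old hash code is still valid, changes it,
    // and puts it back only if it was in the set to begin with.
    static void rename(Set<Tag> tags, Tag tag, String newName) {
        boolean wasPresent = tags.remove(tag);
        tag.rename(newName);
        if (wasPresent) {
            tags.add(tag);
        }
    }

    public static void main(String[] args) {
        Set<Tag> tags = new HashSet<>();
        Tag safe = new Tag("a");
        Tag broken = new Tag("c");
        tags.add(safe);
        tags.add(broken);

        rename(tags, safe, "b");
        System.out.println(tags.contains(safe));    // true

        broken.rename("z");                         // changed while still in the set
        System.out.println(tags.contains(broken));  // false: stored under the old hash
    }
}
```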


Recommended Reading

1. Hibernate in Action (In Action series)
2. Java Persistence with Hibernate
3. The Best Software Writing I: Selected and Introduced by Joel Spolsky (v. 1)

6 comments:

Ran said...

I think the most important thing my company has learned about Hibernate is that it's not good for everything. In particular, we've found that Hibernate does not handle large transactions very well (tending either to be very slow, or to run out of memory, or both), and that (despite what Hibernate in Action might have you believe) it's frequently subject to the n+1 selects problem. Both of these are theoretically solvable, but the solutions usually introduce other problems (and sometimes run afoul of flat-out bugs and limitations in Hibernate; for example, fetch joins, though well documented, are only partially supported).

There are many aspects of our applications that benefit greatly from our use of Hibernate, and I certainly don't mean to discourage people from using it for the many things it's good for; but there are many places where we were eventually forced to strip it out, as well as many places where we do use it, but would have been better off stripping it out. (One problem, though, is that Hibernate can be somewhat "viral": sometimes it's hard to strip Hibernate out of one part of an application without having to significantly alter other parts of the application.)

Matt Coffey said...

I've worked on a large Struts2 based commercial web application where it was actually simpler to break these rules.

This is because all business logic was performed in request scope with the cost of reloading a page being far higher than the cost of a Hibernate transaction. So when you press the save button on a web page, the app is performing an atomic transaction on the set of entities that the user is attempting to save.

In practice, this meant you would never need to add an object to a collection without persisting it and reading it from the database first and therefore it was absolutely fine and indeed simpler to use the generated id in equals and hashcode as well as having public setters.

All entities in the application extended one class which handled equals and hashcode, freeing the business logic developer from the task of working out which fields should be used in the equals operation and then updating those methods whenever the object changed.

Tarlog said...

Hi Ran,

Thank you for your feedback and the concerns.
So do you continue to use Hibernate?
Did you try other alternatives that may handle the large transactions better?

Personally I thought of using iBATIS, but the project I'm working on must support different databases, so with iBATIS I would run into the problem of maintaining multiple SQL scripts.

Tarlog said...

matt.coffey,

thank you for your feedback.
So basically you say: make sure to persist your objects before you actually start using them, so you won't need these annoying guidelines? :)

Ran said...

We do still use Hibernate for the things it's good for. The main alternative that we've adopted, when Hibernate doesn't work for us, is to dispense with ORM entirely. (Er, I suppose that we're still using objects, and still using relational databases, and still mapping between them. I just mean that we haven't adopted an actual alternative ORM product/package/library/whatever. You know what I mean.)

Matt Coffey said...

The point was that if you are always persisting your entities before using them anyway (which happens when the business logic is done in request scope) the negative effects of the guidelines outweigh the positive.

Allowing developers on a large project to write equals and hashcode is a vector for bugs to enter the system. I can see the need for these guidelines in other situations, but I wouldn't call them universal :)