Musings on Technology: Best Practices for XML and POJO Binding

Here are a few basic guidelines for choosing among the different data binding technologies:

XMLBeans:

XMLBeans is the best candidate when we depend on the underlying XML for the actual data and don't really need to manipulate the Java code or modify the XML.

Say we just need to convert a WSDL into Java code and then use the generated Java API for accessing WSDL operations and interfaces.

XMLBeans actually stores the entire set of parsed events that represent the document in memory. XMLBeans is very fast for just reading and writing documents. It is really XML-Centric.

"XMLBeans is based on an efficient XML token stream, and it keeps underlying XML infoset intact. Since XMLBeans keeps the data in memory as XML, overhead of unmarshalling and marshalling is reduced.

To address varying data access and data transformation requirements, XMLBeans provides a flexibility of navigating through xml data using xPath or xQuery, and manipulating xml data using either xml cursor or XMLBeans generated classes."

XMLBeans performs poorly on memory usage because it keeps a live mapping between the objects and the underlying XML document in memory.
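To make the XML-centric idea concrete, here is a minimal sketch that keeps the parsed document in memory and navigates it with XPath. It uses only the JDK's DOM and javax.xml.xpath APIs as a stand-in and is not the XMLBeans API. The class name, the namespace-free WSDL-like fragment, and the operation names are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlCentricSketch {

    // Hypothetical fragment, loosely shaped like a WSDL portType (no namespaces).
    static final String SAMPLE =
        "<definitions>"
      + "<portType name='StockQuote'>"
      + "<operation name='getQuote'/>"
      + "<operation name='getHistory'/>"
      + "</portType>"
      + "</definitions>";

    // Parses the XML into an in-memory tree and reads the operation names via XPath.
    static List<String> operationNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "/definitions/portType/operation/@name", doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getNodeValue());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(operationNames(SAMPLE));
    }
}
```

The data never leaves its XML form here, which is the trade-off described above: access is direct, but the whole tree stays in memory.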

JiBX:

If we really want to control how XML data is mapped to Java objects, JiBX scores high!

JiBX uses mapped bindings to generate code automatically and does not care much about XML Schema support. It is a Java-centric approach to data binding. It uses bytecode enhancement to add XML generation and parsing support directly into the bytecode of the Java classes.

"It does not tie the Java class to the structure of the XML document. Just because an element lives within another element in the document, does not imply that the data contained in the sub element must be an equivalent level down in the Java structure. This allows the deep XML structure (which makes sense in XML) to be mapped easily to a more shallow, but not necessarily flat, structure (which makes sense in the Java object model)."

It uses a fast pull parser and has a very compact runtime distribution.

JiBX isolates XML document formats from Java language object structures.
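The idea of mapping a deep XML structure onto a shallower Java object can be sketched by hand with a pull parser. The example below uses the JDK's StAX API (javax.xml.stream) rather than JiBX itself, so it shows only the concept and not JiBX's binding definitions or bytecode enhancement. The Customer class, the element names, and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class FlatCustomerSketch {

    // Shallow Java object: the nesting of the XML is deliberately not mirrored.
    static final class Customer {
        String firstName;
        String lastName;
        String city;
    }

    // Hypothetical document with wrapper elements that make sense only in XML.
    static final String SAMPLE =
        "<customer>"
      + "<name><first>Ada</first><last>Lovelace</last></name>"
      + "<address><location><city>London</city></location></address>"
      + "</customer>";

    // Pulls events from the parser and copies leaf values into the flat object.
    static Customer read(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Customer customer = new Customer();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    switch (reader.getLocalName()) {
                        case "first": customer.firstName = reader.getElementText(); break;
                        case "last":  customer.lastName = reader.getElementText(); break;
                        case "city":  customer.city = reader.getElementText(); break;
                        default: break; // wrapper elements carry no data of their own
                    }
                }
            }
        } finally {
            reader.close();
        }
        return customer;
    }

    public static void main(String[] args) throws Exception {
        Customer c = read(SAMPLE);
        System.out.println(c.firstName + " " + c.lastName + ", " + c.city);
    }
}
```

With JiBX this mapping would live in a binding definition instead of hand-written parsing code, which is what keeps the document format and the object structure independent of each other.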

JAXB:

JAXB generates code from a schema. It is heavily XML-dependent but still provides ways to customize the mapping through annotations. It is easy to work with but difficult to maintain, and it is not as elegant as XMLBeans or JiBX. To some extent it sits between the Java-centric JiBX and the XML-centric XMLBeans.

"JAXB among other things allows you to customize binding through schema annotation such that generated classes are more than just XML containers - they can represent objects with real behavior (and ability to be instantiated from XML)."

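JAXB has issues with deeply nested (and large) XSDs. It generates too many classes, and for large XML documents these simply grow unmanageable. (JAXB does not make any use of inner classes while generating code.)

JAXB-generated classes do not themselves give access to the underlying XML (parsing goes through a separate unmarshaller), whereas XMLBeans-generated classes allow parsing and reading the underlying XML directly. In that sense XMLBeans is more XML-centric.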

EMF:

EMF produces an Ecore model given an XML Schema and then uses template-based generator technology to generate a rich Java API (of hand-written quality). The XML Schema to Ecore conversion can be tailored, the templates used to generate the Java API can be tailored, and the resulting Java API can be tailored. The generator supports merging regeneration, so it preserves your hand-written changes. In other words, EMF is far richer and more flexible, and supports a broader subset of XML Schema.

XStream - simplest to use

XStream works with XPP3, a very fast XML pull-parser implementation. You are also free to choose another parser inside XStream, such as JAXP DOM.

XStream is designed for configurationless serialization. This makes it painless to serialize any type of objects, without the need for mappings. Ideal for things like persistence, configuration and over-the-wire transports. However, because there are no mappings, you have little control of how the serialized object is represented.
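As a rough illustration of what "configurationless" means, the toy sketch below serializes an object by reflecting over its fields, with no mapping file or annotations. This is not XStream's implementation or API. It handles only flat objects with simple field values, and the Person class and method names are invented for the example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class ReflectiveXmlSketch {

    // Hypothetical plain object with no serialization-specific code.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Escapes the characters that are not allowed in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Writes one element per instance field, using the field name as the element name.
    static String toXml(Object o) throws IllegalAccessException {
        Class<?> type = o.getClass();
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(type.getSimpleName()).append('>');
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // skip constants and compiler-generated fields
            }
            f.setAccessible(true);
            sb.append('<').append(f.getName()).append('>')
              .append(escape(String.valueOf(f.get(o))))
              .append("</").append(f.getName()).append('>');
        }
        sb.append("</").append(type.getSimpleName()).append('>');
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(new Person("Ada", 36)));
    }
}
```

The element names fall straight out of the class and field names, which is the convenience and also the limitation noted above: without mappings, the XML shape is dictated by the Java code.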

** Solution: combine XStream with XSLT to process SOAP messages based on complex XML schemas. XSLT then acts as your mapping definition, and it is much more powerful than plain declarative "binding" mappings, since it is a full-blown templating language.
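The XSLT-as-mapping idea can be sketched with the JDK's javax.xml.transform API. The stylesheet below flattens a small SOAP 1.1 envelope into a simple element that mirrors a plain Java object, which a serializer such as XStream could then read. The XStream step itself is left out because it needs the external library. The payload, the Order element, and the class name are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltMappingSketch {

    // Hypothetical SOAP 1.1 message with a nested payload.
    static final String SOAP =
        "<soap:Envelope xmlns:soap='http://schemas.xmlsoap.org/soap/envelope/'>"
      + "<soap:Body><order><customer><name>Ada</name></customer>"
      + "<total>42</total></order></soap:Body></soap:Envelope>";

    // The stylesheet plays the role of the mapping definition.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:soap='http://schemas.xmlsoap.org/soap/envelope/'"
      + " exclude-result-prefixes='soap'>"
      + "<xsl:template match='/'>"
      + "<Order>"
      + "<customerName><xsl:value-of"
      + " select='/soap:Envelope/soap:Body/order/customer/name'/></customerName>"
      + "<total><xsl:value-of"
      + " select='/soap:Envelope/soap:Body/order/total'/></total>"
      + "</Order>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet and returns the flattened XML as a string.
    static String flatten(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(flatten(SOAP));
    }
}
```

Because the stylesheet is ordinary XSLT, changes to the message format can be absorbed there without touching the Java classes.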

** custom converters are powerful

** XStream: how to serialize objects to non XML formats

XStream can marshal to and unmarshal from not just XML but also JSON, tree structures, and so on.

The modular design allows other output formats. XStream ships currently with JSON support and morphing.

So the bottom line: if you need precise control over how your objects are represented and are happy to spend time defining mappings, use JiBX. Otherwise, stick to XStream.

Reference : http://www.ibm.com/developerworks/library/x-databd4/

Saturday, April 16, 2011
