Monday, February 20, 2012

Detecting presence of CDATA in XML node using Java

Many times, while comparing two XML documents or parsing XML, we encounter a scenario in which a node value is enclosed in a CDATA section and we need to determine whether that is the case. Unfortunately, the Java DOM API doesn't have any built-in method such as hasCDATA() or something along similar lines.


What is CDATA?

The term CDATA refers to text data that should not be parsed by the XML parser.
Characters like "<" and "&" are illegal in XML element content unless they are escaped.
"<" will generate an error because the parser interprets it as the start of a new element.
"&" will generate an error because the parser interprets it as the start of a character entity.
Inside a CDATA section, these characters are treated as literal text.
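
As a rough illustration of the rules above, the sketch below parses three small XML strings with the standard JAXP DOM parser: one with a raw "<", one that wraps the same text in CDATA, and one that escapes it. The class name CDataParseDemo, the helper rootText and the salary column are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of any library.
public class CDataParseDemo {

    /** Returns the root element's text, or null if the XML is not well-formed. */
    public static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Raw "<" in element content: not well-formed, so rootText returns null.
        System.out.println(rootText("<sql>select * from t_employees where salary < 5</sql>"));
        // Same text inside a CDATA section: parsed as literal text.
        System.out.println(rootText("<sql><![CDATA[select * from t_employees where salary < 5]]></sql>"));
        // Same text with the character escaped as &lt;: also accepted.
        System.out.println(rootText("<sql>select * from t_employees where salary &lt; 5</sql>"));
    }
}
```

The first call should print null, while the other two should print the query text with a literal "<".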

Now consider an XML document such as the one below:

 
  <sql>select * from t_employees</sql>
  <sql><![CDATA[select * from t_employees]]></sql>
 


When we read the value of both sql nodes by calling getNodeValue() on their first child, we get the same output:

select * from t_employees
In normal circumstances this is fine, but in certain scenarios we need to know whether the node value is enclosed in CDATA or not. Consider the code below, which determines whether a node has a CDATA section:
import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// The class name is arbitrary.
public class CDataCheck {
    public static void main(String[] args) {
        try {
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            docBuilderFactory.setNamespaceAware(true);
            docBuilderFactory.setIgnoringElementContentWhitespace(true);
            docBuilderFactory.setIgnoringComments(true);
            DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
            Document doc = docBuilder.parse(new File("book.xml"));

            // Inspect the first child of every <sql> element.
            NodeList nd = doc.getElementsByTagName("sql");
            for (int i = 0; i < nd.getLength(); i++) {
                Element e = (Element) nd.item(i);
                Node first = e.getFirstChild();
                if (first == null) {
                    continue; // empty element, nothing to inspect
                }
                if (first.getNodeType() == Node.CDATA_SECTION_NODE) {
                    System.out.println("Node with value "
                            + first.getNodeValue().trim()
                            + " is enclosed in CDATA tag");
                } else {
                    System.out.println("Node with value "
                            + first.getNodeValue()
                            + " is not in CDATA tag");
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

When run, the above code gives the following output:
Node with value select * from t_employees is not in CDATA tag
Node with value select * from t_employees is enclosed in CDATA tag
In the above code we use the node type to determine whether the given node is a plain text node or a CDATA section. getNodeType() returns the value 4 (Node.CDATA_SECTION_NODE), of data type short, if the text is enclosed in a CDATA section; otherwise it returns the value 3 (Node.TEXT_NODE) for a normal text node.
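
The check above looks only at the first child, so an element whose CDATA section is preceded by whitespace or other text would be reported as not in CDATA. One possible way to get closer to the missing hasCDATA() method is to scan all direct children. The sketch below is only an illustration: the class CDataUtil and the methods hasCData and parseRoot are hypothetical names, not part of the DOM API. It also assumes coalescing is left off on the factory (the default), because a coalescing parser converts CDATA sections into ordinary text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical utility class, not part of the DOM API.
public class CDataUtil {

    /** Returns true if any direct child of the element is a CDATA section. */
    public static boolean hasCData(Element element) {
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.CDATA_SECTION_NODE) {
                return true;
            }
        }
        return false;
    }

    /** Parses an XML string and returns its root element. */
    public static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Keep coalescing off (the default); with it on, CDATA sections
            // are converted to plain text nodes and the check cannot work.
            factory.setCoalescing(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Plain text node only.
        System.out.println(hasCData(parseRoot("<sql>select * from t_employees</sql>")));
        // CDATA section as the first child.
        System.out.println(hasCData(parseRoot("<sql><![CDATA[select * from t_employees]]></sql>")));
        // CDATA section preceded by whitespace: the first child is a text node here.
        System.out.println(hasCData(parseRoot("<sql>\n  <![CDATA[select * from t_employees]]>\n</sql>")));
        // Empty element: no children at all.
        System.out.println(hasCData(parseRoot("<sql/>")));
    }
}
```

With these inputs the four calls should print false, true, true and false. The third case is the one that a first-child-only check would miss.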