Samplebadu

COBOL by Example: XML Parsing

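COBOL 2002 (XML PARSE itself is a vendor extension, best known from IBM Enterprise COBOL, rather than part of the ISO 2002 standard)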

Parsing XML documents with the event-based XML PARSE statement: SAX-style processing through a PROCESSING PROCEDURE callback, the XML-EVENT and XML-TEXT special registers, handling large documents without building a DOM tree, and tracking state manually for nested elements.

Code

       IDENTIFICATION DIVISION.
       PROGRAM-ID. XML-PARSE.
       
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  XML-DOC     PIC X(50) VALUE "<item><id>1</id></item>".
       
       PROCEDURE DIVISION.
           XML PARSE XML-DOC PROCESSING PROCEDURE XML-HANDLER.
           STOP RUN.
           
       XML-HANDLER.
           EVALUATE XML-EVENT
               WHEN 'START-OF-ELEMENT'
                   DISPLAY "Start: " XML-TEXT
               WHEN 'CONTENT-CHARACTERS'
                   DISPLAY "Content: " XML-TEXT
               WHEN 'END-OF-ELEMENT'
                   DISPLAY "End: " XML-TEXT
               WHEN 'EXCEPTION'
                   DISPLAY "Exception: parsing stopped"
           END-EVALUATE.

Explanation

XML PARSE is an event-based parser, similar in style to SAX. Instead of building a tree of the whole document, it scans the XML sequentially and calls a processing procedure for each significant event, such as a start tag, a run of character content, or an end tag.
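
To see that sequence of callbacks directly, the minimal sketch below prints every event in order together with its XML-TEXT. It assumes IBM Enterprise COBOL's XML PARSE; the program name XMLTRACE and the paragraph name TRACE-EVENT are illustrative, not part of any API.

```cobol
      * Illustrative names: XMLTRACE, TRACE-EVENT.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLTRACE.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * PIC X(23) matches the literal, so there is no padding.
       01  XML-DOC     PIC X(23) VALUE "<item><id>1</id></item>".

       PROCEDURE DIVISION.
           XML PARSE XML-DOC PROCESSING PROCEDURE TRACE-EVENT
           END-XML
           STOP RUN.

      * Called once per event: XML-EVENT names the event and
      * XML-TEXT carries its data (which may be empty).
       TRACE-EVENT.
           DISPLAY FUNCTION TRIM(XML-EVENT) " [" XML-TEXT "]".
```

The listing should also include events such as START-OF-DOCUMENT and END-OF-DOCUMENT, which the handler in the main example receives but ignores because its EVALUATE has no WHEN for them.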

You provide a PROCESSING PROCEDURE that handles these events. Before each call, the parser sets special registers such as XML-EVENT (the name of the event) and XML-TEXT (the data associated with it, such as an element name or a piece of content) for the procedure to inspect.
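
IBM Enterprise COBOL also sets a third special register, XML-CODE. When the document is malformed, the procedure receives an 'EXCEPTION' event; if XML-CODE is left nonzero, parsing stops and the ON EXCEPTION phrase of XML PARSE runs. The sketch below assumes that behavior. The program name XMLERR and the field WS-CODE are illustrative, and because exception code values differ between parser modes, the sketch only reports the number rather than interpreting it.

```cobol
      * Illustrative names: XMLERR, WS-CODE.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLERR.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * The closing </id> tag is deliberately missing.
       01  XML-DOC     PIC X(18) VALUE "<item><id>1</item>".
      * Display-friendly copy of XML-CODE (which is binary).
       01  WS-CODE     PIC S9(9) SIGN LEADING SEPARATE.

       PROCEDURE DIVISION.
           XML PARSE XML-DOC PROCESSING PROCEDURE XML-HANDLER
               ON EXCEPTION
                   DISPLAY "Parse failed, XML-CODE = " WS-CODE
               NOT ON EXCEPTION
                   DISPLAY "Parse succeeded"
           END-XML
           STOP RUN.

       XML-HANDLER.
           EVALUATE XML-EVENT
               WHEN 'EXCEPTION'
      * Leaving XML-CODE nonzero tells the parser to stop; the
      * ON EXCEPTION phrase then runs.
                   MOVE XML-CODE TO WS-CODE
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```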

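Because the parser never builds a tree of the document, memory use stays modest even for very large XML, especially where the compiler lets you pass a document to the parser in segments instead of holding it in one data item (IBM Enterprise COBOL supports this with its z/OS XML System Services parser). The trade-off is that you must manage state yourself, for example by remembering which element you are currently inside.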
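
A minimal sketch of that state tracking, assuming IBM Enterprise COBOL: the handler remembers the most recently opened element in a hypothetical field WS-CURRENT and keeps character content only while that element is id. It tracks a single level, so a document with deeper nesting would need a stack of element names instead. XMLSTATE and WS-ID are likewise illustrative names.

```cobol
      * Illustrative names: XMLSTATE, WS-CURRENT, WS-ID.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLSTATE.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * PIC X(38) matches the literal, so there is no padding.
       01  XML-DOC      PIC X(38)
           VALUE "<item><name>A</name><id>42</id></item>".
      * Name of the element the parser is currently inside.
       01  WS-CURRENT   PIC X(30) VALUE SPACES.
      * Value captured from the <id> element.
       01  WS-ID        PIC X(10) VALUE SPACES.

       PROCEDURE DIVISION.
           XML PARSE XML-DOC PROCESSING PROCEDURE XML-HANDLER
           END-XML
           DISPLAY "Captured id: " WS-ID
           STOP RUN.

       XML-HANDLER.
           EVALUATE XML-EVENT
               WHEN 'START-OF-ELEMENT'
      * Remember which element has just been opened.
                   MOVE XML-TEXT TO WS-CURRENT
               WHEN 'CONTENT-CHARACTERS'
      * Keep text only when it belongs to <id>.
                   IF WS-CURRENT = 'id'
                       MOVE XML-TEXT TO WS-ID
                   END-IF
               WHEN 'END-OF-ELEMENT'
      * One level only: closing any element clears the state.
                   MOVE SPACES TO WS-CURRENT
           END-EVALUATE.
```

For this document the "A" inside <name> is skipped, and the program should display the captured value 42.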

Code Breakdown

9
XML PARSE XML-DOC. Initiates the parsing of the document stored in XML-DOC.
9
PROCESSING PROCEDURE XML-HANDLER. Names the paragraph the parser runs for each event; when the paragraph ends, control returns to the parser, which continues with the next event.
13
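EVALUATE XML-EVENT. The core dispatch on the event type. XML-EVENT holds the name of the current event, such as 'START-OF-DOCUMENT', 'START-OF-ELEMENT', 'CONTENT-CHARACTERS', 'END-OF-ELEMENT', 'END-OF-DOCUMENT', or 'EXCEPTION'. Events with no matching WHEN, such as the start and end of the document, are simply ignored.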
14
WHEN 'START-OF-ELEMENT'. Triggered for each start tag, such as <item> or <id>. XML-TEXT contains the element name (e.g., "item").
16
WHEN 'CONTENT-CHARACTERS'. Triggered when text between tags is found. XML-TEXT contains the actual value (e.g., "1").
18
WHEN 'END-OF-ELEMENT'. Triggered when a closing tag such as </item> is found. XML-TEXT again contains the element name.
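
The breakdown treats each element's text as a single event, but depending on the compiler and parser mode, the text of one element can arrive in several events. IBM's COMPAT parser, for instance, reports a predefined entity reference such as &amp; as a separate 'CONTENT-CHARACTER' event. The sketch below, assuming IBM Enterprise COBOL, appends every piece to a buffer and only uses the text at END-OF-ELEMENT. The names XMLACCUM, WS-TEXT, and WS-PTR are illustrative, and the 100-character buffer size is an arbitrary assumption.

```cobol
      * Illustrative names: XMLACCUM, WS-TEXT, WS-PTR.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLACCUM.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * PIC X(25) matches the literal, so there is no padding.
       01  XML-DOC      PIC X(25)
           VALUE "<name>Tom &amp; Jo</name>".
      * Buffer for the collected text (size is an assumption).
       01  WS-TEXT      PIC X(100) VALUE SPACES.
      * Next free position in WS-TEXT.
       01  WS-PTR       PIC 9(4) BINARY VALUE 1.

       PROCEDURE DIVISION.
           XML PARSE XML-DOC PROCESSING PROCEDURE XML-HANDLER
           END-XML
           STOP RUN.

       XML-HANDLER.
           EVALUATE XML-EVENT
               WHEN 'START-OF-ELEMENT'
      * New element: reset the buffer and the write position.
      * (Only the innermost element's text is kept.)
                   MOVE SPACES TO WS-TEXT
                   MOVE 1 TO WS-PTR
               WHEN 'CONTENT-CHARACTERS'
               WHEN 'CONTENT-CHARACTER'
      * Append this piece after the text collected so far.
                   STRING XML-TEXT DELIMITED BY SIZE
                       INTO WS-TEXT WITH POINTER WS-PTR
                   END-STRING
               WHEN 'END-OF-ELEMENT'
      * All pieces are in; WS-PTR - 1 is the collected length.
                   IF WS-PTR > 1
                       DISPLAY XML-TEXT ": " WS-TEXT(1:WS-PTR - 1)
                   END-IF
           END-EVALUATE.
```

For the sample document this should print name: Tom & Jo, whether the parser delivers the text in one event or in three.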