Designing an Open, Standards-Based Reporting System - XML meets the challenges and design goals of a business reporting system

As XML has grown more prevalent as a data delivery mechanism, so too has the need to use it for presentation in a wide variety of reporting formats. XML is useful for more than just the delivery of information, however. It can be used to help solve a wide range of problems encountered when designing a business data reporting solution, from specifying the layout of the reports themselves to controlling where the data used in the report comes from.

At Panscopic, we develop an enterprise-class data reporting and analytics product that consists of two main elements: the Panscopic Scope Server, which takes report definition files and executes them to produce finished output, and the Scope Creation Suite, a set of client-side authoring tools that create the report definitions that the server executes (reports are called "Scopes" in Panscopic's parlance). Some of the challenges and design goals that we faced when designing the product architecture were:

  • Report definitions needed to be in a form, preferably text-based, that was familiar to developers and could be edited outside of our tools if necessary.
  • The report definition syntax needed to maintain a clear separation between the report data content and its presentation, and promote the reuse of basic report objects such as queries, layouts, and parameters.
  • The system had to be extensible, so that new data sources, layout components, etc., could be introduced over time and from outside of our development organization.
  • The product needed to be capable of extracting information directly from XML-formatted data in a natural, standardized way.

    We found that XML was particularly well suited to solving these design challenges. For example, the reports themselves are created and stored as XML files, which are then loaded and executed by the Scope server. Also, each report file contains a reference to one or more data sources that are exposed by the server. The server maintains this list of available data sources in another XML file, which an administrator can edit to point each entry at whatever data source is desired. Because the reports refer to data sources only through this abstraction layer, administrators can repoint a data source as often as they like without affecting the reports, as long as the new source follows the same schema as the old one. Finally, we used XPath, the W3C standard for addressing nodes within an XML document, to extract information from XML-formatted data.
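
The data source indirection described above can be sketched in a few lines of Python. This is only an illustration: the element names, attribute names, and connection URLs in the configuration text are invented for the example and are not the actual Panscopic configuration format.

```python
import xml.etree.ElementTree as ET

# Hypothetical server-side configuration: every element name, attribute
# name, and URL here is an assumption made for this sketch.
DATASOURCES_XML = """
<datasources>
  <datasource name="sales" type="rdbms" url="jdbc:example://db1/sales"/>
  <datasource name="news" type="xml" url="http://example.com/feeds/news.xml"/>
</datasources>
"""


def load_datasources(config_text):
    """Map each logical data source name to its connection details."""
    root = ET.fromstring(config_text)
    return {
        ds.get("name"): {"type": ds.get("type"), "url": ds.get("url")}
        for ds in root.findall("datasource")
    }


def resolve(registry, name):
    """Look up the logical name that a report refers to."""
    if name not in registry:
        raise KeyError(f"unknown data source: {name}")
    return registry[name]


registry = load_datasources(DATASOURCES_XML)
print(resolve(registry, "sales")["url"])
```

A report would carry only the logical name ("sales" in this sketch), so editing the url attribute in the configuration file repoints every report that uses that name.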

    The Report Definition Language
    To achieve our higher-level design goals, such as separating content from layout, promoting object reuse, and extracting specific information from XML data, we needed to go a little further with our design than just finding a representation for everything as XML tags. The solution we came up with was the Report Definition Language, or RDL (pronounced "riddle"). RDL is an XML-based file format that describes how a report retrieves its data, manipulates it, and realizes the result. RDL files are divided into top-level sections, each represented by an XML tag, that contain the different parts of the report. The most important sections of an RDL file are <rdl:parameters>, <rdl:content>, and <rdl:layout>. Listing 1 shows an example RDL file.
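
As a rough illustration of the three top-level sections named above, the following Python sketch loads a skeletal RDL-like document and locates each section. The root element name rdl:report and the namespace URI are assumptions made for the example; only the three section tag names come from the description above.

```python
import xml.etree.ElementTree as ET

# The namespace URI and the rdl:report root element are hypothetical.
RDL_NS = {"rdl": "urn:example:rdl"}

RDL_TEXT = """
<rdl:report xmlns:rdl="urn:example:rdl">
  <rdl:parameters/>
  <rdl:content/>
  <rdl:layout/>
</rdl:report>
"""


def top_level_sections(rdl_text):
    """Return the parameters, content, and layout elements keyed by name."""
    root = ET.fromstring(rdl_text)
    return {
        name: root.find(f"rdl:{name}", RDL_NS)
        for name in ("parameters", "content", "layout")
    }


sections = top_level_sections(RDL_TEXT)
for name, element in sections.items():
    print(name, "present" if element is not None else "missing")
```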

    The parameters section contains descriptions of the report's parameters, which act essentially like variables passed to the report at runtime. Parameters can either be fixed-form, meaning that the value is restricted to one from a predefined list of values, or free-form, in which case the value can be anything. Each of these parameters can be assigned to a form control on a Web page that supplies its value, or the value can be directly assigned in the URL that is used to request the report from the server. A parameter can also be assigned a default value to be used if the user supplies none, and can be marked as mandatory, indicating that the user must supply a value for it.
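
The parameter rules above (fixed-form versus free-form, defaults, and mandatory values) could be modeled along the following lines. This is a minimal sketch, not the Scope server's logic: the Parameter class, the resolve_parameters function, and the request URL are all hypothetical, and the sketch assumes that a mandatory parameter is rejected when the user supplies no value, even if a default exists.

```python
from dataclasses import dataclass
from typing import Optional, Sequence
from urllib.parse import parse_qsl, urlsplit


@dataclass
class Parameter:
    name: str
    allowed: Optional[Sequence[str]] = None  # a list means fixed-form; None means free-form
    default: Optional[str] = None
    mandatory: bool = False


def resolve_parameters(definitions, supplied):
    """Combine user-supplied values with the parameter definitions."""
    values = {}
    for param in definitions:
        value = supplied.get(param.name)
        if value is None:
            if param.mandatory:
                raise ValueError(f"missing mandatory parameter: {param.name}")
            value = param.default
        if value is not None and param.allowed is not None and value not in param.allowed:
            raise ValueError(f"{param.name}: {value!r} is not an allowed value")
        values[param.name] = value
    return values


definitions = [
    Parameter("region", allowed=("East", "West"), mandatory=True),
    Parameter("year", default="2003"),
]

# Values assigned directly in the request URL (a made-up URL for this sketch).
url = "http://example.com/scopes/sales?region=West"
supplied = dict(parse_qsl(urlsplit(url).query))
print(resolve_parameters(definitions, supplied))
```

Here region is fixed-form and mandatory, while year is free-form and falls back to its default because the URL does not supply it.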

    The content section indicates where the data for the report will be drawn from. Inside the content section are one or more <rdl:data> tags, each of which specifies a data source that supplies data to the report. These data sources are maintained in a list by the server (as described earlier); Listing 2 shows an example of a data source entry that might appear in the server's configuration file. Each data tag contains sub-tags that are specific to the type of data source being accessed. The example shown in Listing 1 is using a connection that resolves to a relational data source (as indicated by the <rdl:rdbms> tag). The type of data source and the way it exposes data columns is kept abstracted from the layout by the <rdl:return> section contained within the data section. It is the job of the <rdl:return> section to expose the columns of data returned by the data source to the layout section in a uniform way. This approach allows vastly different types of data and layout components to be hooked together seamlessly.
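
One way to picture the uniform column interface described above is a function that turns relational results into plain rows keyed by column name, so that a layout routine never sees the underlying source. The sketch below uses Python's built-in sqlite3 module as a stand-in relational source; the function names, table, and data are invented for the illustration and do not reflect how the Scope server is implemented.

```python
import sqlite3


def rows_from_rdbms(conn, query, columns):
    """Run a SQL query and expose the named columns as plain dict rows."""
    cursor = conn.execute(query)
    names = [description[0] for description in cursor.description]
    return [{c: record[names.index(c)] for c in columns} for record in cursor]


def render_text(rows, columns):
    """A stand-in layout component: it sees only column names and row dicts."""
    lines = [" | ".join(columns)]
    lines += [" | ".join(str(row[c]) for c in columns) for row in rows]
    return "\n".join(lines)


# An in-memory database with made-up data, standing in for a real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("West", 120.5), ("East", 75.0)])

rows = rows_from_rdbms(
    conn, "SELECT region, amount FROM sales ORDER BY region", ["region", "amount"]
)
print(render_text(rows, ["region", "amount"]))
```

Because render_text depends only on the row dictionaries, rows produced from an XML source in the same shape could be passed to it unchanged.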

    The layout section determines how the report will be visualized for the user. Of course, not all reports are necessarily consumed by humans: the report may be delivered in XML format for consumption by another service. Inside the layout section are one or more <rdl:useComponent> tags, which refer to layout components used to format the data specified in the content section. Layout components are specified and configured in XML, but are implemented on the back end as JSP pages. Each component has the built-in ability to realize the report as one of a number of different formats, such as HTML, XML, or PDF.

    By keeping these sections distinctly separate, the report is broken up into its constituent parts, each of which can be saved and reused in other reports. For example, a query that is written for one report can easily be saved and stored in the server's network-accessible catalog for use by another developer in a different report, possibly with an entirely different layout. Similarly, a particular layout can be used again and again with other data queries.

    Extracting Data from XML Data Sources
    To address the design requirement of being able to extract information directly from an XML data source, we turned to XPath, the W3C standard for navigating among the nodes of an XML structure.

    The example shown in Listing 3 illustrates a content section that is using an XML data source (indicated here by the <rdl:xmlsource> tag). You can see from this example that the <rdl:return> section mentioned earlier makes use of certain attributes that contain XPath syntax within them. The <rdl:return> tag itself has a "selectNode" attribute, and each of the <rdl:column> tags has a "fieldPath" attribute. These attributes contain XPath expressions that refer to specific tags in a returned XML data structure.

    The selectNode attribute identifies a set of nodes in the XML data that corresponds to repeating, "record-style" information from which data is to be extracted. The Scope server iterates over the set of nodes that matches this expression and evaluates the <rdl:column> tags' fieldPath attribute expressions against each of those nodes. In this way, the data is extracted from the XML and presented to the layout section in the same way that two-dimensional data from traditional JDBC sources is, reducing the complexity and required learning curve for the developer. In addition, the XPath expressions can be written to provide further filtering and processing on the returned data.
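
The iteration described above can be sketched with Python's standard xml.etree.ElementTree module, which implements a limited subset of XPath. The selectNode and fieldPath attribute names come from the description above; the namespace URI, the name attribute on each column, and the sample order data are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = {"rdl": "urn:example:rdl"}  # hypothetical namespace URI

# A hypothetical return section: selectNode picks the record-style nodes and
# also filters them; each fieldPath is evaluated relative to one such node.
RETURN_SECTION = """
<rdl:return xmlns:rdl="urn:example:rdl" selectNode="./order[status='shipped']">
  <rdl:column name="customer" fieldPath="customer/name"/>
  <rdl:column name="total" fieldPath="total"/>
</rdl:return>
"""

# Made-up XML data standing in for what an XML data source might return.
XML_DATA = """
<orders>
  <order><status>shipped</status><customer><name>Acme</name></customer><total>120.50</total></order>
  <order><status>pending</status><customer><name>Globex</name></customer><total>75.00</total></order>
  <order><status>shipped</status><customer><name>Initech</name></customer><total>310.00</total></order>
</orders>
"""


def extract_rows(return_text, data_text):
    """Turn XML data into flat rows, one per node matched by selectNode."""
    ret = ET.fromstring(return_text)
    data = ET.fromstring(data_text)
    columns = [(c.get("name"), c.get("fieldPath")) for c in ret.findall("rdl:column", NS)]
    rows = []
    for node in data.findall(ret.get("selectNode")):
        rows.append({name: node.findtext(path) for name, path in columns})
    return rows


for row in extract_rows(RETURN_SECTION, XML_DATA):
    print(row)
```

The predicate in selectNode shows the kind of filtering an XPath expression can add: only the two shipped orders become rows, and each row has the same flat, column-oriented shape that a relational query would produce.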

    Conclusion
    Using XML to meet our design requirements had several beneficial results. First, we were able to take advantage of a wide range of available open-source code to perform common XML operations, such as parsing the report files to build DOM trees for editing and using SAX to process the files on the server. Second, using XML allowed us to keep the format of our reports open and text-based, which in turn allows developers to use whichever tools they are comfortable with and to work with a syntax with which they are already familiar. This also made it easy for us to define extensibility APIs that allow customers to add their own components to the product in a uniform, easily understood way, and that simplify administration tasks such as adding new data sources to the system. Finally, we are better able to take advantage of new XML technologies as they become available, such as XPath and XQuery, for working with native XML data.

    About Joe Marini
    Joe Marini is a senior engineer at Panscopic Corporation (www.panscopic.com), an XML- and J2EE standards–based reporting solution provider. Joe has written and collaborated on a series of books about Web development.

    Reader Feedback

    Couldn't agree with Joe more strongly. We have just completed the design of a complete legislative information system for a large state Senate, using XML as the foundation of a unified information life cycle. The Senate took the innovative step of also mandating integration of many of its legacy systems in the XML environment. We found appropriate software to support this and have developed the entire enterprise information model based on XML. My own conviction is that this is the future of complex data.