Sunday, September 4, 2022

Parsing XML with missing namespace

Today I needed to parse some JUnit reports generated by some old code, so they are missing namespaces. I created a trivial XSD file, but it could not validate the XML because the XML did not declare a matching namespace.

Just a side note: if you need the current XSD file, you can find it here.

The following code is able to parse JUnit-like reports from old Ant-based Jakarta EE 10 tests, so I can integrate the TCK into a more modern build in GlassFish (another part of my "army of zombies", which ensures I will not make any mistakes when refactoring GlassFish; these TCK tests are waiting to be refactored too).

import jakarta.xml.bind.JAXBContext;
import jakarta.xml.bind.Unmarshaller;
import java.io.File;
import java.io.FileInputStream;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import xxx.generated.Testsuite;

public class JUnitResultsParser {

public Testsuite parse(final File xml) {
    try (FileInputStream inputStream = new FileInputStream(xml)) {
        JAXBContext context = JAXBContext.newInstance(Testsuite.class);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(getClass().getResource("/junit-results.xsd"));
        unmarshaller.setSchema(schema);
        unmarshaller.setAdapter(new ClassAdapter());
        XMLInputFactory xif = XMLInputFactory.newFactory();
        XMLStreamReader xsr = new NamespaceURIFixer(xif.createXMLStreamReader(inputStream));
        Testsuite testSuite = (Testsuite) unmarshaller.unmarshal(xsr);
        return testSuite;
    } catch (Exception e) {
        throw new IllegalArgumentException("Could not process the XML file: " + xml, e);
    }
}


private static class NamespaceURIFixer extends StreamReaderDelegate {
    public NamespaceURIFixer(XMLStreamReader reader) {
        super(reader);
    }

    @Override
    public String getNamespaceURI() {
        String uri = super.getNamespaceURI();
        if (uri == null) {
            // same as in xsd
            return "urn:org:junit:results";
        }
        return uri;
    }
}
}
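
To see the effect of the delegate in isolation, here is a minimal sketch that uses only the StAX classes bundled with the JDK, without JAXB or the XSD. The class name NamespaceFixDemo, the helper rootNamespace and the sample XML strings are hypothetical and exist only for this illustration. It also treats an empty string like null, on the assumption that some StAX implementations may report a missing namespace that way; the parser above checks for null only.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

class NamespaceFixDemo {

    // Same value as the target namespace used in the XSD.
    static final String TARGET_NS = "urn:org:junit:results";

    /** Reports TARGET_NS wherever the underlying reader reports no namespace. */
    static class NamespaceURIFixer extends StreamReaderDelegate {
        NamespaceURIFixer(XMLStreamReader reader) {
            super(reader);
        }

        @Override
        public String getNamespaceURI() {
            String uri = super.getNamespaceURI();
            // Assumption: treat an empty string the same way as null.
            return (uri == null || uri.isEmpty()) ? TARGET_NS : uri;
        }
    }

    /** Returns the namespace URI reported for the root element of the given XML. */
    static String rootNamespace(String xml, boolean fix) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            if (fix) {
                reader = new NamespaceURIFixer(reader);
            }
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getNamespaceURI();
                    }
                }
                throw new IllegalArgumentException("No element found in: " + xml);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read the XML: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String bare = "<testsuite name=\"demo\"/>";
        System.out.println("without fixer: " + rootNamespace(bare, false));
        System.out.println("with fixer:    " + rootNamespace(bare, true));
        // A namespace declared in the document is passed through unchanged.
        System.out.println("declared:      "
            + rootNamespace("<testsuite xmlns=\"urn:other\"/>", true));
    }
}
```

Because the delegate overrides only getNamespaceURI(), everything else (names, attributes, text) still comes straight from the wrapped reader.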

Oh, and if you want to see how I generated the remaining classes, here it is:
<plugins>
    <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>jaxb2-maven-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
            <execution>
                <id>xjc</id>
                <goals>
                    <goal>xjc</goal>
                </goals>
                <configuration>
                    <addGeneratedAnnotation>true</addGeneratedAnnotation>
                    <clearOutputDir>true</clearOutputDir>
                    <locale>en</locale>
                    <sources>
                        <source>src/main/resources/junit-results.xsd</source>
                    </sources>
                    <xjbSources>
                        <xjbSource>src/main/resources/junit-results.xjb</xjbSource>
                    </xjbSources>
                    <packageName>xxx.generated</packageName>
                </configuration>
            </execution>
        </executions>
    </plugin>
</plugins>

OK, OK, you want to know what I did in that XJB file... I noticed there are Class values, and I did not want to get just a String. So I created a trivial XmlAdapter<String, Class<?>> that uses Class.forName() and Class.getName() to convert between String and Class, and that's it! By the way, I found a minor bug: to my surprise, I could use generics in the XML, but the generated code was then missing a space after the class name. When I added the space to the XML, it was kept in the generated source code. Perhaps I should create an issue for that...
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<jaxb:bindings version="3.0"
    xmlns:jaxb="https://jakarta.ee/xml/ns/jaxb"
    xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    jaxb:extensionBindingPrefixes="xjc"
>
    <jaxb:bindings schemaLocation="junit-results.xsd" node="/xsd:schema">
        <jaxb:bindings node="//xsd:simpleType[@name='javaClassName']">
            <xjc:javaType name="java.lang.Class&lt;?&gt; " adapter="org.glassfish.main.tests.tck.ant.xml.ClassAdapter" />
        </jaxb:bindings>
    </jaxb:bindings>
</jaxb:bindings>

The javaClassName type is defined this way in the XSD:
    <xsd:simpleType name="javaClassName">
        <xsd:restriction base="xsd:string" />
    </xsd:simpleType>
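
The conversion the adapter performs can be sketched with plain JDK calls. This is not the actual ClassAdapter source; the class ClassNameConversion and its method names are hypothetical. In a real XmlAdapter<String, Class<?>>, the two methods would roughly be the bodies of unmarshal(String) and marshal(Class<?>).

```java
class ClassNameConversion {

    /** String from the XML to Class; roughly what unmarshal would do. */
    static Class<?> toClass(String className) {
        if (className == null) {
            return null;
        }
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            // Happens whenever the named class is not on the classpath.
            throw new IllegalArgumentException("Class not found: " + className, e);
        }
    }

    /** Class back to the String written to the XML; roughly what marshal would do. */
    static String toName(Class<?> type) {
        return type == null ? null : type.getName();
    }

    public static void main(String[] args) {
        System.out.println(toClass("java.lang.String"));
        System.out.println(toName(java.util.List.class));
        try {
            toClass("no.such.TestClass");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that Class.getName() returns the binary name (nested classes use a $ separator), which is also the form Class.forName() expects, so the two directions round-trip.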

And now the icing on the cake: I can run the TCK tests against a GlassFish snapshot right from Eclipse. Yes, it is not without issues: the tests cannot simply run one after another, because they leave some garbage behind. But still, I can easily reproduce an issue locally, configure logging, or even attach a debugger (I have not tried that yet; it may need some settings).


Last note: this was rather for fun, because my side does not see the test classes inside the TCK, so obviously javaClassName cannot be converted to a Class<?>: the required class is not on my classpath. However, I had not done any XJB customizations for several years, so I had to try it :-)
