Sunday, September 4, 2022

Parsing XML with a Missing Namespace

Today I needed to parse some JUnit reports generated by old code, so they have no namespace. I created a trivial XSD file, but it could not validate the XML, because the XML did not use the matching namespace.

Just a side note: if you need the current XSD file, you can find it here.

The parser below reads JUnit-like reports from the old Ant-based Jakarta EE 10 tests, so I can integrate the TCK into the more modern GlassFish build (another part of my "army of zombies" that makes sure I don't make mistakes while refactoring GlassFish; these TCK tests are waiting to be refactored too).

import jakarta.xml.bind.JAXBContext;
import jakarta.xml.bind.Unmarshaller;
import java.io.File;
import java.io.FileInputStream;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import xxx.generated.Testsuite;

public class JUnitResultsParser {

    public Testsuite parse(final File xml) {
        try (FileInputStream inputStream = new FileInputStream(xml)) {
            JAXBContext context = JAXBContext.newInstance(Testsuite.class);
            Unmarshaller unmarshaller = context.createUnmarshaller();
            // validate against the XSD found on the classpath
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(getClass().getResource("/junit-results.xsd"));
            unmarshaller.setSchema(schema);
            // converts javaClassName values to Class<?> (see the XJB binding below)
            unmarshaller.setAdapter(new ClassAdapter());
            XMLInputFactory xif = XMLInputFactory.newFactory();
            // wrap the reader, so elements without a namespace look like they are in the XSD's namespace
            XMLStreamReader xsr = new NamespaceURIFixer(xif.createXMLStreamReader(inputStream));
            return (Testsuite) unmarshaller.unmarshal(xsr);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process the XML file: " + xml, e);
        }
    }


    private static class NamespaceURIFixer extends StreamReaderDelegate {
        NamespaceURIFixer(XMLStreamReader reader) {
            super(reader);
        }

        @Override
        public String getNamespaceURI() {
            String uri = super.getNamespaceURI();
            // StAX implementations report "no namespace" as null or as an empty string
            if (uri == null || uri.isEmpty()) {
                // same as the targetNamespace in the XSD
                return "urn:org:junit:results";
            }
            return uri;
        }
    }
}
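If you want to see what JAXB actually gets from the wrapped reader without the generated classes, a small JDK-only experiment helps. This is a sketch that reuses the same delegate idea; the class name NamespaceFixerDemo and the method elementNames are made up just for the demo. It prints each start element in the "{namespace}localName" form.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

/** Hypothetical demo: what a consumer such as JAXB sees through the namespace-fixing delegate. */
class NamespaceFixerDemo {

    static final String DEFAULT_NS = "urn:org:junit:results";

    /** Same idea as NamespaceURIFixer: report DEFAULT_NS for elements without a namespace. */
    static class NamespaceURIFixer extends StreamReaderDelegate {
        NamespaceURIFixer(XMLStreamReader reader) {
            super(reader);
        }

        @Override
        public String getNamespaceURI() {
            String uri = super.getNamespaceURI();
            return uri == null || uri.isEmpty() ? DEFAULT_NS : uri;
        }
    }

    /** Returns "{namespace}localName" for every start element of the document. */
    static List<String> elementNames(String xml) {
        try {
            XMLStreamReader reader = new NamespaceURIFixer(
                    XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add("{" + reader.getNamespaceURI() + "}" + reader.getLocalName());
                }
            }
            reader.close();
            return names;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }
}
```

For `<testsuite><testcase/></testsuite>` it reports both elements in urn:org:junit:results, while an element that declares its own namespace keeps it, so only the "missing namespace" case is patched.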

Oh, and if you want to see how I generated the remaining classes, this is the plugin configuration:
<plugins>
    <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>jaxb2-maven-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
            <execution>
                <id>xjc</id>
                <goals>
                    <goal>xjc</goal>
                </goals>
                <configuration>
                    <addGeneratedAnnotation>true</addGeneratedAnnotation>
                    <clearOutputDir>true</clearOutputDir>
                    <locale>en</locale>
                    <sources>
                        <source>src/main/resources/junit-results.xsd</source>
                    </sources>
                    <xjbSources>
                        <xjbSource>src/main/resources/junit-results.xjb</xjbSource>
                    </xjbSources>
                    <packageName>xxx.generated</packageName>
                </configuration>
            </execution>
        </executions>
    </plugin>
</plugins>

OK, OK, you want to know what I did in that XJB file... I noticed that some values are class names, and I didn't want to get just a String. So I created a trivial XmlAdapter<String, Class<?>> that uses Class.forName() and Class.getName() to convert between String and Class, and that's it! By the way, I found a minor bug: to my surprise, I could use generics in the XJB file, but the generated code is missing a space after the class name. When I added the space to the XJB file (note the trailing space in "java.lang.Class<?> " below), it was kept even in the generated source code. Perhaps I should create an issue for that...
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<jaxb:bindings version="3.0"
    xmlns:jaxb="https://jakarta.ee/xml/ns/jaxb"
    xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    jaxb:extensionBindingPrefixes="xjc"
>
    <jaxb:bindings schemaLocation="junit-results.xsd" node="/xsd:schema">
        <jaxb:bindings node="//xsd:simpleType[@name='javaClassName']">
            <xjc:javaType name="java.lang.Class&lt;?&gt; " adapter="org.glassfish.main.tests.tck.ant.xml.ClassAdapter" />
        </jaxb:bindings>
    </jaxb:bindings>
</jaxb:bindings>

The javaClassName type is defined in the XSD this way:
    <xsd:simpleType name="javaClassName">
        <xsd:restriction base="xsd:string" />
    </xsd:simpleType>
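The adapter itself is not shown in the post, so here is only a minimal sketch of its two conversions, written as plain static methods to avoid the Jakarta XML Binding dependency. The class name ClassConversions is hypothetical; in a real ClassAdapter extending jakarta.xml.bind.annotation.adapters.XmlAdapter<String, Class<?>>, these bodies would go into the overridden unmarshal() and marshal() methods.

```java
/**
 * Hypothetical stand-in for ClassAdapter, without the Jakarta XML Binding dependency.
 * In a real XmlAdapter<String, Class<?>> these would be the bodies of unmarshal() and marshal().
 */
class ClassConversions {

    /** XML value to Class; fails when the class is not on the classpath. */
    static Class<?> unmarshal(String className) {
        if (className == null) {
            return null;
        }
        try {
            return Class.forName(className.trim());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not on the classpath: " + className, e);
        }
    }

    /** Class to XML value. */
    static String marshal(Class<?> type) {
        return type == null ? null : type.getName();
    }
}
```

Note that unmarshal() fails for any class that is not visible to the class loader, which is exactly the situation described in the last note of this post.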

And now the cherry on top: I can run the TCK tests against a GlassFish snapshot right from Eclipse. Yeah, it is not without issues; the tests simply cannot run one after another, because they leave some garbage behind... but still, I can easily reproduce an issue locally, configure logging, or even attach a debugger (I haven't tried that yet; it may need some additional settings).


Last note: the adapter was mostly for fun, because my side doesn't see the test classes inside the TCK, so javaClassName obviously cannot be converted to a Class<?> when the required class is not on my classpath. However, I hadn't done any XJB customizations for several years, so I had to try it :-)

Wednesday, August 10, 2022

Asciidoc Cleanup in GlassFish Documentation

I always feel like Frankenstein when I do such things 🤣

When you take a look at the pull request I created here, you will perhaps understand why I did it.

First I tried to read, search, and fix everything one by one, then I tried regular expressions in Eclipse, but at some 15 seconds per change... I would have spent a year on that. So after several hours I gave up: time to reinvent the wheel.

#!/bin/bash
# each line means a set of ids of the same anchor
# the last id is usually the most descriptive and should be used
# the others should be removed
ids=$(grep -h -o -r './' --include=\*.adoc --exclude-dir=target -e '^\(\[\[[0-9A-Za-z_\\-]\+\]\]\)\+' | tr -d "[" | tr -s "]]" ",")

This finds all label blocks like the one below. I was interested in places with multiple labels. Why would someone need three labels plus the implicit one? Nobody did. But if some tool had generated them, you simply added another one. And at that time disks were slow; replacing a label using full-text search... eh, damn it, let's create another one.
[[ghmrf]][[GSACG00088]][[osgi-alliance-module-management-subsystem]]


for line in ${ids} ; do
  IFS=','
  labels=($line);
  unset IFS;
  len=${#labels[@]};

  • IFS is the internal field separator. Don't forget to unset it, otherwise it affects further parsing.
  • (...) creates an array.
  • ${#labels[@]} is the length of the array... don't make me explain this syntax, please...
So, what do we have now:
  • number of labels on the same line
  • if the number is 1, everything is alright
  • if the number is greater, we want to choose one of them (the last one was usually the most descriptive), and get rid of the rest.
  • but how? I can't keep everything in my head, so let's give all variables descriptive names.

  if [[ $len != 1 ]]
  then
    correctId=${labels[$len-1]};
    maxIncorrectIdIndex=$(($len-2));

Do you see that evil thing? Bash doesn't subtract 2 from len without the $(( )) arithmetic expansion, ha! It took me a while to figure that out.

    for i in $(seq 0 $maxIncorrectIdIndex) ; do
      redundantId=${labels[$i]};
      if [[ "$redundantIds" == *",$redundantId,"* ]]; then
        echo "Duplicate id must be fixed first: ${redundantId}";
        grep -o -r './' --include=\*.adoc --exclude-dir=target -e '^\(\[\[[0-9A-Za-z_\\-]\+\]\]\)\+' | grep "\[${redundantId}\]";
        exit 1;
      fi
      redundantIds="${redundantIds},${redundantId},"

This was quite funny: originally I tried to google some Set implementation for bash, but in the end I concluded that all those implementations were a bit of an overkill. I needed just a string containing all labels found so far, and when I found a redundant label colliding with another redundant label, I forced the user, myself, to resolve these conflicts first.

If I replaced them automatically with something else... it could create invalid xrefs.

Time for changes. The truth is that these commands could be optimized, but why would I do that? I needed to run it just once and then commit, push, and drink a beer or an iced coffee... while the script went through some 500 files, replacing usages of the redundant labels with the chosen one and then deleting all the redundant declarations.

      echo "Replacing $redundantId by $correctId and removing [[$redundantId]] labels...";
      find . -type f -name '*.adoc' ! -wholename '*/target/*' -exec sed -i -- "s/#${redundantId}/#${correctId}/g" {} +;
      find . -type f -name '*.adoc' ! -wholename '*/target/*' -exec sed -i -- "s/\[\[${redundantId}\]\]//g" {} +;
    done;
  fi;
done;
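For readers who prefer to see the logic of the loop without the bash quirks, here is a rough Java sketch of the same idea, not the script I actually ran. The class and method names (RedundantLabels, plan, apply) are hypothetical. Each input line is one comma-separated group as produced by the grep/tr pipeline; the last id wins, and a redundant id seen twice stops everything, just like the string-based "set" in the script.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Java version of the label cleanup loop. */
class RedundantLabels {

    /** Maps every redundant id to the id that replaces it (the last one in its group). */
    static Map<String, String> plan(List<String> lines) {
        Map<String, String> replacements = new LinkedHashMap<>();
        for (String line : lines) {
            // split() drops the trailing empty field, like the IFS split in bash
            String[] labels = line.split(",");
            if (labels.length == 1) {
                continue; // a single label: everything is alright
            }
            String correctId = labels[labels.length - 1];
            for (int i = 0; i < labels.length - 1; i++) {
                String redundantId = labels[i];
                // the same redundant id in two groups would be replaced ambiguously
                if (replacements.containsKey(redundantId)) {
                    throw new IllegalStateException("Duplicate id must be fixed first: " + redundantId);
                }
                replacements.put(redundantId, correctId);
            }
        }
        return replacements;
    }

    /**
     * Applies the plan to one document, like the two sed commands: rewrites "#redundant"
     * references and removes "[[redundant]]" declarations. Like sed, it also rewrites ids
     * that merely start with a redundant id.
     */
    static String apply(String adoc, Map<String, String> replacements) {
        String result = adoc;
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            result = result.replace("#" + e.getKey(), "#" + e.getValue())
                    .replace("[[" + e.getKey() + "]]", "");
        }
        return result;
    }
}
```

With the ids from the example above, plan() maps both ghmrf and GSACG00088 to osgi-alliance-module-management-subsystem.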

And finally, one more thing... replace link: with xref: where possible, because then Asciidoctor can validate these references. I found that some types of mistakes can still slip through (remember those collisions?), but it is still an awesome tool.

echo "Replacing link references by xref where it is possible.";

find . -type f -name '*.adoc' ! -wholename '*/target/*' -exec sed -i -r -- 's/link:([a-zA-Z0-9\-]+)\.html#/xref:\1.adoc#/g' {} +;

find . -type f -name '*.adoc' ! -wholename '*/target/*' -exec sed -i -r -- 's/link:#/xref:#/g' {} +;

echo "Done.";
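To check which references the two sed expressions touch, the same substitutions can be tried in Java first. A small sketch (the class name LinkToXref is hypothetical); the character class drops the backslash that the POSIX bracket expression in sed also accepts, otherwise the patterns are the same.

```java
import java.util.regex.Pattern;

/** Hypothetical Java equivalent of the two link-to-xref sed substitutions. */
class LinkToXref {

    // link:some-page.html#anchor -> xref:some-page.adoc#anchor (page names of letters, digits, hyphens)
    private static final Pattern PAGE_LINK = Pattern.compile("link:([a-zA-Z0-9-]+)\\.html#");
    // link:#anchor (same document) -> xref:#anchor
    private static final Pattern LOCAL_LINK = Pattern.compile("link:#");

    static String convert(String adoc) {
        String result = PAGE_LINK.matcher(adoc).replaceAll("xref:$1.adoc#");
        return LOCAL_LINK.matcher(result).replaceAll("xref:#");
    }
}
```

External links such as link:https://example.com/page.html#x[web] stay untouched, because "https:" does not match the page-name pattern.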

Then I ran mvn clean install; it failed, reporting some remaining issues, so I went back to Eclipse and fixed them within an hour. And this is the result in PDF (Okular) and HTML (Opera).



Tuesday, August 2, 2022

How To Find Suspend/WakeUp State Changes in Syslog

Some time ago I needed to find out when my laptop "went to bed" and when it "woke up". I found that it was not as simple as I had expected, and even worse, I didn't find an answer on StackOverflow. So I answered a similar question myself.

But in Kubuntu 22.04 the log messages changed, so I updated the command to the following form. The zgrep command can also search the compressed (rotated) syslog files:

sudo zgrep -e 'systemd-sleep' /var/log/syslog* | grep -e "Entering sleep state\|System returned from sleep"

The output is not ordered, but I am too lazy to implement that; it would also be much more than I needed.
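If you do need ordered output, a sketch like the following could post-process the lines. It is only an illustration under assumptions: it expects the traditional syslog timestamp ("Aug  2 07:30:12"), possibly preceded by the "file:" prefix zgrep adds when searching several files, and it ignores the year, so a suspend in December followed by a wake-up in January would sort wrongly. The class name SleepLogSorter is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper ordering zgrep output by the syslog timestamp (year is ignored). */
class SleepLogSorter {

    private static final List<String> MONTHS = List.of(
            "Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");

    // optional "file:" prefix added by zgrep, then "Mon  d HH:mm:ss" followed by whitespace
    private static final Pattern TIMESTAMP = Pattern.compile(
            "^(?:[^:]*:)?([A-Z][a-z]{2})\\s+(\\d{1,2}) (\\d{2}:\\d{2}:\\d{2})\\s");

    /** Builds a sortable key such as "08-02 07:30:12"; lines without a timestamp sort last. */
    static String sortKey(String line) {
        Matcher m = TIMESTAMP.matcher(line);
        if (!m.find() || !MONTHS.contains(m.group(1))) {
            return "~" + line;
        }
        int month = MONTHS.indexOf(m.group(1)) + 1;
        int day = Integer.parseInt(m.group(2));
        return String.format(Locale.ROOT, "%02d-%02d %s", month, day, m.group(3));
    }

    /** Returns a copy of the lines ordered by their timestamp. */
    static List<String> sort(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparing(SleepLogSorter::sortKey));
        return sorted;
    }
}
```

Piping the zgrep output into a program that reads stdin line by line and calls sort() would be enough; I haven't tried it against real logs.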



Wednesday, July 13, 2022

Native2ascii Maven Plugin Is Still Used

This will be a rather short message: today I spent some time with the plugin, because someone created a security issue related to obsolete dependencies. So I did a bit more...

  • Migrated from Travis to GitHub Actions
  • Excluded obsolete Velocity and Struts transitive dependencies (they were unused anyway...)
  • Replaced deprecated usages of commons-lang3 classes
  • Replaced "closeSilently" with try-with-resources (do you remember the years when there were no suppressed exceptions?)
Although you can use property files in UTF-8 now, Latin-1 with escapes is still more common. Then it is better to write property files in a human language and use this plugin to convert them to the "language of machines". The Eclipse integration of this plugin works well too.
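What the conversion does is simple enough to sketch. This is not the plugin's code, just a minimal illustration with a hypothetical class Native2AsciiSketch: every character outside ASCII becomes a Java unicode escape, and java.util.Properties reads the escaped file back to the original text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch of the native2ascii conversion (not the plugin's implementation). */
class Native2AsciiSketch {

    /** Replaces every non-ASCII char with a backslash-u escape of four lowercase hex digits. */
    static String toAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            // chars are UTF-16 code units, so characters outside the BMP become
            // two escapes (a surrogate pair), which Properties.load also accepts
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    /** Loads escaped property text the same way Java reads a Latin-1 properties file. */
    static Properties load(String escaped) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(escaped));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }
}
```

So a line like name=Jiří ends up as name=Ji followed by the escapes 0159 and 00ed, which any Latin-1 reader handles.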

See https://github.com/mojohaus/native2ascii-maven-plugin