středa 10. srpna 2022

Asciidoc Cleanup in GlassFish Documentation

 I always feel like Frankenstein when I am doing such things 🤣

When you take a look here on the pull request I created, you will perhaps understand why I did it. 

First I tried to read+search+fix everything one by one, then I tried to use regular expressions in Eclipse, but with some 15 seconds on every change ... I would spend year with that. So after several hours I resigned, time to invent the wheels.

#!/bin/bash
# each line means a set of ids of the same anchor
# the last id is usually the most descriptive and should be used
# the others should be removed
ids=$(grep -h -o -r './' --include=\*.adoc --exclude-dir=target -e '^\(\[\[[0-9A-Za-z_\\-]\+\]\]\)\+' | tr -d "[" | tr -s "]]" ",")

This found all those blocks like this one with labels. I was interested in multiple labels of the same place. Why would someone need three labels plus the implicit one? He didn't. But if some tool generated them, you simply added another. And at that time disks were slow, replacing a label by fulltext search ... eh, damn it, let's create another one.
[[ghmrf]][[GSACG00088]][[osgi-alliance-module-management-subsystem]]


for line in ${ids} ; do
IFS=','
labels=($line); 
unset IFS;
len=${#labels[@]};

  • IFS is a separator.  Don't forget to unset it, because it affects further parsing otherwise.
  • () is an array.
  • length of the array ... don't let me explain this syntax, please ...
So, what we have now:
  • number of labels on the same line
  • if the number is 1, everything is alright
  • if the number is greater, we want to choose one of them (the last one was usually the most descriptive), and get rid of the rest.
  • but how? I can't keep everything in my head, so let's give names to all variables.

if [[ $len != 1 ]]
then
correctId=${labels[$len-1]};
maxIncorrectIdIndex=$(($len-2));

Do you see that evil thing? Bash doesn't subtract 2 from len without braces, ha! It took me a while until I found what to do with that. 

for i in $(seq 0 $maxIncorrectIdIndex) ; do
redundantId=${labels[$i]};
if [[ "$redundantIds" == *",$redundantId,"* ]]; then
echo "Duplicit id must be fixed first: ${redundantId}";
grep -o -r './' --include=\*.adoc --exclude-dir=target -e '^\(\[\[[0-9A-Za-z_\\-]\+\]\]\)\+' | grep "\[${redundantId}\]";
exit 1;
fi
redundantIds="${redundantIds},${redundantId},"

This was quite funny, originally I tried to google some Set implementation for bash, but finally I came to a conclusion that all those implementations are bit overkill. I needed just a string containing all found labels and when I found a redundant label colliding with another redundant label, I forced user, myself, to resolve these conflicts first.

If I would replace them automatically with something else ... it could create invalid xrefs. 

Time for changes. Truth is that these commands could be optimized, but why would I do that? I needed just to pass it once and then commit-push-drink a beer/ice-coffee ... while script went through some 500 files, replacing redundant label usages by the usage of chosen one, and then delete all those declarations.

echo "Replacing $redundantId by $correctId and removing [[$redundantId]] labels...";
find . -type f -name '*.adoc' ! -wholename '*/target/*' -exec sed -i -- "s/#${redundantId}/#${correctId}/g" {} +;  
find . -type f -name '*.adoc' ! -wholename '*/target/*' -exec sed -i -- "s/\[\[${redundantId}\]\]//g" {} +;
done;
fi;
done;

And finally yet one thing ... replace link: by xref: where possible, because then Asciidoctor can validate these references. I found that some types of mistakes still can pass (remember those collisions?), but it is still an awesome tool.

echo "Replacing link references by xref where it is possible.";

find . -type f -name '*.adoc' ! -wholename '*/target/*' -exec sed -i -r -- 's/link:([a-zA-Z0-9\-]+)\.html#/xref:\1.adoc#/g' {} +;

find . -type f -name '*.adoc' ! -wholename '*/target/*' -exec sed -i -r -- 's/link:#/xref:#/g' {} +;

echo "Done.";

Then I started maven clean install, it failed reporting some remaining issues, so I went back to Eclipse and fixed them in an hour. And this is the result in PDF (Okular) and HTML (Opera).



Žádné komentáře:

Okomentovat