The "cool" type of bug

Details at https://issues.liferay.com/browse/LPS-107983

Summary: a Debian patch broke a compiler embedded in an app that runs inside an OSGi container, inside Tomcat, on top of OpenJDK. Isn't that cool?

LPS-107983 is a terrible little "bug" that affects Liferay GA-based projects on Debian and Ubuntu. At the time, there was no fix for GA releases running on JDK 11+ (versions after 11.0.6).

The affected component is responsible for a lot of on-demand source code compilation, which makes it a basic, low-level and crucial piece of the platform. When it broke, multiple instances stopped working without any code change: a JDK update turned a single line of code inside Liferay's kernel into a really obscure bug.

Let's look at the stack trace to get a feel for it.

ERROR javax.portlet.PortletException: org.apache.jasper.JasperException: PWC6033: Error in Javac compilation for JSP
at com.liferay.portlet.internal.PortletRequestDispatcherImpl.dispatch(PortletRequestDispatcherImpl.java:305)
at com.liferay.portlet.internal.PortletRequestDispatcherImpl.include(PortletRequestDispatcherImpl.java:123)
at com.liferay.portal.kernel.portlet.bridges.mvc.MVCPortlet.include(MVCPortlet.java:578)

Caused by: java.lang.NullPointerException: entry
 at java.base/java.util.Objects.requireNonNull(Objects.java:246)
 at java.base/java.util.zip.ZipFile.getInputStream(ZipFile.java:372)
 at com.liferay.portal.kernel.zip.ZipFileUtil.openInputStream(ZipFileUtil.java:36)
 at com.liferay.portal.osgi.web.servlet.jsp.compiler.internal.JarJavaFileObject.openInputStream(JarJavaFileObject.java:39)
 at jdk.compiler/com.sun.tools.javac.api.ClientCodeWrapper$WrappedFileObject.openInputStream(ClientCodeWrapper.java:592)
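
The last frames of the trace can be reproduced outside Liferay. What follows is only a minimal sketch, under the assumption that the ZipEntry handed to ZipFile.getInputStream was null, which is what ZipFile.getEntry returns when a name is not found. The trace does not show how the entry was looked up, so the class NullEntryDemo, its helper methods and the entry names are all hypothetical and are not Liferay's code.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; none of these names come from Liferay.
class NullEntryDemo {

    // Creates a temporary zip file holding a single entry, "present.txt".
    static Path createSampleZip() throws IOException {
        Path zipPath = Files.createTempFile("null-entry-demo", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            out.putNextEntry(new ZipEntry("present.txt"));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return zipPath;
    }

    // Unguarded: passes whatever getEntry returns straight to getInputStream.
    // For a missing name, getEntry returns null and getInputStream throws
    // a NullPointerException.
    static String readUnguarded(Path zipPath, String name) throws IOException {
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zipFile.getEntry(name);
            try (InputStream in = zipFile.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    // Guarded: checks the lookup result first, so a missing entry becomes a
    // descriptive FileNotFoundException instead of a bare NullPointerException.
    static String readGuarded(Path zipPath, String name) throws IOException {
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zipFile.getEntry(name);
            if (entry == null) {
                throw new FileNotFoundException(name + " not found in " + zipPath);
            }
            try (InputStream in = zipFile.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = createSampleZip();
        System.out.println(readGuarded(zip, "present.txt"));
        try {
            readUnguarded(zip, "missing.txt");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: " + e.getMessage());
        } finally {
            Files.delete(zip);
        }
    }
}
```

The guarded variant does not fix anything by itself. It only makes the failure say which entry was missing, which is the information the original trace leaves out.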

You go home, and when you get back, poof: there is something wrong with the code that worked yesterday.

That in itself is nothing new; it is a variation of "it works on my computer". What is different here is how low-level you have to go to see what is happening, and the fact that this rare case could easily reach production servers (not to mention the time spent trying to find out what is wrong with the JSP, as usual).

Imagine there are separate teams for deployment, development, security... A bug like this could be the result of a security patch rolled out by the administration team and go unnoticed by the developers for a while. It is easy to see how this could happen if people do not talk to each other, or if there is a bug in the deployment process itself.

This was the result of a Debian/Ubuntu package update that also removed the previous package version. The mitigation options, downgrading the JDK or using alternative repositories, were tricky because they conflict with the goal of keeping deployments aligned with the security and general updates issued by Debian.

As no GA version was available with a patch to run Liferay on top of JDK 11+, and extension points would be tricky (this is an OSGi module in the static part of Liferay), we built custom binaries from a source code patch.

The source code patch was based on https://github.com/brianchandotcom/liferay-portal/pull/84189. It was applied to Liferay 7.2.0 and 7.2.1, both GA releases, to generate the binaries that go inside osgi/static to trigger the override process.

Here are the files for easy access:

To install it, copy the chosen file into osgi/static and rename it to remove the version numbers. Inside the folder, the file should be named com.liferay.portal.osgi.web.servlet.jsp.compiler.jar
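
The copy-and-rename step can also be scripted. This is a small sketch only: the class InstallPatchedJar and both paths passed to it are hypothetical placeholders, and only the target file name comes from the instructions above. It assumes that overwriting an existing file of the same name in osgi/static is acceptable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; not part of Liferay.
class InstallPatchedJar {

    // The name the file must have inside osgi/static (no version numbers).
    static final String TARGET_NAME =
        "com.liferay.portal.osgi.web.servlet.jsp.compiler.jar";

    // Copies the chosen (versioned) jar into the given osgi/static directory
    // under the unversioned name, replacing a file of that name if present.
    static Path install(Path patchedJar, Path osgiStaticDir) throws IOException {
        Path target = osgiStaticDir.resolve(TARGET_NAME);
        return Files.copy(patchedJar, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println(
                "usage: InstallPatchedJar <downloaded-jar> <liferay-home>/osgi/static");
            return;
        }
        Path installed = install(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Installed " + installed);
    }
}
```

It would be prudent to stop the server before copying and to keep a backup of any file that gets replaced.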
