Spring reference docs has some coverage for JVM Checkpoint Restore.
Note: At the moment this demo only works on macOS and Linux. On Windows the checkpoint restore fails.
We will use a snapshot version of Spring Boot and Spring Framework that has improved support for JVM Checkpoint/Restore.
Note: You need to have
curl
installed
We can use start.spring.io to generate a new project to use:
curl https://start.spring.io/starter.zip -d "type=maven-project&language=java&bootVersion=3.2.0-SNAPSHOT&baseDir=demo&groupId=com.example&artifactId=demo&name=demo&description=Demo project for Spring Boot&packageName=com.example.demo&packaging=jar&javaVersion=17&dependencies=web,actuator" -o demo.zip
Extract the project
unzip demo.zip
Change to the demo
directory:
cd demo
We need to add an additional dependency to the pom.xml
in the <dependencies>
section:
<dependency>
<groupId>org.crac</groupId>
<artifactId>crac</artifactId>
<version>1.3.0</version>
</dependency>
Create a Java source file for the controller:
touch src/main/java/com/example/demo/DemoController.java
Add the following content to the src/main/java/com/example/demo/DemoController.java
file:
package com.example.demo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
@RestController
public class DemoController {
private static Logger logger = LoggerFactory.getLogger(DemoController.class);
@RequestMapping("/")
public String index() {
String ts = java.time.Clock.systemUTC().instant().toString();
String message = "at " + ts;
logger.info("Message: " + message);
return "Hello " + message;
}
}
Since the CRaC feature relies on CRIU support in the Linux kernel, and we want to be able to run this on other systems, we do need to run this in a container.
Note: The processor architecture needs to match the OpenJDK implementation, so we need to pass in an
ARCH
argument with the architecture. We need to specifyaarch64
for ARM based systems like Macs with M1/M2 chips orx64
for AMD/Intel non-ARM systems.
Create a Dockerfile
file in the root of the project:
touch Dockerfile
Add with this content to the Dockerfile:
FROM ubuntu:22.04
# PROCESSOR_ARCH is either "aarch64" for ARM systems or "x64" for AMD/Intel systems
ARG ARCH
# Set ENV
ENV JAVA_HOME /azul-crac-jdk
ENV PATH $PATH:$JAVA_HOME/bin
# Add CRaC JDK
ADD https://cdn.azul.com/zulu/bin/zulu17.44.17-ca-crac-jdk17.0.8-linux_$ARCH.tar.gz $JAVA_HOME/openjdk.tar.gz
RUN tar --extract --file $JAVA_HOME/openjdk.tar.gz --directory "$JAVA_HOME" --strip-components 1; rm $JAVA_HOME/openjdk.tar.gz;
WORKDIR /home/app
# Add App
COPY target/demo-*.jar /home/app/demo.jar
COPY docker/entrypoint.sh /opt/app/entrypoint.sh
RUN chmod +x /opt/app/entrypoint.sh
CMD ["/opt/app/entrypoint.sh"]
We also need an entrypoint.sh
script that can detect if we already have checkpoint files availabel. If that is the case we restore them, if not we need to start the app and then take a checkpoint.
Create a docker
directory with an empty script file:
mkdir docker
Create the empty docker/entrypoint.sh
file:
touch docker/entrypoint.sh
Next, edit the docker/entrypoint.sh
file and add this content:
#!/bin/bash
CRAC_FILES_DIR=`eval echo ${CRAC_FILES_DIR}`
mkdir -p ${CRAC_FILES_DIR}
cd /home/app
if [ ! -f "$CRAC_FILES_DIR/READY" ]; then
echo "Save checkpoint to $CRAC_FILES_DIR"
( echo 128 > /proc/sys/kernel/ns_last_pid ) 2>/dev/null || while [ $(cat /proc/sys/kernel/ns_last_pid) -lt 128 ]; do :; done
java -Dmanagement.endpoint.health.probes.add-additional-paths="true" -Dmanagement.health.probes.enabled="true" -XX:CRaCCheckpointTo=$CRAC_FILES_DIR -jar /home/app/demo.jar &
PID=$!
sleep 5
jcmd $PID JDK.checkpoint
sleep 5
echo "true" > $CRAC_FILES_DIR/READY
else
echo "Restore checkpoint from $CRAC_FILES_DIR"
fi
exec java -Dmanagement.endpoint.health.probes.add-additional-paths="true" -Dmanagement.health.probes.enabled="true" -XX:CRaCRestoreFrom=$CRAC_FILES_DIR
We can now create the image by running the following.
case $(uname -m) in
arm64) arch="aarch64" ;;
*) arch="x64" ;;
esac
echo "Using CRaC enabled JDK with arch $arch"
./mvnw clean package -DskipTests --no-transfer-progress
docker build -t springdeveloper/demo:0.0.1 --build-arg ARCH=$arch .
When the image is built, we can run it locally. We need to define a volume to store the checkpoint files and we also ned to set some extra capabilities for checkpoint/restore to work. Run with the following command:
docker run -it --rm -p 8080:8080 -e CRAC_FILES_DIR=/crac/demo/0.0.1 --name demo \
--mount source=cracvol,target=/crac \
--cap-add CHECKPOINT_RESTORE --cap-add NET_ADMIN --cap-add SYS_PTRACE --cap-add SYS_ADMIN \
springdeveloper/demo:0.0.1
You should see some log ouput like this:
Save checkpoint to /crac/demo/0.0.1
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v3.2.0-SNAPSHOT)
2023-08-11T16:37:36.413Z INFO 129 --- [ main] com.example.demo.DemoApplication : Starting DemoApplication v0.0.1-SNAPSHOT using Java 17.0.8 with PID 129 (/home/app/demo.jar started by root in /home/app)
2023-08-11T16:37:36.414Z INFO 129 --- [ main] com.example.demo.DemoApplication : No active profile set, falling back to 1 default profile: "default"
2023-08-11T16:37:37.056Z INFO 129 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port 8080 (http)
2023-08-11T16:37:37.062Z INFO 129 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2023-08-11T16:37:37.062Z INFO 129 --- [ main] o.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/10.1.11]
2023-08-11T16:37:37.125Z INFO 129 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2023-08-11T16:37:37.126Z INFO 129 --- [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 682 ms
2023-08-11T16:37:37.424Z INFO 129 --- [ main] o.s.b.a.e.web.EndpointLinksResolver : Exposing 1 endpoint(s) beneath base path '/actuator'
2023-08-11T16:37:37.456Z INFO 129 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8080 (http) with context path ''
2023-08-11T16:37:37.469Z INFO 129 --- [ main] com.example.demo.DemoApplication : Started DemoApplication in 1.249 seconds (process running for 1.768)
129:
CR: Checkpoint ...
/opt/app/entrypoint.sh: line 18: 129 Killed java -Dmanagement.endpoint.health.probes.add-additional-paths="true" -Dmanagement.health.probes.enabled="true" -XX:CRaCCheckpointTo=$CRAC_FILES_DIR -jar /home/app/demo.jar
2023-08-11T16:37:46.367Z INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Restarting Spring-managed lifecycle beans after JVM restore
2023-08-11T16:37:46.386Z INFO 129 --- [Attach Listener] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8080 (http) with context path ''
2023-08-11T16:37:46.387Z INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Spring-managed lifecycle restart completed in 21 ms
We can see that after the app starts up, there is a checkpoint taken followed by the app getting killed and then starting up again restoring the state from the checkpoint files.
From a different terminal run:
curl localhost:8080
Hello at 2023-08-11T16:17:54.991518Z
We can just kill the running container:
docker kill demo
We use the same command for the second run:
docker run -it --rm -p 8080:8080 -e CRAC_FILES_DIR=/crac/demo/0.0.1 --name demo \
--mount source=cracvol,target=/crac \
--cap-add CHECKPOINT_RESTORE --cap-add NET_ADMIN --cap-add SYS_PTRACE --cap-add SYS_ADMIN \
springdeveloper/demo:0.0.1
You should now see some log ouput like this:
Restore checkpoint from /crac/demo/0.0.1
2023-08-11T16:41:38.805Z INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Restarting Spring-managed lifecycle beans after JVM restore
2023-08-11T16:41:38.828Z INFO 129 --- [Attach Listener] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8080 (http) with context path ''
2023-08-11T16:41:38.829Z INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Spring-managed lifecycle restart completed in 24 ms
That startup was much faster since we restored the state from the checkpoint files.
When we are done, we can just kill the running container:
docker kill demo
Finally, remove the volume:
docker volume rm cracvol