How does Java implement dynamic scripting?

Posted Jun 16, 2020 (10 min read)

Introduction: Dynamic scripting is an indispensable part of platform-level Java systems. This article presents one way to implement dynamic scripts in Java, covers the key technical points, and then discusses class name duplication, class life cycle, security, and related issues. Readers are welcome to join the discussion.

Foreword

Fanxing is a data service platform. Its core function: users configure a piece of SQL, and Fanxing generates the corresponding HSF/TR/SOA/HTTP access interface.

The flow chart of the Fanxing engine is as follows:
image.png
A query request passes through the engine's pipeline and is processed by each valve in turn to produce the result data. The two valves highlighted in the figure are the focus of this article: the pre-script and the post-script.

Tip: Dynamic scripting means that code is released without going through the company's internal release platform. Without monitoring, gray (canary) release, and rollback, it can easily cause production incidents, so this technique is strongly discouraged in business systems.

Java dynamic scripting is used in relatively few scenarios, mainly in platform-type systems such as the LeetCode platform, the D2 platform, and the Fanxing data service platform. This article is meant as technical exploration and exchange.

Function description

Readers familiar with JavaScript will know the eval() function. For example:

eval('console.log(2+3)')

prints 5 to the console.

What we want here is similar to eval: we submit a piece of Java code, and the server executes the logic in that code. In Fanxing, the pre-script customizes the user's input parameters, and the post-script further processes the results queried from the database.

Why Java for scripts?

Groovy

To meet the requirement for dynamic scripting, Groovy may be the first thing that comes to mind, but using Groovy has several major disadvantages:

  • Although Groovy also runs on the JVM, its syntax differs from Java in places, so there is a learning cost for developers who only know Java.
  • It is dynamically typed and lacks constraints. Too much flexibility can be a drawback, especially for a platform.
  • It requires Groovy's engine jar, which is 6.2 MB, not a small size. For someone as particular about code as the author, this is an important consideration.

Java

Implementing dynamic scripts in Java has the following advantages:

  • Low learning cost. Java is the main language at Alibaba and a skill almost every engineer has, so there is almost no barrier to getting started.
  • Java can impose interface constraints, so the pre- and post-scripts written by users follow a uniform shape, which simplifies management and governance.
  • Scripts are compiled immediately and errors are reported right away, so users can fix problems promptly.

Implementation

About the code project

The code project for this article:
https://kbtdatacenter-read.oss-cn-zhangjiakou.aliyuncs.com/fusu-share/dynamic-script.zip

--dynamic-script
------advance-discuss //In-depth discussion of some details in the script dynamic technology
------code-javac //Use code to execute compile, load, and run tasks
------command-javac //Demonstrate the dynamic compilation and loading of java classes using the command line
------facade //Provide a separate interface package to facilitate the smooth demonstration process

Design of the implementation

We first define an interface, for example Animal, and the user implements this interface in their own code. In effect, the user supplies Cat, an implementation class of Animal. Once the system has loaded the user's Java code, it can call the corresponding methods through Java polymorphism. This gives users a clear contract to write against and keeps the platform side simple.
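
A minimal sketch of this design is shown here. The hello(String) method matches the call that appears later in this article; the return type, the message text wiring, and the nesting of both types inside one demo class are assumptions made only to keep the example self-contained.

```java
public class AnimalDemo {

    /** Interface published by the platform (in the article it lives in the facade module). */
    public interface Animal {
        // Returning a String is an assumption; the original method may print instead.
        String hello(String name);
    }

    /** Stands in for the class a user would submit as a script. */
    public static class Cat implements Animal {
        @Override
        public String hello(String name) {
            return "Hello, " + name + "! This is Cat.";
        }
    }

    public static void main(String[] args) {
        // The platform only talks to the interface; the concrete class comes from the user.
        Animal animal = new Cat();
        System.out.println(animal.hello("Moss"));
    }
}
```

Because the platform depends only on Animal, any user class that implements the interface can be swapped in without changes on the platform side.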

Using the console command line

First, a review of how to compile and run Java classes from the command line.

Build the facade module into a jar so that the later steps can depend on it:

cd <project root directory>
mvn install

Go to the resources folder of the command-javac module (the absolute path differs from machine to machine):

# Enter the directory where Cat.java is located
cd /Users/fusu/d/group/fusu-share/dynamic-script/command-javac/src/main/resources
# Compile with the javac command-line tool; the classpath separator is ':' on Linux/macOS and ';' on Windows
javac -cp .:/Users/fusu/d/group/fusu-share/dynamic-script/facade/target/facade-1.0.jar Cat.java
# Run
java -cp .:/Users/fusu/d/group/fusu-share/dynamic-script/facade/target/facade-1.0.jar Cat
# Output
#> I'm Cat Main

Use Process to call javac to compile

With the command-line steps above in mind, a natural idea is to use Java's Process class to invoke the javac command-line tool, and then use URLClassLoader to load the generated class file. The code is in ProcessJavac.java under the command-javac module. The core code is as follows:
image.png
image.png
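
The idea described above can be sketched roughly as follows. This is not the content of ProcessJavac.java; the class name ProcessJavacSketch, its method names, and the command-line arguments are hypothetical. It assumes javac is on the PATH and that the facade jar is also visible to the application class loader, so that the loaded class can resolve the Animal interface.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;

public class ProcessJavacSketch {

    /** Builds the same javac command that was typed by hand in the previous section. */
    public static List<String> javacCommand(String classpath, String sourceFile) {
        return Arrays.asList("javac", "-cp", classpath, sourceFile);
    }

    /** Runs javac in sourceDir; returns true when the compiler exits with code 0. */
    public static boolean compile(File sourceDir, String classpath, String sourceFile) throws Exception {
        Process process = new ProcessBuilder(javacCommand(classpath, sourceFile))
                .directory(sourceDir)
                .inheritIO() // forward compiler messages to our own console
                .start();
        return process.waitFor() == 0;
    }

    /** Loads a compiled class from classDir through a URLClassLoader. */
    public static Class<?> load(File classDir, String className) throws Exception {
        URL[] urls = { classDir.toURI().toURL() };
        // The loader is deliberately left open: closing it would prevent later lazy class loads.
        URLClassLoader loader = new URLClassLoader(urls, ProcessJavacSketch.class.getClassLoader());
        return loader.loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(args[0]);                           // directory containing Cat.java
        String classpath = "." + File.pathSeparator + args[1];  // path to facade-1.0.jar
        if (compile(dir, classpath, "Cat.java")) {
            Object cat = load(dir, "Cat").getDeclaredConstructor().newInstance();
            System.out.println("Loaded " + cat.getClass().getName());
        }
    }
}
```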

Programmatically compile and load

Both approaches above have an obvious drawback: they depend on the Cat.java file and must generate a Cat.class file. For the Fanxing platform we naturally want the whole process to happen in memory with as little IO as possible, so the Java code has to be compiled programmatically. The code is in CodeJavac.java under the code-javac module. The core code is as follows:

//Class name
String className = "Cat";
//The path where the project is located
String projectPath = PathUtil.getAppHomePath();
String facadeJarPath = String.format(".:%s/facade/target/facade-1.0.jar", projectPath);

//Code that needs to be compiled
Iterable<? extends JavaFileObject> compilationUnits =
  Collections.singletonList(new JavaSourceFromString(className, getJavaCode()));

//Compile options, corresponding to command line parameters
List<String> options = new ArrayList<>();
options.add("-classpath");
options.add(facadeJarPath);

//Use the system's compiler
JavaCompiler javaCompiler = ToolProvider.getSystemJavaCompiler();

StandardJavaFileManager standardJavaFileManager = javaCompiler.getStandardFileManager(null, null, null);
ScriptFileManager scriptFileManager = new ScriptFileManager(standardJavaFileManager);

//Use stringWriter to collect errors.
StringWriter errorStringWriter = new StringWriter();

//Start to compile
boolean ok = javaCompiler.getTask(errorStringWriter, scriptFileManager, diagnostic -> {
  if(diagnostic.getKind() == Diagnostic.Kind.ERROR) {

    errorStringWriter.append(diagnostic.toString());
  }
}, options, null, compilationUnits).call();

if(!ok) {
  String errorMessage = errorStringWriter.toString();
  //Compile error, throw error directly.
  throw new RuntimeException("Compile error: " + errorMessage);
}

//Get the compiled binary data.
final Map<String, byte[]> allBuffers = scriptFileManager.getAllBuffers();
final byte[] catBytes = allBuffers.get(className);

//Use a custom ClassLoader to load the class
FsClassLoader fsClassLoader = new FsClassLoader(className, catBytes);
Class<?> catClass = fsClassLoader.findClass(className);
//Class.newInstance() is deprecated since Java 9; call the no-arg constructor instead
Object obj = catClass.getDeclaredConstructor().newInstance();
if(obj instanceof Animal) {
  Animal animal = (Animal) obj;
  animal.hello("Moss");
}

//You will get the result:Hello, Moss! This is Cat.

The code relies mainly on the system compiler, JavaCompiler. Calling its getTask method and then call() is equivalent to running javac on the command line. The custom ScriptFileManager passed to getTask collects the compiled bytecode, and errorStringWriter collects any error messages produced during compilation. Finally, the custom class loader FsClassLoader loads the Cat class from the bytecode.
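
The helper classes JavaSourceFromString, ScriptFileManager and FsClassLoader are named in the code above but not shown. The following is one possible sketch of them built on the standard javax.tools API; the bodies, the wrapper class InMemoryCompileSketch and the compileAndLoad method are assumptions for illustration, not the classes from the article's project.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

public class InMemoryCompileSketch {

    /** Source "file" whose content is a String held in memory. */
    static class JavaSourceFromString extends SimpleJavaFileObject {
        private final String code;
        JavaSourceFromString(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** File manager that keeps compiled bytecode in memory instead of writing .class files. */
    static class ScriptFileManager extends ForwardingJavaFileManager<StandardJavaFileManager> {
        private final Map<String, ByteArrayOutputStream> buffers = new HashMap<>();
        ScriptFileManager(StandardJavaFileManager delegate) {
            super(delegate);
        }
        @Override
        public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location, String className,
                                                   JavaFileObject.Kind kind, FileObject sibling) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            buffers.put(className, out);
            URI uri = URI.create("mem:///" + className.replace('.', '/') + kind.extension);
            return new SimpleJavaFileObject(uri, kind) {
                @Override
                public OutputStream openOutputStream() {
                    return out;
                }
            };
        }
        Map<String, byte[]> getAllBuffers() {
            Map<String, byte[]> result = new HashMap<>();
            buffers.forEach((name, out) -> result.put(name, out.toByteArray()));
            return result;
        }
    }

    /** Class loader that defines exactly one class from a byte array. */
    static class FsClassLoader extends ClassLoader {
        private final String name;
        private final byte[] data;
        FsClassLoader(ClassLoader parent, String name, byte[] data) {
            super(parent);
            this.name = name;
            this.data = data;
        }
        @Override
        public Class<?> findClass(String className) throws ClassNotFoundException {
            if (!name.equals(className)) {
                throw new ClassNotFoundException(className);
            }
            return defineClass(className, data, 0, data.length);
        }
    }

    /** Compiles one class from source text, entirely in memory, and returns the loaded Class. */
    public static Class<?> compileAndLoad(String className, String code) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null when no compiler is available (JRE only)
        ScriptFileManager manager = new ScriptFileManager(compiler.getStandardFileManager(null, null, null));
        boolean ok = compiler.getTask(null, manager, null, null, null,
                Collections.singletonList(new JavaSourceFromString(className, code))).call();
        if (!ok) {
            throw new IllegalStateException("Compilation failed for " + className);
        }
        byte[] bytes = manager.getAllBuffers().get(className);
        ClassLoader parent = InMemoryCompileSketch.class.getClassLoader();
        return new FsClassLoader(parent, className, bytes).loadClass(className);
    }
}
```

The sketch passes a parent loader to FsClassLoader and goes through loadClass, so anything the script depends on is resolved by the parent first and only the script class itself is defined from the byte array.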

In-depth discussion

The sections above cover the key points of implementing dynamic scripts, but many issues remain. The main ones are raised and briefly discussed below.

ClassLoader scope issue

The JVM's class loading mechanism follows the parent delegation model. When a class loader receives a load request, it first delegates the request to its parent loader, so every request is eventually passed up to the top-level class loader. Only when the parent loader cannot handle the request does the child loader try to load the class itself. The following picture should be familiar to everyone.
image.png

The JVM identifies a class by the pair (ClassLoader, fully qualified class name). It can therefore happen that the Animal interface has already been loaded, yet loading Cat with a custom ClassLoader fails with an error saying Animal cannot be found. This is because the ClassLoader that loaded Animal is not visible to the ClassLoader that loads Cat.

Since the defineClass method is protected, loading a class from a byte[] requires a custom ClassLoader. Choosing the parent loader of this ClassLoader takes some care.

The company's internal Java systems use pandora, which has its own class loader and thread context class loader. We therefore take the class loader of the Animal interface, animalClassLoader, as the reference: the thread context ClassLoader is set to animalClassLoader, and animalClassLoader is also specified as the parent of the custom ClassLoader. The code is in the advance-discuss module; the reference code is as follows:

/*FsClassLoader.java*/
public FsClassLoader(ClassLoader parentClassLoader, String name, byte[] data) {
  super(parentClassLoader);
  this.fullyName = name;
  this.data = data;
}


/*AdvanceDiscuss.java*/

//The class loader of the interface
ClassLoader animalClassLoader = Animal.class.getClassLoader();
//Set the current thread class loader
Thread.currentThread().setContextClassLoader(animalClassLoader);
//...
//Use a custom ClassLoader to load the class
FsClassLoader fsClassLoader = new FsClassLoader(animalClassLoader, className, catBytes);

With these measures in place, class-not-found problems no longer occur.

Class Duplication

When only one class is loaded dynamically, there is naturally no need to worry about duplicate fully qualified class names. If several classes with the same name have to be loaded, however, special handling is required. One option is to capture the user's class name with a regular expression and append a random string to it to avoid duplicate names.

As noted above, the JVM identifies a class by the pair (ClassLoader, fully qualified class name), so as long as each class is loaded by a different custom ClassLoader instance, name collisions are also avoided.
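
The first option, renaming through a regular expression, might look like the following minimal sketch. The class ClassRenameSketch and its methods are hypothetical. The whole-word replacement is deliberately naive: it also rewrites matches inside string literals and comments, which a production version would need to handle.

```java
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClassRenameSketch {

    // Matches the first "class Name" declaration and captures Name.
    private static final Pattern CLASS_DECL =
            Pattern.compile("\\bclass\\s+([A-Za-z_$][A-Za-z0-9_$]*)");

    /** Returns the first declared class name, or null if no class declaration is found. */
    public static String findClassName(String source) {
        Matcher m = CLASS_DECL.matcher(source);
        return m.find() ? m.group(1) : null;
    }

    /** Replaces every whole-word occurrence of oldName (declaration, constructors) with newName. */
    public static String rename(String source, String oldName, String newName) {
        return source.replaceAll("\\b" + Pattern.quote(oldName) + "\\b",
                Matcher.quoteReplacement(newName));
    }

    /** Appends a random suffix so that repeated uploads never share a class name. */
    public static String uniqueName(String baseName) {
        return baseName + "_" + UUID.randomUUID().toString().replace("-", "");
    }
}
```

A caller would combine the three steps: find the name, derive a unique one, rename the source, and then compile the renamed source under the new class name.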

Class life cycle issues

Dynamic Java scripting must take garbage collection into account; otherwise, as more and more classes are loaded, the system will soon run out of memory. In the JVM, an object instance is garbage collected (GC) once it is no longer referenced. A Class, as a special kind of object in the JVM, can also be collected: the class information in the method area and the java.lang.Class object in the heap are cleared, at which point the life cycle of the Class ends.

A Class can be collected only when all three of the following conditions are met:

  • NoInstance: all instances of the class have been collected.
  • NoClassLoader: the ClassLoader instance that loaded the class has been collected.
  • NoReference: the java.lang.Class object of the class is no longer referenced (for example via XXX.class, or through the use of static variables/methods).

It follows from these three conditions that classes loaded by the JVM's built-in class loaders (Bootstrap class loader, Extension class loader) are never collected during the lifetime of the JVM, whereas classes loaded by a custom class loader can be. When coding, therefore, keep the custom ClassLoader in a local variable so that it can be collected naturally.

In order to verify the GC situation of Class, we write a simple loop to observe, in the AdvanceDiscuss.java file under the module advance-discuss:

for(int i = 0; i < 1000000; i++) {
  //Compile, load and execute
  compileAndRun(i);

  //Request a GC every 10,000 iterations
  if(i % 10000 == 0) {
    System.gc();
  }
}

//Request a final GC
System.gc();
System.out.println("rest 10s");
//sleep is a static method, so call it on the class
Thread.sleep(10 * 1000);

Open the jvisualvm tool (located at JAVA_HOME/bin/jvisualvm; it is bundled with JDK 8 and earlier) to watch the JVM visually.
640.gif
In the figure above, the curves for the number of loaded classes and for the heap size are both sawtooth-shaped, which shows that dynamically loaded classes are collected effectively.

Security issues

Letting users write scripts and run them on the server is dangerous by nature, so the safety of these scripts has to be taken seriously.

Class whitelist and blacklist mechanism
For the Java code written by users, we need to restrict the range of classes they are allowed to use. Imagine a user calling File to manipulate files on the server: that would be very unsafe. The javassist library can analyze class files, and with its help we can easily obtain the classes that a class depends on. The code is in JavassistUtil.java under the advance-discuss module. The core code is as follows:

public static Set<String> getDependencies(InputStream is) throws Exception {
  ClassFile cf = new ClassFile(new DataInputStream(is));
  ConstPool constPool = cf.getConstPool();
  Set<String> set = new HashSet<>();
  //Constant pool indexes start at 1
  for(int ix = 1, size = constPool.getSize(); ix < size; ix++) {
    if(constPool.getTag(ix) == ConstPool.CONST_Class) {
      //Direct class reference
      set.add(constPool.getClassInfo(ix));
    } else if(constPool.getTag(ix) == ConstPool.CONST_NameAndType) {
      //Field or method descriptor, e.g. (Ljava/lang/String;)V
      int descriptorIndex = constPool.getNameAndTypeDescriptor(ix);
      String desc = constPool.getUtf8Info(descriptorIndex);
      for(int p = 0; p < desc.length(); p++) {
        if(desc.charAt(p) == 'L') {
          //Object types are encoded as L<internal name>;
          int end = desc.indexOf(';', p);
          set.add(desc.substring(p + 1, end).replace('/', '.'));
          p = end;
        }
      }
    }
  }
  return set;
}

Once the dependencies are known, filter them through the whitelist first. The following packages and classes only involve simple data manipulation and processing, and are allowed:

java.lang,
java.util,
com.alibaba.fastjson,
java.text,
[Ljava.lang(array under java.lang, for example `String[]`)
[D(double[])
[F(float[])
[I(int[])
[J(long[])
[C(char[])
[B(byte[])
[Z(boolean[])

However, some classes under these packages are still dangerous and must be filtered out, so a blacklist is applied as a second pass. The following packages and classes are not allowed:

java.lang.Thread
java.lang.reflect
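
A two-pass check over the dependency set could be sketched as follows. The prefixes come from the two lists above; the class DependencyFilterSketch and its methods are hypothetical, and plain prefix matching is an assumption (for instance, the prefix java.lang.Thread also rejects java.lang.ThreadLocal). In practice the platform's own interface package and the script class itself would also need to be on the whitelist.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DependencyFilterSketch {

    // Prefixes taken from the whitelist in this article.
    static final List<String> WHITELIST = Arrays.asList(
            "java.lang.", "java.util.", "java.text.", "com.alibaba.fastjson.",
            "[Ljava.lang.", "[D", "[F", "[I", "[J", "[C", "[B", "[Z");

    // Prefixes taken from the blacklist in this article.
    static final List<String> BLACKLIST = Arrays.asList("java.lang.Thread", "java.lang.reflect.");

    static boolean startsWithAny(String name, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** A class is allowed when it matches the whitelist and does not match the blacklist. */
    public static boolean isAllowed(String className) {
        return startsWithAny(className, WHITELIST) && !startsWithAny(className, BLACKLIST);
    }

    /** Returns the dependencies that violate the policy, sorted for stable error messages. */
    public static Set<String> findViolations(Set<String> dependencies) {
        Set<String> violations = new TreeSet<>();
        for (String dependency : dependencies) {
            if (!isAllowed(dependency)) {
                violations.add(dependency);
            }
        }
        return violations;
    }
}
```

A script would be rejected before loading whenever findViolations returns a non-empty set, and the set itself can be shown to the user as the error message.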

Thread isolation
The user's code may contain an infinite loop or take a very long time to run. Such faulty logic cannot be detected at compile time, so the user's code also needs to run in a separate thread. If it times out or uses too much memory, it is killed directly.
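
The timeout part of this idea can be sketched with the standard java.util.concurrent classes. The class ScriptTimeoutSketch and the method runWithTimeout are hypothetical. Note the limitation: cancelling a Future only interrupts the thread, so a tight loop that never checks for interruption keeps running; reliably killing such code, or limiting its memory, would need stronger isolation such as a separate process, which this sketch does not cover.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ScriptTimeoutSketch {

    /** Runs the task on its own thread and gives up after timeoutMillis. */
    public static <T> T runWithTimeout(Callable<T> task, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "script-runner");
            thread.setDaemon(true); // a stuck script must not keep the JVM alive
            return thread;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the thread; the script must react to the interrupt
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A call such as runWithTimeout(() -> animal.hello("Moss"), 500) would then bound how long the platform waits for a user script, where animal is an instance loaded as described earlier.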

Caching issues

The discussion above covers the full process from compilation to execution. Often, however, the user's code has not changed, and there is no need to compile it again on every execution, so a caching strategy is worthwhile. While the user's code is unchanged, the class is loaded lazily and then reused; when the code changes, the previously loaded Class is released and the new code is loaded.
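
One way to express such a cache is sketched below, keyed by a script id and storing the source text next to the compiled result. The class ScriptCacheSketch is hypothetical, and the compiler function stands for whatever compile-and-load routine the platform uses. Replacing an entry drops the only reference the cache holds to the old result, which lets the old Class and its ClassLoader be collected once nothing else refers to them.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

public class ScriptCacheSketch<T> {

    /** Cached result together with the source text it was built from. */
    private static final class Entry<T> {
        final String source;
        final T compiled;
        Entry(String source, T compiled) {
            this.source = source;
            this.compiled = compiled;
        }
    }

    private final ConcurrentMap<String, Entry<T>> cache = new ConcurrentHashMap<>();
    private final Function<String, T> compiler;

    public ScriptCacheSketch(Function<String, T> compiler) {
        this.compiler = compiler;
    }

    /** Returns the cached object for scriptId, recompiling only when the source has changed. */
    public T get(String scriptId, String source) {
        Entry<T> entry = cache.compute(scriptId, (id, old) ->
                (old != null && old.source.equals(source))
                        ? old
                        : new Entry<>(source, compiler.apply(source)));
        return entry.compiled;
    }
}
```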

Timely loading problems

A system restart releases all loaded classes, which then have to be loaded again. For some important scripts, even the short delay caused by lazy loading may be unacceptable. Such scripts need to be collected separately and loaded into memory during system startup, so that by the time the health check passes the classes are guaranteed to be loaded, which effectively reduces response time.

Postscript

For reasons of space, caching and timely loading have only been discussed briefly. Java dynamic scripting also involves many other details that have to be worked out in practice. Everyone is welcome to join the discussion.