FreeMarker + dom4j: automatic Word export

Posted May 25, 2020 · 7 min read

The usual way to export Word documents is through POI. POI is at its best with Excel, though, and controlling Word styles with it is cumbersome. This article shows how to export Word documents from templates using FreeMarker.

Development preparation

  • This article's implementation is based on Spring Boot, so every dependency in the project comes from the Spring Boot ecosystem. First, add the FreeMarker starter to the Maven project.
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-freemarker</artifactId>
</dependency>
  • That single dependency is enough, provided the project inherits from the Spring Boot parent, which supplies the version. With it in place, Word documents can be exported through FreeMarker.

Template preparation

  • The template is a sample Word document prepared in advance. The filling rules are simple: wherever content should change dynamically, write a ${} placeholder, then supply the matching data at export time. Note that placeholders of the form ${c.no} are for iterating over collections. Ignore them for now; they are covered in detail later.

Development test

  • At this point the preparation is complete. All that remains is to call FreeMarker's export method.
  • First, configure where FreeMarker loads templates from by setting the template path, which holds the template described above. Note that the template is not a real Word file: it is the document saved from Word in XML format.

  • Configure loading path

    // Create a configuration instance (the no-argument constructor is deprecated)
    Configuration configuration = new Configuration(Configuration.VERSION_2_3_23);
    // Set the encoding
    configuration.setDefaultEncoding("UTF-8");
    // Load template files from /template on the classpath
    configuration.setClassForTemplateLoading(OfficeUtils.class, "/template");

  • Get the template object
Template template = configuration.getTemplate(templateName);
  • Build the output writer
Writer out = new BufferedWriter(new OutputStreamWriter(outputStream, "UTF-8"));
  • Write the merged data to out
template.process(dataMap, out);
  • Those four steps are all it takes to export. The loading path can be configured once, globally, which leaves just three lines of code per export. Exception handling is, of course, still required.
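
  • The dataMap passed to template.process is an ordinary map whose keys match the placeholders in the template. The following is a minimal sketch of how it might be assembled, not the article's actual code: the class name ExportDataModel and the fields title and name are made up for illustration, and only the ${c.no} placeholder comes from the template described earlier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: assembles the data model handed to template.process(dataMap, out).
final class ExportDataModel {

    // Builds a model with one scalar placeholder and one collection named "c".
    // "title" and "name" are illustrative field names, not taken from the article.
    static Map<String, Object> build() {
        Map<String, Object> dataMap = new HashMap<>();
        dataMap.put("title", "Inspection report");   // fills ${title}

        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            Map<String, Object> row = new HashMap<>();
            row.put("no", i);                // fills ${c.no} for each row of "c"
            row.put("name", "item " + i);    // fills ${c.name}
            rows.add(row);
        }
        dataMap.put("c", rows);
        return dataMap;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

    Keeping the model as plain maps and lists means the same export code can serve any template; only the keys change.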

Results testing

Making the feature general

  • So far we have only sketched the process of exporting Word with FreeMarker, without going into detail.
  • Attentive readers will notice that the picture in the template is not set dynamically. That is clearly not acceptable: the exported document should contain a picture we supply ourselves.
  • Another detail is the check boxes. Look closely and you will see that no field controls them, so there is no way to check them dynamically.
  • Finally, there is the "main security measures" section of the template, which holds collection data. The template cannot control it either.
  • A plain Word XML template cannot solve these problems. Running into problems is a good thing, though, because that is how we make progress. FreeMarker actually works from FTL files. It is only because XML and FTL look so similar that we described the export template as XML above; what we really need is an .ftl file. With one, the check box and collection problems are solved neatly by the if tag and the list tag. Pictures still have to be replaced by hand.
<#if checkbox?? && checkbox?seq_contains('choking;')?string('true', 'false') == 'true'>0052<#else>00A3</#if>

<#list c as c>
${c.no}
</#list>
  • The two snippets above show the if and list syntax.

Automating with dom4j

  • The FTL approach solves the export features, but it is still not automatic. What we really want is for a program to generate the .ftl file from the Word document itself. After some searching on Baidu, dom4j turned out to be the answer: we write special markers in Word, and the program then rewrites the XML nodes with dom4j. dom4j also solves the picture problem. The following sections cover how each of the three issues above is handled.

Checkbox

  • First, we agree that check boxes of the same group are marked in #{} format, where the braces contain the name of the field that controls the check box.

  • Then we parse the XML with dom4j. Here is how a check box looks in the original XML:

    <w:sym w:font="Wingdings 2" w:char="0052"/>

  • We only need to select the w:sym elements with dom4j. The text that accompanies each one has the form #{zhuyaoweihaiyinsu}choking; that is, the marker followed by the option label.
  • The field name, here zhuyaoweihaiyinsu, is matched and used in the if tag that controls the w:char value:
<#if checkbox?? && checkbox?seq_contains('choking')?string('true', 'false') == 'true'>0052<#else>00A3</#if>

Partial source code

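Element root = document.getRootElement();
// Select every check box symbol in the document
List<Element> checkList = root.selectNodes("//w:sym");
List<String> nameList = new ArrayList<>();
Integer indext = 1;
for (Element element : checkList) {
    // w:char holds the glyph code (0052 or 00A3 in the snippets above)
    Attribute aChar = element.attribute("char");
    // Find the controlling field name from the #{} marker near the symbol
    String checkBoxName = selectCheckBoxNameBySymElement(element.getParent());
    // chooicedCheckBox (helper, not shown) supplies the new value for w:char
    aChar.setData(chooicedCheckBox(checkBoxName));
}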
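
  • The helpers called in that loop are not shown in the excerpt. As a rough sketch of the idea only, the following turns a #{field}label marker into the if expression for w:char. The class name CheckboxMarker, the method toFtl and the marker pattern are assumptions made for illustration. It also omits the ?string('true', 'false') == 'true' round trip used in the snippet above, since ?seq_contains already yields a boolean.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns a "#{field}label" check box marker into a FreeMarker expression.
final class CheckboxMarker {

    // Assumed marker shape: "#{" + field name + "}" + option label.
    private static final Pattern MARKER = Pattern.compile("#\\{(\\w+)\\}(.+)");

    // Returns an expression that yields 0052 when the field's list contains
    // the label and 00A3 otherwise. Returns null if the text is not a marker.
    static String toFtl(String markerText) {
        Matcher m = MARKER.matcher(markerText.trim());
        if (!m.matches()) {
            return null;
        }
        String field = m.group(1);
        // Escape single quotes so the label stays a valid FTL string literal.
        String label = m.group(2).trim().replace("'", "\\'");
        return "<#if " + field + "?? && " + field + "?seq_contains('" + label + "')>"
                + "0052<#else>00A3</#if>";
    }

    public static void main(String[] args) {
        System.out.println(toFtl("#{zhuyaoweihaiyinsu}choking;"));
    }
}
```

    Returning null for non-marker text lets a caller leave ordinary w:sym symbols untouched.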

Collections

  • The approach is the same: find the elements that need to change. Collections differ from check boxes, though, in that Word has no special tag for them, so the format is purely our own convention: ${a_b}. We first walk all the text in the document and test each piece against a regular expression to see whether it follows the collection convention. When it does, we take the table row that contains it and wrap that row in a #list tag, then change ${a_b} to ${a.b}. As for why the a.b format was not used from the start: that was down to company conventions. If you implement this yourself, it is best to use the a.b format directly.

Partial source code

Element root = document.getRootElement();
// Collect every w:t text element so each can be checked against the convention
List<Element> trList = root.selectNodes("//w:t");
// rowList tracks rows already handled: several columns in one row may match, but the row must only be processed once
List<Element> rowList = new ArrayList<>();
if (CollectionUtils.isEmpty(trList)) {
    return;
}
for (Element element : trList) {
    boolean matches = Pattern.matches(REGEX, element.getTextTrim());
    if (!matches) {
        continue;
    }
    // Only text that follows the agreed collection format gets here
    // Extract the table name and column name
    Pattern compile = Pattern.compile(REGEX);
    Matcher matcher = compile.matcher(element.getTextTrim());
    String tableName = "";
    String colName = "";
    while (matcher.find()) {
        tableName = matcher.group(1);
        colName = matcher.group(2);
    }
    // We now have the w:t content, but what must be looped is the w:tr row that contains it, so fetch the nearest w:tr ancestor
    List<Element> ancestorTrList = element.selectNodes("ancestor::w:tr[1]");
    /* List<Element> tableList = element.selectNodes("ancestor::w:tbl[1]");
    System.out.println(tableList); */
    Element ancestorTr = null;
    if (!ancestorTrList.isEmpty()) {
        ancestorTr = ancestorTrList.get(0);
        // Get the header row
        Element titleAncestorTr = DomUtils.getInstance().selectPreElement(ancestorTr);
        if (!rowList.contains(ancestorTr)) {
            rowList.add(ancestorTr);
            List<Element> foreachList = ancestorTr.getParent().elements();
            if (!foreachList.isEmpty()) {
                Integer ino = 0;
                Element foreach = null;
                for (Element elemento : foreachList) {
                    if (ancestorTr.equals(elemento)) {
                        // ancestorTr is the row to iterate over, so wrap a copy of it in a #list element
                        foreach = DocumentHelper.createElement("#list");
                        // Loop expression, e.g. "c as c"
                        foreach.addAttribute("name", tableName + " as " + tableName);
                        Element copy = ancestorTr.createCopy();
                        replaceLineWithPointForeach(copy);
                        mergeCellBaseOnTableNameMap(titleAncestorTr, copy, tableName);
                        foreach.add(copy);
                        break;
                    }
                    ino++;
                }
                if (foreach != null) {
                    // Swap the original row for the #list wrapper
                    foreachList.set(ino, foreach);
                }
            }
        } else {
            continue;
        }
    }
}
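
  • The REGEX constant used above is not shown in the excerpt. The following is a small sketch of what the placeholder check and the ${a_b} to ${a.b} rewrite could look like. The pattern and the names CollectionPlaceholder, isCollectionPlaceholder and toDotted are assumptions for illustration, not the article's actual definitions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the ${table_column} convention.
final class CollectionPlaceholder {

    // Assumed pattern (the article's REGEX is not shown). Group 1 is the
    // collection name, group 2 the column name; the split is at the first underscore.
    static final Pattern REGEX = Pattern.compile("\\$\\{([A-Za-z0-9]+)_(\\w+)\\}");

    // True if the whole (trimmed) text is exactly one collection placeholder.
    static boolean isCollectionPlaceholder(String text) {
        return REGEX.matcher(text.trim()).matches();
    }

    // Rewrites every ${a_b} in the text to ${a.b}; other text is returned unchanged.
    static String toDotted(String text) {
        Matcher m = REGEX.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start())
               .append("${").append(m.group(1)).append('.').append(m.group(2)).append('}');
            last = m.end();
        }
        return out.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(isCollectionPlaceholder("${c_no}"));
        System.out.println(toDotted("${c_no}"));
    }
}
```

    Splitting at the first underscore means a collection name cannot itself contain one under this assumed pattern, which is one more reason to prefer the a.b format from the start.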

Images

  • Pictures are handled much like check boxes, because Word XML represents them with special tags. The difference is that a text placeholder cannot stand in for a picture: a real picture must occupy the spot, because Word only emits picture tags when a picture is present. So we place a sample picture in the template and write @{imgField} right after it to name the field. dom4j then replaces the picture's Base64 data with ${imgField}.

Partial source code

// Image index, used for the default placeholder names
Integer index = 1;
// Get the root element
Element root = document.getRootElement();
// Get the image data elements
List<Element> imgTagList = root.selectNodes("//w:binData");
for (Element element : imgTagList) {
    // Default placeholder: ${img1}, ${img2}, ...
    element.setText(String.format("${img%s}", index++));
    // Get the w:p paragraph that contains the current picture
    List<Element> wpList = element.selectNodes("ancestor::w:p");
    if (CollectionUtils.isEmpty(wpList)) {
        throw new DomException("Unknown exception");
    }
    Element imgWpElement = wpList.get(0);
    // Walk the following elements looking for the picture's field marker
    while (imgWpElement != null) {
        try {
            imgWpElement = DomUtils.getInstance().selectNextElement(imgWpElement);
        } catch (DomException de) {
            break;
        }
        // Get the corresponding picture field
        List<Element> imgFiledList = imgWpElement.selectNodes("w:r/w:t");
        if (CollectionUtils.isEmpty(imgFiledList)) {
            continue;
        }
        String imgFiled = getImgFiledTrimStr(imgFiledList);
        Pattern compile = Pattern.compile(REGEX);
        Matcher matcher = compile.matcher(imgFiled);
        String imgFiledStr = "";
        while (matcher.find()) {
            imgFiledStr = matcher.group(1);
            // Remove the marker paragraph so it does not appear in the output
            boolean remove = imgWpElement.getParent().elements().remove(imgWpElement);
            System.out.println(remove);
        }
        if (StringUtils.isNotEmpty(imgFiledStr)) {
            // Use the named field instead of the default placeholder
            element.setText(String.format("${%s}", imgFiledStr));
            break;
        }
    }
}
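
  • Once w:binData holds a ${imgField} placeholder, the export data has to supply the picture as Base64 text. The following is a minimal sketch of that step, under the assumption that the field is filled with a plain Base64 string. The class ImageField and its methods are hypothetical, and the sample bytes stand in for a real image file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: prepares image data for a ${imgField} placeholder inside w:binData.
final class ImageField {

    // Encodes raw image bytes as Base64 text without line breaks.
    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    // Adds the encoded image to the data model under the given field name.
    static void putImage(Map<String, Object> dataMap, String field, byte[] imageBytes) {
        dataMap.put(field, encode(imageBytes));
    }

    public static void main(String[] args) {
        Map<String, Object> dataMap = new HashMap<>();
        // Stand-in bytes; a real caller would read an image file, e.g. with Files.readAllBytes.
        putImage(dataMap, "imgField", "abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(dataMap.get("imgField")); // prints YWJj
    }
}
```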

Automatic export based on Word (including source code)

  • That is the whole export process. With the logic above, a single set of code can be reused for every template. Source code download address: https://gitee.com/zxhTom/offi...
