Deploy the TinyMCE Import from Word and Export to Word service server-side component using Docker (individually licensed)

Overview

Import from Word and Export to Word is a set of two converter services: one exports structured HTML content (for example, content created with the TinyMCE WYSIWYG editor) to a .docx Microsoft Word file, and the other imports content from .docx and .dotx files and converts it into a styled HTML document.

A valid license key is needed in order to install Import from Word and Export to Word On-Premises. Contact us for a trial license key.

The documentation in this section refers to a simplified version of the Import from Word and Export to Word On-Premises which was designed to be easy to set up and maintain (while preserving all the features). It also lowers the running costs by reducing the number of servers required to run the whole application to the minimum.

The only requirement to run Import from Word and Export to Word On-Premises is a container runtime or orchestration tool, e.g. Docker, Kubernetes, or Podman.

Requirements

To run Import from Word and Export to Word On-Premises, a Docker environment is required. Alternatively, use a CaaS available from your cloud provider, such as AWS ECS, Google GKE or Azure ACS.

Many factors affect Export to Word On-Premises performance. The most influential are the size of the exported content, the size of images, and the number of concurrent requests. Because some applications prioritize fast response times while others must handle high load, no single recommended server specification fits all use cases.

Assuming a response time below 10 seconds, one server (2 CPUs, 2 GB RAM) running one Docker container can handle:

  • up to 45 concurrent requests with an average content of 1 A4 page (~1k characters)

  • up to 30 concurrent requests with an average content of 5 A4 pages (~7.5k characters)

  • up to 10 concurrent requests with an average content of 20 A4 pages (~30k characters)

The listed numbers of concurrent requests are not a hard limit of an Import from Word and Export to Word On-Premises instance. It can handle more concurrent requests, but the response time will be longer.
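The figures above can be turned into a rough capacity estimate. The following is a minimal sketch rather than an official sizing tool: the helper name `estimateContainers` and the decision to round the page count up to the next listed tier are assumptions made for illustration, and the result only reflects the sub-10-second benchmark quoted above.

```javascript
// Benchmark tiers from the list above: one container on a 2 CPU / 2 GB RAM
// server, response time below 10 seconds.
const TIERS = [
  { maxPages: 1, concurrentRequests: 45 },
  { maxPages: 5, concurrentRequests: 30 },
  { maxPages: 20, concurrentRequests: 10 },
];

// Hypothetical helper: estimates how many containers are needed to serve
// `peakConcurrentRequests` documents of roughly `averagePages` A4 pages.
// Page counts are rounded up to the next benchmarked tier (an assumption).
function estimateContainers(peakConcurrentRequests, averagePages) {
  const tier = TIERS.find((t) => averagePages <= t.maxPages);
  if (!tier) {
    throw new RangeError('No benchmark available for documents longer than 20 pages');
  }
  return Math.max(1, Math.ceil(peakConcurrentRequests / tier.concurrentRequests));
}

console.log(estimateContainers(60, 5)); // 2
```

Treat the result as a starting point for load testing, since image sizes and content complexity also influence performance.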

High availability

A single Docker container with Import from Word and Export to Word On-Premises benefits from additional CPUs on the machine. It is recommended to allocate 2 CPUs for every Docker container. To scale your app on a single machine, increase the number of CPUs and Docker containers. However, TinyMCE recommends scaling on at least three hosts to ensure the reliability of the system.

A load balancer, such as HAProxy or NGINX (see the load balancer configuration examples in the SSL communication guide), is required for scaling on several machines. It is also possible to scale using a cloud container service, such as Amazon ECS or Azure Container Instances, or an orchestrator such as Kubernetes.

Contact us if you have any questions about server resources needed for your use case of Import from Word and Export to Word On-Premises.

Installation

A valid license key is needed in order to install Import from Word and Export to Word On-Premises. Contact us for a trial license key.

Supported technologies

The application is provided as a docker image by default.

It can be run with any Open Container runtime tool e.g. Kubernetes, OpenShift, Podman, Docker and many others.

Refer to the Requirements guide for more information about the hardware and software requirements to run the Import from Word and Export to Word On-Premises.

Setting up the application using a Docker container

  1. Use the username and password supplied by Tiny to log in to the Docker registry and pull the Docker image.

  2. Run the application in a container using docker or docker-compose.

  3. Use a demo page to verify if the application works properly.

Containerize example using docker

Log in to the Docker registry:

docker login -u [username] -p [password] registry.containers.tiny.cloud

Launch the Docker container:

docker run --init -p 8080:8080 -e LICENSE_KEY=[your_license_key] registry.containers.tiny.cloud/docx-converter-tiny:[version]

If using authorization, provide the SECRET_KEY:

docker run --init -p 8080:8080 -e LICENSE_KEY=[your_license_key] -e SECRET_KEY=[your_secret_key] registry.containers.tiny.cloud/docx-converter-tiny:[version]

Read more about using authorization in the authorization section.

Containerize example using docker-compose

  1. Create the docker-compose.yml file:

    version: "3.8"
    services:
      doc-converter:
        image: registry.containers.tiny.cloud/docx-converter-tiny:[version]
        ports:
          - "8080:8080"
        restart: always
        init: true
        environment:
          LICENSE_KEY: "licensekey"
          # Secret Key is optional
          SECRET_KEY: "secret_key"

    For details on SECRET_KEY usage check the authorization section.

  2. Run:

    docker-compose up

  • Without a correct LICENSE_KEY the application will not start.

    • If the license is invalid, a wrong license key error will display in the logs and the application will not run.

  • It is advisable to override the SECRET_KEY variable using a unique and hard to guess string for security reasons.

Next steps

Use the http://localhost:8080/v2/convert/html-docx endpoint to export DOCX files. Check out the authorization section to learn more about tokens and token endpoints.

Use the demo page available on http://localhost:8080/demo to generate an example DOCX file.

Refer to the Import from Word and Export to Word REST API documentation on http://localhost:8080/docs for more details.
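As an illustration of calling the export endpoint from a backend, the following sketch uses only Node.js built-ins. It assumes Node.js 18 or later (for the global `fetch`), a container reachable on localhost:8080, and the same `html` and `css` body fields used in the request example in the authorization section. The helper names `buildExportRequest` and `exportToDocx` are hypothetical.

```javascript
const fs = require('node:fs/promises');

const EXPORT_URL = 'http://localhost:8080/v2/convert/html-docx';

// Builds the fetch() options for an export request. `token` is optional and
// only needed when SECRET_KEY is set on the container.
function buildExportRequest(html, css, token) {
  const headers = { 'Content-Type': 'application/json' };
  if (token) {
    // The token is sent as the raw header value, without a "Bearer " prefix.
    headers.Authorization = token;
  }
  return { method: 'POST', headers, body: JSON.stringify({ html, css }) };
}

// Sends the request and writes the returned .docx bytes to `outputPath`.
async function exportToDocx(html, css, outputPath, token) {
  const response = await fetch(EXPORT_URL, buildExportRequest(html, css, token));
  if (!response.ok) {
    throw new Error(`Export failed with HTTP status ${response.status}`);
  }
  await fs.writeFile(outputPath, Buffer.from(await response.arrayBuffer()));
}
```

With a container running, `exportToDocx('<p>Hello</p>', 'p { color: red; }', './file.docx')` should write the converted document to disk.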

Authorization

To enable authorization, set the SECRET_KEY environment variable during the installation.

If the SECRET_KEY variable is set, then all requests must have a header with a JWT (JSON Web Token) signed with this key. The token should be passed as a value of the Authorization header for each request sent to the Import from Word and Export to Word REST API.

If the SECRET_KEY is not set up during the installation, then Import from Word and Export to Word On-Premises will not require any headers with tokens when sending requests to the Import from Word and Export to Word REST API. However, skipping authorization is not recommended when running Import from Word and Export to Word On-Premises on a public network.

Generating the token

Tiny recommends using the libraries listed on jwt.io to generate the token. The token is considered valid when:

  • it is signed with the same SECRET_KEY as passed to the Import from Word and Export to Word On-Premises instance,

  • it was created within the last 24 hours,

  • it is not issued in the future (i.e. the iat timestamp cannot be newer than the current time),

  • it has not expired yet.

Tokens for Import from Word and Export to Word On-Premises do not require any additional claims such as the environment ID (which is specific to Collaboration Server On-Premises); the token can be created with an empty payload.
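The validity rules above can be sketched in code. The example below uses Node's built-in crypto module instead of a JWT library, only to keep the sketch free of dependencies; a library listed on jwt.io remains the recommended route. It assumes Node.js 16 or later (for the `base64url` encoding). The function names are hypothetical, and `isTokenAcceptable` merely mirrors the listed rules. It is not the service's actual validation code, and treating a token without an `iat` claim as invalid is an assumption.

```javascript
const crypto = require('node:crypto');

const DAY_IN_SECONDS = 24 * 60 * 60;

// Computes the raw HMAC-SHA256 signature used by the HS256 algorithm.
function hmacSha256(secretKey, data) {
  return crypto.createHmac('sha256', secretKey).update(data).digest();
}

// Serializes a JWT segment as unpadded base64url-encoded JSON.
function encodeSegment(value) {
  return Buffer.from(JSON.stringify(value)).toString('base64url');
}

// Creates an HS256 token whose only claim is `iat` (issued at).
function createToken(secretKey, nowSeconds = Math.floor(Date.now() / 1000)) {
  const unsigned = `${encodeSegment({ alg: 'HS256', typ: 'JWT' })}.${encodeSegment({ iat: nowSeconds })}`;
  return `${unsigned}.${hmacSha256(secretKey, unsigned).toString('base64url')}`;
}

// Illustrative check that mirrors the validity rules listed above.
function isTokenAcceptable(token, secretKey, nowSeconds = Math.floor(Date.now() / 1000)) {
  const [header, payload, signature] = token.split('.');
  if (!header || !payload || !signature) return false;

  // Rule 1: signed with the same SECRET_KEY.
  const expected = hmacSha256(secretKey, `${header}.${payload}`);
  const actual = Buffer.from(signature, 'base64url');
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    return false;
  }

  const claims = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
  if (typeof claims.iat !== 'number') return false; // assumption: iat is required
  if (claims.iat > nowSeconds) return false; // issued in the future
  if (nowSeconds - claims.iat > DAY_IN_SECONDS) return false; // older than 24 hours
  if (typeof claims.exp === 'number' && claims.exp <= nowSeconds) return false; // expired
  return true;
}
```

Because tokens older than 24 hours are rejected, generating a fresh token per request (or per short-lived session) is simpler than caching one.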

If the specific use case involves sending requests from a backend server, then JWT tokens can be generated locally, as shown in the request example below.

In the case of editor plugins or other frontend usages, create a token endpoint that returns a valid JWT token for authorized users.
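A token endpoint can be as small as a single request handler. The sketch below is a dependency-free illustration for Node.js 16 or later. The names `createTokenHandler` and `isAuthorized` are hypothetical, `isAuthorized` stands in for the application's own session or permission check, and returning the token as plain text is an assumption; consult the plugin guides for the response format the plugins expect.

```javascript
const crypto = require('node:crypto');

// Signs an HS256 JWT whose only claim is `iat` (issued at).
function createToken(secretKey) {
  const encode = (value) => Buffer.from(JSON.stringify(value)).toString('base64url');
  const iat = Math.floor(Date.now() / 1000);
  const unsigned = `${encode({ alg: 'HS256', typ: 'JWT' })}.${encode({ iat })}`;
  const signature = crypto.createHmac('sha256', secretKey).update(unsigned).digest('base64url');
  return `${unsigned}.${signature}`;
}

// Returns a request handler that responds with a fresh token for authorized
// users and with HTTP 401 for everyone else.
function createTokenHandler(secretKey, isAuthorized) {
  return (req, res) => {
    if (!isAuthorized(req)) {
      res.writeHead(401, { 'Content-Type': 'text/plain' });
      res.end('Unauthorized');
      return;
    }
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(createToken(secretKey));
  };
}
```

The handler has the `(req, res)` shape accepted by `http.createServer` from Node's built-in http module, so it can be mounted on a standalone server or adapted to an existing web framework. The SECRET_KEY never leaves the backend; only signed tokens are sent to the browser.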

Using editor plugins

There are two plugins available for TinyMCE: the Import from Word and the Export to Word plugins. The plugins automatically request the token from the given tokenUrl variable and set the Authorization header when making an export request. Refer to the respective guides for details on adding the Import from Word and Export to Word features to your WYSIWYG editor.

Request example with an Authorization header

The following example presents a request that generates a valid JWT token and sets it as an Authorization header:

const fs = require( 'fs' );
const jwt = require( 'jsonwebtoken' );
const axios = require( 'axios' );

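// Must match the SECRET_KEY passed to the On-Premises instance.
const SECRET_KEY = 'secret';

// Sign a token with an empty payload; jsonwebtoken adds the iat claim automatically.
const token = jwt.sign( {}, SECRET_KEY, { algorithm: 'HS256' } );

const data = {
   html: "<p>I am a teapot</p>",
   css: "p { color: red; }",
};

const config = {
   headers: {
      // Pass the raw token, without a "Bearer" prefix.
      Authorization: token
   },
   // The response body is the binary .docx file.
   responseType: 'arraybuffer',
};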

axios.post( 'http://localhost:8080/v2/convert/html-docx', data, config )
   .then( response => {
      fs.writeFileSync('./file.docx', response.data, 'binary');
   } ).catch( error => {
      console.log( error );
   } );

SECRET_KEY: This is the key that was passed to the Import from Word and Export to Word On-Premises instance.

Please refer to the Import from Word and the Export to Word REST API documentation to start using the service.

If API clients like Postman or Insomnia are used, set the JWT token as an Authorization header in the Headers tab. Do not use the built-in token authorization, as this will generate an invalid header with a Bearer prefix added to the token.

API usage

Import from Word and Export to Word On-Premises consists of two services:

  • Export to Word provides conversion from an HTML document to a .docx file via a RESTful API.

  • Import from Word allows importing a .docx file and converting it into a styled HTML document with comments and suggestions attached.

The API is available on http://localhost:[port] (by default the port is 8080).

The REST API documentation is available at http://localhost:[port]/v2/convert/docs. Alternatively, check the specifications in the public resources for the Import from Word and Export to Word plugins.

If authorization is enabled for the API, provide an authorization token with each request. See the authorization section for instructions.

SSL Communication

It is possible to communicate with Import from Word and Export to Word On-Premises using secure connections. To achieve this, a load balancer such as NGINX or HAProxy needs to be set up with your SSL certificate.

HAProxy and NGINX configuration examples are shown below.

HAProxy example

Here is a basic HAProxy configuration:

global
  daemon
  maxconn 256
  tune.ssl.default-dh-param 2048

defaults
  mode http
  timeout connect 5000ms
  timeout client 50000ms
  timeout server 50000ms

frontend http-in
  bind *:80
  bind *:443 ssl crt /etc/ssl/your_certificate.pem
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  redirect scheme https if !{ ssl_fc }

  default_backend servers

backend servers
  # Port must match the host port published by the container (8080 in the docker run examples)
  server server1 127.0.0.1:8080 maxconn 32

NGINX example

Here is a basic NGINX configuration:

events {
    worker_connections  1024;
}

http {
  server {
    server_name your.domain.name;

    listen 443 ssl;
    ssl_certificate /etc/ssl/your_cert.crt;
    ssl_certificate_key /etc/ssl/your_cert_key.key;

    location / {
      # Port must match the host port published by the container (8080 in the docker run examples)
      proxy_pass http://127.0.0.1:8080;

      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "Upgrade";
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_http_version 1.1;
    }
  }
}

Logs

The logs from Import from Word and Export to Word On-Premises are written to stdout and stderr. Most of them are formatted in JSON. They can be used for monitoring or debugging purposes. In production environments, it is recommended to store the logs in files or to use a distributed logging system (like ELK or CloudWatch).

Monitoring Import from Word and Export to Word with logs

To get more insight into how Import from Word and Export to Word On-Premises is performing, logs can be used for monitoring. To enable metric logs, add the ENABLE_METRIC_LOGS=true environment variable.

Log structure

The log structure contains the following information:

  • handler: A unified identifier of action. Use this field to identify calls.

  • traceId: A unique RPC call ID.

  • tags: A semicolon-separated list of tags. Use this field to filter metrics logs.

  • data: An object containing additional information. It might vary between different transports.

  • data.duration: The request duration in milliseconds.

  • data.transport: The type of the request transport. It could be http or ws (websocket).

  • data.status: The request status. It can be equal to success, fail, warning.

  • data.statusCode: The response status in HTTP status code standard.

Additionally, for the HTTP transport, the following information is included:

  • data.url: The URL path.

  • data.method: The request method.

In case of an error, data.status will be equal to failed and data.message will contain the error message.

An example log for HTTP transport:

{
  "level": 30,
  "time": "2021-03-09T11:15:09.154Z",
  "msg": "Request summary",
  "handler": "postConvert",
  "traceId": "85f13d92-57df-4b3b-98bb-0ca41a5ae601",
  "data": {
    "duration": 752,
    "transport": "http",
    "statusCode": 200,
    "status": "success",
    "url": "/v2/convert/html-docx",
    "method": "POST"
  },
  "tags": "metrics"
}
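As an illustration of how these fields can be used for monitoring, the following sketch summarizes metric log lines captured from the container output. The function name `summarizeMetricLogs` is hypothetical, and the input is assumed to contain one log entry per line, as produced by `docker logs`.

```javascript
// Parses log output (one entry per line) and summarizes the metric entries.
// Lines that are not valid JSON are skipped, because not every log line is
// JSON-formatted.
function summarizeMetricLogs(logText) {
  const entries = [];
  for (const line of logText.split('\n')) {
    try {
      const entry = JSON.parse(line);
      // `tags` is a semicolon-separated list; keep only metric entries.
      const tags = String(entry.tags || '').split(';');
      if (tags.includes('metrics') && entry.data) {
        entries.push(entry);
      }
    } catch {
      // Not a JSON line; ignore it.
    }
  }

  // Count requests per status and average the request duration.
  const byStatus = {};
  let totalDuration = 0;
  for (const { data } of entries) {
    byStatus[data.status] = (byStatus[data.status] || 0) + 1;
    totalDuration += data.duration;
  }

  return {
    count: entries.length,
    byStatus,
    averageDurationMs: entries.length ? totalDuration / entries.length : 0,
  };
}
```

Grouping by `data.status` and averaging `data.duration` gives a quick view of error rates and response times without a full logging stack.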

Docker

Docker has built-in logging mechanisms that capture logs from the output of the containers. The default logging driver writes the logs to files.

When using this driver, use the docker logs command to show logs from a container. Use the -f flag to view logs in real time. Refer to the official Docker documentation for more information about the logs command.

When a container is running for a long period of time, the logs can take up a lot of space. To avoid this problem, you should make sure that the log rotation is enabled. This can be set with the max-size option.

Distributed logging

If running more than one instance of Import from Word and Export to Word On-Premises, it is recommended to use a distributed logging system. It allows for viewing and analyzing logs from all instances in one place.

AWS CloudWatch and other cloud solutions

If running Import from Word and Export to Word On-Premises in the cloud, the simplest and recommended way is to use a service that is available at the selected provider.

Here are some of the available services:

To use CloudWatch with AWS ECS, a log group must be created beforehand, and the log driver must be changed to awslogs. When the log driver is configured properly, logs will be streamed directly to CloudWatch.

The logConfiguration may look similar to this:

"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-region": "us-west-2",
    "awslogs-group": "tinysource",
    "awslogs-stream-prefix": "tiny-docx-converter-logs"
  }
}

Refer to the Using the awslogs Log Driver article for more information.

On-Premises solutions

If you run your own infrastructure, or for some reason cannot use a service offered by a cloud provider, an on-premises distributed logging system can be used.

There are a lot of solutions available, including:

  • ELK + Filebeat: A stack built on top of Elasticsearch, Logstash and Kibana. In this configuration, Elasticsearch stores logs, Filebeat reads logs from Docker and sends them to Elasticsearch, and Kibana is used to view them. Logstash is not necessary because logs are already structured.

  • Fluentd: It uses a dedicated Docker log driver to send the logs. It has a built-in frontend, but can also be integrated with Elasticsearch and Kibana for better filtering.

  • Graylog: It uses a dedicated Docker log driver to send the logs. It has a built-in frontend and needs Elasticsearch to store the logs as well as a MongoDB database to store the configuration.

Example configuration

The example configuration uses Fluentd, Elasticsearch and Kibana to capture logs from Docker.

Before running Import from Word and Export to Word On-Premises, prepare the logging services. For the purposes of this example, Docker Compose is used. Create the fluentd, elasticsearch and kibana services inside the docker-compose.yml file:

version: '3.7'
services:
  fluentd:
    build: ./fluentd
    volumes:
      - ./fluentd/fluent.conf:/fluentd/etc/fluent.conf
    ports:
      - "24224:24224"
      - "24224:24224/udp"

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.5
    expose:
      - 9200
    ports:
      - "9200:9200"

  kibana:
    image: docker.elastic.co/kibana/kibana:6.8.5
    environment:
      ELASTICSEARCH_HOSTS: "http://elasticsearch:9200"
    ports:
      - "5601:5601"

To integrate Fluentd with Elasticsearch, you first need to install fluent-plugin-elasticsearch in the Fluentd image. To do this, create a fluentd/Dockerfile with the following content:

FROM fluent/fluentd:v1.10-1

USER root

RUN apk add --no-cache --update build-base ruby-dev \
  && gem install fluent-plugin-elasticsearch \
  && gem sources --clear-all

Next, configure the input server and connection to Elasticsearch in the fluentd/fluent.conf file:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>
<match *.**>
  @type copy
  <store>
    @type elasticsearch
    host elasticsearch
    port 9200
    logstash_format true
    logstash_prefix fluentd
    logstash_dateformat %Y%m%d
    include_tag_key true
    type_name access_log
    tag_key @log_name
    flush_interval 1s
  </store>
  <store>
    @type stdout
  </store>
</match>

The services are now ready to run:

docker-compose up --build

When the services are ready, start Import from Word and Export to Word On-Premises:

docker run --init -p 8080:8080 \
--log-driver=fluentd \
--log-opt fluentd-address=[Fluentd address]:24224 \
[Your config here] \
registry.containers.tiny.cloud/docx-converter-tiny:[version]

  • Open Kibana in your browser.

  • During the first run, you may be asked about creating an index.

  • Use the fluentd-* pattern and press the “Create” button.

  • After this step, the logs should appear in the “Discover” tab.