Pipelines

This guide explains in detail what a pipeline is in Visual PDF API and what this mechanism offers, including its options and the ways tasks can be combined.

It also covers the functional limits to keep in mind so that you can build your own solution without unpleasant surprises.

What is a pipeline in Visual PDF API

A pipeline describes an action requested from the Visual PDF API: which files are to be processed, in what way, and what is expected in return.

It is composed as follows:

  1. files: the input files sent to the API to be transformed
  2. tasks: the processing steps applied to the input files, each with its own options
  3. options: the parameters that modify the behavior of the request and customize the result it returns (names of output files, saving in a ZIP file, etc.)

It is called a "pipeline" because it combines several tools in a single request. These tools are called one after the other, in the order defined in the pipeline description.

As a consequence, not every combination of tools is possible: each tool expects a certain type of data as input and produces a certain type of data as output, which may not be compatible with the next tool in the pipeline.

Creating a pipeline

Pipelines must be sent to the API with the multipart/form-data encoding type. In JavaScript, you can use the FormData object to build a pipeline in this format, but this is only one way among others.

Adding input files

The first thing to do to define a pipeline is to add files that will be processed.

    
      // Creating the form data for the pipeline
      const form = new FormData();

      // Adding PDF files (instances of the JavaScript File class)
      form.append('file-1', myFirstFile);
      form.append('file-2', mySecondFile);
    
  

Keep in mind that the order in which the files are added matters: the tasks process the files in this order, which, for some tasks, affects the final result.

If some of your input PDFs are password protected, you will need to specify these passwords in the pipeline options field, as described in the "options" part a little further down in this guide.

Here you can only upload PDF and image files, as these are the only formats accepted by our tools. If the format of the input files does not match the requirements of the first task, an error will be thrown.
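The ordering and format rules above can be enforced on the client before anything is sent. The following is only a minimal sketch: the helper names appendInputFiles and isSupportedInput are hypothetical, and the MIME type check is an assumption (this guide only states that PDF and image files are accepted, not which image formats). The file-1, file-2 field names follow the example above.

```javascript
// Hypothetical helper, not part of the Visual PDF API.
// Assumption: "PDF and image files" is approximated by these MIME types.
function isSupportedInput(file) {
  return file.type === 'application/pdf' || file.type.startsWith('image/');
}

// Appends the files in array order, using the file-1, file-2, ... field names
// from this guide, and rejects unsupported types before any upload happens.
function appendInputFiles(form, files) {
  files.forEach((file, index) => {
    if (!isSupportedInput(file)) {
      throw new TypeError(
        `Unsupported input type at position ${index + 1}: "${file.type}"`
      );
    }
    form.append(`file-${index + 1}`, file);
  });
  return form;
}
```

Because the files are appended in array order, reordering the array is enough to change the order in which the tasks receive them.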

Adding PDF processing tasks

Now that we have our files that need to be processed, we need to describe what processing should be applied to these files.

This is done through the tasks field of a pipeline, which lists all the processing steps to apply, one after the other.

Here too, the order of the tasks is critical, since the tasks are called in this order: the result of each task serves as the input of the next one, and so on (the first task uses the initial files).

Let's build a pipeline that will:

  1. merge the 2 input files we added in the previous step
  2. add a custom watermark to all the pages
  3. protect the output file with a password
    
      // Adding the task descriptions to the form data.
      // FormData values must be strings or Blobs, so the task list is
      // serialized here (JSON is assumed to be the encoding the API expects).
      form.append(
        'tasks',
        JSON.stringify([
          // Merging the 2 input PDF documents
          {
            tool: 'merge',
          },
          // Adding a watermark to all pages
          {
            tool: 'watermark',
            options: {
              text: 'confidential',
              color: '#ff0000',
              position: 'center',
              rotation: -45,
              opacity: 0.5,
              'bounding-box': ['100w', '100w'],
            },
          },
          // Protecting the file with a password
          {
            tool: 'add-password',
            options: {
              password: 'my_sup3r_s3cr3t_pwd',
            },
          },
        ]),
      );
    
  

Setting pipeline options (optional)

Download format

Two download formats are available, depending on your needs:

  • file: this is the default behavior. Each output file is available for download individually, so you do not have to handle archive (zip) files on your side. It also lets each file be downloaded independently, for example when the output files are not all needed at the same time or are retrieved by 2 different entities. On the other hand, the total download size may be larger than with the archive format, and the download time longer.
            
              form.append('options', JSON.stringify({ // serialized as JSON (assumed encoding)
                'format': 'file', // Not necessary as it is the default format
                // ...
              }));
            
          
  • archive: in this configuration, all files output from the pipeline will be grouped into a single archive file (zip). This will allow you to download all of a pipeline's output files in a single request, which is particularly useful when you have a lot of files. This option also allows for faster downloading because an archive file is generally lighter than all the files it contains.
            
              form.append('options', JSON.stringify({ // serialized as JSON (assumed encoding)
                'format': 'archive',
                // ...
              }));
            
          

Output file names

The API allows you to customize the name of output files, either manually or using shortcuts.

If you do not specify this option, a default name will be assigned to the output files.

Two options are available:

  • file-names: this setting lets you name the output files individually, based on the index of each output file of the last task. You therefore need to work out the order of the output files yourself, from all the tasks applied and their options, so that each name is assigned to the right file.
            
              // Example in the case of a pipeline with 2 output files
              form.append('options', JSON.stringify({ // serialized as JSON (assumed encoding)
                'file-names': ['corporate_copy', 'customer_copy_{date}'],
                // ...
              }));
            
          
  • file-names-pattern: this setting lets you define a pattern that will be used for all output files. If the pattern does not produce a distinct name for every file, a {nb} prefix is applied automatically.
            
              form.append('options', JSON.stringify({ // serialized as JSON (assumed encoding)
                'file-names-pattern': '{nb}_invoice_{date}',
                // ...
              }));
            
          

Archive name

If you have configured the format parameter to archive, you can specify an archive file name using the archive-name parameter:

    
      form.append('options', JSON.stringify({ // serialized as JSON (assumed encoding)
        'archive-name': 'invoices_{date}',
        // ...
      }));
    
  

Shortcuts

As shown in the previous examples, you can use shortcuts, allowing you to include different information in the name of your files:

  • {nb}: the index of the file, according to the result of the last task
  • {date}: the current date when the file was generated by the API
  • {tool}: the name of the last tool of the pipeline
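To get a feel for how a pattern expands, the shortcuts can be previewed locally. This is only a sketch under stated assumptions: previewFileName is a hypothetical helper, and the index base (0 here) and the date format are guesses, since this guide does not specify them. The names actually produced by the API may differ.

```javascript
// Hypothetical local preview; it does NOT reproduce the API's exact behavior.
// Known shortcuts ({nb}, {date}, {tool}) are replaced; a shortcut whose value
// is not provided, or any other placeholder, is left untouched.
function previewFileName(pattern, { nb, date, tool }) {
  const values = { nb, date, tool };
  return pattern.replace(/\{(nb|date|tool)\}/g, (match, key) =>
    values[key] === undefined ? match : String(values[key])
  );
}

// Assumed date format and 0-based index, for illustration only
const today = '2024-01-31';
const names = [0, 1].map((nb) =>
  previewFileName('{nb}_invoice_{date}', { nb, date: today, tool: 'merge' })
);
console.log(names);
```

A preview like this is mostly useful to check that a pattern yields distinct names for every output file before relying on it.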

Wait mode

The mode option lets you choose between 2 waiting modes, which determine when the request returns a result:

  • wait: this is the default mode. In this mode, the API will wait until the pipeline is completely finished to return a result to you. The result will thus contain information about the pipeline (execution duration, etc.) as well as links allowing you to download the output files.
            
              form.append('options', JSON.stringify({ // serialized as JSON (assumed encoding)
                'mode': 'wait', // Not necessary as it is the default mode
                // ...
              }));
    
              const pipelineResponse = await fetch('https://api.visualpdf.com/v1/process', { /* ... */ });
              const pipelineResult = await pipelineResponse.json();
    
              // pipelineResult:
              // {
              //  pipelineid: '6ecd8c99-4036-403d-bf84-cf8400f67836',
              //  duration: 823,
              //  files: [
              //    ...
              //  ],
              //  ...
              // }
            
          
  • async: in this mode, the API returns a result as soon as the pipeline has started. The result contains information about the pipeline, which lets you match the later notification to this request. You must therefore specify a webhook URL (webhook-url option), which the Visual PDF API will call with a POST request once the pipeline is completed, to transmit the result of the pipeline.
            
              form.append('options', JSON.stringify({ // serialized as JSON (assumed encoding)
                'mode': 'async',
                'webhook-url': 'https://your-domain.com/visual-pdf-webhook',
                // ...
              }));
    
              const pipelineResponse = await fetch('https://api.visualpdf.com/v1/process', { /* ... */ });
              const pipelineResult = await pipelineResponse.json();
    
              // pipelineResult:
              // {
              //  pipelineid: '6ecd8c99-4036-403d-bf84-cf8400f67836',
              //  'webhook-url': 'https://your-domain.com/visual-pdf-webhook',
              //  ...
              // }
            
          
    Then, in your webhook handler on your server:
            
              // Called once a Visual PDF pipeline is completed.
              // req.body is assumed to be already parsed (e.g. by your framework's body parser).
              function handlePipelineCompleted(req) {
                const pipelineResult = req.body;
    
                // pipelineResult:
                // {
                //  taskid: '6ecd8c99-4036-403d-bf84-cf8400f67836',
                //  duration: 901,
                //  files: [
                //    ...
                //  ],
                //  ...
                // }
    
                // Do whatever you need
                // ...
              }
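
In async mode, the immediate response and the later webhook call have to be matched on your side. One possible way to do this is sketched below, using an in-memory Map. The names pendingPipelines, rememberPipeline and resolvePipeline are hypothetical, and accepting either pipelineid or taskid as the identifier is an assumption based on the example payloads shown above, not documented behavior.

```javascript
// Hypothetical in-memory registry of pipelines started in async mode.
const pendingPipelines = new Map();

// Call this with the immediate response of an async request, plus any
// context you will need later (order number, user id, ...).
function rememberPipeline(startResult, context) {
  pendingPipelines.set(startResult.pipelineid, context);
}

// Call this from the webhook handler with the parsed request body.
// Returns the stored context and the output files, or null when the
// identifier is unknown or was already handled.
function resolvePipeline(webhookResult) {
  // Assumption: the example payloads in this guide show both names.
  const id = webhookResult.pipelineid ?? webhookResult.taskid;
  const context = pendingPipelines.get(id);
  if (context === undefined) {
    return null;
  }
  pendingPipelines.delete(id);
  return { context, files: webhookResult.files };
}
```

An in-memory Map is lost when the server restarts, so a real deployment would more likely keep this mapping in a database or another persistent store.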
            
          

Input file passwords

Some of your input PDF files may be password protected. In this case, you must pass these passwords to the API so that your files can be processed.

The passwords parameter allows you to do this. It is a key/value object whose keys are the field names under which the files were added to the multipart/form-data content, and whose values are the associated passwords:

    
      form.append('options', JSON.stringify({ // serialized as JSON (assumed encoding)
        'passwords': {
          'file-1': 'my_sup3r_s3cr3t_pwd',
          'file-2': 'my_0th3r_s3cr3t_pwd',
        },
        // ...
      }));
    
  

Executing the pipeline and downloading the output files

These actions are described in the API workflow guide.

Pipeline limits

Whether for reasons of consistency, abuse prevention or technical constraints, certain limits must be respected when using the API.

If any of these limits are violated, the pipeline will fail and an error will be returned.

Input files

Each pipeline is limited to

  • 100 input files
  • 500 MB of total input file size (some tools allow a lower maximum size)

Output files

Each pipeline is limited to

  • 100 output files (beyond that, it will be necessary to configure the pipeline to produce an archive)
  • 500 MB of total output file size (when the files are saved in an archive file, this limit applies to the files before archiving)

Number of tasks

You cannot combine more than 20 tasks in a single pipeline.
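
The input and task limits above can be checked before a request is sent, which avoids uploading files only to receive an error. The following is a minimal sketch: checkPipelineLimits and LIMITS are hypothetical names, and counting 1 MB as 1024 * 1024 bytes is an assumption, since this guide does not say how the API measures sizes. Output limits depend on what the tools produce, so this sketch does not attempt to check them.

```javascript
// Limits quoted in this guide. Assumption: 1 MB = 1024 * 1024 bytes.
const LIMITS = {
  maxInputFiles: 100,
  maxTotalInputBytes: 500 * 1024 * 1024,
  maxTasks: 20,
};

// Hypothetical pre-flight check. `files` is an array of File or Blob objects
// (anything with a numeric `size` in bytes); `tasks` is the task list.
// Returns an array of problems, empty when no limit is exceeded.
function checkPipelineLimits(files, tasks) {
  const problems = [];
  const totalBytes = files.reduce((sum, file) => sum + file.size, 0);

  if (files.length > LIMITS.maxInputFiles) {
    problems.push(`Too many input files: ${files.length} > ${LIMITS.maxInputFiles}`);
  }
  if (totalBytes > LIMITS.maxTotalInputBytes) {
    problems.push(`Input files too large: ${totalBytes} bytes in total`);
  }
  if (tasks.length > LIMITS.maxTasks) {
    problems.push(`Too many tasks: ${tasks.length} > ${LIMITS.maxTasks}`);
  }
  return problems;
}
```

Some tools allow a lower maximum size than the pipeline-wide limit, so passing this check does not guarantee that the API will accept the request.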

Tool combinations

Not all tools are compatible with each other, and not all of them can be used at any time:

  • Some do not produce a file format compatible with others (for example the "PDF to images" tool produces image files, which will not be accepted by the "Merge PDF" tool)
  • Some are to be used at the end of the pipeline only, such as the compression tool or the password addition tool

These constraints are detailed in the respective tool guides.
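
The two kinds of constraints above (format compatibility and end-of-pipeline tools) can be illustrated with a small local check. This is a sketch only: the TOOLS table, its input and output kinds, the endOnly flag and the pdf-to-images identifier are all hypothetical and not taken from the API. The real constraints are those given in the tool guides. The sketch also assumes that several end-of-pipeline tools may follow one another, which this guide does not confirm.

```javascript
// Hypothetical compatibility table, for illustration only.
const TOOLS = {
  'merge': { input: 'pdf', output: 'pdf' },
  'watermark': { input: 'pdf', output: 'pdf' },
  'pdf-to-images': { input: 'pdf', output: 'image' },
  'add-password': { input: 'pdf', output: 'pdf', endOnly: true },
};

// Walks the task list in order and returns a description of the first
// problem found, or null when the chain looks consistent.
function findChainProblem(tasks, inputKind = 'pdf') {
  let kind = inputKind;
  let endReached = false;
  for (const [index, task] of tasks.entries()) {
    const spec = TOOLS[task.tool];
    if (!spec) {
      return `Task ${index + 1}: unknown tool "${task.tool}"`;
    }
    if (spec.input !== kind) {
      return `Task ${index + 1}: "${task.tool}" expects ${spec.input} input but receives ${kind}`;
    }
    if (endReached && !spec.endOnly) {
      return `Task ${index + 1}: "${task.tool}" cannot follow an end-of-pipeline tool`;
    }
    endReached = endReached || Boolean(spec.endOnly);
    kind = spec.output; // the output of this task feeds the next one
  }
  return null;
}
```

With this table, a merge, watermark, add-password chain passes, while placing merge after pdf-to-images is reported as a format mismatch.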