Document Conversion With the IBM Bluemix API and Watson

Note: IBM Bluemix is now IBM Cloud

Many of you have seen the IBM Watson ads on TV or in print and been impressed by the capabilities that IBM Watson offers.

If you’re an RPG developer, then you’ve likely wished you could use one or more of the IBM Watson services from within an RPG program.

We have good news for you.

RPG API Express makes that wish a reality, and makes the process easy along the way. With a minimal amount of RPG code, RPG API Express lets your RPG projects participate in the growing ecosystem of IBM Watson, as well as a growing number of cloud-based web services from other companies.

In this post, I’m going to show how simple it is to use IBM Watson’s Document Conversion API to convert a PDF document into an HTML document. If you decide to follow along by implementing this code yourself, you’ll need a Bluemix account. Bluemix is IBM’s cloud platform for services, infrastructure and more. You will also need to have RPG API Express installed, so if you’re not already an RPG API Express customer, get in touch with us to try it free for 30 days!

Sign up for a free Bluemix account here.

Getting set up

Once you’re logged in to an active Bluemix account, you can create your Document Conversion service by selecting Watson from the catalog of available services and then selecting the Document Conversion service. After your service is created, you can retrieve the credentials needed to access it:

[Image: example of the IBM Bluemix API response in JSON, showing the service credentials]

With the service credentials in hand, we’re ready to step through the RPG API Express code for calling the service. To make use of this code, you’ll need a current version of the RPG API Express software, preferably version 3.4.x or higher.
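
To make the role of the credentials concrete, here is a minimal Python sketch (not RPG API Express code) of how a username and password are typically turned into an HTTP Basic authentication header. The JSON field names (url, username, password) and the placeholder values are assumptions for illustration, so check them against the credentials your own service displays. The BasicAuthUser and BasicAuthPassword field names in the RPG listing suggest Basic authentication, but how RPG API Express builds the header internally is not shown in this post.

```python
import base64
import json

# Hypothetical credentials document; the field names and values are
# assumptions, not copied from a real Bluemix service.
CREDENTIALS_JSON = """
{
  "url": "https://gateway.watsonplatform.net/document-conversion/api",
  "username": "your-service-username",
  "password": "your-service-password"
}
"""


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    # Basic auth is "username:password", base64-encoded, prefixed with "Basic ".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


credentials = json.loads(CREDENTIALS_JSON)
header = basic_auth_header(credentials["username"], credentials["password"])
print(header)
```

In the RPG program, the same two values are simply assigned to the transmit data structure, and the header is handled for you.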

We developed the RPG API Express code by closely following the excellent documentation and demos at mybluemix.net.

Document conversion steps

The basic steps to be performed by the RPG API Express code are as follows:

  • Build the multipart/form-data body: the boundary lines, the Content-Disposition headers, and within them the configuration data specified by the IBM Watson API reference.
  • Convert the assembled text to ASCII (ISO-8859-1).
  • Read the raw binary contents of the PDF to be converted, without any character conversion, and append them to the content already assembled.
  • Append the closing boundary, then transmit the content (without conversion, since it is already in ASCII) to IBM Watson while specifying the credentials needed to authenticate to the IBM Watson service.
  • Process the returned HTML content as your project dictates!
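
The steps above can be sketched in a language-neutral way. The following Python snippet is an illustration of the bytes being assembled, not RPG API Express code. The function names build_multipart_body and content_type_header are hypothetical, and the PDF bytes are a placeholder. The boundary value, part names and configuration JSON are taken from the RPG listing at the end of this post.

```python
CRLF = b"\r\n"


def build_multipart_body(boundary: str, config_json: str,
                         filename: str, pdf_bytes: bytes) -> bytes:
    """Assemble a multipart/form-data body with a config part and a file part."""
    # Each part starts with "--" followed by the boundary string.
    delimiter = b"--" + boundary.encode("ascii")

    # Text portions are encoded to ISO-8859-1, mirroring the conversion step.
    file_header = (
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'
    ).encode("iso-8859-1")
    head = CRLF.join([
        delimiter,
        b'Content-Disposition: form-data; name="config"',
        b"",                                  # blank line ends the part headers
        config_json.encode("iso-8859-1"),
        delimiter,
        file_header,
        b"Content-Type: application/pdf",
        b"",                                  # blank line ends the part headers
        b"",                                  # file content starts after this CRLF
    ])

    # The PDF bytes are appended untouched; converting them would corrupt the file.
    # The closing delimiter is the boundary line followed by "--".
    tail = CRLF + delimiter + b"--" + CRLF
    return head + pdf_bytes + tail


def content_type_header(boundary: str) -> str:
    """The request header names the boundary without the leading dashes."""
    return f'multipart/form-data; boundary="{boundary}"'


body = build_multipart_body(
    "9e50a2359f993974",
    '{"conversion_target":"normalized_html"}',
    "test.pdf",
    b"%PDF-1.4 placeholder bytes",            # stand-in for a real PDF file
)
print(content_type_header("9e50a2359f993974"))
print(len(body), "bytes")
```

Note that the boundary appears three times in the body: once before each part, and once more, with a trailing "--", to close the request.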

As part of the step to transmit the data to IBM Watson, you can also generate a log file for the actual transmission to IBM Watson, which can be very helpful in troubleshooting any errors that cause the transmission to fail. The log contains the full formatted request data that was transmitted, as well as the complete response sent back by IBM Watson. So if you have any problems implementing the code, be sure to specify a log file and refer to it when troubleshooting.

Below is an example of the response returned after sending a PDF with RPG API Express:

<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<html>
   <head>
       <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
       <meta content="t61jh19" name="author"/>
       <meta content="2016-11-01" name="publicationdate"/>
       <title>no title</title>
   </head>
   <body>
       <p>This is awesome!</p>
   </body>
</html>

Scroll to the bottom of the page for the actual RPG API Express code used to achieve this result. Note that the Document Conversion API supports conversions other than PDF to HTML, so check the documentation if you are interested in converting other types of documents.
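
As one illustration of processing the returned content, the following Python sketch (again, not RPG API Express code) parses a copy of the sample response and pulls out the title, the named metadata and the paragraph text. The function name summarize is hypothetical, and the sketch assumes the response is well-formed XML with no namespace, as in the sample. Real responses may differ.

```python
import xml.etree.ElementTree as ET

# A copy of the sample response, with a well-formed XML declaration.
SAMPLE_RESPONSE = b"""<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<html>
   <head>
       <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
       <meta content="t61jh19" name="author"/>
       <meta content="2016-11-01" name="publicationdate"/>
       <title>no title</title>
   </head>
   <body>
       <p>This is awesome!</p>
   </body>
</html>"""


def summarize(response: bytes) -> dict:
    """Extract the title, named <meta> values and paragraph text."""
    root = ET.fromstring(response)
    # Only <meta> elements with a name attribute are kept (author, date, ...).
    meta = {m.get("name"): m.get("content")
            for m in root.iter("meta") if m.get("name")}
    # itertext() also collects text from any nested inline elements.
    paragraphs = ["".join(p.itertext()).strip() for p in root.iter("p")]
    return {
        "title": root.findtext("head/title"),
        "meta": meta,
        "paragraphs": paragraphs,
    }


result = summarize(SAMPLE_RESPONSE)
print(result["title"])
print(result["paragraphs"])
```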

We're here to help!

Do you have a specific project that might be a good fit for IBM Watson, but you’re unsure how to tackle it in your RPG environment? We are here to help you create a solution for your business need. In many cases, we can even assist with a free proof of concept that will quickly demonstrate the power and flexibility RPG API Express can bring to your organization.

Contact us if you have specific questions about RPG API Express – we are always available to help!

H DFTACTGRP(*NO) BNDDIR('RXSBND') ACTGRP(*CALLER)

/copy QRPGLECPY,RXSCB

D gConvertCcsidDS...
D DS Likeds(RXS_ConvertCcsidDS_t)
D gGetStmfDS DS LikeDS(RXS_GetStmfDS_t)
D gTransmitDS DS LikeDS(RXS_TransmitDS_t)

D gData S A Len(1000000) Varying(4)
D gRequest S A Len(1000000) Varying(4)
D gResponse S A Len(1000000) Varying(4)
D gBoundary S 1024A Varying(4)

/free

*INLR = *On;

// Setup boundary and required content data within boundaries

gBoundary = '--9e50a2359f993974';

gData = gBoundary + RXS_CRLF;
gData += 'Content-Disposition: form-data; name="config"' + RXS_CRLF;
gData += RXS_CRLF;
gData += '{"conversion_target":"normalized_html"}' + RXS_CRLF;
gData += gBoundary + RXS_CRLF;
gData += 'Content-Disposition: form-data; name="file";';
gData += ' filename="test.pdf"' + RXS_CRLF;
gData += 'Content-Type: application/pdf' + RXS_CRLF;
gData += RXS_CRLF;

// Convert the content data to ASCII/ISO-8859-1

RXS_ResetDS( gConvertCcsidDS : RXS_DS_TYPE_CONVERTCCSID );
gConvertCcsidDS.To = RXS_CCSID_ISO88591;
gRequest = RXS_Convert( gData : gConvertCcsidDS );

// Add the raw binary content of the PDF file (read without CCSID conversion)

RXS_ResetDS( gGetStmfDS : RXS_DS_TYPE_GETSTMF );
gGetStmfDS.Stmf = '/tmp/test.pdf';
gGetStmfDS.ToCcsid = RXS_CCSID_NO_CONVERSION;
gRequest += RXS_GetStmf( gGetStmfDS );

// Add boundary to mark the close of the request

gData = RXS_CRLF + gBoundary + '--' + RXS_CRLF;

// Convert the closing boundary to ASCII/ISO-8859-1

RXS_ResetDS( gConvertCcsidDS : RXS_DS_TYPE_CONVERTCCSID );
gConvertCcsidDS.To = RXS_CCSID_ISO88591;
gRequest += RXS_Convert( gData : gConvertCcsidDS );

// Transmit the request to Watson's document conversion endpoint

RXS_ResetDS( gTransmitDS : RXS_DS_TYPE_TRANSMIT );
gTransmitDS.URI = 'https://gateway.watsonplatform.net/'
+ 'document-conversion/api/v1/'
+ 'convert_document?version=2015-12-15';
// Replace these sample values with the username and password from your own service credentials
gTransmitDS.BasicAuthUser = 'bd979757-2108-484e-b0d2-5d5f2c98ccee';
gTransmitDS.BasicAuthPassword = '1UmnGTB5pxhB';
gTransmitDS.RequestCcsid = RXS_CCSID_NO_CONVERSION; // Content already ASCII
gTransmitDS.HTTPMethod = RXS_HTTP_METHOD_POST;
gTransmitDS.HeaderContentType = 'multipart/form-data; boundary="'
+ %subst( gBoundary : 3 ) + '"';
gTransmitDS.LogFile = '/tmp/rxs_transmit_log.txt';
gResponse = RXS_Transmit( gRequest : gTransmitDS );

// gResponse should now contain the HTML representation of the text within the PDF !!
// Thank you Watson !!