Processing Unicode Natively With RPG API Express

Improved CCSID Handling on the IBM i

One of the design goals for the newest version of RPG API Express was smoother handling of character sets, which IBM i identifies by CCSID (coded character set identifier). Many developers communicate with systems that use Unicode encodings such as UTF-8. In many cases the data must be processed natively in its Unicode encoding, without conversion, to maintain the integrity of the data being exchanged.

The first step in processing an incoming request in its native Unicode character set is to configure the Apache instance that receives the request. Apache must skip its normal conversion to EBCDIC, because any Unicode character that has no equivalent in the target EBCDIC character set would be lost or replaced during that conversion. The following Apache directive, which specifies binary mode for incoming data, accomplishes this:

				
					CGIConvMode %%BINARY/EBCDIC%%
				
			
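
To see why skipping the conversion matters, here is a minimal sketch in Python. It is not part of RPG API Express or Apache; Python is used only because its standard codecs make the effect easy to reproduce. The sample request body is made up, and the `cp037` codec (EBCDIC US/Canada) is an assumed stand-in for whatever EBCDIC CCSID a server job might use.

```python
# Illustration only: "cp037" stands in for the job's EBCDIC CCSID (an assumption).
request_body = "<UnicodeData>Grüße, 日本語</UnicodeData>"  # hypothetical sample payload

# What the client actually puts on the wire.
utf8_bytes = request_body.encode("utf-8")

# Converted path: characters with no EBCDIC equivalent are substituted.
ebcdic_bytes = request_body.encode("cp037", errors="replace")
after_conversion = ebcdic_bytes.decode("cp037")

# Binary path: the bytes pass through untouched and decode back exactly.
after_binary = utf8_bytes.decode("utf-8")

print(after_conversion)
print(after_binary == request_body)
```

In this sketch the Latin characters survive the EBCDIC round trip because `cp037` happens to contain them, while the three Japanese characters come back as substitution characters. The binary path returns the original text unchanged.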

Next, in the RPG code, configure the call to RXS_GetStdIn() so that the XML data retrieved from the request stays in its native character set:

				
					D gGetStdInDS    DS                  LikeDS(RXS_GetStdInDS_t)
D                                    Inz(*LIKEDS)

   clear gGetStdInDS;
   gGetStdInDS.CCSID = RXS_CCSID_NO_CONVERSION;
   gXml = RXS_GetStdIn( gGetStdInDS );
				
			

With RXS_CCSID_NO_CONVERSION specified for the CCSID, the gXml variable contains the request data in exactly the character set the client originally sent.

Assume the request arrives in UTF-8, the character set most commonly used by web applications. Although the incoming request is UTF-8, we will hold the value parsed from the XML in UTF-16. RPG supports UTF-16 natively, so it makes sense to have the parser convert the data from UTF-8 to UTF-16 as it parses. Both are encodings of Unicode, so the conversion preserves the integrity of the data, and the result can be stored natively in UTF-16:

				
					D                DS 
D gData                   1   1024A   Varying(4)
D UTF16                   1   1024C   Varying(4) CCSID(1200)

   clear gRootDomDs;
   gRootDomDs.InputCcsid = RXS_CCSID_UTF8;
   gRootDomDs.OutputCcsid = RXS_CCSID_UTF16;
   gRootDomDS = RXS_OpenDom( gXml : gRootDomDs );
   
   clear UTF16;
   gData = RXS_ParseDomToText( '/UnicodeData' : gRootDomDS );
   write RUTF16;
   RXS_CloseDom( gRootDomDS );
				
			

Because gData and UTF16 are coded in one data structure and share the same memory, parsing the XML element <UnicodeData> into gData also populates the UTF16 field with UTF-16 data. That field is then ready to be written to a file defined with a UTF-16 (CCSID 1200) field:

				
					A              R RUTF16
A                UTF16        1024G              CCSID(1200)
				
			
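
The UTF-8 to UTF-16 step can be illustrated outside RPG as well. The following Python sketch is an illustration only, not the RXS parser. The sample value is made up, and the `utf-16-be` codec is used on the assumption that CCSID 1200 data on IBM i is big-endian UTF-16 without a byte order mark. It shows that the conversion round-trips exactly and counts UTF-16 code units, the unit in which a graphic field such as `1024G` is sized.

```python
# Illustration only: a hypothetical value as it might appear inside <UnicodeData>.
utf8_value = "Grüße, 日本語 😀".encode("utf-8")

# Convert UTF-8 to UTF-16 (big-endian, no byte order mark).
text = utf8_value.decode("utf-8")
utf16_value = text.encode("utf-16-be")

# Each UTF-16 code unit is two bytes; characters outside the Basic
# Multilingual Plane (such as the emoji) take two code units.
code_units = len(utf16_value) // 2
fits_in_field = code_units <= 1024  # capacity of a 1024G field, in code units

# Converting back yields the original UTF-8 bytes, so nothing was lost.
round_trip = utf16_value.decode("utf-16-be").encode("utf-8")

print(len(utf8_value), len(utf16_value), code_units, fits_in_field)
print(round_trip == utf8_value)
```

The byte counts differ between the two encodings, so a target field should be sized for the UTF-16 form of the data rather than the UTF-8 byte length of the request.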

This post intentionally leaves out many technical aspects of CCSID handling for multi-byte Unicode character sets; its purpose is to demonstrate how RPG API Express can natively support the original character set of the data being processed. If you would like to discuss any of those aspects in more detail, or have questions about adapting the sample code to your own requirements, contact our team.
