We can never say it enough: the InDesign Markup Language format is a little gem for all InDesign “bodywork” experts. Its only flaw: it relies on a dialect that deviates slightly from standard XML!

One of IDML's quirks is the way it encodes special InDesign characters located in the range 0x03-0x19. For any special character whose code point is lower than 32 (0x20), Adobe IDML files contain an XML Processing Instruction (PI) of the form <?ACE n?>, where n is the code point written in hexadecimal. Those “ACE codes” are proprietary, undocumented Adobe PIs used in IDML/IDMS to represent control characters that are strictly prohibited by the XML specification:

InDesign Character    IDMS/IDML      Description
U+0003                <?ACE 3?>      END NESTED STYLE
U+0004                <?ACE 4?>      FOOTNOTE
U+0007                <?ACE 7?>      INDENT HERE TAB
U+0008                <?ACE 8?>      RIGHT INDENT TAB
U+0018                <?ACE 18?>     AUTO PAGE NUMBER
U+0019                <?ACE 19?>     SECTION MARKER
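
To make the mapping concrete, here is a minimal sketch in ExtendScript-compatible JavaScript. The names ACE_NAMES and aceToken are hypothetical helpers introduced for illustration (they are not part of any Adobe API), and the sketch assumes, as the table suggests, that the number inside the PI is the code point in hexadecimal.

```javascript
// Hypothetical lookup table built from the list above (not an Adobe API).
// Keys are the hexadecimal ACE numbers as they appear in the PI.
var ACE_NAMES = {
  '3':  'END NESTED STYLE',
  '4':  'FOOTNOTE',
  '7':  'INDENT HERE TAB',
  '8':  'RIGHT INDENT TAB',
  '18': 'AUTO PAGE NUMBER',
  '19': 'SECTION MARKER'
};

// Returns the PI for a single control character,
// assuming the ACE number is the hexadecimal code point.
function aceToken(ch)
{
  return '<?ACE ' + ch.charCodeAt(0).toString(16) + '?>';
}

var hex  = '\x18'.charCodeAt(0).toString(16); // '18'
var pi   = aceToken('\x18');                  // '<?ACE 18?>'
var name = ACE_NAMES[hex];                    // 'AUTO PAGE NUMBER'
```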

Among these characters, the ones that IndexMatic users might encounter are RIGHT INDENT TAB, INDENT HERE TAB, and possibly END NESTED STYLE, since the separator and delimiter formatting options allow such elements. This is precisely where previous versions of iX³ could run into trouble!

Technical Note

The core issue is that E4X (ECMAScript for XML) in ExtendScript does not easily handle dynamic PI generation within XML objects. By default (XML.ignoreProcessingInstructions is true), PIs are treated as document-level metadata and are not preserved in the XML object structure: when an XML string containing PIs is parsed into an XML object, the PIs are typically stripped or lost in the process.

For dynamic generation of IDML files with ACE PIs, the most reliable approach is likely to bypass XML objects entirely and generate the IDML content as raw strings. However, it is still possible to set XML.ignoreProcessingInstructions to false so that PIs are preserved during XML processing.
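
As a rough illustration of the raw-string route, the sketch below builds the <Content> markup by plain string manipulation, with no XML object involved. The function name idmlContentString is hypothetical, and the sketch assumes that only the characters <, & and > need XML escaping in element content.

```javascript
// Hypothetical helper: builds the <Content> markup as a plain string,
// without going through an E4X XML object.
function idmlContentString(rawString)
{
  var esc = { '<': '&lt;', '&': '&amp;', '>': '&gt;' };

  var t = String(rawString)
    // Escape the XML-reserved characters first...
    .replace(/[<&>]/g, function(c){ return esc[c]; })
    // ...then turn control characters (except TAB, LF, CR) into ACE PIs.
    .replace(/[\u0001-\u0008\u000B\u000C\u000E-\u001F]/g, function(c)
    {
      return '<?ACE ' + c.charCodeAt(0).toString(16) + '?>';
    });

  return '<Content>' + t + '</Content>';
}

var out = idmlContentString(' a<b\x08c\t');
// out: '<Content> a&lt;b<?ACE 8?>c\t</Content>' (with a real TAB before the end tag)
```

The order of the two replacements matters: escaping must run before the PIs are inserted, otherwise the angle brackets of each <?ACE n?> would themselves be escaped.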

Various headaches can then arise when working with the usual methods of XML instances. In particular, the XML.ignoreWhitespace boolean option can significantly disrupt certain procedures. Typically, to prepare IDML content via XML, one needs to set XML.ignoreWhitespace to false, otherwise leading and trailing whitespace will be lost. But then, several XML conversion routines cause newlines to appear at the PI insertion points!

The only solution that seemed solid enough to me is to explicitly form the <Content> element using the XML constructor on a string cleaned for this purpose:

XML.setSettings
({
  ignoreProcessingInstructions:  false, // required for ACE
  ignoreWhitespace:              false, // keep leading/trailing sp.
  prettyPrinting:                false,
});
 
var s = " foo\x08bar\t"; // <SPACE>foo<RIGHT INDENT TAB>bar<TAB>
 
var t = s.replace('\x08','<?ACE 8?>');  // IDML encoding
var x = XML( "<Content>" + t + "</Content>" );
 
alert( x.toXMLString() ); // `<Content> foo<?ACE 8?>bar\t</Content>`
 

Note. — Techniques based on <Content/>.prependChild(...) — or similar approaches that assume a well-formed XML argument — have caused a variety of chain reaction problems.
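
One detail to keep in mind when adapting the one-off replacement to real data: in JavaScript, String.prototype.replace called with a string pattern only substitutes the first match, so a string holding several special characters needs a regular expression with the g flag. The sketch below uses made-up sample data to show the difference.

```javascript
var s = " foo\x08bar\x08baz\t"; // sample string with two RIGHT INDENT TABs

// A string pattern replaces the first occurrence only.
var first = s.replace('\x08', '<?ACE 8?>');

// A regular expression with the `g` flag replaces every occurrence.
var all = s.replace(/\x08/g, '<?ACE 8?>');
```

Here first still contains one raw U+0008 character, which would make the resulting markup invalid, whereas all contains none.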

Based on this principle, we can then design a more generic (and more secure) function ensuring the conversion of any string into a <Content> element ready to be injected into your XML flow within the current <CharacterStyleRange> node:

function idmlContent(/*str*/rawString,  re1,rp1,re2,t)
//----------------------------------
// Sanitizes rawString for IDML content and returns the
// XML element `<Content>...</Content>`.
// Assumes XML.ignoreProcessingInstructions===false
// => XML
{
  // XML escape: < > &
  re1 = callee.RE_ESC || (callee.RE_ESC=/[<&>]/g);
  rp1 = callee.RP_ESC
    || (callee.RP_ESC={'<':'&lt;','&':'&amp;','>':'&gt;'});
 
  // Control characters are illegal in XML except for 0x9, 0xA, 0xD.
  // IDML then uses the PI scheme `<?ACE hex ?>`
  re2 = callee.RE_ACES
    || (callee.RE_ACES=/[\u0001-\u0008\u000B\u000C\u000E-\u001F]/g);
 
  t = rawString
  .replace(re1, function(c){ return rp1[c]} )
  .replace(re2, function(c)
  {
    return '<?ACE ' + c.charCodeAt(0).toString(16) + '?>';
  });
 
  return XML("<Content>" + t + "</Content>");
}
 
// Sample usage
XML.setSettings
({
  ignoreProcessingInstructions:  false,
  ignoreWhitespace:              false,
  prettyPrinting:                false,
});
 
var s = " foo\x08bar\t";
var x = idmlContent(s); // Valid XML <Content>
 
// etc
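
To sanity-check generated content, it can help to run the conversion in reverse. The following is only a sketch working on the serialized string, not on XML objects; the name idmlContentToText is hypothetical, and it assumes the PI number is hexadecimal and that only the &lt;, &gt; and &amp; entities occur.

```javascript
// Hypothetical inverse helper: turns serialized <Content> markup
// back into the raw string it was built from.
function idmlContentToText(markup)
{
  var ent = { '&lt;': '<', '&gt;': '>', '&amp;': '&' };

  return String(markup)
    // Drop the enclosing <Content> tags, if present.
    .replace(/^<Content>|<\/Content>$/g, '')
    // Decode ACE PIs first: a literal '<?ACE' typed by the user
    // is still escaped at this stage, so it cannot match.
    .replace(/<\?ACE\s+([0-9A-Fa-f]+)\s*\?>/g, function(m, hex)
    {
      return String.fromCharCode(parseInt(hex, 16));
    })
    // Finally restore the escaped XML characters in a single pass.
    .replace(/&(lt|gt|amp);/g, function(e){ return ent[e]; });
}

var back = idmlContentToText('<Content> foo<?ACE 8?>bar\t</Content>');
// back: ' foo\x08bar\t'
```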
 

See also: InDesign Special Characters