Clean ASCII Uneval

String.prototype.toSource(?quoteChar) -> string

Returns a safe, quote-enclosed ASCII string in the form "..." (default formatting) such that eval(str.toSource())===str. You can set the quoteChar argument to a single quote character ("'", aka "\x27") to get '...' enclosing instead. This function overrides the original ExtendScript method in order to yield shorter outputs:

  Input String      ExtendScript Output                   IdExtenso Output
  "abc"             (new String("abc"))                   "abc"
  "\\\r\n\t\v\f\0"  (new String("\\\r\n\t\x0B\f\x00"))    "\\\r\n\t\v\f\0"
  "àbçdé"           (new String("\u00E0b\u00E7d\u00E9"))  "\xE0b\xE7d\xE9"

Tested on a 23,052-byte JPEG file, IdExtenso's toSource() returns 60,433 characters where the native method needs 89,653: almost 30K characters saved!
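To make the escaping rules concrete, here is a minimal sketch of an ASCII-only "uneval" written as a standalone function. The name `toAsciiSource` is hypothetical and this is not IdExtenso's code; the escape choices are inferred from the table above: two-character escapes where they exist, `\xHH` for other control characters and for 0x7F to 0xFF, `\uHHHH` above 0xFF. One extra assumption: it writes `\x00` rather than `\0` when a decimal digit follows, because `"\01"` would be read as a legacy octal escape.

```javascript
// Hypothetical sketch, not IdExtenso's implementation: ASCII "uneval".
// Escape choices are inferred from the comparison table above.
function hex(n, width) {
  var s = n.toString(16).toUpperCase();
  while (s.length < width) s = "0" + s;
  return s;
}

function toAsciiSource(str, quoteChar) {
  var q = quoteChar === "'" ? "'" : '"';
  var SHORT = { "\\": "\\\\", "\r": "\\r", "\n": "\\n", "\t": "\\t",
                "\v": "\\v", "\f": "\\f", "\b": "\\b" };
  var out = q;
  for (var i = 0; i < str.length; i++) {
    var c = str.charAt(i), n = str.charCodeAt(i);
    if (c === q) {
      out += "\\" + q;                      // escape the enclosing quote
    } else if (SHORT.hasOwnProperty(c)) {
      out += SHORT[c];                      // two-char escapes: \n, \t, ...
    } else if (n === 0) {
      // "\0" is only unambiguous when no decimal digit follows it.
      out += /[0-9]/.test(str.charAt(i + 1)) ? "\\x00" : "\\0";
    } else if (n < 0x20 || (n >= 0x7F && n <= 0xFF)) {
      out += "\\x" + hex(n, 2);             // other controls and Latin-1
    } else if (n > 0xFF) {
      out += "\\u" + hex(n, 4);             // any other UTF-16 code unit
    } else {
      out += c;                             // printable ASCII as is
    }
  }
  return out + q;
}

toAsciiSource("àbçdé"); // => "\xE0b\xE7d\xE9"
```

Since every UTF-16 code unit is escaped independently, surrogate pairs survive the round trip too: `eval(toAsciiSource(s)) === s` for any string `s`.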


Trimming, Truncating, Padding

String.prototype.[trim|ltrim|rtrim]() -> string

These three very popular methods, missing from ExtendScript, allow you to remove spaces at the ends of a string. ltrim() left-trims the string, rtrim() right-trims the string, and trim() applies both left- and right-trimming.

Note. - All space characters available in InDesign and Unicode are targeted, including U+205F MEDIUM MATHEMATICAL SPACE.

  var s = " \t\u2000 Hello World \xA0\u2028";
 
  alert(  s.trim().toSource() ); // => "Hello World"
  alert( s.ltrim().toSource() ); // => "Hello World \xA0\u2028"
  alert( s.rtrim().toSource() ); // => " \t\u2000 Hello World"
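A minimal sketch of the three trimming operations as plain functions rather than prototype methods. The character class is an assumption: it lists the Unicode White_Space characters plus U+FEFF, which may not match the exact set of InDesign spaces that IdExtenso targets. The class is written out explicitly rather than relying on `\s`, so the targeted set is visible.

```javascript
// Hypothetical sketch; the SPACES set is an assumption (Unicode White_Space
// plus U+FEFF), not necessarily the exact set targeted by IdExtenso.
var SPACES = "[\\t\\n\\v\\f\\r \\u0085\\u00A0\\u1680\\u2000-\\u200A" +
             "\\u2028\\u2029\\u202F\\u205F\\u3000\\uFEFF]";
var LEADING = new RegExp("^" + SPACES + "+");
var TRAILING = new RegExp(SPACES + "+$");

function ltrim(s) { return s.replace(LEADING, ""); }
function rtrim(s) { return s.replace(TRAILING, ""); }
function trim(s)  { return rtrim(ltrim(s)); }
```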
 

String.prototype.stripSpaces() -> string

Strips all space characters from a string.

  var s = "\tHello World !\u2000";
 
  alert( s.stripSpaces().toSource() ); // => "HelloWorld!"
 

String.prototype.[trunc|ltrunc|rtrunc](size, ?ellip, ?wb) -> string

These methods remove either the MIDDLE (trunc), LEFT (ltrunc), or RIGHT (rtrunc) part of a string according to the maximum size parameter (uint).

   — If the string is already shorter than size, it is returned as is. Otherwise, at most size characters are kept.

   — By default, ellip (ellipsis) is set to three dots (...), but you can specify any custom string as the 2nd argument.

   — The wb argument (boolean, optional) tells whether the result must preserve word boundaries.

  var s = "And this Fyodor Pavlovich began to exploit; that is, he fobbed him off with small sums.";
  var t = s.rtrunc(25, "…", true); // Detect word boundaries
 
  alert( t ); // => `And this Fyodor…`
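A minimal sketch of the right-truncation case with word-boundary handling, written as a plain function. Two assumptions, not confirmed by the text: the ellipsis counts toward `size`, and with `wb` set the cut backs up to the previous whitespace when it would otherwise split a word. Under those assumptions it reproduces the example above; `ltrunc` and `trunc` would mirror the same logic on the left side or around the middle.

```javascript
// Hypothetical sketch of rtrunc(size, ?ellip, ?wb) as a plain function.
// Assumptions: the ellipsis counts toward `size`; with wb=true, a cut that
// would split a word backs up to the preceding whitespace run.
function rtrunc(s, size, ellip, wb) {
  if (s.length <= size) return s;               // already short enough
  if (ellip === undefined) ellip = "...";
  var keep = Math.max(0, size - ellip.length);  // room left for text
  var head = s.slice(0, keep);
  // The cut splits a word when the chars on both sides are non-space.
  if (wb && /\S$/.test(head) && /^\S/.test(s.slice(keep))) {
    var m = head.match(/^([\s\S]*\S)\s+\S*$/);  // drop the partial word
    if (m) head = m[1];
  }
  return head.replace(/\s+$/, "") + ellip;      // no space before ellipsis
}
```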
 

String.prototype.[rpad|lpad](size, ?padChar) -> string

Pads the string on the RIGHT (rpad) or LEFT (lpad) with a padding character (space by default) until its length reaches size.

  alert( "abc".rpad(5).toSource() );      // => "abc  "
  alert( "abc".lpad(5, '_').toSource() ); // => "__abc"
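ExtendScript implements ECMAScript 3, which has no built-in padding methods. For comparison, here is a minimal ES3-compatible sketch of the same behavior as plain functions (names hypothetical, a single-character padChar assumed), alongside the `padEnd`/`padStart` built-ins that ES2017+ engines provide.

```javascript
// Hypothetical ES3-compatible sketch; assumes padChar is one character.
function rpad(s, size, padChar) {
  padChar = padChar || " ";             // default: space
  while (s.length < size) s += padChar;
  return s;
}

function lpad(s, size, padChar) {
  padChar = padChar || " ";
  while (s.length < size) s = padChar + s;
  return s;
}

// In ES2017+ engines (not ExtendScript), the built-ins give the same results:
// "abc".padEnd(5) === "abc  " and "abc".padStart(5, "_") === "__abc".
```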
 

Code Point Manager

String.fromCodePoint(array) -> string

This static method implements ECMAScript's String.fromCodePoint function. Pass in either a simple array of code points (numbers in 0..0x10FFFF), or a list of code points (arguments). The function returns the UTF16-encoded string.

  var s = String.fromCodePoint([0x61, 0x28FF0, 0x62]);
 
  alert( s.toSource() ); // => "a\uD863\uDFF0b"
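The UTF-16 encoding behind `fromCodePoint` is plain arithmetic: a code point below 0x10000 maps to one code unit; otherwise it is offset by 0x10000 and its 20 remaining bits are split into a high surrogate (0xD800 + top 10 bits) and a low surrogate (0xDC00 + bottom 10 bits). A minimal ES3-compatible sketch taking an array; the name `fromCodePoints` is hypothetical.

```javascript
// Hypothetical sketch (ES3-compatible): encode code points as UTF-16.
function fromCodePoints(cps) {
  var units = [];
  for (var i = 0; i < cps.length; i++) {
    var cp = cps[i];
    if (!(cp >= 0 && cp <= 0x10FFFF) || cp !== Math.floor(cp)) {
      throw new RangeError("Invalid code point: " + cp);
    }
    if (cp < 0x10000) {
      units.push(cp);                      // BMP: one code unit
    } else {
      cp -= 0x10000;                       // 20 bits remain
      units.push(0xD800 + (cp >> 10),      // high surrogate: top 10 bits
                 0xDC00 + (cp & 0x3FF));   // low surrogate: bottom 10 bits
    }
  }
  return String.fromCharCode.apply(null, units);
}
```

For 0x28FF0: 0x28FF0 - 0x10000 = 0x18FF0; 0x18FF0 >> 10 = 0x63 gives 0xD863, and 0x18FF0 & 0x3FF = 0x3F0 gives 0xDFF0, hence "a\uD863\uDFF0b" above. Note that `apply` is bounded by the engine's maximum argument count, so very long arrays would need to be processed in chunks.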
 

String.prototype.codePointAt(position) -> number

Implements ECMAScript's String.prototype.codePointAt function, which returns the code point (0..0x10FFFF) found at the supplied position (uint). In addition, the function's SIZE property is set to the number of code units consumed (0: none; 1: regular char code; 2: surrogate pair).

  alert( "012".codePointAt(1) );            // => 0x31
  alert( "a\uD863\uDFF0b".codePointAt(1) ); // => 0x28FF0
  alert( "a\uD863\uDFF0b".codePointAt(2) ); // => 0xDFF0 (lone low surrogate)
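Decoding is the reverse operation: a high surrogate immediately followed by a low surrogate yields `((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000` and consumes two code units; any other unit is returned as is. This minimal sketch returns the consumed size in an object instead of setting a SIZE property; the function name and return shape are hypothetical.

```javascript
// Hypothetical sketch: code point at `pos` plus the number of code units
// consumed (0, 1 or 2), returned as an object instead of a SIZE property.
function codePointInfo(s, pos) {
  if (!(pos >= 0 && pos < s.length)) return { cp: undefined, size: 0 };
  var hi = s.charCodeAt(pos);
  if (hi >= 0xD800 && hi <= 0xDBFF && pos + 1 < s.length) {
    var lo = s.charCodeAt(pos + 1);
    if (lo >= 0xDC00 && lo <= 0xDFFF) {          // valid surrogate pair
      return { cp: ((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000, size: 2 };
    }
  }
  return { cp: hi, size: 1 };                    // BMP char or lone surrogate
}
```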
 

UTF8 Converter

String.fromUTF8(string-or-array) -> string

Given a sequence of valid UTF8 codes (string or array), rebuilds and returns the original UTF16 string.

  var s = String.fromUTF8("\xC3\x80\xC3\x89\xC3\x94");
  // or: String.fromUTF8([0xC3, 0x80, 0xC3, 0x89, 0xC3, 0x94]);
 
  alert( s );            // => `ÀÉÔ`
  alert( s.toSource() ); // => "\xC0\xC9\xD4"
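Decoding follows the UTF-8 bit layout: the lead byte tells how many continuation bytes follow (none below 0x80, one for 110xxxxx, two for 1110xxxx, three for 11110xxx), and each continuation byte (10xxxxxx) adds 6 bits. A minimal sketch as a standalone function with a hypothetical name, assuming well-formed input as the text does (there is no validation); code points above 0xFFFF are re-encoded as surrogate pairs.

```javascript
// Hypothetical sketch: decode UTF-8 (byte array or "byte string") to UTF-16.
// Assumes well-formed input, as the text does; there is no validation.
function decodeUTF8(src) {
  var isStr = typeof src === "string";
  var out = "", i = 0, b, cp, extra;
  while (i < src.length) {
    b = isStr ? src.charCodeAt(i++) & 0xFF : src[i++];
    if (b < 0x80)      { cp = b;        extra = 0; } // 0xxxxxxx
    else if (b < 0xE0) { cp = b & 0x1F; extra = 1; } // 110xxxxx
    else if (b < 0xF0) { cp = b & 0x0F; extra = 2; } // 1110xxxx
    else               { cp = b & 0x07; extra = 3; } // 11110xxx
    // Each continuation byte (10xxxxxx) contributes 6 more bits.
    for (; extra > 0; extra--) {
      b = isStr ? src.charCodeAt(i++) & 0xFF : src[i++];
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < 0x10000) {
      out += String.fromCharCode(cp);
    } else {                                         // surrogate pair
      cp -= 0x10000;
      out += String.fromCharCode(0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
    }
  }
  return out;
}
```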
 

String.prototype.toUTF8() -> string

Converts this string (assumed to be native UTF16) into UTF8. The result consists of characters whose codes are all <= 0xFF. Keep in mind that the output string is a “transport format” meant for encoding purposes; it shouldn't be displayed as such!

  var utf8 = "ÀÉÔ".toUTF8();
 
  alert( utf8.toSource() ); // => "\xC3\x80\xC3\x89\xC3\x94"
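Encoding goes the other way: valid surrogate pairs are first merged into one code point, then each code point is spread over 1 to 4 bytes according to its magnitude. A minimal sketch with a hypothetical name; how lone surrogates should be handled is an assumption here (they are encoded as plain 3-byte sequences).

```javascript
// Hypothetical sketch: UTF-16 string -> UTF-8 "byte string" (codes <= 0xFF).
// Lone surrogates are encoded as 3-byte sequences here; an implementation
// might instead substitute U+FFFD.
function encodeUTF8(s) {
  var out = "", f = String.fromCharCode;
  for (var i = 0; i < s.length; i++) {
    var cp = s.charCodeAt(i);
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.length) {
      var lo = s.charCodeAt(i + 1);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {               // merge a surrogate pair
        cp = ((cp - 0xD800) << 10) + (lo - 0xDC00) + 0x10000;
        i++;
      }
    }
    if (cp < 0x80) {
      out += f(cp);                                     // 0xxxxxxx
    } else if (cp < 0x800) {
      out += f(0xC0 | (cp >> 6), 0x80 | (cp & 0x3F));   // 110xxxxx 10xxxxxx
    } else if (cp < 0x10000) {
      out += f(0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
               0x80 | (cp & 0x3F));
    } else {
      out += f(0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
               0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```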
 

Base64 Decoder/Encoder

String.fromBase64(string-or-array, ?AS_BYTES) -> string

Given a sequence of valid Base64 codes (string or array), reconstructs and returns the original (JavaScript) string. By default, the decoded bytes are treated as UTF8 units and then converted into UTF16. If the boolean flag AS_BYTES is set, the function returns the bytes as is, without UTF8-to-UTF16 conversion.

Note. - B64 codes are ASCII characters in the set A-Za-z0-9+/=.

  var s = String.fromBase64("SW5kaXNjcmlwdHM=");
 
  alert( s ); // => `Indiscripts`
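Base64 decoding accumulates 6 bits per input character and emits a byte whenever 8 bits are available. A minimal sketch of the AS_BYTES path only (the function name is hypothetical); the default path would additionally decode the resulting bytes from UTF8, as described above.

```javascript
// Hypothetical sketch of the AS_BYTES path: Base64 -> "byte string".
var B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function decodeBase64Bytes(b64) {
  var out = "", acc = 0, bits = 0;
  for (var i = 0; i < b64.length; i++) {
    var c = b64.charAt(i);
    if (c === "=") break;                 // padding: no more data bits
    var v = B64.indexOf(c);
    if (v < 0) continue;                  // ignore non-B64 chars (line breaks...)
    acc = ((acc << 6) | v) & 0xFFFF;      // only the last 16 bits are needed
    bits += 6;
    if (bits >= 8) {                      // a whole byte is available
      bits -= 8;
      out += String.fromCharCode((acc >> bits) & 0xFF);
    }
  }
  return out;
}
```

Leftover bits after the last full byte are the zero padding of the final group and are simply dropped.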
 

String.prototype.toBase64(?AS_BYTES) -> string

Converts this string into Base64 code. The result is always a string of B64 characters. By default, this string is regarded as a full UTF16 string, so it is converted into UTF8 bytes and then passed to the B64 converter. If AS_BYTES is set, the method bypasses UTF16-to-UTF8 conversion and treats each incoming character as a byte. (Thus, if the string contains units greater than 0xFF, only the 8 lowest bits are kept.)

  var b64 = "Indiscripts".toBase64();
 
  alert( b64 ); // => `SW5kaXNjcmlwdHM=`
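The encoder packs every 3 bytes into a 24-bit group and emits 4 characters of 6 bits each, padding the last group with "=" when fewer than 3 bytes remain. A minimal sketch of the AS_BYTES path only (the function name is hypothetical), keeping the 8 lowest bits of each character as stated above; the default path would first convert the string into UTF8 bytes.

```javascript
// Hypothetical sketch of the AS_BYTES path: each char counts as one byte
// (only its 8 lowest bits are kept), 3 bytes -> 4 Base64 chars.
var B64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function encodeBase64Bytes(s) {
  var out = "";
  for (var i = 0; i < s.length; i += 3) {
    var has1 = i + 1 < s.length, has2 = i + 2 < s.length;
    var n = ((s.charCodeAt(i) & 0xFF) << 16) |
            (has1 ? (s.charCodeAt(i + 1) & 0xFF) << 8 : 0) |
            (has2 ? s.charCodeAt(i + 2) & 0xFF : 0);         // 24-bit group
    out += B64_CHARS.charAt(n >> 18) +
           B64_CHARS.charAt((n >> 12) & 63) +
           (has1 ? B64_CHARS.charAt((n >> 6) & 63) : "=") +  // pad if short
           (has2 ? B64_CHARS.charAt(n & 63) : "=");
  }
  return out;
}
```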
 

ExtendScript Patches

String.prototype.indexOf(search, ?pos) -> integer

In older ExtendScript versions, str.indexOf(search) might not work when str contains U+0000 before the match and search has more than one character. This bug is solved in IdExtenso.

  alert( "\0\0ABC\0XX".indexOf("ABC") );   // =>  2 (all versions)
  alert( "\0\0ABC\0XX".indexOf("ABC",3) ); // => -1 (all versions)
 

String.prototype.lastIndexOf(search, ?pos) -> integer

In CS4, str.lastIndexOf('\0') wrongly returns the length of the string! This bug is solved in IdExtenso.

  alert( "abcd".lastIndexOf('\0') ); // => -1 (all versions)
  alert( "\0\0".lastIndexOf('\0') ); // =>  1 (all versions)
 

String.prototype.split(separator, ?limit) -> array

Although split has been fixed in later versions, the method fails in ExtendScript CS4 whenever U+0000 is involved, yielding erratic results. This bug is solved in IdExtenso.

  alert( "aei\0abc\0\0xyz\0".split('\0') );
  // => ["aei", "abc", "", "xyz", ""]
 
  alert( "aei\0abc\0\0xyz".split(/[ab\x00]+/) );
  // => ["", "ei", "c", "xyz"]
 

String.prototype.charAt(pos) -> string (char)

In JavaScript, charAt can pick a U+0000 character, e.g. "x\0y".charAt(1) returns "\0". But in ExtendScript an empty string is returned whenever charAt should yield "\0". This issue is solved in IdExtenso.

  var c = "x\0y".charAt(1);
 
  alert( c.toSource() ); // => "\0"
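One possible way to work around this, sketched as a standalone function, is to rebuild the character from its code unit. This rests on an assumption the text does not confirm: that `charCodeAt` itself reports U+0000 correctly in the affected ExtendScript versions. The name `safeCharAt` is hypothetical, and this is not necessarily how IdExtenso fixes the issue.

```javascript
// Hypothetical workaround sketch: rebuild the character from its code unit.
// Assumes charCodeAt returns 0 (not NaN) for U+0000 in the affected engine.
function safeCharAt(s, pos) {
  var n = s.charCodeAt(pos);
  // charCodeAt yields NaN out of range, matching charAt's "" in that case.
  return isNaN(n) ? "" : String.fromCharCode(n);
}
```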
 

Miscellaneous

String.random(len) -> string

This static method produces a random string of length len (default: 4) matching the pattern /[a-z][0-9a-z]*/. It is very useful for generating random IDs.

  alert( String.random() );    // =>  e.g. `i1x4`
  alert( String.random(16) );  // =>  e.g. `gj1duwcgsqk9t8fz`
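A minimal sketch of a generator for the pattern /[a-z][0-9a-z]*/ (the name `randomId` is hypothetical): the first character is drawn from the 26 letters, the remaining ones from the 36 base-36 digits via `Number.prototype.toString(36)`. `Math.random()` is not cryptographically secure, which is acceptable for IDs but not for secrets.

```javascript
// Hypothetical sketch: random ID matching /[a-z][0-9a-z]*/.
// Math.random() is not cryptographically secure; fine for element IDs.
function randomId(len) {
  if (!(len > 0)) len = 4;                             // default length
  // First char: a lowercase letter, as the pattern requires.
  var s = String.fromCharCode(0x61 + Math.floor(Math.random() * 26));
  while (s.length < len) {
    s += Math.floor(Math.random() * 36).toString(36);  // one of 0-9a-z
  }
  return s;
}
```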
 

String.levenDist(string1,string2) -> uint

Measures the difference between two strings string1 and string2 using the Levenshtein distance algorithm. The returned value is an unsigned integer.

  alert( String.levenDist("Indiscripts", "indiscripts") ); // => 1
  alert( String.levenDist("Adobe", "Acrobat") );           // => 4
  alert( String.levenDist("InDesign", "Photoshop") );      // => 8
 

Note. - A more sophisticated routine, String.levenFilter(...), is also provided: it builds a sub-array of strings from a reference array, an incoming string and a maximum Levenshtein distance. See the code for further details.
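The Levenshtein distance is classically computed by dynamic programming: cell (i, j) holds the distance between the first i characters of one string and the first j of the other, taking the cheapest of a deletion, an insertion, or a substitution (free when the characters match). This minimal sketch keeps only two rows, so memory stays linear in the length of the second string; the names are hypothetical and IdExtenso's routine may differ. `levenFilterSketch` only illustrates the idea of the note above; its signature is an assumption, not that of `String.levenFilter`.

```javascript
// Hypothetical two-row dynamic-programming sketch of Levenshtein distance.
function levenshtein(a, b) {
  var prev = [], curr, i, j;
  for (j = 0; j <= b.length; j++) prev[j] = j;   // "" -> b[0..j): j inserts
  for (i = 1; i <= a.length; i++) {
    curr = [i];                                  // a[0..i) -> "": i deletes
    for (j = 1; j <= b.length; j++) {
      var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1,            // deletion
                         curr[j - 1] + 1,        // insertion
                         prev[j - 1] + cost);    // substitution or match
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical filter: keep the reference strings within maxDist of str.
function levenFilterSketch(refs, str, maxDist) {
  var out = [];
  for (var k = 0; k < refs.length; k++) {
    if (levenshtein(refs[k], str) <= maxDist) out.push(refs[k]);
  }
  return out;
}
```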

String.prototype.charSet(?KEEP_ORDER) -> string

Returns (as a string) the set of all characters present in this string. By default, the characters are sorted in UTF16 order; if the KEEP_ORDER flag is true, they are kept in order of first occurrence. This function is useful to determine the entire character set that your text data (story, document, etc.) actually requires.

  var s = "Hello_Wonderful_World!";
 
  alert( s.charSet() );     //  =>  `!HW_deflnoru`
  alert( s.charSet(true) ); //  =>  `Helo_Wndrfu!`
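A minimal sketch as a plain function (the name `charSetOf` is hypothetical): collect each distinct character on first sight, then sort unless the order of first occurrence must be kept. `Array.prototype.sort` without a comparator orders strings by UTF-16 code units, which matches the default output shown above. Like the description, it works on code units, so a character outside the BMP contributes its two surrogates separately.

```javascript
// Hypothetical sketch; works on UTF-16 code units.
function charSetOf(s, keepOrder) {
  var seen = {}, chars = [];
  for (var i = 0; i < s.length; i++) {
    var c = s.charAt(i);
    if (!seen.hasOwnProperty(c)) {   // first occurrence only
      seen[c] = true;
      chars.push(c);
    }
  }
  if (!keepOrder) chars.sort();      // default sort: UTF-16 code unit order
  return chars.join("");
}
```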
 

String.prototype.unaccent() -> string

Removes the accents from a string. This method supports basic diacritics of the Latin, Greek, Cyrillic and Hebrew alphabets.

Note. - Ligatures like œ or ĳ ARE NOT converted into digrams. A more advanced routine might be implemented for that purpose.

  alert( "ÀçĎéĩĵĶńőŕşūŵŷż".unaccent() ); // => `AcDeijKnorsuwyz`
  alert( "ΐΫάέή".unaccent() );           // => `ιΥαεη`
  alert( "ӝӟӥӫӵӛ".unaccent() );          // => `жзиөчә`
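For comparison only, here is a different technique, not claimed to be IdExtenso's: in ES2015+ engines (ExtendScript lacks `String.prototype.normalize`), canonical decomposition (NFD) splits most accented Latin, Greek and Cyrillic letters into a base letter followed by combining marks in the range U+0300 to U+036F, which can then be stripped. Like the method above, it leaves ligatures such as œ untouched. It does not remove Hebrew points, which live outside that range, and it also turns letters such as Cyrillic й into и, which may not be desired. The name `unaccentNFD` is hypothetical.

```javascript
// Comparison sketch (ES2015+, not ExtendScript): decompose with NFD, drop
// combining diacritical marks U+0300..U+036F, then recompose with NFC.
function unaccentNFD(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036F]/g, "").normalize("NFC");
}
```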
 

String.prototype.subReplace(what, repl, where, ?OUTSIDE) -> string

Replaces what (string or RegExp) by repl (string or function) inside or outside the substrings captured by where (RegExp). This method performs replacements only in specific areas determined by a regular expression:

   — if OUTSIDE is false or missing, replacements are processed in every substring captured by where (the outside is preserved.)

   — if OUTSIDE is true, replacements are processed outside the substrings captured by where (the inside is preserved.)

The 1st and 2nd parameters are defined as in String.prototype.replace() and have the same meaning and behavior. The regular expression where only delineates the scope of replacement. It may involve multiple substrings if the /g global flag is set; otherwise it will capture at most one matching substring.

  var src = "abc<def><ghi>-mno<stu>";
 
  var what = /[aeiou]/gi;
  var where = /<[^>]+>/g;
 
  // Replace vowels with # only in `<...>` areas
  var r = src.subReplace(what, '#', where);
  alert( r ); // => `abc<d#f><gh#>-mno<st#>`
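A minimal sketch of the scoping logic as a plain function (the name `subReplaceSketch` is hypothetical; this is not IdExtenso's code): walk the matches of `where`, then apply `String.prototype.replace` either to each matched area or to each gap between areas. One assumption about the intended semantics: because the replacement runs per segment, `^`/`$` anchors in `what` and the offset passed to a replacement function refer to the segment, not to the whole string.

```javascript
// Hypothetical sketch; not IdExtenso's implementation.
function subReplaceSketch(s, what, repl, where, outside) {
  // Scan with a private global copy of `where` so lastIndex can advance.
  var re = new RegExp(where.source,
    "g" + (where.ignoreCase ? "i" : "") + (where.multiline ? "m" : ""));
  var out = "", last = 0, m;
  while ((m = re.exec(s)) !== null) {
    var gap = s.slice(last, m.index);         // text before this area
    var area = m[0];                          // text captured by `where`
    out += outside ? gap.replace(what, repl) + area
                   : gap + area.replace(what, repl);
    last = m.index + area.length;
    if (!where.global) break;                 // no /g: at most one area
    if (area === "") re.lastIndex++;          // avoid looping on empty matches
  }
  var tail = s.slice(last);
  return out + (outside ? tail.replace(what, repl) : tail);
}
```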
 

String.prototype.asPath() | ...toPath(str) | ...relativePath(str)

These three methods handle POSIX paths based on the slash separator / and the conventional shortcuts .. (double dot) and . (dot). Although they are perfectly usable in your own code, they are primarily intended as internal IdExtenso routines.
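The text does not spell out the exact semantics of these three methods, so the following is only a generic sketch of the underlying technique: resolving `.` and `..` segments with a stack, and computing a relative path by stripping the common prefix of two normalized paths. `normalizePath`, `segments` and `relativePathSketch` are hypothetical names, not IdExtenso's API.

```javascript
// Hypothetical sketch of POSIX-style path normalization.
function normalizePath(p) {
  var abs = p.charAt(0) === "/", parts = p.split("/"), stack = [];
  for (var i = 0; i < parts.length; i++) {
    var seg = parts[i];
    if (seg === "" || seg === ".") continue;          // skip empty and "."
    if (seg === "..") {
      if (stack.length && stack[stack.length - 1] !== "..") stack.pop();
      else if (!abs) stack.push("..");                // keep leading ".."
      // (for absolute paths, ".." above the root is dropped)
    } else {
      stack.push(seg);
    }
  }
  return ((abs ? "/" : "") + stack.join("/")) || ".";
}

// Non-empty segments of a normalized path.
function segments(p) {
  var parts = normalizePath(p).split("/"), out = [];
  for (var i = 0; i < parts.length; i++) {
    if (parts[i] !== "") out.push(parts[i]);
  }
  return out;
}

// Relative path from directory `fromDir` to `to` (both assumed absolute).
function relativePathSketch(fromDir, to) {
  var a = segments(fromDir), b = segments(to), i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++; // common prefix
  var rel = [];
  for (var k = i; k < a.length; k++) rel.push("..");         // climb out
  return rel.concat(b.slice(i)).join("/") || ".";             // then descend
}
```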


• IdExtenso: github.com/indiscripts/IdExtenso
Implementation of the String extensions
Sample scripts (for newbies)