Let's start with a riddle. Say we want to replace the first letter of every word with a Z. Super easy!

alert( "Hello World".replace(/\b\w/g, "Z") );
 
// -> Zello Zorld
 

The global regex /\b\w/g does the job because it only matches word characters (\w) that immediately follow a “word boundary” assertion (\b).

Furthermore, we can use the placeholder $& to refer to the current match in the replace string. For example,

alert( "Hello World".replace(/\b\w/g, "$&_") );
 
// -> H_ello W_orld
 

Note. — If you are not familiar with the special replacement patterns of String.prototype.replace(), take a look at MDN's documentation.

As you can see, the H has been changed into H_ and the W has been changed into W_. But suppose we want a dot instead. Easy-peasy! Change the replacement pattern into "$&." and you're done.
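For reference, here is a minimal check of that change as a standard JavaScript engine (a browser or Node.js, not ExtendScript) handles it. The variable name `dotted` and the use of `console.log` instead of `alert` are choices made for this sketch only.

```javascript
// Standard JavaScript: "$&" inserts the matched text, then a literal dot follows.
var dotted = "Hello World".replace(/\b\w/g, "$&.");

console.log(dotted); // H.ello W.orld
```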

This definitely works in pure JavaScript. However, here is how ExtendScript reacts:

/* in ExtendScript */
 
alert( "Hello World".replace(/\b\w/g, "$&.") );
 
// -> H.e.l.l.o. W.o.r.l.d.
 

Is this a bug or a feature? That's for you to say! The crucial question is: why? Why does the pattern "$&_" work as expected, while "$&." ends up matching every single letter?


The answer is quite unsettling.

By definition, the replace() method returns “a new string with some or all matches of a pattern replaced by a replacement. The pattern can be a string or a RegExp, and the replacement can be a string or a function to be called for each match. (…) The original string is left unchanged.”

So far so good. But what is the actual state of the processed string during replacements? In principle, it shouldn't change. Maybe the input string, "Hello World", is put in a buffer, then a pointer advances through it whenever a new match is found, and the output string is updated accordingly, and so on. No matter how this is done, the input and the output streams remain strictly separate: a replacement already written to the output can never influence a later match.
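The model just described can be sketched in standard JavaScript. This is only an illustration of the principle, not how any engine actually implements replace(). The helper name `replaceWithSeparateBuffers` is hypothetical, and for brevity it ignores any flags other than the global one.

```javascript
// Hypothetical helper: a global replace where the input is only read
// and the output is built in a separate string.
function replaceWithSeparateBuffers(input, pattern, replacer) {
    var re = new RegExp(pattern.source, "g"); // private global copy of the pattern
    var output = "";
    var cursor = 0; // position already consumed in the untouched input
    var m;

    while ((m = re.exec(input)) !== null) {
        // Copy the unmatched text, then append the replacement.
        output += input.slice(cursor, m.index) + replacer(m[0], m.index, input);
        cursor = m.index + m[0].length;
        if (m[0].length === 0) re.lastIndex++; // do not loop forever on empty matches
    }
    return output + input.slice(cursor);
}

var separate = replaceWithSeparateBuffers("Hello World", /\b\w/, function (match) {
    return match + ".";
});

console.log(separate); // H.ello W.orld
```

Because every match is searched in `input`, which never changes, the dots added to `output` cannot create new word boundaries.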

This is not how ExtendScript works. Everything happens as if the input and the output stream were the same entity. In other words, whenever a replacement is done, the string under consideration is updated. It now contains the result of all replacements already processed.

For example, at the initial step the internal buffer is changed from "Hello World" into "H.ello World" (due to the very first replacement). Then the internal pointer moves to the e and looks for the pattern /\b\w/. In the original string, this e wouldn't match because it is not positioned at a word boundary (its context is "Hello"). But in the current state of the buffer, this e is preceded by a dot (its context is "H.ello"). Thus it now satisfies the word boundary condition! So it is changed into "e.", and so on.
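To make this interpretation concrete, the sketch below simulates a "single live buffer" in standard JavaScript. It is an assumption-based model of the behavior described here, not ExtendScript's actual implementation. The helper name `replaceOnLiveBuffer` is hypothetical.

```javascript
// Hypothetical helper: simulates a replace where input and output share
// one buffer, so each replacement is visible to the next match attempt.
function replaceOnLiveBuffer(input, pattern, replacer) {
    var re = new RegExp(pattern.source, "g");
    var buffer = input; // serves as both input and output
    var m;

    while ((m = re.exec(buffer)) !== null) {
        var replacement = replacer(m[0], m.index, buffer);
        buffer = buffer.slice(0, m.index) + replacement +
                 buffer.slice(m.index + m[0].length);
        // Resume right after the inserted text, inside the updated buffer.
        re.lastIndex = m.index + replacement.length;
        if (m[0].length === 0 && replacement.length === 0) re.lastIndex++;
    }
    return buffer;
}

var withDot = replaceOnLiveBuffer("Hello World", /\b\w/, function (match) {
    return match + ".";
});
var withUnderscore = replaceOnLiveBuffer("Hello World", /\b\w/, function (match) {
    return match + "_";
});

console.log(withDot);        // H.e.l.l.o. W.o.r.l.d.
console.log(withUnderscore); // H_ello W_orld
```

Under this model the dot version cascades through every letter while the underscore version does not, which is consistent with the two results reported above.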

Note. — The dot (".") is not a word character, so it creates a word boundary before the letter that follows it. The underscore ("_") does not, because it belongs to the \w class itself.
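The difference between the two characters can be checked directly in any standard JavaScript engine. The variable names below are chosen for this sketch only.

```javascript
// "_" is a word character, "." is not.
var underscoreIsWord = /\w/.test("_"); // true
var dotIsWord = /\w/.test(".");        // false

// So a letter following "_" is not at a word boundary, while a letter
// following "." is.
var afterUnderscore = "H_ello".match(/\b\w/g); // ["H"]
var afterDot = "H.ello".match(/\b\w/g);        // ["H", "e"]

console.log(underscoreIsWord, dotIsWord);
console.log(afterUnderscore.join(","), "|", afterDot.join(","));
```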


We can probe our interpretation using a replacement function rather than a string. Basically, the function intercepts each match and returns the desired replacement string. Below is the functional equivalent of our previous example:

/* in ExtendScript */
 
var r = "Hello World".replace(/\b\w/g, function(match)
{
    return match + '.';
});
 
alert( r ); // H.e.l.l.o. W.o.r.l.d.
 

Same result. But using a function gives us additional parameters. In the absence of capturing parentheses, the 2nd argument passed to that function is “the offset of the matched substring within the whole string being examined”, and the 3rd argument is “the whole string being examined” itself.

So we can explore more deeply the replacement steps:

/* in ExtendScript */
 
var r = "Hello World".replace(/\b\w/g, function(match,ofs,str)
{
    alert( [match,ofs,str].join(' - ') );
    return match + '.';
});
 
/* Prompted messages: */
H - 0 - Hello World
e - 2 - H.ello World
l - 4 - H.e.llo World
l - 6 - H.e.l.lo World
o - 8 - H.e.l.l.o World
W - 11 - H.e.l.l.o. World
o - 13 - H.e.l.l.o. W.orld
r - 15 - H.e.l.l.o. W.o.rld
l - 17 - H.e.l.l.o. W.o.r.ld
d - 19 - H.e.l.l.o. W.o.r.l.d
 

The above test clearly shows how the string “being examined” evolves during the process. In pure JavaScript, the 3rd argument (str) wouldn't change and the messages would be

/* Prompted messages in pure JS: */
H - 0 - Hello World
W - 6 - Hello World
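The pure JavaScript case is easy to reproduce in a browser or Node.js. In this sketch the messages are collected in an array (`messages`, a name introduced here) instead of being shown with alert, so the whole sequence can be inspected at once.

```javascript
// Standard JavaScript: the third argument is always the original string.
var messages = [];

var standardResult = "Hello World".replace(/\b\w/g, function (match, ofs, str) {
    messages.push([match, ofs, str].join(" - "));
    return match + ".";
});

console.log(messages.join("\n"));
// H - 0 - Hello World
// W - 6 - Hello World

console.log(standardResult); // H.ello W.orld
```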
 

We can even go deeper and reveal at each step the RegExp.leftContext and RegExp.rightContext properties. This test is visually the most convincing:

var r = "Hello World".replace(/\b\w/g, function(match)
{
    alert( [RegExp.leftContext,RegExp.rightContext].join(' - ') );
    return match + '.';
});
 
/* Prompted messages (in ExtendScript) */
 - ello World
H. - llo World
H.e. - lo World
H.e.l. - o World
H.e.l.l. -  World
H.e.l.l.o.  - orld
H.e.l.l.o. W. - rld
H.e.l.l.o. W.o. - ld
H.e.l.l.o. W.o.r. - d
H.e.l.l.o. W.o.r.l. - 
 

The facts we have highlighted above are undocumented, and probably unknown to most ExtendScript developers. The side effects of this particular mechanism are generally imperceptible, unless your RegExp relies on \b assertions.

On the other hand, it can be very useful to have live access to the output string, while it is being generated, from within the replacement function. Pure JavaScript doesn't provide such access, so this is a circumstance where the “bug” might become a “feature.”