1/ General Points
    Learning IndexMatic³
    IndexMatic³ vs. IndexMatic²
    Upgrade Discount Price
    Creative Suite Backwards Compatibility
    Using a Credit Card thru PayPal

2/ Scope & Search Options
    Autoextracting a Vocabulary
    Matches and Page Rank
    “Whole Word” and Regular Expressions
    Conditional Text
    Inner Spaces and Whole Words
    Automatic Mode vs. Query modes

3/ Basic Queries
    Regex Syntax: the / Prefix
    Uppercase Letters and Diacritics
    Term Rewriting / Subtopics / Cross-References
    Dealing with Plural Forms
    Queries and White Spaces
    Website Names, URLs
    Key Length / Grouping Alternatives
    Meaning of the "\w" Metacharacter
    Extracting XML Data
    Special Space Characters
    Generic Apostrophe
    Consolidating Terms into a Single Reference

4/ Proper Names
    Proper Names vs. Acronyms
    Reformatting a Name List
    Redundant Matches

5/ Advanced Queries
    Punctuation
    Stats on Letters
    Using the "$" Symbol
    Emojis
    Arabic Diacritics

6/ Output
    XML Output
    Bypass Sorting
    Multiple indexes
    Reporting not found entries

7/ Known Issues and Troubleshooting
    Preserving Text Formatting
    “Indirect” Character Styles
    Placed PDF
    “Black Mirror”
    Book Chapters are not Remembered!
    InDesign not responding


1/ General Points

Learning IndexMatic³

• IndexMatic³ sounds easy to play with, but not so easy to master. Is there a good step-by-step guide to its advanced features?

Your central, official learning resource is the user's guide (PDF, 110 pages), available in both English and French. This documentation is designed to take you from the basics (getting started, purpose, UI) to the next level, including technical information on managing “scoped” documents, using the query interpreter, mastering regular expressions, shaping the output index to your needs, and so on.

Download “IndexMatic³ Manual” (PDF, 110 p.)

Becoming an expert does not take much time, but it does take concentration. Read the manual calmly and gradually, without being distracted by notifications from your smartphone or social networks! The concepts will fall into place little by little, and your tests and queries will become more fruitful, more sensible, and more efficient. You will then carry out tasks that you did not even imagine possible when you started.

→ Our YouTube channel also offers more targeted tutorials for French-speaking audiences.

More concrete examples and case studies will be addressed in this FAQ!

IndexMatic³ Manual (PDF)

IndexMatic³ vs. IndexMatic²

• I am an experienced IndexMatic² fan. I have been using it for years to offer my clients (mainly publishers) an additional tailor-made indexing service. Can I reuse in iX³ the wordlists and queries I prepared in iX²? If so, why should I upgrade to the new version?

  1. Everything you did with IndexMatic² can be done with IndexMatic³. The fundamental query syntax remains; it has only been extended with new schemes like “Generic Hyphen”, “Directives”, “Topic Formatters”, “Stop Word Flag”, etc. So, yes, iX² queries and query lists should still work fine: just make sure they do not contain special codes that are now parsed as syntactic operators. For an overview, see Topic Structure Cheatsheet, Metacharacters, and Query List: Directives.

  2. Indiscripts' policy has never been to push its customers into unnecessary or captive spending ;-) Our products are subscription-free and our licenses are perpetual (at least as long as InDesign can run the underlying script). So, if you love IndexMatic² and feel comfortable with it, and if it still meets all your indexing needs, there is no reason to change (and we will keep it in our catalogue, although without new updates). The reasons why we redesigned IndexMatic were summarized in the 2023 announcement. To put it in three words: speed, efficiency, ease of use.

Migration from IndexMatic² to IndexMatic³ (chart)

Upgrade Discount Price

• I've been using IndexMatic² Pro for years. Can I get the new version for less than the full price since I previously purchased an older version?

When upgrading to IndexMatic³ (Expert), IndexMatic² Pro users get a special discount that reduces the standard license price by €50, so they only pay the difference! This offer applies to single licenses as well as multi-seat licenses. The procedure is detailed on the IndexMatic Upgrade Page.

Basically, you need to find your original iX² “Download Link” in order to deduce your discount code. If you have lost the URL, simply get in touch with us at support[at]indiscripts{dot}com and provide enough details (invoice number, company name, email address…) for us to retrieve the original order.

IndexMatic Upgrade Page

Creative Suite Backwards Compatibility

• I refuse to pay a subscription to Adobe and I still work — fully satisfied! — with InDesign CS6. Does IndexMatic³ support this environment? (Any plan to make it available in Affinity Publisher too?)

1. IndexMatic³ has been rewritten from scratch to take advantage of features introduced in the latest versions of InDesign CC (i.e., versions 9.x to 18.x), but we made it our business to maintain backward compatibility down to version 6.x. That is, the program still runs in InDesign CS6, CS5, and CS4, on both macOS and Windows. This is a key advantage over competing solutions and we dearly wish to preserve it!

Note. - Of course, because some features (hidden blocks, endnotes, etc.) are simply missing in older InDesign versions, IndexMatic³ won't include them when running in such environments.

2. Although extensibility solutions were being explored at Serif Labs in 2023, it is difficult to predict how quickly they might mature. As of this writing, there is no solid prospect of porting IndexMatic³ to Affinity Publisher.

IndexMatic³ System Requirements

Using a Credit Card thru PayPal

• I attempted to make a payment by using my pro credit card (without logging into a PayPal account) but the system rejected the information and returned “the credit card cannot be used for this payment”.

PayPal sometimes rejects a credit card, or does not even offer this payment option. Highly frustrating! And we do not know the full criteria behind these rejections.

The country you are purchasing from sometimes seems to be the culprit, which means the problem may also appear when you use a VPN. According to other sources, similar issues may arise if your credit card or email address is linked to an active PayPal account that you are not logged into. PayPal also limits credit card use by non-members for security reasons.

Whatever the reason, creating a PayPal account and adding your credit card to it as a funding source usually solves the problem. But we fully understand that this approach is unacceptable if you simply do not want a PayPal account!

Note. - We are actively seeking alternative payment solutions (provided they offer the same level of security as the PayPal interface and do not penalize the customer).

Anyway, if any difficulty arises during the transaction, do not hesitate to contact us directly at legal[at]indiscripts{dot}com to set up a direct payment option by bank transfer.

IndexMatic³ Order Form


2/ Scope & Search Options

Autoextracting a Vocabulary

• I need to index an entire book but I have not prepared any wordlist. Is there a way to automatically extract the most relevant vocabulary entries and then manually refine the set before running IndexMatic³?

IndexMatic³ offers an “Automatic Mode” which can be used to detect and grab the most representative words from your document(s). First, open your book in InDesign and launch the script. Select the book chapters in the Target Documents list. In the Finder panel, set the Page Rank to 3 or 4. Also, select the appropriate language in the Stop Words list. Ctrl-Click the Hits button to open the Hits and Stats dialog, and set all display fields to [None] except the “Entry Term” field. Click OK. This generates a wordlist that you can easily grab, clean up, and use as a starting point in Query List mode.

Customizing Hits and Stats ; Grab Hits Data

Matches and Page Rank

• Given a regular expression like /Steve|Bill/ which produces two terms—"Steve" and "Bill"—does the Page Rank apply on each term, or considering the total number of matches per page?

The Page Rank operates on each separate term that the query implicitly produces. Suppose that "Steve" occurs three times on page 10 and two times on page 11, while "Bill" occurs two times on page 10 and three times on page 11:

    On Page 10: … Steve … Bill … Steve … Bill … Steve.
    On Page 11: … Bill … Steve … Bill … Steve … Bill.

Then, the query:

    /Steve|Bill/3

sets the Page Rank to 3 and leads to the following output:

    Bill: 11
    Steve: 10

By contrast, if the query provides an explicit term:

    /Steve|Bill/3 => My Friends

then IndexMatic³ counts every match of either alternative toward the "My Friends" Page Rank. So the above query leads to:

    My Friends: 10-11

(Since the total match count is 5 on each page, the Page Rank is obviously satisfied.)
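To make the counting rule concrete, here is a minimal TypeScript sketch that models Page Rank as a minimum number of matches per page, which is how the example above behaves. The `pageRank` function, its optional `term` parameter and the page texts are hypothetical illustrations, not IndexMatic³ internals.

```typescript
// Hypothetical model: a page is kept for a topic when that topic
// collects at least `rank` matches on the page.
type Pages = Map<number, string>;

function pageRank(pages: Pages, pattern: RegExp, rank: number, term?: string): Map<string, number[]> {
  const result = new Map<string, number[]>();
  for (const [page, text] of pages) {
    // Count matches per topic on this page.
    const counts = new Map<string, number>();
    const re = new RegExp(pattern.source, "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      // Without an explicit term, each matched string is its own topic.
      const topic = term ?? m[0];
      counts.set(topic, (counts.get(topic) ?? 0) + 1);
    }
    for (const [topic, n] of counts) {
      if (n < rank) continue; // Page Rank not reached on this page
      const list = result.get(topic) ?? [];
      list.push(page);
      result.set(topic, list);
    }
  }
  return result;
}

const pages: Pages = new Map([
  [10, "Steve ... Bill ... Steve ... Bill ... Steve."],
  [11, "Bill ... Steve ... Bill ... Steve ... Bill."],
]);

console.log(pageRank(pages, /Steve|Bill/, 3));               // Steve -> [10], Bill -> [11]
console.log(pageRank(pages, /Steve|Bill/, 3, "My Friends")); // My Friends -> [10, 11]
```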

Page Rank

“Whole Word” and Regular Expressions

• Our document relies on a special markup syntax using the following format:
"...\index{the word to be indexed}..."
As we want to extract every marked word, we tried the query:
/\\index\{([^}]+)\}/ => $1, but this does not work.

Always consider the context in which the matches appear. Your target format, \index{the word to be indexed}, seems to be embedded directly in the text, without any separating space, so you probably have to disable the Whole Word option (which rejects a match immediately preceded or followed by a letter of the current Alphabet). Try one of the following:

(a) Uncheck the “Whole Word” checkbox in Finder > Options (so that the flag is globally disabled).

    OR

(b) Append the local flag W at the end of your pattern:
    /\\index\{([^}]+)\}/W => $1
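The effect of Whole Word on this pattern can be reproduced in plain TypeScript. This sketch assumes, following the description given in Inner Spaces and Whole Words, that Whole Word rejects a match whose neighbouring characters are letters; `\p{L}` stands in for the current Alphabet, and `extractIndexTerms` and the sample text are hypothetical.

```typescript
// Letters stand in for IndexMatic's "current Alphabet" (an assumption).
const LETTER = /\p{L}/u;

function extractIndexTerms(text: string, wholeWord: boolean): string[] {
  const re = /\\index\{([^}]+)\}/g;
  const terms: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const before = text.charAt(m.index - 1); // "" at the start of the text
    const after = text.charAt(m.index + m[0].length);
    // Whole Word: the match must not touch a letter on either side.
    if (wholeWord && (LETTER.test(before) || LETTER.test(after))) continue;
    terms.push(m[1]); // $1 = content of the braces
  }
  return terms;
}

const sample = "Mount Etna\\index{Etna}is active; see also Vesuvius \\index{Vesuvius}.";
console.log(extractIndexTerms(sample, true));  // [ 'Vesuvius' ]
console.log(extractIndexTerms(sample, false)); // [ 'Etna', 'Vesuvius' ]
```

The glued marker after "Etna" is only found once the whole-word check is switched off, which is what the local W flag does for this single query.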

Whole Word

Conditional Text

• Our annual report is written in several languages that we manage in a single InDesign document through conditional texts (rather than linguistic layers). Can IndexMatic³ target separately each language (i.e., each condition)?

Regarding conditional text, IndexMatic³ only considers the content whose conditions are visible at the moment you launch the program. So, just show the condition that corresponds to a specific language, and you get the related index.

Off-Page Content and Ignored Elements

Inner Spaces and Whole Words

• I want to find matches that contain white spaces, like "sold out" or "back pay". Do I need to uncheck the “Whole Word” option?

No. The purpose of the Whole Word option is to ensure that a match is not preceded or followed by a character that belongs to the current Alphabet. This does not preclude the presence of spaces or non-alphabetic characters within the search key.

For example, with the default settings (Whole Word active) the query back pay finds any occurrence of "back pay", but ignores a string such as: "rollback pay".

It is generally useful to disable Whole Word when a partial token can be found in several expressions that unambiguously refer to the same topic, for example: modern/W => modernity

Assuming that the substring "modern" only occurs in words related to modernity, the key:

    modern/W

is much more efficient than a complete pattern like:

    /modern(i(sm|ty|ze))?/
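Here is a rough TypeScript approximation of both points, assuming Whole Word means "no letter immediately before or after the match", using `\p{L}` for the Alphabet, and assuming case-insensitive matching. It also shows one practical difference between the two keys: once Whole Word applies, the full pattern misses a form such as "modernization", which modern/W still catches. The `findAll` helper and the sample sentences are hypothetical.

```typescript
const LETTER = /\p{L}/u;

// Returns matched strings, optionally filtered by a Whole Word check.
function findAll(text: string, pattern: RegExp, wholeWord: boolean): string[] {
  const re = new RegExp(pattern.source, pattern.flags.replace("g", "") + "g");
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const before = text.charAt(m.index - 1);
    const after = text.charAt(m.index + m[0].length);
    if (wholeWord && (LETTER.test(before) || LETTER.test(after))) continue;
    found.push(m[0]);
  }
  return found;
}

// Inner spaces are fine: only the outer neighbours are checked.
console.log(findAll("No back pay, no rollback pay.", /back pay/, true)); // [ 'back pay' ]

const essay = "Modernism, modernity and modernization.";
// Partial token, Whole Word off (like modern/W):
console.log(findAll(essay, /modern/i, false).length); // 3
// Full pattern, Whole Word on: "modern" inside "modernization" is rejected.
console.log(findAll(essay, /modern(i(sm|ty|ze))?/i, true)); // [ 'Modernism', 'modernity' ]
```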

Whole Word

Automatic Mode vs. Query modes

• I was trying to generate an index based on a character style and I found out that IndexMatic³ only considers single words, but not compound expressions like “The Vice President” even if the complete expression (including the spaces) has an appropriate character style applied. Do I have a possibility to grab complete runs of character-styled characters?

You probably used the Automatic Mode, which is not suited to your goal. In Automatic Mode, IndexMatic³ only grabs words, in the sense of “letter sequences in the current alphabet”. To get extended results, select the Single Query (or Query List) mode and send your own command(s). Here are some common queries for filtering content by character style:

To globally capture every character-style run, use:
    /.+/

To capture words-and/or-space strings only, use:
    /[\w ]+/

To capture words only, use:
    /\w+/
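The difference between the three queries can be checked with plain JavaScript regexes applied to isolated styled runs. This sketch treats each run as a separate string, which is an assumption about how the Style filter hands text to the query; note also that JavaScript's `\w` is ASCII-only, unlike IndexMatic's alphabet-aware `\w`. The sample runs are hypothetical.

```typescript
// Character-style runs, as if isolated by the Style filter (assumption).
const run = "The Vice President";
const run2 = "Vice-President (acting)";

console.log(run.match(/.+/g));      // [ 'The Vice President' ]
console.log(run.match(/[\w ]+/g));  // [ 'The Vice President' ]
console.log(run.match(/\w+/g));     // [ 'The', 'Vice', 'President' ]

// Punctuation shows where the three queries diverge.
console.log(run2.match(/.+/g));     // [ 'Vice-President (acting)' ]
console.log(run2.match(/[\w ]+/g)); // [ 'Vice', 'President ', 'acting' ]
console.log(run2.match(/\w+/g));    // [ 'Vice', 'President', 'acting' ]
```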

How to Create an Index from a Style


3/ Basic Queries

Regex Syntax: the / Prefix

• I'm trying to find a regex expression: everything inside curly brackets. My query is \{.+?\} and the script finds nothing. Where is my mistake?

Do not forget the / character at the beginning of a regex query (IndexMatic³ supports both literal and regular expressions, and the leading slash is what marks a regex).

Hence your complete query is: /\{.+?\}

Note. - You can optionally append a final slash: /\{.+?\}/ is equivalent.

Introduction to Regular Expressions

Uppercase Letters and Diacritics

• I use the query /[A-Z]\w+/I to retrieve words that start with an uppercase letter. Unfortunately the class [A-Z] does not match uppercase letters with diacritics such as À or É. How to fix this?

Use \m instead of [A-Z]. The class [A-Z] only sees ASCII uppercase letters, while the metacharacter \m matches any uppercase letter of the current alphabet. (Similarly, \l matches any lowercase letter of the current alphabet.)
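The same limitation exists in plain JavaScript, which makes it easy to observe. IndexMatic's `\m` and `\l` are not JavaScript syntax, so this sketch uses the Unicode property classes `\p{Lu}` and `\p{L}` (with the `u` flag) as rough stand-ins; they are not what IndexMatic³ runs internally, and the sample text is hypothetical.

```typescript
const text = "\u00C9cole Zola \u00C0 Paris"; // "École Zola À Paris"

// ASCII class: É and À are invisible to [A-Z].
console.log(text.match(/[A-Z]\w+/g));      // [ 'Zola', 'Paris' ]
// Any uppercase letter followed by letters (stand-in for \m).
console.log(text.match(/\p{Lu}\p{L}+/gu)); // [ 'École', 'Zola', 'Paris' ]
```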

Metacharacters ; GREP-like Operators

Term Rewriting / Subtopics / Cross-References

• I would like to index words that do not actually appear on the pages (e.g., indexing "France" on pages that only talk about "Paris"). How can I do that?

Take advantage of the rewriting operator (=>). Every occurrence of "Paris" in the document can be rendered as "France" in the index, using the following query:

   Paris => France

Of course, you can generate both topics, "Paris" and "France", with these two queries:

   Paris
   Paris => France

The first line above indexes "Paris" as itself, while the second line creates alongside the topic "France" (from the same matches).

Another approach is to output "Paris" as a subtopic of "France":

   Paris => France > $0

The above query is more relevant in that it lets you manage other France-related subtopics, for example:

   Bordeaux => France > $0

Both queries can then be written compactly as:

   /Paris|Bordeaux/ => France > $0

And the resulting index looks like:

   France
      Bordeaux: page numbers
      Paris: page numbers

In addition, if you want to get "Paris" as a first-level heading in your index, you can easily redirect the reader to the topic "France" by adding the following cross-reference (note the double slash at the beginning of the query):

   // Paris => France

Finally, our complete query list could be:

   // Bordeaux => France
   // Paris => France
   /Paris|Bordeaux/ => France > $0
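For readers who want to see how a KEY => TERM rule with `$0` expands into a topic tree, here is a minimal TypeScript model. The `expand` and `buildIndex` functions, the splitting on " > " and the page texts are hypothetical; they mimic the France/Paris/Bordeaux output shown earlier rather than IndexMatic's actual code, and page ranges and sorting are left out.

```typescript
type Index = Map<string, Map<string, number[]>>;

// Replace $0, $1, ... in a term template with the corresponding match groups.
function expand(template: string, m: RegExpExecArray): string {
  return template.replace(/\$(\d)/g, (_, d: string) => m[Number(d)] ?? "");
}

// Hypothetical model of "KEY => Topic > Subtopic": every match adds its page.
function buildIndex(pages: Map<number, string>, key: RegExp, template: string): Index {
  const index: Index = new Map();
  for (const [page, text] of pages) {
    const re = new RegExp(key.source, key.flags.replace("g", "") + "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      const [topic, sub = ""] = expand(template, m).split(" > ");
      const subs = index.get(topic) ?? new Map<string, number[]>();
      const list = subs.get(sub) ?? [];
      if (!list.includes(page)) list.push(page);
      subs.set(sub, list);
      index.set(topic, subs);
    }
  }
  return index;
}

const pages = new Map([
  [12, "From Paris to Bordeaux."],
  [40, "Back in Paris."],
]);
const index = buildIndex(pages, /Paris|Bordeaux/, "France > $0");
console.log(index.get("France")); // Paris -> [12, 40], Bordeaux -> [12]
```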

Cross-References

• In a topic like "FRANCE" I would like to have a subentry, e.g. "Paris", which has no page number and only refers the reader to an external topic named "PARIS". How can I do that?

The syntax for cross-references allows you to declare a “See” reference from any existing or non-existing topic or subtopic. Simply format the “fake term” as a formal subtopic:

    // FRANCE > Paris => PARIS

Now consider a query list having the following lines:

    ...
    FRANCE
    PARIS
    // FRANCE > Paris => PARIS
    ...

The final index will look like:

    …
    FRANCE: page numbers
        Paris: See PARIS
    …
    PARIS: page numbers
    …

Cross-References ; “See” and “See also” Markers

Dealing with Plural Forms

• Given a list of words in singular form, is it possible to capture both the singular and the plural forms?

1. IndexMatic³ cannot infer the plural form of a word on its own, so you need to specify the corresponding alternatives in the queries.

To index singular and plural forms (or other variants) under the same heading, the easiest way is to turn the original words into regular expressions. Here is the most common example, with an optional letter ‘s’ at the end of the word:

    /cats?/ => cat

which can also be expressed as:

    /(cat)s?/ => $1

So, when several items follow the same plural rule, you can easily factor the keys as follows:

    /(cat|dog|snake)s?/ => $1

Of course, you have to deal with a number of special cases:

    /stor(?:y|ies)/ => story
    /wom[ae]n/ => woman
    /person|people/ => person
    etc.

Note. - In Query List mode, activate the Magic Regex Term button to avoid rewriting the term of simple regexes: the query /cats?/ alone will then automatically produce the expected term (“cat”), as well as /wom[ae]n/.

2. But there is even more! You can transform a list of simple words, assumed to be singular, so that it systematically generates the corresponding query patterns. Let's start from the list:

    cat
    dog
    snake
    rabbit
    horse
    . . .

We will assume that all the listed words have a simple plural of the form <singular>+s. Then we can use the ~format directive to get the job done by simply prepending one instruction to the query list:

    //~format :: /^0s?/ => ^0
    cat
    dog
    snake
    rabbit
    horse
    . . .

This structure will detect both singular and plural occurrences, while assigning them a single, consistent index entry (that is, the singular form).
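The expansion performed by this format header can be pictured with a short TypeScript sketch. It assumes that `^0` stands for the whole list line, as the example suggests; `indexTerms` and the sample sentence are hypothetical, and IndexMatic's Whole Word option is approximated with `\b`.

```typescript
// Escape regex metacharacters so each listed word is matched literally.
const escapeRegex = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical expansion of "//~format :: /^0s?/ => ^0" (^0 = the whole line).
function indexTerms(text: string, words: string[]): string[] {
  const terms: string[] = [];
  for (const word of words) {
    // /<word>s?/, with \b approximating Whole Word and "i" for case-insensitivity.
    const key = new RegExp(`\\b${escapeRegex(word)}s?\\b`, "i");
    if (key.test(text)) terms.push(word); // => ^0: the singular form is the term
  }
  return terms;
}

const words = ["cat", "dog", "snake", "rabbit", "horse"];
console.log(indexTerms("Two cats, one dog and several snakes.", words)); // [ 'cat', 'dog', 'snake' ]
```

Irregular plurals (story/stories, woman/women) are not covered by this rule, which is why the special cases above still need their own queries.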

format directive

Queries and White Spaces

• We want to capture the string " USD" (with a space before), but the query " USD" seems to be interpreted as "USD" without space. Why?

When a key is based on a simple token (with no starting slash), spaces at the beginning of the string are automatically ignored. Similarly, trailing spaces are ignored if the key has no ending slash. Study the following queries:

    sample
        sample/w
      sample    => Words > $0

In each case the considered token is just "sample" (without space).

To make IndexMatic³ take leading or trailing spaces into account, simply enclose the key between slash characters:

    / USD/

Note that the expression is then parsed as a regular expression, but this has no side effect unless the key contains pattern-specific operators (regex metacharacters such as . or ?).

Generic Space

Website Names, URLs

• Is it possible to index every website mentioned in my document and to render the entries in the form "name [URL], page numbers"?

This primarily depends on how these elements are treated in the document.

If a specific character style is applied to website names and/or their URLs, there is no difficulty in retrieving these items using the Style filter and a generic query like: /.+/

If the data are not styled, you have to provide your own wordlist to grab website names (the script cannot recognize the name of a website in the text a priori).

If you only have to grab the URLs, send a query like:

    /(http:\/\/|www\.)\S+/I

or a more sophisticated one. This approach can be used as a preparatory step to identify websites. You can then get the results from the Hits report and build an improved query list that considers both website names and URLs.
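Before building on this query, it can be worth testing it outside InDesign. The TypeScript sketch below (sample text hypothetical) runs the pattern as written, then a variant with `https?` that also catches secure URLs. Note that `\S+` keeps any trailing punctuation, such as a comma or a final period, which you may want to trim from the Hits report.

```typescript
const text = "See http://example.com, www.example.org and https://example.net.";

// Pattern from the FAQ (JavaScript regexes are case-sensitive by default).
const original = /(http:\/\/|www\.)\S+/g;
// Variant that also accepts "https://" (an addition, not part of the FAQ query).
const withHttps = /(https?:\/\/|www\.)\S+/g;

console.log(text.match(original));
// [ 'http://example.com,', 'www.example.org' ]
console.log(text.match(withHttps));
// [ 'http://example.com,', 'www.example.org', 'https://example.net.' ]
```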

Introduction to Regular Expressions

Key Length / Grouping Alternatives

• I found it very useful to target a list of subtopics with queries like:
/John|Bill|Kate/=>Authors>$0. But, how many terms can be in such a query? I have several hundreds of such terms. Can this number be handled by the query?

The maximum length of an IndexMatic³ regex is no longer limited, so it actually depends on what the underlying subsystem (that is, ExtendScript) can digest. We have not tested all environments, but the interpreter is likely to struggle beyond about 1,000 characters. This still allows you to write very complex regular expressions. Should you reach the limit, one solution is to use a regular Query List:

    John => Authors > $0
    Bill => Authors > $0
    Kate => Authors > $0
    ...

Note. - And an even better solution is to simply use the ~format directive at the top of the “John¶Bill¶Kate...” list:
    //~format :: ^0 => Authors > $0

You can still optimize the list using grouped alternatives. For example:

    // A...
    /Aaron|Adelle|Alban|Andre|Arnold|Ava/ => Authors > $0
    // B...
    /Barton|Beatrice|Benjamin|Brandon|Breanna/ => Authors > $0
    // C...
    /Celeste|Charmaine|Chuck|Constance|Curt/ => Authors > $0
    ...

According to our tests, the regex engine gradually loses speed when the pattern involves a huge number of alternatives, to the point that it becomes more efficient to run separate queries performing exactly the same function!
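If your names come as a plain list, the grouped form shown above can be generated rather than typed. This TypeScript sketch is a hypothetical helper (`groupByInitial` is not part of IndexMatic³, and "Authors" is taken from the example): it escapes each name, groups names by first letter, and emits one commented query per letter.

```typescript
// Escapes regex metacharacters (and "/") so names are matched literally.
const escapeRegex = (s: string): string => s.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");

// Hypothetical generator: one "/A1|A2|.../ => Topic > $0" line per initial.
function groupByInitial(names: string[], topic: string): string[] {
  const groups = new Map<string, string[]>();
  for (const name of names) {
    const initial = name.charAt(0).toUpperCase();
    groups.set(initial, [...(groups.get(initial) ?? []), name]);
  }
  const lines: string[] = [];
  for (const initial of Array.from(groups.keys()).sort()) {
    // Longer names first, so that "Ava" cannot shadow a longer "Avalon".
    const alternatives = (groups.get(initial) ?? [])
      .slice()
      .sort((a, b) => b.length - a.length || a.localeCompare(b))
      .map(escapeRegex);
    lines.push(`// ${initial}...`, `/${alternatives.join("|")}/ => ${topic} > $0`);
  }
  return lines;
}

console.log(groupByInitial(["Ava", "Aaron", "Brandon", "Beatrice", "Curt"], "Authors").join("\n"));
// // A...
// /Aaron|Ava/ => Authors > $0
// // B...
// /Beatrice|Brandon/ => Authors > $0
// // C...
// /Curt/ => Authors > $0
```

Sorting by length is a design choice: in a regex alternation the leftmost alternative that matches wins, so a short name placed first could hide a longer one that starts the same way.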

Alternatives

Meaning of the "\w" Metacharacter

• What is the exact scope of the \w metacharacter?

The \w metacharacter automatically fits the current alphabet: it matches any character of the selected alphabet, and optionally matches the hyphens, digits, apostrophes, and/or the underscore — depending on the checked boxes in the Alphabet field.

Note that the behavior of \w is IndexMatic-specific. In pure JavaScript regular expressions, \w only matches ASCII letters, ASCII digits, and the underscore "_" (that is, [a-zA-Z0-9_]). Hence, if you want \w to behave as in JavaScript, select the ASCII alphabet and check the “Digits” and “Underscore” switches. Alternatively, you can still use the character class [a-zA-Z0-9_].
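A quick check in JavaScript confirms the ASCII-only behavior: without the `u` and `i` flags, `\w` and `[a-zA-Z0-9_]` accept exactly the same characters, and neither accepts an accented letter. The sample characters are arbitrary.

```typescript
const jsWord = /^\w$/;
const asciiClass = /^[a-zA-Z0-9_]$/;

// "é" (U+00E9), "-" and "’" (U+2019) are rejected by both tests.
const samples = ["a", "Z", "7", "_", "\u00E9", "-", "\u2019"];
for (const ch of samples) {
  console.log(JSON.stringify(ch), jsWord.test(ch), asciiClass.test(ch));
}
```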

Alphabet

Extracting XML Data

• Is IndexMatic³ able to parse text like:
"...<index>New Orleans</index>..."?
And is it possible to extract the attribute as well? Like:
"...<index entry="New Orleans, LA">New Orleans</index>..."

To capture the content of the <index> element, send the following query:
    /<index>([ \w]+)<\/index>/ => $1

And to capture the attribute:
    /<index entry="([ \w,]+)">/ => $1

Note that a more advanced query (labeled “XML Tags” in the Favorites list) can even capture generic patterns of the form <tag>...</tag> and produce the associated “tag > content” topic structure. Here is what it looks like:

    /<([a-z][a-z0-9-]*)>([^<]+)<\/\1>/iW => $1 > $2
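The three patterns behave the same way in a plain JavaScript engine, which is handy for testing them on a copy of the text. In this sketch the sample string is hypothetical, JavaScript's `i` flag provides case-insensitive matching, and the IndexMatic-specific `W` flag has no JavaScript counterpart; since no whole-word filtering happens here anyway, it is simply omitted.

```typescript
const text = 'Jazz in <index>New Orleans</index> and ' +
  '<index entry="New Orleans, LA">New Orleans</index>; <city>Memphis</city>.';

// Content of a bare <index> element ($1).
const content = /<index>([ \w]+)<\/index>/.exec(text);
console.log(content?.[1]); // New Orleans

// Value of the entry attribute ($1).
const attribute = /<index entry="([ \w,]+)">/.exec(text);
console.log(attribute?.[1]); // New Orleans, LA

// Generic <tag>content</tag> pairs, rendered as "tag > content".
const generic = /<([a-z][a-z0-9-]*)>([^<]+)<\/\1>/gi;
const topics: string[] = [];
let m: RegExpExecArray | null;
while ((m = generic.exec(text)) !== null) topics.push(`${m[1]} > ${m[2]}`);
console.log(topics); // [ 'index > New Orleans', 'city > Memphis' ]
```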

Saving/Restoring Favorite Queries

Special Space Characters

• At a specific location of a regular expression, I want to specify "thin space" OR "hair space" rather than the IndexMatic³ generic space. Is it possible?

Simply enter the explicit character class:
   [~<~|] (GREP syntax), or [\u2009\u200A] (Unicode code points).

Note. — Whatever the Generic Space settings, IndexMatic³ always considers the special characters you specify.

GREP-like Metacharacters ; Fine-Tuning Generic Spaces

Generic Apostrophe

• How to perform a simple search on a text containing both the ASCII apostrophe ' U+0027 and the typographic apostrophe ’ U+2019 ?

IndexMatic³ does not natively provide a generic apostrophe feature, but it's super-simple to write your queries as if it did: just use the special class ['] in your regexes. This code is interpreted as [’'] and will therefore capture both forms.

Note. - From a query list, it is then possible to forcibly output the typographic apostrophe by enabling the Magic Regex Term.
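In plain JavaScript, the equivalent of IndexMatic's `[']` shortcut has to be spelled out as an explicit class of both characters. This sketch (sample text hypothetical) shows that the class matches both apostrophes, then normalizes every hit to the typographic form as one possible post-processing step; it does not reproduce the Magic Regex Term feature itself.

```typescript
const text = "l'homme, l\u2019homme, L'HOMME";

// Explicit class covering U+0027 and U+2019 (case-insensitive here).
const re = /l['\u2019]homme/gi;
const hits = text.match(re) ?? [];
console.log(hits.length); // 3

// Normalizing every hit to the typographic apostrophe.
const normalized = hits.map((h) => h.replace(/'/g, "\u2019"));
console.log(normalized);
```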

GREP-like Operators

Consolidating Terms into a Single Reference

• I have two terms that I would like to consolidate into one reference:
   1) Brahma
   2) Mahabrahma
The outcome I would like is that the index displays as follows:
   Brahma See Mahabrahma
   Mahabrahma 279, 293-295, etc
How should I proceed?

The merging query is:

    /Mahabrahma|Brahma/wI => Mahabrahma

(using both the whole-word flag /w and the case-sensitive flag /I.)

And the cross-reference is:

    // Brahma => Mahabrahma

Cross-References


4/ Proper Names

Proper Names vs. Acronyms

• I am a genealogist, and my books have many people's names. I tried using the “Acronyms” query from the list of favorites. The result gives me many abbreviations without surnames. (People commonly used initials instead of given names in the 1800s.) For the instances where I have the abbreviations, the surname is not picked up in indexing. What should I do?

Dealing with proper names primarily depends on how your InDesign document is structured. The built-in favorite queries only provide generic patterns to give you some examples of using IndexMatic³ regular expressions. But it is usually necessary to adjust these commands according to the actual text found in the document, how it is formatted, etc.

First, suppose you have a dedicated character style applied to the target names. It is then extremely easy to recover all the names regardless of their particular formatting (using the /.+/ query). And from the collected set, as typically extracted by the Hits feature, you can then study the different patterns in a systematic way — e.g. “John Doe”, “J. Doe”, “J. K. Doe”, “J. McDoe”… — and decide which of them will require some fine-tuning.

   — There are cases where you want to re-use the /.+/ hits as a fresh, literal query list, because you need to add KEY => TERM rewriting rules that entirely depend on your expectations. For example, you have to add birthdates to the index and this information is not directly accessible in the InDesign document as it comes from, say, a separate database.

   — There are other cases where the whole structure is available in the /.+/ hits extracted from the text, but now you need to discriminate patterns like “John Doe”, i.e. \m\w+ \m\w+ and “J. Doe”, i.e. \m\. \m\w+. Then your original /.+/ single query should be exploded into subpatterns that will perform the required transformations — which also involves a query list, but made up of regular expressions.

The “Acronyms” query is just one particular component that may be involved here. It captures either simple uppercase sequences (“ABCD”), or strings of the form “A.B.C.D.”, “A. B. C. D.”, “A B C D”, or any mixed pattern:

    /\m(?:\.?\ ?\m)+\.?/Is

But this is obviously insufficient to extract surnames. Maybe there wouldn't be much to change for it to work. Let's try this:

    /\m(?:\.?\ ?\m)*\l+/Is

Does it look better? I imagine so, but again, it all depends on the source text. My main point in this discussion is that there is no one-size-fits-all answer.
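To see where the two patterns differ, here is a TypeScript sketch that runs them on a few sample names. JavaScript has no `\m` or `\l`, so `\p{Lu}` and `\p{Ll}` (with the `u` flag) stand in for them; the escaped space is written as a plain space because the `u` flag rejects that escape, and the IndexMatic flags `I` and `s` are not reproduced. The sample text is hypothetical.

```typescript
const text = "J. K. Doe met A.B.C.D. officials and John Doe.";

// Acronyms: \m(?:\.?\ ?\m)+\.?
const acronyms = /\p{Lu}(?:\.? ?\p{Lu})+\.?/gu;
// Names: \m(?:\.?\ ?\m)*\l+
const names = /\p{Lu}(?:\.? ?\p{Lu})*\p{Ll}+/gu;

console.log(text.match(acronyms)); // [ 'J. K. D', 'A.B.C.D.' ]
console.log(text.match(names));    // [ 'J. K. Doe', 'John', 'Doe' ]
```

The acronym pattern truncates "J. K. Doe", while the name pattern catches it but splits "John Doe" into two hits, which illustrates why the queries must be tuned to the actual text.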

IndexMatic³: The Big Picture ; Metacharacters

Reformatting a Name List

• I have a list of several hundred names in alphabetical order: “Arrabas, Jacques¶Burton, Robert W.¶Dupont de Boismont, Hyppolite¶etc.” How can I process it as a query list?

In most situations involving proper nouns, the central question is: in what form is the key element I am looking for in the document? And secondarily, which index entry to associate with it?

The specific part (the KEY) is usually the last name. Aside from genealogy books, which require more careful queries, it is generally sufficient to identify that element in the text to produce the desired index entry. So, in most cases, we want to extract case-sensitive matches like “Arrabas”, “Burton”, “Dupont de Boismont” and generate the respective full terms, which then include the first name (in whatever form).

(If this general scenario encounters exceptions, they will be dealt with separately, on a case-by-case basis.)

Now consider the list initially provided:

    Arrabas, Jacques
    Burton, Robert W.
    Dupont de Boismont, Hyppolite
    . . .

What we want is to take the part before the comma as a search key (alone, case sensitive), and then produce a term absolutely identical to the entire input. For example,

    Arrabas/I => Arrabas, Jacques

But we don't want to carry out this tedious transformation by hand on hundreds of lines! So we keep the list as it is, and put a hat on it: the magic format directive.

    //~format /[^,]+/ :: ^1/I => ^0
    Arrabas, Jacques
    Burton, Robert W.
    Dupont de Boismont, Hyppolite
    . . .
    // here hundreds of "<last name>, <first name>" lines

Note. - A split directive would have done the job just as well.

Now suppose that the “Burton” entry is not OK, because we have to deal with both “Robert W. Burton” and “Nadia Burton”. Alright! We move that particular line out of the scope of the format directive and create a more elaborate query for it, for example:

    /(Nadia) Burton|Burton/I => Burton, {$1:Robert W.}

which means: “If "Nadia" is found just before "Burton", then we're talking about Nadia Burton; otherwise our item is, by default, Robert W. Burton.”

And the final query list is then:

    //~format /[^,]+/ :: ^1/I => ^0
    Arrabas, Jacques
    Dupont de Boismont, Hyppolite
    . . .
    Zipf, George Kingsley

    // Special cases:
    /(Nadia) Burton|Burton/I => Burton, {$1:Robert W.}
    . . .

Note that a simple empty line stops the format directive.
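The behavior of this format header can be pictured as a simple line-by-line expansion. The TypeScript sketch below is hypothetical (`expandFormatBlock` is not an IndexMatic³ function) and rests on three assumptions drawn from the examples above: `^0` is the whole list line, `^1` is the first chunk matched by /[^,]+/, and a blank line ends the directive, after which lines pass through unchanged. Comment lines are also left untouched.

```typescript
// Expands "Last, First" lines into "Last/I => Last, First" queries.
function expandFormatBlock(lines: string[]): string[] {
  const out: string[] = [];
  let active = true; // assumed: the directive applies until the first blank line
  for (const line of lines) {
    if (line.trim() === "") active = false;
    const first = line.match(/[^,]+/); // ^1 (assumed): text before the first comma
    if (active && first && !line.startsWith("//")) {
      out.push(`${first[0]}/I => ${line}`); // ^1/I => ^0
    } else {
      out.push(line); // comments, the blank line and special cases are kept as-is
    }
  }
  return out;
}

const list = [
  "Arrabas, Jacques",
  "Dupont de Boismont, Hyppolite",
  "",
  "/(Nadia) Burton|Burton/I => Burton, {$1:Robert W.}",
];
console.log(expandFormatBlock(list).join("\n"));
// Arrabas/I => Arrabas, Jacques
// Dupont de Boismont/I => Dupont de Boismont, Hyppolite
//
// /(Nadia) Burton|Burton/I => Burton, {$1:Robert W.}
```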

Directives ; Ternary Operator

Redundant Matches

• In the book I'm indexing I have three variants of first names ("P. H. Nielsen", "L.-D. Nisipeanu", and "G. Kasparov") which I grab from character-style runs. My queries are the following:
    // 1. Catches "P. H. Nielsen" etc.
    /([A-Z]\. [A-Z]\. )([A-Z]\w+)/ => $2, $1
    // 2. Catches "L.-D. Nisipeanu" etc.
    /([A-Z]\.\-[A-Z]\. )([A-Z]\w+)/ => $2, $1
    // 3. Catches all like "G. Kasparov"
    /([A-Z]\. )([A-Z]\w+)/ => $2, $1
But Nielsen is then duplicated in the output as both "Nielsen, H." and "Nielsen, P. H." How to fix this?

Since ExtendScript does not support the “lookbehind” assertion, it is not possible to prevent "P. H. Nielsen" from being detected by your third query, which in this case is redundant since your first query already handles that string. This is a common issue when you need to control what precedes a match. Fortunately, you can solve the whole problem with a single query:

    // Catches all cases from a single pattern:
    /([A-Z]\.(?:[ -][A-Z]\.)?) ([A-Z]\w+)/ => $2, $1

This works because the ? operator in the middle of the query is greedy, which forces the engine to grab "P. H." rather than "P." alone. As a general strategy, when you need to capture variants of a single data type it's better to address all the cases from a unique query. Using multiple queries causes redundancy if two distinct regular expressions have a chance to capture the same string.

Note. - The syntax (?: in the above pattern declares a non-capturing parenthesis. This lets you create an optional group,
    (?:[ -][A-Z]\.)?
without increasing the number of captured placeholders, so $2 always refers to the second part of the global pattern, ([A-Z]\w+).
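The duplication, and its fix, can be reproduced in TypeScript with the very same patterns, since they only use plain JavaScript syntax. The `terms` helper and the sample strings are hypothetical; each match is rewritten as "$2, $1" and trimmed, and duplicates are merged.

```typescript
// Applies each pattern to each name and rewrites matches as "$2, $1".
function terms(names: string[], patterns: RegExp[]): string[] {
  const out = new Set<string>();
  for (const name of names) {
    for (const p of patterns) {
      const re = new RegExp(p.source, "g");
      let m: RegExpExecArray | null;
      while ((m = re.exec(name)) !== null) out.add(`${m[2]}, ${m[1].trim()}`);
    }
  }
  return Array.from(out).sort();
}

const names = ["P. H. Nielsen", "L.-D. Nisipeanu", "G. Kasparov"];

const threeQueries = [
  /([A-Z]\. [A-Z]\. )([A-Z]\w+)/,
  /([A-Z]\.\-[A-Z]\. )([A-Z]\w+)/,
  /([A-Z]\. )([A-Z]\w+)/,
];
const singleQuery = [/([A-Z]\.(?:[ -][A-Z]\.)?) ([A-Z]\w+)/];

console.log(terms(names, threeQueries));
// [ 'Kasparov, G.', 'Nielsen, H.', 'Nielsen, P. H.', 'Nisipeanu, D.', 'Nisipeanu, L.-D.' ]
console.log(terms(names, singleQuery));
// [ 'Kasparov, G.', 'Nielsen, P. H.', 'Nisipeanu, L.-D.' ]
```

The sketch also reveals that the third query produces a second redundant entry, "Nisipeanu, D.", for the hyphenated initials.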

Greedy and Non-Greedy Quantifiers ; $-Variables


5/ Advanced Queries

Punctuation

• What is the most generic way to indicate a punctuation character in a regular expression?

The Unicode metacharacter \p{P} captures any punctuation sign. Some refinements are also available (see Unicode Properties).
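JavaScript supports the same Unicode property syntax when the `u` flag is set, so the class can be tried outside InDesign. This sketch (sample text hypothetical) collects every punctuation mark, then uses the standard subcategory `\p{Pd}` (dash punctuation) as an example of a narrower class; check the Unicode Properties page for the subcategories IndexMatic³ itself accepts.

```typescript
const text = "Hello, world! (ok?) well-known 1990\u20132000";

// Every punctuation mark: , ! ( ? ) - and the en dash.
const all = text.match(/\p{P}/gu) ?? [];
console.log(all.length); // 7

// Dash punctuation only: hyphen-minus and en dash.
console.log(text.match(/\p{Pd}/gu));
```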

Unicode Properties

Stats on Letters

• I want to report all letters contained in a document, including non-Latin letters, with their related frequency. Can this be done through the “Hits” feature?

1. Enter the following (single) query:
    /\p{L}/IW!

Note. - The special flag ! locally deactivates the “Stop Words” mechanism; otherwise, single-letter words would be discarded as meaningless.

2. Ctrl-Click the Hits button.

3. Make the Entry Term and Frequency fields visible.

4. Click OK.

Note. - The \p metacharacter is never affected by the Alphabet settings; you can always use it to search for characters by their Unicode properties.
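Outside InDesign, the same frequency table can be sketched in a few lines of TypeScript. Matching is case-sensitive here, mirroring the `I` flag of the query above, and the sample string is hypothetical; in IndexMatic³ the Hits dialog does the counting for you.

```typescript
// Counts every letter (Unicode category L) case-sensitively, in any script.
function letterStats(text: string): Map<string, number> {
  const freq = new Map<string, number>();
  for (const letter of text.match(/\p{L}/gu) ?? []) {
    freq.set(letter, (freq.get(letter) ?? 0) + 1);
  }
  return freq;
}

const stats = letterStats("Abba \u03C9\u03C9!"); // Latin and Greek letters
console.log(stats); // A -> 1, b -> 2, a -> 1, ω -> 2
```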

Adding Local Flags ; Unicode Properties ; Stop Words

Using the "$" Symbol

• We are testing different regular expressions in order to find the most relevant in URL extraction. Can IndexMatic³ display what pattern has produced each specific set of results?

In the query term, the $ symbol always represents the original key in its literal form. For example:

    /[a-z]{3}\d/ => $

will report in a single topic, "[a-z]{3}\d", every page that contains a sequence of three letters followed by a digit.

Hence, you can easily keep track of the relationship between each pattern and the terms it found. In your case, one possibility is to output the patterns as first-level topics and the URLs as related subtopics:

    /pattern1/ => $ > $0
    /pattern2/ => $ > $0
    etc.

$-Variables

Emojis

• My recipe book contains emojis (EmojiOne font) that typically range from 🍄 (U+1F344) to 🍿 (U+1F37F). I use them to extract the category of the word that immediately follows, so my queries need to use patterns like /([🍄-🍿])(\w+)/=>$1>$2, or so. But IndexMatic³ will not find my items, why?

1. Unicode characters representing emojis lie beyond the Basic Multilingual Plane (BMP), that is, they have code points higher than U+FFFF. Such characters cannot be used inside a regex character class, because each of them is then interpreted as the UTF-16 surrogate pair (two code units) that encodes it. For example, [🍄-🍿] just means [\uD83C\uDF44-\uD83C\uDF7F] to the interpreter: a set of isolated surrogate halves around a reversed range from \uDF44 to \uD83C, which is obviously not what you want to express.

The solution is to address sets of emojis systematically using an alternation of the form (a|b|c|...). This requires listing the characters exhaustively, but it is the only fully general way to pass them to the regular expression engine.

In your case, since the set of icons is known and fixed — say 🍅, 🍆, 🍇, 🍈, 🍉, 🍋, 🍐, 🍞 — you can use the query:

    /(🍅|🍆|🍇|🍈|🍉|🍋|🍐|🍞)(\w+)/ => $1 > $2

2. But there is an even more compact approach. If the target emojis have consecutive code points that share the same first UTF-16 component (the high surrogate), as in your example from U+1F344 MUSHROOM to U+1F37F POPCORN, you can factor out that first component (here \uD83C) and create a very compact regex of the form (\uD83C[\uDF44-\uDF7F]), which covers the entire range from 🍄 to 🍿. Hence the full query:

    /(\uD83C[\uDF44-\uDF7F])(\w+)/ => $1 > $2

Note. - It is a good habit to save your discovery, /(\uD83C[\uDF44-\uDF7F])/, as a favorite query, e.g. “Emoji Food”.
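Both techniques can be checked in modern JavaScript. The sketch below (hypothetical helper names, Node.js assumed) computes the surrogate pair of a code point, shows that a modern engine without the u flag even rejects the naive class, and runs the factored query on a sample string. The u flag does not exist in ExtendScript, which is an ECMAScript 3 dialect, so this says nothing definitive about how ExtendScript itself reacts to the naive class.

```javascript
// Sketch in modern JavaScript (not ExtendScript).

// Splits a code point above U+FFFF into its UTF-16 surrogate pair.
function surrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10); // top 10 bits of the offset
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits of the offset
  return [high, low].map((u) => "\\u" + u.toString(16).toUpperCase());
}

console.log(surrogatePair(0x1f344).join(" ")); // \uD83C \uDF44 (MUSHROOM)
console.log(surrogatePair(0x1f37f).join(" ")); // \uD83C \uDF7F (POPCORN)

// Returns false when the engine refuses to compile the pattern.
function compiles(source, flags = "") {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    return false;
  }
}
console.log(compiles("([🍄-🍿])(\\w+)")); // false: reversed code-unit range
console.log(compiles("([🍄-🍿])(\\w+)", "u")); // true, but "u" is unavailable in ExtendScript

// The factored form: one high surrogate followed by a range of low surrogates.
const food = /(\uD83C[\uDF44-\uDF7F])(\w+)/g;
const recipes = "🍅tomato 🍞bread 🥐croissant"; // 🥐 is U+1F950, outside the range
const entries = [...recipes.matchAll(food)].map((m) => m[1] + " > " + m[2]);
console.log(entries.join(", ")); // 🍅 > tomato, 🍞 > bread
```

The croissant shows the limit of the factored form: 🥐 has a different high surrogate (\uD83E), so it needs its own branch in an alternation.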

Codepoints (Unicode) ; Saving/Restoring Favorite Queries

Arabic Diacritics

• In Arabic, IndexMatic³ missed around 100 words that happen to have a Shadda. How to fix this?

Short answer: with “Arabic” selected in the Alphabet list, use the class [\w\p{Mn}] instead of \w when you need to target special diacritics such as the Shadda mark.

Why is this so? Well, U+0651 ARABIC SHADDA is not part of the basic set of Arabic letters in IndexMatic³, which explains why \w does not recognize it. My source for defining letters in different scripts is the Unicode standard. According to Unicode, U+0651 is not a letter (although it belongs to the Arabic block); it is rather a Nonspacing Mark (Mn) that inherits its script property from the preceding character. More precisely, it is a combining diacritical mark used with the Arabic alphabet, mainly in Arabic and Syriac texts.

Thanks to your message, I'm now familiar¹ with the fact that Arabic texts may require such additional non-letter characters to have some diacritics properly specified and rendered.

¹ That's not a well-known fact for a Latin-centric speaker like me :-/ In French, for example, all usual diacritics are already integrated in the set of Unicode letters, e.g. ‘É’ is a single character (U+00C9 LATIN CAPITAL LETTER E WITH ACUTE). It is true that we can also specify the glyph ‘É’ by combining the letter E with the acute accent (U+0301 COMBINING ACUTE ACCENT), so that one formal letter and one combining diacritical mark are used, but most of us are not aware of that option, which Unicode describes as a canonical decomposition (the form produced by NFD normalization).
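For the curious, the difference between the two forms can be observed in modern JavaScript with String.prototype.normalize. This is a sketch outside IndexMatic; the helper codePoints is hypothetical.

```javascript
// Sketch in modern JavaScript: precomposed vs decomposed "É".
const precomposed = "\u00C9"; // LATIN CAPITAL LETTER E WITH ACUTE
const decomposed = precomposed.normalize("NFD"); // "E" + U+0301 COMBINING ACUTE ACCENT

// Lists the code points of a string in U+XXXX notation.
const codePoints = (s) =>
  [...s].map((c) => "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));

console.log(codePoints(precomposed).join(" ")); // U+00C9
console.log(codePoints(decomposed).join(" ")); // U+0045 U+0301
console.log(decomposed.normalize("NFC") === precomposed); // true

// A letters-only pattern sees the decomposed form as a letter plus a non-letter.
console.log(/^\p{L}+$/u.test(decomposed)); // false
console.log(/^[\p{L}\p{Mn}]+$/u.test(decomposed)); // true
```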

Now, should we add Combining Diacritical Marks to the sets of IndexMatic letters? By default, we certainly shouldn't. In many languages, that's not what the user expects. The main reason is that a combining mark, alone, is not a letter and could be used with arbitrary characters that are not letters either. However, the issue you've reported shows that including combining marks might be an option... and such an option is definitely relevant in the case of the Arabic language!

Workaround. - Technically, we already have the ability to use the composite class [\w\p{Mn}] instead of \w when dealing with Arabic letters. This way, a word like

    الصّبي

processed through the query /[\w\p{Mn}]+/ will be properly identified, while /\w+/ would split it into two invalid forms:

    الص
    بي

But I now understand that it would be much more practical to extend \w so that it captures the Shadda as well. The solution I'm considering is to add an option “Include combining diacritical marks” to the Alphabet selectors (those that let you selectively include digits, hyphens…). At the moment this is a pending feature.
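The splitting described above can be reproduced in modern JavaScript, with one adaptation: JavaScript's own \w only covers ASCII, so \p{L} stands in here for IndexMatic's \w with “Arabic” selected. That substitution is an approximation for illustration, not IndexMatic's actual letter set.

```javascript
// Sketch in modern JavaScript. \p{L} approximates IndexMatic's \w with the
// Arabic alphabet selected (JavaScript's own \w is ASCII-only).
const word = "\u0627\u0644\u0635\u0651\u0628\u064A"; // الصّبي, with U+0651 ARABIC SHADDA

console.log(/\p{Mn}/u.test("\u0651")); // true: the Shadda is a Nonspacing Mark
console.log(/\p{L}/u.test("\u0651")); // false: it is not a letter

const lettersOnly = word.match(/\p{L}+/gu); // split at the Shadda
const lettersAndMarks = word.match(/[\p{L}\p{Mn}]+/gu); // the whole word
console.log(lettersOnly.length, lettersAndMarks.length); // 2 1
```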

Alphabets ; Unicode Properties


6/ Output

XML Output

• I'm confused about the XML file output. Once you edit your XML file, how do you get that info back into InDesign and turn the XML info into an index?

The XML output feature is independent of InDesign. It allows you to retrieve your index as a pure XML file for further processing (database, etc.), but it is not intended to comply with the InDesign XML workflow, which is very specific.

XML Output

Bypass Sorting

• Is there a way to index without sorting the results (alphabetical or otherwise)?

What you are looking for is the “Keep Original Order” button. It is available in Finder > Query List mode.

Keep Original Order

Multiple indexes

• My problem is to create multiple indexes for a book (index of places separately from the index of personal names, etc.). Is IndexMatic³ capable of making such multiple indexes?

1. Basically, IndexMatic³ does not manage multiple indexes at the same time. You can first configure options and queries to build the PLACES index, then re-run the script with new settings to create the NAMES index, and so on. Switching between specific prepared wordlists should help you achieve these tasks quickly.

2. Another interesting option is to create a single query list based on subtopics, where each first-level topic in fact represents a sub-index (NAMES, PLACES, etc.), for example:

    /Boston|Atlanta|Paris/ => PLACES > $0
    /John|Bob|Jiri/ => NAMES > $0

This still produces a single index, but your final entries will be rendered in the form:

    NAMES
        Bob 5, 12-13, 20...
        Jiri 14, 18, 22...
        John 17, 20-23...
    PLACES
        Atlanta 7, 9, 12-13...
        Boston 15...
        Paris 12-15, 17-22...
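Here is a hypothetical simulation of this query list in modern JavaScript, assuming one string per page and whole-word matching (hence the \b anchors). The function names are made up, and unlike IndexMatic, the sketch does not collapse consecutive pages into ranges such as 12-13.

```javascript
// Hypothetical simulation: each first-level topic acts as a sub-index,
// and each match becomes a subtopic listing the pages where it occurs.
function buildSubIndexes(pages, queries) {
  const index = {};
  pages.forEach((text, i) => {
    const page = i + 1; // pages are numbered from 1
    for (const { topic, re } of queries) {
      for (const [term] of text.matchAll(new RegExp(re.source, "g"))) {
        index[topic] = index[topic] || {};
        const refs = (index[topic][term] = index[topic][term] || []);
        if (!refs.includes(page)) refs.push(page);
      }
    }
  });
  return index;
}

// Renders topics and subtopics alphabetically, subtopics indented.
function formatIndex(index) {
  const lines = [];
  for (const topic of Object.keys(index).sort()) {
    lines.push(topic);
    for (const term of Object.keys(index[topic]).sort()) {
      lines.push("    " + term + " " + index[topic][term].join(", "));
    }
  }
  return lines.join("\n");
}

const pages = ["John met Bob in Boston.", "Paris, then Atlanta.", "Bob and Jiri in Paris."];
const queries = [
  { topic: "PLACES", re: /\b(?:Boston|Atlanta|Paris)\b/ },
  { topic: "NAMES", re: /\b(?:John|Bob|Jiri)\b/ },
];
console.log(formatIndex(buildSubIndexes(pages, queries)));
```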

Nested Topics ; Exporting and Re-Importing Settings

Reporting not found entries

• The previous version of IndexMatic offered an option to mark “not found” terms in the index (using the — character). Has this feature disappeared?

Go to Output > Destination > Frequency and set the Minimum to 0 (zero). This makes IndexMatic³ report terms that have no match.

Not Found Entries


7/ Known Issues and Troubleshooting

Preserving Text Formatting

• Is there a way to preserve character-level formatting in an index entry? For example: book titles, movie titles, species names, etc., are typically italicized.

Sorry, no. IndexMatic³ is primarily a text-driven search engine and it does not keep track of applied formatting.

Note that the same expression, or text pattern, may appear with different formatting in the document. Suppose IndexMatic³ finds “New York Times” italicized on page 1 and the same string with no formatting on page 2... Is this the same TOPIC? How should hits, Page Rank, etc., be computed in that context?

Note, however, that IndexMatic³ offers you very granular control of the styles generated in IDML or InDesign output. This makes it possible to post-process many formatting attributes for terms and headings (H0, H1, H2...), page numbers, separators, etc. In practice, many iX³ users take advantage of InDesign's GREP styles to adjust the typography of specific elements.

Predefined Output Styles

“Indirect” Character Styles

• When indexing based on character styles, IndexMatic³ seems to find only those character styles that are explicitly applied and not those that are applied by "nested styles" or a "GREP style" within a paragraph style. Can you make the script find character styles that are applied through a nested/GREP style?

IndexMatic³ indeed does not inspect indirect styles; it only considers formally applied Character and Paragraph styles, regardless of how they are enriched or overridden.

A workaround is to use a companion script that converts local formatting to pure character styles. Some tools were reviewed in this post from CreativePro: “Free Scripts Help Fix Word Formatting”.

Style Overrides

Placed PDF

• Some pages of our product catalogs are built using PDF files that I import directly into InDesign. But when I use IndexMatic³ it does not detect the article codes entered in these pages, why?

Indeed, IndexMatic³ does not “see” the text written in a PDF placed in InDesign. In fact, InDesign itself does not see this text either, and it offers no function to access it.

At the moment there is no simple and straightforward answer to this issue. Basically, an imported PDF behaves like an image: the scripting subsystem exposes it as a graphic object, and its internal content remains inaccessible to us. Admittedly, the PDF encoding would in principle allow the text to be recovered (this is typically what Web browsers do), but such functionality is quite cumbersome to embed in IndexMatic³. However, we are thinking about it…

“Black Mirror”

• Although everything was working fine until now, my IndexMatic console remains empty and no longer seems to respond.

Due to a bug that appeared with InDesign 2024 (19.x), certain colors are no longer displayed in the user interface, which made the screen look empty while IndexMatic³ continued to function normally in the background.

This cosmetic bug has been fixed (i.e., worked around) starting with version 3.23111. Download the new version of the program from your private link!

Compatibility Patch for InDesign 2024

Book Chapters are not Remembered!

• The “Remember book chapters” preference does not seem to work. When I restart IndexMatic³, it does not preselect the chapters I had just worked on.

This bug has been fixed in version 3.24012. Download the new version of the program from your private link!

Remember book chapters

InDesign not responding

• IndexMatic³ was processing my regular expressions at lightning speed, and suddenly, after a slight (?) change to my query list, the progress bar froze and InDesign became unresponsive.

In this very specific situation, where InDesign becomes unresponsive while the CPU keeps working at full speed, and if you cannot force the dialog to close by holding down the Esc key, you are most likely experiencing an infinite loop in the ExtendScript regular expression processor.

Strictly speaking, this is not an IndexMatic bug (although it obviously results in a program crash!), but rather an internal malfunction of the scripting subsystem itself. There are indeed bugs inherent to the processing of regexes by ExtendScript. No one has an exhaustive list, so it is almost impossible to detect and/or neutralize them in advance.

However, nine times out of ten, this category of problems is related to a backtracking issue, which then results in this cursed infinite loop. More concretely, most crashes arise from the use of nested quantifiers or quantified alternations, typically in the following simplified pattern:

    /... (a|b|c)?/ => ...

The question mark at the end of the regex makes the alternation (a|b|c) optional, and this seemingly innocuous pattern can trigger a critical loop in ExtendScript, especially when it occurs at the end of the query, since it is not followed by any additional constraint forcing the processor to make a decision.

In the particular case just mentioned, it is sometimes beneficial to rewrite the pattern in an equivalent form, replacing the ? quantifier with an empty alternative:

    /... (a|b|c|)/ => ...

More generally, try to limit the stacking of quantifiers and optional elements in your queries, and make the processor's calculation easier by imposing stricter conditions on pattern boundaries. (Experienced users have also noticed that using non-greedy quantifiers rather than lookahead assertions could occasionally resolve difficult problems.)
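A quick way to convince yourself that the two forms select the same text is to compare them in modern JavaScript. This sketch does not reproduce the ExtendScript bug (modern engines handle both patterns fine), and x is a hypothetical placeholder for the ... part of the query. It also reveals one subtle difference: when no alternative applies, the group is undefined with ? but an empty string with the empty option; how IndexMatic renders such a $1 is not covered here.

```javascript
// Sketch in modern JavaScript: "(a|b|c)?" and "(a|b|c|)" select the same text.
// "x" is a hypothetical stand-in for the "..." part of the query.
const optionalGroup = /x(a|b|c)?/;
const emptyAlternative = /x(a|b|c|)/;

// True when both patterns match the same text at the same position (or both fail).
function sameMatch(input) {
  const m1 = input.match(optionalGroup);
  const m2 = input.match(emptyAlternative);
  if (m1 === null || m2 === null) return m1 === m2;
  return m1.index === m2.index && m1[0] === m2[0];
}

const samples = ["xa", "xb", "xc", "xd", "x", "yx", "abc", ""];
console.log(samples.every(sameMatch)); // true

// The captured group differs when no alternative applies.
console.log("xd".match(optionalGroup)[1], JSON.stringify("xd".match(emptyAlternative)[1])); // undefined ""
```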

“The Hard Problem of Quantified Alternatives in ExtendScript” ; Advanced Queries: Quantifiers


Do you have a question to add? A bug to report?
Let's get in touch → support{at}indiscripts[dot]com