Regular Expression - ignore or exclude a specific word, find everything else

Posted in development on March 24, 2021 by Adrian Wyssmann ‐ 2 min read

I want to use a regular expression to exclude a complete word. I need this for a particular situation, which I explain below.

Problem

As part of Implementing a vulnerability Waiver Process for infected 3rd party libraries, I have a Jira transition dialog which expects the user to set some values. There are two drop-down fields or, as Jira calls them, “Select List (single choice)”. These always present the value None in case nothing is selected.

jira example issue transition dialog
An issue transition dialog with drop-down fields containing 'None' values

To ensure that a proper value is selected when transitioning to a specific state, we use jira-validators. These validators support regular expressions, so the question is: how do I ensure that the selected value is not None?

Solution

After some searching on the web, I found a solution in the regular-expressions-cookbook, of which a sample is readable. So the solution is:

\b(?!None\b)\w+

The result is a proper evaluation of the value in the dialog:

jira example issue transition dialog with error
Validation of the drop-down fields containing 'None' values
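To get a feeling for how the pattern behaves, here is a minimal sketch in Python. It is only a stand-in for the Jira validator: the function name `is_valid_selection` and the sample values are made up for illustration, and it assumes the validator searches for a match anywhere in the value, which may not be how the validator actually applies the expression.

```python
import re

# The pattern from the solution above.
NOT_NONE = re.compile(r"\b(?!None\b)\w+")


def is_valid_selection(value: str) -> bool:
    """Return True if the value contains a word other than 'None'.

    Assumption: the validator looks for a match anywhere in the value
    (re.search). A validator that requires a full match would behave
    differently for values that contain spaces.
    """
    return NOT_NONE.search(value) is not None


# Hypothetical drop-down values, for illustration only.
for value in ["None", "High", "Low", "Nonexistent", ""]:
    print(f"{value!r}: {is_valid_selection(value)}")
```

Only the exact word None and the empty string are rejected here. A value such as Nonexistent still passes, because the second \b inside the lookahead requires None to be a complete word.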

Explanation

As explained in the regular-expressions-cookbook and on regular-expressions.info, the following concepts show why the above solution works:

  • negated character classes

    Typing a caret ^ after the opening square bracket negates the character class. The result is that the character class matches any character that is not in the character class.

    The issue with this is the phrase “any character”: a character class always matches a single character, so [^None] only rejects the individual characters N, o, n and e. It knows nothing about the word None, but we care about the whole word.

  • word boundaries

    \b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b

    The issue with that is that \b[^None]\w+\b still relies on the character class, so it rejects any word that starts with N, o, n or e (for example Normal or open), not just the word None.

  • negative lookahead

    Similar to positive lookahead, except that negative lookahead only succeeds if the regex inside the lookahead fails to match.
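The difference between the three approaches in the list above is easier to see side by side. The following is a small sketch using Python's `re` module as a stand-in for the validator. The helper `matched` and the sample words are hypothetical and only serve the comparison.

```python
import re
from typing import Optional

# The three patterns discussed in the list above.
PATTERNS = {
    "negated class": r"[^None]",
    "class + word boundary": r"\b[^None]\w+\b",
    "negative lookahead": r"\b(?!None\b)\w+",
}


def matched(pattern: str, word: str) -> Optional[str]:
    """Return the text the pattern matches in word, or None if nothing matches."""
    m = re.search(pattern, word)
    return m.group() if m else None


# Hypothetical sample words, for illustration only.
for word in ["None", "Normal", "open", "stone", "High"]:
    results = {name: matched(p, word) for name, p in PATTERNS.items()}
    print(word, results)
```

With these inputs, the plain negated class only ever picks out a single character such as the r in Normal. The variant with word boundaries finds no match for Normal or open, although both would be legitimate values. The negative lookahead is the only one that matches every word except None.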

So the final solution combines the techniques mentioned above:

  • \b asserts the position at a word boundary
  • (?! starts the negative lookahead, i.e. “not followed by”
  • None is the word we want to “ignore”, i.e. it should not match
  • \b asserts a word boundary after None, so only the complete word None is excluded
  • ) ends the negative lookahead
  • \w+ matches any other word, i.e. one or more word characters
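The breakdown above can also be written directly into the pattern. The sketch below uses Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and allows comments. This is a Python feature used here for illustration. Whether the Jira validator accepts such a commented form is not something this article covers, so the compact one-line pattern remains the one to paste into the validator.

```python
import re

# Same expression as \b(?!None\b)\w+, spread out with one comment per part.
NOT_NONE_VERBOSE = re.compile(
    r"""
    \b          # position at a word boundary
    (?!         # start of the negative lookahead: "not followed by"
        None    #   the word to exclude
        \b      #   a word boundary, so only the whole word counts
    )           # end of the negative lookahead
    \w+         # one or more word characters: the word itself
    """,
    re.VERBOSE,
)

# The compact form, for comparison.
NOT_NONE_COMPACT = re.compile(r"\b(?!None\b)\w+")

# Hypothetical sample values, for illustration only.
for value in ["None", "High", "Nonexistent"]:
    verbose_hit = NOT_NONE_VERBOSE.search(value) is not None
    compact_hit = NOT_NONE_COMPACT.search(value) is not None
    print(f"{value!r}: verbose={verbose_hit}, compact={compact_hit}")
```

Both forms should agree on every input, since the verbose flag changes only how the pattern is written, not what it matches.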