RE2/J: Linear-time regular-expression matching for Java
February 24th, 2015 | Published in Google Open Source
Today we’re announcing the public release of RE2/J: a pure-Java implementation of the popular RE2 regular expression library.
Although RE2/J is not always faster than java.util.regexp, its running time is always linear in the size of the input. Thus when matching larger inputs, especially against patterns containing a high degree of alternation, RE2/J may be dramatically faster. With a backtracking implementation such as java.util.regexp, it is not hard to construct a pathological pattern whose matcher would take years to run on some inputs, so RE2/J's performance guarantee makes it suitable for use in applications where the pattern is supplied by untrusted users, such as the clients of a web server.
If you are looking for a detailed technical discussion of the motivation for RE2 and RE2/J and the tradeoffs involved, please see “Regular Expression Matching Can Be Simple And Fast” and “Regular Expression Matching in the Wild”, both written by Russ Cox.
RE2/J is used widely by Java projects within Google. In many cases, it can be used as a drop-in replacement for java.util.regexp. We are pleased to be able to make this library available for public consumption.
Please head to RE2/J’s GitHub page to find out how to use it!
By James Ring, Google Engineering