An Online Python re.findall Service


As a programmer, I know that grep, sed, and awk are powerful tools for processing text, but they aren't always straightforward for specific tasks, because I have to think about how to filter out the right lines and columns.

So I wondered whether there was a handier way to do these tasks.

After a while, I concluded that using a regex directly can help, so I launched a re.findall service built on top of Python's re.findall API.

[Image: the website UI (/img/2021-07-11-re.findall.png)]

Here are some use cases for it.

  1. Find all words beginning with a specific prefix.

    Imagine that I have a few paragraphs and want to find all words beginning with the letter 's'. I can do it in a shell session with the command awk '{for(i=1; i<=NF; i++) {print $i}}' /tmp/paragraphs.txt | grep '^s', but that's a lot of typing.

    Now with this service, I can use one regex with just a few steps:

    1. Copy the text to the left input box.

    2. Type in the regex: \bs\w*, and click the button.

    3. The result will show in the right box.
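For comparison, here is a minimal sketch of what the steps above amount to when calling Python's re.findall directly. The sample text is made up for illustration, and the service's own code is not shown in this post, so treat this as an approximation of its behavior rather than its implementation.

```python
import re

# Made-up sample text standing in for the pasted paragraphs.
text = "she sells sea shells by the shore"

# \b anchors at a word boundary, s is the literal prefix,
# and \w* consumes the rest of the word.
words = re.findall(r"\bs\w*", text)

print(words)  # ['she', 'sells', 'sea', 'shells', 'shore']
```

Note that the match is case-sensitive, so a word starting with a capital 'S' would be skipped unless re.IGNORECASE is passed.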

  2. Extract fields.

    It's common to need to extract fields from lines that share a similar structure, such as the lines of a Protobuf message definition.

    Imagine that I need to write a few test cases for a Protobuf-based service, and I have the following message (taken from the Protocol Buffers site) at hand:

    message Person {
      required string Name = 1;
      required int32 Id = 2;
      optional string Email = 3;
    }

    The final test case that I want looks like this (note that the field names in the set_xxx form must be lower-case):

    Person person;
    person.set_name("Text Toolkit");
    person.set_id(1024);
    person.set_email("whatacold@gmail.com");
    

    The steps are the same as in the previous use case. Copy the message definition to the left input box and type (\w+) = in the regex box. It gives the three field names as output, from which I can quickly complete the test case with Emacs' help.

    In contrast, I can also do it with awk '/=/{print $3}' /tmp/person.proto, which is not too complicated in this case, but takes more typing.
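The same extraction can be sketched in plain Python. When the pattern contains exactly one capturing group, re.findall returns only the group contents, which is why the regex yields bare field names. The setter-line generation at the end is my own illustrative addition (the post does this step by hand in Emacs), and the /* value */ placeholder is hypothetical.

```python
import re

# The message definition from the example above.
proto = """
message Person {
  required string Name = 1;
  required int32 Id = 2;
  optional string Email = 3;
}
"""

# One capturing group: findall returns the captured names only.
fields = re.findall(r"(\w+) =", proto)
print(fields)  # ['Name', 'Id', 'Email']

# Illustrative extra step: lower-case each name to build setter stubs.
setters = [f"person.set_{name.lower()}(/* value */);" for name in fields]
print("\n".join(setters))
```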

  3. Find specific attributes in HTML/XML.

    XML is a widely used configuration file format, and I sometimes need to find all values of a specific attribute in an XML file. HTML has a similar syntax, so I'll use it for this example. Suppose I need to figure out which input types I use in a specific HTML file. How can I do that?

    With sed, I can run sed -n 's:^.*<input.*type="\([^"]\+\)".*$:\1:p' /tmp/test.html. As you can see, that is quite a complicated command, and I can barely get it right on the first try.

    But with the re.findall service, I simply paste the HTML code, type the regex <input[^<]+type="(\w+)" in the box, and click the button. Want to deduplicate the result? Check the "unique" box and click the button again.
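Here is a rough Python equivalent of this use case. The HTML snippet is invented for illustration. How the service implements its "unique" option is not described in this post, so the order-preserving deduplication below is only an assumption about one reasonable way to do it.

```python
import re

# Invented HTML snippet for illustration.
html = """
<form>
  <input name="user" type="text">
  <input name="pass" type="password">
  <input name="nick" type="text">
  <input type="submit" value="Go">
</form>
"""

# [^<]+ keeps the match inside a single tag's neighborhood;
# the group captures the attribute value.
types = re.findall(r'<input[^<]+type="(\w+)"', html)
print(types)  # ['text', 'password', 'text', 'submit']

# Assumed behavior of "unique": drop repeats, keep first-seen order.
unique_types = list(dict.fromkeys(types))
print(unique_types)  # ['text', 'password', 'submit']
```

A regex like this is fine for quick inspection, but it is not an HTML parser: attribute values with non-word characters or single-quoted attributes would be missed.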

The three cases above are just a few examples from my daily usage, and they show that the service is simple yet powerful for some scenarios.

Beyond that, it solves another problem: the regex syntax varies a bit among grep, sed, and awk, so it is hard to get right for anyone who doesn't write it often. With re.findall, there is one regex syntax for everything, namely Python's.

