An Online Python re.findall Service

As a programmer, I know that grep, sed and awk are powerful tools for processing text. For specific tasks, though, they aren't always straightforward, because I have to work out how to filter the right lines and columns.

So I wondered: is there a handier way to do these tasks?

After a while, I found that applying a regex directly often helps, so I launched a re.findall service built on top of Python's re.findall API.

Here are some use cases for it.

Find all words beginning with a specific prefix.

Imagine that I have a few paragraphs and want to find all words beginning with the letter 's'. In a shell session I can do it with awk '{for(i=1; i<=NF; i++) {print $i}}' /tmp/paragraphs.txt | grep '^s', but that's a lot of typing.

With this service, one regex and a few steps are enough:
1. Copy the text to the left input box.
2. Type the regex \bs\w* and click the button.
3. The result will show in the right box.
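
For reference, the same search can be reproduced with Python's re.findall directly, which is roughly what the service does with the regex from step 2. The sample text is made up for illustration.

```python
import re

# Made-up sample text for illustration.
text = """She sells sea shells by the seashore.
The shells she sells are surely seashells."""

# \b anchors at a word boundary, so 's' must start a word;
# \w* then consumes the rest of that word.
words = re.findall(r"\bs\w*", text)
print(words)
# ['sells', 'sea', 'shells', 'seashore', 'shells', 'she', 'sells', 'surely', 'seashells']
```

The match is case-sensitive, so "She" is skipped. Since the service only takes a regex, prefixing it with Python's inline flag, as in (?i)\bs\w*, should also catch capitalized words.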

Extract fields.

It's common to need to extract fields from lines that share a similar structure, such as the field declarations in a Protobuf message definition.

Imagine that I need to write a few test cases for a Protobuf-based service, and I have the following message at hand (taken from the Protocol Buffers site):
```
message Person {
  required string Name = 1;
  required int32 Id = 2;
  optional string Email = 3;
}
```
The final test case I want looks like this (note that the field names in the set_xxx calls must be lower-case):
```
Person person;
person.set_name("Text Toolkit");
person.set_id(1024);
person.set_email("whatacold@gmail.com");
```
The steps are the same as in the previous use case: copy the message definition to the left input box and type (\w+) = in the regex box. The output is the three field names, from which I can quickly complete the test case with Emacs' help.
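
The same extraction can be sketched in plain Python. With a single capturing group, re.findall returns only the captured text, which is why the field names come out without the trailing " =". As an alternative to finishing the test case by hand in Emacs (an assumption, not the workflow described here), the sketch also lower-cases the names and builds the set_xxx lines, using the values from the test case above.

```python
import re

proto = """message Person {
  required string Name = 1;
  required int32 Id = 2;
  optional string Email = 3;
}"""

# One capturing group: re.findall returns a list of the group's text.
fields = re.findall(r"(\w+) =", proto)
print(fields)  # ['Name', 'Id', 'Email']

# Values copied from the desired test case; the generation step itself is illustrative.
values = ['"Text Toolkit"', "1024", '"whatacold@gmail.com"']
lines = ["Person person;"]
# Setter names must be lower-case, e.g. Name -> set_name.
lines += [f"person.set_{name.lower()}({value});" for name, value in zip(fields, values)]
print("\n".join(lines))
```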

By contrast, I could also do it with awk '/=/{print $3}' /tmp/person.proto, which isn't too complicated in this case, though it takes more typing.
Find specific attributes in HTML/XML.

XML is a widely used configuration file format, and I sometimes need to find all values of a specific attribute in an XML file. HTML has a similar syntax, so here is an example using HTML. Say I need to figure out, for whatever reason, which input types a particular HTML file uses. How can I do that?

With sed, I can run sed -n 's:^.*<input.*type="\([^"]\+\)".*$:\1:p' /tmp/test.html. As you can see, that is quite a complicated command, and I can barely get it right on my first try. With the re.findall service, I simply copy and paste the HTML code, write the regex <input[^<]+type="(\w+)" in the box, and click the button. Want to deduplicate the result? Check the "unique" box and click the button again.
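
A rough Python equivalent of this case follows. The HTML snippet is made up, and the de-duplication step is only an assumption about how the "unique" option behaves: it keeps the first occurrence of each match, in order.

```python
import re

# Made-up HTML snippet for illustration.
html = """<form>
  <input type="text" name="user">
  <input name="pass" type="password">
  <input type="text" name="email">
  <input type="submit" value="Go">
</form>"""

# [^<]+ cannot cross into the next tag, so each match stays inside one <input ...>.
types = re.findall(r'<input[^<]+type="(\w+)"', html)
print(types)  # ['text', 'password', 'text', 'submit']

# Assumed behavior of "unique": drop repeats, keep first-seen order.
unique_types = list(dict.fromkeys(types))
print(unique_types)  # ['text', 'password', 'submit']
```

dict.fromkeys is used instead of set because dictionaries keep insertion order (Python 3.7+), so the results stay in document order.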

The three cases above are just a few examples from my daily usage, but they show that the service is simple yet powerful in these scenarios.

It also solves another problem: regex syntax varies a bit among grep, sed, and awk, and it's hard to get right if you don't write them often. With re.findall, there is a single regex syntax for everything: Python's.
