# An Online Python re.findall Service

As a programmer, I know that grep, sed and awk are powerful tools for processing text, but they aren't always straightforward for specific tasks, because I have to think about how to filter out the right lines and columns.

So I wondered whether there is a handier way to do these tasks.

After using these tools for a while, I came to think that using a regex directly can help, so I launched a re.findall service built on top of Python's re.findall API.
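For readers who haven't used it, here is a minimal sketch of how Python's `re.findall` treats capture groups, since the use cases rely on that behavior. The sample string is made up for illustration, and the snippet only shows the standard library call, not how the service itself is implemented.

```python
import re

# Made-up sample text for illustration.
text = "key1=alpha key2=beta"

# No capture group: each result is the whole match.
whole = re.findall(r"\w+=\w+", text)

# One capture group: each result is just that group's text.
keys = re.findall(r"(\w+)=", text)

# Several capture groups: each result is a tuple of the groups.
pairs = re.findall(r"(\w+)=(\w+)", text)

print(whole)  # ['key1=alpha', 'key2=beta']
print(keys)   # ['key1', 'key2']
print(pairs)  # [('key1', 'alpha'), ('key2', 'beta')]
```

So a regex without parentheses returns full matches, while adding one pair of parentheses returns only the captured part.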

Here are some use cases for it.

1. Find all words beginning with a specific prefix.

Imagine that I have a few paragraphs, and I want to find all words beginning with the letter 's'. I can do it in a shell session with the command `awk '{for(i=1; i<=NF; i++) {print $i}}' /tmp/paragraphs.txt | grep '^s'`, but that's a lot of typing. With this service, I can use one regex in just a few steps: first, copy the text into the left input box; then type the regex `\bs\w*` and click the button; finally, read the result in the right box.

2. Extract fields.

It's not uncommon to extract some fields from lines with a similar structure, such as a Protobuf message definition. Imagine that I need to write a few test cases for a Protobuf-based service, and I have this message (taken from the Protocol Buffers site) at hand:

    message Person {
      required string Name = 1;
      required int32 Id = 2;
      optional string Email = 3;
    }

The final test case that I want looks like this (note that the field names in the set_xxx form must be lower-case):

    Person person;
    person.set_name("Text Toolkit");
    person.set_id(1024);
    person.set_email("whatacold@gmail.com");

The steps are the same as in the previous use case. Copy the message definition into the left input box and type `(\w+) =` in the regex box. It gives the three field names as output, from which I can quickly complete the test case with Emacs' help. By contrast, I can also do it with `awk '/=/{print$3}' /tmp/person.proto`, which is not too complicated in this case, but still more typing.
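The two use cases so far can also be reproduced locally with Python's `re.findall`, which is a handy way to check a regex before pasting it into the service. This is only a sketch: the sample paragraph is made up, and turning field names into setter skeletons is my own illustration of the Emacs step, not something the service does.

```python
import re

# Use case 1: words beginning with "s" (made-up sample text).
paragraph = "She sells sea shells; the shore is sandy."
s_words = re.findall(r"\bs\w*", paragraph)
print(s_words)  # ['sells', 'sea', 'shells', 'shore', 'sandy']

# Use case 2: field names from the Protobuf message definition.
proto = """
message Person {
  required string Name = 1;
  required int32 Id = 2;
  optional string Email = 3;
}
"""
fields = re.findall(r"(\w+) =", proto)
print(fields)  # ['Name', 'Id', 'Email']

# Hypothetical follow-up: lower-case the names to build setter skeletons.
setters = [f"person.set_{name.lower()}();" for name in fields]
print("\n".join(setters))
```

Note that the match is case-sensitive, so "She" is not included in the first result.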

3. Find specific attributes in HTML/XML.

XML is a widely used configuration file format, and I sometimes need to find all values of a specific attribute in an XML file. HTML has a similar syntax, so I'll use an HTML example here. Suppose I need to figure out which types of input a specific HTML file uses. How can I do that?

With sed, I can run `sed -n 's:^.*<input.*type="\([^"]\+\)".*$:\1:p' /tmp/test.html`. As you can see, that is quite a complicated command, and I can barely get it right on the first try.

But with the re.findall service, I simply paste the HTML code, type the regex `<input[^<]+type="(\w+)"` in the box, and click the button. Want to deduplicate the result? Check the "unique" box and click the button again.
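Here is a rough local equivalent of this use case in Python. The HTML snippet is made up, and the order-preserving deduplication is only my assumption about what the "unique" option does; the service may behave differently.

```python
import re

# Made-up HTML sample for illustration.
html = """
<form>
  <input name="user" type="text">
  <input name="pass" type="password">
  <input name="nick" type="text">
  <input type="submit" value="Go">
</form>
"""

# The single capture group makes findall return only the attribute values.
types = re.findall(r'<input[^<]+type="(\w+)"', html)
print(types)  # ['text', 'password', 'text', 'submit']

# Assumed behavior of "unique": drop duplicates, keep first-seen order.
unique_types = list(dict.fromkeys(types))
print(unique_types)  # ['text', 'password', 'submit']
```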

These three cases are just a few examples from my daily usage; they show that the service is simple yet powerful in some scenarios.

Beyond that, it solves another problem: the regex syntax varies a bit among grep, sed, and awk, and it's hard to get right for anyone who doesn't write it often. With re.findall, there is one regex syntax for all tasks: Python's.