Regex

Writing for the first time. This is not meant to be serious, so here's a brief, very over-the-top overview of regular expressions.

What is regex?

Simply put, a regular expression (regex) is a short string of code that describes a pattern to a computer. You use it to tell the system exactly what pattern to search for.

Be it searching in a text document, a Word file, a PDF document or even scraping the internet, it doesn’t matter. You can look it up on Wikipedia for a more formal definition.
That said, regular expressions, once you have them at your fingertips, are one of your best friends as a programmer. Let’s dive into it.

Scenario: You get emailed a 357-page PDF file, and your boss has asked you to identify all the client addresses inside it.

Assumptions

We’ll assume the following. All addresses:

  1. Have no apartment or suite numbers.
  2. Are of the format: 555 JimCarrey St.
  3. Are on either a street or an avenue, and always end with a period.

Let’s do this

We'll build this regex piece by piece and then join all the blocks together at the end. The regular expression starts with:

\d{1,5}

What on planet earth is \d? Good question: it's how you match a single digit (0 to 9). Our assumption here is that the house number will NOT be longer than 5 digits, so we tell the computer to find between 1 and 5 digits in succession (by putting the lower limit 1 and the upper limit 5 inside { } curly braces, separated by a comma).
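If you want to see the quantifier in action, here is a minimal sketch using Python's built-in re module. The sample string is made up for illustration. Notice that a run longer than five digits is not rejected, it just gets cut into chunks of at most five.

```python
import re

# Hypothetical sample text, invented for this example.
sample = "Unit 7, box 12345, serial 9876543"

# \d{1,5} grabs runs of one to five digits, as many as it can each time.
# The 7-digit serial is therefore split into "98765" and "43".
digit_runs = re.findall(r"\d{1,5}", sample)
print(digit_runs)  # ['7', '12345', '98765', '43']
```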

Next comes whitespace:

\s

Hey! Watch this place for a space, you tell the computer. (Strictly speaking, \s matches any single whitespace character, so tabs and newlines count too.) Moving on...

\w+

And then you expect a string of text with multiple letters. \w looks for a single word character: a letter, a digit or an underscore. The + sign tells your computer to look for one or more of these. Now, we add another space. Remember how?

\s

And then we have another series of characters:

\w+
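To get a feel for what \w+ and \s actually pick up, here is a small Python sketch. The sample string is invented, and it deliberately includes an underscore, a tab and a hyphen to show where the matches start and stop.

```python
import re

# Hypothetical sample: contains a space, an underscore, a tab and a hyphen.
sample = "JimCarrey St_2\tx-y"

# \w+ matches runs of letters, digits and underscores; the hyphen breaks a run.
words = re.findall(r"\w+", sample)
print(words)  # ['JimCarrey', 'St_2', 'x', 'y']

# \s matches one whitespace character at a time: here a space and a tab.
gaps = re.findall(r"\s", sample)
print(gaps)  # [' ', '\t']
```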

And finally we’re gonna have a period, which is written as a backslash \ followed by a .

\.
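Why bother with the backslash? A quick Python sketch (with a made-up sample string) shows the difference: without the backslash, the period matches any single character except a newline, so it lets in far more than a literal period.

```python
import re

# Hypothetical sample with three different characters after "St".
sample = "St. St! Stx"

# Escaped: only a literal period may follow "St".
escaped = re.findall(r"St\.", sample)
print(escaped)  # ['St.']

# Unescaped: any single character (except a newline) may follow "St".
unescaped = re.findall(r"St.", sample)
print(unescaped)  # ['St.', 'St!', 'Stx']
```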

Why the backslash? On its own, the period . matches any single character (except a newline, by default), so we escape it to mean a literal period. Time we put it all together:

\d{1,5}\s\w+\s\w+\.

Hurray! As a safety measure, and to check that it's correct, let us plug it into a useful website (that you should totally bookmark) called regex101.

Its explanation panel breaks the pattern down token by token, which is exactly what we just covered.
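You can also run the same check locally. Below is a rough Python sketch, assuming the text has already been pulled out of the PDF into a plain string (that extraction step is not covered here). The sample text and the ADDRESS_PATTERN name are made up for illustration.

```python
import re

# The full pattern from above: 1-5 digits, whitespace, a word,
# whitespace, another word, then a literal period.
ADDRESS_PATTERN = re.compile(r"\d{1,5}\s\w+\s\w+\.")

# Hypothetical text standing in for what was extracted from the PDF.
text = (
    "Ship to 555 JimCarrey St. by Friday. "
    "The second client lives at 12 Sunset Ave. and prefers email. "
    "Invoice 2024 was sent."
)

addresses = ADDRESS_PATTERN.findall(text)
print(addresses)
# ['555 JimCarrey St.', '12 Sunset Ave.', '2024 was sent.']
```

Both addresses are found, but so is "2024 was sent.", because the pattern only knows "number, word, word, period" and has no idea what a street is. For a toy example that's fine; on a real document you would probably want to tighten the last word to something like St or Ave.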

Congratulations, you managed to skim through the world's most generic tutorial. To actually help you get started, I will list some good resources below. See you around.