thatshubham

Software Engineer

Coffee and Tea lover



Beginner's guide to REGEX

20/02/2020 | Reading time: 2 minutes


Writing for the first time. This is not meant to be serious; I aim to puke out a brief, very over-the-top overview of regular expressions.

What is REGEX

Simply put, it is a line of code that helps you define a pattern describing to the computer exactly what text you are looking for. Be it searching in a text document, a Word file, PDF documents or even scraping the internet, it doesn’t matter. You can check Wikipedia for a more formal definition. Regular expressions, when at your fingertips, are one of your best friends as a programmer. Let’s dive into it.

Scenario: You get emailed a 357 page PDF file. And your boss has asked you to identify all addresses of clients inside it.

Assumptions

We’ll assume the following:

> These addresses have no apartment or suite numbers.
> All addresses are of the format: 555 JimCarrey St.
> All the addresses are also either a street or an avenue and always end with a period.

 

Let’s do this

We’ll explore different parts of this regex and then club all the blocks together towards the end. The regular expression you will use to define this starts with:

\d{1,5}

What on planet earth is \d? Good question: it’s how you match a single digit (see regexone.com). Our assumption here is that the house number will NOT be longer than 5 digits, so we tell the computer to find between 1 and 5 digits in succession (by putting the lower limit 1 and the upper limit 5 inside { } curly braces).
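If you want to poke at this piece on its own, here is a minimal Python sketch using the standard re module. The sample strings and the name house_number are made up for illustration.

```python
import re

# \d matches one digit; {1,5} repeats it between 1 and 5 times.
house_number = re.compile(r"\d{1,5}")

# Sample strings are made up for illustration.
print(house_number.findall("555 JimCarrey St."))  # ['555']
print(house_number.findall("No digits here"))     # []

# A run longer than 5 digits is split into chunks, not rejected.
print(house_number.findall("1234567"))            # ['12345', '67']
```

The last line is worth noticing: the upper limit caps how much a single match can grab, but it does not stop a longer number from matching in pieces.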

Next comes whitespace:

\s

“Hey! Watch this place for a whitespace character,” you tell the computer.
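A quick sketch, again in Python, of what \s is willing to accept. It is a bit broader than just the space bar, which is usually fine for text pulled out of a document. The name whitespace is only illustrative.

```python
import re

whitespace = re.compile(r"\s")

# \s matches a space, but also a tab or a newline.
print(bool(whitespace.fullmatch(" ")))   # True
print(bool(whitespace.fullmatch("\t")))  # True
print(bool(whitespace.fullmatch("\n")))  # True
print(bool(whitespace.fullmatch("a")))   # False
```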

\w+

And then you expect a string of text characters with multiple letters. \w is used to look for an alphanumeric character (a letter, a digit, or an underscore), while the + sign tells it to look for one or more. Moving on, we’ll add another space.

\s

and then we have another series of characters.

\w+
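To see what \w+ actually picks up, here is a small Python sketch. The input strings are made up, and word is just an illustrative name.

```python
import re

word = re.compile(r"\w+")

# \w covers letters, digits and the underscore; + means "one or more".
print(word.findall("JimCarrey St."))  # ['JimCarrey', 'St']
print(word.findall("a_b c3"))         # ['a_b', 'c3']
```

Note that the period after St is left out: it is not a word character, so it needs its own piece of the pattern.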

And finally we’re gonna have a period which is denoted by a backslash followed by a .

\.
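Here is a small Python sketch of what that backslash changes. It compares an escaped and an unescaped period on made-up strings. The names escaped and unescaped are only for illustration.

```python
import re

escaped = re.compile(r"St\.")   # a literal period after "St"
unescaped = re.compile(r"St.")  # any single character after "St"

print(bool(escaped.search("JimCarrey St.")))    # True
print(bool(escaped.search("JimCarrey Stx")))    # False
print(bool(unescaped.search("JimCarrey Stx")))  # True
```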

Why the backslash? On its own, the period . matches any character (except a newline, by default), so we escape it to mean a literal period. Time we put it all together:

\d{1,5}\s\w+\s\w+\.
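If the one-liner looks like line noise, one option is to write the same blocks one per line with comments. This is a sketch in Python using the re.VERBOSE flag, which ignores whitespace and # comments inside the pattern. The name address and the comments about "street name" and "street type" are my reading of our assumptions, not something the pattern enforces.

```python
import re

address = re.compile(
    r"""
    \d{1,5}   # house number: 1 to 5 digits
    \s        # one whitespace character
    \w+       # street name, e.g. JimCarrey
    \s        # one whitespace character
    \w+       # street type, e.g. St or Ave
    \.        # a literal period
    """,
    re.VERBOSE,
)

compact = re.compile(r"\d{1,5}\s\w+\s\w+\.")

# Both versions accept the sample address from our assumptions.
print(bool(address.fullmatch("555 JimCarrey St.")))  # True
print(bool(compact.fullmatch("555 JimCarrey St.")))  # True
```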

Hurray! As a safety measure and to check if that’s correct, let us proceed to plug that into a useful website (that you should totally bookmark) called regex101.

[Screenshot: the pattern in regex101]

This block tells you exactly what we just covered.
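If you would rather check it from code than in the browser, here is a minimal Python sketch. The text variable is a made-up stand-in for whatever you extract from the PDF (the extraction step itself is not covered here), and ADDRESS is just an illustrative name.

```python
import re

ADDRESS = re.compile(r"\d{1,5}\s\w+\s\w+\.")

# Made-up sample text standing in for the text extracted from the PDF.
text = (
    "Ship to 555 JimCarrey St. by Friday. "
    "The second client lives at 12 Elm Ave. "
    "Call us for details."
)

print(ADDRESS.findall(text))  # ['555 JimCarrey St.', '12 Elm Ave.']
```

Keep in mind the pattern only knows about shapes, not meaning: any "number, word, word, period" sequence would match, so it leans heavily on the assumptions we made earlier.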

Congratulations, you managed to skim through the world’s most generic “tutorial” “blog post”. To actually help you get started, I will list some good resources. See you around.

http://www.regexr.com/
http://www.regular-expressions.info/