Analytics
Ask questions about filter set-up and issues with using filters in Google Analytics reports

RegEx to exclude variations on a word?

Visitor ✭ ✭ ✭
# 1

I need a RegEx that will match a word, but not that word inside another word. For instance, I need to look at all Page Titles that include "man", but I want to exclude words such as "roMAN", "penMANship", "sportsMAN", etc.

Re: RegEx to exclude variations on a word?

Participant ✭ ✭ ☆
# 2
Try this:
^man\s|\sman\s|\sman$
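A minimal Python sketch for trying this pattern outside GA with the standard `re` module. It assumes Python handles `^`, `$`, `\s`, and alternation the same way GA does for plain ASCII titles; that is an assumption, not something tested in GA here. The sample titles come from this thread.

```python
import re

# Tomasz_C's pattern: "man" at the start, in the middle, or at the end of
# the title, with whitespace on the side(s) that are not the string edge.
PATTERN = re.compile(r"^man\s|\sman\s|\sman$")

titles = [
    "The man in the moon",
    "The roman uses good penmanship because he is a good sportsman",
    "The-man-in-the-moon",
]

for title in titles:
    # search() looks for the pattern anywhere in the title, as a GA
    # "matches regex" condition would.
    print(bool(PATTERN.search(title)), title)
```

The hyphenated title does not match, because this pattern only accepts whitespace around "man".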

Re: RegEx to exclude variations on a word?

Explorer ✭ ☆ ☆
# 3

Tomasz_C is basically right on, but I would do it a bit differently:

(\b|_)man(\b|_)
I do it this way because I've seen a lot of page titles that don't use spaces, and it will also work for URLs.

So it will work for all of these:
The man in the moon
The-man-in-the-moon
The_man_in_the_moon

It will also capture hyphenated titles like this:
He decided it was time to man-up

And it won't pick this up:
The roman uses good penmanship because he is a good sportsman
The-roman-uses-good-penmanship-because-he-is-a-good-sportsman
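The same kind of check for `(\b|_)man(\b|_)` against the example titles above, as a Python sketch. It assumes GA's `\b` behaves like Python's for plain ASCII text (Python 3's `\b` is Unicode-aware by default, which makes no difference for these titles). The underscore alternative matters because `_` is itself a word character, so `\b` never fires between `_` and `m`.

```python
import re

# "man" bounded on each side by either a word boundary or an underscore.
PATTERN = re.compile(r"(\b|_)man(\b|_)")

should_match = [
    "The man in the moon",
    "The-man-in-the-moon",
    "The_man_in_the_moon",
    "He decided it was time to man-up",
]
should_not_match = [
    "The roman uses good penmanship because he is a good sportsman",
    "The-roman-uses-good-penmanship-because-he-is-a-good-sportsman",
]

for title in should_match + should_not_match:
    print(bool(PATTERN.search(title)), title)
```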

Re: RegEx to exclude variations on a word?

Participant ✭ ✭ ☆
# 4
Are you sure the "\b" syntax is supported in Google Analytics regular expressions?
I don't think it is.

Re: RegEx to exclude variations on a word?

Explorer ✭ ☆ ☆
# 5
Yup, I use it all the time. I did have to test it just now, though, since I use RegEx in a couple of applications and languages that have quirks in how they accept RegEx syntax. In any case, I tested it in GA with this:
\bgen\b

and got back this:
/content/articles/2015/12/lead-gen-general-membership
/content/articles/2015/12/lead-gen-technology-solution-provider
/content/articles/2015/12/lead-gen-general-membership-free-content
/content/articles/2015/09/connecting-with-your-next-customer-gen-z
/content/articles/2015/12/lead-gen-technology-solution-provider-thank-you
/content/articles/2015/12/lead-gen-general-membership-thank-you


I then took the boundary syntax off and got an additional 4 pages.
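A minimal Python sketch of the same with-and-without-boundary comparison. Two paths come from the report above; the third is a made-up path, labeled hypothetical in the code, added only to show a case where the unbounded pattern matches and the bounded one does not. The four extra pages GA returned are not known here and are not reproduced.

```python
import re

BOUNDED = re.compile(r"\bgen\b")  # "gen" as a whole word
UNBOUNDED = re.compile(r"gen")    # "gen" anywhere, e.g. inside "general"

paths = [
    # From the GA report above: "-gen-" is bounded by hyphens.
    "/content/articles/2015/12/lead-gen-general-membership",
    "/content/articles/2015/09/connecting-with-your-next-customer-gen-z",
    # Hypothetical path for illustration only: "gen" appears only inside "general".
    "/content/articles/general-updates",
]

for path in paths:
    print(bool(BOUNDED.search(path)), bool(UNBOUNDED.search(path)), path)
```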

Re: RegEx to exclude variations on a word?

Participant ✭ ✭ ☆
# 6
OK, you're right.
I checked this syntax with the word "nie", and in my Analytics data I got a weird result because of Polish letters.
The filter passed this title: "Rozszerzenie objaśnień" (roughly, "Expansion of explanations").

It seems that the letters "ś" and "ń" are treated as word boundaries :)

Re: RegEx to exclude variations on a word?

Explorer ✭ ☆ ☆
# 7
Yes. I believe any character outside the [A-Za-z0-9_] set acts as a word separator: even though letters like the ones you show are in your language's alphabet, they fall outside the ASCII range that GA's RegEx uses to define \w. Note that \b doesn't match a character itself. It's a zero-width assertion that matches the position between a \w character and a \W character, or the start or end of the string when the character next to it is a \w.
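A small Python sketch that makes the zero-width behavior of `\b` visible. `boundary_positions` is a hypothetical helper written for this example: it lists the string indices where `\b` matches, and `re.sub` inserts a marker at each one without removing anything. `re.ASCII` stands in for an ASCII-only engine; that GA behaves this way is an assumption based on the Polish-title result above.

```python
import re

def boundary_positions(text, flags=0):
    # \b consumes no characters, so each match is empty; its start()
    # is the index of the gap where a boundary sits.
    return [m.start() for m in re.finditer(r"\b", text, flags)]

print(boundary_positions("lead-gen"))       # [0, 4, 5, 8]
print(re.sub(r"\b", "|", "lead-gen"))       # |lead|-|gen|

word = "obja\u015bnie\u0144"                # "objaśnień"
print(boundary_positions(word, re.ASCII))  # [0, 4, 5, 8]
print(boundary_positions(word))            # [0, 9]
```

With ASCII-only `\w`, the gaps on either side of "ś" and "ń" become boundaries; with Unicode-aware `\w`, the whole word has boundaries only at its two ends.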