Data Studio
1.6K members online now
1.6K members online now
Ask questions about Data Studio
Guide Me
star_border
Reply

Regex - how to extract substring when delimiter is more than one character

Visitor ✭ ✭ ✭
# 1
Visitor ✭ ✭ ✭

This is a follow-up question from my earlier one about finding the nth substring in a delimited Campaign Name:

https://www.en.advertisercommunity.com/t5/Data-Studio/Regex-how-to-find-substring-at-nth-delimiter-s...

 

I have a Regex that works to find the nth substring of Campaign  when the delimiter is a single character (and there may or may not be a delimiter after my substring):

 

REGEXP_EXTRACT(Campaign, '^(?:[^-]*-){n-1}([^-]*)-*')  

 

Where "-" is my delimiter and n is the number of the substring I want.

 

Does anyone know how I would change this to work with a multi-character delimiter - e.g. " - "  (space-dash-space)?

 

I've tried replacing all of the '-'s with all of the following:

( - )    " - "  \Q - \E   "\Q - \E"  

but none seem to work - any ideas, please?

 

 

 

 

Regex - how to extract substring when delimiter is more than one character

Visitor ✭ ✭ ✭
# 2
Visitor ✭ ✭ ✭

Hi Jenny, 

 

I’m having a similar issue as you had above. I was wondering whether you found a solution for the “ - “ problem?

 

Thanks in advance.

Regex - how to extract substring when delimiter is more than one character

Visitor ✭ ✭ ✭
# 3
Visitor ✭ ✭ ✭

Hi Nicholas

 

Sadly not.  I am still keen to find a solution! 

 

In the meantime, I'm advising users to (where possible), do 1 of the following:

 

1. Set up campaign names with single character delimiters (i.e. remove the problem)

2. Ensure all campaign names have the same number of delimited segments.  This means that, for a campaign name with N segments:

     Segment 1 will have no leading space, but a trailing space

     Segments 2 to (N-1) will all have leading and trailing spaces

     Segment N will have a leading but not a trailing space

At least this will mean that the segments can be grouped successfully when used in a calculated field.  I think the leading spaces don't show in Data Studio, and as long as any formulae take account of the trailing space, it should be workable.

 

The problems come with historic data or (in an agency setting) when you don't have control over the naming of campaigns...

Regex - how to extract substring when delimiter is more than one character

Visitor ✭ ✭ ✭
# 4
Visitor ✭ ✭ ✭

Still looking for answers - can anyone help please?

Regex - how to extract substring when delimiter is more than one character

Visitor ✭ ✭ ✭
# 5
Visitor ✭ ✭ ✭

just add the space after and before the "-" (I'll replace the [space] for _ just for the sake of explanation 

REGEXP_EXTRACT(Campaign, '^(?:[^-]*- ){n-1}([^-]*) -*') ---> with _ instead of [Space] REGEXP_EXTRACT(Campaign, '^(?:[^-]*-_){n-1}([^-]*)_-*')