Data Studio
2.7K members online now
2.7K members online now
Ask questions about Data Studio
Guide Me
star_border
Reply

Regex - how to find substring at nth delimiter (structured campaigns)

Visitor ✭ ✭ ✭
# 1
Visitor ✭ ✭ ✭

This question follows on from Anthony A's question in February:

 

https://www.en.advertisercommunity.com/t5/Data-Studio/Regex-Help-Aggregating-Campaign-Data/m-p/93450...

 

I have structured campaign names, delimited with  ' - '.  Sometimes, I want to aggregate by Brand Name, which will be (say) the 3rd substring of the Campaign.

 

E.g. for the following 5 Campaigns, I'd like to aggregate by the Brand name:

SS - Brands - BrandX - Product_GroupA

SS - Brands - BrandX - Product_GroupB

SS - Brands - BrandY - Product_GroupA

SS - Brands - BrandY - Product_GroupB

SS - Brands - BrandZ - Product_GroupA

 

So that I have an aggregated total for BrandX, BrandY and BrandZ

 

Can I use Regex to extract the 3rd substring of the campaign name, specifying " - " as the delimiter?

Examples would be very helpful, please!

1 ACCEPTED SOLUTION

Accepted Solutions
Marked as Best Answer.
Solution
Accepted by topic author JennyJ
May 2017

Regex - how to find substring at nth delimiter (structured campaigns)

Follower ✭ ☆ ☆
# 4
Follower ✭ ☆ ☆

Hi Jenny,

You can try it using regexp_match function

 

REGEXP_EXTRACT(Campaign, '^(?:[^_]*_){n-1}([^_]*)_')

 

replace n with the position of the substring

 

Was struggling with the same and found a similar answer on stackoverflow

View solution in original post

Re: Regex - how to find substring at nth delimiter (structured campaigns)

Explorer ✭ ✭ ☆
# 2
Explorer ✭ ✭ ☆

Hi Jenny, you can't use REGEX to extract the value of your brand's campaign. 

You can use SUBSTR if all your brands have the same lenght, starting at the same position. 

 

If i take your example : 

SS - Brands - BrandX - Product_GroupA

Your brand start at the position 14, with 7 caracters of length

Then create a new field in your data source in data studio with the formula SUBSTR(Campaigns, 14, 7)

 

If your different brand don't have the same lenght, create first a field "NameBrand" using the REGEX to set all your brands with the same lenght. 

 

And after, a new field who will have the formula SUBSTR(Namebrand, 14, N)

N is the number of letters for all your brands. 

 

Jenny.PNG

Regex - how to find substring at nth delimiter (structured campaigns)

Visitor ✭ ✭ ✭
# 3
Visitor ✭ ✭ ✭

Hi Patrick,

 

Thanks very much for this.  Unfortunately, I think our brands will be wildly varying in length, and may have similar start strings, so I think truncating from (in our example) position 14 wouldn't be sufficient.

 

I've found this post: https://stackoverflow.com/questions/40581650/how-to-find-a-regex-in-re2-to-search-between-strings

This shows you how to extract a substring, by using REGEXREPLACE to create 3 capture groups (1st part of string with known structure,2nd part (the bit we want),3rd part of string with known structure) and to use the 2nd one as the replacement ...

My Regex knowledge isn't enough to work out how to do this, but would you know what expressions I would need for my 3 groups as follows, and how to set them up as capture groups within DataStudio?

1 - {Any number of characters ending in {delimiter}} - occurring N times (the number of delimiters before my substring)

2 - {any number of characters}

3 - {{delimiter} followed by any number of characters}

 

If I had this, I think I could use this 'substitute second capture group' method.

 

I feel I'm so near and yet so far!!

Marked as Best Answer.
Solution
Accepted by topic author JennyJ
May 2017

Regex - how to find substring at nth delimiter (structured campaigns)

Follower ✭ ☆ ☆
# 4
Follower ✭ ☆ ☆

Hi Jenny,

You can try it using regexp_match function

 

REGEXP_EXTRACT(Campaign, '^(?:[^_]*_){n-1}([^_]*)_')

 

replace n with the position of the substring

 

Was struggling with the same and found a similar answer on stackoverflow

Regex - how to find substring at nth delimiter (structured campaigns)

Visitor ✭ ✭ ✭
# 5
Visitor ✭ ✭ ✭

Thanks, Nabeelah, that's brilliant.

 

I'm teaching myself Regex at the moment, and I can see that the '_' in this text is the delimiter, and it reads:

 

‘     (non-capturing group [Character class: not {delimiter} ] {0 or more repetitions}  then a single {delimiter} ) {n-1 times}
(Capturing Group [Character class: not {delimiter} ] {0 or more characters}
followed by  {delimiter}  ‘

 

I've tested it on some real campaigns and have discovered that, in some cases, there is no substring AFTER the substring I'm after, so there's no trailing delimiter. (I know, it would be better if they were all the same structure...)

 

Would putting a * after the final {delimiter} give me what I'm after?  

i.e. REGEXP_EXTRACT(Campaign, '^(?:[^_]*_){n-1}([^_]*)_*')  

(where _ is the delimiter and n is the number of the substring I want to extract)?

 

 

 

 

 

Regex - how to find substring at nth delimiter (structured campaigns)

Visitor ✭ ✭ ✭
# 6
Visitor ✭ ✭ ✭

Nabeelah, I have just answered my own question - putting * after the last delimiter DOES allow me to extract the nth substring, with or without trailing substrings.

 

Thank you so much for your help - I will mark your original answer as Solution, as it answered my original request, but I guess this is helpful for others.

 

Regex - how to find substring at nth delimiter (structured campaigns)

Follower ✭ ☆ ☆
# 7
Follower ✭ ☆ ☆

That's amazing. there's a shorter way to achieve the same. Should be posting that in some time Smiley Happy

Regex - how to find substring at nth delimiter (structured campaigns)

[ Edited ]
Visitor ✭ ✭ ✭
# 8
Visitor ✭ ✭ ✭

Sounds interesting, Nabeelah.  

I'm wondering if you could also help with my (follow-on) question, posted here, to cope with multi-character delimiters?: