Data Studio
2.1K members online now
2.1K members online now
Ask questions about Data Studio
Guide Me
star_border
Reply

Problems with expressions - case study (LONG)

Follower ✭ ✭ ✭
# 1
Follower ✭ ✭ ✭

This is a rather long complaint, but although I finally managed to get what I wanted, it took about 20 times as long as I expected, and I want the Google team to see how difficult it is to do even reasonably simple work on your data. So, lets take it from the start.

 

In one of my data sets, I have data split by month, and I want to calculate the daily average (which will depend on the number of days in each month). To do so, I wanted to create a calculated field giving me the number of days in each month, and then divide my data by that field. Simple, eh? No.

 

First of all, the time dimension I use is ga:yearMonth. When I look at the data in Google Sheets, this is shown as "201601" for January, etc. Since I only wanted the month number, I tried to apply MONTH(ga:yearMonth,DEFAULT_DECIMAL). Result: INVALID FORMULA.. even though "Year Month" is recognized as a Date format in the TYPE field, the Month function refuses to consider it.

 

At first, I tried to convert ga:yearMonth to TEXT by using TRIM(), and then set it back to DATE type. This appeared to work somewhat; I got a number which seemed to represent the month number, except that "January 2016" became "12, "February 2016", became "1", etc. 

 

Then, after considerable trial and error, I tried to use REGEXP_MATCH to math these numbers to list of month numbers. The result was three fields (for 31 days, 30 days, or 28 days), which returned "true" if the month value corresponded to one of the listed months, false otherwise. So, for "31-days", the value was true for Jan, Mar, May, Jul, Aug, Oct, Dec; etc. These boolean values were possible to show in a table, with the expectd values.

 

However, no matter what I did, using these boolean values in a CASE expression proved completely impossible. The only explanation I can assume is that, although appearing to display correctly and appearing to be booleans, some kind of "history" of the conversions applied might have remained, screwing up the use of these values in a logical comparison. At this stage, I was about ready to give up, having already spent over 3 hours on a "simple" thing.

 

However, going back to the drawing board, I decided to test a more low-level approach.

So, I converted the original date to a text by using TRIM(ga:yearMonth), and then I could use "REGEXP_REPLACE(TRIM(ga:yearMonth), '2016', '') to get a string "01" for January, etc. Lets call this field "Month-number".

 

To get the number of days corresponding to a certain month-number, I then create the expression:

CASE

WHEN REGEXP_MATCH(Month-number, "01|03|05|07|08||12") THEN 31

WHEN REGEXP_MATCH(Month-number, "04|06|09|11") THEN 30

WHEN REGEXP_MATCH(Month-number, "10") THEN 10

ELSE 28

END

(the special treatment of “10” is to get the extrapolated values for the current month [October]. Unfortunately, that line must be edited each day, or replaced with something more complex; I will look further into that later)

 

I place that expression into a new field "Days". So far so good... However, when I try to use "Days" to get daily averages, I get the error "you can use a dimension or a metric but not both". Sigh... However, if I set the "Aggregation" to "Average" instead on "None" (I have no idea what this really means in my case), the field will become a metric instead of a dimension, and can now be used as I intended. However, when I later go back, the field "Days" has turned back into a metric (green) but the expression still seem to work. Huh?

 

Anyway, this is a rather round-about way to do it all. So, why not remove some of the intermediate steps? Should work, or? It does not... Taking the expression in "Month-number" and putting this directly into the CASE statement, will not be accepted.

 

So, OK, after more than half a day of trial and error, I finally managed to work around the limitations in the formats, the almost total absence of what goes wrong when an expression will not be accepted, the fact that things which work in one context - or appear to do so - will not work in another because of some "hidden" condition. This simple request - getting the daily averages of my data - should have only taken 15 minutes, at most, even if this had not been specifically prepared but programmed from scratch, as done here.

 

I believe the whole scripting functionality should be re-assessed and cleaned up, and possibly re-implemented. There is just too much that makes scripting a real hassle and waste of time to use right now.

 

And, while the team is at it, PLEASE fix the script editor! It only displays about 30 characters, and only on a single line, the scrolling has a life of its own and the cursor often ends up somewhere else, but most of all, it is terribly slow especially after it has run for some time. Add to that, that when faced with "return" or "newline" from another editor, it just removes it without adding another delimiter such as space; et c.

 / kjell