Data Studio
2.3K members online now
2.3K members online now
Ask questions about Data Studio
Guide Me
star_border
Reply

Data Studio COUNT_DISTINCT reporting higher monthly totals than BigQuery EXACT_COUNT_DISTINCT

Visitor ✭ ✭ ✭
# 1
Visitor ✭ ✭ ✭

I'm seeing discrepancies of distinct totals between Data Studio and BigQuery.  For example, 

 

SELECT EXACT_COUNT_DISTINCT(user_id) AS distinct_users_201703
FROM [project:dataset.table]
WHERE date BETWEEN '20170301' AND '20170331'

 

returns 1,240,920 but creating a custom metric in Data Studio using COUNT_DISTINCT(user_id) returns 1,255,326.  Is this a known issue or is there something I'm missing?

 

Thanks in advance,

Rodney

Data Studio COUNT_DISTINCT reporting higher monthly totals than BigQuery EXACT_COUNT_DISTINCT

Explorer ✭ ✭ ✭
# 2
Explorer ✭ ✭ ✭

Hi Rodney,

 

What result do you get if you run your query in DataStudio? I'm wondering if your GA data is getting sampled in DS?

 

Ben

Data Studio COUNT_DISTINCT reporting higher monthly totals than BigQuery EXACT_COUNT_DISTINCT

Visitor ✭ ✭ ✭
# 3
Visitor ✭ ✭ ✭

Hi Ben,

 

The data is coming directly from a BigQuery table, not GA.  It's a combination of both GA data and offline data managed by the client's back-end.  The client is 360 premium and therefore shouldn't be sampled.  I created a BigQuery Custom Query data source using the query above in Data Studio which returned the 1,240,920 result.  Using the Data Studio custom metric shown above and filtering via the built-in date range control for March 1st to March 31st returned 1,255,326.

 

Thanks again for any assistance you're able to provide.

 

Best,

Rodney

Data Studio COUNT_DISTINCT reporting higher monthly totals than BigQuery EXACT_COUNT_DISTINCT

Visitor ✭ ✭ ✭
# 4
Visitor ✭ ✭ ✭

For reference, the 'date' field is in YYYYMMDD format.  It is not a timestamp and does not get into HHMMSS levels of specificity.

Data Studio COUNT_DISTINCT reporting higher monthly totals than BigQuery EXACT_COUNT_DISTINCT

Visitor ✭ ✭ ✭
# 5
Visitor ✭ ✭ ✭

I encounter the same problem. When I use COUNT_DISTINCT on a BigQuery Datasource, it returns the same number as the COUNT function.

 

However, when I apply COUNT_DISTINCT on a google-sheet datasource, COUNT_DISTINCT returns the correct cound.

Data Studio COUNT_DISTINCT reporting higher monthly totals than BigQuery EXACT_COUNT_DISTINCT

Visitor ✭ ✭ ✭
# 6
Visitor ✭ ✭ ✭

Also consider the newest notes on "COUNT DISTINCT" STANDARD SQL on https://stackoverflow.com/questions/45840341/firebase-bigquery-monthly-event-counts/45850267#4585026...

 

Regards

Dirk

 

 

Data Studio COUNT_DISTINCT reporting higher monthly totals than BigQuery EXACT_COUNT_DISTINCT

Visitor ✭ ✭ ✭
# 7
Visitor ✭ ✭ ✭

Same problem here!

Re: Data Studio COUNT_DISTINCT reporting higher monthly totals than BigQuery EXACT_COUNT_DISTINCT

Visitor ✭ ✭ ✭
# 8
Visitor ✭ ✭ ✭

I'm also seeing an issue in varying distinct counts across Data Studio and Big Query. 

 

For example:

 

SELECT
COUNT_DISTINCT(device_id) as distinct_device_ids
FROM my-dataset.my-table

 

returns 14591

 

 

However, using COUNT_DISTINCT(device_id) on the same table (when creating a calculated field) in Data Studio returns 15042.  

Any idea why there might be a discrepancy?