Google Analytics 4 (GA4)
This page contains the setup guide and reference information for the Google Analytics 4 source connector.
Google Analytics 4 (GA4) is the latest version of Google Analytics, introduced in 2020. It offers a new data model that emphasizes events and user properties, rather than pageviews and sessions. This updated model allows for more flexibility and customization in reporting, and provides more accurate measurement of user behavior across various devices and platforms.
The Google Analytics Universal Analytics (UA) connector utilizes the older version of Google Analytics, which was the standard for tracking website and app user behavior before the introduction of GA4. Please note that the UA connector is being deprecated in favor of this one. As of July 1, 2023, standard Universal Analytics properties no longer process hits. For further reading on the transition from UA to GA4, refer to Google's official support page.
Prerequisites
- A Google Analytics account with access to the GA4 property you want to sync
Setup guide
For Airbyte Cloud
For Airbyte Cloud users, we highly recommend using OAuth for authentication, as this significantly simplifies the setup process by allowing you to authenticate your Google Analytics account directly in the Airbyte UI. Please follow the steps below to set up the connector using this method.
Log in to your Airbyte Cloud account.
In the left navigation bar, click Sources. In the top-right corner, click + New source.
Find and select Google Analytics 4 (GA4) from the list of available sources.
In the Source name field, enter a name to help you identify this source.
Select Authenticate via Google (Oauth) from the dropdown menu and click Authenticate your Google Analytics 4 (GA4) account. This will open a pop-up window where you can log in to your Google account and grant Airbyte access to your Google Analytics account.
Enter the Property ID whose events are tracked. This ID should be a numeric value, such as 123456789. If you are unsure where to find this value, refer to Google's documentation.
Note: If the Property Settings shows a "Tracking Id" such as "UA-123...-1", this denotes that the property is a Universal Analytics property, and the Analytics data for that property cannot be reported on using this connector. You can create a new Google Analytics 4 property by following these instructions.
(Optional) In the Start Date field, use the provided datepicker or enter a date programmatically in the format YYYY-MM-DD. All data added from this date onward will be replicated. Note that this setting is not applied to custom Cohort reports.
(Optional) In the Custom Reports field, you may provide a JSON array describing any custom reports you want to sync from Google Analytics. See the Custom Reports section below for more information on formulating these reports.
(Optional) In the Data Request Interval (Days) field, you can specify the interval in days (ranging from 1 to 364) used when requesting data from the Google Analytics API. The bigger this value is, the faster the sync will be, but the more likely that sampling will be applied to your data, potentially causing inaccuracies in the returned results. We recommend setting this to 1 unless you have a hard requirement to make the sync faster at the expense of accuracy. This field does not apply to custom Cohort reports. See the Data Sampling section below for more context on this field.
It's important to consider how dimensions like month or yearMonth are specified. These dimensions organize the data according to your preferences.
However, keep in mind that the data presentation is also influenced by the date range chosen for the report. When the date range is very narrow, such as a single day (Data Request Interval (Days) set to 1), duplicated entries for each day might appear.
To mitigate this for reports that use such dimensions, we recommend setting the Data Request Interval (Days) value to 364. Doing so yields more precise results and prevents duplicated data.
- Click Set up source and wait for the tests to complete.
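The field constraints described in the steps above (a numeric Property ID, a YYYY-MM-DD start date, and an interval between 1 and 364 days) can be checked before you paste values into the UI. The following is a minimal sketch, not the connector's own validation; the function name `validate_ga4_config` and its message strings are hypothetical.

```python
import re
from datetime import date
from typing import List, Optional


def validate_ga4_config(
    property_id: str,
    start_date: Optional[str] = None,
    interval_days: int = 1,
) -> List[str]:
    """Return a list of problems found in the given values (empty if none)."""
    problems = []

    # A "UA-..." value is a Universal Analytics Tracking Id, not a GA4 Property ID.
    if property_id.upper().startswith("UA-"):
        problems.append("property_id looks like a Universal Analytics Tracking Id")
    elif not re.fullmatch(r"[0-9]+", property_id):
        problems.append("property_id should be numeric, such as 123456789")

    if start_date is not None:
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", start_date):
            problems.append("start_date should use the format YYYY-MM-DD")
        else:
            try:
                date.fromisoformat(start_date)
            except ValueError:
                problems.append("start_date is not a real calendar date")

    if not 1 <= interval_days <= 364:
        problems.append("interval_days should be between 1 and 364")

    return problems


print(validate_ga4_config("UA-12345-1", "2023/01/01", 400))
```

An empty list means the values satisfy the documented constraints; it does not confirm that the property exists or that your account can access it.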
For Airbyte Open Source
For Airbyte Open Source users, the recommended way to set up the Google Analytics 4 connector is to create a Service Account and set up a JSON key file for authentication. Please follow the steps below to set up the connector using this method.
Create a Service Account for authentication
- Sign in to the Google Account you are using for Google Analytics as an admin.
- Go to the Service Accounts page in the Google Developers console.
- Select the project you want to use (or create a new one) and click Continue.
- Click + Create Service Account at the top of the page.
- Enter a name for the service account, and optionally, a description. Click Create and Continue.
- Choose the role for the service account. We recommend the Viewer role (Read & Analyze permissions). Click Continue.
- Select your new service account from the list, and open the Keys tab. Click Add Key > Create New Key.
- Select JSON as the Key type. This will generate and download the JSON key file that you'll use for authentication. Click Continue.
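Before pasting the downloaded key into Airbyte, it can help to confirm that the file is a service account key and not some other credential type. The sketch below assumes the usual fields of a Google service account key file (`type`, `client_email`, `private_key`); the helper name `check_service_account_key` is hypothetical and the check is illustrative only.

```python
import json
from typing import List

# Fields this sketch assumes are present in a service account key file.
REQUIRED_KEY_FIELDS = ("type", "client_email", "private_key")


def check_service_account_key(raw_json: str) -> List[str]:
    """Return a list of problems found in the key file contents."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["expected a JSON object"]

    problems = [
        f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)
    ]
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type}")
    return problems


sample = json.dumps({"type": "authorized_user"})
print(check_service_account_key(sample))
```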
Enable the Google Analytics APIs
Before you can use the service account to access Google Analytics data, you need to enable the required APIs:
- Go to the Google Analytics Reporting API dashboard. Make sure you have selected the associated project for your service account, and enable the API. You can also set quotas and check usage.
- Go to the Google Analytics API dashboard. Make sure you have selected the associated project for your service account, and enable the API.
Set up the Google Analytics connector in Airbyte
Navigate to the Airbyte Open Source dashboard.
In the left navigation bar, click Sources. In the top-right corner, click + New source.
Find and select Google Analytics 4 (GA4) from the list of available sources.
Select Service Account Key Authentication from the dropdown list and enter the Service Account JSON Key from Step 1.
Enter the Property ID whose events are tracked. This ID should be a numeric value, such as 123456789. If you are unsure where to find this value, refer to Google's documentation.
Note: If the Property Settings shows a "Tracking Id" such as "UA-123...-1", this denotes that the property is a Universal Analytics property, and the Analytics data for that property cannot be reported on in the Data API. You can create a new Google Analytics 4 property by following these instructions.
(Optional) In the Start Date field, use the provided datepicker or enter a date programmatically in the format YYYY-MM-DD. All data added from this date onward will be replicated. Note that this setting is not applied to custom Cohort reports.
If the start date is not provided, the default value will be used, which is two years before the initial sync.
Many analyses and data investigations may require 24-48 hours to process information from your website or app. To ensure the accuracy of the data, we subtract two days from the starting date. For more details, please refer to Google's documentation.
- (Optional) In the Custom Reports field, you may optionally provide a JSON array describing any custom reports you want to sync from Google Analytics. See the Custom Reports section below for more information on formulating these reports.
- (Optional) In the Data Request Interval (Days) field, you can specify the interval in days (ranging from 1 to 364) used when requesting data from the Google Analytics API. The bigger this value is, the faster the sync will be, but the more likely that sampling will be applied to your data, potentially causing inaccuracies in the returned results. We recommend setting this to 1 unless you have a hard requirement to make the sync faster at the expense of accuracy. This field does not apply to custom Cohort reports. See the Data Sampling section below for more context on this field.
It's important to consider how dimensions like month or yearMonth are specified. These dimensions organize the data according to your preferences.
However, keep in mind that the data presentation is also influenced by the date range chosen for the report. When the date range is very narrow, such as a single day (Data Request Interval (Days) set to 1), duplicated entries for each day might appear.
To mitigate this for reports that use such dimensions, we recommend setting the Data Request Interval (Days) value to 364. Doing so yields more precise results and prevents duplicated data.
- Click Set up source and wait for the tests to complete.
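The Start Date behavior described in the steps above can be illustrated with a little date arithmetic. This is a sketch under two assumptions that the text does not spell out: that the default is two years before the date of the initial sync, and that the two-day subtraction applies to the default as well as to a configured date. The function name `effective_start_date` is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def effective_start_date(configured: Optional[date], first_sync: date) -> date:
    """Compute the date from which data would be requested."""
    if configured is None:
        # Assumed default: two years before the initial sync.
        try:
            configured = first_sync.replace(year=first_sync.year - 2)
        except ValueError:
            # first_sync is Feb 29 and the target year is not a leap year.
            configured = first_sync.replace(year=first_sync.year - 2, day=28)
    # Two days are subtracted to allow for the 24-48 hour processing delay.
    return configured - timedelta(days=2)


print(effective_start_date(date(2023, 9, 15), date(2023, 10, 1)))
print(effective_start_date(None, date(2023, 10, 1)))
```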
Supported sync modes
The Google Analytics source connector supports the following sync modes:
Supported streams
This connector outputs the following incremental streams:
- Preconfigured streams:
- daily_active_users
- devices
- four_weekly_active_users
- locations
- pages
- traffic_sources
- website_overview
- weekly_active_users
- user_acquisition_first_user_medium_report
- user_acquisition_first_user_source_report
- user_acquisition_first_user_source_medium_report
- user_acquisition_first_user_source_platform_report
- user_acquisition_first_user_campaign_report
- user_acquisition_first_user_google_ads_ad_network_type_report
- user_acquisition_first_user_google_ads_ad_group_name_report
- traffic_acquisition_session_source_medium_report
- traffic_acquisition_session_medium_report
- traffic_acquisition_session_source_report
- traffic_acquisition_session_campaign_report
- traffic_acquisition_session_default_channel_grouping_report
- traffic_acquisition_session_source_platform_report
- events_report
- weekly_events_report
- conversions_report
- pages_title_and_screen_class_report
- pages_path_report
- pages_title_and_screen_name_report
- content_group_report
- ecommerce_purchases_item_name_report
- ecommerce_purchases_item_id_report
- ecommerce_purchases_item_category_report_combined
- ecommerce_purchases_item_category_report
- ecommerce_purchases_item_category_2_report
- ecommerce_purchases_item_category_3_report
- ecommerce_purchases_item_category_4_report
- ecommerce_purchases_item_category_5_report
- ecommerce_purchases_item_brand_report
- publisher_ads_ad_unit_report
- publisher_ads_page_path_report
- publisher_ads_ad_format_report
- publisher_ads_ad_source_report
- demographic_country_report
- demographic_region_report
- demographic_city_report
- demographic_language_report
- demographic_age_report
- demographic_gender_report
- demographic_interest_report
- tech_browser_report
- tech_device_category_report
- tech_device_model_report
- tech_screen_resolution_report
- tech_app_version_report
- tech_platform_report
- tech_platform_device_category_report
- tech_operating_system_report
- tech_os_with_version_report
- Custom stream(s)
Connector-specific features
Custom Reports
Custom reports in Google Analytics allow for flexibility in querying specific data tailored to your needs. You can define the following components:
- Name: The name of the custom report.
- Dimensions: An array of categories for data, such as city, user type, etc.
- Metrics: An array of quantitative measurements, such as active users, page views, etc.
- CohortSpec: (Optional) An object containing specific cohort analysis settings, such as cohort size and date range. More information on this object can be found in the GA4 documentation.
- Pivots: (Optional) An array of pivot tables for data, such as page views by city, etc. More information on pivots can be found in the GA4 documentation.
A full list of dimensions and metrics supported in the API can be found here. To ensure your dimensions and metrics are compatible for your GA4 property, you can use the GA4 Dimensions & Metrics Explorer.
Custom reports should be constructed as an array of JSON objects in the following format:
[
  {
    "name": "<report-name>",
    "dimensions": ["<dimension-name>", ...],
    "metrics": ["<metric-name>", ...],
    "cohortSpec": {/* cohortSpec object */},
    "pivots": [{/* pivot object */}, ...]
  }
]
The following is an example of a basic User Engagement report to track sessions and bounce rate, segmented by city:
[
  {
    "name": "User Engagement Report",
    "dimensions": ["city"],
    "metrics": ["sessions", "bounceRate"]
  }
]
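A quick structural check can catch malformed JSON before the value is pasted into the Custom Reports field. The sketch below assumes that name, dimensions, and metrics are required (they are the components not marked optional above) and does not verify that the dimension and metric names exist in GA4; the function name `validate_custom_reports` is hypothetical.

```python
import json
from typing import List


def validate_custom_reports(raw: str) -> List[str]:
    """Return a list of structural problems in a custom reports JSON string."""
    try:
        reports = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(reports, list):
        return ["custom reports must be a JSON array"]

    problems = []
    for index, report in enumerate(reports):
        if not isinstance(report, dict):
            problems.append(f"report {index}: must be a JSON object")
            continue
        name = report.get("name")
        if not isinstance(name, str) or not name:
            problems.append(f"report {index}: 'name' must be a non-empty string")
        for key in ("dimensions", "metrics"):
            value = report.get(key)
            is_string_list = (
                isinstance(value, list)
                and len(value) > 0
                and all(isinstance(item, str) for item in value)
            )
            if not is_string_list:
                problems.append(f"report {index}: '{key}' must be a non-empty array of strings")
    return problems


example = """
[
  {
    "name": "User Engagement Report",
    "dimensions": ["city"],
    "metrics": ["sessions", "bounceRate"]
  }
]
"""
print(validate_custom_reports(example))
```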
By specifying a cohort with a 7-day range and pivoting on the city dimension, the report can be further tailored to offer a detailed view of engagement trends within the top 50 cities for the specified date range.
[
  {
    "name": "User Engagement Report",
    "dimensions": ["city"],
    "metrics": ["sessions", "bounceRate"],
    "cohortSpec": {
      "cohorts": [
        {
          "name": "Last 7 Days",
          "dateRange": {
            "startDate": "2023-07-27",
            "endDate": "2023-08-03"
          }
        }
      ],
      "cohortReportSettings": {
        "accumulate": true
      }
    },
    "pivots": [
      {
        "fieldNames": ["city"],
        "limit": 50,
        "metricAggregations": ["TOTAL"]
      }
    ]
  }
]
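Because the cohort date range in the example above is hard-coded, you may want to generate the JSON for a given day instead of editing dates by hand. The following sketch rebuilds the same report with the dates computed from a reference date; the function name `last_n_days_cohort_report` is hypothetical, and the field values simply mirror the example.

```python
import json
from datetime import date, timedelta


def last_n_days_cohort_report(today: date, days: int = 7) -> dict:
    """Build the example cohort report with a date range ending on `today`."""
    start = today - timedelta(days=days)
    return {
        "name": "User Engagement Report",
        "dimensions": ["city"],
        "metrics": ["sessions", "bounceRate"],
        "cohortSpec": {
            "cohorts": [
                {
                    "name": f"Last {days} Days",
                    "dateRange": {
                        "startDate": start.isoformat(),
                        "endDate": today.isoformat(),
                    },
                }
            ],
            "cohortReportSettings": {"accumulate": True},
        },
        "pivots": [
            {"fieldNames": ["city"], "limit": 50, "metricAggregations": ["TOTAL"]}
        ],
    }


# The Custom Reports field expects a JSON array, so wrap the report in a list.
print(json.dumps([last_n_days_cohort_report(date(2023, 8, 3))], indent=2))
```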
Data Sampling and Data Request Intervals
Data sampling in Google Analytics 4 refers to the process of estimating analytics data when the amount of data in an account exceeds Google's predefined compute thresholds. To mitigate the chances of data sampling being applied to the results, the Data Request Interval field allows users to specify the interval used when requesting data from the Google Analytics API.
By setting the interval to 1 day, users can reduce the data processed per request, minimizing the likelihood of data sampling and ensuring more accurate results. While larger time intervals (up to 364 days) can speed up the sync, we recommend choosing a smaller value to prioritize data accuracy unless there is a specific need for faster synchronization at the expense of some potential inaccuracies. Please note that this field does not apply to custom Cohort reports.
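To make the trade-off concrete, the sketch below splits a date range into consecutive request windows of a given length: an interval of 1 produces one request per day, while a larger interval produces fewer, wider requests. It is an illustration only, not the connector's slicing code; the function name `date_slices` is hypothetical, and treating an interval of N as N inclusive days is an assumption.

```python
from datetime import date, timedelta
from typing import List, Tuple


def date_slices(start: date, end: date, interval_days: int) -> List[Tuple[date, date]]:
    """Split [start, end] into inclusive windows of at most interval_days days."""
    if not 1 <= interval_days <= 364:
        raise ValueError("interval_days should be between 1 and 364")
    slices = []
    cursor = start
    while cursor <= end:
        slice_end = min(cursor + timedelta(days=interval_days - 1), end)
        slices.append((cursor, slice_end))
        cursor = slice_end + timedelta(days=1)
    return slices


# Ten days of data: 10 requests at an interval of 1, 2 requests at an interval of 7.
print(len(date_slices(date(2023, 1, 1), date(2023, 1, 10), 1)))
print(date_slices(date(2023, 1, 1), date(2023, 1, 10), 7))
```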
Refer to the Google Analytics documentation for more information on data sampling.
Performance Considerations
The Google Analytics connector is subject to Google Analytics Data API quotas. Please refer to Google's documentation for specific breakdowns on these quotas.
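When a quota is exhausted, a common client-side pattern is to wait and retry with increasing delays. The sketch below shows that general pattern with capped exponential backoff; it is not the connector's retry logic, and the names `QuotaExceeded`, `backoff_delays`, and `call_with_retry` are hypothetical stand-ins for whatever error and request function your own code uses.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical error raised by a request function when a quota is hit."""


def backoff_delays(max_retries: int, base_seconds: float = 1.0, cap_seconds: float = 60.0) -> List[float]:
    """Delays that double on each retry, capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retry(
    request: Callable[[], T],
    delays: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), sleeping and retrying after each QuotaExceeded."""
    for delay in delays:
        try:
            return request()
        except QuotaExceeded:
            sleep(delay)
    # Final attempt: let any remaining error propagate to the caller.
    return request()


print(backoff_delays(5))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.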
Data type map
Integration Type | Airbyte Type |
---|---|
string | string |
number | number |
array | array |
object | object |
Build instructions
Build your own connector image
This connector is built using our dynamic build process.
The base image used to build it is defined within the metadata.yaml file under the connectorBuildOptions.
The build logic is defined using Dagger here.
It does not rely on a Dockerfile.
If you would like to patch our connector and build your own, a simple approach would be:
- Create your own Dockerfile based on the latest version of the connector image.
FROM airbyte/source-google-analytics-data-api:latest
COPY . ./airbyte/integration_code
RUN pip install ./airbyte/integration_code
# The entrypoint and default env vars are already set in the base image
# ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
# ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]
Please use this as an example. This is not optimized.
- Build your image:
docker build -t airbyte/source-google-analytics-data-api:dev .
# Running the spec command against your patched connector
docker run airbyte/source-google-analytics-data-api:dev spec
Customizing our build process
When contributing to our connector you might need to customize the build process to add a system dependency or set an env var.
You can customize our build process by adding a build_customization.py module to your connector.
This module should contain a pre_connector_install and post_connector_install async function that will mutate the base image and the connector container respectively.
It will be imported at runtime by our build process and the functions will be called if they exist.
Here is an example of a build_customization.py module:
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Feel free to check the dagger documentation for more information on the Container object and its methods.
    # https://dagger-io.readthedocs.io/en/sdk-python-v0.6.4/
    from dagger import Container


async def pre_connector_install(base_image_container: Container) -> Container:
    return await base_image_container.with_env_variable("MY_PRE_BUILD_ENV_VAR", "my_pre_build_env_var_value")


async def post_connector_install(connector_container: Container) -> Container:
    return await connector_container.with_env_variable("MY_POST_BUILD_ENV_VAR", "my_post_build_env_var_value")
Changelog
Version | Date | Pull Request | Subject |
---|---|---|---|
2.0.1 | 2023-10-13 | 31377 | Use our base image and remove Dockerfile |
2.0.0 | 2023-09-29 | 30930 | Use distinct stream naming in case there are multiple properties in the config. |
1.6.0 | 2023-09-19 | 30460 | Migrated custom reports from string to array; add FilterExpressions support |
1.5.1 | 2023-09-20 | 30608 | Revert : auto replacement name to underscore |
1.5.0 | 2023-09-18 | 30421 | Add yearWeek, yearMonth, year dimensions cursor |
1.4.1 | 2023-09-17 | 30506 | Fix None type error when metrics or dimensions response does not have name |
1.4.0 | 2023-09-15 | 30417 | Change start date to optional; add suggested streams and update errors handling |
1.3.1 | 2023-09-14 | 30424 | Fixed duplicated stream issue |
1.2.0 | 2023-09-11 | 30290 | Add new preconfigured reports |
1.1.3 | 2023-08-04 | 29103 | Update input field descriptions |
1.1.2 | 2023-07-03 | 27909 | Limit the page size of custom report streams |
1.1.1 | 2023-06-26 | 27718 | Limit the page size when calling check() |
1.1.0 | 2023-06-26 | 27738 | License Update: Elv2 |
1.0.0 | 2023-06-22 | 26283 | Added primary_key and lookback window |
0.2.7 | 2023-06-21 | 27531 | Fix formatting |
0.2.6 | 2023-06-09 | 27207 | Improve api rate limit messages |
0.2.5 | 2023-06-08 | 27175 | Improve Error Messages |
0.2.4 | 2023-06-01 | 26887 | Remove authSpecification from connector spec in favour of advancedAuth |
0.2.3 | 2023-05-16 | 26126 | Fix pagination |
0.2.2 | 2023-05-12 | 25987 | Categorized Config Errors Accurately |
0.2.1 | 2023-05-11 | 26008 | Added handling for 429 - potentiallyThresholdedRequestsPerHour error |
0.2.0 | 2023-04-13 | 25179 | Implement support for custom Cohort and Pivot reports |
0.1.3 | 2023-03-10 | 23872 | Fix parse + cursor for custom reports |
0.1.2 | 2023-03-07 | 23822 | Improve rate limits customer faced error messages and retry logic for 429 |
0.1.1 | 2023-01-10 | 21169 | Slicer updated, unit tests added |
0.1.0 | 2023-01-08 | 20889 | Improved config validation, SAT |
0.0.3 | 2022-08-15 | 15229 | Source Google Analytics Data Api: code refactoring |
0.0.2 | 2022-07-27 | 15087 | fix documentationUrl |
0.0.1 | 2022-05-09 | 12701 | Introduce Google Analytics Data API source |