Skip to main content

Google Sheets

This page contains the setup guide and reference information for the Google Sheets source connector.

info

The Google Sheets source connector pulls data from a single Google Sheets spreadsheet. Each sheet (tab) within a spreadsheet can be replicated. To replicate multiple spreadsheets, set up multiple Google Sheets source connectors in your Airbyte instance. No other files in your Google Drive are accessed.

Prerequisites

  • Spreadsheet Link - The link to the Google spreadsheet you want to sync.
  • For Airbyte Cloud A Google Workspace user with access to the spreadsheet
  • For Airbyte Open Source:
  • A GCP project
  • Enable the Google Sheets API in your GCP project
  • Service Account Key with access to the Spreadsheet you want to replicate

Setup guide

The Google Sheets source connector supports authentication via either OAuth or Service Account Key Authentication.

For Airbyte Cloud users, we highly recommend using OAuth, as it significantly simplifies the setup process and allows you to authenticate directly from the Airbyte UI.

For Airbyte Open Source users, we recommend using Service Account Key Authentication. Follow the steps below to create a service account, generate a key, and enable the Google Sheets API.

note

If you prefer to use OAuth for authentication with Airbyte Open Source, you can follow Google's OAuth instructions to create an authentication app. Be sure to set the scopes to https://www.googleapis.com/auth/spreadsheets.readonly. You will need to obtain your client ID, client secret, and refresh token for the connector setup.

Set up the service account key (Airbyte Open Source)

Create a service account

  1. Open the Service Accounts page in your Google Cloud console.
  2. Select an existing project, or create a new project.
  3. At the top of the page, click + Create service account.
  4. Enter a name and description for the service account, then click Create and Continue.
  5. Under Service account permissions, select the roles to grant to the service account, then click Continue. We recommend the Viewer role.

Generate a key

  1. Go to the API Console/Credentials page and click on the email address of the service account you just created.
  2. In the Keys tab, click + Add key, then click Create new key.
  3. Select JSON as the Key type. This will generate and download the JSON key file that you'll use for authentication. Click Continue.

Enable the Google Sheets API

  1. Go to the API Console/Library page.
  2. Make sure you have selected the correct project from the top.
  3. Find and select the Google Sheets API.
  4. Click ENABLE.

If your spreadsheet is viewable by anyone with its link, no further action is needed. If not, give your Service account access to your spreadsheet.

Set up the Google Sheets source connector in Airbyte

To set up Google Sheets as a source in Airbyte Cloud:

  1. Log in to your Airbyte Cloud or Airbyte Open Source account.
  2. In the left navigation bar, click Sources. In the top-right corner, click + New source.
  3. Find and select Google Sheets from the list of available sources.
  4. For Source name, enter a name to help you identify this source.
  5. Select your authentication method:

For Airbyte Cloud

  • (Recommended) Select Authenticate via Google (OAuth) from the Authentication dropdown, click Sign in with Google and complete the authentication workflow.

For Airbyte Open Source

  • (Recommended) Select Service Account Key Authentication from the dropdown and enter your Google Cloud service account key in JSON format:

    { "type": "service_account", "project_id": "YOUR_PROJECT_ID", "private_key_id": "YOUR_PRIVATE_KEY", ... }
  • To authenticate your Google account via OAuth, select Authenticate via Google (OAuth) from the dropdown and enter your Google application's client ID, client secret, and refresh token.

  1. For Spreadsheet Link, enter the link to the Google spreadsheet. To get the link, go to the Google spreadsheet you want to sync, click Share in the top right corner, and click Copy Link.
  2. (Optional) You may enable the option to Convert Column Names to SQL-Compliant Format. Enabling this option will allow the connector to convert column names to a standardized, SQL-friendly format. For example, a column name of Café Earnings 2022 will be converted to cafe_earnings_2022. We recommend enabling this option if your target destination is SQL-based (ie Postgres, MySQL). Set to false by default.
  3. Click Set up source and wait for the tests to complete.

Output schema

Each sheet in the selected spreadsheet is synced as a separate stream. Each selected column in the sheet is synced as a string field.

Airbyte only supports replicating Grid sheets.

Supported sync modes

The Google Sheets source connector supports the following sync modes:

Data type map

Integration TypeAirbyte TypeNotes
any typestring

Performance consideration

The Google API rate limits are:

  • 300 read requests per minute per project
  • 60 requests per minute per user per project

Airbyte batches requests to the API in order to efficiently pull data and respect these rate limits. We recommend not using the same user or service account for more than 3 instances of the Google Sheets source connector to ensure high transfer speeds.

Troubleshooting

  • If your sheet is completely empty(no header rows) or deleted, Airbyte will not delete the table in the destination. If this happens, the sync logs will contain a message saying the sheet has been skipped when syncing the full spreadsheet.

Build instructions

Build your own connector image

This connector is built using our dynamic built process. The base image used to build it is defined within the metadata.yaml file under the connectorBuildOptions. The build logic is defined using Dagger here. It does not rely on a Dockerfile.

If you would like to patch our connector and build your own a simple approach would be:

  1. Create your own Dockerfile based on the latest version of the connector image.
FROM airbyte/source-google-sheets:latest

COPY . ./airbyte/integration_code
RUN pip install ./airbyte/integration_code

# The entrypoint and default env vars are already set in the base image
# ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
# ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

Please use this as an example. This is not optimized.

  1. Build your image:
docker build -t airbyte/source-google-sheets:dev .
# Running the spec command against your patched connector
docker run airbyte/source-google-sheets:dev spec

Customizing our build process

When contributing on our connector you might need to customize the build process to add a system dependency or set an env var. You can customize our build process by adding a build_customization.py module to your connector. This module should contain a pre_connector_install and post_connector_install async function that will mutate the base image and the connector container respectively. It will be imported at runtime by our build process and the functions will be called if they exist.

Here is an example of a build_customization.py module:

from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
# Feel free to check the dagger documentation for more information on the Container object and its methods.
# https://dagger-io.readthedocs.io/en/sdk-python-v0.6.4/
from dagger import Container


async def pre_connector_install(base_image_container: Container) -> Container:
return await base_image_container.with_env_variable("MY_PRE_BUILD_ENV_VAR", "my_pre_build_env_var_value")

async def post_connector_install(connector_container: Container) -> Container:
return await connector_container.with_env_variable("MY_POST_BUILD_ENV_VAR", "my_post_build_env_var_value")

Changelog

VersionDatePull RequestSubject
0.3.112023-10-1331377Use our base image and remove Dockerfile
0.3.102023-09-2730487Fix bug causing rows to be skipped when batch size increased due to rate limits.
0.3.92023-09-2530749Performance testing - include socat binary in docker image
0.3.82023-09-2530747Performance testing - include socat binary in docker image
0.3.72023-08-2529826Remove row batch size from spec, add auto increase this value when rate limits
0.3.62023-08-1629491Update to latest CDK
0.3.52023-08-1629427Add stop reading in case of 429 error
0.3.42023-05-1529453Update spec descriptions
0.3.32023-08-1029327Add user-friendly error message for 404 and 403 error while discover
0.3.22023-08-0929246Add checking while reading to skip modified sheets
0.3.12023-07-0628033Fixed several reported vulnerabilities (25 total), CVE-2022-37434, CVE-2022-42898
0.3.02023-06-2627738License Update: Elv2
0.2.392023-05-3126833Remove authSpecification in favour of advancedAuth in specification
0.2.382023-05-1626097Refactor config error
0.2.372023-02-2123292Skip non grid sheets.
0.2.362023-02-2123272Handle empty sheets gracefully.
0.2.352023-02-2323057Slugify column names
0.2.342023-02-1523071Change min spreadsheet id size to 20 symbols
0.2.332023-02-1323278Handle authentication errors
0.2.322023-02-1322884Do not consume http spreadsheets.
0.2.312022-10-0919574Revert 'Add row_id to rows and use as primary key'
0.2.302022-10-0919215Add row_id to rows and use as primary key
0.2.212022-10-0415591Clean instantiation of AirbyteStream
0.2.202022-10-1017766Fix null pointer exception when parsing the spreadsheet id.
0.2.192022-09-2917410Use latest CDK.
0.2.182022-09-2817326Migrate to per-stream states.
0.2.172022-08-0315107Expose Row Batch Size in Connector Specification
0.2.162022-07-0713729Improve configuration field description
0.2.152022-06-0213446Retry requests resulting in a server error
0.2.132022-05-0612685Update CDK to v0.1.56 to emit an AirbyeTraceMessage on uncaught exceptions
0.2.122022-04-2012230Update connector to use a spec.yaml
0.2.112022-04-1311977Replace leftover print statement with airbyte logger
0.2.102022-03-2511404Allow using Spreadsheet Link/URL instead of Spreadsheet ID
0.2.92022-01-259208Update title and descriptions
0.2.72021-09-278470Migrate to the CDK
0.2.62021-09-276354Support connecting via Oauth webflow
0.2.52021-09-125972Fix full_refresh test by adding supported_sync_modes to Stream initialization
0.2.42021-08-055233Fix error during listing sheets with diagram only
0.2.32021-06-093973Add AIRBYTE_ENTRYPOINT for Kubernetes support
0.2.22021-04-202994Formatting spec
0.2.12021-04-032726Fix base connector versioning
0.2.02021-03-092238Protocol allows future/unknown properties
0.1.72021-01-211762Fix issue large spreadsheet
0.1.62021-01-271668Adopt connector best practices
0.1.52020-12-301438Implement backoff
0.1.42020-11-301046Add connectors using an index YAML file