Getting Started with PyAirbyte (Beta)

PyAirbyte is a library that provides a set of utilities to use Airbyte connectors in Python. It is meant to be used in situations where setting up an Airbyte server or cloud account is not possible or desirable, for example in a Jupyter notebook or when iterating on early prototypes on a developer's workstation.

You can also check out this YouTube video on how to get started with PyAirbyte!

Installation

pip install airbyte

Or, during the beta, you may want to install the latest version from source with:

pip install 'git+https://github.com/airbytehq/PyAirbyte.git'

Usage

Data can be extracted from sources and loaded into caches:

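import airbyte as ab

# Create the source; install the connector if it is not already present.
source = ab.get_source(
    "source-faker",
    config={"count": 5_000},
    install_if_missing=True,
)
source.check()               # validate the configuration
source.select_all_streams()  # sync every stream the source offers
result = source.read()       # read the selected streams

# Report how many records each stream produced.
for name, records in result.streams.items():
    print(f"Stream {name}: {len(list(records))} records")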

Quickstarts

API Reference

For details on specific classes and methods, please refer to our PyAirbyte API Reference.

Architecture

PyAirbyte is a Python library that can be run in any context that supports Python >=3.9. It contains the following main components:

  • Source: A source object wraps a Python connector together with a configuration object. The configuration object is a dictionary holding the connector's settings, such as authentication and connection details. The source object is used to read data from the connector.
  • Cache: Data can be read directly from the source object, but using a cache object to store the data is recommended. The cache object temporarily stores records from the source in a SQL database, such as a local DuckDB file or a Postgres or Snowflake instance.
  • Result: An object holding the records from a read operation on a source. It gives quick access to the records of each synced stream through the cache that was used for the read. Data can be accessed as a list of records, as a Pandas DataFrame, or via SQLAlchemy queries.
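As a rough sketch of how these three components fit together, the helper below reads one stream through a cache and returns it as a Pandas DataFrame. The helper name read_stream_as_dataframe is hypothetical, and the "users" stream name is an assumption about what source-faker exposes. The calls get_default_cache, select_streams, and to_pandas are assumed to be available in the installed PyAirbyte version, so check the API Reference before relying on them.

```python
def read_stream_as_dataframe(source, cache, stream_name):
    """Read one stream from `source` into `cache` and return it as a DataFrame.

    Hypothetical helper: `source` and `cache` are expected to behave like
    PyAirbyte source and cache objects.
    """
    source.check()                          # verify config and connectivity
    source.select_streams([stream_name])    # sync only the requested stream
    result = source.read(cache=cache)       # Result: records are stored in the cache
    return result[stream_name].to_pandas()  # stream dataset -> Pandas DataFrame


def demo():
    """Run the helper against source-faker (needs `pip install airbyte`)."""
    import airbyte as ab  # imported lazily so the helper has no hard dependency

    source = ab.get_source(
        "source-faker",
        config={"count": 100},
        install_if_missing=True,
    )
    cache = ab.get_default_cache()  # assumed: a local DuckDB-backed cache
    # "users" is assumed to be one of the streams source-faker provides.
    return read_stream_as_dataframe(source, cache, "users")
```

Calling demo() in a notebook should return the DataFrame for the selected stream. Keeping the helper independent of how the source and cache are created makes it straightforward to swap the default cache for a Postgres or Snowflake cache later.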

Available connectors

The following connectors are available:

LangChain integration

If you want to use PyAirbyte to drive your LLM use cases, there are two ways to integrate with LangChain:

  • LangChain native integration: This approach requires you to utilize the langchain-airbyte integration package. Refer to LangChain Docs or watch this YouTube video to get started.

  • PyAirbyte-centric integration: You can also directly use PyAirbyte to create documents. With this approach, you do not need to import langchain-airbyte. Refer to PyAirbyte Document Creation Demo to get started.
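
To give a feel for what document creation involves, here is a minimal, library-independent sketch that turns records into documents with a title, content, and metadata. SimpleDocument and records_to_documents are hypothetical names used only for illustration, and the sample record is made up. This is not the PyAirbyte or LangChain API; the demo and docs referenced above cover those.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document object fed to an LLM pipeline."""

    title: str
    content: str
    metadata: dict = field(default_factory=dict)


def records_to_documents(records, title_property, content_properties,
                         metadata_properties=()):
    """Convert record dicts into SimpleDocument objects (illustrative only)."""
    documents = []
    for record in records:
        # Render the chosen content fields as "name: value" lines.
        content = "\n".join(
            f"{name}: {record[name]}"
            for name in content_properties
            if name in record
        )
        # Copy the chosen metadata fields unchanged.
        metadata = {
            name: record[name] for name in metadata_properties if name in record
        }
        documents.append(
            SimpleDocument(
                title=str(record.get(title_property, "")),
                content=content,
                metadata=metadata,
            )
        )
    return documents


# Made-up sample record, standing in for records read from a stream.
sample_records = [{"id": 1, "name": "Ada", "occupation": "Engineer"}]
docs = records_to_documents(sample_records, "name", ["occupation"], ["id"])
print(docs[0])
```

Separating the title, content, and metadata fields in this way lets downstream tooling embed only the content while keeping identifiers available for filtering.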