DataHub
This page documents the preview (v2.21) version. Preview includes features under active development and is for development and testing only. For production, use the stable (v2024.1) version.
DataHub is an open-source metadata platform and modern data catalog for the data stack, built to enable end-to-end data discovery, data observability, and data governance. It supports various data sources, including PostgreSQL.
Because YugabyteDB's YSQL API is wire-compatible with PostgreSQL, DataHub can connect to YugabyteDB as a data source using the PostgreSQL plugin.
Setup
You can run the Docker Compose quickstart example provided in the DataHub GitHub repository against YugabyteDB with the following changes:
- Replace the MySQL Docker image with that of YugabyteDB.
- Specify the entrypoint command for the YugabyteDB Docker container.
- Change the port from 5432 to 5433.
- Change the username and password to yugabyte.
- Change the driver to org.postgresql.Driver.
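As a rough illustration of how these values fit together, the following shell sketch prints the datasource settings for a given host, port, and set of credentials. The function name `ebean_settings` is hypothetical and not part of DataHub or YugabyteDB; the variable names and default values are the ones used on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of DataHub): print the EBEAN_DATASOURCE_*
# settings for a YugabyteDB YSQL endpoint. The parenthesized body runs in a
# subshell, so the variables below do not leak into the caller.
ebean_settings() (
  host="${1:-yugabyte}"       # container hostname used on this page
  port="${2:-5433}"           # YSQL port
  user="${3:-yugabyte}"
  password="${4:-yugabyte}"
  db="${5:-yugabyte}"

  printf 'EBEAN_DATASOURCE_DRIVER=%s\n' 'org.postgresql.Driver'
  printf 'EBEAN_DATASOURCE_HOST=%s:%s\n' "$host" "$port"
  printf 'EBEAN_DATASOURCE_PASSWORD=%s\n' "$password"
  printf 'EBEAN_DATASOURCE_URL=jdbc:postgresql://%s:%s/%s\n' "$host" "$port" "$db"
  printf 'EBEAN_DATASOURCE_USERNAME=%s\n' "$user"
)
```

Calling `ebean_settings` with no arguments prints the values for the defaults above; passing a different host or port as the first two arguments adapts the output to another YugabyteDB endpoint.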
Make changes in the following files:
- In docker/quickstart/docker-compose-without-neo4j.quickstart.yml, change the following:

  - Change the EBEAN_DATASOURCE configuration [lines 80-84 and 126-130] as follows:

        EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
        EBEAN_DATASOURCE_HOST=yugabyte:5433
        EBEAN_DATASOURCE_PASSWORD=yugabyte
        EBEAN_DATASOURCE_URL=jdbc:postgresql://yugabyte:5433/yugabyte
        EBEAN_DATASOURCE_USERNAME=yugabyte

  - Change mysql-setup to postgres-setup [line 123].

  - Replace the mysql and mysql-setup containers [lines 197-231] with the yugabyte and postgres-setup containers as follows:

        yugabyte:
          container_name: yugabyte
          hostname: yugabyte
          image: yugabytedb/yugabyte:latest
          command: /bin/bash /home/yugabyte/docker-entrypoint-initdb.d/yb-init.sh
          environment:
            POSTGRES_USER: ${POSTGRES_USER:-yugabyte}
            POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-yugabyte}
          ports:
            - '5433:5433'
          volumes:
            - ./yb-setup/:/home/yugabyte/docker-entrypoint-initdb.d/
          healthcheck:
            test: bin/ysqlsh -h `hostname -i` -U yugabyte -tAc 'select 1' -d yugabyte
            interval: 10s
            timeout: 5s
            retries: 20
        postgres-setup:
          container_name: postgres-setup
          depends_on:
            yugabyte:
              condition: service_healthy
          environment:
            - POSTGRES_HOST=yugabyte
            - POSTGRES_PORT=5433
            - POSTGRES_USERNAME=yugabyte
            - POSTGRES_PASSWORD=yugabyte
            - DATAHUB_DB_NAME=yugabyte
          hostname: yugabyte-setup
          image: ${DATAHUB_POSTGRES_SETUP_IMAGE:-acryldata/datahub-postgres-setup}:${DATAHUB_VERSION:-head}
- Create a directory yb-setup in docker/quickstart/, and in it (docker/quickstart/yb-setup/) create a script file named yb-init.sh with the following content. The script runs during container initialization to launch the YugabyteDB cluster.

      bin/yugabyted start
      sleep 5
      bin/ysqlsh -h `hostname -i` -f /home/yugabyte/docker-entrypoint-initdb.d/init.sql
      tail -f /dev/null

- Copy the file docker/postgres/init.sql to docker/quickstart/yb-setup/.
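The two file-preparation steps (creating yb-setup with yb-init.sh, then copying init.sql) can be sketched as a small shell function. This is a minimal sketch, assuming the argument is the root of a DataHub checkout that contains docker/postgres/init.sql; the function name `prepare_yb_setup` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create docker/quickstart/yb-setup/ with yb-init.sh and
# a copy of init.sql. $1 is assumed to be the root of a DataHub checkout.
prepare_yb_setup() (
  repo_root="${1:-.}"
  setup_dir="$repo_root/docker/quickstart/yb-setup"

  mkdir -p "$setup_dir" || exit 1

  # The quoted 'EOF' keeps the backticks literal, so that `hostname -i` is
  # evaluated inside the container at startup, not on the host right now.
  cat > "$setup_dir/yb-init.sh" <<'EOF'
bin/yugabyted start
sleep 5
bin/ysqlsh -h `hostname -i` -f /home/yugabyte/docker-entrypoint-initdb.d/init.sql
tail -f /dev/null
EOF

  # Place the PostgreSQL schema script next to yb-init.sh; the function's
  # exit status is the status of this copy.
  cp "$repo_root/docker/postgres/init.sql" "$setup_dir/"
)
```

For example, `prepare_yb_setup ./datahub` would prepare the directory in a checkout located at ./datahub. The function fails if init.sql is not found at the expected path.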
Run the example
Run the example from the docker/quickstart directory using the following command:
docker compose -f docker-compose-without-neo4j.quickstart.yml up -d
After all the containers are running, you can ingest some demo data by running ./datahub/docker/ingestion/ingestion.sh, or head to http://localhost:9002 (username: datahub, password: datahub) to access the UI.
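If you script these steps, it can help to wait for the YugabyteDB container to report healthy before ingesting data. The following is a minimal sketch, not part of the DataHub tooling: `retry_until` and `container_healthy` are hypothetical helper names, and the check relies on the healthcheck defined for the yugabyte container in the Compose file.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: retry_until <attempts> <delay_seconds> <command> [args...]
retry_until() (
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && exit 0
    # Sleep between attempts, but not after the final failure.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  exit 1
)

# Succeeds only when Docker reports the named container as healthy.
container_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}
```

For example, `retry_until 20 10 container_healthy yugabyte && ./datahub/docker/ingestion/ingestion.sh` checks up to 20 times, 10 seconds apart (mirroring the healthcheck settings above), and runs the ingestion script only once the container is healthy.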