DataHub

This page documents the preview (v2.21) version. Preview includes features under active development and is for development and testing only. For production, use the stable (v2024.1) version.

DataHub is an open-source metadata platform and data catalog for the modern data stack, built to enable end-to-end data discovery, data observability, and data governance. It supports a range of data sources, including PostgreSQL.

Because YugabyteDB's YSQL API is wire-compatible with PostgreSQL, DataHub can connect to YugabyteDB as a data source using the PostgreSQL plugin.

Setup

You can run the Docker Compose quickstart example provided in the DataHub GitHub repository against YugabyteDB with the following changes:

  • Replace the MySQL Docker image with that of YugabyteDB.
  • Specify the entrypoint command for the YugabyteDB Docker container.
  • Change the port from 5432 (the PostgreSQL default) to 5433 (the YSQL default).
  • Change username and password to yugabyte.
  • Change the driver to org.postgresql.Driver.

Make changes in the following files:

  • In docker/quickstart/docker-compose-without-neo4j.quickstart.yml, change the following:

    • Change the EBEAN_DATASOURCE configuration [lines 80-84 and 126-130] as follows:

      EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
      EBEAN_DATASOURCE_HOST=yugabyte:5433
      EBEAN_DATASOURCE_PASSWORD=yugabyte
      EBEAN_DATASOURCE_URL=jdbc:postgresql://yugabyte:5433/yugabyte
      EBEAN_DATASOURCE_USERNAME=yugabyte
      
    • Change mysql-setup to postgres-setup [line 123].

    • Replace the mysql and mysql-setup containers [lines 197-231] with the yugabyte and postgres-setup containers as follows:

      yugabyte:
        container_name: yugabyte
        hostname: yugabyte
        image: yugabytedb/yugabyte:latest
        command: /bin/bash /home/yugabyte/docker-entrypoint-initdb.d/yb-init.sh
        environment:
          POSTGRES_USER: ${POSTGRES_USER:-yugabyte}
          POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-yugabyte}
        ports:
        - '5433:5433'
        volumes:
        - ./yb-setup/:/home/yugabyte/docker-entrypoint-initdb.d/
        healthcheck:
          test: bin/ysqlsh -h `hostname -i` -U yugabyte -tAc 'select 1' -d yugabyte
          interval: 10s
          timeout: 5s
          retries: 20
      postgres-setup:
        container_name: postgres-setup
        depends_on:
          yugabyte:
            condition: service_healthy
        environment:
        - POSTGRES_HOST=yugabyte
        - POSTGRES_PORT=5433
        - POSTGRES_USERNAME=yugabyte
        - POSTGRES_PASSWORD=yugabyte
        - DATAHUB_DB_NAME=yugabyte
        hostname: yugabyte-setup
        image: ${DATAHUB_POSTGRES_SETUP_IMAGE:-acryldata/datahub-postgres-setup}:${DATAHUB_VERSION:-head}
      
  • Create a directory named yb-setup in docker/quickstart/, and in it create a script file named yb-init.sh with the following content. The script runs when the container starts and launches the YugabyteDB cluster.

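    # Start a single-node YugabyteDB cluster in the background.
    bin/yugabyted start

    # Give the node a few seconds to start accepting YSQL connections.
    sleep 5

    # Run the init.sql script that is mounted alongside this script.
    bin/ysqlsh -h `hostname -i` -f /home/yugabyte/docker-entrypoint-initdb.d/init.sql
    # Keep the container running after initialization finishes.
    tail -f /dev/null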
    
  • Copy the file docker/postgres/init.sql to docker/quickstart/yb-setup/.
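
The last two steps (creating yb-setup with yb-init.sh, and copying init.sql) can also be scripted. The following is a minimal sketch, assuming it is given the root of a local DataHub checkout. The function name prepare_yb_setup is hypothetical and is not part of DataHub or YugabyteDB.

```shell
#!/usr/bin/env bash
# Sketch: populate docker/quickstart/yb-setup/ inside a DataHub checkout.
# prepare_yb_setup is a hypothetical helper name used only for illustration.
# Usage: prepare_yb_setup <path-to-datahub-repo-root>
prepare_yb_setup() {
  local repo_root="$1"
  local setup_dir="$repo_root/docker/quickstart/yb-setup"

  mkdir -p "$setup_dir" || return 1

  # Write yb-init.sh with the commands listed in the step above.
  # The quoted 'EOF' keeps the backticks literal so that `hostname -i`
  # is evaluated inside the container, not on the host.
  cat > "$setup_dir/yb-init.sh" <<'EOF'
bin/yugabyted start

sleep 5

bin/ysqlsh -h `hostname -i` -f /home/yugabyte/docker-entrypoint-initdb.d/init.sql
tail -f /dev/null
EOF

  # Copy the PostgreSQL initialization script next to yb-init.sh.
  cp "$repo_root/docker/postgres/init.sql" "$setup_dir/"
}
```

After sourcing the file, a call such as prepare_yb_setup ./datahub would populate the directory, assuming the repository was cloned into ./datahub.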

Run the example

From the docker/quickstart/ directory, run the example using the following command:

docker compose -f docker-compose-without-neo4j.quickstart.yml up -d

After all the containers are running, you can ingest some demo data by running ./datahub/docker/ingestion/ingestion.sh, or head to http://localhost:9002 (username: datahub, password: datahub) to access the UI.
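
Because the containers can take a while to become ready, it may help to wait for the UI to respond before starting ingestion. The following is a minimal sketch of a retry loop. The helper names wait_until and ui_is_up are hypothetical, and the attempt count and delay in the usage note are arbitrary choices rather than recommended values.

```shell
#!/usr/bin/env bash
# wait_until is a hypothetical helper: it retries a command until it
# succeeds or the attempt budget is used up.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  # Run the command, discarding its output, until it exits with status 0.
  while ! "$@" >/dev/null 2>&1; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Probe the DataHub UI address given above.
# curl -f treats HTTP responses of 400 and above as failures.
ui_is_up() {
  curl -sf -o /dev/null http://localhost:9002
}
```

For example, wait_until 30 5 ui_is_up && ./datahub/docker/ingestion/ingestion.sh would try the UI up to 30 times, 5 seconds apart, and start ingestion only once it responds.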