Keycloak Logical Replication Setup FULL (FTMO)

This document explains how to set up logical replication for the Keycloak database
(fftrader-idp-team-idp-keycloak) for use with Google Cloud Datastream.

Important:

  • All steps must be executed by a database admin user.
  • Do not run any setup as the replication user (datateam_replication).


0. Admin Preparation

Grant yourself the required admin role if needed:

GRANT "fftrader-idp-team-idp-keycloak_full_access" TO viliam;

1. Mandatory Verification: Tables, Primary Keys, Replica Identity

1.1 Verify required tables exist

SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_name IN (
  'client','ftmo_user_marketing_profile','ftmo_user_update',
  'keycloak_group','keycloak_role','user_attribute',
  'user_group_membership','user_role_mapping'
);

1.2 Mandatory: Verify replica identity = DEFAULT (d)

Logical replication requires replica identity to be based on primary keys.

SELECT relname AS table_name, relreplident AS replica_identity
FROM pg_class
WHERE relname IN (
  'client','ftmo_user_marketing_profile','ftmo_user_update',
  'keycloak_group','keycloak_role','user_attribute',
  'user_group_membership','user_role_mapping'
);

All tables must show:

replica_identity = 'd'

If any table does not, raise this with the IDP team and let them change the table metadata.
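The fix itself belongs to the IDP team; for illustration only, it might look like this (`public.client` stands in for whichever table is affected):

```sql
-- Illustrative: reset the replica identity of an affected table to its
-- primary key (DEFAULT). To be run by the table owner / IDP team.
ALTER TABLE public.client REPLICA IDENTITY DEFAULT;
```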

1.3 Mandatory: Verify primary keys

All primary key columns must be included in the publication (for UPDATE/DELETE support):

SELECT tc.table_name, kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'PRIMARY KEY'
  AND tc.table_name IN (
        'client','ftmo_user_marketing_profile','ftmo_user_update',
        'keycloak_group','keycloak_role','user_attribute',
        'user_group_membership','user_role_mapping'
  );

2. Create Replication User

CREATE USER datateam_replication WITH PASSWORD 'CHANGE_ME_STRONG_PASSWORD';
ALTER USER datateam_replication WITH REPLICATION;

3. Grant Permissions to Replication User

3.1 Schema usage

GRANT USAGE ON SCHEMA public TO datateam_replication;

3.2 Table-level SELECT permissions

GRANT SELECT ON public.user_attribute,
               public.ftmo_user_update,
               public.ftmo_user_marketing_profile,
               public.user_role_mapping,
               public.keycloak_role,
               public.user_group_membership,
               public.keycloak_group,
               public.client
TO datateam_replication;

4. Create Publication (with Explicit Column Lists)

CREATE PUBLICATION data_team_publication FOR TABLE
    public.user_attribute (id, name, value, user_id),
    public.ftmo_user_update (id, enabled, user_id, registration_date, cid, email),
    public.ftmo_user_marketing_profile (user_id, marketing_id),
    public.user_role_mapping (role_id, user_id),
    public.keycloak_role (id, name),
    public.user_group_membership (group_id, user_id),
    public.keycloak_group (id, name),
    public.client (id, client_id);

Validate publication:

SELECT * FROM pg_publication_tables WHERE pubname = 'data_team_publication';

5. Create Logical Replication Slot

SELECT pg_create_logical_replication_slot(
  'datateam_publication_slot',
  'pgoutput'
);

Verify:

SELECT slot_name, plugin, database, active
FROM pg_replication_slots
WHERE slot_name = 'datateam_publication_slot';

6. Datastream Configuration Notes

6.1 Whitelist-Based Table Selection

We use the IP allowlisting connectivity option:

  • add the IP addresses provided by Datastream to the Cloud SQL authorized networks

Datastream is intentionally restricted to only the tables and columns defined in the publication.


6.2 Single BigQuery Dataset Strategy

All replicated tables are stored in one shared BigQuery dataset.

Benefits:

  • simpler joins
  • unified permissions
  • cleaner downstream transformations
  • easier lineage and monitoring

Because the replicated table set is small and well-defined, separate datasets would add unnecessary complexity.


6.3 Table & Column Alignment Requirement

In Datastream, you must:

  • select the same tables as defined in the publication
  • select the same columns as in the publication’s column lists

This is required because:

  • Datastream validates column lists against the publication
  • Missing/extra columns cause stream initialization failures
  • Update/Delete events fail if primary key columns are mismatched
  • BigQuery relies on PK columns to apply merge/delete operations
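To copy the exact column lists into the Datastream table selection, they can be read back from the database. A sketch, assuming PostgreSQL 15+ (which publication column lists require anyway):

```sql
-- List each published table with its column list.
-- attnames is NULL when all columns of the table are published.
SELECT schemaname, tablename, attnames
FROM pg_publication_tables
WHERE pubname = 'data_team_publication'
ORDER BY tablename;
```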

7. Operational Notes

  • Changing primary keys requires updating replica identity + publication + Datastream configuration.
  • Dropping the publication or slot immediately breaks replication.
  • All steps should be executed by an admin user.
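As an operational check, slot health and retained WAL can be monitored from the database side. A minimal sketch, assuming a reasonably recent PostgreSQL (the `confirmed_flush_lsn` / `pg_wal_lsn_diff` approach):

```sql
-- Approximate WAL retained behind the slot. An inactive slot with
-- growing lag holds WAL on disk and should be investigated promptly.
SELECT slot_name, active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
       ) AS replication_lag
FROM pg_replication_slots
WHERE slot_name = 'datateam_publication_slot';
```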