Skip to content

Database Sanitization 101

Notes cover: - Sanitization architecture with repository links - Times and schedules of various sanitization steps - Restoring users and grants after a failed sanitization

Last update: 29.01.2026, gnd

Table of contents

[[TOC]]

Introduction

DB sanitization is needed for the dev process in order to: - Create a dev version of production database that can be worked on without fear of data destruction - Remove PII (Personally Identifiable Information) from the dev database version

This brings several challenges.

Size and speed

The main production database is a huge monolith of over 650GB data-on-disk (01/2026). To create a up-to-date beta, the data needs to be at least daily exported from the main db and imported into a dev version. This process takes some time (<1 hr, 01/2026).

PID data removal

To remove sensitive data from the main database a set of transformation rules has to be maintained, that describe which data is cleaned up in what way. In many cases the sanitized data need to have a special format, or need to be unique on row-level.

Robustness

A disruption of the sanitization can slow down the development process or even bring it down to a halt in a catastrophic scenario. The sanitization has to be done in a very redundant fashion, so that devs still have at least a old sanitized db to work with in the event of a failure.

All of these challenges and their solutions are addressed in the architecture part.

Sanitization architecture

TODO

Sanitization schedules

Several databases are being synchronized. Synchronizations runs to several receiving databases. Synchronization also runs from several VMs.

Production database Sanitized database Start End VM Note
commondb @ 35.234.68.77 beta_commondb @ 35.234.68.77 06:00, daily, 06:15 Sanitizer VM
hlf_master @ CloudSQL sanitized_hlf_master @ CloudSQL 04:00, daily 04:01 Sanitizer VM
ziskejucet @ SkySQL DB1 ziskejucet @ SkySQL Dev DB1 02:00, daily 03:00 (01/2026) Sky Sanitizer VM

Detailed schedules for the ziskejucet @ SkySQL Dev DB1 sanitization

For a detailed understanding of the process see the sanitization script: skysql_beta_ziskejucet.sh

Following times are approximate and valid for 01/2026:

  • 02:15 - Dev DB1 users & grants backup
  • 02:15 - DB1 Snapshot
  • 02:17 - DB1 Snapshot finished
  • 02:17 - Restore of DB1 snapshot to Dev DB1
  • 02:22 - Restore of DB1 snapshot to Dev DB1 finished
  • 02:22 - Sanitization of ziskejucet on Dev DB1 start
  • 02:40 - Sanitization of ziskejucet on Dev DB1 finished
  • 02:40 - All done.

Availability of Dev DB1 ziskejucet

According to the above, the database ziskejucet is not fully available between ~02:17 and ~02:22.

Roughly between 02:22 and 02:40 are live data visible for the devs. The sanitization is running on the ziskejucet database which is accessible to devs.

Restaring a failed sanitization

When sanitization fails, it is in 99% of casesbecause of some SkySQL error where the replica could not be recovered from the DB1 snapshot. After communication with SkySQL Team and opening a P1 ticket, re-run the sanitization manually.

Please note: Afterwards you will need to import the users and grants manually. See below how.

Importing users and grants after sanitization re-run.

Since the sanitization is a complete restore of DB1 complete with DB1 users and their passwords & etc we need a way how to store Dev DB1 users and passwords somewhere. We do it before the restore when we dump all grants and all users. After the restore is done, we import them again.

This is fine, however sometimes the restore fails. In such a case, even after manual restart of the restore, the old passwords and grants are lost.

In this case you will need to recover the old users and grants manually from the temporary files, stored on disk.

  1. Identify the last good user passwords + grants file:
ls -ltr /tmp

You will see something like this:

-rw------- 1 root root     0 Jan 28 18:00 tmp.1PGzwXJi7b
-rw------- 1 root root     0 Jan 28 21:00 tmp.zXchDnup8H
-rw------- 1 root root     0 Jan 29 00:00 tmp.xCjSKPT06e
-rw------- 1 root root 19122 Jan 29 02:15 tmp.ezYgtGWp1h
-rw------- 1 root root 13841 Jan 29 02:15 tmp.MO21eXyQZY
-rw------- 1 root root     0 Jan 29 03:00 tmp.x1Af6iqHde
-rw------- 1 root root     0 Jan 29 06:00 tmp.lcQqhKK26O
-rw------- 1 root root     0 Jan 29 09:00 tmp.gUFn8Un90T

The sanitization runs at 2AM, so according to time and file size the good files are these two:

-rw------- 1 root root 19122 Jan 29 02:15 tmp.ezYgtGWp1h
-rw------- 1 root root 13841 Jan 29 02:15 tmp.MO21eXyQZY

Check what is what:

head /tmp/tmp.ezYgtGWp1h

CREATE OR REPLACE USER 'petrak'@'%' IDENTIFIED BY PASSWORD 'REDACTED';
CREATE OR REPLACE USER 'petrak@ftmo.com.full'@'%' IDENTIFIED BY PASSWORD 'REDACTED';

Obviously this file is the user passwords file. Copy the files over to your home folder:

cp /tmp/tmp.ezYgtGWp1h /home/gnd/users.sql
cp /tmp/tmp.MO21eXyQZY /home/gnd/grants.sql

  1. A quicker way how to identify the files id just to look at the log of the failed sanitization:
--- 01/29/26 02:15: Starting SkySQL beta creation ---
01/29/26 02:15: Backing up current users to /tmp/tmp.ezYgtGWp1h
01/29/26 02:15: Backing up current grants to /tmp/tmp.MO21eXyQZY
01/29/26 02:15: Starting a complete backup and restore sequence.
  1. Once you identified and copied the files, import first the users, and then the grants. Please note grants are imported into the mysql database. Use the config file in /root/scripts/database-sanitization to be able to connect:
    mariadb --defaults-extra-file=config/ziskejucet_skysql_beta_new.cnf < /home/gnd/users.sql
    mariadb --defaults-extra-file=config/ziskejucet_skysql_beta_new.cnf mysql < /home/gnd/grants.sql
    
    Users + grants are now successfully imported.