Database Sanitization 101¶
Notes cover: - Sanitization architecture with repository links - Times and schedules of various sanitization steps - Restoring users and grants after a failed sanitization
Last update: 29.01.2026, gnd
Table of contents¶
[[TOC]]
Introduction¶
DB sanitization is needed for the dev process in order to: - Create a dev version of production database that can be worked on without fear of data destruction - Remove PII (Personally Identifiable Information) from the dev database version
This brings several challenges.
Size and speed
The main production database is a huge monolith of over 650GB data-on-disk (01/2026). To create a up-to-date beta, the data needs to be at least daily exported from the main db and imported into a dev version. This process takes some time (<1 hr, 01/2026).
PID data removal
To remove sensitive data from the main database a set of transformation rules has to be maintained, that describe which data is cleaned up in what way. In many cases the sanitized data need to have a special format, or need to be unique on row-level.
Robustness
A disruption of the sanitization can slow down the development process or even bring it down to a halt in a catastrophic scenario. The sanitization has to be done in a very redundant fashion, so that devs still have at least a old sanitized db to work with in the event of a failure.
All of these challenges and their solutions are addressed in the architecture part.
Sanitization architecture¶
TODO
Sanitization schedules¶
Several databases are being synchronized. Synchronizations runs to several receiving databases. Synchronization also runs from several VMs.
| Production database | Sanitized database | Start | End | VM | Note |
|---|---|---|---|---|---|
| commondb @ 35.234.68.77 | beta_commondb @ 35.234.68.77 | 06:00, daily, | 06:15 | Sanitizer VM | |
| hlf_master @ CloudSQL | sanitized_hlf_master @ CloudSQL | 04:00, daily | 04:01 | Sanitizer VM | |
| ziskejucet @ SkySQL DB1 | ziskejucet @ SkySQL Dev DB1 | 02:00, daily | 03:00 (01/2026) | Sky Sanitizer VM |
Detailed schedules for the ziskejucet @ SkySQL Dev DB1 sanitization¶
For a detailed understanding of the process see the sanitization script: skysql_beta_ziskejucet.sh
Following times are approximate and valid for 01/2026:
- 02:15 - Dev DB1 users & grants backup
- 02:15 - DB1 Snapshot
- 02:17 - DB1 Snapshot finished
- 02:17 - Restore of DB1 snapshot to Dev DB1
- 02:22 - Restore of DB1 snapshot to Dev DB1 finished
- 02:22 - Sanitization of ziskejucet on Dev DB1 start
- 02:40 - Sanitization of ziskejucet on Dev DB1 finished
- 02:40 - All done.
Availability of Dev DB1 ziskejucet
According to the above, the database ziskejucet is not fully available between ~02:17 and ~02:22.
Roughly between 02:22 and 02:40 are live data visible for the devs. The sanitization is running on the ziskejucet database which is accessible to devs.
Restaring a failed sanitization¶
When sanitization fails, it is in 99% of casesbecause of some SkySQL error where the replica could not be recovered from the DB1 snapshot. After communication with SkySQL Team and opening a P1 ticket, re-run the sanitization manually.
Please note: Afterwards you will need to import the users and grants manually. See below how.
Importing users and grants after sanitization re-run.¶
Since the sanitization is a complete restore of DB1 complete with DB1 users and their passwords & etc we need a way how to store Dev DB1 users and passwords somewhere. We do it before the restore when we dump all grants and all users. After the restore is done, we import them again.
This is fine, however sometimes the restore fails. In such a case, even after manual restart of the restore, the old passwords and grants are lost.
In this case you will need to recover the old users and grants manually from the temporary files, stored on disk.
- Identify the last good user passwords + grants file:
You will see something like this:
-rw------- 1 root root 0 Jan 28 18:00 tmp.1PGzwXJi7b
-rw------- 1 root root 0 Jan 28 21:00 tmp.zXchDnup8H
-rw------- 1 root root 0 Jan 29 00:00 tmp.xCjSKPT06e
-rw------- 1 root root 19122 Jan 29 02:15 tmp.ezYgtGWp1h
-rw------- 1 root root 13841 Jan 29 02:15 tmp.MO21eXyQZY
-rw------- 1 root root 0 Jan 29 03:00 tmp.x1Af6iqHde
-rw------- 1 root root 0 Jan 29 06:00 tmp.lcQqhKK26O
-rw------- 1 root root 0 Jan 29 09:00 tmp.gUFn8Un90T
The sanitization runs at 2AM, so according to time and file size the good files are these two:
-rw------- 1 root root 19122 Jan 29 02:15 tmp.ezYgtGWp1h
-rw------- 1 root root 13841 Jan 29 02:15 tmp.MO21eXyQZY
Check what is what:
head /tmp/tmp.ezYgtGWp1h
CREATE OR REPLACE USER 'petrak'@'%' IDENTIFIED BY PASSWORD 'REDACTED';
CREATE OR REPLACE USER 'petrak@ftmo.com.full'@'%' IDENTIFIED BY PASSWORD 'REDACTED';
Obviously this file is the user passwords file. Copy the files over to your home folder:
- A quicker way how to identify the files id just to look at the log of the failed sanitization:
--- 01/29/26 02:15: Starting SkySQL beta creation ---
01/29/26 02:15: Backing up current users to /tmp/tmp.ezYgtGWp1h
01/29/26 02:15: Backing up current grants to /tmp/tmp.MO21eXyQZY
01/29/26 02:15: Starting a complete backup and restore sequence.
- Once you identified and copied the files, import first the users, and then the grants.
Please note grants are imported into the
mysqldatabase. Use the config file in/root/scripts/database-sanitizationto be able to connect: Users + grants are now successfully imported.