Project:MariaDB-K8s: Difference between revisions

From MaRDI portal

Revision as of 13:50, 26 February 2025

MariaDB configuration

Set up the DB

The main database of the portal is based on MariaDB. Our MariaDB instance is deployed and managed using mariadb-operator, which lets us use Custom Resource Definitions (CRDs) in Kubernetes to manage our database declaratively. In simple terms, this allows us to define all our DB resources (databases, users, grants...) using YAML files.

The MariaDB operator also handles DB replication: it spins up one main (write) node and several read-only replicas.

The following definition in mariadb.yaml assigns an external IP to access the DB from outside the cluster.

primaryService:
  type: LoadBalancer

secondaryService:
  type: LoadBalancer

primaryService represents the write node whereas secondaryService refers to the read-only replicas. With

kubectl get svc

it is possible to see which specific IP was assigned to these services. These IPs are necessary to set up the connection between the MediaWiki container and the DB.
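As a minimal sketch, the assigned IP can also be read directly with a jsonpath query instead of scanning the full service listing. The function name service_ip and the service names in the usage comment are assumptions for illustration; use the names reported by kubectl get svc.

```shell
# Print the external IP that the load balancer assigned to a service.
# Usage: service_ip <service-name>
service_ip() {
    kubectl get svc "$1" \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Example (service names are assumptions, check `kubectl get svc`):
#   service_ip mariadb-primary
#   service_ip mariadb-secondary
```

The output is empty while the load balancer has not yet assigned an address.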

MediaWiki configuration

MediaWiki requires its SQL user to have the SLAVE MONITOR and BINLOG MONITOR privileges in order for replication to work. These privileges are thus defined in grant.yaml.
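One way to confirm that the grant was applied is to inspect the output of SHOW GRANTS for the MediaWiki user. The helper below is only a sketch: its name is hypothetical, and it assumes the privileges appear under exactly these names in the output, so a user holding ALL PRIVILEGES would not match this simple text check.

```shell
# Read SHOW GRANTS output on stdin and succeed only if both
# privileges required by MediaWiki appear in it.
has_replication_monitor_privs() {
    grants=$(cat)
    printf '%s\n' "$grants" | grep -q 'SLAVE MONITOR' &&
        printf '%s\n' "$grants" | grep -q 'BINLOG MONITOR'
}

# Example (user and host are placeholders):
#   mariadb -u root -p -e "SHOW GRANTS FOR '<wiki-user>'@'%'" \
#       | has_replication_monitor_privs && echo "privileges present"
```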

The connection between MediaWiki and the different DB nodes can be specified with the following parameter in LocalSettings.php, replacing <primaryService-IP> and <secondaryService-IP> with the right values:

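$wgLBFactoryConf = array(

    'class' => 'LBFactoryMulti',

    // Map each wiki database to a section (cluster).
    'sectionsByDB' => array(
        'my_wiki' => 's1',
    ),

    // The first host listed is the primary; a load of 0 keeps read queries off it.
    'sectionLoads' => array(
        's1' => array(
            '<primaryService-IP>'   => 0,
            '<secondaryService-IP>' => 50,
        ),
    ),

    // Connection settings shared by all servers.
    'serverTemplate' => array(
        'dbname'     => $wgDBname,
        'user'       => $wgDBuser,
        'password'   => $wgDBpassword,
        'type'       => 'mysql',
        'flags'      => DBO_DEFAULT,
        // Replicas lagging by more than 30 seconds are not used.
        'max lag'    => 30,
    ),
);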

Backup

  • Backups for the database are scheduled every 24h. Seven copies are kept at any time, covering the last 7 days.
  • MariaDB-operator takes care of creating the backups and pushing them into an S3 bucket, as defined in backup.yaml.
  • Production backups are prefixed with production, staging backups use the prefix staging.
  • The files in the S3 bucket can be examined using s3cmd:
s3cmd --host=hsm-test-09.zib.de:9001 --host-bucket=hsm-test-09.zib.de:9001 --region=us-east-1 --access_key=<access_key> --secret_key=<secret_key> ls s3://
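To look only at production or staging backups, the same command can be wrapped in a small helper that lists a single prefix. This is a sketch under assumptions: the function name and the S3_ACCESS_KEY and S3_SECRET_KEY variable names are hypothetical, the bucket name has to be supplied by the caller, and the prefix is assumed to be an object key prefix inside that bucket.

```shell
# List the objects stored under one prefix of the backup bucket.
# Usage: list_backups <bucket> <prefix>
# Credentials are read from S3_ACCESS_KEY and S3_SECRET_KEY (hypothetical names).
list_backups() {
    s3cmd --host=hsm-test-09.zib.de:9001 \
          --host-bucket=hsm-test-09.zib.de:9001 \
          --region=us-east-1 \
          --access_key="$S3_ACCESS_KEY" \
          --secret_key="$S3_SECRET_KEY" \
          ls "s3://$1/$2"
}

# Example:
#   list_backups <bucket> production
#   list_backups <bucket> staging
```

With the retention described above, each prefix would be expected to hold about seven backup files.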

Restore a database to the cluster

To restore or load a backup file once the DB is running, proceed as follows.

1) Copy the backup file into the main DB pod. The main pod is the only one with write access. The file can be copied into the /tmp directory with:

kubectl cp ./portal_db_backup.gz mariadb-0:/tmp

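2) Open a shell in the main pod (for example with kubectl exec -it mariadb-0 -- bash), then unzip the file and start the MariaDB client:

cd /tmp
gzip -d portal_db_backup.gz
mariadb -u root -p    # enter the root password when prompted

3) At the MariaDB prompt, load the file into the database:

SOURCE /tmp/portal_db_backup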
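The restore steps can also be combined into one non-interactive helper. The following is a sketch, not the procedure used by the project: it streams the decompressed dump into the client instead of using SOURCE, the function name is hypothetical, and it assumes that the root password is available inside the container as MARIADB_ROOT_PASSWORD.

```shell
# Copy a gzipped SQL dump into the main pod and load it.
# Usage: restore_backup <local-dump.gz> [pod]
restore_backup() {
    dump=$1
    pod=${2:-mariadb-0}
    remote=/tmp/$(basename "$dump")

    # Step 1: copy the compressed dump into the pod's /tmp directory.
    kubectl cp "$dump" "$pod:$remote" || return 1

    # Step 2: decompress to stdout and pipe the SQL into the client.
    # MARIADB_ROOT_PASSWORD is expanded inside the container (assumption:
    # the variable is set there).
    kubectl exec "$pod" -- sh -c \
        "gzip -dc '$remote' | mariadb -u root -p\"\$MARIADB_ROOT_PASSWORD\""
}

# Example:
#   restore_backup ./portal_db_backup.gz
```

Streaming leaves the compressed file in /tmp untouched, so the load can be repeated without copying the file again.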