AWS DMS is a new type of service used to migrate data from different types of database and data-warehouse. I met some problems when trying to use it in production environment.
Problem 1. When using a MySQL server of AWS RDS as the source of a replication task. It reported errors after started the task:
Last failure message
Last Error Task error notification received from subtask 0, thread 0 [reptask/replicationtask.c:2673] [1020418] Error Code [10001] : Binary Logging must be enabled for MySQL server; Errors in MySQL server binary logging configuration. Follow all prerequisites for 'MySQL as a source in DMS' from https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MySQL.html or'MySQL as a target in DMS' from https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.MySQL.html ; Failed while preparing stream component 'st_0_WBK5KGUWQAH6VKEP4I5LH2EFHE'.; Cannot initialize subtask; Stream component 'st_0_WBK5KGUWQAH6VKEP4I5LH2EFHE' terminated [reptask/replicationtask.c:2680] [1020418] Stop Reason FATAL_ERROR Error Level FATAL
The failure message looks terrible. But at least I can find this doc to follow. After changed the configurations as below:
binlog_format |
ROW |
binlog_checksum |
NONE |
binlog_row_image |
FULL |
the error still existed.
The real answer is in here since I used RDS instead of self-managed MySQL. After I add one line Terraform code to enable “automatic backups”:
resource "aws_db_instance" "test_gaf" {
......
backup_retention_period = 10
}
the replication task began to work without the error.
Problem 2. Running replication task for a while to export data from MySQL to AWS Redshift. A new error log appeared in Redshift load logs:
019-10-29T04:41:27 [TARGET_LOAD ]E: RetCode: SQL_ERROR SqlState: XX000 NativeError: 30 Message: [Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query: [SQLState XX000] ERROR: User arn:aws:redshift:us-east-1:262284277472:dbuser:analytics-20190902/masteruser is not authorized to assume IAM Role arn:aws:iam::262284277472:role/dms-access-for-endpoint DETAIL: ----------------------------------------------- error: User arn:aws:redshift:us-east-1:262284277472:dbuser:analytics-20190902/masteruser is not authorized to assume IAM Role arn:aws:iam::262284277472:role/dms-access-for-endpoint code: 8001 context: IAM Role=arn:aws:iam::262284277472:role/dms-access-for-endpoint query: 1799 location: xen_aws_credentials_mgr.cpp:321 process: padbmaster [pid=21755] ----------------------------------------------- [1022502] (ar_odbc_stmt.c:4622)
Why masteruser is not authorized? The answer is here. Below is the Terraform code:
data "aws_iam_policy_document" "dms_assume_role" {
statement {
actions = ["sts:AssumeRole"]
principals {
identifiers = ["dms.amazonaws.com"]
type = "Service"
}
}
statement {
actions = ["sts:AssumeRole"]
# By https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Security.APIRole.html,
# we also need principal `redshift.amazonaws.com`
principals {
identifiers = ["redshift.amazonaws.com"]
type = "Service"
}
}
}
Then I had giiven “dms_assume_role” two Trusty Entities
data:image/s3,"s3://crabby-images/7180a/7180aeb3b5b30937fa4c04e5c54482239ba6cc19" alt=""
Problem 3. There was still a error in Redshift load log (so many errors in AWS DMS…):
Error |
Type |
Raw Field Value |
Invalid timestamp format or value [YYYY-MM-DD HH24:MI:SS] |
timestamp |
0000-00-00 00:00:00 |
Seems the answer is here. Therefore I added “acceptanydate=true;timeformat=auto” into the “extra connection settings” in Redshift endpoint. But the error just changed to:
Error |
Type |
Raw Field Value |
Invalid data |
timestamp |
0000-00-00 00:00:00 |
After searching for almost two days, I found that the reason is in the schema of Redshift, which is automatically created by AWS DMS replication task.
CREATE TABLE my (
...
mydate TIMESTAMP DEFAULT '0000-00-00 00:00:00' NOT NULL,
...
)
Since the schema doesn’t allow “mydate” column to be null but the “acceptanydate=true” is trying to transfer “0000-00-00 00:00:00 to null”, the final error is “Invalid data” for Redshift.
The solution for this problem is: create table of Redshift manually to let “mydate” column to be “nullable”, and change the working mode of replication task to “TRUNCATE_BEFORE_LOAD”.