There are many situations in which the maintenance task fails on VDP. This article applies specifically to VDP integrated with Data Domain where the DD OS version is 6.1 or later.
The checkpoint and HFS check tasks were completing without issues:
# dumpmaintlogs --types=cp | grep "<4"
2018/03/19-12:01:04.44235 {0.0} <4301> completed checkpoint maintenance
2018/03/19-12:04:17.71935 {0.0} <4300> starting scheduled checkpoint maintenance
2018/03/19-12:04:40.40012 {0.0} <4301> completed checkpoint maintenance
# dumpmaintlogs --types=hfscheck | grep "<4"
2018/03/18-12:00:59.49574 {0.0} <4002> starting scheduled hfscheck
2018/03/18-12:04:11.83316 {0.0} <4003> completed hfscheck of cp.20180318120037
2018/03/19-12:01:04.49357 {0.0} <4002> starting scheduled hfscheck
2018/03/19-12:04:16.59187 {0.0} <4003> completed hfscheck of cp.20180319120042
The garbage collection (GC) task, however, was failing:
# dumpmaintlogs --types=gc --days=30 | grep "<4"
2018/03/18-12:00:22.29852 {0.0} <4200> starting scheduled garbage collection
2018/03/18-12:00:36.77421 {0.0} <4202> failed garbage collection with error MSG_ERR_DDR_ERROR
2018/03/19-12:00:23.91138 {0.0} <4200> starting scheduled garbage collection
2018/03/19-12:00:41.77701 {0.0} <4202> failed garbage collection with error MSG_ERR_DDR_ERROR
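Scanning many days of `dumpmaintlogs` output by eye is tedious, so the outcome of each task can also be summarized programmatically. Below is a minimal Python sketch. It assumes every line follows the layout shown above: a timestamp, `{0.0}`, a `<NNNN>` message code, then free text. The helper name `summarize_maint_log` is hypothetical. The sketch classifies lines only by their leading word ("completed" or "failed"), not by any documented table of message codes.

```python
import re
from typing import List, Tuple

# Assumed line layout, taken from the dumpmaintlogs output above:
#   <timestamp> {0.0} <NNNN> <message text>
LINE_RE = re.compile(r"^(?P<ts>\S+) \{[^}]*\} <(?P<code>\d+)> (?P<msg>.*)$")


def summarize_maint_log(lines: List[str]) -> List[Tuple[str, str, str, str]]:
    """Return (timestamp, code, outcome, detail) for each completed or failed task.

    "starting ..." lines are ignored; classification uses the first word of the
    message text, not a documented list of message codes.
    """
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip command prompts and unrelated lines
        msg = m.group("msg")
        if msg.startswith("failed"):
            # Keep only the token after "with error", e.g. MSG_ERR_DDR_ERROR.
            _, _, err = msg.partition("with error")
            results.append((m.group("ts"), m.group("code"), "FAILED", err.strip() or msg))
        elif msg.startswith("completed"):
            results.append((m.group("ts"), m.group("code"), "OK", msg))
    return results


if __name__ == "__main__":
    sample = """\
# dumpmaintlogs --types=gc --days=30 | grep "<4"
2018/03/19-12:00:23.91138 {0.0} <4200> starting scheduled garbage collection
2018/03/19-12:00:41.77701 {0.0} <4202> failed garbage collection with error MSG_ERR_DDR_ERROR
2018/03/19-12:04:40.40012 {0.0} <4301> completed checkpoint maintenance
"""
    for row in summarize_maint_log(sample.splitlines()):
        print(row)
```

Output from several `--types` values can be concatenated and passed in together, because the sketch relies only on the shared line format.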
The ddrmaint.log file under /usr/local/avamar/var/ddrmaintlogs had the following entries:
Mar 18 12:00:31 VDP01 ddrmaint.bin[14667]: Error: gc-finish::remove_unwanted_checkpoints: Failed to retrieve snapshot checkpoints: LSU: avamar-1488469814 ddr: data-domain.home.local(1), DDR result code: 5009, desc: I/O error
Mar 18 12:00:34 VDP01 ddrmaint.bin[14667]: Info: gc-finish:[phase 4] Completed garbage collection for data-domain.home.local(1), DDR result code: 0, desc: Error not set
Mar 19 12:00:35 VDP01 ddrmaint.bin[13409]: Error: gc-finish::remove_unwanted_checkpoints: Failed to retrieve snapshot checkpoints: LSU: avamar-1488469814 ddr: data-domain.home.local(1), DDR result code: 5009, desc: I/O error
Mar 19 12:00:39 VDP01 ddrmaint.bin[13409]: Info: gc-finish:[phase 4] Completed garbage collection for data-domain.home.local(1), DDR result code: 0, desc: Error not set
In other words, GC was failing because ddrmaint could not retrieve the snapshot checkpoint list from the Data Domain (DDR result code 5009, I/O error).
The get checkpoint list operation (cplist) was failing as well:
Mar 20 11:16:50 VDP01 ddrmaint.bin[27852]: Error: cplist::body - auto checkpoint list failed result code: 0
Mar 20 11:16:50 VDP01 ddrmaint.bin[27852]: Error: <4750>Datadomain get checkpoint list operation failed.
Mar 20 11:17:50 VDP01 ddrmaint.bin[28021]: Error: cplist::execute_cplist: Failed to retrieve snapshot checkpoints from LSU: avamar-1488469814, ddr: data-domain.home.local(1), DDR result code: 5009, desc: I/O error
Mar 20 11:17:50 VDP01 ddrmaint.bin[28021]: Error: cplist::body - auto checkpoint list failed result code: 0
Mar 20 11:17:50 VDP01 ddrmaint.bin[28021]: Error: <4750>Datadomain get checkpoint list operation failed.
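The `DDR result code` and `desc` fields in ddrmaint.log identify the underlying Data Domain error. They can be extracted mechanically to confirm that the GC path and the cplist path fail the same way. Below is a minimal Python sketch. It assumes the syslog-style layout shown above. The names `ddr_errors` and `DdrError` are hypothetical. The sketch reports only `Error:` entries with a non-zero result code. It skips the `gc-finish:[phase 4]` Info lines, which log code 0 even after the snapshot retrieval failed.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple

# Assumed layout, taken from the ddrmaint.log lines above, e.g.
#   ... ddrmaint.bin[14667]: Error: gc-finish::remove_unwanted_checkpoints: ... DDR result code: 5009, desc: I/O error
ENTRY_RE = re.compile(
    r"ddrmaint\.bin\[(?P<pid>\d+)\]: (?P<level>\w+): (?P<func>[\w:-]+)"
    r".*DDR result code: (?P<code>\d+), desc: (?P<desc>.*)$"
)


class DdrError(NamedTuple):
    pid: int
    func: str
    code: int
    desc: str


def ddr_errors(lines: Iterable[str]) -> List[DdrError]:
    """Collect Error-level entries that report a non-zero DDR result code."""
    found = []
    for line in lines:
        m = ENTRY_RE.search(line)
        if not m or m.group("level") != "Error":
            continue  # Info lines and lines without a DDR result code are ignored
        code = int(m.group("code"))
        if code != 0:
            found.append(DdrError(int(m.group("pid")), m.group("func").rstrip(":"),
                                  code, m.group("desc").strip()))
    return found


if __name__ == "__main__":
    log = [
        "Mar 18 12:00:31 VDP01 ddrmaint.bin[14667]: Error: gc-finish::remove_unwanted_checkpoints: Failed to retrieve snapshot checkpoints: LSU: avamar-1488469814 ddr: data-domain.home.local(1), DDR result code: 5009, desc: I/O error",
        "Mar 18 12:00:34 VDP01 ddrmaint.bin[14667]: Info: gc-finish:[phase 4] Completed garbage collection for data-domain.home.local(1), DDR result code: 0, desc: Error not set",
        "Mar 20 11:17:50 VDP01 ddrmaint.bin[28021]: Error: cplist::execute_cplist: Failed to retrieve snapshot checkpoints from LSU: avamar-1488469814, ddr: data-domain.home.local(1), DDR result code: 5009, desc: I/O error",
    ]
    errors = ddr_errors(log)
    for e in errors:
        print(e.pid, e.func, e.code, e.desc)
    # Tally (code, desc) pairs to see whether both paths hit the same DDR error.
    print(Counter((e.code, e.desc) for e in errors))
```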
On the Data Domain, the MTree (LSU) for this VDP server showed that none of the checkpoint snapshots had expired:
# snapshot list mtree /data/col1/avamar-1488469814
Snapshot Information for MTree: /data/col1/avamar-1488469814
----------------------------------------------
Name Pre-Comp (GiB) Create Date Retain Until Status
----------------- -------------- ----------------- ------------ ------
cp.20171220090039 128533.9 Dec 20 2017 09:00
cp.20171220090418 128543.0 Dec 20 2017 09:04
cp.20171221090040 131703.8 Dec 21 2017 09:00
cp.20171221090415 131712.9 Dec 21 2017 09:04
.
cp.20180318120414 161983.7 Mar 18 2018 12:04
cp.20180319120042 162263.9 Mar 19 2018 12:01
cp.20180319120418 162273.7 Mar 19 2018 12:04
cur.1515764908 125477.9 Jan 12 2018 13:49
----------------- -------------- ----------------- ------------ ------
Snapshot Summary
-------------------
Total: 177
Not expired: 177
Expired: 0
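The snapshot listing can be checked for stale checkpoints by parsing each row's name and creation date. Below is a minimal Python sketch. It assumes the column layout above: name, pre-comp GiB, and a `Mon DD YYYY HH:MM` create date, with the Retain Until and Status columns empty as in this output. The function names and the `max_age_days` threshold are hypothetical. The threshold is an arbitrary illustration, not a VDP retention setting. Ages are measured from the newest snapshot in the listing.

```python
from datetime import datetime
from typing import List, Tuple

Snapshot = Tuple[str, float, datetime]


def parse_snapshot_list(lines: List[str]) -> List[Snapshot]:
    """Parse rows like 'cp.20171220090039 128533.9 Dec 20 2017 09:00'."""
    snaps = []
    for line in lines:
        parts = line.split()
        # Keep only data rows; headers, dashes, summary and '.' elision lines are skipped.
        if len(parts) < 6 or not parts[0].startswith(("cp.", "cur.")):
            continue
        created = datetime.strptime(" ".join(parts[2:6]), "%b %d %Y %H:%M")
        snaps.append((parts[0], float(parts[1]), created))
    return snaps


def old_checkpoints(snaps: List[Snapshot], max_age_days: int) -> List[str]:
    """Names of cp.* snapshots older than max_age_days, measured from the newest snapshot."""
    newest = max(created for _, _, created in snaps)
    return [name for name, _, created in snaps
            if name.startswith("cp.") and (newest - created).days > max_age_days]


if __name__ == "__main__":
    listing = """\
Name Pre-Comp (GiB) Create Date Retain Until Status
cp.20171220090039 128533.9 Dec 20 2017 09:00
cp.20180319120418 162273.7 Mar 19 2018 12:04
cur.1515764908 125477.9 Jan 12 2018 13:49
"""
    snaps = parse_snapshot_list(listing.splitlines())
    print(old_checkpoints(snaps, max_age_days=30))
```

In this output, checkpoints from December 2017 were still present on the MTree in March 2018. This matches the "Expired: 0" summary.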
Because of this, all the recent checkpoints on VDP were marked invalid:
# cplist
cp.20180228120038 Wed Feb 28 12:00:38 2018 invalid --- --- nodes 1/1 stripes 76
.
cp.20180318120414 Sun Mar 18 12:04:14 2018 invalid --- --- nodes 1/1 stripes 76
cp.20180319120042 Mon Mar 19 12:00:42 2018 invalid --- --- nodes 1/1 stripes 76
cp.20180319120418 Mon Mar 19 12:04:18 2018 invalid --- --- nodes 1/1 stripes 76
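To confirm quickly whether any checkpoint is still valid, the `cplist` output can be reduced to a name-to-status map. Below is a minimal Python sketch. It assumes the layout shown above, where the validity word is the seventh whitespace-separated field, after the name and five date fields. The function name `checkpoint_status` is hypothetical. The sketch only recognizes the literal word `invalid` as printed here and makes no assumption about any other values `cplist` may print.

```python
from typing import Dict, List


def checkpoint_status(lines: List[str]) -> Dict[str, str]:
    """Map checkpoint name -> validity field from cplist output lines.

    Assumes: name, five date fields (e.g. 'Wed Feb 28 12:00:38 2018'), then validity.
    """
    status = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 7 and parts[0].startswith("cp."):
            status[parts[0]] = parts[6]
    return status


if __name__ == "__main__":
    out = """\
# cplist
cp.20180228120038 Wed Feb 28 12:00:38 2018 invalid --- --- nodes 1/1 stripes 76
cp.20180319120418 Mon Mar 19 12:04:18 2018 invalid --- --- nodes 1/1 stripes 76
"""
    status = checkpoint_status(out.splitlines())
    invalid = [name for name, s in status.items() if s == "invalid"]
    print(f"{len(invalid)} of {len(status)} checkpoints are invalid")
```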
In this case, the VDP version was 6.1.x and the Data Domain OS (DD OS) version was 6.1:
# ddrmaint read-ddr-info --format=full
====================== Read-DDR-Info ======================
System name : xxx.xxxx.xxxx
System ID : Bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx4
DDBoost user : ddboost
System index : 1
Replication : True
CP Backup : True
Model number : DDxxx
Serialno : Cxxxxxxxx
DDOS version : 6.1.0.21-579789
System attached : 1970-01-01 00:00:00 (0)
System max streams : 16
DD OS 6.1 is not supported with VDP 6.1.x; 6.0.x is the last DD OS release supported for VDP.
So if your Data Domain is running DD OS 6.1.x, the options are:
> Migrate the VDP to Avamar Virtual Edition (recommended)
> Roll back DD OS to 6.0.x
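The version check above can be automated against the `ddrmaint read-ddr-info --format=full` output. Below is a minimal Python sketch. It assumes each field is printed as `Key : value`, as in the output above. The function names are hypothetical. The only rule encoded is the one stated in this article: 6.0.x is the last DD OS release supported for VDP. The sketch therefore flags newer releases only and does not claim that every older release is supported.

```python
import re
from typing import Dict, Tuple

# Per this article: 6.0.x is the last DD OS release supported for VDP.
LAST_SUPPORTED_DDOS = (6, 0)


def parse_ddr_info(text: str) -> Dict[str, str]:
    """Turn 'Key : value' lines from read-ddr-info output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            info[key.strip()] = value.strip()
    return info


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '6.1.0.21-579789'."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if not m:
        raise ValueError(f"unrecognised DD OS version: {version!r}")
    return int(m.group(1)), int(m.group(2))


def newer_than_last_supported(version: str) -> bool:
    """True if the DD OS release is newer than the last release supported for VDP."""
    return major_minor(version) > LAST_SUPPORTED_DDOS


if __name__ == "__main__":
    sample = """\
====================== Read-DDR-Info ======================
System index : 1
DDOS version : 6.1.0.21-579789
"""
    version = parse_ddr_info(sample)["DDOS version"]
    if newer_than_last_supported(version):
        print(f"{version}: newer than 6.0.x, not supported for VDP")
    else:
        print(f"{version}: 6.0.x or earlier")
```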
Hope this helps!