This week I received the Synology DS1515+ NAS I ordered, and I will go about setting it up as a home file server. In a subsequent post, I’ll write about clustering it with my DS1812+. In Part 1, I’d like to cover the following scenarios:
Part 1 (this post)
Initial setup and basics.
- RAID10 and SHR2 experiments with varying disk sizes
- Speed of change operations
- Simulated outages.
- Throughput observations
Part 2 (will post soon)
High availability with another Synology unit. In this case, with my older DS1812+
- Initial setup
- WAN considerations, if any.
Disaster/Recovery to the other Synology
- Remote site
In this 2-part series I’ll go into some advanced topics and cover things that you most likely won’t have the time to do yourself. The idea is to help you save time in protecting your family memories, digital media and other files at home or at work. Being a technology enthusiast, I happen to touch a variety of tech toys and will do my best to bring in some related experiences. Also worth noting: I’m not a Synology expert, and I will cover every detail I can as I learn.
For many years, I have been using Windows Server 2012 R2 Essentials for:
- Backup of client PCs around home, as well as for my extended family members
- File server storage for media
- Hypervisor to host such things as Minecraft servers for my kids
- Various experiments
My Windows Server based solution goes back to the Windows Home Server v1 days, and I have been upgrading to the latest versions as they come.
I have also been using Synology DS1812+ for:
- Backup target for the content stored on Windows Server 2012 R2 above
- Additional file server storage
- Various experiments with storage and with the Synology apps out there.
In my earlier blog post on this, I noted some pros/cons for each of the above (note that it covered earlier versions, though). The client backup challenge remains a big gap in the DS1515+. Frankly, I don’t get why Synology is not fixing this. The out-of-the-box sync solution is a joke, and is not a backup to begin with. The so-called backup apps are not centrally managed in DSM, which rules out deduplication, among other management functions. Windows Server 2012 R2 Essentials is leaps and bounds ahead on the backup front, but we’ll get to that a bit later.
By the end of this post series, my hope is that you’ll know whether the DS1515+ and Synology clustering are right for your needs.
With that, let’s get started.
Unboxing and First-time Setup
Here’s how the DS1515+ looks:
Now let’s power it on and see what happens. Keep in mind, the Quick Start guide lists disk installation as a required step before powering on. I will power it on without disks regardless. This will tell me a couple of things:
- Can it handle a no-disk situation gracefully?
- What functionality, if any, exists without any disks in place?
So with the unit powered on and having waited roughly about 30 seconds, on a laptop wired to the same switch, I entered:
… and promptly the following came up. If you have another host with name “diskstation” on your network, this can potentially be a problem. You’ll then probably need to go to your DHCP server and figure out the IP the DS1515+ obtained, and attempt to go to that address and port 5000 instead.
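If the name does not resolve to the right box, one way to narrow things down is to check which of the addresses handed out by your DHCP server answers on port 5000. Here is a minimal Python sketch of that idea. The function names and the example addresses are mine, purely for illustration, and the only thing it assumes about the unit is what is stated above: that the setup page listens on TCP port 5000.

```python
import socket

def is_port_open(host, port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def find_candidates(hosts, port=5000, timeout=1.0):
    """Return the hosts from the list that accept connections on the port."""
    return [h for h in hosts if is_port_open(h, port, timeout)]

# Example usage (hypothetical addresses, take the real ones from your DHCP leases):
# for host in find_candidates(["192.168.1.20", "192.168.1.21"]):
#     print(f"http://{host}:5000 is answering")
```

An open port 5000 only tells you that something is listening there, so you would still confirm in the browser that it is the DS1515+.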
Now that we know how it behaves without disks, let’s go ahead and insert one, a 750GB drive, and see what happens. Note that I will report whether or not I power-cycle the unit. For now, we’re keeping it ON and hot-inserting the drives as they come.
At this point I should note that I do appreciate the drive trays – they don’t require any screws. These things just hold the drive in place, making it super easy to replace. Kudos to Synology for this improvement.
It’s been 60 seconds or so, let’s click on that “Connect again” button now:
It did recognize the disk and launched the welcome wizard. Clicking next gets me to:
I will let it download the latest DSM per its recommendation… From there we got to this point:
Note that I unchecked the SHR volume creation, as I intend to play with it a lot more.
After final confirmation, off it begins.
The formatting phase took about 3-5 seconds. From there it started downloading… I have a 50Mbit/s download service at home, and…
It took about 8 minutes to download and reach this phase:
At the end of the download phase, fans maxed out – perhaps it was doing an OS reset inside, then came down to normal levels.
All in all, in 10 minutes total, I have this logon prompt:
Just in case it isn’t obvious to some readers, “T” is the name I gave to this install.
Upon first logon, the system asks how it should go about updates. Well folks, if you have run any sort of real system, you know that you don’t take this decision lightly. Personally, I never allow uncontrolled/untested updates. The Synology forums are full of reports of automatic updates breaking people’s systems, causing lengthy troubleshooting efforts and/or permanent data loss. Yes, you always want to be notified of updates, but before actually applying them you need to have a backup in place. With that, of course I picked the right option for me.
Testing if Synology DS1515+ is stateless or not
I have been curious whether the Synology unit is stateless or not. Meaning, can I simply swap disks and immediately run a completely different install without going through the full reconfiguration process? You know, just as we can on a standard PC.
For this purpose, I’ll log on, shut down, then remove the disk and power on again.
Action: Shutdown, remove all disks, Power On.
Let’s see how it behaves:
Right now there are no disks in it. The name “T” is gone, replaced with a generic “DISKSTATION”. Let me see if I can logon with the credentials I provided.
Nope. The credentials do not work (which is nice, in that the credential info is not stored on the unit itself). But then again, it’s not automatically dropping back to the out-of-box (OOBE) mode. So I’m pretty much stuck. I tried a blank password too, but I could get nowhere at this dialog.
Some quick Internet research turned up the following article, which implies that I now have a DS1515+ that is married to a specific installation of the OS. Unlike a PC/server, where all the state information is stored on disks, here we have a much deeper relationship between the hardware and the OS. We’ll accept that and move on. It is worth noting that it is possible to completely reset the unit and go back to the OOBE state. I won’t do that, as I learnt what I needed to learn.
So I will power off, insert the disk on which I have made the installation, then power back on. Note that I pressed and held the power button for about 10 seconds and like a PC, it went off.
Alright. After this little experiment, let’s move on to disk configuration flexibility. First, here is how it again shows its configured name as “T”, and sure enough it logs me in with the password I configured. All is well.
Here we begin the scenario testing.
#1. Create a volume with RAID10 protection using 4 drives, 2x750GB + 2x1TB
In 30 seconds it reached this phase:
In 60 seconds total it reached this state:
I remember now from DS1812+ tests as well, that doing any volume operation with Synology was quite slow. At any rate, in another 5-6 minutes it finished optimization phase as well.
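As a sanity check on what this mix of disks should yield, the usable size can be estimated up front. The sketch below assumes classic RAID10 behaviour: every member is truncated to the size of the smallest disk, and half of the raw space goes to mirror copies. I have not verified that DSM lays things out exactly this way, and the system partitions and file system overhead are not modelled, so treat the result as an upper bound.

```python
def raid10_usable_gb(disk_sizes_gb):
    """Estimate usable RAID10 capacity in decimal GB.

    Assumes an even number of disks (at least 4), each truncated to the
    smallest member, with half of the raw space used for mirror copies.
    """
    n = len(disk_sizes_gb)
    if n < 4 or n % 2:
        raise ValueError("this sketch expects an even number of disks, 4 or more")
    return min(disk_sizes_gb) * n / 2

def gb_to_tib(gb):
    """Convert decimal gigabytes (10**9 bytes) to binary tebibytes (2**40 bytes)."""
    return gb * 10**9 / 2**40

disks = [750, 750, 1000, 1000]  # the four drives used in this test
usable = raid10_usable_gb(disks)
print(f"{usable:.0f} GB usable, about {gb_to_tib(usable):.2f} TiB")
```

Under these assumptions the estimate is 1500 GB, or about 1.36 in binary units, and the extra 250GB on each of the 1TB drives goes unused. That is in the same ballpark as the 1.33TB volume size that shows up later during the repair test.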
Keeping in mind that the Synology has only its LAN1 port connected to a gigabit switch, I will go ahead and attempt a large file copy:
|Volume: 2×750+2×1000 RAID10||Take 1||Take 2|
|Single 13GB XCopy (Windows Server 2012 R2 to DS1515+ on same switch, gigabit wired network)||avg. of 120MB/sec throughput||avg. of 121MB/sec throughput|
Here’s the Performance Monitor view from the sending host, Windows Server 2012 R2:
122Mbytes/sec maximum. The average shown there does not reflect the actual figure, because it’s the average of the current view, which includes a few seconds of inactivity after completion (i.e. the actual average is higher). Further, note that the two lines use different scales (the green one is 100 times the red one). In short, all is well here. We can saturate the single LAN1 connection consistently for as long as it takes to copy a 13GB file. So there is no write-back cache skewing the throughput, at least not at gigabit speed.
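To put "saturate" into numbers, here is a small Python sketch that compares the measured rates with the raw gigabit line rate of 125 MB/sec. It assumes the figures above are decimal megabytes per second and ignores protocol overhead, so read the percentages as rough indicators only.

```python
GIGABIT_BYTES_PER_SEC = 1_000_000_000 / 8  # 125 MB/sec raw line rate

def link_utilization(throughput_mb_s, line_rate_bytes_s=GIGABIT_BYTES_PER_SEC):
    """Fraction of the raw line rate used by a measured throughput (decimal MB/sec)."""
    return throughput_mb_s * 1_000_000 / line_rate_bytes_s

def copy_seconds(size_gb, throughput_mb_s):
    """Seconds needed to move size_gb (decimal GB) at throughput_mb_s."""
    return size_gb * 1000 / throughput_mb_s

for label, rate in [("Take 1", 120), ("Take 2", 121), ("Peak", 122)]:
    print(f"{label}: {link_utilization(rate):.1%} of line rate, "
          f"13GB in about {copy_seconds(13, rate):.0f} s")
```

All three figures land at 96% of the raw line rate or above, which is why I consider the single link saturated.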
Configuring Link Aggregation (IEEE 802.3ad, Bonding)
Next up, let’s connect the 2nd Ethernet cable and enable link aggregation. My switch doesn’t do link aggregation, but it doesn’t hurt to test how the unit behaves.
Here, both ports are connected:
Choose Create / Create Bond:
Then enable Link Aggregation using default settings:
From there it takes about 30 seconds:
…and the admin console refreshes and the page reloads. On the plus side, despite the connected switch not supporting link aggregation, it didn’t fail to reconnect. So kudos for handling this scenario and still falling back to normal operations. Here’s how things look. To be perfectly clear, the status below is expected: my $30 switch does not support 802.3ad. But now we know how the unit behaves and how it reports, and we also know it’s not intrusive to network operations.
Further, I have performed above tests again – performance did not drop. Now let’s do some cable pull tests.
Pull LAN1 cable:
All systems functional. Copy operations succeeded at the same speed as before.
(while 13GB copy ongoing) Insert LAN1, pull LAN2 cable:
All systems functional. Copy operations continued/succeeded at the same speed as before. Not even a blip on the throughput. This is good. I wasn’t expecting this level of transparent failover capability. I’ll leave it like this for normal operations later. Who knows, maybe I’ll add a better switch that can do 802.3ad.
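If you want to repeat the cable pull test and look for blips without staring at Performance Monitor, a copy loop that logs throughput per second is one option. The Python sketch below is not what I used for the tests above (those were plain XCopy runs); the function name and the example paths are made up for illustration. Note also that it measures what the application hands to the operating system, so caching can smooth over very short stalls.

```python
import time

def copy_with_throughput(src, dst, chunk_size=4 * 1024 * 1024, interval=1.0):
    """Copy src to dst and return a list of MB/sec samples, roughly one per interval.

    A sample well below the others would be the "blip" a cable pull might cause.
    """
    samples = []
    window_bytes = 0
    window_start = time.monotonic()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            window_bytes += len(chunk)
            elapsed = time.monotonic() - window_start
            if elapsed >= interval:
                samples.append(window_bytes / elapsed / 1_000_000)
                window_bytes = 0
                window_start = time.monotonic()
    # Record whatever is left in the final, partial window.
    elapsed = time.monotonic() - window_start
    if window_bytes and elapsed > 0:
        samples.append(window_bytes / elapsed / 1_000_000)
    return samples

# Example usage (hypothetical paths):
# rates = copy_with_throughput(r"D:\big.iso", r"\\T\share\big.iso")
# print(min(rates), max(rates))
```

Comparing the minimum and maximum samples of a run during which a cable was pulled gives a quick indication of whether the failover was really transparent.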
#2. Disk pull test in RAID10
With that, here I’m pulling one of the 1TB disks. First thing is that it starts to BEEP. To silence it, I went here and clicked that button:
Next, I reinserted the same disk into its slot within about 20 seconds.
The observation here is that Synology doesn’t like such things as a temporary loss of a disk. It immediately decided that the disk was bad and marked it as non-initialized, as seen here:
From there I went to see how our Volume is doing:
As expected, it is degraded. Next up is to see how fast it repairs – to initiate repair, you click on Manage button and choose Repair, then choose a disk. As in:
2 minutes in…
After 6-7 minutes…
While this repair process is ongoing, let’s repeat the same file copy operation and see if performance drops:
Looks like it doesn’t drop at all. We can still saturate the gigabit connection. Keep in mind that a RAID rebuild always carries some overhead. We are on RAID10 (the fastest level with redundancy), and without a link-aggregating switch we cannot saturate disk I/O, so the absence of a drop here does not mean rebuild overhead never shows. If you have tests that show the impact under those circumstances, drop me a link in the comments and I’ll add them here.
At this point the repair process has been running for 90 minutes and has reached 46% on the 1.33TB volume. It really is slow. The next test is to pull another disk while the repair is ongoing and see what happens:
Action: I pulled disk #2. Note that RAID10 is a stripe of mirrors (see http://www.thegeekstuff.com/2011/10/raid10-vs-raid01/)
In this case, there is no indication of which disk is mirroring which, so there is some chance that I pull the wrong disk and kill the volume. Let’s see:
OK, I got lucky. The volume is still online and able to take file operations at the same speed as before. For those who couldn’t follow, here’s how we reached this state:
- Insert 2×750 + 2×1000 disks.
- Create a RAID10 volume spanning across all 4
- Remove Disk #5 (one of the 1000 disks)
- Insert Disk #5 back, choose Repair
- 90 minutes later, (before repair could complete), pull Disk #2 (one of the 750 disks)
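As an aside on repair speed, the progress figure from the repair (46% after 90 minutes) allows a rough estimate of how long the full repair would have taken if left alone. This is a simple linear extrapolation in Python, and it assumes the rebuild proceeds at a constant rate, which I have not verified.

```python
def estimate_repair(elapsed_min, percent_done):
    """Linear extrapolation of total and remaining repair time, in minutes.

    Assumes the rebuild proceeds at a constant rate, which real arrays
    only approximate.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    total = elapsed_min * 100 / percent_done
    return total, total - elapsed_min

total, remaining = estimate_repair(elapsed_min=90, percent_done=46)
print(f"estimated total: {total:.0f} min, remaining: {remaining:.0f} min")
```

That works out to a bit over three hours in total, with more than half of it still to go at the moment I pulled the second disk.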
Given that RAID10 is a stripe of mirrors, it is inherently vulnerable to a 2nd disk outage, but as evidenced here, not every 2-disk outage takes the volume down. Another observation is that the DS1515+ did NOT start to beep.
I will go ahead and remove another disk. This will surely kill it. Why am I sure? Well, this is the 3rd disk of a 4-disk RAID10 volume. It is for sure going down.
Action: Remove disk #3
Yes it did. This time beeping started as well. Volume went down.
It also suggested a process called “File System Check”, which I will ignore for now.
The next question: the data should still be on the disks. Can I shut down, insert all the disks back and hope for the best? Trying that:
Action: Shutdown. While offline, insert pulled disks, power on. What comes next is interesting:
At this point, the volume is… online. Keep in mind, if I were to reinsert the disks while hot, they’d get marked as non-initialized. So having it see the disks present during boot made a difference. I did not know this.
Here’s the view from Storage Manager.
Note how “System Partition Failed” is noted in those disks? Well, let’s check all other disks:
So the 1st disk I inserted, the one that is NOT part of this RAID10 volume or the disk-pull tests, is happy and remains “initialized”. It looks like it holds a healthy system partition copy as well.
Results of RAID10 disk pull tests can be summarized as:
- Synology can survive 2 disk outages in RAID10 if the two disks happen to belong to different mirror sets. I have found no way to identify which disk belongs to which mirror set inside a RAID10 volume.
- There is no observable performance impact from disk outages and rebuilds in RAID10, up to a gigabit connection. I was unable to test higher transfer rates.
- If you lose multiple disks for any reason, consider shutting down the unit before reinserting them. There is a higher chance that it’ll automatically remount the volume and start the repair process.
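How lucky do you have to be with a second disk? For a 4-disk stripe of mirrors this can be worked out by enumerating every combination of failed disks. The Python sketch below does that. The disk labels and the pairing are hypothetical, since DSM did not show me which slots mirror each other, and it treats every combination as equally likely.

```python
from itertools import combinations

def surviving_fraction(mirror_pairs, failures):
    """Fraction of failure combinations a stripe of mirrors survives.

    The volume survives as long as every mirror pair keeps at least one disk.
    """
    disks = [d for pair in mirror_pairs for d in pair]
    combos = list(combinations(disks, failures))
    survived = sum(
        1 for failed in combos
        if all(any(d not in failed for d in pair) for pair in mirror_pairs)
    )
    return survived / len(combos)

# Hypothetical pairing: two mirror sets of two disks each.
pairs = [("A1", "A2"), ("B1", "B2")]
for n in (1, 2, 3):
    print(f"{n} disk(s) lost: survives {surviving_fraction(pairs, n):.0%} of combinations")
```

With two disks lost, 4 of the 6 possible combinations leave the volume online, so the odds were about two in three in my favor. With three disks lost, no combination survives, which matches what happened in the test.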
Now let’s check SHR2: Synology Hybrid RAID with redundancy for up to 2 disk outages.
Here’s how I created the SHR2 volume:
For starters, I initiated file copy process again. Results are similar to RAID10. It can take 122MB/sec sustained without any problem.
The question is, what happens when 1 disk is out, and then a 2nd disk goes out as well? From the following initial state…
I proceed to pull one disk.
Action: Remove disk #2
I then attempted the same copy process as before. Performance dropped significantly, by about 50%. From 122MB/sec in the RAID10 disk pull scenario, we’re now looking at:
63MB/sec in SHR2 with one disk out.
Now on to the 2nd disk outage.
Action: Remove disk #3
Same as single disk outage.
I then reinserted both disks and repeated the copy process again. The repair process didn’t impact the performance any further; it stayed at 63MB/sec.
As a summary of the SHR2 disk pull test, here’s the performance evaluation test table:
|Volume: 2×750+2×1000 SHR2 (Windows Server 2012 R2 to DS1515+ on same switch, gigabit wired network)||Single 13GB XCopy throughput|
|1 disk pulled||63MB/sec|
|2 disks pulled||63MB/sec|
|Both disks reinserted and repair in progress||63MB/sec|
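For the record, the "about 50%" drop can be computed from the two measured rates: 122MB/sec with the volume healthy and 63MB/sec in every degraded state. A tiny Python sketch, with variable names of my own choosing:

```python
def percent_drop(baseline, degraded):
    """Relative throughput loss, as a percentage of the baseline."""
    return (baseline - degraded) / baseline * 100

healthy_mb_s = 122   # sustained rate measured with all disks present
degraded_mb_s = 63   # rate with one or two disks pulled, and during repair
print(f"drop: {percent_drop(healthy_mb_s, degraded_mb_s):.0f}%")
```

The exact figure is a 48% drop, so the same 13GB copy takes roughly twice as long on a degraded SHR2 volume.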
With the RAID10 and SHR2 scenarios covered, I’ll end Part 1 here and will post the clustering setup notes in a few weeks. Stay tuned. If there are any questions/suggestions/corrections, please post a comment and I’ll be happy to address them.