April 21, 2017
Performing distcp with Namenode HA
April 21, 2017
SQL Server 2016 Standard Edition now supports what was an Enterprise Edition-only level feature – Availability Groups! Of course, there are limitations imposed from what you get in Enterprise, and I will cover those later. I am excited as one of our customers wants to start using this feature in production, so I figured I'd better get up close and personal with this new offering and set up a test environment to become familiar with the gotchas before implementing this for real. But first, I needed an environment with room for at least three VMs to play with this. What options do I have?
I decided it was time to get down and dirty with Azure VM and crack open the seal of my MSDN subscription Azure credits. I'd used Azure storage and backup before, but not VM as of yet. With my extensive VMware experience, it was very easy to jump in and get right to work. I created three VMs in short order – one domain controller and two SQL Servers. I realized you can create SQL 2016 AGs without a domain, but since the customer will be using a domain, I wanted to be familiar with this method. I used Windows Server 2012 R2 as that is also what the customer is employing. I also need to keep in mind that they are using linked servers, so I need to be sure to enable the options for WITH DTC_SUPPORT = PER_DB or the equivalent checkbox in the GUI setup as well as install patch KB3090973 for Windows Server 2012 R2. Reference: https://blogs.technet.microsoft.com/dataplatform/2016/01/25/sql-server-2016-dtc-support-in-availability-groups/
Being pretty inexperienced with Azure, after creating the three VMs, creating the domain, joining the SQL VMs to the domain, and installing SQL on both VMs, I tried to create the cluster. As Murphy predicted, it failed. The reason? There is a construct called an Availability Set in Azure that you need to set up prior to creating your virtual machines in order for clustering to work. As defined by the link below: https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-portal-sql-alwayson-availability-groups-manual?toc=%2fazure%2fvirtual-machines%2fwindows%2ftoc.json
"Before creating virtual machines, you need to create availability sets. Availability sets reduce downtime for planned or unplanned maintenance events. An Azure availability set is a logical group of resources that Azure places on physical fault domains and update domains. A fault domain ensures that the members of the availability set have separate power and network resources. An update domain ensures that members of the availability set are not brought down for maintenance at the same time."
So instead of digging in manually and attempting to set this all up with my existing knowledge, I read a bit more of the previously linked article above, which is the manual set up version of the instructions. It does state:
"Completing this tutorial takes a few hours because you must build and configure several Azure virtual machines. You can also automatically build this entire solution. In the Azure Portal, there is a gallery setup for Always On availability groups with a listener. The gallery setup configures everything you need for availability groups automatically."
Sounded good to me, so I chose the template for 5 VMs with a listener. It prompts you for the domain name, AG name, listener details, and everything else you would set manually and spawns it all for you in much less time than a few hours. It did take some time for it all to appear and configure (maybe an hour?), but it did eventually complete after some patience.
However, I had a meeting to attend while this was all building, so I let it keep running. The meeting was at the end of the day, so I skedaddled home afterwards and forgot all about the running environment until
the next morning. I logged into Azure to find out that my MSDN allocated funds for that month had been eaten up by this environment running just a few hours. Seriously? Microsoft gives you just enough credits with an MSDN subscription to give you but one lick of the Azure lollipop, but then jerks it away and replaces it with a card swiper. Alas, I was out of luck to continue testing this until the current monthly billing cycle ended.
Instead of waiting, I decided to see if I could get this running in VirtualBox on my laptop with only 8GB of RAM. I originally went to Azure as I didn't think I could get a three-VM environment running successfully on a machine with this amount of resources, but to my surprise, it was actually successful. Also, since the networking all ran within my local environment, there were no Availability Set-type constructs to worry about like in Azure.
Not ideal as 1GB RAM VMs, but they fit!
After setting up all three VMs in their particular roles while networking was configured to NAT so I could get all the necessary Windows Server and SQL Server updates, I changed the network to Host-Only so I could have things run privately within my laptop. I also created an additional Host-Only network inside VirtualBox so I could switch individual machines over to the other network to test for split-brain scenarios.
I installed the Failover Clustering feature on each VM, including the DC. The reason I installed this role on the domain controller was because I wanted to centrally manage the cluster from the console of this machine only. I also installed SQL Server Management Studio on the domain controller for the same reason. I launched Failover Cluster Manager and started the cluster validation wizard. I validated a cluster configuration and created the cluster using my fresh SQL1 and SQL2 machines. The cluster was then created after the validation was successful.
I then created a share on the domain controller to use as a witness for the cluster. This makes the cluster more reliable as this helps avoid split-brain when the primary node fails. Without that third vote, if the primary fails, the cluster is down as 50% of the votes is not a majority. I configured Full Control
permissions for the SQL1$ and SQL2$ machine accounts to this share as SQL was running under the default service accounts on these machines. The SQL service accounts need full access to read and write to these shares for the file share witness to work properly. I then used the Failover Cluster Manager to modify the quorum configuration to use the file share witness.
Can I get a witness?
Why yes. Yes I can.
Choose a file share path, but choose wisely. For while the true file share path will bring you green check marks in the validation screen, the false path will take them from you.
Now witness the firepower of this fully armed and operational battle station! I mean, cluster…
Now that the Windows Server Failover Cluster was set up, it was time to create the Availability Group. But before this can be done in SSMS, there is one odd trick I had to perform in order to enable AGs on each instance. I had to visit my old pal SQL Server Configuration Manager and Enable AlwaysOn Availability Groups at the instance level.
Checking this box is equivalent to taking the red pill
Of course, I was greeted with one of the many pleasant Microsoft dialog boxes letting me know "You knew you were going to have to bounce the instance for this, right?"
So I can bounce the service at my leisure? Good deal!
Clicking OK did not bounce the instance for me. However, if I wanted to disable AGs for this instance, it will not only require a service restart if I wanted to save the setting, but it would restart it right then and there if I clicked Yes. Beware!
So you are going to force me to bounce service, I think? This is mostly clear to the initiated. But to the uninitiated, this could catch you off-guard. Be careful hitting Yes!
Now that I had enabled both instances for AGs, it was time to open up SSMS, right-click that new AlwaysOn High Availability node and choose New Availability Group Wizard. Ah, back to the warm comfort of Management Studio at last. The first thing that the wizard told me was that I needed to take a backup of the database in question. It clearly stated "Full backup required" in the first wizard step. I was using Microsoft's Wide World Importers canned database that I had just restored, but I hadn't taken any backups yet, so I created a backup manually. Going back into the AG Setup wizard, it then stated "Meets prerequisites" and allowed me to click Next. The next screen asked for several pieces of information, the first of which were the replica instances that I wanted to use. SQL1 was already chosen as this was where the database originally resided. There was an Add Replica button as well as an Add Azure Replica button which is a compelling deployment option which I would like to try when I have MSDN credits again. For the time being, I chose Add Replica and chose the SQL2 instance.
Other options to consider are Automatic Failover, Synchronous Commit, and Readable Secondary. Automatic Failover is just as it sounds. The first question to my mind was why would I not want this? This is a high availability feature; why would I not want this to happen automatically? Not everyone uses AGs for HA. In Enterprise Edition, you can set this up as a Readable Secondary which means you can perform SELECTs as well as backups against it so you can pull both read and backup workloads off your primary replica. Also, the network between the two replicas may not be appropriate for production traffic from the primary site, so this AG could just be a warm backup of sorts. Since Standard Edition AGs don't allow for Readable Secondaries, and the fact that this feature is supposed to eliminate the aging Database Mirroring feature, I would say that most Standard Edition AGs would likely be used for HA purposes. Synchronous Commit means every update, insert, delete, alter, etc. gets committed to all secondaries before the transaction is reported to the client application as complete. Not choosing this option means that the transaction will be reported complete as soon as it commits to the primary replica and the secondaries will commit as soon as possible. For a high availability scenario, checking Synchronous Commit is what I wanted.
The Endpoints tab is for managing the AG service endpoint hostnames, ports, and service accounts for the internal communication between the replicas. Microsoft recommends using a separate network for AG internal communication, similar to Failover Cluster management traffic. I could have added second NICs to each SQL VM on the secondary Host-Only network in VirtualBox and changed either the hostname or IP address that is bound to these NICs here to use that private network, but I didn't. I could also change the ports on either side for obscurity or some other reason. If I was multi-instancing (please don't), I would need to use a separate port for each instance. I kept the defaults.
The Backup Preferences tab is where I selected how I wanted backups to be routed. Prefer Secondary will attempt to back up to a secondary replica if one is available. Otherwise, it will back up on the primary replica. Secondary Only will not allow a backup to the primary replica, even if no secondary is available. Primary will only backup on the primary replica. Any Replica will not take preference of primary or secondary. And below those options, there is an area where I could give relative weights to all replicas as well as exclude particular replicas. Of course, all these options are only available in Enterprise Edition since the primary replica can only be backed up in Standard Edition. The below screen is normally grayed out in Standard Edition.
GUI's can be awesome. This is one such example.
The Listener tab can be used to create the Availability Group Listener in this wizard, or I could choose to perform this individual step later. The Availability Group Listener is the DNS name, IP address, and port that the AG will use for client connections. I provided a unique DNS name and IP address for the listener. The port number can be 1433 or otherwise.
By choosing the Full option on the next screen, I could have the wizard automatically backup and restore the database to each replica via a network share that all the replica SQL Server Accounts have read and write access to. Or I could manually restore the database to the secondary replica servers and choose Join Only to initialize synchronization or Skip initial data synchronization if you are not ready to perform this step yet. I chose full and cheated by using the File Share Witness share I had previously created that already had all the permissions necessary.
Finally, you can validate that your AG configuration is free from errors and then create the Availability Group. Hooray!
Tune in to Part 2 of this blog where I take a hammer to various aspect of this setup to see just how robust it all is.