Before I begin, I must apologize that this post is two years late! Back when I was still in CSS supporting AlwaysOn, I wanted to get this entire series blogged, but as you can imagine CSS engineers are usually too busy to do just about anything else. Now that I’ve switch roles, I’m making more of an effort to make the time to blog on a regular basis. So please forgive me! I’ll try to do better. Picking up where we left off in the first post of this series:
Describe the log backup process on a ROSR
This is the second post in a planned series of articles relating to SQL Server Transaction Log Backups of databases using a read-only secondary replica (ROSR). Since the introduction of ROSRs in SQL Server 2012, administrators have been able to perform full database (copy only) backups as well as transaction log backups on any of the secondary replicas within the same Availability Group. This allows the administrator to reduce or eliminate resource contention between production activity and backups. This article describes the steps and communications between the primary and secondary that take place when the secondary performs a backup.
What is the process for taking a backup on an ROSR?
The following is a high-level description showing the sequence of events that must take place on both the secondary and the primary to complete a successful backup. The valid types of backups that can be taken on an ROSR are: FULL and LOG. For FULL backups, the following are supported when using the COPY_ONLY option: DATABASE, FILE, or FILEGROUP. Differential backups are not supported against secondary replicas.
- The secondary first checks to see that the BACKUP command is not inside another transaction.
- The secondary then acquires a “backup” lock on the database so other sessions on the secondary cannot do a backup at the same time.
- The secondary sends a HADR message (HadrMsgTypeBackupSyncMsg) to the primary for it to request a “backup” lock.
- The primary acquires a “backup lock” if it can.
- The primary sends a HADR message (HadrMsgTypeBackupInfoMsg ) that includes the backup LSN of where to start (the “first_lsn” stored in msdb.dbo.backupset for this backup).
- The secondary performs the requested backup.
- The secondary sends a HADR message (HadrMsgTypeBackupInfoMsg ) back to the primary signifying backup completion and notifying the primary of the “last_lsn” backed up (the “last_lsn” stored in msdb.dbo.backupset for this backup).
- The primary issues a checkpoint.
- The primary releases the “backup lock”.
- The primary sends a HADR message (HadrMsgTypeBackupSyncMsg) which will signal the secondary the primary has checkpointed and released its lock.
- The secondary releases its “backup” lock.
- The secondary updates its MSDB database of the backup.
The next screen shot shows an Excel spreadsheet of the Xevents from both servers listed in order and labelled with the same step numbers as above so you can see in one comprehensive view the sequence of events.
|Blue||Signifies the event was captured on the Primary.|
|Orange||Signifies the event was captured on the Secondary.|
|Yellow||Messages sent from Secondary à Primary (along with the corresponding “recv” on the primary.|
|Light pink||Messages sent from Primary à Secondary (along with the corresponding “recv” on the secondary.|
|Light blue||Locks and checkpoints during the process.|
|Green||TSQL Statements issued: “backup” & “insert”
(into the msdb backupset table).
What do the XEvents look like for tracking this process?
The next screen shot is from the Extended Events viewer in SSMS for the Secondary. It shows most of the XEvents captured during the conversation. Some were omitted because they are not covered in this article.
This is a screen shot of the Xevents from the Primary’s perspective. (Again a few have been filtered out for clarity of this article.)
Zooming in on a subset of the Xevents that show the request from the secondary to the primary that essentially says – “I want to do a backup. Tell me if it’s okay and where to start.”
- You can see the secondary “Send” the HadrMsgTypeBackupSyncMsg (step 3 from above) on the left. Correspondingly you can see the “Recv” of that same message on the primary (right).
- In the blue square boxes, you can see the primary on the right “Send” the backup LSN “00000028:00000090:0001”, and the secondary “Recv” it on the left (Step 5 from above).
- Finally, after the secondary completes the backup, in the black ovals we see the secondary “Send” the backup LSN “00000028:00000120:0001” to the primary, and the primary “Recv” it. This is Step 7 above.
Okay, I see the backup LSN go back and forth, how can I relate that to my backups and the LSNs I see in MSDB tables? They don’t look the same.
In the Xevent screen shots above we saw two backup LSNs sent back and forth. The first was from the primary to the secondary: “00000028:00000090:0001”. The second was from the secondary to the primary indicating where it finished: “00000028:00000120:0001”. These two LSNs are the same exact LSNs as can be seen in MSDB.dbo.BackupSet for this backup – except the LSN in the Xevent is HEX based and the one in BackupSet is Decimal based.
This is where part 1 of this blog series comes in handy. If we take the various components of the HEX based LSN and convert them to decimal, we’ll see they are the same.
To convert 00000028:00000090:0001 to decimal, take each section and convert to decimal.
|Hex value||Decimal value|
After left padding the second two values with 0s we get a “decimal” based LSN:
Sometimes you will also see the decimal LSN in the following format – still with colon separators:
I have attached a script called ConvertHexLSN_to_DecLSN.sql to this article that contains two TSQL functions that together, will convert a Hex based LSN to decimal format. They are useful when you want to compare LSNs you see in Xevents or from fn_dblog() to LSNs you see in HADR related DMVs, backupsets or the HADR dashboards. Feel free to use it. It is _not_ guaranteed to be bug free and should only be used for testing purposes.
I have included here a zip file that contains two SQL files that i used in writing the article. The first one mentioned above, to convert Hex LSNs to Dec LSNs and the other to do the event captures. The event captures (BackupEvents.SQL) assumes that the database ID is the same on both the primary and secondary – which it may not be. Adjust the script accordingly if you want to try and capture the events yourself. I also included two XEL files from when I captured the events if you just want to load them and look through them. Feel free to contact me if you have any questions or comments about the scripts: firstname.lastname@example.org.
When performing a log backup on an AlwaysOn read-only secondary replica, there is a series of communications that take place between the secondary and the primary to ensure only 1 log backup at a time is being done. We can see the conversation by setting up an extended event session to look at specific AlwaysOn messages that go back and for—which include the primary telling the secondary where to start, and the secondary telling the primary where it finished.
Coming Up: Part 3: Various transaction log backup scenarios
What happens if I attempt to perform a transaction log backup on a AlwaysOn ROSR that is behind? Disconnected? From multiple secondary replicas at the same time? These are some of the scenarios we will discuss in the next post – which I promise will be next month! 😉 Remember to seek well and prosper!