Health Monitoring and Disaster Recovery

  • Rob Garrett
Chapter

Abstract

As more and more organizations deploy SharePoint within an enterprise, and user adoption of the SharePoint solutions snowballs with each deployment, disaster recovery becomes very important. SharePoint 2013 has the capability to host terabytes of important data, so backup of this data is likely to be high on the agenda for any IT group that maintains SharePoint in an organization.

SharePoint includes capabilities to make backup and restoration easier than in the past, and SharePoint continues to provide good backup and restore functionality to maintain the integrity of important data. I devote a large chunk of this chapter to the administration of disaster recovery features in SharePoint 2013, but, as the title suggests, this chapter also contains details on health and monitoring of a SharePoint 2013 infrastructure.

SharePoint provides extensive logging, but versions before SharePoint 2010 lacked some key monitoring functionality that administrators require to ensure that their SharePoint solution is operating at peak performance. Administrators of SharePoint 2007 could consult the ULS logs and the Windows event log, but these features require a certain amount of proactive behavior from the SharePoint administrator. SharePoint 2010 and 2013 include sophisticated health and monitoring features to alert the administrator when SharePoint is feeling a little under the weather.

After reading this chapter, any SharePoint administrator will, I am confident, be able to give his or her organization peace of mind that its data integrity is intact and that it can count on its SharePoint service bouncing back in the event of downtime or disaster.

Planning for Disaster Recovery

It is never a happy day for the IT group when an online service goes down, and this includes SharePoint. As fantastic as SharePoint is, it is inevitable that at some point in the life cycle, your SharePoint solution will suffer from downtime. Of course, downtime may occur for any number of reasons: human error, underlying hardware failure, power outage, faulty customizations, and so on. Since failure cannot be entirely averted, your role as a SharePoint administrator is to account for such downtime and restore service to the users of the platform in a timely manner. Planning for and recovering from loss of service is what I refer to as planning for disaster recovery.

Minimizing downtime and averting loss in a disaster involves proactive processes and planning. Those unfortunate readers who have experienced loss of data are likely all too familiar with data backup, which is one aspect of disaster recovery—I will discuss managing content and data integrity shortly in this chapter. Another important aspect of disaster recovery includes techniques to minimize service downtime.

Minimizing downtime of a service factors in both the total time to recover the service and the point in time from which recovery resumes. In short, if recovery consists of restoring data in a SharePoint site collection because of database corruption, then the time to restore the database from backup and the time when the last backup took place are both important factors for the success of restoration of the SharePoint site collection. A speedy restore is one thing, but if the restored data is already three months old then, depending on how frequently the live data changes, the restoration is not necessarily successful.
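As a rough illustration of these two factors, the following sketch computes how much work is lost and when service resumes. All of the dates and the two-hour restore duration are made-up figures, not measurements from a real farm; the labels "recovery point" and "recovery time" are my own shorthand for the two factors described above.

```powershell
# Hypothetical figures for illustration only; substitute your own measurements.
$failureTime     = Get-Date '2013-06-03 14:00'   # when the corruption occurred
$lastBackupTime  = Get-Date '2013-06-02 23:00'   # most recent good backup
$restoreDuration = New-TimeSpan -Hours 2         # time needed to restore the database

# Changes made after the last backup are lost (the recovery point).
$dataLossWindow = $failureTime - $lastBackupTime

# Service stays unavailable until the restore completes (the recovery time).
$serviceRestoredAt = $failureTime + $restoreDuration

"Work lost since last backup: {0:N1} hours" -f $dataLossWindow.TotalHours
"Service restored at: {0}" -f $serviceRestoredAt
```

Running the same arithmetic against your actual backup schedule shows quickly whether a nightly backup is frequent enough for the rate at which your data changes.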

Data/content recovery is one piece of a good disaster recovery plan—restoration of system hardware, the underlying operating system, system software, and configuration are all part of the plan. Since this book is about SharePoint administration, the topics concerning hardware and operating system recovery are outside its scope, but I will say that virtualized platforms and snapshots now play a major part in alleviating many of the ills associated with hardware failure and/or operating system failure. At a conceptual (and practical) level, consider the techniques in the following sections to minimize downtime and provide warm recovery of service.

Warm recovery is the quickest form of recovery in the event of a disaster and typically involves a level of hardware and software redundancy. Conversely, cold recovery refers to the restoration of service from scratch in a completely inoperable state. Cold recovery typically involves restoration of data from an offline backup store. A good disaster recovery strategy involves both warm and cold recovery methods.

Load-Balanced Service

Load balancing involves either a hardware or a software load balancer, which intercepts all incoming web traffic on a specific IP address and redirects it to one of at least two web servers to service the request. The load balancer directs traffic either to the server with the least load (intelligent load balancing) or in turn, based on which server served the previous request (round-robin).

Load balancing serves two purposes: distributing the load of user requests, and providing warm redundancy if a server that was serving requests fails. SharePoint 2013 includes a new request manager service that intelligently manages which servers in a multiple-server farm handle which requests. I shall discuss the request manager later in this chapter.

Load balancing SharePoint consists of pointing a configured load balancer to multiple front-end SharePoint servers in the farm that serve pages. A SharePoint farm may include as many front-end web and application servers as the infrastructure can provide; thus, scaling out to handle more traffic is simply a case of adding a new web server to the farm and registering the IP with the load balancer.

As well as providing for distributed load, most load balancers can detect if one of the servers in the pool is not responding and then redirect all traffic to the other responding servers. Large enterprise organizations that have the capability to host different servers in multiple geographic locations may redirect traffic to passive SharePoint servers, or completely mirrored SharePoint farms, to achieve redundancy and rapid recovery if a primary site hosting the main SharePoint infrastructure fails.
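To make the round-robin and failure-detection behavior concrete, here is a minimal sketch of the selection logic in plain PowerShell. It is not how any particular load balancer is implemented; the server names, the Responding flag, and the Get-NextServer function are all hypothetical.

```powershell
# Hypothetical pool of front-end servers; names and health flags are made up.
$pool = @(
    [pscustomobject]@{ Name = 'WFE1'; Responding = $true  }
    [pscustomobject]@{ Name = 'WFE2'; Responding = $false }
    [pscustomobject]@{ Name = 'WFE3'; Responding = $true  }
)
$script:next = 0   # index of the server that is next in turn

function Get-NextServer {
    # Try each server at most once, starting with the one that is next in turn.
    for ($i = 0; $i -lt $pool.Count; $i++) {
        $candidate = $pool[($script:next + $i) % $pool.Count]
        if ($candidate.Responding) {
            # Remember where to start for the following request.
            $script:next = ($script:next + $i + 1) % $pool.Count
            return $candidate.Name
        }
    }
    throw 'No server in the pool is responding.'
}

# Four requests in a row: the non-responding server is skipped each time.
1..4 | ForEach-Object { Get-NextServer }
```

An intelligent (least-load) balancer would replace the in-turn choice with a comparison of current load on each responding server.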

SQL Server Failover Clustering

SQL Server clustering consists of multiple SQL Server nodes, managed by a root cluster that provides redundancy at the SQL Server application level.

A cluster typically consists of an active node and at least one passive node, although you can have multiple nodes. The cluster keeps the data on shared storage, so every database write is available to both the active and passive nodes, but the active node handles all of the incoming requests. If the active node fails, the Windows Failover Cluster Service switches over to one of the passive nodes (running on different hardware). I should highlight some important points about SQL clustering:
  • SQL clustering does not help performance, since only one node of the cluster is active at any one time.

  • Recovery after failure of the active node depends on the time it takes to bring a passive node online; this is not always immediate and depends on when the Windows Failover Cluster Service detects the down node.

  • SQL clustering uses shared storage, so a passive node works with the same, current data as the active node when it takes over.

The specifics of Microsoft SQL Server 2008 and 2012 clustering are outside the scope of this book, but I will note that clustering abstracts redundancy away from SharePoint and provides data integrity without SharePoint ever needing to switch database servers. This is the beauty of it. SharePoint talks to a SQL cluster in the same way it talks to a single SQL Server and never gets involved when the cluster fails over to a passive node.

Note

You can read more about setting up clustering on SQL Server 2008 R2 at the following location: http://msdn.microsoft.com/en-us/library/ms189134.aspx .

I recommend the use of SQL clustering in any large organization or enterprise where SharePoint data is critical and exceeds 100GB, and the organization must limit the downtime in the event of failure. Traditionally, large-scale organizations using SharePoint with SQL clustering would host the actual data on a Storage Area Network, attached to the cluster, to provide an extra level of data redundancy and hot swap capability with inexpensive disk storage.

SQL Server Database Mirroring

SQL Server mirroring also provides data redundancy at the SQL Server, but unlike clustering, where the cluster is the data repository in entirety, mirroring consists of a warm backup SQL Server, separate from the main live server.

Clustering involves multiple storage nodes, connected by network links to a root SQL instance. Mirroring consists of two completely independent SQL Servers with either synchronous or asynchronous copy, managed by each SQL Server instance. Synchronous mode provides hot standby because SQL Server ensures no data discrepancy between the principal and the mirror, whereas asynchronous provides warm backup and operates in a more passive copy mode.

Note

You can read more about SQL Server 2008 mirroring at the following link: http://msdn.microsoft.com/en-us/library/ms189852.aspx .

Administrators may provide high availability for SharePoint when using SQL Server mirroring in synchronous mode and using the database failover capabilities built into the SharePoint platform. SharePoint requires a SQL Server witness to manage the failover, in the event that the principal fails. The details surrounding SQL Server mirroring fall outside the scope of this book, but I will show you how to configure SharePoint for it, assuming you have SQL Server 2008 or 2012 mirroring established in your infrastructure.

The following steps consist of PowerShell commands. Launch the SharePoint Management Shell to begin, where you will enter the following commands:
  1. Enter the following commands into the PowerShell console to configure mirroring for the SharePoint configuration database:

    $database = Get-SPDatabase | where {$_.Name -match "SharePoint_Config"}

    $database.AddFailoverServiceInstance("mirror server name")

    $database.Update()

  2. Enter the following commands into the PowerShell console to configure mirroring for your content database:

    $database = Get-SPDatabase | where {$_.Name -match "WSS_Content"}

    $database.AddFailoverServiceInstance("mirror server name")

    $database.Update()

Note

Both of the preceding commands assume your configuration database has the name SharePoint_Config and you have named your content database as WSS_Content. Change the names in the script to match your database names.
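If every database in the farm is mirrored to the same server, the per-database commands can be folded into a loop. The following is a sketch under that assumption; SQLMIRROR01 is a placeholder server name, and you should filter the output of Get-SPDatabase first if only some of your databases are mirrored.

```powershell
# Assumption: 'SQLMIRROR01' is a placeholder for your mirror server name,
# and every database returned by Get-SPDatabase is mirrored to it.
$mirrorServer = 'SQLMIRROR01'

Get-SPDatabase | ForEach-Object {
    $_.AddFailoverServiceInstance($mirrorServer)
    $_.Update()
}

# List each database with the failover server SharePoint now records for it.
Get-SPDatabase | Select-Object Name, FailoverServer
```

The final line is a quick way to confirm that the setting took effect before you rely on failover.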

If you prefer to configure database mirroring via Central Administration, follow these steps:
  1. Open Central Administration.

  2. Click the Application Management heading link.

  3. Click the Manage Content Databases link.

  4. Choose the relevant web application from the drop-down list.

  5. Select the relevant content database.

  6. On the settings page for the selected database, populate the Failover Database Server field with the mirrored server.

  7. Click the OK button.

SharePoint Farm Design

The design of your SharePoint farm has a large impact on the level of disaster recovery. At the lower end of the scale, a simple farm with minimal redundant hardware provides little to no recovery in the event of failure, whereas a multiple server farm with multiple redundant servers provides rapid recovery. Microsoft designed SharePoint 2013 to scale, to allow reuse of common services across multiple infrastructure hardware, and to embrace virtualization. In this section, I shall discuss some of the high-level considerations when planning an Enterprise SharePoint infrastructure for maximum uptime.

Looking at a SharePoint farm from a 50,000-foot view, we see the farm essentially consists of a data storage component, some service middleware, and web-front-end to render pages to end users. Consider Figure 5-1 as the bare minimum components of a SharePoint farm, which consists of
  • A SQL data store

  • An application server for middleware services

  • Two web-front-end servers

Figure 5-1.

Minimal SharePoint infrastructure for an enterprise

The diagram in Figure 5-1 provides very little redundancy: should the SQL Server fail, the farm goes offline. SharePoint can partially operate without a working application server, but services like search, user profiles, Business Connectivity Services, business intelligence, and managed metadata that rely on the application server will fail, reducing SharePoint to basic collaboration. There is minimal redundancy with two web-front-end servers and the ability to distribute user request load to these servers. This is important because the WFE servers are the entry to the SharePoint farm for users, and without them, the farm might as well be inoperable.

Now consider the diagrams in Figure 5-2 and Figure 5-3. This infrastructure is vastly larger than the example presented in Figure 5-1. This design separates the farm into six tiers, consisting of the web server tier, application server tier, search index and query tier, other search components tier, database search tier, and database content tier. One immediate observation in this larger design is the separation of search services and search data from other tiers in the farm. SharePoint 2013 relies heavily on the Search platform (FAST) to allow users to search and discover content. Unlike previous versions of SharePoint, SharePoint 2013 also leverages the search platform for content rollup and rendering of dynamic content, which constantly changes. As you can imagine, with the search platform playing such an important role in the SharePoint farm, it deserves big consideration in the overall farm design. The search platform itself consists of multiple components, which, like the rest of the farm, require redundancy to combat anticipated failure.
Figure 5-2.

Large Enterprise SharePoint Farm Design (Part A)

Figure 5-3.

Large Enterprise SharePoint Farm Design (Part B)

The design in Figure 5-2 and Figure 5-3 leverages virtual server technology, which provides a greater number of redundant virtual servers. Infrastructure consisting of a large number of servers benefits from virtual servers running on multiple virtual host servers (the physical hardware), which saves the organization the cost of procuring and maintaining physical hardware.

Notice the design provides redundancy across the physical infrastructure (multiple virtual hosts) as well as redundancy with multiple virtual servers. This is important because physical hardware often fails. Operating multiple redundant virtual servers on one physical host defeats disaster recovery if the physical server dies.

The design in Figure 5-2 and Figure 5-3 also caters for distribution of services and data across multiple virtual servers and across multiple host servers. If either a virtual server or a physical server fails, the data and operating service remain available on another virtual server on another physical host.

Of course, the design in Figure 5-2 and Figure 5-3 is quite elaborate and caters for many disaster scenarios. There are plenty of scaled-down designs that provide a good level of redundancy and fall between the design shown in Figure 5-1 and the design shown in Figure 5-2 and Figure 5-3. This is the beauty of SharePoint; you can design your SharePoint farm around the business need of the organization.
Figure 5-4.

Global SharePoint

Depending on the size of your organization, you might have to consider multiple global office locations in your SharePoint infrastructure design (Figure 5-4). If your organization shares data across multiple offices and that data resides in SharePoint, it may not make sense to host one copy of the data in a single SharePoint farm at one office location. SharePoint 2013 scales globally and allows cross-pollination of data between multiple offices. This design provides location redundancy: if an entire office goes dark (perhaps because of power failure or natural disaster), the business can continue using one of the other office locations.

Design of globalized SharePoint farms is nontrivial and impacts network connectivity design. Globalized scenarios require considerable planning for how to replicate data and must consider peak usage of data for multiple offices in multiple time zones. This design is outside the scope of this book, but the diagram in Figure 5-4 illustrates the vast capabilities of SharePoint 2013 for almost any deployment scenario.

Maintaining Content Integrity

When I discuss disaster recovery, I am really talking about minimizing loss of user data and access to that user data. Loss of actual data (documents, files, web page content, data in a database or line of business system, and so on) is a disaster, and we strive to avoid it in any trustworthy data management system, but loss of access to data because of downtime of the management system is almost as bad. In today’s connected world, users rely on the uptime of data management systems—like SharePoint—and trust that these systems maintain the integrity of their content. Fortunately, SharePoint provides a number of approaches to maintaining content/data integrity.

SharePoint stores all content of your site in a content database. SharePoint content databases may contain one or many site collections, associated with a web application. One site collection may not span multiple content databases, which is important to note because backing up your content databases ensures complete recovery of your site collection. I discuss backup and restore of content and configuration databases a little later in this chapter.

Database backup is good in a disaster scenario. What if a user loses a single document from a document library and wants to recover it? A complete database restore would be overkill, not to mention considerable work in restoring the database to a separate location to retrieve the file. As administrators, we know users tend to lose files all the time. Fortunately, SharePoint includes features to retain content and data integrity without the need for complete database restore after small losses.

The Recycle Bin

Since SharePoint 2007 (WSS 3), the Recycle Bin has provided a mechanism for users to retrieve deleted lists and list items—this includes documents and document libraries. Users can find the Recycle Bin on the top right of the Site Contents page (see Figure 5-5). In addition to lists and libraries, since SharePoint 2010 Service Pack 1, SharePoint allows administrators to recover deleted sites.
Figure 5-5.

Location of Recycle Bin

The Recycle Bin works in two stages, as described in Table 5-1, and scopes to the web application level. Different web applications may have different configurations for their Recycle Bin.
Table 5-1. Recycle Bin Stages

Stage | Location | Details
Stage one | Site | The stage one Recycle Bin is available to users with Contribute, Design, or Full Control permissions. Items and lists deleted at the site level reside in the stage one Recycle Bin until a time (defined by the administrator in Central Administration, typically 30 days) when the content moves to the stage two Recycle Bin. Content in the stage one Recycle Bin counts toward user storage quota.
Stage two | Site Collection | The stage two Recycle Bin lives at the site collection level and is populated from stage one Recycle Bin content by a timer service. Only a site collection administrator may restore content from a stage two Recycle Bin, and content resides in this Recycle Bin for a time or until the Recycle Bin reaches a size, both specified by an administrator in Central Administration, before SharePoint deletes the oldest items. In addition to lists and list items, populated from the stage one Recycle Bin, the stage two Recycle Bin contains deleted sites.

The size of the stage two Recycle Bin is a percentage of the quota allocated to the entire site collection. Items in the stage two Recycle Bin do not count toward user storage quota, but they do eat up space in the overall site collection quota.

The following steps demonstrate how administrators may configure the Recycle Bin from Central Administration, to allow different item expiration times.
  1. Open Central Administration.

  2. Click the Manage Web Applications link under the Application Management heading.

  3. Select the desired web application.

  4. Click the General Settings icon on the ribbon.

  5. Scroll to the Recycle Bin section (see Figure 5-6).

Figure 5-6.

Recycle Bin settings for the web application
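The same web application settings can also be read and changed from the SharePoint Management Shell. The following is a minimal sketch that uses the Recycle Bin properties of the web application object; the URL and the values 30 and 50 are placeholders, not recommendations.

```powershell
# Assumption: the URL is a placeholder for your own web application.
$webApp = Get-SPWebApplication 'http://intranet.example.com'

$webApp.RecycleBinEnabled          = $true   # turn the Recycle Bin on
$webApp.RecycleBinRetentionPeriod  = 30      # days that deleted items are retained
$webApp.SecondStageRecycleBinQuota = 50      # stage two size, as a percentage of the site quota
$webApp.Update()
```

Scripting the change is convenient when several web applications need the same retention policy.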

The following steps demonstrate working with the stage one and stage two Recycle Bins:
  1. Navigate to a SharePoint site with at least contributor permissions.

  2. Navigate to the All Site Contents page.

  3. Click the Recycle Bin link.

  4. Figure 5-7 shows my stage one Recycle Bin for my root site.

  5. Check the boxes next to the items, and then either restore them to their original location or delete them, which sends them to the stage two Recycle Bin.


Note

The root site collection Recycle Bin is not the stage two Recycle Bin; the root site also has a stage one Recycle Bin.

Figure 5-7.

Stage one Recycle Bin

Items in the stage one Recycle Bin remain there until you either delete them or the time elapses (see Figure 5-6) and SharePoint moves the items to the stage two Recycle Bin. Once they are in the stage two Recycle Bin, you can view these deleted items, as follows:
  1. Navigate to the root site of the site collection.

  2. Click the gear icon and select the Site Settings menu option.

  3. Click the Recycle Bin link under the Site Collection Administration heading.

  4. Click the link in the left navigation to show items deleted from the end user Recycle Bin.

  5. Figure 5-8 shows the site collection stage two Recycle Bin page.

Figure 5-8.

Stage two Recycle Bin, showing items deleted from the stage one Recycle Bin

You might be wondering about the difference between the views for the End User Recycle Bin items and Deleted from End User Recycle Bin. As I demonstrated previously, the link for Deleted from End User Recycle Bin shows all items moved from stage one Recycle Bins in subsites to the stage two Recycle Bin in the site collection. The link for End User Recycle Bin shows all items that currently reside in stage one Recycle Bins across the site collection hierarchy.

In similar fashion to the stage one Recycle Bin, you can delete items from the stage two Recycle Bin by selecting items in the Deleted from End User Recycle Bin page and clicking the link to delete selection.

Note

You cannot recover any item deleted from the stage two Recycle Bin.
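Administrators who prefer the console can inspect both stages in one listing. This sketch reads the site collection's RecycleBin collection through the server object model; the URL is a placeholder, and the script only lists items without restoring or deleting anything.

```powershell
# Assumption: the URL is a placeholder for your own site collection.
$site = Get-SPSite 'http://intranet.example.com'

# ItemState reports FirstStageRecycleBin or SecondStageRecycleBin for each item.
$site.RecycleBin |
    Sort-Object DeletedDate |
    Select-Object Title, DirName, ItemState, DeletedDate

# Release the site collection object when finished.
$site.Dispose()
```

Each item in the collection also exposes a Restore() method, should you need to recover an item by script rather than through the browser.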

Versioning

Document and page versioning is another way in which users may self-maintain integrity of their content in SharePoint. Library owners may enable versioning on a list or library so that when users with collaborative permissions upload changes, SharePoint keeps track of the version history. SharePoint library versioning is not new to SharePoint 2013; Microsoft introduced it with WSS 2.0, and it comes in two flavors:
  • Major version numbers

  • Major and minor version numbers

Major and minor version numbers tie into the publication status of a document item. A major version in the format of xx.0 constitutes a published version, meaning that it is available to all users (including anonymous if the site allows anonymous user access). A minor version number, in the format of xx.1-9, constitutes an intermediate revision, and only the owner of the document, users with approval permissions, and site owners may see the latest changes.

The following steps detail how to enable versioning for a document library:
  1. Navigate to your SharePoint site or subsite.

  2. Navigate to the default view of the document library.

  3. Click the Library tab on the ribbon.

  4. Click the Library Settings icon on the ribbon.

  5. Click the Versioning Settings link.

  6. SharePoint displays a page like that in Figure 5-9.

  7. Under Document Version History, select the desired versioning type.

  8. Select the maximum number of draft and major versions to keep.


Note

Lists allow major versioning but do not provide minor (draft) versioning capability.

Figure 5-9.

Enable versioning in a document library
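The versioning options on this page map to properties of the library object, so you can also set them from the SharePoint Management Shell. The following is a sketch; the site URL, the library title Documents, and the limits of 10 and 5 are placeholders.

```powershell
# Assumptions: the site URL and the library title 'Documents' are placeholders.
$web  = Get-SPWeb 'http://intranet.example.com'
$list = $web.Lists['Documents']

$list.EnableVersioning            = $true  # keep major versions
$list.EnableMinorVersions         = $true  # also keep minor (draft) versions
$list.MajorVersionLimit           = 10     # number of major versions to retain
$list.MajorWithMinorVersionsLimit = 5      # number of major versions that keep their drafts
$list.Update()

# Release the site object when finished.
$web.Dispose()
```

This approach is useful when you need to apply the same versioning policy to many libraries.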

On the Versioning Settings page (Figure 5-9), you may have noticed the other options, to enable content approval and require check out before editing. These options also allow you to maintain the integrity of your content.

Enabling content approval turns on the parallel approval workflow, which requires one or several approvers (in the approvers security group) to approve changes to content before SharePoint publishes the content to a major version. I discuss the parallel approval workflow in Chapter 10 as part of Publishing.

Requiring users to check out content before editing ensures no two users can overwrite each other’s changes. Of course, this limits work to a single thread, and only one user can make changes to a document at a time. Furthermore, there is nothing preventing one user from checking out a document indefinitely. Microsoft introduced co-authoring to address the need for multiple users to edit a document at the same time. I discuss co-authoring in Chapter 14.

Backup and Restore

Backup and restoration of user data and system configuration is an integral part of disaster recovery planning. After all, the user data is most precious and typically vital to the running of the organization’s business. SharePoint 2013 includes a number of backup and restoration methods, from complete farm backup/restore to granular backup/restore, such as site import and export and site collection backup. In this section, I will visit each method and discuss the specific benefits and shortcomings of each, enabling you as the SharePoint administrator to make effective decisions in your disaster recovery plan.

As a general rule of thumb, I recommend that you employ various backup methods to ensure that you are able to recover your SharePoint farm in the event of a disaster. The following list summarizes, from a high level, what you should back up:
  • All content databases

  • All configuration and service application databases

  • The SharePoint 2013 hive on each web server (c:\program files\common files\Microsoft shared\web server extensions\15\)

  • All virtual application directories on each web server (c:\Inetpub\wwwroot\wss\VirtualDirectories)

  • Any custom databases or additional files that do not live in the hive or virtual application directories on each web server

  • Site collection backups for faster restore, in the event of isolated data corruption or data loss in a particular site collection

When it comes to backup, more is better. If space for backup is not as plentiful, then backup of all databases and custom “changes” to the hive and virtual application directories should allow you to recover your farm after a new installation.

With the high-level stuff out of the way, I shall now detail the various backup methods available in SharePoint 2013.

Site Collection Backups

Site collection backups are compelling in that they enable administrators to save a complete site collection to a file on disk. Administrators may back up a site collection using the STSADM command, PowerShell, or Central Administration—I shall demonstrate each.

Note

Site collection backup puts stress on SharePoint and consumes resources to complete the process. Microsoft does not recommend backing up site collections of more than 15GB, because of the drain on the live site collection, hosting web application, and the time to complete the backup. Site collection backup works well when moving data from one farm to another, or in conjunction with another backup scheme to ensure data integrity.

Site Collection Backup and Restore Using PowerShell

The following steps demonstrate backing up a site collection to a disk file, using PowerShell:
  1. From the Start menu, choose All Programs.

  2. Click Microsoft SharePoint 2013 Products.

  3. Click SharePoint 2013 Management Shell to launch the console.

  4. Type the following text into the console, replacing the appropriate placeholders:

    Backup-SPSite <site collection URL> -Path <backup file> [-Force] [-NoSiteLock] [-UseSQLSnapshot] [-Verbose]


Include the [-Force] parameter to overwrite an existing backup file. I recommend not using the [-NoSiteLock] option, as this prevents SharePoint from putting the site collection in a read-only state, meaning that users can write to the site collection during backup and potentially corrupt the backup. Use the [-UseSQLSnapshot] option if you have SQL Server Enterprise edition, for a more consistent backup. The [-Verbose] option provides additional output.
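As a concrete instance of the syntax above, the following sketch backs up one site collection to a date-stamped file. The site URL and the UNC path are placeholders for your own environment.

```powershell
# Assumptions: the URL and UNC path are placeholders for your environment.
$siteUrl    = 'http://intranet.example.com/sites/hr'
$backupFile = '\\backupserver\sharepoint\hr-{0:yyyyMMdd}.bak' -f (Get-Date)

# The site collection is locked read-only for the duration of the backup.
Backup-SPSite -Identity $siteUrl -Path $backupFile -Verbose
```

Stamping the file name with the date gives each day's run its own file, so the -Force option is needed only when you rerun the backup on the same day.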

Now that I have shown you how to back up your site collection, restoring it is just as easy. The following command demonstrates restoring a site collection from a backup file, using PowerShell:
  5. Type the following text into the console, replacing the appropriate placeholders:

    Restore-SPSite <site collection URL> -Path <backup file> [-DatabaseServer <database server name>] [-DatabaseName <database name>] [-HostHeader <host header>] [-Force] [-GradualDelete] [-Verbose]


Include the [-Force] parameter to overwrite an existing site collection at the target URL. Use the [-DatabaseServer] option if the server is not part of your farm. Include the [-GradualDelete] option to minimize locks on the database and provide better restore performance for backups over 1GB when replacing an existing site collection. With this option, SharePoint marks the existing site collection as deleted, and the timer service deletes it later. Use the [-HostHeader] option if restoring a site collection to a web application that requires a unique host header.
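The following sketch shows the restore command with the optional parameters filled in. The URL, backup file, database server SQL01, and database name WSS_Content_HR are all placeholders; drop the database parameters to let SharePoint choose the content database.

```powershell
# Assumptions: URL, backup path, and database names are placeholders.
# -Force overwrites the site collection that already exists at the URL.
Restore-SPSite -Identity 'http://intranet.example.com/sites/hr' `
    -Path '\\backupserver\sharepoint\hr-20130601.bak' `
    -DatabaseServer 'SQL01' `
    -DatabaseName 'WSS_Content_HR' `
    -Force -GradualDelete -Verbose
```

Test the restore against a non-production URL first, since -Force replaces whatever is currently at the target.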

Site Collection Backup and Restore Using STSADM

Use the following STSADM command, inside a command shell, to back up a site collection to a disk file, replacing the appropriate placeholders:

STSADM -o backup -url <site collection url> -filename <filename>

Similar to the PowerShell command for backing up a site collection, you may provide the -overwrite option to overwrite an existing backup file, -nositelock to prevent site collection lock, and -usesqlsnapshot to use SQL Server Enterprise snapshot. Use the following STSADM command to restore a site collection from a backup file, replacing the appropriate placeholders:

STSADM -o restore -url <site collection url> -filename <filename>

Similar to the PowerShell command for restoring a site collection, you may provide the -overwrite option to overwrite an existing site collection, -hostheaderwebapplicationurl to provide a host header URL, and -gradualdelete to provide better performance in overwriting an existing site collection (marks the overwritten site collection as deleted and the timer service deletes it later).

Site Collection Backup and Restore Using Central Administration

The following steps demonstrate backing up a site collection to a disk file, using Central Administration:
  1. 1.

    Open Central Administration.

     
  2. 2.

    Click the Backup and Restore heading.

     
  3. 3.

    Click the Perform a Site Collection Backup link. You will see a page like that in Figure 5-10.

     
  4. 4.

    Click the drop-down control to change the site collection.

     
  5. 5.

    In the resulting dialog box, select the web application containing the site collection.

     
  6. 6.

    Provide the UNC path and file name for the backup file.

     
  7. 7.

    Click the Start Backup button.

     
Figure 5-10.

Perform a site collection backup in Central Administration

Export and Import

SharePoint supports granular export and import of sites, lists, and libraries. In this section, I shall demonstrate exporting and importing using the three tools of choice—PowerShell, STSADM, and Central Administration.

Export and Import Using PowerShell

The following steps demonstrate using PowerShell commands from the PowerShell console to export a site to a file:
  1. 1.

    From the Start menu, choose All Programs.

     
  2. 2.

    Click Microsoft SharePoint 2013 Products.

     
  3. 3.

    Click SharePoint 2013 Management Shell to launch the console.

     
  4. 4.

    Type the following text into the console, replacing the appropriate placeholders:

    Export-SPWeb <site/list/library URL> -Path <filename>

     
  5. 5.

    To export a specific list or library rather than the whole site, identify it by URL. The cmdlet help documents an [-ItemUrl] parameter for this purpose, used alongside the site URL. If the URL points to the main site location, PowerShell exports the site.

     
  6. 6.

    Include the [-Force] option to overwrite the file.

     
  7. 7.

    Include the [-HaltOnError] or [-HaltOnWarning] options to stop the export process in the event of an error or warning.

     
  8. 8.

    Specify the [-IncludeUserSecurity] option if you need to ensure that all permissions applied to exported sites, lists, libraries, and contained items are included in the export file.

     
  9. 9.

    Include the [-IncludeVersions] option to instruct PowerShell to include version information of items in the export file.

     
  10. 10.

    Include the [-NoFileCompression] option to turn off file compression; this makes for a faster export but larger files on disk.

     
  11. 11.

    The [-NoLogFile] option prevents PowerShell from creating a log of the export (not recommended generally).

     
  12. 12.

    The [-UseSQLSnapshot] option is the familiar SQL snapshot option for deployments running on SQL Server Enterprise.
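
To show how several of these options combine in practice, here is a minimal sketch of a site export that keeps permissions and all item versions. The site URL and export path are hypothetical placeholders; adjust the switches to suit your own scenario.

```powershell
# Sketch only: the URL and export path below are hypothetical placeholders.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$webUrl     = "http://intranet.example.com/sites/projects"   # assumed site URL
$exportFile = "\\backupserver\sp-exports\projects.cmp"       # assumed UNC path for the export file

# Keep permissions and every item version, stop on the first error,
# and overwrite any export file already at $exportFile.
Export-SPWeb -Identity $webUrl -Path $exportFile `
    -IncludeUserSecurity `
    -IncludeVersions All `
    -HaltOnError `
    -Force
```

I chose All for the versions option to make the export as complete as possible; the cmdlet help lists the other accepted values if you need a smaller export file.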

     

Note

PowerShell provides help for all commands. For example, type Get-Help Export-SPWeb into the PowerShell console for help on the export command.

As the counterpart to the export command, the following step demonstrates importing an export file to a SharePoint site, list, or library:
  1. 13.

    Type the following text into the console, replacing the appropriate placeholders:

    Import-SPWeb <site/list/library URL> -Path <filename>

     

To import a specific list or library, provide the full URL to the list or library; if the URL is to the main site location, PowerShell imports the site. Most of the options described in the previous export steps also exist for the import command, so I do not repeat them here. Use the Get-Help feature of PowerShell to see all options.
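
The following is a minimal sketch of the matching import, assuming an export file created with user security included. The target URL and file path are hypothetical placeholders.

```powershell
# Sketch only: the URL and export path below are hypothetical placeholders.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$targetUrl  = "http://intranet.example.com/sites/projects-restored"   # assumed destination site URL
$exportFile = "\\backupserver\sp-exports\projects.cmp"                # assumed UNC path to the export file

# Re-apply the exported permissions and stop on the first error.
Import-SPWeb -Identity $targetUrl -Path $exportFile -IncludeUserSecurity -HaltOnError
```

The destination site should already exist before you run the import, because the command fills an existing site rather than creating a new one.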

Exporting of lists and libraries was new to SharePoint 2010. In SharePoint 2007, administrators could export and import sites only, using STSADM. SharePoint 2010, and now 2013, supports STSADM export/import, but adds the capability of list and library export by providing the full URL to the list or library.

Export and Import Using STSADM

The following steps demonstrate export of a site, list, or library using the STSADM command from a regular Windows command prompt:
  1. 1.

    Type the following text into the console, replacing the appropriate placeholders:

    STSADM -o export -url <site/list/library url> -filename <filename>

     
  2. 2.

    To export a specific list or library, provide the full URL to the list or library. Otherwise, STSADM will export the site if the URL is to the main site location.

     
  3. 3.

    Include the -overwrite option to overwrite the file.

     
  4. 4.

    Include the -haltonfatalerror or -haltonwarning options to stop the export process in the event of an error or warning.

     
  5. 5.

    Specifying the -includeusersecurity option will ensure that all permissions applied to exported sites, lists, libraries, and contained items are included in the export file.

     
  6. 6.

    The -versions option instructs STSADM to include version information of items in the export file.

     
  7. 7.

    Include the -nofilecompression option to turn off file compression; this makes for a faster export but larger files on disk.

     
  8. 8.

    The -nologfile option prevents STSADM from creating a log of the export (not recommended generally).

     
  9. 9.

    The -usesqlsnapshot option is the familiar SQL snapshot option for deployments running on SQL Server Enterprise.

     
The following steps demonstrate using STSADM to import a site, list, or library. Look back through the command options previously listed for the export, as some also apply to the import command.
  1. 10.

    Type the following text into the console, replacing the appropriate placeholders:

    STSADM -o import -url <site/list/library url> -filename <filename>

     
  2. 11.

    To import a specific list or library, provide the full URL to the list or library.

     

Export Using Central Administration

In this section, I demonstrate how to use Central Administration to export a site, list, or library. You may have noticed that this section does not cover import via the Central Administration web browser interface—this is because Central Administration does not provide a mechanism for site, list, or library import from a file. To import, use either STSADM or PowerShell options, previously discussed.
  1. 1.

    Open Central Administration.

     
  2. 2.

    Click on the Backup and Restore heading link.

     
  3. 3.

    Click the Export a Site or List link under the Granular Backup heading.

     
  4. 4.

    Select the site collection and then the site and/or list (see Figure 5-11).

     
  5. 5.

    Provide the file location and toggle options for security and versions.

     
  6. 6.

    Click the Start Export button to begin the export.

     
Figure 5-11.

Exporting a site, list, or library from Central Administration

Unattached Content Database Data Recovery

IT staff and database admins like to back up SQL databases, and there is nothing wrong with that! SQL Server provides options to administrators to run nightly backups, and many good backup applications include a SQL agent to back up live SQL database data to backup storage. What is not to like? The problem is that full SQL Server database backups only provide all-or-nothing restore of data. Restoring a single piece of data, such as a document, requires standing up the restored content database in a new SharePoint web application and site collection to access the required data.

Rewind the clock a couple of years to the days of SharePoint 2007. In the event that an administrator wanted to restore selected data (such as a site, list, or library) from an offline database, the process went something like the following:
  1. 1.

    Restore the SQL database backup from cold storage to a disk location, seen by SQL Server.

     
  2. 2.

    Attach the offline database data file and log file to SQL Server, using a different name from the current live database; the attached copy becomes the backup database.

     
  3. 3.

    Associate the backup database with a fresh web application in SharePoint 2007, or another SharePoint 2007 farm.

     
  4. 4.

    Export the selected content from the backup, using STSADM (the minimum granularity was a subsite).

     
  5. 5.

    Import the exported content to the current live site collection.

     
  6. 6.

    Restore the site, list, or library to the correct place in the live site collection using SharePoint content tools, such as the Content Management UI.

     

The steps above seem like a lot of work to me. Further complications arose for the administrator in that SharePoint 2007 required installation of any feature customizations to the backup web application before the administrator could access the backup site collection. If using a separate farm to host backup content data, the administrator would have to ensure that the version of the production farm was equal to or exceeded that of the backup farm for the data import to work. Yuk!

You no longer need to worry. SharePoint now allows you to drill into a SQL content database without ever having to attach it to the farm, as the following steps demonstrate:
  1. 1.

    Open Central Administration.

     
  2. 2.

    Click the Backup and Restore heading link.

     
  3. 3.

    Click the Recover Data from an Unattached Content Database link.

     
  4. 4.

    Provide the SQL Server name and database name for the warm unattached database backup (you still need to host the offline database in SQL Server somewhere).

     
  5. 5.

    SharePoint displays a page like that in Figure 5-12.

     
Figure 5-12.

Unattached Content Database Recovery

  1. 6.
    In the Operation to Perform section, you have three choices:
    1. a.

      Browse content in the backup database.

       
    2. b.

      Back up a site collection contained in the database.

       
    3. c.

      Export a site or list from the database.

       
     
  2. 7.

    Select the Browse content option.

     
  3. 8.

    Click the Next button.

     
  4. 9.

    On the next page (Figure 5-13), you may browse a site collection, site, and list and then either back up the site collection or export the selected site and list.

     
Figure 5-13.

Browse an unattached content database

  1. 10.

    Try the site collection backup option and click the Next button.

     

SharePoint navigates you to a page to provide the site collection backup details, similar to the page for site collection backup of an attached content database, discussed earlier in this chapter.

Had you selected the option to export a site or a list, or gone directly to the site collection backup or export operation on the main page, you would see the appropriate page for site collection backup or export.
  1. 11.

    Click the Start Backup button.

     
So far, I have covered the granular backup methods. Next, I will visit complete farm backup and restore capabilities for SharePoint 2013. Before leaving granular backup, navigate back to the Backup and Restore page in Central Administration and click the Check Granular Backup Job Status link. Figure 5-14 shows the Job Status page for all granular backups, which provides for easy review of the health of your backup operations.
Figure 5-14.

Granular backup status

Farm Backup and Restore

SharePoint provides complete SharePoint farm backup, using Central Administration. SharePoint also allows for complete backup and restore of the farm using PowerShell, and I will present steps for both procedures in this section of the chapter.

Up to now, I have discussed backup of content. Of course, content is vitally important because it is the user data of a system that gives the system its value, and contributes to the running of the business for which the organization employs the system. But now that I am discussing farm backup, more than just content has to be considered—for example, system configuration settings. When faced with a total disaster and system loss, the IT team and administrators want to get a new system online as quickly as possible. Unfortunately, SharePoint, like most other enterprise systems today, has a considerable number of configuration options, and no administrator wants to reconfigure a virgin SharePoint farm installation, from the ground up, under the pressure of disaster recovery. Fortunately, the features in SharePoint 2013 that provide for complete farm backup allow for configuration backup. I shall discuss configuration backup and restore as part of farm backup.

In the following sections, I shall walk you through backup via the Central Administration browser interface, and then I shall cover PowerShell backup and restore commands.

Farm Backup Settings

Before you begin your first SharePoint farm backup, you should first visit the settings page, as follows:
  1. 1.

    Open Central Administration.

     
  2. 2.

    Click on the Backup and Restore heading link.

     
  3. 3.

    Click the Configure Backup Settings link.

     
  4. 4.

    SharePoint displays a page for you to configure the number of threads for backup and restore, and a directory location (UNC path) to store farm backup files (Figure 5-15).

     
  5. 5.

    The default of three threads is fine for most purposes.

     
  6. 6.

    Provide a UNC path for the backup directory because the timer service (which performs the backups) may not have the same drive mappings as your current user context.

     
Figure 5-15.

Backup and Restore Settings

“Threads,” in computer terms, much like the threads in clothing, are individual strands of processing that together make up the overall application process. On a single core, the CPU slices time between the threads in a process to give the illusion of multiple things happening at once. Modern CPUs consist of multiple cores, which can run separate threads of a process at the same time (true parallelism). Backup and restore operations suit multi-threading because each thread may run an independent backup or restore operation on its own core, which makes for a more efficient backup and restore of what is, by its very nature, a time-consuming process.

Performing a Backup

With the overall farm backup settings configured, you are now ready to perform your first backup. Follow the steps below:
  1. 1.

    Navigate to the Backup and Restore page in Central Administration.

     
  2. 2.

    Click the Perform a Backup link.

     
  3. 3.

    SharePoint displays a page like that in Figure 5-16.

     
Figure 5-16.

Back up the farm from Central Administration

This page is where it is all happening! Looking at Figure 5-16, which shows a summary of my farm, you can see several selection options to include in the farm backup.

Checking the check box at the top Farm level will enable all the options below it, which include backup of the content databases, web application settings, and service application configuration. At this point you may choose what to back up à la carte style, but for demonstration purposes, I shall assume backup of the entire farm. This will also give you an idea of how long the process for complete farm backup usually takes (which varies by orders of magnitude depending on the content in your farm and the services installed).

Note

When backing up content, backup files typically consume 1.5 times as much space as the original content databases.
  1. 4.

    Check the check box next to the Farm level, and then click the Next button.

     
  2. 5.

    Figure 5-17 shows the next page, where you specify the backup type.

     
Figure 5-17.

Select farm backup options

If you provided a UNC path for backups in the backup settings (earlier), then SharePoint suggests this location in the Backup Options page (Figure 5-17). If you didn’t, there is no need to worry; just provide it now.

SharePoint provides you with a helpful summary of the running services, required for a farm backup, which include the timer service and administration service. The Backup Component section reminds you what you selected in the previous screen.
  1. 1.

    Select the backup type as either Full or Differential.

     

A full backup is exactly that: SharePoint backs up everything. Differential backups run much smaller and faster, but they only back up changes since the last full backup. Consider the restore process when choosing the backup types. Restoring from a full backup alone is the simpler procedure, whereas restoring from a differential takes two steps to get the system current after a disaster: first restore the last full backup, and then restore the most recent differential taken after it.

Note

As a good practice, I recommend a weekly full backup and daily differential backups.
  1. 2.

    For demonstration purposes, and since this is your first backup, choose Full.

     
The next section of the Backup page allows you to specify backup of both content and configuration, or just configuration. The latter option comes in handy if you already have a content redundancy or backup process in place and now just want to save the farm configuration.
  1. 3.

    Click the option to back up both content and configuration.

     
  2. 4.

    Click the Start Backup button to begin the process.

     
  3. 5.

    SharePoint shows the status of the backup (Figure 5-18).

     
Figure 5-18.

Farm backup status

  1. 6.

    Navigate back to the Central Administration main Backup and Restore page.

     
  2. 7.

    Click the View Backup and Restore History link to see a history of past backups.

     
  3. 8.

    If the backup is still running, then SharePoint will inform you with a link to the Status page at the top of the History page—Backup and Restore Job Status.

     
  4. 9.

    You may also get to the Backup and Restore Status page by clicking the Check Backup and Restore Job Status link in the Backup and Restore page.

     

Performing a Restore

Performing a SharePoint farm restore is much the inverse of the backup process. Assuming you have performed a successful farm backup, the following steps demonstrate the farm restore process:
  1. 1.

    Navigate to the Backup and Restore main page in Central Administration.

     
  2. 2.

    Click the Restore from a Backup link.

     
  3. 3.

    SharePoint displays a page like that in Figure 5-19.

     
  4. 4.

    Provide the backup directory location.

     
Figure 5-19.

Restoring the farm from backup

  1. 5.

    Choose the backup instance from the history list, and then click the Next button.

     
  2. 6.

    SharePoint shows a page like that in Figure 5-20.

     
Figure 5-20.

Farm restore selection

Similar to the backup process, this selection screen allows you to choose what configuration and content in the farm to restore.
  1. 7.

    Make your selection and then click the Next button.

     
  2. 8.

    The next page (too large to illustrate here) shows various options for the selected service and content configuration.

     
  3. 9.

    Choose whether you wish to overwrite configuration or create new.

     
  4. 10.

    The option to create new services from backup is useful when restoring a new farm from scratch after a disaster.

     
  5. 11.

    Use Overwrite when replacing the existing configuration.

     
  6. 12.

    Click the Start Restore button to begin the restore process.

     

Using PowerShell

As one might expect, SharePoint allows administrators to perform farm backup with PowerShell.

Note

Before embarking on this route of backup/restore, ensure that the user running the script is a member of the SharePoint_Shell_Access role in the main SharePoint configuration database and is a member of the Windows security group WSS_ADMIN_WPG.

Follow these steps to back up the farm using PowerShell.
  1. 1.

    From the Start menu, choose All Programs.

     
  2. 2.

    Click Microsoft SharePoint 2013 Products.

     
  3. 3.

    Click SharePoint 2013 Management Shell to launch the console.

     
  4. 4.

    Type the following text into the console, replacing the appropriate placeholders:

    Backup-SPFarm -Directory <Backup Folder> -BackupMethod {Full | Differential} [-Verbose]

     
  5. 5.

    Add the [-Force] parameter to proceed with the backup even when SharePoint estimates that the backup directory lacks sufficient free space.

     
  6. 6.

    Add the [-ConfigurationOnly] option to back up configuration without content.
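
To tie the backup command to the weekly full and daily differential routine I recommended earlier, here is a minimal sketch of a script that a scheduled task might run once a day. The backup directory is a hypothetical placeholder, and the choice of Sunday for the full backup is an assumption.

```powershell
# Sketch only: the backup directory is a hypothetical placeholder.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$backupDir = "\\backupserver\sp-farm-backups"   # assumed UNC path, reachable by the farm servers

# Take a full backup on Sundays (an arbitrary choice) and a differential on other days.
$method = if ((Get-Date).DayOfWeek -eq [DayOfWeek]::Sunday) { "Full" } else { "Differential" }

Backup-SPFarm -Directory $backupDir -BackupMethod $method -Verbose
```

A differential backup needs an earlier full backup in the same directory, so run at least one full backup by hand before you schedule the script.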

     
Follow these steps to restore a farm from backup using PowerShell.
  1. 1.

    From the Start menu, choose All Programs.

     
  2. 2.

    Click Microsoft SharePoint 2013 Products.

     
  3. 3.

    Click SharePoint 2013 Management Shell to launch the console.

     
  4. 4.

    Type the following text into the console, replacing the appropriate placeholders:

    Restore-SPFarm -Directory <Backup Folder> -RestoreMethod {New | Overwrite} [-Verbose]

     
  5. 5.

    Add the [-Force] parameter to suppress the confirmation prompt when restoring with the Overwrite method.

     
  6. 6.

    Add the [-ConfigurationOnly] option to restore configuration without content.
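
When a backup directory holds several backups, it helps to list them and pick one before restoring. The following is a minimal sketch under that assumption; the directory path is a hypothetical placeholder, and you must replace the backup ID placeholder with a real ID from the history list.

```powershell
# Sketch only: the directory and the backup ID are placeholders.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$backupDir = "\\backupserver\sp-farm-backups"   # assumed UNC path that holds the farm backups

# List the backups recorded in the directory, including the ID of each one.
Get-SPBackupHistory -Directory $backupDir -ShowBackup

# Paste the ID of the backup you want to restore from the list above.
$backupId = "<backup ID from the history list>"

Restore-SPFarm -Directory $backupDir -RestoreMethod Overwrite -BackupId $backupId -Verbose
```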

     

SharePoint 2013 Request Management

Request Management is a new feature of SharePoint 2013. Although the topic of Request Management does not directly relate to disaster recovery and health, I include this new service in this chapter because Request Management manages load on a SharePoint farm, and therefore affects the overall health of your SharePoint 2013 farm.

Request Management allows SharePoint to understand more about, and control the handling of, incoming requests for pages, documents, and any other content that SharePoint may deliver to end users. The Request Management service encompasses a rules engine to make decisions on delegation of server requests to different servers in a multi-server SharePoint 2013 farm.

A new SharePoint service called “Microsoft SharePoint Foundation Web Application” handles Request Management for the SharePoint farm. The following steps show the location of this service in Central Administration:
  1. 1.

    Navigate to Central Administration.

     
  2. 2.

    Click the link for Manage Services on Server, under the System Settings heading.

     
  3. 3.

    Scroll down the list of services.

     
  4. 4.

    The Microsoft SharePoint Foundation Web Application should have a state of Started.

     

As the service name suggests, Request Management is part of the core SharePoint platform and available to all versions of SharePoint 2013, including Foundation.

Wait a minute! Request Management sounds like the job of a load balancer, outside the responsibility of SharePoint. Yes and no. Any administrator responsible for a multi-server and multiple web-front-end SharePoint farm is probably aware of the role of a hardware or software load balancer. A load balancer sits in front of all incoming requests and redirects traffic to one of a pool of web servers, depending on availability of these servers. Typically, load balancers make determinations on what server they forward requests to based on DNS settings and servers that respond to IP requests. Some more intelligent load balancers can monitor server utilization and route traffic based on available load. However, SharePoint is a dynamic platform, which might consist of many different servers, providing different services. Some servers in a SharePoint farm may provide multiple functions. Request Management provides better granular control over which servers receive which requests, based on data in each request. For example, by looking at the user agent, content type, etc. within a request, the Request Management service can direct traffic to a SharePoint server that is best equipped to service a response.

The Request Management Process

Request Management consists of the Web Application service, running on every SharePoint server in the farm. This is important to note—Request Management requires knowledge of the performance and characteristics of each SharePoint server available to service requests, and this is the job of this service.

The Request Manager (the service running on each SharePoint server) provides three levels of operation:
  • Load balancing

  • Prioritization

  • Throttling and routing

Figure 5-21 illustrates rule flow of Request Management in SharePoint 2013. Based on a set of routing rules, Request Management makes decisions on where to route server requests.

Each potential target server to respond to a request resides within a machine pool. Each server in a machine pool has a static weighting and health weighting, which the routing rules use to determine the eligibility of servers to service requests. Static weights are numeric values assigned by administrators to weight particular servers in the farm, whereas SharePoint changes health weights as the performance and health of servers changes over time.

Routing rules group into execution groups, of which there are three: Execution Group 0, 1, and 2 (Execution Group 2 not shown in Figure 5-21). Any rule not explicitly assigned to an execution group assumes Execution Group 0. Execution groups denote precedence; rules in Group 0 are evaluated before those in Group 1, which are evaluated before those in Group 2. It is the job of routing rules to determine which machine pool will service an incoming request.

Throttling rules (not shown in Figure 5-21) refuse incoming requests that match these rules, and act as a gatekeeper for all requests. For example, requests that have inappropriate parameters or request data might trigger a throttling rule.
Figure 5-21.

Request Management flow

The Request Manager evaluates which server shall service an incoming request as follows:
  1. 1.

    Compare the request with a set of throttling rules; if the request matches any of these rules then refuse the request.

     
  2. 2.

    Evaluate the request by matching it against all routing rules in Execution Group 0, followed by Execution Group 1, and then Execution Group 2.

     
  3. 3.

    Depending on matching to routing rules in a specific execution group, route the request to the machine pool associated with the routing rule satisfied by the request.

     

Any routing rule can route requests to any machine pool. The execution group that contains a routing rule determines the priority of that rule. Thus, rules in Execution Group 0 will evaluate first and target the machine pools determined best equipped to satisfy the requests.
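
The evaluation order described above can be sketched as a small, self-contained PowerShell model. This is a toy illustration and not how SharePoint implements Request Management: the rules are plain hashtables, Resolve-ToyRequest is a hypothetical function name, and the result for a request that matches no routing rule is a placeholder, because the steps above do not cover that case.

```powershell
# Toy model only: plain hashtables stand in for Request Management rules.
$throttlingRules = @(
    @{ Name = 'Refuse Robot Agents'; Property = 'UserAgent'; Pattern = '.*Robot.*' }
)
$routingRules = @(
    @{ Name = 'Handle PDF Requests'; Group = 0; Property = 'Url'; Pattern = '.*\.pdf'; Pool = 'Machine Pool 1' },
    @{ Name = 'Everything Else';     Group = 2; Property = 'Url'; Pattern = '.*';      Pool = 'Machine Pool 2' }
)

function Resolve-ToyRequest {
    param([hashtable]$Request)

    # Step 1: a request that matches any throttling rule is refused.
    foreach ($rule in $throttlingRules) {
        if ($Request[$rule.Property] -match $rule.Pattern) {
            return "Refused by '$($rule.Name)'"
        }
    }

    # Steps 2 and 3: try Execution Group 0, then 1, then 2, and route the
    # request to the machine pool of the first routing rule that matches.
    foreach ($group in 0, 1, 2) {
        foreach ($rule in $routingRules) {
            if ($rule.Group -eq $group -and $Request[$rule.Property] -match $rule.Pattern) {
                return "Routed to '$($rule.Pool)'"
            }
        }
    }

    return 'No routing rule matched'
}

Resolve-ToyRequest @{ Url = 'http://intranet/docs/report.pdf'; UserAgent = 'Mozilla/5.0' }
# Routed to 'Machine Pool 1'
Resolve-ToyRequest @{ Url = 'http://intranet/default.aspx'; UserAgent = 'ExampleRobot/1.0' }
# Refused by 'Refuse Robot Agents'
```

The PDF request reaches Machine Pool 1 because its rule sits in Execution Group 0, which the model checks before the catch-all rule in Execution Group 2.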

Request Management Administration

There is no browser user interface for Request Management in SharePoint 2013. Instead, administration of Request Management is via PowerShell Cmdlets.

The following example demonstrates how to get access to the Request Management settings for a particular web application:
  1. 1.

    From the Start menu, choose All Programs.

     
  2. 2.

    Click Microsoft SharePoint 2013 Products.

     
  3. 3.

    Click SharePoint 2013 Management Shell to launch the console.

     
  4. 4.

    Type the following text into the console:

    $app = Get-SPWebApplication "http://webappUrl"

    $rmSettings = $app | Get-SPRequestManagementSettings

    $rmSettings

     
You should see a list of settings for the Request Manager associated with the web application. Figure 5-22 shows a screenshot from my console when I executed the previous PowerShell Cmdlets to retrieve the Request Management settings for my default web application.
Figure 5-22.

Request Management settings for web application

The following set of steps demonstrates how to create a couple of machine pools:
  1. 5.

    Type the following PowerShell Cmdlets into the console:

    $pool1 = Add-SPRoutingMachinePool -RequestManagementSettings $rmSettings -Name "Machine Pool 1" -MachineTargets @("Server1", "Server2")

     
  2. 6.

    The previous PowerShell assumes Server 1 and Server 2 belong to a new machine pool, called Machine Pool 1.

     
  3. 7.

    Add another machine pool, as follows:

    $pool2 = Add-SPRoutingMachinePool -RequestManagementSettings $rmSettings -Name "Machine Pool 2" -MachineTargets @("Server3")

     
  4. 8.

    The previous PowerShell assumes Server 3 belongs to a new machine pool, called Machine Pool 2.

     
  5. 9.

    Now to add some static weightings for servers in the pools.

     
  6. 10.

    Enter the following PowerShell Cmdlets:

    $rmServerInfo = $rmSettings | Get-SPRoutingMachineInfo -Name "Server1"

    Set-SPRoutingMachineInfo -Identity $rmServerInfo -StaticWeight 8

     
  7. 11.

    Repeat step 10 for each server in both pools.

     
  8. 12.

    With the machine pools created and servers in those pools, I will demonstrate adding a throttling rule.

     
  9. 13.

    Type the following PowerShell Cmdlets to add a throttling rule that applies when the user agent includes “Robot”. This rule refuses requests from any crawler with the word “Robot” in its user agent.

    $criteria = New-SPRequestManagementRuleCriteria -Property UserAgent -MatchType Regex -Value ".*Robot.*"

    $rmSettings | Add-SPThrottlingRule -Name "Refuse Robot Agents" -Criteria $criteria

     
  10. 14.

    Now to add some routing rules, which bind to machine pools. Enter the following PowerShell Cmdlets:

    $criteria = New-SPRequestManagementRuleCriteria -Property Url -MatchType Regex -Value ".*\.pdf"

    $rule = Add-SPRoutingRule -RequestManagementSettings $rmSettings -Name "Handle PDF Requests" -ExecutionGroup 0 -MachinePool $pool1 -Criteria $criteria

     
  11. 15.

    The previous PowerShell Cmdlets create a new request rule that forwards all requests for PDF files to Machine Pool 1. The rule resides in Execution Group 0.

     
  12. 16.

    Experiment by creating more throttling and routing rules. Once complete, you can survey the rules with the following PowerShell Cmdlet:

    $rmSettings | Get-SPRoutingRule

     

The previous set of steps took you on a whirlwind tour of Request Management with PowerShell. I recommend that you read more on the subject of Request Management and planning for your particular scenario. Many of the Cmdlets in the previous examples include additional parameters—especially the rules Cmdlets, which support multiple property and matching types.
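
After experimenting, you may want to review everything configured for a web application in one pass. The following is a minimal sketch that assumes the same placeholder web application URL as the earlier steps and lists the machine pools, machine weightings, and both kinds of rule.

```powershell
# Sketch only: "http://webappUrl" is the same placeholder used in the steps above.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$app        = Get-SPWebApplication "http://webappUrl"
$rmSettings = $app | Get-SPRequestManagementSettings

# Machine pools and the servers that carry static and health weights.
Get-SPRoutingMachinePool -RequestManagementSettings $rmSettings | Format-List
Get-SPRoutingMachineInfo -RequestManagementSettings $rmSettings | Format-List

# Throttling rules first, then routing rules, mirroring the evaluation order.
Get-SPThrottlingRule -RequestManagementSettings $rmSettings | Format-List
Get-SPRoutingRule -RequestManagementSettings $rmSettings | Format-List
```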

Health and Monitoring

The health of your new SharePoint 2013 deployment is very important. Your organization, you, and your administration team have likely spent considerable time installing, configuring, and deploying SharePoint to accommodate the needs of the enterprise. In my time as a SharePoint architect, I have seen a number of organizations stop here, but the fact of the matter is that SharePoint requires a certain amount of care and feeding, just like any enterprise computer system. This is not to say that SharePoint left alone will fall over in time, but as more users pump data into the system, eating up storage space, and the system grows a larger user base, administrators should expect to monitor SharePoint and the underlying server infrastructure for stress areas and efficiency optimization.

Organizations understand that it is costly to stand up large-scale enterprise systems, and they rely on them as an integral part of their daily business. Spending more money ensuring that such systems remain healthy and sustain significant uptime is just as important as the upfront investment in the creation of the system. Consider how much money an organization might lose if its core information system falls over and suffers downtime.

Earlier in this chapter, I covered disaster recovery. I demonstrated several planning techniques to recover in the event that your SharePoint infrastructure fails. Disaster recovery is akin to planning for what to do when a hurricane hits your town, but it would sure be nice to factor in some notice before the storm hits—this analogy is what health and monitoring is all about.

In the previous versions of SharePoint, administrators tended to work in reactive mode: typically, users of the system would report performance issues or loss of access to their data in SharePoint, and the IT department would then jump on the case to rectify the issue. SharePoint now provides health and monitoring features to give the IT group a heads-up of potential issues in the platform, long before users ever see an issue. In the remainder of this chapter, I shall describe these new features. I will discuss how to configure these features to give you advance warning of problems brewing in the platform, so that you may remedy issues and users may never know there was a problem in the first place.

Logging

Logging is an important part of health monitoring because it is via various log files that SharePoint may alert administrators to issues in the system. The Unified Logging Service (ULS) provides administrators with an extensive dump of information, warnings, and errors occurring in the platform. When something goes wrong, the user typically sees either a custom-developed “oops” message in his or her browser, or a default SharePoint error message. It is the job of SharePoint administrators to find out what went wrong, and the ULS logs will likely give an indication of the problem, especially if it is recurring.

Note

By default, the ULS logs live on each SharePoint 2013 server in the Logs folder of the hive, typically c:\program files\common files\Microsoft shared\web server extensions\15\logs.

Figure 5-23 shows the explorer view of the ULS log folder on my SharePoint 2013 development server. The log folder consists of a number of files, both log and usage files (all text files), that have a file name in the format of year, month, day, and time. If you crack open any of the log files you can see lots of detail, reported by the various functional areas of the SharePoint platform—notice that the Timer Service reports lots of information events.
Figure 5-23.

The ULS log folder on a SharePoint 2013 server

Viewing the ULS log files in the raw is not always helpful. Fortunately, you can download a ULS viewer application to browse ULS. Explicit details of the ULS viewer application are outside the scope of this book, but the tool provides filtering capabilities and continued monitoring of the log entries in real time. I strongly recommend this tool to anyone looking to scrutinize the ULS log for SharePoint issues.

Note

Download the ULS viewer tool from http://archive.msdn.microsoft.com/ULSViewer .
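
If you only need a quick look at the most recent entries, a short PowerShell session is often enough. The following is a minimal sketch, assuming you run it in the SharePoint 2013 Management Shell on a farm server with PowerShell 3.0 or later; the variable names are illustrative only.

```powershell
# Load the SharePoint cmdlets if this is a plain PowerShell console.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# LogLocation may contain an environment variable such as %CommonProgramFiles%, so expand it.
$logFolder = [Environment]::ExpandEnvironmentVariables((Get-SPDiagnosticConfig).LogLocation)

# Find the most recently written trace log on this server (usage files have a different extension).
$latest = Get-ChildItem -Path $logFolder -Filter *.log |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 1

# Show the last 20 lines of that file.
Get-Content -Path $latest.FullName -Tail 20
```

This reads only the local server's log folder; each server in the farm keeps its own files.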

SharePoint allows you to fine-tune the ULS log files to contain information most important to you. The Trace Log Windows Service, which controls output of the ULS log files, also operates in a variety of verbosity modes, ranging from error reporting to very detailed information for every action in the platform. As you might expect, Central Administration is the place to configure the ULS settings, as demonstrated in the following steps:
  1. Open Central Administration.
  2. Click on the Monitoring link.
  3. Click the Configure Diagnostic Logging link.
  4. SharePoint shows a page like that in Figure 5-24.

     
Figure 5-24.

ULS logging configuration

  5. Expand the Categories node.
  6. Specify the types of events you wish SharePoint to log in the ULS logs.
  7. When an error occurs in the platform, SharePoint reports events to both the ULS and Windows event log; you may control the severity (verbosity level) of events logged to both in the Throttling section of the page.

     

Note

This page does not show you the current configuration for throttling; it defaults to empty drop-down controls and no categories selected.
  8. Enable event log flood protection to prevent SharePoint from logging the same event repeatedly to the Windows event log when a consistent problem arises. For example, if a timer service job runs every five minutes and fails, you really do not want hundreds of event log errors of the same message because an administrator did not get to the issue for a few hours.
  9. Finally, the Trace Log section defines the location of ULS log files, the number of days of history to store, and the maximum size of log files.

     

Note

When changing settings for diagnostic logging, I recommend you restart the SharePoint 2013 Tracing Service in Windows Services. Also, stop this service if you need to delete any of the ULS log files.
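
The same diagnostic logging settings are reachable from PowerShell, which is handy because the Central Administration page does not show the current throttling levels. The following is a hedged sketch rather than a recommended configuration: the category identity and all numeric values are examples chosen for illustration.

```powershell
# Load the SharePoint cmdlets if this is a plain PowerShell console.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Show the current trace and event severity for every category.
Get-SPLogLevel

# Raise verbosity for one category; "SharePoint Foundation:Timer" is only an example identity.
Set-SPLogLevel -Identity "SharePoint Foundation:Timer" -TraceSeverity Verbose -EventSeverity Warning

# Trace Log and flood protection settings; the values here are illustrative.
Set-SPDiagnosticConfig -DaysToKeepLogs 14 `
    -LogMaxDiskSpaceUsageEnabled:$true -LogDiskSpaceUsageGB 20 `
    -EventLogFloodProtectionEnabled:$true

# Close the current log file and start a new one, for example just before reproducing a problem.
New-SPLogFile

# Return every category to its default level when you finish troubleshooting.
Clear-SPLogLevel
```

Verbose tracing grows the log files quickly, so resetting the levels afterward is worth making a habit.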

Correlation IDs

In the previous version of SharePoint, Microsoft introduced Correlation IDs: GUIDs (globally unique identifiers) that map an event in SharePoint to the error or warning in the ULS log (see Figure 5-25). Prior to SharePoint 2010, the administrator had to hunt and peck through the log files looking for the event that caused the error. Correlation IDs now allow a user who encounters an error page to send the ID to the administrator, who can use it to find more details about the issue.
Figure 5-25.

Correlation ID in a SharePoint error page

As well as using a text-editor-find action to find errors in the ULS log files, SharePoint includes a very nice PowerShell command to simplify finding the messages with a given Correlation ID:

Get-SPLogEvent | ?{$_.Correlation -eq "<ID>"}
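
On a busy server the unfiltered command can take a while, because it reads every available log entry. The sketch below narrows the search to a recent time window and then collects matching entries from all servers in the farm into one file. It assumes the error occurred within the last 30 minutes, and the output path is hypothetical.

```powershell
# Load the SharePoint cmdlets if this is a plain PowerShell console.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Placeholder: paste the Correlation ID from the error page here.
$id = "<ID>"

# Limit the search to the last 30 minutes so that fewer log entries are read.
Get-SPLogEvent -StartTime (Get-Date).AddMinutes(-30) |
    Where-Object { $_.Correlation -eq $id } |
    Select-Object Timestamp, Area, Category, Level, Message |
    Format-List

# Gather matching entries from every server in the farm into a single file.
Merge-SPLogFile -Path "C:\Temp\correlation.log" -Correlation $id -Overwrite
```

Get-SPLogEvent reads the logs of the local server only, so Merge-SPLogFile is the better choice when you do not know which server handled the request.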

The Logging Database

The logging database in SharePoint provides developers with a central data store to capture all events occurring in the platform. Microsoft introduced the logging database both to provide a transactional database of all events for easy query and to herd developers away from executing custom queries directly against content and configuration databases in the farm.

The logging database provides a central location to query all events occurring in the farm, whereas ULS logs only report information per the verbosity settings (see previous sections of this chapter) and are spread across the servers in the farm. The following steps demonstrate how to configure the logging database for your farm:
  1. Open Central Administration.
  2. Click the Monitoring heading link.
  3. Click the Configure Usage and Health Data Collection link.
  4. Figure 5-26 shows a page for configuring the health data collection events.

     
Figure 5-26.

Configure Health Data Collection

  5. Ensure that the topmost check box is checked to enable usage data collection.
  6. Select the events you wish SharePoint to capture.

     

Note

In the Usage Data Collection Settings section, notice that the location defaults to the same folder as ULS logs; if you look into this folder you should see usage files as well as the familiar log files.
  7. Check the box for the Health Data Collection setting to monitor SharePoint farm health, which is in addition to usage.
  8. Click the Health Logging Schedule link if you wish to change the schedules on which the several health logging timer jobs run.
  9. SharePoint populates the logging database using the various usage files on each SharePoint server.
  10. A timer service collects data from these files and populates the database configured in the Logging Database Server section.
  11. Click the link to configure the schedule of the log collection timer service.
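
The steps above have PowerShell equivalents, which are useful when you want to script the configuration of several farms. This is a minimal sketch under the assumption that a usage application already exists in the farm; "Page Requests" is used only as an example event type.

```powershell
# Load the SharePoint cmdlets if this is a plain PowerShell console.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Turn on usage data collection for the farm.
Set-SPUsageService -LoggingEnabled 1

# List the event types and show whether each one is currently collected.
Get-SPUsageDefinition

# Enable collection for a single event type (example name).
Set-SPUsageDefinition -Identity "Page Requests" -Enable

# Show the usage application, including details of the logging database it writes to.
Get-SPUsageApplication | Format-List *
```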

     
Allow the usage collection to run for a day or two and interact with your farm to generate usage events. Next, I shall show you the logging database, which in my farm is the ROBDEMO_UsageandHealth database.
  1. Open SQL Server Management Studio.
  2. Navigate to the logging database.
  3. If you expand the Tables node, you should see a large number of partitioned tables, which is not too helpful.
  4. Expand the Views node instead.
  5. You may execute SQL queries against the views.
  6. In Figure 5-27, I ran a select T-SQL statement over the dbo.FeatureUsage view.

     
Figure 5-27.

SQL Server Management Studio and the logging database

The logging database also contains a number of stored procedures that return tabular usage data. As you can see, the logging database provides a nice collection of usage event data that developers may query in custom controls, without having to dip into the main farm content and configuration databases. The premise here is that Microsoft optimizes the configuration and content databases for SharePoint and does not guarantee consistency in the schema between versions. The logging database is isolated from the other farm databases and offers consistency, allowing developers the confidence that their queries remain working with future upgrades of the platform.
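
To illustrate querying a view without opening SQL Server Management Studio, the following sketch runs a simple select from PowerShell. It assumes the SQL Server PowerShell module that provides Invoke-Sqlcmd is installed and loaded; the server name is hypothetical, and the database name is the example used in this chapter. The query selects all columns because the sketch makes no assumption about the column names of the view.

```powershell
# Hypothetical server name; replace with the SQL Server instance that hosts your logging database.
$server   = "SQLSERVER01"
# Database name from the example farm in this chapter; yours will differ.
$database = "ROBDEMO_UsageandHealth"

# Query the view rather than the partitioned tables beneath it.
$query = "SELECT TOP (20) * FROM dbo.FeatureUsage;"

Invoke-Sqlcmd -ServerInstance $server -Database $database -Query $query |
    Format-Table -AutoSize
```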

Analytics

In the previous version of SharePoint—SharePoint Server 2010—the Web Analytics Service Application maintained usage and analytics data for the SharePoint farm. With the new SharePoint 2013 platform, Microsoft redesigned the analytics components and integrated analytics with SharePoint search.

Note

SharePoint 2013 replaces the Web Analytics Service Application of SharePoint 2010 with the new analytics engine that is part of search.

From a high level, the new analytics features of SharePoint 2013 provide the following advantages:
  • User recommendations based on usage data tracking

  • Promoted search results based on usage and visit tracking of content

  • More sophisticated usage tracking with the SharePoint search engine platform

  • Search is ubiquitous across the SharePoint platform and, therefore, better equipped to manage usage and analytics

I cover search configuration and usage reports in Chapter 15.

The Health Analyzer

The previous few sections of this chapter were concerned with reviewing the health of SharePoint proactively. When I first mentioned health and monitoring in this chapter, I said that SharePoint has the capability to monitor itself and give administrators a heads-up when potential problems in the platform are brewing. This is the job of the Health Analyzer. The following steps demonstrate how to access the Health Analyzer settings and reports from Central Administration:
  1. Open Central Administration.
  2. Click the Monitoring heading link.
  3. Review the links under the Health Analyzer heading.

     

Because reporting issues is such an important job, the Health Analyzer displays a banner on the Central Administration home page when it detects errors or warnings. If you do not see this banner on your Central Administration home page then all is good with your farm. Do not be alarmed if you just installed SharePoint 2013 and now see a red or yellow banner (see Figure 5-28). The Health Analyzer has an extensive set of rules, which it uses to report anything that might pertain to a configuration, security, or operational issue. Sometimes these rules trigger a warning when the issue is not serious. One example is the rule that warns of the potential to run out of disk space, which triggers if the amount of memory in the system is more than half the available disk space on the system drive (for core dump purposes). This being said, you should pay attention to every warning and error, just in case SharePoint reports a serious issue.

Note

You should pay close attention to every warning and error reported by the Health Analyzer.

Figure 5-28.

Health Analyzer alerts in Central Administration

Click the View these issues link, which navigates you to the same page as the Review problems and solutions link under the Monitoring heading. If the Health Analyzer has picked up issues to address in your farm, the Review Problems and Solutions page should list those issues. See Figure 5-29 for an example from my development farm. In my case, I was expecting a number of warnings in my development environment because I had not completed a full farm configuration at the time I wrote this chapter, such as configuration of search and outgoing e-mail. If I were configuring a farm for production, I would want to address all the issues reported.
Figure 5-29.

Issues reported by the Health Analyzer

  1. Click any of the issues, and SharePoint will open a page with more specifics about the issue.
  2. In some cases, SharePoint can help you fix issues, with the Repair Automatically icon on the dialog ribbon.
  3. If SharePoint cannot automatically fix an issue, fix the issue manually and then come back to the issue and click the Reanalyze Now icon to request that the Health Analyzer determine if you remedied the issue.

     
The Health Analyzer uses a series of rules to determine if a particular area of the SharePoint platform needs attention.
  1. Navigate back to the Monitoring page in Central Administration.
  2. Click the Review Rule Definitions link.
  3. SharePoint shows a page consisting of a standard list of rules (Figure 5-30).

     
Figure 5-30.

Rule definitions for the Health Analyzer

  4. Click the name of any list item in the appropriate category to view the rule definition.
  5. You may click the Edit icon to edit the rule list item; you may change the name, scope, schedule, and whether SharePoint can repair the issue automatically.
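
Rule definitions can also be listed and toggled from PowerShell. The following is a sketch only: the rule name is an assumption used for illustration, so copy a real name from the listing that the first command produces on your own farm.

```powershell
# Load the SharePoint cmdlets if this is a plain PowerShell console.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# List the rule definitions known to the Health Analyzer.
Get-SPHealthAnalysisRule

# Assumed rule name, for illustration only; replace it with a name from the listing above.
$ruleName = "AppServerDrivesAreNearlyFullWarning"

# Disable a rule that does not apply to this farm (for example, on a development server).
Disable-SPHealthAnalysisRule -Identity $ruleName -Confirm:$false

# Enable the rule again later.
Enable-SPHealthAnalysisRule -Identity $ruleName
```

Disabling a rule hides its warnings rather than fixing the underlying condition, so it is best reserved for rules that genuinely do not apply to the environment.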

     

Timer Jobs

Timer jobs work at the heart of a SharePoint farm. Each SharePoint server (web front end or application server) hosts a SharePoint timer service, which is a Windows service. This service is responsible for running SharePoint jobs: units of functionality that execute at a designated time, possibly on a recurring schedule.

SharePoint relies on a vast number of timer service jobs to maintain operation of the farm. The following steps demonstrate how to view the available timer job definitions in the farm:
  1. Open Central Administration.
  2. Click the Monitoring heading link.
  3. Click the Review Job Definitions link, under the Timer Jobs heading.
  4. SharePoint displays a page like that in Figure 5-31.

     
Figure 5-31.

Timer Job Definitions

Timer job definitions are either part of SharePoint Foundation or associated with other SharePoint services, such as Access Services or Excel Services.
  5. Click the View drop-down box in the top right to list timer jobs by web application or service, or to list all jobs.
  6. Click the name of any of the timer job definitions to see the details of the job.

     
Administrators may change the schedule of most jobs. They may also disable and enable jobs. SharePoint allows creation of new jobs only via code and feature deployment, so seek a developer if you need a special job created. Some of the functional features of SharePoint create timer jobs to perform their tasks; for example, Content Deployment creates a new timer job to deploy content to another farm.
  7. Navigate back to the Monitoring page of Central Administration.
  8. Click the Check Job Status link.
  9. SharePoint shows you a page of upcoming scheduled jobs, running jobs, and a history of jobs executed, with their completion status (Figure 5-32).

     
Figure 5-32.

Timer Job Statuses
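
Job definitions, schedules, and history are also available from PowerShell. The sketch below assumes a job with the internal name job-usage-log-file-import exists in your farm; that name and the schedule string are examples, so substitute values from your own listing.

```powershell
# Load the SharePoint cmdlets if this is a plain PowerShell console.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# List job definitions with their schedules and last run times.
Get-SPTimerJob | Sort-Object DisplayName |
    Format-Table DisplayName, Schedule, LastRunTime -AutoSize

# Pick one job by its internal name (assumed name, for illustration).
$job = Get-SPTimerJob | Where-Object { $_.Name -eq "job-usage-log-file-import" } |
    Select-Object -First 1

if ($job) {
    # Change the schedule of the job.
    Set-SPTimerJob -Identity $job -Schedule "every 15 minutes between 0 and 59"

    # Request an immediate run instead of waiting for the schedule.
    Start-SPTimerJob -Identity $job

    # Show recent history entries with their completion status.
    $job.HistoryEntries |
        Select-Object -First 5 StartTime, EndTime, Status, ErrorMessage
}
```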

The Developer Dashboard

As much as this book is about administration and not development, I need to say a few words about the SharePoint Developer Dashboard. Microsoft introduced this feature with SharePoint 2010, and it provides performance and tracing information within SharePoint rendered pages. Developers (and administrators) may diagnose slow-rendering pages using the Developer Dashboard. Figure 5-33 is an example of the Developer Dashboard output.
Figure 5-33.

Example output from the Developer Dashboard

The following STSADM command demonstrates enabling the Developer Dashboard:

STSADM -o setproperty -pn developer-dashboard -pv ondemand

The following command disables it:

STSADM -o setproperty -pn developer-dashboard -pv off
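
If you prefer PowerShell to STSADM, the same setting is exposed through the SharePoint object model. This is a minimal sketch, assuming you run it as a farm administrator in the SharePoint 2013 Management Shell.

```powershell
# Load the SharePoint cmdlets (and with them the SharePoint assemblies) if needed.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# The Developer Dashboard settings hang off the content web service.
$settings = [Microsoft.SharePoint.Administration.SPWebService]::ContentService.DeveloperDashboardSettings

# Turn the dashboard on and persist the change.
$settings.DisplayLevel = [Microsoft.SharePoint.Administration.SPDeveloperDashboardLevel]::On
$settings.Update()

# To turn the dashboard off again:
# $settings.DisplayLevel = [Microsoft.SharePoint.Administration.SPDeveloperDashboardLevel]::Off
# $settings.Update()
```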

Summary

In this chapter, I discussed planning for disaster, and you read about how to recover in the event of service downtime. Good planning of your infrastructure enables you to take advantage of warm standby scenarios, and I covered how SharePoint may leverage SQL clustering, SQL mirroring, and failover.

SharePoint provides users of the platform with a degree of control over content integrity, via document versioning and the Recycle Bin to recover deleted lists and list items.

No disaster recovery plan is complete without a mention of backup and restore of content and configuration. I walked you through backup and restore of both content and SharePoint configuration, using Central Administration, PowerShell, and STSADM tools.

Toward the end of this chapter, you explored the Health Analyzer, usage, and health monitoring capabilities of SharePoint 2013 to alert administrators of potential problems in their SharePoint farm. As a nice treat for developers, I introduced you to the Developer Dashboard, so that you may troubleshoot slow-rendering pages in SharePoint.

In Chapter 6, I will change topics and discuss user profiles and the social capabilities of SharePoint 2013. See you on the next page.

Copyright information

© Rob Garrett 2013

Authors and Affiliations

  • Rob Garrett (1)
  1. MD, USA
