Archiving

Discusses archiving strategy and best practices.

Archiving is a critical part of maintaining a fail-safe environment, and designing an archiving strategy is particularly important in a web analytics environment. The Webtrends Archiving feature allows administrators to automate backups of Webtrends databases after incremental analysis cycles.

Occasionally, after you analyze data, you may need to return your analysis to a point when you knew the analysis results were based on a correct configuration. For example, suppose you add a new content group to your Webtrends installation. This content group contains a group of new pages that relate to a new product. A week later, when you review your weekly report, you discover that the content group is not included in your reports. Investigation shows that improper syntax was used to define the pages in the content group. As a result, Webtrends did not analyze hits to those pages.

Webtrends Analytics software offers the ability to take a snapshot of the summary tables database, so you can create periodic backup copies along the way. Depending on what the analysis software is configured to create, the snapshot may include a copy of the daily, weekly, monthly, quarterly, and/or yearly summary tables at a point in time. You can restore that copy if you run into problems with your analysis later on. After you restore the last known good copy, you need to fill in the data that the backup does not contain. To do so, restore and re-analyze the raw log files from the time of the backup through the most current log file. For more information about how to back up and restore your Webtrends Analytics installation, see Backing Up and Restoring Webtrends Data.

Let’s go back to the earlier example in which the content group was incorrectly set up. If your web site has a significant amount of traffic, and each daily log file analysis requires around 10 minutes to run, you might determine that you can afford the time it would take to re-analyze up to 28 days of data at any given time. You also decide that 28 days is enough time to discover any issues, given that you review reports once a week. You have enough storage space for four backups of the data. This means that when you create a fifth backup, it replaces the oldest backup.
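
The rotation described above, in which a fifth backup replaces the oldest of four, can be sketched in a few lines of Python. This is only an illustration of the retention rule, not the way the Webtrends Archiving feature is implemented. The archive names and backup days are hypothetical.

```python
from collections import deque

MAX_BACKUPS = 4  # from the example: storage space for four backups

# A deque with maxlen silently drops its oldest entry when a new one
# is appended to a full deque, which mirrors the rotation rule.
archives = deque(maxlen=MAX_BACKUPS)

# Hypothetical backup days, one backup every seven days.
for day in (7, 14, 21, 28, 35):
    archives.append(f"archive_day_{day:02d}")

# After the fifth backup, the day-7 archive is gone.
print(list(archives))
```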

In this situation, a sensible solution is to back the data up every seven days and maintain four backups. This solution makes the most of your available storage space and ensures that you can catch any problems with the data before your oldest archive is overwritten.
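
The arithmetic behind this schedule can be checked with a short Python sketch. The figures come from the example above (about 10 minutes per daily log file, a backup every seven days, four backups kept). The function names are illustrative, and the estimate assumes that analysis time grows linearly with the number of daily log files, which may not hold on a real system.

```python
MINUTES_PER_DAILY_LOG = 10  # from the example: about 10 minutes per daily log file
BACKUP_INTERVAL_DAYS = 7    # back up every seven days
BACKUPS_KEPT = 4            # storage space for four backups


def coverage_days(interval_days, backups_kept):
    """Approximate age of the oldest backup just before it is replaced."""
    return interval_days * backups_kept


def reanalysis_minutes(days_to_replay, minutes_per_day):
    """Estimated time to re-analyze the given number of daily log files."""
    return days_to_replay * minutes_per_day


window = coverage_days(BACKUP_INTERVAL_DAYS, BACKUPS_KEPT)
worst_case = reanalysis_minutes(window, MINUTES_PER_DAILY_LOG)

print(f"Oldest archive covers up to {window} days back")
print(f"Worst-case re-analysis: {worst_case} minutes ({worst_case / 60:.1f} hours)")
```

With these inputs, the oldest archive reaches back 28 days, and replaying all 28 days takes roughly 280 minutes.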

Figure 1. Archiving Scenario. Archives created over several days enable you to resolve issues discovered with analysis data.

When you discover a problem like the one with the content group, you have these options:

  • Correct the syntax for the new content group, and then re-import and re-analyze all the raw log files from day one (assuming you still have those log files).
  • Go back to the last known good set of summary tables and then re-analyze the data from that day up to the current day. In this case, you would restore Archive 2, the last archive that contained data without the syntax problem, correct the syntax for the new content group, and then re-analyze the raw log file data up to the current day.
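
The choice between these two options can be sketched as a small Python helper that picks the most recent archive created before the configuration error and counts the days of log files to replay. The day numbers are hypothetical and are chosen only so that the second archive is the last good one. The sketch also assumes that an archive created on the same day as the error cannot be trusted.

```python
def last_known_good(archive_days, error_day):
    """Return the latest archive day before the error, or None if there is none."""
    good = [day for day in archive_days if day < error_day]
    return max(good) if good else None


def days_to_replay(restore_day, current_day):
    """Daily log files to re-analyze after restoring; None means start from day one."""
    if restore_day is None:
        return current_day
    return current_day - restore_day


# Hypothetical timeline: weekly archives, error introduced on day 16,
# problem discovered on day 30.
archive_days = [7, 14, 21, 28]
error_day = 16
current_day = 30

restore_day = last_known_good(archive_days, error_day)
replay = days_to_replay(restore_day, current_day)

print(f"Restore the archive from day {restore_day}, then re-analyze {replay} days of logs")
```

In this timeline, restoring the day-14 archive means replaying 16 days of log files rather than all 30.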

As you can imagine, creating and maintaining multiple backup copies of an entire database can require substantial storage space on your computer. It’s important to balance the storage space you have available with the number of backup copies you can afford to store at any given time. This calculation also depends on how long it would take to restore lost data, which in turn depends on how much traffic your site experiences, which summary tables you choose to create, and how powerful your system is.

How often you need to back up data also depends on how closely you monitor the results of your analysis. If you review results once a day, creating backups every day or every two days may be sufficient, because you will probably find any issues within a few days.

Recommendations: Archives

  • Check how much disk storage space you have to save the backups versus the average size of a backup.
  • Determine how long it takes to restore data by analyzing it from the raw log file. This is affected by how much traffic your site generates, which summary tables you choose to create (daily, weekly, monthly, etc.), and how fast your system can process the data.
  • Estimate how soon you are likely to find issues that require restoring a backup, based on how closely and how often you monitor your analysis results.
  • Make sure you store your backups in a location that allows you to restore them in the event of a disk failure.
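
The first recommendation can be checked with a short Python sketch that compares free disk space with an average backup size. The backup size, the reserve, and the path are placeholders to replace with values from your own installation. The sketch uses only the Python standard library and does not call any Webtrends tool.

```python
import shutil


def backups_that_fit(free_bytes, avg_backup_bytes, reserve_bytes=0):
    """Number of whole backups that fit in free space after keeping a reserve."""
    if avg_backup_bytes <= 0:
        raise ValueError("avg_backup_bytes must be positive")
    usable = max(free_bytes - reserve_bytes, 0)
    return usable // avg_backup_bytes


GIB = 1024 ** 3
AVG_BACKUP_BYTES = 5 * GIB  # placeholder: measure the size of a real backup
RESERVE_BYTES = 10 * GIB    # placeholder: space to leave free for other uses
BACKUP_PATH = "."           # placeholder: the volume where backups are stored

free_bytes = shutil.disk_usage(BACKUP_PATH).free
count = backups_that_fit(free_bytes, AVG_BACKUP_BYTES, RESERVE_BYTES)
print(f"Room for about {count} backups on {BACKUP_PATH!r}")
```

If the result is lower than the number of backups your schedule requires, either reduce the number of backups you keep or move the archive location to a larger volume.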
