Log File Rotation
Discusses decisions associated with log file size and rotation.
With web traffic analysis that relies on web server logs, the first decision you must make is how long to keep the raw, unaggregated log files. You may need to access old log files to reanalyze them. For example, you might want to reanalyze raw data based on new configuration settings. Or you might need to reanalyze a log file from a clustered server that was not available at the original time of analysis, and then merge that reanalysis into the full day’s worth of logs.
In a log file, a typical hit ranges roughly from 250 to 750 bytes. Given that range, consider what happens if your site experiences an average of 10,000 hits per day. Your daily log file can then be anywhere from 2.5 MB to 7.5 MB. If your site experiences up to 5,000,000 hits per day (not unusual for enterprise-level organizations), your daily log file can reach roughly 1.25 GB to 3.75 GB. For large organizations with extremely active web sites, generating terabytes of data per year is common.
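The arithmetic above can be reproduced with a short script. This is a minimal sketch that assumes the 250 to 750 byte range per hit quoted in the text and decimal units (1 MB = 1,000,000 bytes); the function names are illustrative and not part of any Webtrends tool.

```python
# Rough log-size estimate based on the per-hit byte range quoted in the text.
BYTES_PER_HIT_LOW = 250
BYTES_PER_HIT_HIGH = 750


def daily_log_size_bytes(hits_per_day: int) -> tuple[int, int]:
    """Return the (low, high) estimated daily log size in bytes."""
    return hits_per_day * BYTES_PER_HIT_LOW, hits_per_day * BYTES_PER_HIT_HIGH


def format_size(num_bytes: float) -> str:
    """Format a byte count using decimal units (1 MB = 1,000,000 bytes)."""
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        if num_bytes < 1000 or unit == "TB":
            return f"{num_bytes:g} {unit}"
        num_bytes /= 1000


for hits in (10_000, 5_000_000):
    low, high = daily_log_size_bytes(hits)
    print(
        f"{hits:>9,} hits/day: "
        f"{format_size(low)} to {format_size(high)} per day, "
        f"{format_size(low * 365)} to {format_size(high * 365)} per year"
    )
```

Under these assumptions, 5,000,000 hits per day accumulates to roughly 0.46 to 1.37 TB of raw logs per year, which is consistent with the terabyte-scale figure mentioned for very active sites.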
Because even a single day’s web activity log can require gigabytes of storage space, most organizations implement a log file rotation scheme that keeps computing resources available for processing tasks. Depending on the volume of traffic that your site experiences, you may wish to rotate (roll over) log files daily, weekly, or monthly.
Rotation schedules can also depend on how you access your log files and how often you intend to report on them. Always configure Webtrends to analyze only log files that have been closed; that is, log files that have been rotated and will no longer receive new traffic. If the log files are accessed using FTP, the entire log file must be transferred to the analysis engine before analysis can begin.
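One way to pick out closed log files before handing them to analysis is to filter by the date in the file name. The sketch below assumes a hypothetical daily naming scheme, `access-YYYY-MM-DD.log`, and assumes that the web server writes only to the file for the current day; neither assumption comes from Webtrends, so adjust both to match your server's rotation settings.

```python
import re
from datetime import date

# Hypothetical naming scheme for daily-rotated logs: access-YYYY-MM-DD.log
LOG_NAME = re.compile(r"^access-(\d{4})-(\d{2})-(\d{2})\.log$")


def closed_logs(filenames, today):
    """Return the names of daily logs dated before `today`, oldest first.

    Assumes the server writes only to the current day's file, so any
    earlier file has been rotated and is no longer receiving traffic.
    """
    closed = []
    for name in filenames:
        match = LOG_NAME.match(name)
        if not match:
            continue  # skip files that do not follow the assumed scheme
        year, month, day = map(int, match.groups())
        try:
            log_date = date(year, month, day)
        except ValueError:
            continue  # skip names with impossible dates, such as month 13
        if log_date < today:
            closed.append((log_date, name))
    return [name for _, name in sorted(closed)]
```

For example, calling `closed_logs(names, date.today())` on a directory listing returns only the files from previous days, leaving the file that is still being written out of the analysis run.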
Typically, organizations rotate their log files daily, but you can rotate them more frequently if needed. After you rotate the log files and analyze them, determine how long to archive them. How long you archive log files depends on your reasons for keeping the data. Some organizations never intend to reanalyze their data, so they discard it shortly after analysis. Other organizations keep their data forever. Most organizations archive data for a period between one quarter and one year.
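A retention policy of this kind can be checked with a small script. The following is a minimal sketch that assumes archived logs sit in a single directory and that a file's modification time reflects when it was archived; the function name and parameters are hypothetical, and the script only lists candidates rather than deleting anything.

```python
import time
from pathlib import Path


def expired_archives(archive_dir, retention_days, now=None):
    """Return archived files last modified more than `retention_days` ago.

    `now` is a Unix timestamp; it defaults to the current time and can be
    supplied explicitly to make the check repeatable.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86_400  # seconds in one day
    return sorted(
        path
        for path in Path(archive_dir).iterdir()
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

With a one-year policy, `expired_archives("/path/to/archive", 365)` returns the files that have aged out, where the path is a placeholder. Reviewing that list before removing the files with `Path.unlink()` keeps an accidental misconfiguration from discarding data you still need to reanalyze.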
Recommendations: Log Files
- Rotate log files daily. Consider rotating log files hourly if you access your log files using FTP and your site has a large amount of traffic.
- Archive analyzed log files for one year.