What Factors Make Tables Fill Up, and How Can I Address Them?

Table limiting is a reasonable way to control the huge amount of data generated by web activity files while still collecting meaningful data. If your tables fill up, you may need to adjust table limits upward or modify your reporting strategy to create reports that result in smaller tables.

How Tables Fill Up

The main factor that causes tables to reach their limits is a large number of unique elements in a report dimension. For example, the JavaSupport table has only two possible unique elements: enabled and disabled. Even with the default limit of 10 elements, this table never reaches its limit. You can use this expectation to detect problems with the JavaScript code: if the Java Support report collects more than two different values, the tracking code may have been modified or corrupted. The TopVisitors table, however, contains as many unique elements as there are unique visitors, and each of these elements requires an entry in the table. If you do not limit the table, performance can be affected.
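As an illustration of this kind of check, the following Python sketch counts the distinct values seen for each dimension and flags any dimension that has more values than expected. The hit records, field names, and the EXPECTED_MAX table are hypothetical and do not reflect how Webtrends stores data internally.

```python
from collections import defaultdict

# Hypothetical hit records; field names are illustrative, not a Webtrends schema.
hits = [
    {"JavaSupport": "enabled", "Visitor": "v1"},
    {"JavaSupport": "disabled", "Visitor": "v2"},
    {"JavaSupport": "enabled", "Visitor": "v3"},
    {"JavaSupport": "enabeld", "Visitor": "v4"},  # corrupted value from bad tracking code
]

# Expected maximum number of unique elements per dimension (None = unbounded).
EXPECTED_MAX = {"JavaSupport": 2, "Visitor": None}

def unique_counts(records):
    """Return the set of distinct values seen for each dimension."""
    seen = defaultdict(set)
    for record in records:
        for dimension, value in record.items():
            seen[dimension].add(value)
    return seen

def suspicious_dimensions(records, expected_max):
    """List dimensions whose distinct-value count exceeds the expected maximum."""
    flagged = []
    for dimension, values in unique_counts(records).items():
        limit = expected_max.get(dimension)
        if limit is not None and len(values) > limit:
            flagged.append(dimension)
    return flagged

print(suspicious_dimensions(hits, EXPECTED_MAX))  # ['JavaSupport']
```

A bounded dimension such as JavaSupport is a useful canary: any value beyond the expected set points to a tracking problem rather than to genuine growth in the data.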

In many cases, the number of possible values for a given dimension also depends on the website. Some sites have a very limited number of pages but many unique visitors. A knowledge base site that stores each article on a separate page, however, may generate a very large number of unique pages even with a relatively small number of visits or visitors.

Webtrends provides several ways of limiting the number of unique elements analyzed. One powerful way of limiting unique elements is using filters at either the profile or the custom report level to include or exclude data. Another possibility is to use URL Rebuilding to modify URLs containing non-significant parameters so that fewer unique URLs are analyzed.
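The idea behind URL Rebuilding can be sketched in Python: strip query parameters that do not change the content of a page so that variants of the same URL collapse into a single element. This is not Webtrends' URL Rebuilding configuration; the NON_SIGNIFICANT parameter names and the rebuild_url helper are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters assumed not to change page content.
NON_SIGNIFICANT = {"sessionid", "ts", "ref"}

def rebuild_url(url):
    """Drop non-significant query parameters, sort the rest, and drop the fragment."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_SIGNIFICANT
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

urls = [
    "http://example.com/kb/article?id=42&sessionid=abc",
    "http://example.com/kb/article?id=42&sessionid=xyz&ts=1700000000",
    "http://example.com/kb/article?sessionid=q&id=42",
    "http://example.com/kb/article?id=7",
]
print(len(set(urls)))                     # 4
print(len({rebuild_url(u) for u in urls}))  # 2
```

Sorting the remaining parameters means that the same parameters in a different order also map to one URL.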

Combining Report Dimensions

In a two-dimensional report, the number of unique elements in both the primary and secondary dimensions comes into play. To assess potential table size, multiply the number of possible unique values for the primary dimension by the number of possible unique values for the secondary dimension. For example, a report with Pages as the primary dimension and Visitors as the secondary dimension can generate a huge number of unique elements: a website with 200 unique pages and 5,000 visitors could yield up to 200 × 5,000 = 1,000,000 unique elements.
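The arithmetic above is a worst case that assumes every page is seen by every visitor. A minimal Python sketch contrasting that upper bound with the combinations actually present in a small, hypothetical sample of (page, visitor) hits:

```python
def upper_bound(primary_count, secondary_count):
    """Worst case: every primary element co-occurs with every secondary element."""
    return primary_count * secondary_count

def observed_pairs(hits):
    """Count the distinct (primary, secondary) combinations actually seen."""
    return len(set(hits))

print(upper_bound(200, 5000))  # 1000000

# Hypothetical (page, visitor) hits; the repeated hit adds no new element.
sample = [("/home", "v1"), ("/home", "v2"), ("/faq", "v1"), ("/home", "v1")]
print(observed_pairs(sample))  # 3
```

Real data usually falls between the two numbers, but the upper bound is the safer figure to plan table limits against.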

For custom reports with two dimensions, you can limit these combinations by specifying analysis and report limits for both the Primary and Secondary dimensions. For the Secondary dimension, you can limit the total number of Secondary Dimension elements Webtrends analyzes, the number of elements analyzed per Primary Dimension element, or both.

For example, suppose you select Campaign IDs as the Primary Dimension and limit the number of Secondary Dimension values per Primary Dimension element to 10,000. Webtrends can then analyze up to 10,000 Secondary Dimension values for each campaign. If you also specify an overall Secondary Dimension limit of 30,000 elements, Webtrends stops adding individual Secondary Dimension elements once it has analyzed 30,000 of them, and any further Secondary Dimension data is reported only in aggregate. With seven campaigns, for instance, the first three campaigns could use up the entire 30,000-element allowance (3 × 10,000), so Secondary Dimension data for the remaining four campaigns would not be reported individually.
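The interaction of the two limits can be modeled with a short Python sketch. It assumes a simplified first-come, first-served policy, which is an illustration of the behavior described above and not Webtrends' actual trimming algorithm; the analyze function and its parameter names are hypothetical. The example scales the limits down by 1,000.

```python
from collections import defaultdict

def analyze(pairs, per_primary_limit, overall_limit):
    """Split (primary, secondary) pairs into individually tracked rows and aggregates.

    Simplified first-come, first-served model; not Webtrends' actual algorithm.
    """
    rows = defaultdict(set)        # primary -> secondary values with their own rows
    aggregated = defaultdict(int)  # primary -> hits reported only in aggregate
    total = 0                      # secondary elements tracked across all primaries
    for primary, secondary in pairs:
        tracked = rows[primary]
        if secondary in tracked:
            continue  # this value already has its own row
        if len(tracked) < per_primary_limit and total < overall_limit:
            tracked.add(secondary)
            total += 1
        else:
            aggregated[primary] += 1

    return rows, aggregated, total

# Scaled-down example: 10 per campaign, 30 overall, seven campaigns with
# 15 distinct secondary values each, read campaign by campaign.
pairs = [(f"campaign{c}", f"value{v}") for c in range(1, 8) for v in range(15)]
rows, aggregated, total = analyze(pairs, per_primary_limit=10, overall_limit=30)
print(total)  # 30
print({c: len(rows[c]) for c in sorted(rows)})  # campaigns 1-3 have 10 rows; 4-7 have 0
```

Because this model is first-come, first-served, the campaigns that appear earliest in the data claim the individual rows; with interleaved data, the allowance would be spread differently across campaigns.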

You can use limits to keep report tables in check and maximize performance when creating two-dimensional reports, or you can expand your table limits to maximize the unique data you can collect, keeping in mind that performance may suffer. However, some combinations of dimensions, such as Pages by Visits, simply generate too many unique elements to preserve a reasonable balance of meaningful data and manageable performance. One alternative approach for this particular combination is to create Content Groups that isolate smaller groups of pages, and then use Content Groups as a dimension instead of all Pages. If you use this method, keep in mind that the product of your expected visits and the number of Content Groups can still be very large.
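To see why Content Groups help, the following Python sketch maps page paths to groups and compares the number of unique (page, visit) elements with the number of unique (content group, visit) elements. The prefix-based CONTENT_GROUPS mapping is hypothetical and assigns each page to a single group, which is a simplification of how Content Groups may be defined in practice.

```python
# Hypothetical mapping from URL path prefixes to Content Group names.
CONTENT_GROUPS = {"/kb/": "Knowledge Base", "/store/": "Store", "/support/": "Support"}

def content_group(path):
    """Return the first Content Group whose prefix matches the path, else 'Other'."""
    for prefix, group in CONTENT_GROUPS.items():
        if path.startswith(prefix):
            return group
    return "Other"

def table_sizes(visits):
    """Compare unique (page, visit) and (content group, visit) combinations."""
    by_page = {(path, visit) for visit, path in visits}
    by_group = {(content_group(path), visit) for visit, path in visits}
    return len(by_page), len(by_group)

visits = [
    ("visit1", "/kb/a1"), ("visit1", "/kb/a2"), ("visit1", "/kb/a3"),
    ("visit2", "/kb/a1"), ("visit2", "/store/cart"),
    ("visit3", "/support/contact"), ("visit3", "/kb/a9"),
]
print(table_sizes(visits))  # (7, 5)
```

The saving grows with the number of pages per group, but the group count is still multiplied by the number of visits, which is why the result can remain large.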

Drilldowns

Drilldown reports present some additional issues with data sizing. When you create a drilldown dimension, you combine the elements of several dimensions multiplicatively, so depending on the structure of your data, the number of unique elements can grow very quickly. You can limit the tables for a drilldown by limiting the overall number of elements in the drilldown, the number of elements at each level of the drilldown, or both.

You can also minimize the amount of accumulated data in drilldowns by keeping in mind the following guidelines.

Hierarchical Drilldowns

In a hierarchical drilldown, each element at a lower level belongs to only one element at the level above it, so there is a many-to-one relationship between each level and the level above it. In this kind of drilldown, you can assess table size by adding up the elements at the lowest level of the hierarchy. For example, the Products drilldown reports levels such as Product Group, Product Family, and Product Category.

Each Product Group contains multiple Product Families, each Product Family contains multiple Product Categories, and so on. Because the number of elements grows as you move down the levels, meaningful categories are less likely to be trimmed from the report table.
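A minimal Python sketch of this estimate, assuming the hierarchy is represented as nested dictionaries ending in lists of lowest-level elements. The product names are hypothetical. It counts the elements at each level; in a strict hierarchy the lowest-level count equals the number of distinct drilldown paths.

```python
# Hypothetical Products hierarchy: Product Group -> Product Family -> Product Category.
products = {
    "Electronics": {
        "Audio": ["Headphones", "Speakers"],
        "Video": ["Televisions", "Projectors", "Players"],
    },
    "Books": {
        "Fiction": ["Mystery", "Science Fiction"],
    },
}

def level_counts(node):
    """Return the number of elements at each level of a nested hierarchy."""
    counts = []
    level = [node]
    while level:
        next_level = []
        count = 0
        for item in level:
            count += len(item)  # keys of a dict, or entries of a leaf list
            if isinstance(item, dict):
                next_level.extend(item.values())
        counts.append(count)
        level = next_level
    return counts

print(level_counts(products))  # [2, 3, 7]
```

Because no category appears under two families, the lowest level (7 elements) dominates the total, and the upper levels add comparatively little.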

Non-Hierarchical Drilldowns

In contrast, if the top levels of your drilldown use dimensions with many possible elements, or if different elements at the same level share the same sub-elements, your tables fill more quickly and meaningful categories are more likely to be trimmed from your reports. You can assess the table size of a non-hierarchical drilldown by multiplying the number of possible elements at each level, which can yield far more elements than a hierarchical drilldown.
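The multiplication can be sketched in Python as a running product over the level sizes, giving the worst-case number of unique combinations at each level. The level sizes below are hypothetical.

```python
import operator
from itertools import accumulate

def drilldown_upper_bounds(level_sizes):
    """Worst-case unique combinations at each level when any element can
    appear under any element of the level above (non-hierarchical data)."""
    return list(accumulate(level_sizes, operator.mul))

# Hypothetical drilldown: 300 top-level items, 50 second-level, 20 third-level.
bounds = drilldown_upper_bounds([300, 50, 20])
print(bounds)       # [300, 15000, 300000]
print(sum(bounds))  # 315300
```

With a few hundred items at the top level, the deepest level alone can reach hundreds of thousands of combinations, far beyond what a strict hierarchy with similar level sizes would produce.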

For a drilldown whose top levels contain hundreds of possible items, most of the overall table space is filled by the first few thousand significant items if you use the default limits. Alternatively, if you choose not to limit tables, you may need to allocate an unreasonable amount of memory to analysis, and even then you may not have enough memory to process the profile.

Your tables also fill up more quickly if your data is non-hierarchical, so that the same subcategories can be found in more than one branch at the same level. For example, suppose the Product Categories level of your Products drilldown contains the subcategories DVDs and Home Video. If each DVD you sell belongs to both the DVDs subcategory and the Home Video subcategory, you generate more unique elements, because each combination of subcategory and item is a separate unique element.
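The effect of items that belong to more than one subcategory can be counted directly. In this Python sketch the catalog is hypothetical; each (subcategory, item) pair stands for one element in the drilldown table.

```python
# Hypothetical catalog: each item lists every subcategory it belongs to.
catalog = {
    "Movie A (DVD)": ["DVDs", "Home Video"],
    "Movie B (DVD)": ["DVDs", "Home Video"],
    "Camcorder X": ["Home Video"],
}

def drilldown_elements(items):
    """Each (subcategory, item) pair is a separate element in the drilldown table."""
    return {(sub, item) for item, subs in items.items() for sub in subs}

elements = drilldown_elements(catalog)
print(len(catalog), len(elements))  # 3 5
```

Three items produce five elements because each DVD is counted once in each branch it appears in.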

You can, of course, increase table limits. However, if performance becomes a problem, another approach is to divide your drilldowns into smaller drilldown reports that contain only two or three dimensions. You can also re-create segments of your drilldowns as two-dimensional custom reports.