No matter what your personal opinion about CIPA, there are some important facts about filtering that will help you decide
what product (if any) to purchase, what configuration options are available, and how to evaluate a filter's effectiveness.
The University of Michigan's School of Information and Health System recently completed an extensive study of filtering software
for the Kaiser Family Foundation, in a project headed by Professors Paul Resnick and Caroline Richardson. Derek Hansen,
a member of the research team, offers this report.

Not All Filters Are Created Equal
There is an enormous amount of variation in the types of filtering software available. Some key variables and related questions
that should be considered when selecting a product include:
Hardware/Software compatibility: Will the filtering software be installed on each individual computer (i.e., client) or on a central computer (i.e., server)?
Does the filtering software require you to have additional software?
Cost: Are there any ongoing fees associated with updating the blocklist? Are there separate installation fees?
Ease of installation and maintenance: Does the company install the software for you? How frequently does it need to be updated, modified, etc.? How difficult is
it to add/remove computers?
Monitoring and reporting capabilities: What statistics are kept and how can they be accessed? Do the standard ready-made reports cover what you need?
Effectiveness of filtering technology: How frequently are blacklists/whitelists updated? Does the filter have the ability to dynamically classify a site, even when
that site has not already been placed on a blacklist/whitelist?
Configuration options: Can different computers be configured differently? How many blocking categories (e.g., pornography, gambling, hate, web chat)
are offered, and do those categories correspond with the library's policies?
Error handling capabilities: How difficult is it to turn on or off the filter for a given computer? Can custom messages be displayed when a site is blocked?
How difficult is it to add a site to or remove it from a whitelist or blacklist? Who can initiate such a procedure?

Filters are Flexible--Make Them Stretch
Over the past several years, most commonly used filters (at least in the library and school setting) have become extremely
versatile. As a result, the goals of the institution using the filter can be better met than in prior years. Notably, most
products offer a wide range of categories that can be blocked (often numbering in the dozens), including categories like “pornography,”
“hate,” “internet chat,” and “gambling,” as well as some categories that can be allowed through, like “health” and “sex education.”
Currently, few (if any) filters include a category that only blocks sites that meet the legal criteria outlined in CIPA. Instead,
to comply with CIPA, libraries must block all categories that include any of the content not permitted by CIPA. This results
in more content being blocked than the law itself would technically require. In most cases, blocking only the “pornography”
category (which may be called “sexually explicit” or something related) is sufficient; however, depending upon how the categories
are defined there may be additional categories that must be blocked in order to comply with CIPA (e.g., “extreme” or “adult”).
The decision about which categories to block (i.e., how to configure the product) is at least as important as which filtering
product to purchase, and should not be made in haste. In fact, in a recent study focused on finding health information, we
found that the configuration of the product had a far greater impact on the amount of over- and underblocking than the choice
of product itself (see report titled “See No Evil: How Internet Filters Affect the Search for Online Health Information” available
at http://www.kff.org/entmedia/20021210a-index.cfm). It is also worth noting that most server-based filters allow different computer stations to be configured differently,
so that terminals in the Youth or Young Adult area of the library can be configured to block more categories than the computers
in the main section of the library, if desired.
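As a rough illustration of per-station configuration, the sketch below maps station groups to the categories they block. The group names, the category assignments, and the `is_blocked` helper are all hypothetical; they do not reflect any particular product's configuration format, which is normally managed through the vendor's own administration tools.

```python
# Hypothetical per-station configuration: each station group lists the
# categories it blocks. Names are illustrative, not from any real product.
BLOCKED_CATEGORIES = {
    "main": {"pornography"},
    "youth": {"pornography", "hate", "gambling", "internet chat"},
}


def is_blocked(station_group: str, site_categories: set) -> bool:
    """Return True if any of the site's categories is blocked for this group."""
    blocked = BLOCKED_CATEGORIES[station_group]
    return bool(blocked & site_categories)


# A gambling site is reachable from the main floor but not the youth area.
gambling_site = {"gambling"}
main_result = is_blocked("main", gambling_site)
youth_result = is_blocked("youth", gambling_site)
print(f"main: blocked={main_result}, youth: blocked={youth_result}")
```

In this sketch a site is blocked as soon as one of its categories is on the group's list, which is also why blocking a broad category tends to sweep in more content than intended.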
Another important way in which filters are flexible is their ability to deal with errors. Many products allow custom messages
that appear when an attempt is made to access a blocked site. These messages can include text or links that prompt the user
about what steps to take if they believe the site should not be blocked. Many products currently include a link which, if
selected, will send a message to the filtering company prompting them to re-evaluate the site. However, this process takes
days at best and could take months, which is hardly beneficial in the short term. In addition, some products can be set up
so that a library patron can submit a site (perhaps even anonymously) to a librarian, who can review the site and add it
to the “allowed” list (if appropriate) without much difficulty. While this is a bit more labor-intensive from the library's
standpoint, it considerably reduces the negative effects of overblocking.

No Filter is Free from Errors…But Some are More Error-Free than Others
Because of the enormous amount of uncensored, constantly changing information on the Internet, no filter will ever be free
from over- or underblocking errors. Overblocking refers to situations where an “appropriate” site is erroneously blocked by the filter. Underblocking refers to situations where an “inappropriate” site is not blocked. The rates (i.e., percentages) of over- and underblocking
are important measures of the effectiveness of a filter. There are a few things to keep in mind when looking at over- and
underblocking rates.
There is often a tradeoff between over- and underblocking. Similar to the interplay between recall and precision, it is often
the case that as one measure improves the other worsens. The moral of the story: look at both the over- and underblocking rates when determining the effectiveness of a filter.
Over- and underblocking rates depend upon a variety of factors, including the filtering product, the product's configuration
(as described previously), and the set of URLs being tested. In fact, in a recent study performed for the Kaiser Family Foundation,
we found that a product's configuration and the topic of the URL list made a larger difference in the amount of overblocking
than did differences between products (see http://www.kff.org/content/2002/20021210a/ for details). In summary, comparisons of filtering products must use as similar a configuration as possible and be based upon the same set of URLs to
be meaningful.
Unfortunately, there are two completely different (and independent) statistics used to measure the percentage of overblocking
(and underblocking, although there is generally more confusion and disagreement related to overblocking), both of which are
commonly referred to as the “overblocking rate.” One overblocking rate is calculated as a percentage of all “appropriate”
sites in the test set, while the other overblocking rate is taken as a percentage of all blocked sites. Filtering critics
like to look at the fraction of all blocked items that are in error, since this number is generally larger; however, even if
that percentage is large, patrons may rarely experience a block while looking at innocuous information. (See the postscript
below for a more thorough description of these error rates.) Always understand which error rate is being presented and how it should be interpreted.
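To make the difference between the two "overblocking rates" concrete, here is a minimal sketch that computes both from the same test results. The counts are invented for illustration and do not come from any study, and the method names (including the label "blocked-sites overblock rate") are this sketch's own.

```python
from dataclasses import dataclass


@dataclass
class FilterTest:
    """Counts from testing one filter configuration against one URL set."""
    ok_blocked: int    # "appropriate" sites that were blocked (overblocks)
    ok_allowed: int    # "appropriate" sites that were allowed
    bad_blocked: int   # "inappropriate" sites that were blocked
    bad_allowed: int   # "inappropriate" sites that were allowed (underblocks)

    def ok_sites_overblock_rate(self) -> float:
        """Overblocks as a fraction of all "appropriate" sites tested."""
        return self.ok_blocked / (self.ok_blocked + self.ok_allowed)

    def blocked_sites_overblock_rate(self) -> float:
        """Overblocks as a fraction of all sites the filter blocked."""
        return self.ok_blocked / (self.ok_blocked + self.bad_blocked)


# Hypothetical test set: 1,000 appropriate sites and 20 inappropriate ones.
example = FilterTest(ok_blocked=15, ok_allowed=985, bad_blocked=18, bad_allowed=2)
print(f"OK-sites overblock rate:      {example.ok_sites_overblock_rate():.1%}")
print(f"Blocked-sites overblock rate: {example.blocked_sites_overblock_rate():.1%}")
```

With these made-up counts, the same 15 overblocks come out as 1.5% of appropriate sites but about 45% of blocked sites, because the second figure also depends on how many inappropriate sites happen to be in the test set.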
There have historically been many methodologically weak studies that have calculated and publicized misleading over- and underblocking
rates. The primary problems with these studies relate to:
the selection of URLs to be tested (e.g., no objective and repeatable process of selecting particular URLs; too small a sample of URLs)
testing of the filtering software itself (e.g., unclear as to which configuration is used)
classifying of sites (e.g., researchers do not follow or document a consistent procedure when classifying sites)
error rate reporting (e.g., researchers don't present all of the over- and underblocking error rates and/or misinterpret their meaning - see the postscript below)
Lesson: don't believe everything you read.
With these comments in mind, let's look at a few actual numbers (in Table 1) from two of the better designed recent studies.
These two studies are especially pertinent because they review some of the most popular products in use in libraries and schools
and they use definitions similar to those outlined in CIPA to classify sites themselves before comparing them against the
filters to see if they will be blocked. However, these studies are certainly not the last word, since the Department of Justice
study (available at http://www.etestinglabs.com/clients/reports/usdoj/usdoj.pdf) includes a rather small sample of websites and the Kaiser study published in JAMA (available at http://jama.ama-assn.org/) focuses on health information.

Table 1

| | Smartfilter | 8e6 | Websense | CyberPatrol | Symantec | N2H2 |
|---|---|---|---|---|---|---|
| % of Non-Pornography URLs blocked (i.e., OK-sites overblock rate) | | | | | | |
| DOJ | 7.1 | N/A | 0.0* | 6.1 | N/A | 1.0 |
| Kaiser** Least | 2.3 | 1.1 | 0.6 | 1.6 | 1.9 | 0.8 |
| Kaiser** Moderate | 5.8 | 4.5 | 3.8 | 2.8 | 7.6 | 6.5 |
| Kaiser** Most | 18.2 | 15.1 | 35.4 | 22.4 | 33.5 | 19.5 |
| % of Pornography URLs blocked (e.g., 1 - Bad-sites underblock rate) | | | | | | |
| DOJ | 94.4 | N/A | 92.4 | 82.7 | N/A | 98.0 |
| Kaiser** Least | 87.2 | 89.1 | 83.9 | 85.7 | 87.8 | 89.5 |
| Kaiser** Moderate | 88.7 | 90.9 | 91.3 | 85.7 | 89.3 | 92.8 |
| Kaiser** Most | 89.0 | 92.1 | 93.8 | 87.2 | 90.5 | 94.0 |
| Categories blocked under “Least” & presumably sufficient to comply with CIPA | | | | | | |
| Blocked | Sex and Extreme | Pornography | Sex (in Adult Material) | Adult/Sexually Explicit | Sex/acts | Pornography |

Allowed: All Exceptions; All Exceptions
*Numbers with the lowest error rates are italicized
**Three different configurations were used in the Kaiser study including: Least restrictive (designed to only block out content forbidden by CIPA - this configuration exactly matches the DOJ configuration), Moderate restrictive (modeled after a large school system), and Most restrictive (which blocks all categories except educationally related ones)

So what do these numbers tell us? They tell us that there is a significant but small difference between products, and a
large difference in the amount of overblocking when the configurations differ. When configured at the least restrictive
setting (the one designed to block only what CIPA forbids), only a very small portion of the “appropriate” sites encountered
by a library patron would be erroneously blocked (roughly 0.6% to 2.3% in the Kaiser study). Even on the most restrictive
setting, nearly 1 in 10 “inappropriate” sites is not blocked by filters, implying that librarians may need to rely upon other
methods such as education and monitoring to completely eliminate pornography from the library.
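The "nearly 1 in 10" figure can be checked against Table 1 with a little arithmetic. The sketch below takes the Kaiser "Most" restrictive row for pornography URLs blocked and converts it to underblock rates; the unweighted average across products is an assumption of this sketch, not a statistic reported by the study.

```python
# Percent of pornography URLs blocked on the "Most" restrictive setting,
# copied from the Kaiser rows of Table 1.
MOST_BLOCKED_PCT = {
    "Smartfilter": 89.0,
    "8e6": 92.1,
    "Websense": 93.8,
    "CyberPatrol": 87.2,
    "Symantec": 90.5,
    "N2H2": 94.0,
}

# Underblock rate = share of pornography URLs that were NOT blocked.
underblock_pct = {name: 100.0 - pct for name, pct in MOST_BLOCKED_PCT.items()}

# Simple unweighted mean across the six products (an illustrative choice).
mean_underblock = sum(underblock_pct.values()) / len(underblock_pct)

for name, pct in underblock_pct.items():
    print(f"{name}: {pct:.1f}% of pornography URLs not blocked")
print(f"Unweighted mean: {mean_underblock:.1f}%")
```

The per-product underblock rates range from 6.0% to 12.8%, and the unweighted mean works out to about 8.9%, consistent with the rough one-in-ten description.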
Finally, one piece that is not obvious from these numbers, but which comes out more in the detailed Kaiser report (found at
http://www.kff.org/content/2002/20021210a/) is the impact of different topics on overblocking rates in particular. For example, even on the Least restrictive setting,
around 9% of websites that came up when searching for “safe sex” or “condom” were erroneously blocked. This percentage rises
to over 50% on the Most restrictive setting. These numbers emphasize that overblocking can become a problem when searching
for certain topics, even if the overblocking rates for most topics are low. When overblocking becomes a problem, it is helpful
to have a workaround as described earlier in this paper.

Conclusions
Filters are a bit like children. They come in all shapes and sizes. They don't always do what they are told, although they
generally get it right. They are at their best when they are taught to use all of their capabilities. And at times they require
some discipline. In short, they'll never be perfect, but they can be influenced to reach their potential.
CIPA: Which Filtering Software to Use?
Derek Hansen of the University of Michigan's School of Information summarizes the key findings of a substantial filtering software research project--required reading if you are evaluating filtering solutions.