Thursday, February 28, 2008

Website: Odeo

Millions of MP3s and 1000's of audio channels—podcasts, music, and more. Listen, download, subscribe... FREE

Hi, this is Odeo.

We have 3,848,993 mp3s from all over the web, which have been played from Odeo 29,820,226 times.

You can download or play them straight from here for free. (You can also put them on your web site.)

And like 378,823 other people, you can create an account, so you can subscribe to things and save the stuff you like.

Tuesday, February 26, 2008

What is: Personal Learning Network (PLN)

Personal Learning Networks (personal and community-focused)


Following up on the distinction between groups and networks, Downes says, “A group, in other words, is a school (of thought, of fish…) or a class of some sort. Or: classes and schools are just groups. They are defined as groups. Can we even think of schools - and of learning - without thinking at the same time of the attributes of groups? A group is elemental, defined by mass and sameness - like an ingot of metal. A group is a collection of entities or members according to their nature; what defines a group is the quality [of sameness] members possess and [their] number.”

One can certainly argue with Downes, but isn’t the slide interesting when thinking about schools as groups and online personal learning spaces as networks?

SOME CONTEXT
I’ve been thinking and studying a lot about personal learning networks for quite some time, on both a personal, individual level and also a shared, community level. The buzz words that resonate the most for me around these topics right now seem to be personal learning networks, personal learning environments, professional learning communities, communities of practice, and virtual learning communities. In this consideration, I don’t mean to leave out face-to-face communication, but am particularly interested in how web 2.0 technologies support and enhance personal and community-focused learning, and can even occur with great impact without any face-to-face contact at all.

Individualized Networked Learning


The problem of the edublogosphere (and actually the whole blogosphere) in the context of learning is that people in the sphere do not - at least not often - form any groups (an entity of individuals with an objective).

As I'm trying to think more and more deeply about what networked learning really means in the context of how I might want my own children to apply it in their own lives, I think this quote struck me because it made me consider how little I've actually engaged in group learning around a particular objective within the network. It is, as Teemu says, something that doesn't really appear very often. This has become, for me at least, a very individualized experience. I've referred to it in the past as "nomadic learning" because it happens in a very non-linear way, without concrete objectives. (Technically, I think most are attaching the word nomadic to it because of the mobility of the technology to learn, not the randomness of it.) My learning has a general focus and direction, to be sure, but its trajectory is determined by whatever is in my aggregator or on my screen at the moment. There are no written-down goals or outcomes that I am attempting to achieve, which is one of the reasons this is so different from classroom learning.

What does learner-centric look like?

There is a paradigm shift from an instructor-centric to a learner-centric model of learning. (Refer to diagram.)

The priority at upper management level may well be principally around the cost-cutting benefits of online delivery, but it is a different story for those who have operational responsibility for the outcomes achieved by e-learning and blended learning programmes.

There is already an 'old days' of organisational learning, looked back on by few with nostalgia, when it was all about the 'country house' model; very 'event-driven', very 'top-down' in character. Times have definitely changed.

The most obvious difference, with the advent of the internet, is one of location. Learning can now take place in a variety of environments, including the workplace, a learning centre attached to the workplace, or even in the home.

This change is not only being driven by what is happening within the training departments, however. The way we work has been altered radically, in many cases, by the advent of email, web access and intranets. These provide tools for just-in-time learning, reference and knowledge management, much of which falls outside the remit of training per se, but which is nevertheless altering attitudes to what constitutes a learning experience.

As a result, a new generation of learners is coming up that expects to access learning in different ways.

So what is it that these new learners want, exactly, from a learning experience?

Moving from 'training push' to 'learning pull'

When the mode of the music changes, wrote Plato, the walls of the city shake. Since the advent of e-learning, and particularly blended learning, there has been a definite change of tune from the training and development function (now rebranded 'the learning community'). A chorus of voices - heard in conference and exhibition halls across the land - urges us to 'put the learner at the centre of the experience'. We need to move, it is trumpeted, to learner-centric learning.

And indeed, this particular change of mode poses a palpable threat to certain key bits of masonry within the ambit of organisational learning. We're not talking solely metaphorically here. More than one global concern in recent times has closed down its bricks-and-mortar training centre in favour of an online equivalent.

Are we experiencing a paradigm shift? Or is this no more than mood music designed to cover the sound of axes being swung? Is the real driver behind e-learning adoption a cost-cutting agenda - which seeks merely to pare away expensive face-to-face interventions, while leaving existing organisational structures untouched?

What do learners want?

Clearly, not all learners want the same thing.

What they want might vary widely depending on the type of company they are in, as we have seen. Likewise, different groups of learners within an organisation will have different needs and priorities.

Segmentation is the art (or science) of dividing up an audience into appropriate target groups for marketing purposes. It allows these sub-groups to be marketed to according to needs and motivators that they share in common. It also allows an appropriate allocation of resources among the various groups according to the value that can be expected to be derived as a result of marketing activity.

The learner-centric organisation will need to take this logic on board when marketing its learning provision internally. The first step in doing so is to establish what the various needs and motivators are.

Useful information in this line can often be derived from looking at take-up of existing learning programmes. For instance…

Marketing water to horses

Early experiments in creating learning-centred environments, where generic e-learning was offered without any marketing or line manager support, and with no blending, had disappointing results. The moral of this story was that you can lead a horse to water, but you can't make it drink.

So let's take a marketing approach to this problem.

The horse needs to be thirsty (incite desire). It might need some reassurance about the quality of the water (create a trusted brand). If it is an extremely sceptical horse it might need to be convinced that drinking water here will deliver health improvements (sell the benefits), and told a little about the deleterious effects on health of not drinking water in the long term (okay, a bit of stick). Lastly, and most importantly, it needs to be told about these things in a language that it can understand - i.e. horse language...


Website: FotoFlexer

The world's most advanced online image editor. It performs advanced effects previously only available to professionals using expensive software. FotoFlexer was founded by Arbor Labs, a team of graduate students and alumni from the Center for Entrepreneurship and Technology, University of California at Berkeley (and just one MIT/Stanford alum too).

Tuesday, February 12, 2008

How To: Repair WMI

Symptoms of the problem

When you try to access a Windows XP-based network computer, you receive an error message that resembles the following:
xxxxxxx is not accessible. You might not have permission to use this network resource. Access is denied.
Note: In this error message, xxxxxxx is the IP address or the computer name of the Windows XP-based computer.

You may experience this issue when you use the IP address or the computer name to access a shared folder that is stored on the Windows XP-based computer. You may also experience this issue when you use My Network Places to access a shared folder in this situation.

Cause:

WMI (Windows Management Instrumentation) is corrupted.

Solution:

Re-install WMI:

This is the one that worked in my case:

Comprehensive rebuild method

Click Start, Run and type the following command, and press ENTER:

rundll32.exe setupapi,InstallHinfSection WBEM 132 %windir%\inf\wbemoc.inf

Insert your Windows XP CD into the drive when prompted. The repair process should take a few minutes to complete. Then restart Windows for the changes to take effect.

Other Solutions:

Microsoft's WMI diagnosis utility

Microsoft has released a new diagnostic tool that helps system administrators diagnose and repair problems with WMI. It is advisable to go through the WMIDiag_ReadMe.doc file to understand how the utility works and the supported command-line arguments. Here is the download link: The WMI Diagnosis Utility

You may use the utility (WMIDiag.vbs) to find out whether a repository rebuild is necessary. The utility runs a comprehensive test of WMI and reports the results to a log file, which is placed in the user's Temp folder (%Temp%) by default.
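
The utility is a VBScript, so it would typically be run with the CScript host from a command prompt. A minimal sketch, assuming WMIDiag.vbs has been extracted to the current folder:

cscript //nologo WMIDiag.vbs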

Mr. Alain Lissoir [MS] was kind enough to share the following info with me:

If, among all the problems detected by WMIDiag, there is a need to rebuild the repository after attempting all the fixes suggested by WMIDiag, then WMIDiag will list the rebuild of the repository as the last of the actions that can be taken before actually rebuilding the repository.

Here are some of the useful links that I came across from Alain's homepage:

Rebuilding the WMI Repository

If you experience problems when using WMI, such as application errors or scripts that used to work no longer working, you may have a corrupted WMI repository. To fix a corrupted WMI repository, you have to reinstall WMI. Follow these steps:

For Windows XP Service Pack 2

Click Start, Run and type the following command:

rundll32 wbemupgd, UpgradeRepository

This command is used to detect and repair a corrupted WMI Repository. The results are stored in the setup.log (%windir%\system32\wbem\logs\setup.log) file.

For Windows Server 2003

Use the following command to detect and repair a corrupted WMI Repository:

rundll32 wbemupgd, RepairWMISetup

On other versions of Windows you can rebuild the Repository by doing the following:

  • Click Start, Run and type CMD.EXE
  • Type this command and press Enter:

net stop winmgmt

  • Using Windows Explorer, rename the folder %windir%\System32\Wbem\Repository (for example, to %windir%\System32\Wbem\Repository_bad). %windir% represents the path to the Windows directory, which is typically C:\Windows.
  • Switch to the Command Prompt window, then type the following and press ENTER after each line:

net start winmgmt

EXIT
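
If you prefer to script it, a minimal batch sketch of those same steps might look like this (Repository_bad is just the example name from the step above):

@echo off
rem Stop the WMI service (/y auto-confirms stopping any dependent services)
net stop winmgmt /y
rem Set the existing repository aside; WMI recreates it on the next start
ren %windir%\System32\Wbem\Repository Repository_bad
rem Restart the WMI service
net start winmgmt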

Courtesy: The above is excerpted from Microsoft Technet article WMI Isn't Working!
© 2007 Microsoft Corporation. All rights reserved.

Re-registering the WMI components (Ref WMI FAQ)

The .DLL and .EXE files used by WMI are located in %windir%\system32\wbem. You might need to re-register all the .DLL and .EXE files in this directory. If you are running a 64-bit system you might also need to check for .DLLs and .EXE files in %windir%\sysWOW64\wbem.

To re-register the WMI components, run the following commands at the command prompt:

  • cd /d %windir%\system32\wbem
  • for %i in (*.dll) do RegSvr32 -s %i
  • for %i in (*.exe) do %i /RegServer
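
Note that the loop syntax above is for typing directly at a command prompt; inside a .bat/.cmd file the loop variable has to be doubled. A minimal batch sketch of the same re-registration steps:

@echo off
cd /d %windir%\system32\wbem
rem Silently re-register every WMI DLL in this folder
for %%i in (*.dll) do RegSvr32 -s %%i
rem Re-register every WMI executable
for %%i in (*.exe) do %%i /RegServer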

Note that neither of the above two methods restores missing files related to Windows Management Instrumentation (WMI). Below is a comprehensive repair procedure that restores all the missing WMI modules; if WMI modules are missing, use the following method.

Comprehensive rebuild method

Important note: If you've installed a Service Pack, you need to insert a Windows XP CD with the Service Pack integrated (called a slipstreamed Windows XP CD). If you don't have one, you may point to the %Windir%\ServicePackFiles\i386 folder for recent versions of the system files required during the WMI repair, or you may create a slipstreamed Windows XP CD and insert it when prompted.

Click Start, Run and type the following command, and press ENTER:

rundll32.exe setupapi,InstallHinfSection WBEM 132 %windir%\inf\wbemoc.inf

Insert your Windows XP CD into the drive when prompted. The repair process should take a few minutes to complete. Then restart Windows for the changes to take effect.

http://windowsxp.mvps.org/repairwmi.htm


Sunday, February 10, 2008

How To: Install Squid Cache for Windows

Complete Guide on Installing and Configuring Squid Proxy Server for Windows

Here's another guide by me on installing and configuring Squid Cache Proxy as an anonymous proxy server that also filters some ads/banners (on Windows, not Linux, since most Linux users already know about this) :)

  • First download Squid 2.6 Stable 1 for Windows from Acme-Consulting and don't forget to extract it into a directory of your choice, and download JAP from here. As a note, I wrote this tutorial using Squid 2.6 Stable 1, so if you're using the Squid 2.5 series there are some parameters you'll need to change first in order for it to work.

Note: in this tutorial I'll use G:\Squid as the Squid directory

  • After extracting it into G:\Squid, go into the etc directory "G:\Squid\etc" and rename all the .default files by dropping the .default extension. For example, squid.conf.default -> squid.conf, mime.conf.default -> mime.conf, etc.
  • Open the squid.conf file using your favorite text editor such as Notepad, UltraEdit, etc. and configure it like this (you can change it later) :) I'm not going into more detail here since most of the Squid configuration is self-explanatory and also depends on your hardware (for example the memory size, cache size, etc.). You can use this squid.conf directly in your Squid configuration without changing any of its parameters as long as you extracted Squid into the G:\Squid directory; otherwise you'll need to change every parameter that includes G:\squid to your own Squid path.

# HTTP Port (in this tutorial squid will run on localhost at port 3128)
http_port 127.0.0.1:3128

# ICP Port and HTCP Port (we’ll disable this since we are not going to use it)
icp_port 0
htcp_port 0

# Cache Peer (we’ll forward all request into parent proxy)
cache_peer 127.0.0.1 parent 4001 7 no-query

# Cache directory (in this example I was using 3000 MB / 3 GB of space to store the squid cache)
cache_dir awin32 g:/squid/var/cache 3000 16 256

# access_log
access_log g:/squid/var/logs/access.log squid

# cache_log
cache_log g:/squid/var/logs/cache.log

# cache_store_log
cache_store_log none

# mime_table
mime_table g:/squid/etc/mime.conf

# pid_filename
pid_filename g:/squid/var/logs/squid.pid

# unlinkd_program
unlinkd_program g:/squid/libexec/unlinkd.exe

# refresh_pattern (you can configure this as you like it, to get more hits from a website)
# note: if you change this parameter “refresh_pattern . 1 100% 20160 reload-into-ims ignore-reload” into something else for
# example like “refresh_pattern . 10 100% 20160 reload-into-ims ignore-reload”
# there’ll be some error on some page (Gamefaqs.com for an example) because the page didnt reload correctly after login into Gamefaqs
refresh_pattern ^http://.*\.gif$ 1440 50% 20160 reload-into-ims
refresh_pattern ^http://.*\.asis$ 1440 50% 20160
refresh_pattern -i \.png$ 10080 150% 40320 reload-into-ims
refresh_pattern -i \.jpg$ 10080 150% 40320 reload-into-ims
refresh_pattern -i \.bmp$ 10080 150% 40320 reload-into-ims
refresh_pattern -i \.gif$ 10080 300% 40320 reload-into-ims
refresh_pattern -i \.ico$ 10080 300% 40320 reload-into-ims
refresh_pattern -i \.swf$ 10080 300% 40320 reload-into-ims
refresh_pattern -i \.flv$ 10080 300% 40320 reload-into-ims
refresh_pattern -i \.rar$ 10080 150% 40320
refresh_pattern -i \.ram$ 10080 150% 40320
refresh_pattern -i \.txt$ 1440 100% 20160 reload-into-ims override-lastmod
refresh_pattern -i \.css$ 1440 60% 20160
refresh_pattern ^http:// 1 100% 20160 reload-into-ims ignore-reload
refresh_pattern ^ftp:// 240 50% 20160
refresh_pattern ^gopher:// 240 40% 20160
refresh_pattern /cgi-bin/ 0 0% 30
refresh_pattern . 0 100% 20160 reload-into-ims

# Deny requests to unknown ports
http_access deny !Safe_ports

# Deny CONNECT to other than SSL ports
http_access deny CONNECT !SSL_ports

# Block access to Malware & ads farm site
# Insert your own rule here by using
# acl blablabla url_regex -i “path to file”
# or
# acl blablabla url_regex “path to file”

http_access allow localhost
http_access deny all
cache_mgr Reaper-X
httpd_suppress_version_string on
visible_hostname Reaper
via off
forwarded_for off
log_icp_queries off
client_db off
never_direct allow all

#Some anonymizing
header_access From deny all
#there’s some website which use referer check
#so its better to disable this
#header_access Referer deny all
header_access WWW-Authenticate deny all
header_access Link deny all
header_access Warning deny all
header_access Via deny all
header_access User-Agent deny all
header_access Proxy-Connection deny all
header_access X-Forwarded-For deny all

Now the next steps are running JAP, configuring your browser to use the Squid proxy, and starting Squid (see the sketch below), and you're finished ... ;)
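
For completeness, the cache directories have to be created and the Squid service installed before Squid will start. The same commands appear in the guides that follow; here is a sketch assuming the G:\Squid path used in this tutorial (the service name can differ between Squid versions):

rem Create the cache directory structure defined by cache_dir
G:\Squid\sbin\squid.exe -z
rem Install Squid as a Windows service, then start it
G:\Squid\sbin\squid.exe -i
net start squid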

http://www.reaper-x.com/2006/07/18/complete-guide-on-installing-and-configuring-squid-proxy-server-for-windows/

Installing Squid Cache for Windows

Most Linux users already know the Squid proxy server as the best and most widely used proxy server. As in my previous post "Bandwidth Shaping Using Squid Cache and WIPFW", I needed a free proxy server for my Windows server. I found SquidNT, which is ported from the Linux version by Guido Serassio.

You can download SquidNT from Acme Consulting's website (click here). If you want to do bandwidth shaping then you must download the Delay Pools version of SquidNT. In this installation guide, I use the Delay Pools version as I want to do bandwidth shaping.

Step 1: download SquidNT Delay Pool version here: http://squid.acmeconsulting.it/download/squid-2.6.STABLE12-bin-DELAYP.zip

Step 2: extract the zip file and put it on C: drive

Step 3: configure the squid.conf file in the /etc folder. There is a squid.conf.default file; rename it to squid.conf and edit it.

Step 4: configure the DNS nameserver. On squid.conf find:

# TAG: dns_nameservers
# Use this if you want to specify
# a list of DNS name servers (IP addresses)
# to use instead of those given in your
# /etc/resolv.conf file.
#
# Example: dns_nameservers 10.0.0.1 192.172.0.4
#
#Default:
# none
dns_nameservers 192.168.0.1

To find out what your nameserver is, type ipconfig at a command prompt and find the IP address in the Default Gateway field. Copy it into your squid.conf file as shown above.

Step 5: setup ACL

# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS

# Example rule allowing access from your local networks.
# Adapt to list your (internal) IP networks from
# where browsing should be allowed
#acl our_networks src 192.168.1.0/24 192.168.2.0/24
#http_access allow our_networks
acl our_networks src 192.168.0.0/16
http_access allow our_networks

Here you can set up which networks are allowed to use your proxy server. From the ipconfig command you can find out your IP address; it usually has the 192.168.0.x format, so you can apply the configuration above.

Step 6: Setup the hostname

# TAG: visible_hostname
# If you want to present a special hostname …
# then define this. Otherwise, the return …
# will be used. If you have multiple caches …
# get errors about IP-forwarding you must …
# names with this setting.
#
#Default:
# none
visible_hostname localhost

Here you can define your hostname; for example, you can use "localhost" or "server.yourdomain.com".

Step 7: Setup cache directory

Run this command from a command prompt: c:\squid\sbin\squid -D -z

Step 8: On Windows XP/2000/2003 you can setup SquidNT as a service

Run this command from a command prompt: c:\squid\sbin\squid -i

You can start/stop/restart the service called Squid from: Control Panel > Administrative Tools > Services

Step 9: Setup your browser to use proxy server

For Internet Explorer users, go to: Tools > Internet Options. Select Connection tab and click on LAN Settings

On the pop up window you’ll find proxy box, give a check on “Use a proxy server for your LAN…” and fill your server’s IP (where you install SquidNT) on the address field and fill “3128” on port field. 3128 is the default port for SquidNT.

Click Ok to save the configuration. Now try to open a web page and see if you can open it. If you can then the configuration is set correctly.

Step 10: Setup the Delay Pool.

As I want to do bandwidth shaping, I need to set up the Delay Pool. Here is the configuration:

#
#Default:
# delay_pools 0
delay_pools 1
delay_class 1 1

Then create delay_access:

# delay_access 2 allow lotsa_little_clients
# delay_access 2 deny all
#
#Default:
# none
delay_access 1 allow our_networks
delay_access 1 deny all

Now we set up how much bandwidth we want to allocate. For example, you have a 384 Kbps ADSL connection, which means you can download at around 40 KB/s. Now you want to cap the maximum download rate at around 30 KB/s; here is the configuration:

#
#delay_parameters 2 32000/32000 8000/8000 600/8000
#
# There must be one delay_parameters line for each delay pool.
#
#Default:
# none
delay_parameters 1 30000/30000
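
For reference, a commented sketch of how that line reads, based on the standard class 1 delay pool format (the values are bytes and bytes per second; these are the same numbers as above, not new settings):

# delay_parameters <pool> <restore-rate>/<max-bucket-size>
# 30000/30000 = the single shared bucket refills at 30000 bytes (~30 KB) per second
# and holds at most 30000 bytes, so sustained downloads are held to roughly 30 KB/s
delay_parameters 1 30000/30000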

Step 11: Restart the Squid service from: Control Panel > Administrative Tools > Services

Done! Now you have 30 KB/s for browsing and another 10 KB/s reserved for other internet connections like chatting or streaming radio :)

Markus

http://markus.revti.com/index.php/2007/06/10/installing-squid-cache-for-windows/

Squid-setup for Windows NT/2K/XP

What is Squid?

· This is an easy setup for the Squid proxy server compiled for Windows by Guido Serassio.

· full-featured Web proxy cache

· designed to run on Unix systems; this version runs on the Windows NT core

· free, open-source software

Do I need Squid?

· No! Nobody needs Squid! You can get happy without it. But if you need a free web proxy that runs on Windows, this program is a little step toward getting happy in your life ;-).

· With this version of Squid you can limit access to the web by proxy authentication

· Squid saves bandwidth by caching often-used web sites.

· More secure: you can limit access to the proxy by IP ranges etc., and only one computer needs direct access to the internet.

· You can filter web sites like xxx.com

Features of this setup

· Runs out of the box

· You can choose between external (dialup) and internal (LAN) DNS handling, good for dialup and LAN connections.

· You don't need to install it manually.

· The setup creates a cache with a maximum of 100 MB of space; if you need more space you have to change this in squid.conf and build a new cache.

· I have added some little cmd scripts for easy control of the Squid service, and you can install some at-jobs for automatic maintenance of Squid. Rotating log files before they get too big is very important for Squid to work 24/7.

Download

Paths with spaces (like C:\Program Files\Squid) are NOT supported by Squid!!!

HOWTO upgrade from older Squid 2.5 versions to STABLE 5

Howto make an NT-User auth

--!!! New Squid 2.5 Stable 3 Setup English beta !!!--

--!!! New Squid 2.5 Stable 3 Setup German beta !!!--

Squid 2.5 Stable 2 Setup English beta

Squid 2.5 Stable 2 Setup German beta

Squid 2.5 Stable 1 Setup English

Squid 2.5 Stable 1 Setup German

Squid 2.3 Stable 5 Setup English

Squid 2.3 Stable 5 Setup German

· All setups are made with Inno Setup Compiler, Scriptmaker and ISTool

Inno Setupscripts for all Squid versions

http://www.bofi.camelot.de/squid.htm

Installing and configuring SquidNT

Introduction

PaperCut Internet Charging and Quotas requires a proxy server to manage Internet connectivity and log internet usage by your users. Squid is one of the best known proxy servers, and typically is run on a Linux/Unix machine, however in some environments a proxy needs to be run on a Windows machine. Fortunately Squid is available for Windows, and is available for download as the SquidNT package.

(To set up Squid on a Linux/Unix machine and configure it to authenticate with Windows, see our article Configuring Squid on Linux to authenticate with Active Directory.)

Installing SquidNT

Download the latest version of SquidNT from here.

This guide has been written for 2.5.STABLE14-NT (download). The guide has also been updated to work with Squid 2.6.

Unzip the Squid zip file (e.g. squid-2.5.STABLE14-NT-bin.zip) to a temporary directory. This will create a folder called squid-2.5.STABLE14-NT-bin\squid. Move the squid subdirectory to the location where you want Squid to be installed, e.g. c:\squid. (NOTE: You cannot install Squid in a directory containing spaces, like C:\Program Files.)

Open a command line window (cmd.exe), and change to the directory you installed Squid to. E.g. cd \squid

Install the Squid service by running the following:

    C:\squid>sbin\squid.exe -i

Set up the default config files by copying the template configuration files shipped with Squid. Copy the following three files to C:\squid\etc:

    squid.conf.default      to C:\squid\etc\squid.conf
    mime.conf.default       to C:\squid\etc\mime.conf
    cachemgr.conf.default   to C:\squid\etc\cachemgr.conf
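
A minimal cmd sketch of those copy steps (assuming the .default templates sit in C:\squid\etc, as in the stock SquidNT zip):

    C:\squid>copy etc\squid.conf.default etc\squid.conf
    C:\squid>copy etc\mime.conf.default etc\mime.conf
    C:\squid>copy etc\cachemgr.conf.default etc\cachemgr.conf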

Then create the Squid cache directories by running the following:

    C:\squid>sbin\squid -z

Squid is now ready to start. Start the Squid NT service from the Services Control Panel applet. (Control Panel->Administrative Tools->Services). If Squid starts correctly you will not receive an error, and the cache log file will not contain any errors (C:\squid\var\logs\cache.log).

Configuring user authentication

In its default configuration Squid is locked down to not allow any access, so the config file needs to be modified to allow connections from users on the network. Shut down the Squid service.

Open the Squid config file (C:\squid\etc\squid.conf).

To enable authentication against your Windows domain or Active Directory, add the following to your config file around line 1290. This tells Squid to use NTLM authentication (i.e. automatically log in users without prompting for a password).

For Squid 2.5:

    auth_param ntlm program c:/squid/libexec/win32_ntlm_auth.exe
    auth_param ntlm children 5
    auth_param ntlm max_challenge_reuses 0
    auth_param ntlm max_challenge_lifetime 2 minutes
    auth_param ntlm use_ntlm_negotiate on

For Squid 2.6:

    auth_param ntlm program c:/squid/libexec/mswin_ntlm_auth.exe
    auth_param ntlm children 5

Then define an ACL (access control list) entry that allows users on your network to use the proxy if authenticated. Go to approximately line 1830 of the file, and add the lines:

    acl localnet proxy_auth REQUIRED src 192.168.1.0/24
    http_access allow localnet

(But change the IP address mask as appropriate for your network. You can specify multiple network masks by separating them with spaces).

Now restart Squid and ensure that it starts correctly. Configure a browser to use the Squid proxy (port 3128 by default), and try to access an external web site. You should be able to visit the site successfully. To check that the authentication is working correctly open the C:\squid\var\logs\access.log file, and you should see log entries for the web site you visited, and importantly your username in the log file. Below are sample logs from visiting google.com. Note the username vm-domain\administrator, where vm-domain is the name of the domain, and administrator is the name of the user.

    1118015367.061    703 127.0.0.1 TCP_MISS/302 405 GET http://google.com/ vm-domain\administrator DIRECT/216.239.57.99 text/html
    1118015367.749    688 127.0.0.1 TCP_MISS/302 411 GET http://www.google.com/ vm-domain\administrator DIRECT/66.102.7.104 text/html

Allowing access only to members of a Windows Group

The next step is to only allow users access if they belong to a Windows security group. This can be used to enforce Internet access policy on your domain, and allow PaperCut to restrict access to users who have used their entire available quota. First we need to add the external ACL types to check for Windows group membership. Go to about line 1396 and add the following:

For Squid 2.5:

    external_acl_type win_domain_group ttl=120 %LOGIN c:/squid/libexec/win32_check_group.exe -G
    external_acl_type win_local_group ttl=120 %LOGIN c:/squid/libexec/win32_check_group.exe

For Squid 2.6:

    external_acl_type win_domain_group ttl=120 %LOGIN c:/squid/libexec/mswin_check_lm_group.exe -G
    external_acl_type win_local_group ttl=120 %LOGIN c:/squid/libexec/mswin_check_lm_group.exe

(The first entry is used to check domain group membership, the second is for local groups. You only have to add the lines you are going to use. Users of PaperCut typically use domain groups, so only the first line would be necessary).

Now we need to define the ACL to only allow access to members of a particular group (e.g. a domain group called InternetUsers). Go to the place in the config file where the acl localnet entry was defined (approx. line 1850), and replace the previous ACL definitions with:

    acl localnet proxy_auth REQUIRED src 192.168.1.0/24
    acl InetAllow external win_domain_group InternetUsers
    http_access allow InetAllow

Ensure that you use the IP mask appropriate for your network. In the above example InternetUsers is a domain group; change the group name as appropriate for your network. If your group is a local group, then use the win_local_group external ACL type instead.

(Make sure you remove the http_access allow localnet line that was defined earlier, otherwise all users on the network will have access, even if they do not belong to the group.)

Restart Squid, and now only members of the InternetUsers group will have access to the Internet via the proxy.

NOTE: If you need to deny Internet access to members of another Windows security group, you can set up an InternetDenyGroup in the same way as above and then define an InetDeny ACL (see the sketch below). You can then specify an http_access deny rule as follows:

    http_access deny InetDeny
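
Following the same pattern as the InetAllow ACL above, a sketch of the full deny setup might look like the following (InternetDenyGroup is just the example group name from the note; the deny rule must appear before the allow rule, since http_access rules are evaluated in order and the first match wins):

    acl InetDeny external win_domain_group InternetDenyGroup
    http_access deny InetDeny
    http_access allow InetAllow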

Verifying the configuration

Restart Squid (if you have not done so already).

As a user that belongs to the InternetUsers group:

  • Make sure the browser is set up to use the proxy (port 3128 by default)
  • Browse the Internet for a few minutes (you should be allowed to view all pages).
  • Check the C:\squid\var\logs\access.log, and make sure your username is being logged.

As a user that does not belong to the InternetUsers group:

  • Make sure the browser is set up to use the proxy (port 3128 by default)
  • Try to browse the Internet (you should be denied access by the Squid proxy).
  • Check the C:\squid\var\logs\access.log, and make sure you see TCP_DENIED entries that contain the correct username.

If this all works, then you're ready to use PaperCut with SquidNT...

  • In Options->Net Charging Options, point PaperCut to the C:\squid\var\logs log directory.
  • Set the log file mask to access.log
  • And then press the "Test and Apply Settings" button. You should see some summarized net access usage.

Log Rotation

Squid NT does not rotate its log files, so on large sites these files will grow very large. We recommend implementing a simple rotation policy which improves the performance of your system and allows easy archiving of old logs.

We have written a simple batch file that performs a log rotation by stopping squid, renaming access.log to access-YYYY-MM-DD.log, and then restarting Squid. Use the Windows Task Scheduler to schedule the following batch file to be run regularly (e.g. daily or weekly). NOTE: Make sure you setup the scheduled task to run as a user with permissions to stop/start the Squid service.

Squid 2.6 changed the name of the Squid service, so make sure you download the correct version of the script.

Download squid-2.5-log-rotate.bat

Download squid-2.6-log-rotate.bat
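
If you would rather write your own, a minimal sketch of such a rotation script might look like this (assuming an install under C:\squid and the Squid 2.6 service name squid; adjust the paths, service name, and archive file name to suit your setup):

    @echo off
    rem Stop the Squid service so the log file is released
    net stop squid
    rem Move the current access log into an archive folder
    rem (substitute a date-stamped name, e.g. built from %date%, which is locale-dependent)
    if not exist C:\squid\var\logs\archive mkdir C:\squid\var\logs\archive
    move /y C:\squid\var\logs\access.log C:\squid\var\logs\archive\access-old.log
    rem Restart Squid
    net start squid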

Troubleshooting

If Squid fails to start then it is best to check the following two log files. They will usually give you a hint about the cause of the problem (e.g. a syntax problem in the squid.conf file).

    C:\squid\sbin\squid.exe.log
    C:\squid\var\logs\cache.log

If you're looking for information about our print management application, please go here.

http://www.papercut.com/kb/Main/InstallingAndConfiguringSquidNTProxy

Friday, February 08, 2008

What Is: Web Caching

Web Caching Overview

Web caching is the temporary storage of web objects (such as HTML documents) for later retrieval. There are three significant advantages to web caching: reduced bandwidth consumption (fewer requests and responses that need to go over the network), reduced server load (fewer requests for a server to handle), and reduced latency (since responses for cached requests are available immediately, and closer to the client being served). Together, they make the web less expensive and better performing.

Caching can be performed by the client application, and is built in to most web browsers. There are a number of products that extend or replace the built-in caches with systems that contain larger storage, more features, or better performance. In any case, these systems cache net objects from many servers but all for a single user.

Caching can also be utilized in the middle, between the client and the server as part of a proxy. Proxy caches are often located near network gateways to reduce the bandwidth required over expensive dedicated internet connections. These systems serve many users (clients) with cached objects from many servers. In fact, much of the usefulness (reportedly up to 80% for some installations) is in caching objects requested by one client for later retrieval by another client. For even greater performance, many proxy caches are part of cache hierarchies, in which a cache can inquire of neighboring caches for a requested document to reduce the need to fetch the object directly.

Finally, caches can be placed directly in front of a particular server, to reduce the number of requests that the server must handle. Most proxy caches can be used in this fashion, but this form has a different name (reverse cache, inverse cache, or sometimes httpd accelerator) to reflect the fact that it caches objects for many clients but from (usually) only one server.

http://www.web-caching.com/welcome.html

Improving Browser Caches with Extensions and Personal Proxies

A number of companies have offered products in this area. Many of these were ill-designed, and so only a few remain.

  • Web 3000's NetSonic Internet accelerator replaces your browser's cache with its own larger one, plus it notifies you when the content has changed (when you are looking at an old version of the page). For Windows.
  • MicroSurfer organizes links, provides offline viewing, and background page loading. For Windows.
  • Imsisoft's Net Accelerator prefetches available links and graphics, and keeps your favorite links up-to-date. Uses browser cache.
  • eAcceleration offers Webcellerator, a web accelerator that provides caching improvements, and is free when their portal is used as your home page.

http://www.web-caching.com/personal-caches.html

Proxy Caches

Here are all the available proxy cache systems and services known to me. Note that features and performance vary, such as which net protocols (such as HTTP, FTP, GOPHER, etc.) are supported for caching, and which inter-cache communication methods are used (ICP, CARP, WCCP).

Thanks to Duane Wessels' Information Resource Caching FAQ for many of the entries on this list. There is also a table comparing some of the commercial systems available.


Products listed: AllegroSurf, Apache, Appcelera, Avantis ContentCache, Avirt, Blue Coat, CachePortal, CacheXpress, CacheRaq, engageIP, CERN/W3C, Certeon (InfoLibria), Cisco Cache Engine, DeleGate, Fireclick Blueflame, Harvest, IBM Websphere Edge Server, iMimic DataReactor, Jigsaw, Lucent's IPWorX, Microsoft Internet Security and Acceleration Server, MOWS, NetCache, NetFilter, NetHawk, Netplicator, Netscape Proxy Server, Rebel.com NetWinder, Oops, Oracle Application Server Web Cache, Polipo, Pushcache Cuttlefish, RabbIT, Roxen, SoftRouter Plus, Spaghetti, Squid, Stratacache, Sun Java System Web Proxy, 3Com SuperStack 3 Webcache, Swell Tsunami, SwiftSurf, TeraNode, Viking, Wcol, WebCleaner, WinGate, WinProxy, WinRoute, WWWOFFLE, and XCache.


Rhinosoft's AllegroSurf provides proxies for various protocols, including HTTP, and provides prefetching of page links.

Anonimizer provides anonymous web services including browsing, email, and publishing.

The Apache HTTP server has a caching module.

Kintronics offers the plug and play Avantis ContentCache proxy appliance, supporting remote administration of a network of proxies.

Intelogis offers the Avirt line of internet sharing and caching products for the home as well as small and medium-sized business use under Windows. [Product used to be called Spaghetti.]

Blue Coat's proxy appliances provide visibility and control of Web communications to protect against risks from spyware, Web viruses, inappropriate Web surfing, instant messaging, video streaming and peer-to-peer file sharing - while improving Web performance through caching.

CacheXpress is a high performance proxy server for Windows environments supporting transparent and authenticated modes and WCCP 1 & 2.

NEC Solutions America released CachePortal, which accelerates corporate application performance.

The CERN/W3C HTTPd was the original proxy cache (actually a caching httpd server), developed initially at CERN and later maintained at the W3C.

The Cisco Cache Engine sits next to (mostly) Cisco routers and receives transparently redirected HTTP requests.

Certeon now sells and supports InfoLibria's product line.

DeleGate is a free multi-purpose proxy server which runs on multiple platforms (Unix, Windows and OS/2, source available).

Blueflame, by Fireclick, is server-based software to accelerate content delivery by predicting requests and pushing those objects to the client.

IBM Websphere Edge Server supports forward/reverse caching, server load balancing, and content filtering and runs on AIX, Linux, Windows NT/2000, and Solaris.

iMimic offers the DataReactor series of caching products with high price-performance.

Lucent offers the IPWorX product line, which includes web caching, redirection, as well as content management and reporting.

Jigsaw is W3C's reference implementation for HTTP/1.1 and has a caching module.

LogiSense offers caching solutions called engageIP cache server in an appliance form and for Linux.

Microsoft offers the Internet Security and Acceleration (ISA) server which incorporates caching and replaces the Microsoft Proxy Server.

MOWS is a modular, distributed web and cache server written in Java.

NetCache was originally based on the Harvest research code. In March of 1997, Network Appliance acquired the NetCache product.

NetFilter is a proxy cache with a specialized filtering capability and service, ported to many UNIX platforms.

NetHawk, a high-performance software HTTP proxy cache, was originally developed as Peregrine by Pei Cao and her students at the Univ. of Wisconsin-Madison, and was available from Tasmania Network Systems. In October 1999, Cisco purchased Tasmania Network Systems.

Unitech Networks offers Netplicator, a caching appliance that can function either as a proxy cache or a reverse proxy (server accelerator), along with IntelliDNS, a replacement for bind that can distribute requests to local mirrors instead of distant servers (as well as perform load-balancing and ensure high target availability).

The lead architect of the Netscape Proxy Server was also the primary developer of the CERN proxy.

Rebel.com offers a series of Linux-based NetWinder products and appliances that allow a single connection to be shared and use caching to speed access.

The Novell BorderManager FastCache is a proxy cache running under Novell and NT-based networks.

The Oops proxy server is a thread-based proxy cache with objects stored in a few large files.

Oracle offers a reverse cache solution called the Oracle Application Server Web Cache. It uses commodity hardware to cache and compress both static and dynamic Web content.

Polipo is a (mostly) compliant HTTP/1.1 web proxy supporting IPv6, pipelining, and SOCKS.

Pushcache.com offers Cuttlefish, an open-source implementation of their pushcache services based on Squid.

RabbIT is a web proxy that speeds up web surfing over slow links through compression, image re-rendering, ad-blocking, etc.

The free Roxen Challenger httpd server includes a proxy caching service. Available for UNIX and NT.

Squid is a freely available caching package, derived from the Harvest research software. Squid runs on many Unix platforms and caches HTTP, FTP, and Gopher requests.

Stratacache offers a line of proxy cache appliances.

Sun offers its Java System Web Proxy Server, supporting forward and reverse configurations, filtering, and authentication.

3Com sells the line of SuperStack 3 Webcaches as appliances based on Inktomi's caching software.

Swell Technology has a range of linux-based proxy caches in their Tsunami CPX series.

The SwiftSurf web proxy supports page filtering and authentication.

Entera offered a multi-protocol caching system called TeraNode, but has since been purchased by CacheFlow.

Viking is a Windows proxy with support for many protocols (HTTP, POP3, SMTP, FTP, ICP) and works as an origin server. Also performs prefetching.

Vicomsoft offers a number of gateway and connection sharing products, including RapidCache, a Windows-based proxy cache, and their InterGate product which includes a caching component. Available for Windows and Macintosh.

Wcol uses prefetching to reduce latency at the expense of increased bandwidth. Also has a catalyst mode (non-caching) that just provides hints to another proxy cache.

WebCleaner is a filtering web proxy, supporting Linux and Windows.

Deerfield.com offers WinGate, a proxy server for Windows 95/NT that also allows a local network to share a single Internet connection.

WinProxy, from Ositis Software, is a Windows-based proxy providing caching, filtering, and combined access for multiple machines via a single net connection.

Tiny Software offers a series of WinRoute Windows-based products to connect your workgroup or larger to the net using NAT, and includes a proxy cache component.

After its 2000 acquisition of Workfire, Packeteer released Appcelera, which uses lossless compression, page size reduction, client bandwidth detection, and intelligent page rendering in its role as an httpd accelerator.

The World Wide Web OFFLine Explorer (WWWOFFLE) is a free, simple proxy server for use with dialup net links that caches contents, allows offline browsing, optional blocking, and more.

XCache Technologies (previously known as Post Point Software) offers a server-side software-based accelerator that caches and compresses Active Server Pages.

http://www.web-caching.com/proxy-caches.html

What's a Web Cache? Why do people use them?

A Web cache sits between Web servers (or origin servers) and a client or many clients, and watches requests for HTML pages, images and files (collectively known as objects) come by, saving a copy for itself. Then, if there is another request for the same object, it will use the copy that it has, instead of asking the origin server for it again.

There are two main reasons that Web caches are used:

  • To reduce latency - Because the request is satisfied from the cache (which is closer to the client) instead of the origin server, it takes less time for the client to get the object and display it. This makes Web sites seem more responsive.
  • To reduce traffic - Because each object is only gotten from the server once, it reduces the amount of bandwidth used by a client. This saves money if the client is paying by traffic, and keeps their bandwidth requirements lower and more manageable.

Kinds of Web Caches

Browser Caches

If you examine the preferences dialog of any modern browser (like Internet Explorer or Netscape), you'll probably notice a 'cache' setting. This lets you set aside a section of your computer's hard disk to store objects that you've seen, just for you. The browser cache works according to fairly simple rules. It will check to make sure that the objects are fresh, usually once a session (that is, once in the current invocation of the browser).

This cache is useful when a client hits the 'back' button to go to a page they've already seen. Also, if you use the same navigation images throughout your site, they'll be served from the browser cache almost instantaneously.

Proxy Caches

Web proxy caches work on the same principle, but on a much larger scale. Proxies serve hundreds or thousands of users in the same way; large corporations and ISPs often set them up on their firewalls.

Because proxy caches usually have a large number of users behind them, they are very good at reducing latency and traffic. That's because popular objects are requested only once, and served to a large number of clients.

Most proxy caches are deployed by large companies or ISPs that want to reduce the amount of Internet bandwidth that they use. Because the cache is shared by a large number of users, there are a large number of shared hits (objects that are requested by a number of clients). Hit rates of 50% efficiency or greater are not uncommon. Proxy caches are a type of shared cache.

Aren't Web Caches bad for me? Why should I help them?

Web caching is one of the most misunderstood technologies on the Internet. Webmasters in particular fear losing control of their site, because a cache can 'hide' their users from them, making it difficult to see who's using the site.

Unfortunately for them, even if no Web caches were used, there are too many variables on the Internet to assure that they'll be able to get an accurate picture of how users see their site. If this is a big concern for you, this document will teach you how to get the statistics you need without making your site cache-unfriendly.

Another concern is that caches can serve content that is out of date, or stale. However, this document can show you how to configure your server to control this, while making it more cacheable.

On the other hand, if you plan your site well, caches can help your Web site load faster, and save load on your server and Internet link. The difference can be dramatic; a site that is difficult to cache may take several seconds to load, while one that takes advantage of caching can seem instantaneous in comparison. Users will appreciate a fast-loading site, and will visit more often.

Think of it this way; many large Internet companies are spending millions of dollars setting up farms of servers around the world to replicate their content, in order to make it as fast to access as possible for their users. Caches do the same for you, and they're even closer to the end user. Best of all, you don't have to pay for them.

The fact is that caches will be used whether you like it or not. If you don't configure your site to be cached correctly, it will be cached using whatever defaults the cache's administrator decides upon.

How Web Caches Work

All caches have a set of rules that they use to determine when to serve an object from the cache, if it's available. Some of these rules are set in the protocols (HTTP 1.0 and 1.1), and some are set by the administrator of the cache (either the user of the browser cache, or the proxy administrator).

Generally speaking, these are the most common rules that are followed for a particular request (don't worry if you don't understand the details, it will be explained below):

  1. If the object's headers tell the cache not to keep the object, it won't. Also, if no validator is present, most caches will mark the object as uncacheable.
  2. If the object is authenticated or secure, it won't be cached.
  3. A cached object is considered fresh (that is, able to be sent to a client without checking with the origin server) if:
    • It has an expiry time or other age-controlling directive set, and is still within the fresh period.
    • If a browser cache has already seen the object, and has been set to check once a session.
    • If a proxy cache has seen the object recently, and it was modified relatively long ago.

Fresh documents are served directly from the cache, without checking with the origin server.

  4. If an object is stale, the origin server will be asked to validate the object, or tell the cache whether the copy that it has is still good.

Together, freshness and validation are the most important ways that a cache works with content. A fresh object will be available instantly from the cache, while a validated object will avoid sending the entire object over again if it hasn't changed.

How (and how not) to Control Caches

There are several tools that Web designers and Webmasters can use to fine-tune how caches will treat their sites. It may require getting your hands a little dirty with the server configuration, but the results are worth it. For details on how to use these tools with your server, see the Implementation sections below.

HTML Meta Tags vs. HTTP Headers

HTML authors can put tags in a document's HEAD section that describe its attributes. These Meta tags are often used in the belief that they can mark a document as uncacheable, or expire it at a certain time.

Meta tags are easy to use, but aren't very effective. That's because they're usually only honored by browser caches (which actually read the HTML), not proxy caches (which almost never read the HTML in the document). While it may be tempting to slap a Pragma: no-cache meta tag on a home page, it won't necessarily cause it to be kept fresh, if it goes through a shared cache.

On the other hand, true HTTP headers give you a lot of control over how both browser caches and proxies handle your objects. They can't be seen in the HTML, and are usually automatically generated by the Web server. However, you can control them to some degree, depending on the server you use. In the following sections, you'll see what HTTP headers are interesting, and how to apply them to your site.

  • If your site is hosted at an ISP or hosting farm and they don't give you the ability to set arbitrary HTTP headers (like Expires and Cache-Control), complain loudly; these are tools necessary for doing your job.

HTTP headers are sent by the server before the HTML, and only seen by the browser and any intermediate caches. Typical HTTP 1.1 response headers might look like this:

HTTP/1.1 200 OK

Date: Fri, 30 Oct 1998 13:19:41 GMT

Server: Apache/1.3.3 (Unix)

Cache-Control: max-age=3600, must-revalidate

Expires: Fri, 30 Oct 1998 14:19:41 GMT

Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT

ETag: "3e86-410-3596fbbc"

Content-Length: 1040

Content-Type: text/html

The HTML document would follow these headers, separated by a blank line.

Pragma HTTP Headers (and why they don't work)

Many people believe that assigning a Pragma: no-cache HTTP header to an object will make it uncacheable. This is not necessarily true; the HTTP specification does not set any guidelines for Pragma response headers; instead, Pragma request headers (the headers that a browser sends to a server) are discussed. Although a few caches may honor this header, the majority won't, and it won't have any effect. Use the headers below instead.

Controlling Freshness with the Expires HTTP Header

The Expires HTTP header is the basic means of controlling caches; it tells all caches how long the object is fresh for; after that time, caches will always check back with the origin server to see if a document is changed. Expires headers are supported by practically every client.

Most Web servers allow you to set Expires response headers in a number of ways. Commonly, they will allow setting an absolute time to expire, a time based on the last time that the client saw the object (last access time), or a time based on the last time the document changed on your server (last modification time).

Expires headers are especially good for making static images (like navigation bars and buttons) cacheable. Because they don't change much, you can set an extremely long expiry time on them, making your site appear much more responsive to your users. They're also useful for controlling caching of a page that is regularly changed. For instance, if you update a news page once a day at 6am, you can set the object to expire at that time, so caches will know when to get a fresh copy, without users having to hit 'reload'.

The only value valid in an Expires header is an HTTP date; anything else will most likely be interpreted as 'in the past', so that the object is uncacheable. Also, remember that the time in an HTTP date is Greenwich Mean Time (GMT), not local time.

For example:

Expires: Fri, 30 Oct 1998 14:19:41 GMT

Cache-Control HTTP Headers

Although the Expires header is useful, it is still somewhat limited; there are many situations where content is cacheable, but the HTTP 1.0 protocol lacks methods of telling caches what it is, or how to work with it.

HTTP 1.1 introduces a new class of headers, the Cache-Control response headers, which allow Web publishers to define how pages should be handled by caches. They include directives to declare what should be cacheable, what may be stored by caches, modifications of the expiration mechanism, and revalidation and reload controls.

Interesting Cache-Control response headers include:

  • max-age=[seconds] - specifies the maximum amount of time that an object will be considered fresh. Similar to Expires, this directive allows more flexibility. [seconds] is the number of seconds from the time of the request you wish the object to be fresh for.
  • s-maxage=[seconds] - similar to max-age, except that it only applies to proxy (shared) caches.
  • public - marks the response as cacheable, even if it would normally be uncacheable. For instance, if your pages are authenticated, the public directive makes them cacheable.
  • no-cache - forces caches (both proxy and browser) to submit the request to the origin server for validation before releasing a cached copy, every time. This is useful to assure that authentication is respected (in combination with public), or to maintain rigid object freshness, without sacrificing all of the benefits of caching.
  • must-revalidate - tells caches that they must obey any freshness information you give them about an object. HTTP allows caches to take liberties with the freshness of objects; by specifying this header, you're telling the cache that you want it to strictly follow your rules.
  • proxy-revalidate - similar to must-revalidate, except that it only applies to proxy caches.

For example:

Cache-Control: max-age=3600, must-revalidate

If you plan to use the Cache-Control headers, you should have a look at the excellent documentation in the HTTP 1.1 draft; see References and Further Information.

Validators and Validation

In How Web Caches Work, we said that validation is used by servers and caches to communicate when an object has changed. By using it, caches avoid having to download the entire object when they already have a copy locally, but they're not sure if it's still fresh.

Validators are very important; if one isn't present, and there isn't any freshness information (Expires or Cache-Control) available, most caches will not store an object at all.

The most common validator is the time that the document last changed, the Last-Modified time. When a cache has an object stored that includes a Last-Modified header, it can use it to ask the server if the object has changed since the last time it was seen, with an If-Modified-Since request.

HTTP 1.1 introduced a new kind of validator called the ETag. ETags are unique identifiers that are generated by the server and changed every time the object does. Because the server controls how the ETag is generated, caches can be surer that if the ETag matches when they make an If-None-Match request, the object really is the same.
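
For example, a validation exchange might look like this (the ETag value shown is made up):

GET /foo.html HTTP/1.1
Host: www.myhost.com
If-None-Match: "686897696a7c876b7e"

HTTP/1.1 304 Not Modified
ETag: "686897696a7c876b7e"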

Almost all caches use Last-Modified times in determining if an object is fresh; as more HTTP/1.1 caches come online, ETag headers will also be used.

Most modern Web servers will generate both ETag and Last-Modified validators for static content automatically; you won't have to do anything. However, they don't know enough about dynamic content (like CGI, ASP or database sites) to generate them; see Writing Cache-Aware Scripts.

Tips for Building a Cache-Aware Site

Besides using freshness information and validation, there are a number of other things you can do to make your site more cache-friendly.

  • Refer to objects consistently - this is the golden rule of caching. If you serve the same content on different pages, to different users, or from different sites, it should use the same URL. This is the easiest and most effective way to make your site cache-friendly. For example, if you use /index.html in your HTML as a reference once, always use it that way.
  • Use a common library of images and other elements and refer back to them from different places.
  • Make caches store images and pages that don't change often by specifying a far-away Expires header.
  • Make caches recognize regularly updated pages by specifying an appropriate expiration time.
  • If a resource (especially a downloadable file) changes, change its name. That way, you can make it expire far in the future, and still guarantee that the correct version is served; the page that links to it is the only one that will need a short expiry time (see the example after this list).
  • Don't change files unnecessarily. If you do, everything will have a falsely young Last-Modified date. For instance, when updating your site, don't copy over the entire site; just move the files that you've changed.
  • Use cookies only where necessary - cookies are difficult to cache, and aren't needed in most situations. If you must use a cookie, limit its use to dynamic pages.
  • Minimize use of SSL - because encrypted pages are not stored by shared caches, use them only when you have to, and use images on SSL pages sparingly.
  • Use the Cacheability Engine - it can help you apply many of the concepts in this tutorial.
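
For example, if a navigation image is renamed from nav.gif to nav2.gif whenever it changes (the names and lifetime here are only illustrative), the image itself can safely be served with a month-long freshness lifetime:

Cache-Control: max-age=2592000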

Writing Cache-Aware Scripts

By default, most scripts won't return a validator (e.g., a Last-Modified or ETag HTTP header) or freshness information (Expires or Cache-Control). While some scripts really are dynamic (meaning that they return a different response for every request), many (like search engines and database-driven sites) can benefit from being cache-friendly.

Generally speaking, if a script produces output that is reproducible with the same request at a later time (whether it be minutes or days later), it should be cacheable. If the content of the script changes only depending on what's in the URL, it is cacheable; if the output depends on a cookie, authentication information or other external criteria, it probably isn't.

  • The best way to make a script cache-friendly (as well as perform better) is to dump its content to a plain file whenever it changes. The Web server can then treat it like any other Web page, generating and using validators, which makes your life easier. Remember to only write files that have changed, so the Last-Modified times are preserved.
  • Another way to make a script cacheable in a limited fashion is to set an age-related header for as far in the future as practical. Although this can be done with Expires, it's probably easiest to do so with Cache-Control: max-age, which will keep the response fresh for that amount of time after the request.
  • If you can't do that, you'll need to make the script generate a validator, and then respond to If-Modified-Since and/or If-None-Match requests. This can be done by parsing the HTTP headers, and then responding with 304 Not Modified when appropriate. Unfortunately, this is not a trivial task; a minimal sketch follows this list.
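
Here is a minimal sketch of this approach in Perl, assuming the script's output is derived from a single file (data.txt is a stand-in name) whose modification time can serve as the Last-Modified value:

#!/usr/bin/perl
# A sketch only: emit a Last-Modified validator, and answer a matching
# If-Modified-Since request with 304 Not Modified instead of regenerating.
use POSIX qw(strftime);

my $mtime = (stat("data.txt"))[9];                    # data.txt is a stand-in
my $last_mod = strftime("%a, %d %b %Y %H:%M:%S GMT", gmtime($mtime));

if (defined $ENV{HTTP_IF_MODIFIED_SINCE}
    && $ENV{HTTP_IF_MODIFIED_SINCE} eq $last_mod) {
    print "Status: 304 Not Modified\n\n";
    exit;
}

print "Content-Type: text/html\n";
print "Last-Modified: $last_mod\n";
print "\n";
### generate and print the body as usual...

Note that comparing the date strings for simple equality is a simplification; a more careful script would parse the dates before comparing them.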

Some other tips:

  • If you have to use scripting, don't POST unless it's appropriate. The POST method is (practically) impossible to cache; if you send information in the path or query (via GET), caches can store that information for the future. POST, on the other hand, is good for sending large amounts of information to the server (which is why it won't be cached; it's very unlikely that the same exact POST will be made twice).
  • Don't embed user-specific information in the URL unless the content generated is completely unique to that user.
  • Don't count on all requests from a user coming from the same host, because caches often work together.
  • Generate Content-Length response headers. It's easy to do, and it will allow the response of your script to be used in a persistent connection. This allows a client (whether a proxy or a browser) to request multiple objects on one TCP/IP connection, instead of setting up a connection for every request. It makes your site seem much faster (see the sketch after this list).
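
In Perl, for instance, one simple way to do this is to build the whole body in a string first, so its size is known before any output is sent (generate_page() is a hypothetical stand-in for whatever builds your content):

my $body = generate_page();                # hypothetical: returns the complete HTML
print "Content-Type: text/html\n";
print "Content-Length: ", length($body), "\n";
print "\n";
print $body;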

See the Implementation Notes for more specific information.

Frequently Asked Questions

What are the most important things to make cacheable?

A good strategy is to identify the most popular, largest objects (especially images) and work with them first.

How can I make my pages as fast as possible with caches?

The most cacheable object is one with a long freshness time set. Validation does help reduce the time that it takes to see an object, but the cache still has to contact the origin server to see if it's fresh. If the cache already knows the object is fresh, it can serve it directly.

I understand that caching is good, but I need to keep statistics on how many people visit my page!

If you must know every time a page is accessed, select ONE small object on a page (or the page itself), and make it uncacheable by giving it suitable headers. For example, you could refer to a 1x1 transparent uncacheable image from each page. The Referer header will contain information about what page called it.
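
For example, headers along these lines will force every request for that object back to the origin server (the Expires date just needs to be in the past):

Cache-Control: no-cache, must-revalidate
Expires: Thu, 01 Jan 1970 00:00:00 GMT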

Be aware that even this will not give truly accurate statistics about your users, and is unfriendly to the Internet and your users; it generates unnecessary traffic, and forces people to wait for that uncached item to be downloaded. For more information about this, see On Interpreting Access Statistics in the references.

I've got a page that is updated often. How do I keep caches from giving my users a stale copy?

The Expires header is the best way to do this. By setting the server to expire the document based on its modification time, you can automatically have caches mark it as stale a set amount of time after it is changed.

For example, if your site's home page changes every day at 8am, set the Expires header for 23 hours after the last modification time. This way, your users will always get a fresh copy of the page.
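
If you're running Apache with mod_expires (see the Implementation Notes), directives along these lines should do it:

ExpiresActive On
ExpiresDefault "modification plus 23 hours"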

See also the Cache-Control: max-age header.

How can I see which HTTP headers are set for an object?

To see what the Expires and Last-Modified headers are, open the page with Netscape and select 'page info' from the View menu. This will give you a menu of the page and any objects (like images) associated with it, along with their details.

To see the full headers of an object, you'll need to manually connect to the Web server using a Telnet client. Depending on what program you use, you may need to type the port into a separate field, or you may need to connect to www.myhost.com:80 or www.myhost.com 80 (note the space). Consult your Telnet client's documentation.

Once you've opened a connection to the site, type a request for the object. For instance, if you want to see the headers for http://www.myhost.com/foo.html, connect to www.myhost.com, port 80, and type:

GET /foo.html HTTP/1.1 [return]

Host: www.myhost.com [return][return]

Press the Return key every time you see [return]; make sure to press it twice at the end. This will print the headers, and then the full object. To see the headers only, substitute HEAD for GET.
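
The start of the reply might look something like this (all of the values shown are only illustrative):

HTTP/1.1 200 OK
Date: Fri, 30 Oct 1998 13:19:41 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
Content-Length: 1040
Content-Type: text/html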

My pages are password-protected; how do proxy caches deal with them?

By default, pages protected with HTTP authentication are marked private; they will not be cached by shared caches. However, you can mark authenticated pages public with a Cache-Control header; HTTP 1.1-compliant caches will then allow them to be cached.

If you'd like the pages to be cacheable, but still authenticated for every user, combine the Cache-Control: public and no-cache headers. This tells the cache that it must submit the new client's authentication information to the origin server before releasing the object from the cache.
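
For example:

Cache-Control: public, no-cache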

Whether or not this is done, it's best to minimize use of authentication; for instance, if your images are not sensitive, put them in a separate directory and configure your server not to force authentication for it. That way, those images will be naturally cacheable.

Should I worry about security if my users access my site through a cache?

SSL pages are not cached (or decrypted) by proxy caches, so you don't have to worry about that. However, because caches store non-SSL requests and URLs fetched through them, you should be conscious of security on unsecured sites; an unscrupulous administrator could conceivably gather information about their users.

In fact, any administrator on the network between your server and your clients could gather this type of information. One particular problem is when CGI scripts put usernames and passwords in the URL itself; this makes it trivial for others to find and use their logins.

If you're aware of the issues surrounding Web security in general, you shouldn't have any surprises from proxy caches.

I'm looking for an integrated Web publishing solution. Which ones are cache-aware?

It varies. Generally speaking, the more complex a solution is, the more difficult it is to cache. The worst are ones which dynamically generate all content and don't provide validators; they may not be cacheable at all. Speak with your vendor's technical staff for more information, and see the Implementation notes below.

My images expire a month from now, but I need to change them in the caches now!

The Expires header can't be circumvented; unless the cache (either browser or proxy) runs out of room and has to delete the objects, the cached copy will be used until it expires.

The most effective solution is to rename the files; that way, they will be completely new objects, and loaded fresh from the origin server. Remember that the page that refers to an object will be cached as well. Because of this, it's best to make static images and similar objects very cacheable, while keeping the HTML pages that refer to them on a tight leash.

If you want to reload an object from a specific cache, you can either force a reload while using that cache (in Netscape, holding down shift while pressing 'reload' will do this by issuing a Pragma: no-cache request header), or you can have the cache administrator delete the object through their interface.

I run a Web Hosting service. How can I let my users publish cache-friendly pages?

If you're using Apache, consider allowing them to use .htaccess files, and provide appropriate documentation.

Otherwise, you can establish predetermined areas for various caching attributes in each virtual server. For instance, you could specify a directory /cache-1m that will be cached for one month after access, and a /no-cache area that will be served with headers instructing caches not to store objects from it.
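
With Apache, for instance, the two areas might be set up along these lines (a sketch only; the paths are hypothetical, and the mod_expires and mod_headers modules must be enabled):

<Directory /www/htdocs/cache-1m>
ExpiresActive On
ExpiresDefault "access plus 1 month"
</Directory>

<Directory /www/htdocs/no-cache>
Header set Cache-Control "no-cache, must-revalidate"
</Directory>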

Whatever you are able to do, it is best to work with your largest customers first on caching. Most of the savings (in bandwidth and in load on your servers) will be realized from high-volume sites.

A Note About the HTTP

HTTP 1.1 compliance is mentioned several times in this document. As of the time it was written, the protocol is a work in progress. Because of this, it is virtually impossible for an application (whether a server, proxy or client) to be truly compliant. However, the protocol has been openly discussed for some time, and feature-frozen for enough time to allow developers to use the ideas contained in it, like Cache-Control and ETags. When HTTP 1.1 is final, expect more vendors to openly state that their applications are compliant.

Implementation Notes - Web Servers

Generally speaking, it's best to use the latest version of whatever Web server you've chosen to deploy. Not only will they likely contain more cache-friendly features, new versions also usually have important security and performance improvements.

Apache 1.3

Apache (http://www.apache.org/) uses optional modules to include headers, including both Expires and Cache-Control. Both modules are available in the 1.2 or greater distribution.

The modules need to be built into Apache; although they are included in the distribution, they are not turned on by default. To find out if the modules are enabled in your server, find the httpd binary and run httpd -l; this should print a list of the available modules. The modules we're looking for are mod_expires and mod_headers.

  • If they aren't available, and you have administrative access, you can recompile Apache to include them. This can be done either by uncommenting the appropriate lines in the Configuration file, or using the --enable-module=expires and --enable-module=headers arguments to configure (1.3 or greater). Consult the INSTALL file found with the Apache distribution.

Once you have an Apache with the appropriate modules, you can use mod_expires to specify when objects should expire, either in .htaccess files or in the server's access.conf file. You can specify expiry from either access or modification time, and apply it to a file type or as a default. See http://www.apache.org/docs/mod/mod_expires.html for more information, and speak with your local Apache guru if you have trouble.

To apply Cache-Control headers, you'll need to use the mod_headers module, which allows you to specify arbitrary HTTP headers for a resource. See http://www.apache.org/docs/mod/mod_headers.html

Here's an example .htaccess file that demonstrates the use of some headers.

  • .htaccess files allow web publishers to use commands normally only found in configuration files. They affect the content of the directory they're in and their subdirectories. Talk to your server administrator to find out if they're enabled.
### activate mod_expires

ExpiresActive On

### Expire .gif's 1 month from when they're accessed

ExpiresByType image/gif A2592000

### Expire everything else 1 day from when it's last modified

### (this uses the Alternative syntax)

ExpiresDefault "modification plus 1 day"

### Apply a Cache-Control header to index.html

<Files index.html>

Header append Cache-Control "public, must-revalidate"

</Files>

  • Note that mod_expires automatically calculates and inserts a Cache-Control: max-age header as appropriate.

Netscape Enterprise 3.6

Netscape Enterprise Server (http://www.netscape.com/) does not provide any obvious way to set Expires headers. However, it has supported HTTP 1.1 features since version 3.0. This means that HTTP 1.1 caches (proxy and browser) will be able to take advantage of Cache-Control settings you make.

To use Cache-Control headers, choose Content Management | Cache Control Directives in the administration server. Then, using the Resource Picker, choose the directory where you want to set the headers. After setting the headers, click 'OK'. For more information, see http://developer.netscape.com/docs/manuals/enterprise/admnunix/content.htm#1006282

MS IIS 4.0

Microsoft's Internet Information Server (http://www.microsoft.com/) makes it very easy to set headers in a somewhat flexible way. Note that this is only possible in version 4 of the server, which will run only on NT Server.

To specify headers for an area of a site, select it in the Administration Tools interface, and bring up its properties. After selecting the HTTP Headers tab, you should see two interesting areas: Enable Content Expiration and Custom HTTP headers. The first should be self-explanatory, and the second can be used to apply Cache-Control headers.

See the ASP section below for information about setting headers in Active Server Pages. It is also possible to set headers from ISAPI modules; refer to MSDN for details.

Lotus Domino R5

Lotus' (http://www.lotus.com/) servers are notoriously difficult to cache; they don't provide any validators, so both browser and proxy caches can only use default mechanisms (i.e., once per session, and a few minutes of 'fresh' time, usually) to cache any content from them.

Even if this limitation is overcome, Notes' habit of referring to the same object by different URLs (depending on a variety of factors) bars any measurable gains. There is also no documented way to set an Expires, Cache-Control or other arbitrary HTTP header.

Implementation Notes - Server-Side Scripting

Because the emphasis in server-side scripting is on dynamic content, it doesn't make for very cacheable pages, even when the content could be cached. If your content changes often, but not on every page hit, consider setting an Expires header, even if just for a few hours. Most users access pages again in a relatively short period of time. For instance, when users hit the 'back' button, if there isn't any validator or freshness information available, they'll have to wait until the page is re-downloaded from the server to see it.

  • One thing to keep in mind is that it may be easier to set HTTP headers with your Web server rather than in the scripting language. Try both.

CGI

CGI scripts are one of the most popular ways to generate content. You can easily append HTTP response headers by adding them before you send the body; most CGI implementations already require you to do this for the Content-Type header. For instance, in Perl:

#!/usr/bin/perl

print "Content-type: text/html\n";

print "Expires: Thu, 29 Oct 1998 17:04:19 GMT\n";

print "\n";

### the content body follows...

Since it's all text, you can easily generate Expires and other date-related headers with built-in functions. It's even easier if you use Cache-Control: max-age, as below:

print "Cache-Control: max-age=600\n";

This will make the script cacheable for 10 minutes after the request, so that if the user hits the 'back' button, they won't be resubmitting the request.

The CGI specification also makes request headers that the client sends available in the environment of the script; each header has 'HTTP_' prepended to its name. So, if a client makes an If-Modified-Since request, it may show up like this:

HTTP_IF_MODIFIED_SINCE = Fri, 30 Oct 1998 14:19:41 GMT

Server Side Includes

SSI (often used with the extension .shtml) is one of the first ways that Web publishers were able to get dynamic content into pages. By using special tags in the pages, a limited form of in-HTML scripting was available.

Most implementations of SSI do not set validators, and as such are not cacheable. However, Apache's implementation does allow users to specify which SSI files can be cached, by setting the group execute permissions on the appropriate files, combined with the XBitHack full directive. For more information, see http://www.apache.org/docs/mod/mod_include.html

PHP 3

PHP (http://www.php.net/) is a server-side scripting language that, when built into the server, can be used to embed scripts inside a page's HTML, much like SSI, but with a far larger number of options. PHP can be used as a CGI script on any Web server (Unix or Windows), or as an Apache module.

By default, objects processed by PHP are not assigned validators, and are therefore uncacheable. However, developers can set HTTP headers by using the Header() function.

For example, code along these lines will create a Cache-Control header, as well as an Expires header three days in the future:

<?php
 // a sketch: a Cache-Control header, plus an Expires date three days from now
 Header("Cache-Control: must-revalidate");
 $offset = 60 * 60 * 24 * 3;
 $ExpStr = "Expires: " . gmdate("D, d M Y H:i:s", time() + $offset) . " GMT";
 Header($ExpStr);
?>

Remember that the Header() function MUST come before any other output.

As you can see, you'll have to create the HTTP date for an Expires header by hand; PHP doesn't provide a function to do it for you. Of course, it's easy to set a Cache-Control: max-age header, which is just as good for most situations.

For more information, see http://www.php.net/manual/function.header.php3

Cold Fusion 4.0

Cold Fusion, by Allaire (http://www.allaire.com/) is a commercial server-side scripting engine, with support for several Web servers on Windows and Solaris.

Cold Fusion makes setting arbitrary HTTP headers relatively easy, with the CFHEADER tag. Unfortunately, setting date-related headers in Cold Fusion isn't as easy as Allaire's documentation leads you to believe; their example for setting an Expires header, as below, won't work.

<CFHEADER NAME="Expires" VALUE="#Now()#">

It doesn't work because the time (in this case, when the request is made) doesn't get converted to a HTTP-valid date; instead, it just gets printed as a representation of Cold Fusion's Date/Time object. Most clients will either ignore such a value, or convert it to a default, like January 1, 1970.

Cold Fusion's date formatting functions make it difficult to generate a date that is HTTP-valid; you'll need to either use a combination of DateFormat, Hour, Minute and Second, or roll your own. Of course, you can still use the CFHEADER tag to set Cache-Control: max-age and other headers.

Also, remember that Web server headers are passed through with some implementations (such as CGI); check yours to determine whether you can use this to your advantage, by setting headers on the server instead of in Cold Fusion.

ASP

Active Server Pages, built into IIS and now becoming available in other implementations, also allow you to set HTTP headers. For instance, to set an expiry time, use the properties of the Response object in your page, like this:

<% Response.Expires=1440 %>

specifying the number of minutes from the request to expire the object. Likewise, the absolute expiry time can be set like this (make sure you format the HTTP date correctly):

<% Response.ExpiresAbsolute=#May 31,1996 13:30:15 GMT# %>

Cache-Control headers can be added like this:

<% Response.CacheControl="public" %>
  • When setting HTTP headers from ASPs, make sure you either place the Response method calls before any HTML generation, or use Response.Buffer to buffer the output.
  • Note that ASPs set a Cache-Control: private header by default, and must be declared public to be cacheable by HTTP 1.1 shared caches. While you're at it, consider giving them an Expires header as well.

References and Further Information

HTTP 1.1 Specification

http://www.w3.org/Protocols/
The HTTP 1.1 spec has many extensions for making pages cacheable, and is the authoritative guide to implementing the protocol. See sections 13, 14.9, 14.21, and 14.25.

Web Caching Overview

http://www.web-caching.com/
An excellent introduction to caching concepts, with links to other online resources.

Cache Now! Campaign

http://vancouver-webpages.com/CacheNow/
Cache Now! is a campaign to raise awareness of caching, from all perspectives.

On Interpreting Access Statistics

http://www.cranfield.ac.uk/docs/stats/
Jeff Goldberg's informative paper on why you shouldn't rely on access statistics and hit counters.

Cacheability Engine

http://www.mnot.net/cacheability/
Examines Web pages to determine how they will interact with Web caches; the Engine is a good debugging tool, and a companion to this tutorial.

About This Document

This document is Copyright © 1998, 1999 Mark Nottingham <mnot@pobox.com>. It may be freely distributed in any medium as long as the text (including this notice) is kept intact and the content is not modified, edited, added to or otherwise changed. Formatting and presentation may be modified. Small excerpts may be made as long as the full document is properly and conspicuously referenced.

If you do mirror this document, please send e-mail to the address above, so that you can be informed of updates.

All trademarks within are property of their respective holders.

Although the author believes the contents to be accurate at the time of publication, no liability is assumed for them, their application or any consequences thereof. If any misrepresentations, errors or other need for clarification is found, please contact the author immediately.

The latest copy of this document can always be obtained from http://www.mnot.net/cache_docs/

Version 1.32 - June 19, 2000

http://www.web-caching.com/mnot_tutorial/