May 31, 2005

Absolute Log Analyzer Redux

Alexey Stcherbic of BitStrike Software wrote in about my comments on the Absolute Log Analyzer. In short, when I tried to use his program, it didn't work, and I mis-diagnosed the issue. Alexey took a look at my log file, and found some odd entries (lines with *way* too many characters...I think this was probably a result of the setup period). He sent me a custom pattern for parsing the file, and lo and behold, it worked just fine. Alexey mentioned that the next version of his software would handle this automatically, which sounds like a nice feature to me...
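
Out of curiosity, here's the sort of quick sanity check that would have flagged those oversized entries before the import choked. A rough Python sketch; the filename and the 2,000-character threshold are just guesses:

    # Flag log lines that are suspiciously long -- the threshold is a guess.
    with open("access_log") as f:
        for n, line in enumerate(f, 1):
            if len(line) > 2000:
                print("line %d: %d chars" % (n, len(line)))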

Posted by Karl at 12:37 PM

May 25, 2005

PDF Prototyping

Dave Rogers has a nice two-part article on using PDFs for prototyping. The idea is to use Acrobat's linking features to wire up the prototypes, and go clicking through the site.

I'm not sure I'll use the exact technique he describes, as my short-term prototyping needs aren't for entire websites, but rather for a single page (redesign situation). But, the article got me thinking about some variations on the technique. What if we scanned some page sketches (paper prototypes), wired up the links, and put 'em in front of users? Sounds like a nice way to try out a couple of design variations in a more real-world situation than just placing a piece of paper in front of the user. Could be useful, in that it would keep the low-fi nature of the paper prototype, yet incorporate hi-fi features like working links and forms.

I'll let you know if we try it...

Posted by Karl at 11:13 AM

May 23, 2005

In defense (sort of) of Statistics

Nick Finck says, "when it comes to web statistics, be very skeptical." I'm inclined to agree with Nick, but I also believe that web stats are a very powerful and useful tool. But, like any power tool, one needs to know how to use it before running around and applying it willy-nilly to whatever one comes across. Thus, I thought it might be nice to spend a few moments expounding on the subject of web stats.

User Centered Design


I've been on the user-centered design bus for quite some time now. Looking at user behavior can help create a more useful experience, and the most valuable insights are often those that come directly from the users. Many techniques—from personas to usability tests—are quite abstracted from the users themselves. You need to create something of an artificial scenario to, for example, run a usability test. This doesn't mean the technique isn't valid, it just means that you need to realize that you're observing a simulation, not real-life behavior.

But when using server logs (or other forms of web statistics), you are observing real-life behavior. This direct connection to what the user is actually doing makes the use of server logs an important tool. Important, but, as pointed out by Nick and Tim Bray, not necessarily straightforward.

Know Your Goals


Make sure you've defined goals before you venture into the world of stats. It is too easy to wander aimlessly amongst the pretty, flashing lights. Goals can help anchor you. Are you looking to measure the effect of a design tweak? Or perhaps measure an outside activity, like a training session or a marketing campaign? Or do you just want to know what percentage of your users are using Internet Explorer 5.0? Different goals will dictate the use of different tools or methods of analysis.

Focus on Trends, Not Numbers


This won't help Tim, who was asked to come up with an exact number of "feed users," but when using stats, I tend to focus on trends, not exact numbers. I'm interested in change over time. Is usage going up? Did the newsletter sent out on Tuesday translate into increased usage of a particular section? Did the re-positioning of this element increase or decrease usage? Answering these types of questions bypasses the inherent fuzziness of determining things like unique visitors. As long as the calculation for determining visitors remains constant, you'll likely have enough information to answer your questions.
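
To make that concrete, here's the flavor of trend-watching I mean, as a rough Python sketch against an Apache combined-format access log. The access_log filename and the /newsletter/ section are placeholders for whatever you actually care about:

    import re
    from collections import Counter
    from datetime import datetime

    # host, ident, user, then the bracketed timestamp and the request path
    LINE = re.compile(r'\S+ \S+ \S+ \[([^\]]+)\] "(?:GET|POST) (\S+)')

    weekly = Counter()
    with open("access_log") as f:
        for line in f:
            m = LINE.match(line)
            if not m:
                continue
            when, path = m.groups()
            if not path.startswith("/newsletter/"):
                continue
            # Apache timestamps look like 25/May/2005:11:13:42 -0700
            ts = datetime.strptime(when.split()[0], "%d/%b/%Y:%H:%M:%S")
            weekly[ts.strftime("%Y week %W")] += 1

    for week in sorted(weekly):
        print(week, weekly[week])

Run it before and after the newsletter goes out, and the trend (or lack of one) is right there, no "unique visitor" hand-wraving required.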

Define the Terms


Each web stats package is going to be different. They'll define terms (like "hit", "page", and "visitor") differently. It is up to you, as the analyst, to figure out how each package uses the terms, and adjust accordingly. The good ones will let you adjust the algorithms to, for example, filter out IP addresses belonging to your organization's employees (assuming they're not the target audience).
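
For the do-it-yourself crowd, that kind of filtering is simple enough to roll yourself. A minimal Python sketch, with 10.1.0.0/16 standing in for your office's address range:

    from ipaddress import ip_address, ip_network

    INTERNAL = ip_network("10.1.0.0/16")  # made-up office range

    def external_lines(log):
        """Yield only the log lines that didn't come from our own network."""
        for line in log:
            host = line.split(" ", 1)[0]
            try:
                if ip_address(host) in INTERNAL:
                    continue
            except ValueError:  # hostname rather than an IP; keep it
                pass
            yield line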

A more complicated case comes with dynamic pages. Are page.cgi?id=1 and page.cgi?id=2 the same page or two different pages? Obviously, this depends on your setup, but you'd better be able to tell your stats package which is which.

I remember one case where, at first glance, it looked like one prominent feature on the site wasn't being used much at all. But, I realized that I needed to tell the stats system to account for the query string, and lo and behold, those pages were being used. Good thing we didn't take rash action before I figured out how to use the system!
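
If you're scripting this yourself, the decision boils down to choosing a "page key" per path. A sketch, reusing the hypothetical page.cgi example from above:

    from urllib.parse import urlsplit, parse_qs

    KEEP_QUERY = {"/page.cgi"}  # paths where ?id=1 and ?id=2 are different pages

    def page_key(url):
        parts = urlsplit(url)
        if parts.path in KEEP_QUERY:
            ids = parse_qs(parts.query).get("id", ["?"])
            return parts.path + "?id=" + ids[0]
        return parts.path  # everywhere else, ignore the query string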

Multiple Sources


Assuming you can find the time to analyze it all, I'm a fan of using multiple sources of information. At work, we use a number of methods to learn more about our users.

I've recently written about my purchase of ClickTracks, which is neat clickstream analysis software. It gives a page-by-page account of where users are clicking. This will, I think, be very helpful in gauging the success of re-designs and newsletter campaigns. It also presents the data in a very nice, non-threatening way. But, it doesn't meet all of the criteria I laid out for my ideal stats system. So, we use a second (and maybe a third) server log analysis tool to help with these other items. The tools each give us a different viewport onto what is happening on the site.

We store queries entered into the site's search engine to get a better idea about the terms people use to search our site. We check those logs to see if the users are actually finding what they're looking for, and if they're entering queries that we don't have answers for. It turns out they are, and so we're working to address this by adding a different type of search functionality (long story).
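
Mining those queries out of a server log is a small job, too. A rough Python sketch, assuming the site search lives at /search and puts the query in a "q" parameter (both assumptions; adjust for your setup):

    import re
    from collections import Counter
    from urllib.parse import urlsplit, parse_qs

    terms = Counter()
    with open("access_log") as f:
        for line in f:
            m = re.search(r'"(?:GET|POST) (\S+)', line)
            if not m:
                continue
            parts = urlsplit(m.group(1))
            if parts.path == "/search":
                for q in parse_qs(parts.query).get("q", []):
                    terms[q.strip().lower()] += 1

    for term, count in terms.most_common(25):
        print(count, term)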

Our website is a portal, even though I don't care for that term. So, we care about the usage of the resources we link to. I built a little home-grown stats collection tool to track this information. This gives us a really nice view into how our site gets used, and by whom. We can see the effect of, for example, training activity on usage. And, if we see an unexpected spike in usage, we can investigate. Often, there is an interesting story behind dramatically increasing usage.
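
Our actual tool is nothing fancy, and I won't pretend the sketch below is it, but the core idea is just a "jump" script that logs the click and redirects. Paths and parameter names here are made up:

    #!/usr/bin/env python3
    # A toy "jump" script: log the click, then redirect. A real version
    # would restrict targets to known resources (open redirects are bad).
    import os
    from datetime import datetime
    from urllib.parse import parse_qs

    query = parse_qs(os.environ.get("QUERY_STRING", ""))
    target = query.get("url", ["/"])[0]

    with open("/var/log/jump.log", "a") as log:
        log.write("%s\t%s\t%s\n" % (datetime.now().isoformat(),
                                    os.environ.get("REMOTE_ADDR", "-"),
                                    target))

    print("Status: 302 Found")
    print("Location: " + target)
    print()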

Assuming each view is actually shedding light on your goals, the more views you can muster, the better.

Use Your Head


Don't throw out common sense just because a stats program prints a nice pie chart. Make sure you have a good sample size, both in sheer numbers and in time. Too few hits will likely lead to distortion. And a short time window could bring other factors into play that might not exist when viewing multiple weeks or months of data. Don't forget that the stats program you're using quite likely isn't infallible. Don't make big decisions based on a percentage point or two. At the end of the day, you'll probably want to consider other non-stats factors in any decision.


What do you think? Did I miss any obvious points? Let me know!

Posted by Karl at 08:35 PM

May 19, 2005

Netscape 8

I just downloaded the latest Netscape release. This is an odd one, in that it offers two rendering engines. You can choose to view pages in the IE rendering engine or the Firefox (Mozilla) engine. The idea is you'll be compatible with everything out there. PC World has more details.

As with all recent Netscape releases, this one packs in a ton of features. I don't really need a browser this jam-packed with stuff, so this isn't a big deal to me. I'm quite happy with Firefox for now.

But, I do find the two-headed nature of the rendering interesting. For web developers, this could be an easy way to test pages in both engines from a single app, without having to switch browsers (which, admittedly, isn't a big deal). And, it introduces another little wrinkle, in that if you ever get an error report from someone using NN8, you won't know which rendering engine they're using...heh. Lastly, I wonder if they're solving a problem that doesn't really exist. I rarely find sites that don't work in Firefox. And if I did, and I really wanted to see the content, I'd just fire up IE. Doesn't seem like that big of a deal...

Posted by Karl at 10:26 AM

May 17, 2005

Book List, Web 2.0 Style

Those brainy kids down at O'Reilly have come up with a really slick way to keep track of their reading via Backpack and Amazon.com.

Posted by Karl at 03:04 PM

May 12, 2005

More Web Stats

A couple more developments on the web stats front today.

First, I tried out 123 Log Analyzer. The package installed fine, but took a bit of futzing to configure properly. There really wasn't anything wrong with the reports, but they didn't seem to go far beyond what the open source tools can do. I wasn't inspired.

Next, I contacted the ClickTracks folks and asked if they had any special pricing for non-profits/education customers. Turns out they did, and it brought the price down into that "no-brainer" zone. So, I went ahead and bought a license.

Does ClickTracks hit everything on my wishlist? Nope, but I think that's okay. It does nicely address the first two points on my list (drilling down to specific pages, and being able to view the user's path as they navigate the site). And, it does those things very well. I figure I can backfill some of the other items on the list by using other tools, either the open source ones or perhaps another commercial product.

So, I spent some good time this afternoon browsing through our site with the ClickTracks product. I showed a few co-workers, and I'm looking forward to sharing the stats with the whole team next week. This type of visualization really helps to show what parts of the site are being used and what parts aren't. I think we'll start to make changes to the site, and keep a close eye on how the stats change.

I have learned a couple of things about using ClickTracks. First (and this is likely specific to my setup), I had noticed that performance was pretty sluggish when using the demo. Then, duh, I realized that this was likely due to our IT infrastructure. Our "home" directories (My Documents) are stored on a server, and the ClickTracks data files were stored in the home directory, meaning it had to hit the network every time it wanted data. Not a big deal when you're throwing around 30k Word files, but a big deal in this case. So, saving the ClickTracks data file to the c: drive helped the performance quite a bit.

Next, I sure am happy that we're running all external links through a "jump" page. This makes it possible to track the percentages of users who leave the site. And, since the goal of the site is actually for users to leave (we want them to get off our site and onto the third-party sites that deliver the resources we provide), this is pretty helpful. And it means that unlike many other sites out there, we're actually looking for *lower* "average time on the page" figures. Lower means the users are finding their way to the resources (well, on those particular pages).
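
Here's a rough Python sketch of the kind of question the jump page lets us ask: for each day, what fraction of visitors (naively keyed by IP) followed a resource link off the site? The log filename and jump-page path are assumptions:

    import re
    from collections import defaultdict
    from datetime import datetime

    # capture the client IP and the day portion of the timestamp
    LINE = re.compile(r'(\S+) \S+ \S+ \[(\d+/\w+/\d+)')

    visitors = defaultdict(set)
    jumpers = defaultdict(set)
    with open("access_log") as f:
        for line in f:
            m = LINE.match(line)
            if not m:
                continue
            ip, day = m.groups()
            visitors[day].add(ip)
            if "/cgi-bin/jump" in line:
                jumpers[day].add(ip)

    for day in sorted(visitors, key=lambda d: datetime.strptime(d, "%d/%b/%Y")):
        pct = 100.0 * len(jumpers[day]) / len(visitors[day])
        print("%s  %5.1f%% of visitors followed a resource link" % (day, pct))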

Anyway, thanks to everyone who wrote in with ideas and suggestions.

Posted by Karl at 07:25 PM

May 11, 2005

Web Stats Demos

I spent a few minutes today trying out some of the web stats products that people have recommended.

I started with Absolute Log Analyzer. Absolute is a stand-alone Windows product, and it can analyze logs on the local machine, or over an FTP connection. I have a decent-sized log file on our production server (400+ MB), so I decided to download it to my workstation before importing it. When I tried importing the file, it churned for a while, then silently stopped. A check of the parsing log revealed that the import had failed because it wasn't able to auto-detect the log file format. I have a sneaking suspicion that the line endings were at fault. The log file undoubtedly uses Unix-style line breaks instead of the Windows-style ones the app is likely expecting. This is, to be blunt, a really stupid reason for the program to fail. I did poke around a bit more in this app, fiddled with some settings, and then decided to uninstall the program. Having to fiddle with line breaks every time I want to analyze a log file isn't my idea of fun. [Update: Alexey from BitStrike wrote in and fixed the issue...Absolute Log Analyzer works as advertised now.]
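
For what it's worth, if line endings ever do turn out to be the culprit, the conversion is trivial to script. A quick Python sketch that streams, so even a 400 MB log is no trouble (filenames are placeholders):

    # Convert Unix (LF) line endings to Windows (CRLF).
    with open("access_log", "rb") as src, open("access_log.crlf", "wb") as dst:
        for line in src:
            dst.write(line.rstrip(b"\r\n") + b"\r\n")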

Next, I tried out the ClickTracks demo, a stand-alone Windows app like Absolute. ClickTracks is a polished and easy-to-use program. They clearly put a decent amount of time into making the overall user experience a positive one. It handled my logfile with aplomb, then it displayed my site's homepage with statistics overlaid on each link. This is, I think, a very powerful way of visualizing this type of statistic. You can tell at a glance which links are being used and which ones aren't. I can definitely see how showing this type of data to non-technical users could be useful.

But, I can't really see plunking down $500 for the entry-level version of this software. And by entry-level, I mean feature-limited. I might consider it, but I'd like to see what else is out there before buying this package. Had it been, say, $200, it would have been a no-brainer. Money is tight here in non-profit land!

As I used ClickTracks a bit more, I did notice that it often slows down as it crunches through the logfiles to produce the pretty reports. This may or may not turn out to be an issue with regular use.

Lastly, I tried to install a server-based product, NetTracker Lite, on our Linux webserver. The install died because it couldn't find a shared library. A quick check of the NetTracker website didn't turn up anything, so that one gets pushed to the back of the list for now. [Update: George Smith correctly diagnosed the issue, and sent a link to the solution. Thanks, George!]

I'll keep reporting on my experiences as I get the chance. Meanwhile, if you have ideas about a good product, let me know.

Posted by Karl at 05:39 PM

May 10, 2005

Web Stats Suggestions

[Update: more suggestions...]

I've had a number of responses to yesterday's post on web stats packages. Thanks to everyone who responded so far with ideas. Here's what I've seen (I haven't really researched any of these in depth yet; this is mostly a compilation of the emails I've received so far):

Urchin, recommended by Rich Brooks and the UW server guys. Interestingly, Urchin was acquired by Google about a week ago. Hmm....

NetTracker, recommended by George Smith. The NetTracker site uses the word "solutions" a tad too often for my taste, but George mentioned they have a free version: NetTracker Lite. Looks to be worth checking into.

Opentracker.net, recommended by Jon.

StatCounter, recommended by Donald Clark. This looks like one of those tools where you place a bit of JavaScript code on the pages that then talks to the hosted stats system. I know this isn't uncommon (in fact, it is how all of the hosted products work, I'd imagine), but, call me old-fashioned, for some reason I was thinking that a product that analyzed the server logfiles might be better. Especially because I have six months or so of old logfiles sitting around that I could mine for data now. At the very least, I could use them to set a baseline before we go about making changes. But, I can see the advantages in a hosted product, too. There are reports (like screen resolution) that you just can't get through Apache server logs...

LiveStats. Former co-worker Kevin reminded me that we once tried to use this product. It, well, uh, wasn't that great then. But, as he mentioned, it must have improved since then, 'cause it didn't have anywhere to go but up.

So, keep those emails coming in with recommendations or experiences. Meanwhile, here's a list of other products I've run across while researching:

[Update] Nick Finck (of Digital Web fame) echoed the Urchin recommendation (see above), mentioned WebTrends (see above, again), and threw in a heretofore unknown (to me) candidate: Absolute Log Analyzer. This is stand-alone software -- in other words, it doesn't require a web server like most of the products listed above do. I generally picture server-based software when I'm thinking of web stats packages, but I think it might be worth giving the trial version a spin.

Posted by Karl at 03:38 PM

May 09, 2005

Web Stats - Help Me!

Anyone out there have a favorite web statistics package? I've used the open-source products (Analog, AWStats, etc.) for years, but I'm wondering if there are other good choices out there. A while back, I set up a little script called "Visitors." I like the simplicity, but it doesn't provide some information I'd like to have (more on my wishlist below).

A few Google searches turn up hundreds of products. Most look like they put a premium on style (as opposed to substance). Lots of them cost a ton of money, too. I'm not opposed to spending a few bucks on a good product, but I'm thinking that many of these probably aren't worth their high price tags. I think I'm really looking for a "product," not a "solution." In other words, I don't want to have to call a salesman to buy a web stats product.

So, what am I looking for? Here's a list, in rough order of importance:


  • I'd like to go beyond just a list of the pages on the site that are popular, and be able to track the hits on a specific page (or directory) over time. If we mention a specific page in an email newsletter, I want to know if that translates into increased usage.

  • A report that details the path a visitor takes through the site would be useful. It doesn't have to be fancy, but it would be nice if this was presented in a clear way that helped further our understanding of how people actually use our site.

  • I'd like to know about the types and versions of browsers used on the site. Inexplicably, the Visitors script I mentioned above doesn't include versions in its browser report (the feature may have been added since I installed it). As if there wasn't a difference between IE4 and IE6! Anyway, I'd really like to see browser usage over time, similar to the graphs Tim Bray produces.

  • Detailed referrer reports would be nice, too. Again, as with the browser information, time is a critical element here. Is a particular site sending more traffic to us all of a sudden? This could be very useful.

  • Reports that list hits and visits. I'd like to always be able to see the "raw" view of how many individual files the site is serving up, just to keep an eye on the server's load. But it is also useful to see details about how many "visitors" use the site. Preferably, the definition of a "visitor" will be clear, and maybe even customizable. I'd like to eliminate any and all robots, as well as users from specific IP addresses (like those coming from the office!).

  • Plenty of detailed information on 404s could be helpful, too. By detailed, I mean enough information for someone to track down the broken link. Being able to view this information over time could be helpful as well (in knowing the issue was resolved). There's a rough sketch of what I mean right after this list.
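
On that last point, here's the sort of 404 report I have in mind, as a minimal Python sketch against an Apache combined-format log, grouping broken URLs with the referrers that point at them:

    import re
    from collections import defaultdict

    # Matches the request path, a 404 status, the byte count, and the
    # referrer in an Apache combined-format log line.
    LINE = re.compile(r'"(?:GET|POST|HEAD) (\S+)[^"]*" 404 \S+ "([^"]*)"')

    broken = defaultdict(set)
    with open("access_log") as f:
        for line in f:
            m = LINE.search(line)
            if m:
                url, referrer = m.groups()
                broken[url].add(referrer)

    for url in sorted(broken, key=lambda u: -len(broken[u])):
        print(url)
        for ref in sorted(broken[url]):
            print("    linked from:", ref)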


So, is there anything out there that might meet this wishlist? Any ideas? Email me at weblog@karlnelson.net. I'll post any good ideas that come my way...

Posted by Karl at 03:42 PM

May 03, 2005

Simplify, simplify, simplify!

Cliff Atkinson posts a good reminder about the power of simplicity: Information Overload Makes You Dumb.

Update: Conn McQuinn mentions a related story that has been making the blog-rounds. I'm linking to this mostly because I like Conn's title: IQ damaged by - wait, I have to check my email.

Posted by Karl at 03:19 PM