Grant <suppressed> wrote:
> > > > > I do keep a separate table of robot UAs and match traffic rows to them
> > > > > with op=eq to populate another table with robot IPs and non-robot IPs
> > > > > for the day to speed up the report. Don't you think it would be
> > > > > slower to match/no-match each IC request to a known robot UA and write
> > > > > to the traffic table based on that, instead of unconditionally writing
> > > > > all requests to the traffic table? If not, excluding the robot
> > > > > requests from the traffic table would mean a lot less processing for
> > > > > the report and a lot fewer records for the traffic table.
> > > > >
> > > > Perhaps you should create a column called "spider" in the traffic table
> > > > and save a true or false value depending upon the [data session spider]
> > > > value. You can then generate reports "WHERE spider = 0", for ordinary
> > > > users, or "WHERE spider = 1" for robots etc. An index on the spider column
> > > > would be nice, of course.
> > > >
> > > I let this roll around in my head for quite a while and I ended up
> > > writing the IC page accesses to my traffic table based on [data
> > > session spider] like you suggested. This should mean a much smaller
> > > traffic table and less processing when running a report on it. We'll
> > > see how much time it buys me before running the report takes too long
> > > again. I also need to set up indexes.
> > >
> > Also, you may as well grab the latest robots.cfg file from CVS and
> > "include" it into your interchange.cfg file.
> >
> I just had a look at robots.cfg and I think I see a few opportunities
> for false positives. I would think "agent" could be bad, and there
> are browser toolbars for GetRight and Yahoo which probably alter the
> UA. Is there a crucial set of NotRobotUA entries to go along with
> robots.cfg?
>
> Is anyone using robots.cfg and actively watching for false positives?
>

I'll look into those.
Do you know the UA names for the various toolbars? I can probably look
that up somewhere. I wouldn't have thought that "agent" would be a risk.

-- 
Kevin Walsh
_______________________________________________
interchange-users mailing list
http://www.icdevgroup.org/mailman/listinfo/interchange-users
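
For readers following the thread, here is a minimal sketch of the "spider" column idea discussed above. It is an illustration only, not code from the thread: it uses Python's built-in sqlite3 module so that it runs on its own, whereas an Interchange site would normally write to its own SQL database from the page or a usertag. The "traffic" table and "spider" column come from the discussion; the "ip" and "page" columns, the index name, and the helper functions log_request and count_requests are hypothetical. The is_spider argument stands in for the [data session spider] value.

```python
import sqlite3

# In-memory database so the sketch is self-contained (assumption: a real
# site would use its existing MySQL/PostgreSQL traffic table instead).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traffic ("
    " ip TEXT NOT NULL,"          # hypothetical column
    " page TEXT NOT NULL,"        # hypothetical column
    " spider INTEGER NOT NULL DEFAULT 0)"  # 1 = robot, 0 = ordinary user
)
# Index on the flag, as suggested in the thread.
conn.execute("CREATE INDEX traffic_spider_idx ON traffic (spider)")


def log_request(ip, page, is_spider):
    """Record one page access, storing the robot flag as 0 or 1."""
    conn.execute(
        "INSERT INTO traffic (ip, page, spider) VALUES (?, ?, ?)",
        (ip, page, 1 if is_spider else 0),
    )


def count_requests(spider):
    """Count rows for ordinary users (spider=0) or robots (spider=1)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM traffic WHERE spider = ?", (spider,)
    ).fetchone()
    return row[0]


# Sample accesses using documentation-range IP addresses.
log_request("192.0.2.10", "index", False)
log_request("192.0.2.11", "index", True)
log_request("192.0.2.10", "checkout", False)

human_hits = count_requests(0)
robot_hits = count_requests(1)
print(human_hits, robot_hits)
```

Writing every request with a flag keeps the robot data available for later inspection (for example, when checking for false positives), while skipping robot requests entirely, as Grant ended up doing, trades that visibility for a smaller table.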