[cgiapp] Re: cleaning up html?


Mark Stosberg wrote:
On 2005-04-20, Rhesa Rozendaal <suppressed> wrote:
I was wondering what you guys are using to ensure the validity of your html output.


I've spent some time with HTML::Tidy (even got Andy to post an updated version :) and it is very, very good.

I've been thinking about approaching the problem from the other end:

Having valid HTML::Template files in the first place.

If you have full control over the templates, then this is definitely the way to go. Of course, you wouldn't have much use for my proposed plugin then, since validating templates would most likely be done outside the app.

However, many of my clients either manage the templates on their own, or have additional means of defining content over which I don't have full control. Large parts of their sites are generated with fragments of html coming from different sources, which makes it difficult to validate them all beforehand.

In those situations, a final rewrite handler can be useful. I've already added a preliminary version of this to one of our apps, and it seems to work quite well. I haven't put it into a module yet; here's what I have so far:

# /etc/tidy.conf
tidy-mark:      no
wrap:           120
indent:         auto
output-xhtml:   yes
char-encoding:  utf8
doctype:        loose
add-xml-decl:   yes
alt-text:       [image]


# In my base class:

use HTML::Tidy 1.05_02; # this version has support for config files
our $HTMLTIDY;

sub tidy_clean
{
    my ($self, $html) = @_;
    $HTMLTIDY ||= HTML::Tidy->new({config_file => '/etc/tidy.conf'});
    return $HTMLTIDY->clean($html);
}

# somewhere in postrun:

    if($self->header_type eq 'header')
    {
        my %props = $self->header_props;
        if(!exists($props{'-type'}) or $props{'-type'} eq 'text/html')
        {
            $self->header_add(-type=>'text/html', -charset=>'utf-8');

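            # Run the generated page through tidy, then make sure it declares its charset.
            $$outputref = $self->tidy_clean($$outputref);
            if( $$outputref !~ m/equiv.*charset/ )
            {
                $$outputref =~ s|(<head.*?>)|$1\n  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />|si;
            }

            # Note: length() counts characters rather than bytes if the string is decoded.
            $self->header_add(-Content_length=>length($$outputref));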
        }
    }

With the new proposed callback functionality, this could be added completely transparently. For the current version of cgi-app, it can still be added with very little code. I don't think the utf8 code _has_ to be in there, but I needed it.

Do you think it would make sense to make a plugin for this?

Sometimes I need to use a template token inside of an HTML tag, like
this:
<form action="<tmpl_var cfg_instance_cgi>">

I did make a filter so that the HTML::Template tag can be URL-encoded.

You'd have to, or they would be rewritten with &lt; entities.

Still, I haven't figured out the last mile of making HTML::Template
files that can be validated as Good HTML.

libtidy has some configuration options that allow you to define "new" tags, that might help you out with this. Simply adding the HT tags should be enough, I expect. I haven't played with that yet, though.
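
As an untested sketch of what that could look like, assuming libtidy's new-empty-tags, new-inline-tags and new-blocklevel-tags options: the lines below could be appended to the config file shown earlier. Which HT tag goes in which list is a guess. Tidy expects empty tags to be declared as inline or block-level as well, so tmpl_var and friends appear twice. This would not help with a token sitting inside an attribute value, as in the form example above.

```
# additions to /etc/tidy.conf (assumed grouping, not tested)
new-empty-tags:       tmpl_var, tmpl_else, tmpl_include
new-inline-tags:      tmpl_var, tmpl_else, tmpl_include
new-blocklevel-tags:  tmpl_if, tmpl_unless, tmpl_loop
```

Declaring the container tags as block-level means tidy may complain when they appear inside inline content such as a paragraph, so the grouping may need adjusting for a given set of templates.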

There is also the possibility that something in the token will cause the
final result to become invalid HTML, but that is a minimal concern for
me. I might lean towards a 'tidy' based tool to help with that
because I am already familiar with that.

There's also Test::HTML::Tidy. I haven't used it yet, but I expect it could be useful for you.

Rhesa
