Hello folks,
I'm working on an output plugin that will convert html content to
pdf. It handles setting the content-type and content-disposition
headers before returning.
You can see preliminary work at:
http://zacks.org/cgiapp/pdf/
The pod is inline at the end of this message.
Right now the actual conversion is done in a helper module. For
the moment, only HTMLDoc (via HTML::HTMLDoc) is supported. I am
planning support for PDF::FromHTML and html2ps/ps2pdf.
Apologies for the long discussion below, but I'm looking for
advice on how to best proceed with development.
I am not sure of the best way to call the helper module that
performs the conversion. For now, I have this code in place:
    # $opts{converter} is HTMLDoc, for example
    my $pack = "CGI::Application::Plugin::Output::PDF::$opts{converter}";
    eval "require $pack";
    croak "Can't load converter [$pack]: $@" if $@;
    return $pack->convert( ... );
Is this filthy? One of the things I don't like about it is calling
convert() as a class method -- it seems unnatural since (as of
now) none of the modules are object-oriented. On the other hand,
I believe 'no strict "refs"' would be necessary to call it as a
function (with the package name as a variable).
Another approach is similar but doesn't invoke the helper routine
via the converter class:
    # set $pack as above
    my $convert = $pack->can('convert')
        or croak "converter [$pack] doesn't know how to 'convert'";
    return $convert->( \$html, $args->{converter_args} );
Is one way better than the other? I feel like I'm missing
something obvious and both approaches are poor.
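For comparison, here is a minimal sketch of a third variant. It checks the converter name against a pattern, loads the module by file name inside a block eval instead of a string eval, and then uses can() as in the second approach. The Demo::Converter::Upper package, the load_converter() name, and the $prefix argument are hypothetical stand-ins so that the example runs on its own; they are not part of the plugin.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical stand-in converter, defined inline so the sketch runs
# without any converter module installed.
package Demo::Converter::Upper;
BEGIN { $INC{'Demo/Converter/Upper.pm'} = __FILE__ }   # mark as already loaded
sub convert {
    my ($html_ref, $args) = @_;
    return uc $$html_ref;    # placeholder for a real html-to-pdf conversion
}

package main;

# Return a code reference to the converter's convert() routine.
# $prefix is the namespace that holds the converter modules (an assumption).
sub load_converter {
    my ($prefix, $name) = @_;

    # Reject anything that is not a plain identifier before it gets
    # anywhere near require.
    croak "Invalid converter name"
        unless defined $name && $name =~ /\A[A-Za-z_]\w*\z/;

    my $pack = "${prefix}::$name";
    (my $file = "$pack.pm") =~ s{::}{/}g;

    # A block eval plus a file name avoids compiling a string of Perl.
    eval { require $file; 1 }
        or croak "Can't load converter [$pack]: $@";

    my $convert = $pack->can('convert')
        or croak "converter [$pack] doesn't know how to 'convert'";
    return $convert;
}

my $convert = load_converter( 'Demo::Converter', 'Upper' );
my $html    = '<p>hello</p>';
print $convert->( \$html, {} ), "\n";
```

Calling through the code reference keeps convert() a plain function, so neither a class-method call nor 'no strict "refs"' is needed.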
Another issue is how to handle configuration. Right now the user
has the option to import the pdf_output() method, which should be
called at the end of a runmode. This method will set the
content-type header, convert html content to pdf, and return the
pdf content.
This method takes some optional named parameters which can select
the converter to use and specify options specific to that
converter. So a user importing the method can pass parameters for
configuration purposes.
If the user has CGI::Application version 4 or newer and does
not request any symbols for import, pdf_output() is automatically
installed as a postrun callback. This allows for transparent
conversion from html to pdf:
    use CGI::Application::Plugin::Output::PDF;
    # ...
    return $template->output; # sent to browser as pdf
This is convenient, but it limits the user's ability to configure
the behavior of the plugin. For example, the user doesn't have a
way to specify which converter to use.
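The conditional installation described above could look roughly like the sketch below. It assumes the add_callback() class method and the postrun hook of CGI::Application 4.x, where a postrun callback receives the application object and a reference to the output. Demo::App is a hypothetical stand-in that mimics just enough of that interface for the example to run; install_unless_importing() and the placeholder conversion are likewise invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a CGI::Application 4.x subclass: it only
# records callbacks and runs them against a body, nothing more.
package Demo::App;
my %callbacks;
sub new { return bless {}, shift }
sub add_callback {
    my ($class, $hook, $cb) = @_;
    push @{ $callbacks{$hook} }, $cb;
    return;
}
sub run_postrun {
    my ($self, $body) = @_;
    $_->( $self, \$body ) for @{ $callbacks{postrun} || [] };
    return $body;
}

package main;

# Placeholder conversion; the real plugin would call its converter here.
sub postrun_to_pdf {
    my ($app, $body_ref) = @_;
    $$body_ref = "%PDF-placeholder\n" . $$body_ref;
    return;
}

# Install the callback only when no symbols were requested and the
# application class supports callbacks.
sub install_unless_importing {
    my ($app_class, @requested_symbols) = @_;
    return 0 if @requested_symbols;
    return 0 unless $app_class->can('add_callback');
    $app_class->add_callback( postrun => \&postrun_to_pdf );
    return 1;
}

install_unless_importing('Demo::App');
print Demo::App->new->run_postrun('<p>report</p>'), "\n";
```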
One option would be to use arguments to import to configure the
plugin. For example:
    use CGI::Application::Plugin::Output::PDF converter => 'HTMLDoc';
This can get a bit messy, however. It would also be nice for the
user to be able to specify the output filename, or specify some
parameters specific to the selected converter. I don't know how
many options are too many to handle in the import() method.
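To make the import-argument idea concrete, here is a minimal sketch of an import() that separates configuration pairs from requested symbols and stores the configuration per importing package. Demo::PDFPlugin, the set of option names, and config_for() are assumptions made up for this example, not the plugin's actual interface.

```perl
use strict;
use warnings;

package Demo::PDFPlugin;    # hypothetical stand-in for the real plugin
use Carp qw(croak);

# Option names recognised at import time (an assumed, illustrative set).
my %is_option = map { $_ => 1 } qw(converter filename converter_args);

# Configuration per importing package, so two applications in one
# process can use different settings.
my %config_for;

sub import {
    my ($class, @args) = @_;
    my $caller = caller;
    my (%config, @symbols);

    while (@args) {
        my $arg = shift @args;
        if ( $is_option{$arg} ) {
            croak "Option '$arg' needs a value" unless @args;
            $config{$arg} = shift @args;
        }
        else {
            push @symbols, $arg;    # e.g. pdf_output or :all
        }
    }

    $config_for{$caller} = \%config;
    return @symbols;    # a real import() would now export these
}

sub config_for {
    my ($class, $package) = @_;
    return $config_for{$package} || {};
}

package main;

# Roughly what this line would trigger at compile time:
#   use Demo::PDFPlugin converter => 'HTMLDoc', filename => 'report.pdf';
Demo::PDFPlugin->import( converter => 'HTMLDoc', filename => 'report.pdf' );
print Demo::PDFPlugin->config_for('main')->{converter}, "\n";
```

The postrun callback could then look up the configuration for the application's class. Nested converter options passed this way would need a hash reference as the value, which is where the import line starts to get unwieldy.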
When calling pdf_output() directly, this is not a problem, as it
takes an optional hash reference of named parameters to handle
these configuration options, among others.
What is the best practice for those who are using the transparent
postrun callback?
Thanks for reading and for any advice you may have.
-E
NAME
CGI::Application::Plugin::Output::PDF - Generate PDF output from a
CGI::Application runmode
SYNOPSIS
For CGI::Application >= 4.0:
    use CGI::Application::Plugin::Output::PDF;
    # in some runmode...
    # html content will be automatically converted to pdf
    return $template->output;
For CGI::Application < 4.0:
    use CGI::Application::Plugin::Output::PDF qw(pdf_output);
    # in some runmode...
    return $self->pdf_output( \$template->output );
DESCRIPTION
"CGI::Application::Plugin::Output::PDF" provides a method, "pdf_output",
and a function, "html_to_pdf", to convert html content to pdf.
The "pdf_output" method may be called directly. Alternatively, for
CGI::Application(3) version 4 and above, a postrun callback is added
automatically, unless the user requests any symbols for import.
XXX should this be the case? or always add the callback?
EXPORT
This module does not export any symbols by default. You may import the
"pdf_output" method and/or the "html_to_pdf" function on request:
    use CGI::Application::Plugin::Output::PDF qw(pdf_output);
You may export both routines using the export tag ":all":
    use CGI::Application::Plugin::Output::PDF qw(:all);
NOTE: For CGI::Application(3) version 4 and above, a postrun callback
will be added to automatically convert html content to pdf, unless the
user requests that any symbols be exported.
Subclasses of previous versions of CGI::Application(3) will need to
export the "pdf_output" method and call it directly:
    return $self->pdf_output( \$template->output );
METHODS
pdf_output
    # in a runmode
    # $template is an HTML::Template object, for example
    my $html_output = $template->output;
    return $self->pdf_output( \$html_output,
                              { filename  => 'download.pdf',
                                converter => 'HTMLDoc', }
    );
This method generates a pdf file from html content and sends it
directly to the user's browser. It sets the content-type header to
'application/pdf' and sets the content-disposition header to
'attachment'.
It should be invoked through a CGI::Application(3) subclass object.
It takes two parameters. The first, which is required, is a
reference to a scalar containing the html content for conversion.
The second is a reference to a hash of named parameters, all of
which are optional:
converter
The module to be used for converting html content to pdf.
The current options are "HTMLDoc" (default), "HTML2PS", and
"PDFFromHTML".
See CONVERTERS below for further discussion of the merits of
each.
filename
The name of the file which will be sent in the HTTP
content-disposition header. The default is "download.pdf".
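As a rough illustration of the header handling described above, the sketch below builds the two header properties for a given filename. The pdf_header_props() name is hypothetical. It assumes the key/value style accepted by CGI.pm's header(), which a CGI::Application runmode would normally reach through header_props(). The plugin's actual code may differ.

```perl
use strict;
use warnings;

# Build header properties for a PDF download. In a runmode the returned
# list could be handed to $self->header_props(...).
sub pdf_header_props {
    my ($filename) = @_;
    $filename = 'download.pdf'
        unless defined $filename && length $filename;

    # Keep the name safe inside a quoted header value.
    $filename =~ s/[\r\n"\\]//g;

    return (
        -type                => 'application/pdf',
        -content_disposition => qq{attachment; filename="$filename"},
    );
}

my %props = pdf_header_props('report.pdf');
print "$_: $props{$_}\n" for sort keys %props;
```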
FUNCTIONS
html_to_pdf
    my $pdf = html_to_pdf( \$html_content,
                           { filename  => 'download.pdf',
                             converter => 'HTMLDoc', }
    );
    # do something with $pdf
This function converts html content to pdf content and returns it.
It takes the same parameters as "pdf_output" (above), except that it
is a function, so it should not be invoked through an object.
In addition, the named parameter "filename" is ignored, as it is not
applicable to this function.
CONVERTERS
NOTE: This section is incomplete.
In general, css is not well-supported.
In addition, it may be necessary to use full paths for images and links
in your html to get a pdf that closely represents your web page.
HTMLDoc
This converter uses the HTML::HTMLDoc(3) module.
From "http://www.htmldoc.org":
HTMLDOC supports most HTML 3.2 elements, some HTML 4.0 elements,
and can generate title and table of contents pages. The 1.8.x
releases do not support stylesheets.
css/stylesheets
Unsupported
paths Under a web environment, I had success passing
"$ENV{DOCUMENT_ROOT}" to the HTML::HTMLDoc(3) object to fix
relative image paths.
PDFFromHTML
This converter uses the PDF::FromHTML(3) module.
css/stylesheets
PDF::FromHTML does not support css.
paths XXX Unknown.
HTML2PS
This converter passes the html content to html2ps(1) and then to
ps2pdf(1).
Be aware that large table cells may not render as expected. From
"http://user.it.uu.se/~jan/html2psug.html":
Rendering HTML tables well is a non-trivial task. For
"real" tables, that is representation of tabular data,
html2ps usually generates reasonably good output. When
tables are used for layout purposes, the result varies
from good to useless. This is because a table cell is
never broken across pages. So if a table contains a cell
with a lot of content, the entire table may have to be
scaled down in size in order to make this cell fit on a
single page. Sometimes this may even result in unreadable
output.
css/stylesheets
html2ps supports css to a limited extent, but the styles
must be specified on the command line or in a configuration
file.
paths html2ps allows the user to specify either a root file path
or a base URL to be used for relative paths in the html
content.
AUTHOR
Evan A. Zacks "<suppressed>"
SEE ALSO
PDF::FromHTML(3), HTML::HTMLDoc(3), html2ps(1), CGI::Application(3)
COPYRIGHT & LICENSE
Copyright 2005 Evan A. Zacks, All rights reserved.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
REVISION
$Id: PDF.pm 2 2005-09-22 06:57:17Z zackse $