In your application you may need to convert HTML to PDF on the fly. In this tutorial we’ll see how to convert HTML to PDF in Java using Flying Saucer and OpenPDF.
For another option to convert HTML to PDF, see this post- Convert HTML to PDF in Java + Openhtmltopdf and PDFBox
How does it work
Let’s first understand which library is used for what purpose-
- Flying Saucer is an XML/CSS renderer, which means it takes XML files as input, applies formatting and styling using CSS, and generates a rendered representation of that XML as output. As an input you can pass an XHTML file which is an XML document format that standardizes HTML.
- jsoup library is used for parsing HTML using the best of HTML5 DOM methods and CSS selectors. That gives you a well formed HTML that can be passed to the Flying Saucer.
- Flying Saucer renders the input XHTML, which still needs to be converted to PDF; OpenPDF is used for that. OpenPDF is a free Java library for creating and editing PDF files, available under LGPL and MPL open source licenses. OpenPDF is based on a fork of iText.
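To see why the jsoup step matters, here is a minimal sketch that uses only the JDK's XML parser (not Flying Saucer itself) to show that ordinary HTML with an unclosed img tag is rejected as XML, while the same markup with the tag closed is accepted. The class and method names (WellFormedCheck, isWellFormed) are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only
class WellFormedCheck {

  // Returns true if the markup can be parsed as an XML document
  static boolean isWellFormed(String markup) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // Replace the default error handler so failures are reported only via the return value
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(new InputSource(new StringReader(markup)));
      return true;
    } catch (SAXException e) {
      // Thrown when the markup is not well-formed XML
      return false;
    } catch (ParserConfigurationException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String html = "<html><body><img src=\"a.png\"></body></html>";
    String xhtml = "<html><body><img src=\"a.png\" /></body></html>";
    System.out.println("HTML well formed:  " + isWellFormed(html));
    System.out.println("XHTML well formed: " + isWellFormed(xhtml));
  }
}
```

The first string is valid HTML but not valid XML, which is the gap that jsoup's XML output syntax closes before the document is handed to Flying Saucer.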
Maven Dependencies
To get the above-mentioned libraries, add the following dependencies to your pom.xml. Apache Commons IO is included because the example uses its IOUtils class to read the image.
<dependency>
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.13.1</version>
</dependency>
<dependency>
  <groupId>org.xhtmlrenderer</groupId>
  <artifactId>flying-saucer-pdf-openpdf</artifactId>
  <version>9.1.20</version>
</dependency>
<dependency>
  <groupId>commons-io</groupId>
  <artifactId>commons-io</artifactId>
  <version>2.6</version>
</dependency>
Convert HTML to PDF Java example
In this Java program to convert HTML to PDF using Flying Saucer and OpenPDF we’ll try to cover most of the scenarios that you may encounter, i.e. an image in the HTML, external and inline styling, and an external font.
The following is the HTML we’ll convert to PDF.
Test.html
<html lang="en">
  <head>
    <title>HTML File</title>
    <style type="text/css">
      body{background-color: #F5F5F5;}
    </style>
    <link href="../css/style.css" rel="stylesheet" >
  </head>
  <body>
    <h1>HTML to PDF Java Example</h1>
    <p>Exception Propagation image</p>
    <img src="../images/Exception Propagation.png" width="300" height="220">
    <p style="color:#F80000; font-size:20px">This text is styled using Inline CSS</p>
    <p class="fontclass">This text uses the styling from font face font</p>
    <p class="styleclass">This text is styled using external CSS class</p>
  </body>
</html>
External CSS used (style.css)
@font-face {
  font-family: myFont;
  src: url("../fonts/PRISTINA.TTF");
}
.fontclass{
  font-family: myFont;
  font-size:20px;
}
.styleclass{
  font-family: "Times New Roman", Times, serif;
  font-size:30px;
  font-weight: normal;
  color: #6600CC;
}
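As the paths used in the HTML, the CSS and the Java code indicate, the resources are kept in sibling folders under src/main/resources: the HTML file in template, the stylesheet in css, the image in images and the font in fonts.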
This is how the HTML looks in a browser-
Now we’ll see how to convert this HTML to PDF. To get the image rendered properly in the PDF, a custom implementation of ReplacedElementFactory is used. It reads the image into a byte array and uses that to create an instance of ITextImageElement, which is rendered to the PDF.
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.io.IOUtils;
import org.w3c.dom.Element;
import org.xhtmlrenderer.extend.FSImage;
import org.xhtmlrenderer.extend.ReplacedElement;
import org.xhtmlrenderer.extend.ReplacedElementFactory;
import org.xhtmlrenderer.extend.UserAgentCallback;
import org.xhtmlrenderer.layout.LayoutContext;
import org.xhtmlrenderer.pdf.ITextFSImage;
import org.xhtmlrenderer.pdf.ITextImageElement;
import org.xhtmlrenderer.render.BlockBox;
import org.xhtmlrenderer.simple.extend.FormSubmissionListener;

import com.lowagie.text.BadElementException;
import com.lowagie.text.Image;

public class ReplacedElementFactoryImpl implements ReplacedElementFactory {

  @Override
  public ReplacedElement createReplacedElement(LayoutContext c, BlockBox box,
      UserAgentCallback uac, int cssWidth, int cssHeight) {
    Element e = box.getElement();
    if (e == null) {
      return null;
    }
    // Look for img tag in the HTML
    if ("img".equals(e.getNodeName())) {
      String imagePath = e.getAttribute("src");
      FSImage fsImage;
      try {
        fsImage = getImageInstance(imagePath);
      } catch (BadElementException | IOException ex) {
        // Image could not be loaded, nothing is rendered for it
        fsImage = null;
      }
      if (fsImage != null) {
        if (cssWidth != -1 || cssHeight != -1) {
          // Use the dimensions given in the HTML/CSS
          fsImage.scale(cssWidth, cssHeight);
        } else {
          // No dimensions given, fall back to a default size
          fsImage.scale(250, 150);
        }
        return new ITextImageElement(fsImage);
      }
    }
    return null;
  }

  private FSImage getImageInstance(String imagePath) throws IOException, BadElementException {
    // Removing "../" from image path like "../images/ExceptionPropagation.png"
    String resourcePath = imagePath.substring(imagePath.indexOf("/") + 1);
    // Read the image from the classpath as a stream, which also works
    // when the file name contains spaces, and close the stream afterwards
    try (InputStream input = getClass().getClassLoader().getResourceAsStream(resourcePath)) {
      if (input == null) {
        throw new FileNotFoundException("Image not found on classpath: " + resourcePath);
      }
      final byte[] bytes = IOUtils.toByteArray(input);
      final Image image = Image.getInstance(bytes);
      return new ITextFSImage(image);
    }
  }

  @Override
  public void reset() {
    // Nothing to reset
  }

  @Override
  public void remove(Element e) {
    // Nothing to remove
  }

  @Override
  public void setFormSubmissionListener(FormSubmissionListener listener) {
    // Forms are not used in this example
  }
}
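One detail worth checking when loading resources: the image in this example has a space in its file name, and URL.getFile() returns the path in percent-encoded form, so opening that string directly as a file can fail for such names. The sketch below uses only JDK classes to show the difference between getFile() and converting the URL through a URI. The class and method names are hypothetical, and the URL is built from a relative path to stand in for the kind of file: URL a class loader typically returns.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class, for illustration only
class ResourcePathDemo {

  // Builds a file: URL for the given path (spaces are encoded as %20)
  static URL toUrl(String path) {
    try {
      return new File(path).toURI().toURL();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(e);
    }
  }

  // Converts a file: URL back to a file system path, decoding escapes such as %20
  static Path toPath(URL url) {
    try {
      return Paths.get(url.toURI());
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    URL url = toUrl("images/Exception Propagation.png");
    // Still contains %20, so it does not name the real file
    System.out.println("getFile(): " + url.getFile());
    // Decoded path that can be opened as a file
    System.out.println("decoded:   " + toPath(url));
  }
}
```

Reading the resource with getResourceAsStream() avoids the question altogether, since no file path is built from the URL.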
Finally, here is the Java program that converts the HTML to PDF.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileSystems;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.xhtmlrenderer.layout.SharedContext;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class HtmlToPdf {

  public static void main(String[] args) {
    try {
      // HTML file - Input
      File inputHTML = new File(HtmlToPdf.class.getClassLoader().getResource("template/Test.html").getFile());
      // Converted PDF file - Output
      File outputPdf = new File("F:\\NETJS\\Test.pdf");
      HtmlToPdf htmlToPdf = new HtmlToPdf();
      // create well formed HTML
      String xhtml = htmlToPdf.createWellFormedHtml(inputHTML);
      System.out.println("Starting conversion to PDF...");
      htmlToPdf.xhtmlToPdf(xhtml, outputPdf);
    } catch (IOException e) {
      e.printStackTrace();
    }
  }

  private String createWellFormedHtml(File inputHTML) throws IOException {
    Document document = Jsoup.parse(inputHTML, "UTF-8");
    // Output as XML so that the result is well formed XHTML
    document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    System.out.println("HTML parsing done...");
    return document.html();
  }

  private void xhtmlToPdf(String xhtml, File outputPdf) throws IOException {
    ITextRenderer renderer = new ITextRenderer();
    SharedContext sharedContext = renderer.getSharedContext();
    sharedContext.setPrint(true);
    sharedContext.setInteractive(false);
    // Register custom ReplacedElementFactory implementation
    sharedContext.setReplacedElementFactory(new ReplacedElementFactoryImpl());
    sharedContext.getTextRenderer().setSmoothingThreshold(0);
    // Register additional font
    renderer.getFontResolver().addFont(getClass().getClassLoader().getResource("fonts/PRISTINA.ttf").toString(), true);
    // Setting base URL to resolve the relative URLs
    String baseUrl = FileSystems.getDefault()
        .getPath("F:\\", "Anshu\\NetJs\\Programs\\", "src\\main\\resources\\css")
        .toUri()
        .toURL()
        .toString();
    renderer.setDocumentFromString(xhtml, baseUrl);
    renderer.layout();
    // The stream is closed automatically, even if PDF creation fails
    try (OutputStream outputStream = new FileOutputStream(outputPdf)) {
      renderer.createPDF(outputStream);
      System.out.println("PDF creation completed");
    }
  }
}
You need to register additional fonts used in your document so they may be included with the PDF.
renderer.getFontResolver().addFont(getClass().getClassLoader().getResource("fonts/PRISTINA.ttf").toString(), true);
The rendering library may not be able to resolve relative paths on its own, so you need to pass extra information; that’s what baseUrl does.
String baseUrl = FileSystems.getDefault()
    .getPath("F:\\", "Anshu\\NetJs\\Programs\\", "src\\main\\resources\\css")
    .toUri()
    .toURL()
    .toString();
By looking at the messages in the console you can see how relative paths are resolved.
org.xhtmlrenderer.load INFO:: ../css/style.css is not a URL; may be relative. Testing using parent URL file:/F:/Anshu/NetJs/Programs/src/main/resources/css/
org.xhtmlrenderer.load INFO:: TIME: parse stylesheets 383ms
org.xhtmlrenderer.match INFO:: media = print
org.xhtmlrenderer.load INFO:: Requesting stylesheet: file:/F:/Anshu/NetJs/Programs/src/main/resources/css/style.css
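The resolution seen in the log follows the usual rules for relative URLs. As a rough illustration using only java.net.URI (this is not Flying Saucer's own resolver), the sketch below resolves the relative references from the HTML and CSS against the base URL from the log. The class and method names are made up for this example.

```java
import java.net.URI;

// Hypothetical demo class, for illustration only
class BaseUrlDemo {

  // Resolves a relative reference against a base URL
  static String resolve(String baseUrl, String relative) {
    return URI.create(baseUrl).resolve(relative).toString();
  }

  public static void main(String[] args) {
    String baseUrl = "file:/F:/Anshu/NetJs/Programs/src/main/resources/css/";

    // prints file:/F:/Anshu/NetJs/Programs/src/main/resources/css/style.css
    System.out.println(resolve(baseUrl, "../css/style.css"));

    // prints file:/F:/Anshu/NetJs/Programs/src/main/resources/fonts/PRISTINA.TTF
    System.out.println(resolve(baseUrl, "../fonts/PRISTINA.TTF"));

    // Without the trailing slash the last segment is treated as a file name,
    // so "../" climbs one level too far:
    // prints file:/F:/Anshu/NetJs/Programs/src/main/css/style.css
    System.out.println(resolve("file:/F:/Anshu/NetJs/Programs/src/main/resources/css", "../css/style.css"));
  }
}
```

The last call shows why the trailing slash on the base URL matters: the base has to denote a folder for "../" to land in src/main/resources.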
Here is the generated PDF from the HTML passed as input.
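If you want a quick programmatic check that the output really is a PDF, one simple option is to look at the first bytes of the file, since PDF files begin with the marker %PDF-. This is only a sanity check and not a validation of the document. The class and method names are placeholders, and main simply reuses the output path from the program above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only
class PdfHeaderCheck {

  private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

  // Returns true if the given bytes start with the PDF header marker
  static boolean hasPdfHeader(byte[] data) {
    if (data == null || data.length < MAGIC.length) {
      return false;
    }
    for (int i = 0; i < MAGIC.length; i++) {
      if (data[i] != MAGIC[i]) {
        return false;
      }
    }
    return true;
  }

  // Reads only the first few bytes of the file and checks them
  static boolean looksLikePdf(Path file) throws IOException {
    byte[] header = new byte[MAGIC.length];
    int total = 0;
    try (InputStream in = Files.newInputStream(file)) {
      while (total < header.length) {
        int n = in.read(header, total, header.length - total);
        if (n < 0) {
          break; // file is shorter than the marker
        }
        total += n;
      }
    }
    return total == header.length && hasPdfHeader(header);
  }

  public static void main(String[] args) throws IOException {
    // Output path used in the conversion program
    Path pdf = Paths.get("F:\\NETJS\\Test.pdf");
    System.out.println("Looks like a PDF: " + looksLikePdf(pdf));
  }
}
```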
That's all for this topic Convert HTML to PDF in Java + Flying Saucer and OpenPDF. If you have any doubt or any suggestions to make please drop a comment. Thanks!