In this tutorial we’ll see how to convert HTML to PDF in Java using Openhtmltopdf and PDFBox.
For another option to convert HTML to PDF, see this post- HTML to PDF in Java + Flying Saucer and OpenPDF
How does it work
Let’s first understand what the libraries mentioned here do-
- Open HTML to PDF is a pure-Java library for rendering arbitrary well-formed XML/XHTML (and even HTML5) using CSS 2.1 for layout and formatting, outputting to PDF or images.
- The jsoup library is used for parsing HTML using the best of HTML5 DOM methods and CSS selectors. It gives you well-formed HTML (XHTML) that can be passed to Openhtmltopdf.
- Openhtmltopdf uses the open-source PDFBox as its PDF library, which generates the PDF document from the rendered representation of the XHTML produced by Openhtmltopdf.
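To see why the jsoup step matters, here is a minimal sketch that uses only the JDK's XML parser to test whether a piece of markup is well-formed. The class WellFormedCheck and its method are hypothetical names used for illustration; they are not part of jsoup or Openhtmltopdf. An HTML5 style img tag that is never closed, like the one in the HTML used later in this post, fails the check, while its XHTML form passes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
final class WellFormedCheck {

    // Returns true if the markup parses as XML, i.e. it is well-formed.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            parser.setErrorHandler(new DefaultHandler());
            parser.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // the parser rejected the markup
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML5 style: the img element is never closed
        System.out.println(isWellFormed("<p>String Pool image <img src=\"Stringpool.png\"></p>")); // false
        // XHTML style: the img element is self-closed
        System.out.println(isWellFormed("<p>String Pool image <img src=\"Stringpool.png\"/></p>")); // true
    }
}
```

Parsing the HTML with jsoup and converting it to a W3C document takes care of this kind of clean-up, so the renderer always receives well-formed input.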
Maven Dependencies
To get the above-mentioned libraries you need to add the following dependencies to your pom.xml
<dependency>
  <groupId>com.openhtmltopdf</groupId>
  <artifactId>openhtmltopdf-core</artifactId>
  <version>1.0.6</version>
</dependency>
<!--supports PDF output with Apache PDF-BOX -->
<dependency>
  <groupId>com.openhtmltopdf</groupId>
  <artifactId>openhtmltopdf-pdfbox</artifactId>
  <version>1.0.6</version>
</dependency>
<dependency>
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.13.1</version>
</dependency>
Convert HTML to PDF Java example
In this Java program to convert HTML to PDF using Openhtmltopdf and PDFBox we’ll try to cover most of the scenarios that you may encounter, i.e. an image in the HTML, external and inline styling, and an external font.
Following is the HTML we’ll convert to PDF. As you can see, it uses an external CSS file, has an image, and uses inline styling too.
Test.html
<html lang="en">
<head>
  <title>HTML File</title>
  <style type="text/css">
    body{background-color: #F5F5F5;}
  </style>
  <link href="../css/style.css" rel="stylesheet" >
</head>
<body>
  <h1>HTML to PDF Java Example</h1>
  <p>String Pool image</p>
  <img src="../images/Stringpool.png" width="300" height="220">
  <p style="color:#F80000; font-size:20px">This text is styled using Inline CSS</p>
  <p class="fontclass">This text uses the styling from font face font</p>
  <p class="styleclass">This text is styled using external CSS class</p>
</body>
</html>
External CSS used (style.css)
@font-face {
  font-family: myFont;
  src: url("../fonts/PRISTINA.TTF");
}
.fontclass{
  font-family: myFont;
  font-size:20px;
}
.styleclass{
  font-family: "Times New Roman", Times, serif;
  font-size:30px;
  font-weight: normal;
  color: #6600CC;
}
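Directory structure for it, going by the relative paths used in the HTML, CSS and Java code, is as given below-
src/main/resources
  css/style.css
  fonts/PRISTINA.TTF
  images/Stringpool.png
  template/Test.html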
This is how the HTML looks in a browser-
Now we’ll write a Java program to convert this HTML to PDF.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileSystems;

import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.jsoup.nodes.Document;

import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

public class HtmlToPdfExample {
  public static void main(String[] args) {
    try {
      // HTML file - Input
      File inputHTML = new File(HtmlToPdfExample.class.getClassLoader().getResource("template/Test.html").getFile());
      // Converted PDF file - Output
      String outputPdf = "F:\\NETJS\\Test.pdf";
      HtmlToPdfExample htmlToPdf = new HtmlToPdfExample();
      // create well formed HTML
      org.w3c.dom.Document doc = htmlToPdf.createWellFormedHtml(inputHTML);
      System.out.println("Starting conversion to PDF...");
      htmlToPdf.xhtmlToPdf(doc, outputPdf);
    } catch (IOException e) {
      System.out.println("Error while converting HTML to PDF " + e.getMessage());
      e.printStackTrace();
    }
  }

  // Creating well formed document
  private org.w3c.dom.Document createWellFormedHtml(File inputHTML) throws IOException {
    Document document = Jsoup.parse(inputHTML, "UTF-8");
    document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    System.out.println("HTML parsing done...");
    return new W3CDom().fromJsoup(document);
  }

  private void xhtmlToPdf(org.w3c.dom.Document doc, String outputPdf) throws IOException {
    // base URI to resolve relative resources (CSS, images, fonts) referenced from the HTML
    String baseUri = FileSystems.getDefault()
        .getPath("F:/", "Anshu/NetJs/Programs/", "src/main/resources/template")
        .toUri()
        .toString();
    // try-with-resources closes the stream even if rendering fails
    try (OutputStream os = new FileOutputStream(outputPdf)) {
      PdfRendererBuilder builder = new PdfRendererBuilder();
      builder.toStream(os);
      // add external font
      builder.useFont(new File(getClass().getClassLoader().getResource("fonts/PRISTINA.ttf").getFile()), "PRISTINA");
      // the input is supplied as a W3C document, so no input URI is set on the builder
      builder.withW3cDocument(doc, baseUri);
      builder.run();
      System.out.println("PDF creation completed");
    }
  }
}
You need to register additional fonts used in your document so they may be included with the PDF.
builder.useFont(new File(getClass().getClassLoader().getResource("fonts/PRISTINA.ttf").getFile()), "PRISTINA");
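One caveat with the getResource(...).getFile() pattern: URL.getFile() returns the path in its URL-encoded form, so a directory name containing a space shows up as %20 and the resulting File may not point at the real file. The following is a minimal sketch of a more defensive conversion using only JDK classes. The class name ResourceFiles and the sample file name are hypothetical, and the approach assumes the resource is a plain file on disk (a resource packed inside a JAR has no file path and would need to be read as a stream instead).

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
final class ResourceFiles {

    // Converts a file: URL, such as one returned by ClassLoader.getResource,
    // to a File while decoding percent-escapes like %20.
    static File toFile(URL url) {
        try {
            return Paths.get(url.toURI()).toFile();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file name containing spaces; the file does not need to exist
        File original = new File(System.getProperty("java.io.tmpdir"), "my fonts demo.ttf");
        URL url = original.toURI().toURL();
        System.out.println(url.getFile());          // spaces appear as %20 in this form
        System.out.println(toFile(url).getName());  // my fonts demo.ttf
    }
}
```

With such a helper, the font file could be obtained as toFile(getClass().getClassLoader().getResource("fonts/PRISTINA.ttf")) before passing it to builder.useFont.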
You also need to configure the base URI so that relative paths to resources such as images and CSS files can be resolved.
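How a relative path such as ../css/style.css is resolved against a base URI can be sketched with java.net.URI alone. This is only an illustration of standard URI resolution, not the library's internal code; the class BaseUriDemo, its methods and the /project path are hypothetical. Note the effect of the trailing slash on the base URI.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper for illustration only.
final class BaseUriDemo {

    // Resolves a relative reference against a base URI using standard URI rules.
    static String resolve(String baseUri, String relativeRef) {
        return URI.create(baseUri).resolve(relativeRef).toString();
    }

    // Derives a base URI from the directory containing the HTML file,
    // making sure it ends with a slash so it is treated as a directory.
    static String baseUriOf(File htmlFile) {
        String uri = htmlFile.getAbsoluteFile().getParentFile().toURI().toString();
        return uri.endsWith("/") ? uri : uri + "/";
    }

    public static void main(String[] args) {
        // With a trailing slash "template" is a directory, so ".." leads to resources;
        // the result ends with /src/main/resources/css/style.css
        System.out.println(resolve("file:///project/src/main/resources/template/", "../css/style.css"));

        // Without the trailing slash "template" is treated as a file name, so ".." goes one
        // level higher; the result ends with /src/main/css/style.css
        System.out.println(resolve("file:///project/src/main/resources/template", "../css/style.css"));

        // Base URI computed from the location of the HTML file instead of a hard-coded path
        System.out.println(baseUriOf(new File("template", "Test.html")));
    }
}
```

Deriving the base URI from the location of the HTML file, as baseUriOf does, is one way to avoid hard-coding an absolute path in the program.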
Here is the generated PDF from the HTML passed as input.
That's all for this topic Convert HTML to PDF in Java + Openhtmltopdf and PDFBox. If you have any doubts or suggestions, please drop a comment. Thanks!
Hi, thanks for your effort. But I can't get the Bangla font to render properly. All the words break. Please help ASAP.
In the article there is a line-
You need to register additional fonts used in your document so they may be included with the PDF.
builder.useFont(new File(getClass().getClassLoader().getResource("fonts/PRISTINA.ttf").getFile()), "PRISTINA");
That's where you have to change the font as per your requirement. You will have to find out which font your requirement needs.
How can I configure the base URI for additional images and css?
If we have jquery script in html then how to convert that into text or styling and append it into pdf?