Persits Software, Inc. Web Site
  Chapter 15: HTML to PDF Conversion
15.1 ImportFromUrl Method
15.2 Authentication
15.3 Error Log
15.4 Page Breaks
15.5 Direct HTML Feed
15.6 CSS Media Selection
15.7 Obtaining Y-Boundary

15.1 ImportFromUrl Method

AspPDF.NET is capable of converting HTML documents to PDF via PdfDocument's ImportFromUrl method. This method opens an HTML document from a given URL, splits it into pages and renders it onto an empty or existing PDF document. The document can then be further edited, if necessary, and saved to disk, memory or an HTTP stream as usual.

ImportFromUrl's support for various HTML tags and constructs is not quite as extensive as that of major browsers, but still considerably stronger than the limited HTML functionality of Canvas.DrawText. ImportFromUrl recognizes tables, images, lists, cascading style sheets, etc.

ImportFromUrl accepts four parameters, all but the first one optional: the input URL, a parameter list, and a username/password pair.

The URL parameter can be an HTTP or HTTPS address, such as http://www.server.com/path/file.html, or a local physical path such as c:\path\file.html. Note that if you want to open a dynamically generated document such as an .asp or .aspx file, you need to invoke it via HTTP even if the file is local to your own script.

You can also specify an HTML string directly via the URL parameter. This is described in Section 15.5 of this chapter.

The following simple code snippet creates a PDF document out of the Persits Software site persits.com:

C#
PdfManager objPdf = new PdfManager();
PdfDocument objDoc = objPdf.CreateDocument();
objDoc.ImportFromUrl( "http://www.persits.com" );

string strFilename = objDoc.Save( Server.MapPath("importfromurl.pdf"), false );
VB.NET
Dim objPdf As PdfManager = new PdfManager()
Dim objDoc As PdfDocument = objPdf.CreateDocument()
objDoc.ImportFromUrl( "http://www.persits.com" )

Dim strFilename As string = objDoc.Save( Server.MapPath("importfromurl.pdf"), false )

Click on the links below to run this code sample:

http://localhost/asppdf.net/manual_15/15_importfromurl.cs.aspx
http://localhost/asppdf.net/manual_15/15_importfromurl.vb.aspx

Make sure AspPDF.NET is installed on your local machine and IIS is running for these links to work.

The second argument of the ImportFromUrl method is a PdfParam object or parameter string specifying additional parameters that control the HTML to PDF conversion process. For example, to create a document in landscape orientation, set the Landscape parameter to true:

objDoc.ImportFromUrl( "http://www.persits.com", "landscape=true" );

When new pages have to be added to the document during the conversion process, the default page size is U.S. Letter. This can be changed via the PageWidth and PageHeight parameters.

When rendering HTML content on a page, AspPDF.NET leaves 0.75" margins around the content area. That can be changed via the LeftMargin, RightMargin, TopMargin and BottomMargin parameters.
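As a rough sketch of how several of these parameters might be combined, the helper below assembles a parameter string from name/value pairs. BuildParamString is a hypothetical helper and not part of AspPDF.NET. It assumes that multiple parameters are separated by semicolons and that sizes are expressed in points (1/72 inch), so that a 0.75" margin corresponds to 54. Both assumptions should be checked against the parameter reference.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ImportParams
{
    // Hypothetical helper: joins name/value pairs into "name=value; name=value".
    // The "; " separator is an assumption; this chapter only shows
    // single-parameter strings such as "landscape=true".
    public static string BuildParamString(IEnumerable<KeyValuePair<string, object>> items)
    {
        var parts = new List<string>();
        foreach (var item in items)
        {
            // Booleans are written in lowercase to match "landscape=true".
            // The invariant culture keeps '.' as the decimal point in any locale.
            string value = item.Value is bool
                ? ((bool)item.Value ? "true" : "false")
                : Convert.ToString(item.Value, CultureInfo.InvariantCulture);
            parts.Add(item.Key + "=" + value);
        }
        return string.Join("; ", parts);
    }
}
```

Under these assumptions, passing the pairs PageWidth=595, PageHeight=842 (roughly A4 in points) and LeftMargin=36 produces the string "PageWidth=595; PageHeight=842; LeftMargin=36", which could then be supplied as the second argument of ImportFromUrl.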

The full list of ImportFromUrl parameters can be found here.

IMPORTANT: Avoid calling ImportFromUrl on a URL located in the same virtual directory as the script that makes the call. According to Microsoft KB article Q316451, "this can result in poor performance due to thread starvation," and may produce the error "MSXML2::ServerXMLHTTP Error: The request has timed out."

15.2 Authentication

15.2.1 Basic Authentication

The 3rd and 4th arguments of an overloaded version of ImportFromUrl are a username and password that can be used if the URL being opened is protected via Basic Authentication, as follows:

objDoc.ImportFromUrl( "http://www.server.com/script.asp", "landscape=true", "jsmith", "pwd" );

15.2.2 .NET Forms Authentication

Under .NET, the Username and Password arguments can instead be used to pass an authentication cookie in case both the script calling ImportFromUrl and a file being converted to PDF are protected by the same user account under .NET Forms authentication. To pass a cookie to ImportFromUrl, the cookie name prepended with the prefix "Cookie:" is passed via the Username argument, and the cookie value via the Password argument. The following example illustrates this technique.

Suppose you need to implement a "Click here for a PDF version of this page" feature in a .NET-based web application. The application is protected with .NET Forms Authentication:

<authentication mode="Forms">
  <forms name="MyAuthForm" loginUrl="login.aspx" protection="All">
    <credentials passwordFormat = "SHA1">
      <user name="JSmith" password="13A23E365BFDBA30F788956BC2B8083ADB746CA3"/>
      
... other users
    </credentials>
  </forms>
</authentication>

The page that needs to be converted to PDF, say report.aspx, contains the button "Download PDF version of this report" that invokes another script, say convert.aspx, which calls AspPDF.NET's ImportFromUrl. Both scripts reside in the same directory under the same protection.

If convert.aspx simply calls objDoc.ImportFromUrl( "http://localhost/dir/report.aspx", ... ), the page that ends up being converted will be login.aspx and not report.aspx, because AspPDF.NET itself has not been authenticated against the user database and naturally will be forwarded to the login screen.

To solve this problem, we just need to pass the authentication cookie whose name is MyAuthForm (the same as the form name) to ImportFromUrl. The following code (placed in convert.aspx) does the job:

C#
void Page_Load(object sender, System.EventArgs e)
{
	PdfManager objPDF = new PdfManager();

	string strCookieName = "", strCookieValue = "";

	// Search for our authentication cookie
	for( int i = 0; i < Request.Cookies.Count; i++ )
	{
		if( Request.Cookies[i].Name == "MyAuthForm" )
		{
			strCookieName = Request.Cookies[i].Name;
			strCookieValue = Request.Cookies[i].Value;
			break;
		}
	}

	PdfDocument objDoc = objPDF.CreateDocument();
	objDoc.ImportFromUrl( "http://localhost/dir/report.aspx", null, 
		"Cookie:" + strCookieName, strCookieValue );

	objDoc.SaveHttp( "attachment;filename=report.pdf" );
}
VB.NET
Sub Page_Load(Sender As Object, E As System.EventArgs)

	Dim objPDF As PdfManager = new PdfManager()

	Dim strCookieName As String = "", strCookieValue As String = ""

	' Search for our authentication cookie
	For i As Integer = 0 to Request.Cookies.Count - 1	
		If Request.Cookies(i).Name = "MyAuthForm" Then		
			strCookieName = Request.Cookies(i).Name
			strCookieValue = Request.Cookies(i).Value
			Exit For
		End If
	Next

	Dim objDoc As PdfDocument = objPDF.CreateDocument()
	objDoc.ImportFromUrl( "http://localhost/dir/report.aspx", Nothing, _
		"Cookie:" + strCookieName, strCookieValue )

	objDoc.SaveHttp( "attachment;filename=report.pdf" )
End Sub

Note that the cookie name is prepended with the prefix "Cookie:" before being passed to ImportFromUrl.

15.3 Error Log

ImportFromUrl throws an exception if the specified URL cannot be found or is invalid, in which case no HTML to PDF conversion takes place. However, if the main URL is valid but some of the dependent resources (fonts, image URLs, CSS files, etc.) cannot be found, the conversion continues uninterrupted, although the resulting PDF document may not look as expected.

To simplify debugging, ImportFromUrl can be used in a debug mode. If the parameter Debug=true is used, ImportFromUrl returns a log of non-fatal errors encountered during the conversion process. A log entry consists of the entry type, such as "Image", "CSS", etc., error message, and relevant data, such as the invalid URL, unknown font name, etc. Log entries are separated by two pairs of CR/LF characters.

The following code snippet invokes ImportFromUrl in the debug mode and displays the error log:

string strLog = objDoc.ImportFromUrl( "http://www.server.com/script.asp", "debug=true" );
Response.Write( strLog );

A typical log string may look as follows:

Image: Error opening URL. HTTP Status Code: 404
Data: http://www.persits.com/image.gif

Font: Font name cannot be found.
Data: Arrial
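To act on the log programmatically rather than just display it, the string can be split into entries. The following is a minimal sketch based only on the format shown above; ImportLogEntry and ImportLog.Parse are hypothetical names, and the parser assumes that the two lines within an entry are separated by a single CR/LF pair and that the second line starts with "Data: ".

```csharp
using System;
using System.Collections.Generic;

public class ImportLogEntry
{
    public string Type;     // entry type, e.g. "Image", "CSS", "Font"
    public string Message;  // error message
    public string Data;     // relevant data: invalid URL, unknown font name, etc.
}

public static class ImportLog
{
    // Hypothetical parser for the debug log returned by ImportFromUrl.
    public static List<ImportLogEntry> Parse(string log)
    {
        var entries = new List<ImportLogEntry>();
        if (string.IsNullOrEmpty(log)) return entries;

        // Log entries are separated by two CR/LF pairs.
        string[] blocks = log.Split(new[] { "\r\n\r\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string block in blocks)
        {
            // Assumption: lines inside one entry are separated by a single CR/LF.
            string[] lines = block.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
            if (lines.Length == 0) continue;

            // Split the first line on the first ": " only, since the message
            // itself may contain colons ("HTTP Status Code: 404").
            int colon = lines[0].IndexOf(": ", StringComparison.Ordinal);
            var entry = new ImportLogEntry();
            entry.Type = colon >= 0 ? lines[0].Substring(0, colon) : "";
            entry.Message = colon >= 0 ? lines[0].Substring(colon + 2) : lines[0];
            entry.Data = "";

            for (int i = 1; i < lines.Length; i++)
            {
                if (lines[i].StartsWith("Data: ", StringComparison.Ordinal))
                    entry.Data = lines[i].Substring("Data: ".Length);
            }
            entries.Add(entry);
        }
        return entries;
    }
}
```

A caller could, for instance, pass the string returned by ImportFromUrl in debug mode to ImportLog.Parse and report only the entries whose Type is "Font".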

15.4 Page Breaks
HTML allows page breaks for printing purposes via the CSS properties page-break-before and page-break-after. The ImportFromUrl method recognizes these properties for the purpose of page breaking in a limited set of HTML tags. The value of these two properties must be set to "always"; other values have no effect. As with any CSS property, either inline syntax or a separate style sheet can be used. For example:

<BR style="page-break-before: always">

The property page-break-before: always can be applied to the following tags:

<BR>
<IMG>
<HR>
<TABLE>
<DIV>

The property page-break-after: always can be applied to the following tags:

<BR>
<IMG>
<HR>
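As an illustration of how these properties might be used when HTML is generated in code, the sketch below wraps a list of HTML fragments in <DIV> tags and starts every fragment after the first on a new page. PagedHtml.Build is a hypothetical helper, not an AspPDF.NET API; it relies only on the fact stated above that <DIV> accepts page-break-before: always.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PagedHtml
{
    // Hypothetical helper: wraps each fragment in a <DIV> and forces
    // every fragment after the first onto a new page.
    public static string Build(IList<string> sections)
    {
        var sb = new StringBuilder("<HTML><BODY>");
        for (int i = 0; i < sections.Count; i++)
        {
            // No break before the first section, to avoid a leading blank page.
            sb.Append(i == 0 ? "<DIV>" : "<DIV style=\"page-break-before: always\">");
            sb.Append(sections[i]);
            sb.Append("</DIV>");
        }
        sb.Append("</BODY></HTML>");
        return sb.ToString();
    }
}
```

Because the resulting string contains the sub-string <HTML>, it could be passed to ImportFromUrl as a direct HTML feed (see Section 15.5).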

15.5 Direct HTML Feed
The ImportFromUrl method allows you to specify an HTML string directly via the first parameter (URL). The string must contain the sub-string <HTML> or <html> to be recognized as a direct HTML feed and not a URL. For example:

string str = "<HTML><TABLE><TR><TD>Text1</TD><TD>Text2</TD></TR></TABLE></HTML>";
objDoc.ImportFromUrl( str );

If an HTML string includes references to images or other external objects, you must use fully qualified URLs or full physical paths for these objects. Relative URLs will not be recognized since there is no "base" URL against which to resolve them:

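// Correct: fully qualified HTTP URL
string str1 = "<HTML><IMG SRC=\"http://localhost/images/logo.jpg\"></HTML>";

// Correct: full physical path (backslashes must be escaped in a C# string)
string str2 = "<HTML><IMG SRC=\"c:\\path\\logo.jpg\"></HTML>";

// Incorrect: relative URL, there is no base URL to resolve it against
string str3 = "<HTML><IMG SRC=\"images/logo.jpg\"></HTML>";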
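When the HTML comes from a template that uses relative image paths, one possible workaround is to rewrite those paths before feeding the string to ImportFromUrl. The sketch below is a simple regex-based approach, not a full HTML parser: HtmlUrlFixer.QualifySrc is a hypothetical helper, it only handles SRC attributes enclosed in double quotes, and the base URL passed to it is whatever address the relative paths are meant to resolve against.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlUrlFixer
{
    // Hypothetical helper: rewrites relative SRC attribute values
    // as absolute URLs resolved against baseUrl.
    public static string QualifySrc(string html, string baseUrl)
    {
        var baseUri = new Uri(baseUrl);

        // Matches src="..." (double quotes only), case-insensitively.
        return Regex.Replace(html, "(src\\s*=\\s*\")([^\"]+)(\")", m =>
        {
            string value = m.Groups[2].Value;

            // Leave values that already begin with a scheme or a drive
            // letter (for example http: or c:) untouched.
            if (Regex.IsMatch(value, "^[A-Za-z][A-Za-z0-9+.-]*:"))
                return m.Value;

            // Resolve the relative reference against the base URL.
            string absolute = new Uri(baseUri, value).AbsoluteUri;
            return m.Groups[1].Value + absolute + m.Groups[3].Value;
        }, RegexOptions.IgnoreCase);
    }
}
```

With a base URL of http://localhost/app/, the value images/logo.jpg becomes http://localhost/app/images/logo.jpg, while values that are already fully qualified pass through unchanged.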

15.6 CSS Media Selection
The ImportFromUrl method can be configured to choose which cascading style sheets to read and which to ignore depending on the MEDIA attribute of the <STYLE> and <LINK> tags.

ImportFromUrl recognizes the following values for the MEDIA attribute:

"ALL"
"SCREEN"
"PRINT"
"ASPPDF"

The first three are part of the CSS specs, and the last one is a special value which enables you to create a style sheet specifically for AspPDF.

Using the Media parameter, you can specify a combination (sum) of the following values:

Value   Meaning
1       MEDIA="ALL" (or no MEDIA attribute)
2       MEDIA="SCREEN"
4       MEDIA="PRINT"
8       MEDIA="ASPPDF"
128     MEDIA="<all others>"

For example, the following call makes ImportFromUrl read only the style sheets with the MEDIA attribute set to "ALL" and "ASPPDF" (and also those without a MEDIA attribute):

objDoc.ImportFromUrl( "http://www.someurl.com", "media=9" );

By default, the Media parameter is set to 255 which means ImportFromUrl ignores the MEDIA attribute altogether and loads all the style sheets it encounters.
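Since the Media value is a sum of distinct powers of two, it can be modeled as a flags enumeration so that combinations are spelled out by name instead of as bare numbers. The CssMedia enum and MediaParam.Build below are hypothetical conveniences, not part of AspPDF.NET; only the numeric values are taken from the table above.

```csharp
using System;

[Flags]
public enum CssMedia
{
    All    = 1,    // MEDIA="ALL" or no MEDIA attribute
    Screen = 2,    // MEDIA="SCREEN"
    Print  = 4,    // MEDIA="PRINT"
    AspPdf = 8,    // MEDIA="ASPPDF"
    Others = 128   // any other MEDIA value
}

public static class MediaParam
{
    // Hypothetical helper: formats a combination of flags as a
    // parameter string such as "media=9".
    public static string Build(CssMedia media)
    {
        return "media=" + (int)media;
    }
}
```

For example, MediaParam.Build( CssMedia.All | CssMedia.AspPdf ) returns "media=9", the same value used in the call above, and CssMedia.All | CssMedia.Print gives "media=5".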

15.7 Obtaining Y-Boundary
AspPDF.NET enables you to retrieve the (estimated) Y-coordinate of the lowest boundary of the HTML content rendered by the last successful call to ImportFromUrl, and also the index of the page within the document where the rendering ends.

This information is obtained via the PdfDocument property ImportInfo, which returns an instance of the PdfParam object populated with two items, "Y" and "Page", which correspond to the Y-coordinate and page index, respectively.

The following snippet performs some HTML-to-PDF conversion and then draws a horizontal line right underneath the HTML content on the page where the rendering ends:

objDoc.ImportFromUrl( "http://www.server.com/script.asp" );
PdfParam objParam = objDoc.ImportInfo;

int nIndex = objParam["Page"];
float fY = objParam["Y"];

PdfPage objPage = objDoc.Pages[nIndex];
objPage.Canvas.DrawLine( 0, fY, objPage.Width, fY );

  This site is owned and maintained by Persits Software, Inc. Copyright © 2003 - 2010. All Rights Reserved.