  Chapter 17: PDF to Image Conversion
17.1 Page.ToImage Method
17.2 CMYK-to-RGB Conversion
17.3 Error Log
17.4 Other ToImage Parameters
17.5 Image Extraction
17.6 Printing
17.7 Structured Text Extraction

17.1 Page.ToImage Method

Starting with Version 2.6, AspPDF.NET is capable of converting a PDF page to an image via the method PdfPage.ToImage.

The ToImage method returns an instance of the PdfPreview object which performs the PDF-to-image conversion and generates the page image. The image can then be saved to disk, memory or an HTTP stream via the methods Save, SaveToMemory and SaveHttp, respectively. The ToImage method accepts an optional PdfParam object or parameter string argument.

By default, PdfPreview generates images in PNG format. This versatile format is ideal for the task since it supports true color, uses artifact-free lossless compression, and is fully supported by all major browsers. PdfPreview can also save images in other formats supported by the .NET Framework, such as JPEG, BMP, GIF and TIFF. When saving to disk, the format is determined by the file extension (the last three characters of the path). When saving to memory or an HTTP stream, the format is specified via the optional ImageFormat argument.

The following code sample generates a PDF document from a URL, saves/reopens it, and converts page 1 of the document to an image. The saving/reopening step is necessary because the PdfPage.ToImage method can only be used on existing, not new, documents.

C#
void Page_Load(Object Source, EventArgs E)
{
   // Chapter 17: PdfPreview Sample


   PdfManager objPdf = new PdfManager();
   PdfDocument objDoc = objPdf.CreateDocument();
   objDoc.ImportFromUrl( "http://www.aspupload.com", "scale=0.6" );

   // Save and reopen: ToImage only works on existing (saved) documents, not new ones.
   PdfDocument objNewDoc = objPdf.OpenDocument(objDoc.SaveToMemory());

   // Create preview for Page 1 at 50% scale.
   PdfPage objPage = objNewDoc.Pages[1];
   PdfPreview objPreview = objPage.ToImage("ResolutionX=36; ResolutionY=36");

   objPreview.SaveHttp("filename=preview.png"); // PNG format by default
}
 
VB.NET
Sub Page_Load(Source As Object, E as EventArgs)

   ' Chapter 17: PdfPreview Sample

   Dim objPdf As PdfManager = new PdfManager()
   Dim objDoc As PdfDocument = objPdf.CreateDocument()
   objDoc.ImportFromUrl( "http://www.aspupload.com", "scale=0.6" )

   ' Save and reopen: ToImage only works on existing (saved) documents, not new ones.
   Dim objNewDoc As PdfDocument = objPdf.OpenDocument(objDoc.SaveToMemory())

   ' Create preview for Page 1 at 50% scale.
   Dim objPage As PdfPage = objNewDoc.Pages(1)
   Dim objPreview As PdfPreview = objPage.ToImage("ResolutionX=36; ResolutionY=36")

   objPreview.SaveHttp("filename=preview.png") ' PNG format by default
End Sub

Click on the links below to run this code sample:

http://localhost/asppdf.net/manual_17/17_pdftoimage.cs.aspx
http://localhost/asppdf.net/manual_17/17_pdftoimage.vb.aspx

Make sure AspPDF.NET is installed on your local machine and IIS is running for these links to work.

By default, the resultant image's width and height in pixels match the page's width and height in user units. For example, a standard US Letter page (8.5 x 11 inches, or 612 x 792 user units) becomes a 612 x 792 pixel image. For a page displayed in landscape orientation because of its /Rotate attribute (see Section 17.4), the image's width and height match the page's height and width, respectively.

To set the page image to a desired size, use the ResolutionX and ResolutionY parameters of the ToImage method. Both default to 72 (dpi). In the code sample above, both are set to 36, which makes each image dimension half of its default value. Setting these parameters to a value larger (smaller) than 72 makes the resultant image proportionally larger (smaller). ResolutionX and ResolutionY should normally be equal; otherwise, the image's aspect ratio is distorted.

To obtain the pixel dimensions of the resultant image, use the PdfPreview object's properties Width and Height.
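The sizing rule above is simple arithmetic: pixels = user units x resolution / 72. The following C# sketch is not part of AspPDF.NET; the PreviewSize class is hypothetical, and rounding to the nearest whole pixel is an assumption about how fractional sizes are handled.

```csharp
using System;

// Hypothetical helper (not part of AspPDF.NET): estimates the pixel size that
// ToImage produces for one page dimension, following the rule described above.
public static class PreviewSize
{
    // 72 user units = 1 inch, so 72 dpi maps one user unit to one pixel.
    public const double DefaultResolution = 72.0;

    // Assumption: fractional results are rounded to the nearest whole pixel.
    public static int ToPixels(double userUnits, double resolution)
    {
        return (int)Math.Round(userUnits * resolution / DefaultResolution);
    }
}
```

For a US Letter page, ToPixels(612, 36) and ToPixels(792, 36) give 306 x 396, half the default size; at the 600 dpi recommended for printing in Section 17.6, the same page becomes 5100 x 6600 pixels, which can be checked against the Width and Height properties.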

PDF-to-image functionality is demonstrated by Live Demo #15.

17.2 CMYK-to-RGB Conversion

Page images generated by the ToImage method are always in the RGB color space, but the original PDF being converted to an image may contain images and graphics in the CMYK color space, in which case they have to be converted to RGB.

To achieve reasonably good color reproduction, the ToImage method performs a series of complex non-linear color transformations based on ICC profiles, the standard color space definitions established by the International Color Consortium (ICC). Profile-based CMYK-to-RGB conversion is fairly slow. If your PDF document contains large high-resolution CMYK images and performance is of the essence, use the parameter FastCMYK=True, which invokes a simple linear formula for CMYK-to-RGB conversion and improves performance somewhat at the expense of color-reproduction quality.
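For comparison, this is what a "simple linear" conversion can look like. The C# sketch below uses one common naive formula (each RGB channel equals 1 minus its complementary ink plus black, clamped at 0). It is an illustration only and an assumption: it is not necessarily the formula AspPDF.NET uses with FastCMYK=True, and it ignores ICC profiles entirely, which is why shortcuts like this shift colors.

```csharp
using System;

// Illustration only: a naive linear CMYK-to-RGB formula with no ICC profiles.
// It is NOT necessarily the formula AspPDF.NET applies when FastCMYK=True.
public static class NaiveCmyk
{
    // c, m, y, k are in the range 0..1; the result is (R, G, B) in the range 0..255.
    public static (int R, int G, int B) ToRgb(double c, double m, double y, double k)
    {
        return (Channel(c, k), Channel(m, k), Channel(y, k));
    }

    // Each RGB channel is reduced by its complementary ink plus black, clamped at 0.
    private static int Channel(double ink, double k)
    {
        double value = 1.0 - Math.Min(1.0, ink + k);
        return (int)Math.Round(value * 255.0);
    }
}
```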

The following images demonstrate the effect of the FastCMYK parameter:

[Images: Original document | FastCMYK=False (default) | FastCMYK=True]

17.3 Error Log
The majority of PDF documents are self-sufficient since they embed all the fonts they use. Documents like that make it easy and straightforward for PDF viewers such as Acrobat Reader and PDF-to-image converters such as AspPDF to do their jobs.

Some PDF documents, however, only reference font names and do not embed the fonts themselves. While large applications such as Acrobat have the luxury of being deployed with a whole library of fonts, AspPDF contains only the 14 standard PDF fonts and must rely on the fonts installed on the system whenever a document does not embed a font it uses. As a result, not every document can be rendered properly.

To help diagnose such issues, the PdfPreview object provides the property Log, which returns a string of errors encountered during the PDF-to-image conversion process. Most of these errors are font-related. Error entries are separated by a pair of CRLFs. To enable error logging, the parameter Debug=True must be used:

PdfPreview objPreview = objPage.ToImage( "ResolutionX=20; ResolutionY=20; Debug=True" );
Response.Write( objPreview.Log );

A typical log string may look as follows:

Font 'Palatino-Roman' was replaced by 'Helvetica'.

Could not find external TrueType font 'ArialUnicodeMS'.
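Because entries are separated by a pair of CRLFs, the Log string can be split into individual messages. A minimal C# sketch; the PreviewLog class is hypothetical and works on any string, such as the value of objPreview.Log.

```csharp
using System;
using System.Linq;

// Hypothetical helper: splits the string returned by PdfPreview.Log into
// individual error entries, which are separated by a pair of CRLFs.
public static class PreviewLog
{
    private static readonly string[] Separator = { "\r\n\r\n" };

    public static string[] SplitEntries(string log)
    {
        // An empty or missing log means no errors were recorded.
        if (string.IsNullOrEmpty(log))
            return new string[0];

        // Trim stray whitespace and drop empty pieces (e.g. after a trailing separator).
        return log.Split(Separator, StringSplitOptions.None)
                  .Select(entry => entry.Trim())
                  .Where(entry => entry.Length > 0)
                  .ToArray();
    }
}
```

Applied to the sample log above, SplitEntries returns two entries, one for the replaced Palatino-Roman font and one for the missing ArialUnicodeMS font.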

17.4 Other ToImage Parameters
A PDF page may contain a /Rotate attribute set to 90, 180 or 270 (degrees), which makes the page appear in landscape mode or upside down. By default, the ToImage method takes the Rotate value into account to orient the image appropriately. If the Rotate value needs to be ignored, set the parameter IgnoreRotate to True.

PdfPreview objPreview = objPage.ToImage( "...; IgnoreRotate=True" );

Also, some PDF pages contain /MediaBox and /CropBox attributes that differ from each other. By default, the ToImage method uses the CropBox to calculate the dimensions of the resultant image, which is consistent with the behavior of all major PDF viewers. If the MediaBox, which usually covers a larger area than the CropBox, needs to be used instead, set the parameter IgnoreCropBox to True. This may cause the image to include areas of the page that are normally invisible in a PDF viewer.

PdfPreview objPreview = objPage.ToImage( "...; IgnoreCropBox=True" );
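The two parameters interact with the page geometry as sketched below. This C# code models only the documented behavior (CropBox by default, MediaBox with IgnoreCropBox=True, width and height swapped for a /Rotate of 90 or 270 unless IgnoreRotate=True); it is not AspPDF.NET code, and the PageGeometry class and its plain [llx, lly, urx, ury] array arguments are hypothetical. Falling back to the MediaBox when a page has no CropBox follows the PDF specification's default.

```csharp
using System;

// A model (not AspPDF.NET code) of how the page box and the /Rotate value
// determine the image dimensions in user units, per the description above.
public static class PageGeometry
{
    // Boxes are PDF rectangles given as [llx, lly, urx, ury].
    public static (double Width, double Height) ImageSize(
        double[] mediaBox, double[] cropBox, int rotate,
        bool ignoreRotate = false, bool ignoreCropBox = false)
    {
        // The CropBox is used by default; the MediaBox is used when
        // IgnoreCropBox=True or when the page has no CropBox.
        double[] box = (ignoreCropBox || cropBox == null) ? mediaBox : cropBox;
        double width = Math.Abs(box[2] - box[0]);
        double height = Math.Abs(box[3] - box[1]);

        // A /Rotate of 90 or 270 degrees swaps width and height, unless IgnoreRotate=True.
        int normalized = ((rotate % 360) + 360) % 360;
        if (!ignoreRotate && (normalized == 90 || normalized == 270))
            return (height, width);

        return (width, height);
    }
}
```

Multiplying the result by ResolutionX / 72 and ResolutionY / 72 gives the expected pixel size.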

17.5 Image Extraction
As an additional bonus, the PdfPreview object can extract images from a PDF page. The method PdfPreview.ExtractImage returns a new instance of the PdfPreview object representing the image specified by a 1-based index. This image can then be saved the regular way, via the methods Save, SaveToMemory or SaveHttp. If the specified index exceeds the number of images on the page, ExtractImage returns null (Nothing in VB.NET).

By default, the images are saved as PNGs, PdfPreview's format of choice, but other formats such as JPEG, GIF and BMP can be used as well.

The following code snippet opens a PDF document and saves all of the images from page 1 to disk:

C#
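PdfManager objPDF = new PdfManager();
PdfDocument objDoc = objPDF.OpenDocument(@"c:\path\mydoc.pdf");
PdfPage objPage = objDoc.Pages[1];
PdfPreview objPreview = objPage.ToImage();

int i = 1;
PdfPreview objImage;

// ExtractImage returns null once the index exceeds the number of images on the page
while ((objImage = objPreview.ExtractImage(i++)) != null)
{
   // Overwrite=false: if image.png already exists, a unique file name is generated,
   // so the extracted images do not overwrite each other
   objImage.Save(@"c:\path\image.png", false);
}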
VB.NET
Dim objPDF As PdfManager = New PdfManager()
Dim objDoc As PdfDocument = objPDF.OpenDocument("c:\path\mydoc.pdf")
Dim objPage As PdfPage = objDoc.Pages(1)
Dim objPreview As PdfPreview = objPage.ToImage()

Dim i As Integer = 1

Do While True
  ' ExtractImage returns Nothing once the index exceeds the number of images
  Dim objImage As PdfPreview = objPreview.ExtractImage(i)

  If objImage Is Nothing Then
     Exit Do
  Else
     ' Overwrite=False: a unique file name is generated if image.png already exists
     objImage.Save("c:\path\image.png", False)
  End If

  i = i + 1
Loop

17.6 Printing
17.6.1 Individual Page Printing

The PdfPreview object offers automatic printing functionality via the method SendToPrinter. This method sends the image of a page contained in this PdfPreview object to a printer.

The SendToPrinter method accepts two arguments: the network name of the printer and a parameter list adjusting the appearance of the printout. If the printer name is set to null (C#) or Nothing (VB.NET), the current machine's default printer is used.

The print quality is determined by the resolution of the image being printed. It is therefore recommended that the ResolutionX and ResolutionY parameters to the ToImage method be set to at least 300 or, better yet, 600.

By default, the SendToPrinter method prints the image as is, without stretching it. If the parameter Stretch is set to True, the image is stretched to cover the entire print area. Additionally, the parameters ScaleX and ScaleY can be used to scale the image up or down. For example, the values "ScaleX=0.5; ScaleY=0.5" scale the image down by 50%.

The following code sample sends page 2 of c:\path\document.pdf to the printer. The image is stretched to cover the entire print area.

C#
PdfManager objPDF = new PdfManager();
PdfDocument objDoc = objPDF.OpenDocument(@"c:\path\document.pdf");
PdfPage objPage = objDoc.Pages[2];
PdfPreview objPreview = objPage.ToImage("ResolutionX=600; ResolutionY=600");

objPreview.SendToPrinter(@"\\192.168.1.2\HP LaserJet 6P", "Stretch=true");
VB.NET
Dim objPDF As PdfManager = New PdfManager()
Dim objDoc As PdfDocument = objPDF.OpenDocument("c:\path\document.pdf")
Dim objPage As PdfPage = objDoc.Pages(2)
Dim objPreview As PdfPreview = objPage.ToImage("ResolutionX=600; ResolutionY=600")

objPreview.SendToPrinter( "\\192.168.1.2\HP LaserJet 6P", "Stretch=true" )

You may encounter a run-time error when attempting to send the image to a network printer. To avoid the error, you need to impersonate an interactive user account with the LogonUser method of the PdfManager object, as follows:

C#
...		
objPDF.LogonUser( "domain", "username", "password" );
objPreview.SendToPrinter( @"\\192.168.1.2\HP LaserJet 6P", "Stretch=true" );

VB.NET
...
objPDF.LogonUser( "domain", "username", "password")
objPreview.SendToPrinter( "\\192.168.1.2\HP LaserJet 6P", "Stretch=true" )

The first argument to the LogonUser method is a Windows domain and can be an empty string. The second and third arguments are the username and password of the account to be impersonated. The optional fourth argument specifies the logon type and is usually omitted.

As an alternative to the LogonUser method, a user can be impersonated via the <identity> tag in your Web.config file, as follows:

<configuration>
  <system.web>
    <identity impersonate="true" userName="username" password="password" />
  </system.web>

  ...
</configuration>

17.6.2 Document Printing

As of Version 2.7, the page printing functionality described in the previous sub-section has been expanded to support the printing of an entire document or any portion thereof, in both simplex (one-sided) and duplex (double-sided) modes.

In AspPDF.NET 2.7+, the PdfDocument object has been given its own SendToPrinter method which sends the entire document (or an arbitrary set of pages) to the printer as opposed to just an individual page. Internally, the PdfDocument.SendToPrinter method iterates through the pages of the document, creates PdfPreview objects for each of them and sends the page images to the printer one by one. If your printer supports duplex printing, this method can optionally print double-sided documents in both long-edge and short-edge binding modes.

The PdfDocument.SendToPrinter method expects the same arguments as its PdfPreview.SendToPrinter counterpart: the printer name (local or network) and a list of parameters. In addition to the parameters described above, this method also accepts parameters controlling the duplex mode, as well as the page ranges to be printed.

The Duplex parameter controls duplex printing. Duplex=1 enables duplex printing in the regular long-edge-binding mode, and Duplex=2 in the short-edge binding mode. If this parameter is set to 0 or omitted, the regular simplex (one-sided) printing is used.

The From1/To1, From2/To2, ..., FromN/ToN pairs of parameters specify the ranges of pages to be printed. Page indices are 1-based. If the specified ranges overlap, the overlapping pages will be printed multiple times. By default, the entire document is printed.
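When the ranges come from user input, the FromN/ToN parameters can be generated instead of typed by hand. A C# sketch; PrintRanges.Build is a hypothetical helper that only produces the text to be passed to SendToPrinter, and rejecting invalid ranges is a design choice of this sketch, not documented AspPDF.NET behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: builds the FromN/ToN part of a PdfDocument.SendToPrinter
// parameter string from a list of 1-based, inclusive page ranges.
public static class PrintRanges
{
    public static string Build(IList<(int From, int To)> ranges)
    {
        var sb = new StringBuilder();
        for (int n = 0; n < ranges.Count; n++)
        {
            var (from, to) = ranges[n];
            if (from < 1 || to < from)
                throw new ArgumentException("Invalid page range " + from + "-" + to);

            if (sb.Length > 0) sb.Append("; ");
            // Parameter names are numbered from 1: From1/To1, From2/To2, ...
            sb.Append("From").Append(n + 1).Append('=').Append(from)
              .Append(";To").Append(n + 1).Append('=').Append(to);
        }
        // An empty list yields an empty string, i.e. the whole document is printed.
        return sb.ToString();
    }
}
```

For the ranges (2, 2), (5, 5) and (7, 10), Build returns "From1=2;To1=2; From2=5;To2=5; From3=7;To3=10", which can be appended to "Stretch=true; Duplex=1; ".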

The following code prints pages 2, 5, and 7-10 of a document in a duplex long-edge-binding mode:

C#
PdfDocument objDoc = objPdf.OpenDocument( @"c:\path\doc.pdf" );
objDoc.SendToPrinter( "Brother HL-2270DW series",
	"Stretch=true; Duplex=1; From1=2;To1=2; From2=5;To2=5; From3=7;To3=10" );
VB.NET
Dim objDoc As PdfDocument = objPdf.OpenDocument( "c:\path\doc.pdf" )
objDoc.SendToPrinter( "Brother HL-2270DW series", _
	"Stretch=true; Duplex=1; From1=2;To1=2; From2=5;To2=5; From3=7;To3=10" )

As of Version 2.9, the PdfDocument.SendToPrinter method also supports printer tray (or bin) selection via the Tray parameter. The Microsoft documentation defines the following values for this parameter:

First (1), upper (1), only one (1), lower (2), middle (3), manual (4), envelope (5), envelope manual (6), auto (7), tractor (8), small format (9), large format (10), large capacity (11), cassette (14), form source (15), last (15).

However, many printers use driver-specific tray values starting at 256. The correct Tray values for such printers have to be determined by trial and error.
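The standard values listed above correspond to the Windows DMBIN_* paper-bin constants, and DMBIN_USER (256) marks the start of driver-defined bins. The C# sketch below gives them readable names; the StandardTray enum and the TrayParam helper are hypothetical and not part of AspPDF.NET, which only takes the number.

```csharp
using System;

// The standard tray values listed above (they match the Windows DMBIN_* constants).
// The enum and member names are illustrative, not part of AspPDF.NET.
public enum StandardTray
{
    Upper = 1,          // also "first" and "only one"
    Lower = 2,
    Middle = 3,
    Manual = 4,
    Envelope = 5,
    EnvelopeManual = 6,
    Auto = 7,
    Tractor = 8,
    SmallFormat = 9,
    LargeFormat = 10,
    LargeCapacity = 11,
    Cassette = 14,
    FormSource = 15     // also "last"
}

public static class TrayParam
{
    // Driver-specific trays start at 256 (DMBIN_USER).
    public const int FirstDriverSpecific = 256;

    // Builds the "Tray=N" parameter for a raw (possibly driver-specific) value.
    public static string Build(int tray)
    {
        if (tray < 1) throw new ArgumentOutOfRangeException(nameof(tray));
        return "Tray=" + tray;
    }

    public static string Build(StandardTray tray) => Build((int)tray);

    public static bool IsDriverSpecific(int tray) => tray >= FirstDriverSpecific;
}
```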

As of Version 3.2.0.30569, the PdfDocument.SendToPrinter method also supports printing multiple copies via the Copies parameter (1 by default).

As of Version 3.4.0.31147, the Collate parameter (False by default) is supported; setting it to True enables collation. For example, if you print two copies of a three-page document without collation, the pages print in this order: 1, 1, 2, 2, 3, 3. With collation, the pages print in this order: 1, 2, 3, 1, 2, 3.
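The two page orders can be spelled out programmatically. This C# sketch only illustrates what the Collate setting changes; the Collation class is hypothetical, and the code does not show how AspPDF.NET or the printer driver actually implements collation.

```csharp
using System.Collections.Generic;

// Illustration of the page order produced with and without collation.
public static class Collation
{
    public static List<int> PrintOrder(int pageCount, int copies, bool collate)
    {
        var order = new List<int>();
        if (collate)
        {
            // Collated: each complete copy is printed before the next one starts.
            for (int c = 0; c < copies; c++)
                for (int p = 1; p <= pageCount; p++)
                    order.Add(p);
        }
        else
        {
            // Uncollated: all copies of a page are printed before the next page.
            for (int p = 1; p <= pageCount; p++)
                for (int c = 0; c < copies; c++)
                    order.Add(p);
        }
        return order;
    }
}
```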

Also, Version 3.4.0.31147 adds two more parameters to support label printers: PaperWidth and PaperHeight. These parameters specify the paper width and height in tenths of a millimeter. For example, if your label printer uses 2 5/16" x 4" labels (about 58.7 x 101.6 mm), the PaperWidth and PaperHeight parameters should be set to 590 and 1020, respectively (the dimensions rounded to whole millimeters). If these parameters are not used, a document may come out shrunk when printed on a label printer.
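Since 1 inch = 25.4 mm = 254 tenths of a millimeter, the conversion is a single multiplication. A C# sketch with hypothetical helper names; the second method rounds to a whole millimeter first, which reproduces the 590 and 1020 values of the example.

```csharp
using System;

// Hypothetical helpers: convert a label dimension in inches to the tenths of a
// millimeter expected by the PaperWidth and PaperHeight parameters.
public static class PaperSize
{
    // 1 inch = 25.4 mm = 254 tenths of a millimeter.
    public static int InchesToTenthsMm(double inches)
    {
        return (int)Math.Round(inches * 254.0);
    }

    // Same dimension rounded to a whole millimeter, then expressed in tenths.
    public static int InchesToWholeMmInTenths(double inches)
    {
        return (int)Math.Round(inches * 25.4) * 10;
    }
}
```

For the 2 5/16" x 4" label, the exact values are 587 and 1016; rounding to whole millimeters gives 590 and 1020.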

17.7 Structured Text Extraction
As of version 2.8, the PdfPreview object has been expanded to perform yet another useful task: extracting text strings from the document along with their respective coordinates. This feature enables you to know exactly where on the page a particular text item is. Regular (coordinate-less) text extraction is described in Section 9.4 - Content Extraction.

PdfPreview's TextItems property returns a collection of objects, each encapsulating a text fragment and its coordinates and dimensions. To avoid adding a new object to AspPDF.NET's already populous object diagram, we have retrofitted the PdfRect object for the task by adding a new Text property, which returns the actual text fragment in Unicode format. The existing PdfRect properties Left, Bottom, Width and Height return the coordinates of the fragment's lower-left corner and its horizontal and vertical dimensions, respectively; the Top property, used in the code sample below, returns the y-coordinate of the fragment's upper edge.

To populate the TextItems collection, the PdfPage.ToImage method must be called with the parameter ExtractText set to a non-zero value. ExtractText can be a combination (sum) of the following flags:

  • Bit 1 (1): Enables text extraction. If this flag is not set, text extraction is not performed and the TextItems collection is empty.
  • Bit 2 (2): Sorts text fragments in the order from top to bottom, and from left to right. If this flag is not set, the text fragments in the TextItems collection appear in an arbitrary order.
  • Bit 3 (4): Glues adjacent text fragments together. If this flag is not set, a single text fragment may contain a single word, a part of the word or even a single character. Setting this flag usually combines all or most text fragments of a paragraph line into a single long string. For this flag to work, bit 2 must also be set.
  • Bit 4 (8): Does not glue adjacent text fragments if they are separated by a space character. For this flag to work, bits 2 and 3 must also be set.
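Naming the flags makes their dependencies easier to respect. In this C# sketch the ExtractTextFlags enum, its member names and the ExtractTextParam helper are hypothetical (AspPDF.NET only accepts the numeric value); the checks simply mirror the rules listed above.

```csharp
using System;

// Illustrative names for the ExtractText bit flags described above;
// AspPDF.NET itself only takes the numeric value, e.g. "ExtractText=7".
[Flags]
public enum ExtractTextFlags
{
    None = 0,
    Extract = 1,          // bit 1: enable text extraction
    Sort = 2,             // bit 2: top-to-bottom, left-to-right order
    Glue = 4,             // bit 3: glue adjacent fragments (requires Sort)
    KeepSpacesApart = 8   // bit 4: do not glue across spaces (requires Sort and Glue)
}

public static class ExtractTextParam
{
    public static string Build(ExtractTextFlags flags)
    {
        // Without bit 1 no text is extracted, so the other bits would have no effect.
        if (flags != ExtractTextFlags.None && !flags.HasFlag(ExtractTextFlags.Extract))
            throw new ArgumentException("Bit 1 (Extract) must be set.");
        if (flags.HasFlag(ExtractTextFlags.Glue) && !flags.HasFlag(ExtractTextFlags.Sort))
            throw new ArgumentException("Bit 3 (Glue) requires bit 2 (Sort).");
        if (flags.HasFlag(ExtractTextFlags.KeepSpacesApart) &&
            !(flags.HasFlag(ExtractTextFlags.Sort) && flags.HasFlag(ExtractTextFlags.Glue)))
            throw new ArgumentException("Bit 4 requires bits 2 (Sort) and 3 (Glue).");

        return "ExtractText=" + (int)flags;
    }
}
```

For example, Extract | Sort | Glue produces "ExtractText=7", the value used in the code sample below.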

The following code sample draws red outlines around all text fragments it finds on a page and labels each outline with the fragment's position in the collection (as shown in the image below).

C#
PdfManager objPdf = new PdfManager();
PdfDocument objDoc = objPdf.OpenDocument(Server.MapPath("population.pdf"));
PdfPage objPage = objDoc.Pages[1];
PdfPreview objPreview = objPage.ToImage("extracttext=7"); // extract + sort + glue (1 + 2 + 4)

objPage.Canvas.LineWidth = 0.5f;
objPage.Canvas.SetFillColor(1, 0, 0);

int i = 1;

foreach(PdfRect rect in objPreview.TextItems)
{
    objPage.Canvas.SetColor(1, 0, 0);
    objPage.Canvas.SetFillColor(1, 0, 0);

    // Red outline
    objPage.Canvas.DrawRect(rect.Left, rect.Bottom, rect.Width, rect.Height);

    // Small box on top to display count
    objPage.Canvas.FillRect(rect.Left, rect.Top, 10, 5);
    objPage.Canvas.DrawRect(rect.Left, rect.Top, 10, 5);
    objPage.Canvas.DrawText(i.ToString(),
        "x=" + (rect.Left + 1).ToString() + ";y=" + (rect.Top + 6).ToString() + ";color=white; size=5",
        objDoc.Fonts["Helvetica"]);

    i++;
}

String strFilename = objDoc.Save(Server.MapPath("extracttext.pdf"), false);
VB.NET
Dim objPdf As PdfManager = New PdfManager()
Dim objDoc As PdfDocument = objPdf.OpenDocument(Server.MapPath("population.pdf"))
Dim objPage As PdfPage = objDoc.Pages(1)
Dim objPreview As PdfPreview = objPage.ToImage("extracttext=7") ' extract + sort + glue (1 + 2 + 4)

objPage.Canvas.LineWidth = 0.5f
objPage.Canvas.SetFillColor(1, 0, 0)

Dim i As Integer = 1

For Each rect As PdfRect in objPreview.TextItems
    objPage.Canvas.SetColor(1, 0, 0)
    objPage.Canvas.SetFillColor(1, 0, 0)

    ' Red outline
    objPage.Canvas.DrawRect(rect.Left, rect.Bottom, rect.Width, rect.Height)

    ' Small box on top to display count
    objPage.Canvas.FillRect(rect.Left, rect.Top, 10, 5)
    objPage.Canvas.DrawRect(rect.Left, rect.Top, 10, 5)
    objPage.Canvas.DrawText(i.ToString(), "x=" + (rect.Left + 1).ToString() + _
        "; y=" + (rect.Top + 6).ToString() + ";color=white; size=5", _
        objDoc.Fonts("Helvetica"))

    i = i + 1
Next

Dim strFilename As String = objDoc.Save(Server.MapPath("extracttext.pdf"), false)

Click on the links below to run this code sample:

http://localhost/asppdf.net/manual_17/17_extracttext.cs.aspx
http://localhost/asppdf.net/manual_17/17_extracttext.vb.aspx

Make sure AspPDF.NET is installed on your local machine and IIS is running for these links to work.

  This site is owned and maintained by Persits Software, Inc. Copyright © 2003 - 2014. All Rights Reserved.