<?xml version="1.0" encoding="utf-8"?>
			
			<rss version="2.0" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cc="http://web.resource.org/cc/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">

			<channel>
			<title>Cutter&apos;s Crossing - PDF</title>
			<link>http://blog.cutterscrossing.com/index.cfm</link>
			<description>ColdFusion Development, Life, and Other Stuff</description>
			<language>en-us</language>
			<pubDate>Tue, 07 Sep 2010 17:57:07 -0400</pubDate>
			<lastBuildDate>Mon, 03 Dec 2007 13:48:00 -0400</lastBuildDate>
			<generator>BlogCFC</generator>
			<docs>http://blogs.law.harvard.edu/tech/rss</docs>
			<managingEditor>web.admin@cutterscrossing.com</managingEditor>
			<webMaster>web.admin@cutterscrossing.com</webMaster>
			<itunes:subtitle></itunes:subtitle>
			<itunes:summary></itunes:summary>
			<itunes:category text="Technology" />
			<itunes:category text="Technology">
				<itunes:category text="Podcasting" />
			</itunes:category>
			<itunes:category text="Technology">
				<itunes:category text="Tech News" />
			</itunes:category>
			<itunes:keywords></itunes:keywords>
			<itunes:author></itunes:author>
			<itunes:owner>
				<itunes:email>web.admin@cutterscrossing.com</itunes:email>
				<itunes:name></itunes:name>
			</itunes:owner>
			<itunes:image href="" />
			<image>
				<url></url>
				<title>Cutter&apos;s Crossing</title>
				<link>http://blog.cutterscrossing.com/index.cfm</link>
			</image>
			<itunes:explicit>no</itunes:explicit>
			
			<item>
				<title>CF8 PDF Manipulation: Pulling Text Out</title>
				<link>http://blog.cutterscrossing.com/index.cfm/2007/12/3/CF8-PDF-Manipulation-Pulling-Text-Out</link>
				<description>
				
				So, this morning a friend called me up with a problem. She had received some PDF files from her insurance company and needed the data in Word or Excel for manipulation. Now, she could cut and paste the information, but that was time-consuming. She went to the Adobe site, trying to find info, and saw &apos;ColdFusion&apos; on the homepage. This sparked her brain, because she immediately went, &quot;Hey, Cutter does something with ColdFusion! Maybe he can help me!&quot;

Lucky for her, we now have ColdFusion 8, with its built-in PDF support through the CFPDF tag. I had to do a tiny bit of research on this, because Adobe&apos;s CF LiveDocs weren&apos;t overly clear, but I eventually found out that I could extract text with some very simple DDX processing directives.

&lt;a href=&quot;http://www.coldfusionjedi.com&quot; target=&quot;_blank&quot;&gt;Ray&lt;/a&gt; recently did a series of posts about working with PDF documents. Although none of them answered my question directly, he had written one about &lt;a href=&quot;http://www.coldfusionjedi.com/index.cfm/2007/7/24/ColdFusion-8-Working-with-PDFs-Part-7&quot; target=&quot;_blank&quot;&gt;using the DDX processing directives&lt;/a&gt;. This sent me searching the Adobe site for more information, which is where I came upon the &lt;a href=&quot;http://livedocs.adobe.com/livecycle/es/sdkHelp/programmer/sdkHelp/wwhelp/wwhimpl/js/html/wwhelp.htm?href=assemblePDFDDX_basics.95.1.html&quot; target=&quot;_blank&quot;&gt;Understanding DDX&lt;/a&gt; developer documentation. Basically, by rewriting Ray&apos;s simple example, I was able to extract all of the &lt;i&gt;DocumentText&lt;/i&gt; from the PDF and dump it into an XML file. First, I need the DDX, which is just some simple XML:

&lt;code&gt;
&lt;cfsavecontent variable=&quot;myddx&quot;&gt;
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;DDX xmlns=&quot;http://ns.adobe.com/DDX/1.0/&quot; xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot; xsi:schemaLocation=&quot;http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd&quot;&gt;
	&lt;DocumentText result=&quot;OutXML&quot;&gt;
		&lt;PDF source=&quot;Title&quot;/&gt;
	&lt;/DocumentText&gt;
&lt;/DDX&gt;
&lt;/cfsavecontent&gt;
&lt;cfset myddx = trim(myddx)&gt;
&lt;/code&gt;

Then, I verify the validity:

&lt;code&gt;
&lt;cfif isDDX(myddx)&gt;
Yes, it&apos;s DDX
&lt;cfelse&gt;
No, it&apos;s not
&lt;/cfif&gt;
&lt;/code&gt;

Now, a little explanation. Looking at the DDX, you&apos;ll notice I&apos;ve defined a &lt;b&gt;result&lt;/b&gt; and a &lt;b&gt;source&lt;/b&gt;. I had tried to define my file names here directly, but ColdFusion didn&apos;t like that when I hit the CFPDF tag. Apparently, when using the &lt;i&gt;processddx&lt;/i&gt; action of the tag, you are required to define your &lt;i&gt;inputfiles&lt;/i&gt; and &lt;i&gt;outputfiles&lt;/i&gt;. Further study of the LiveDocs shows that ColdFusion expects structures for these definitions. So, the DDX references certain structure keys (OutXML and Title), which you must define prior to processing your PDF.

&lt;code&gt;
&lt;cfset inputStruct = StructNew() /&gt;
&lt;cfset inputStruct.Title = &quot;rptLauncher2.pdf&quot; /&gt;

&lt;cfset outputStruct = StructNew() /&gt;
&lt;cfset outputStruct.OutXML = &quot;words2.xml&quot; /&gt;
&lt;/code&gt;

You now have all of the necessary pieces. All that&apos;s left is the call to process your DDX directives.

&lt;code&gt;
&lt;cfpdf action=&quot;processddx&quot; ddxfile=&quot;#myddx#&quot; name=&quot;VARIABLES.doc&quot; inputfiles=&quot;#inputStruct#&quot; outputfiles=&quot;#outputStruct#&quot; /&gt;
&lt;/code&gt;

I CFDump VARIABLES.doc to check whether the processing succeeded, and it comes out just fine. I now have a file, words2.xml, sitting in my server&apos;s folder, which contains all of the content of the PDF file. Simple and sweet.
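
If you want to go one step further and actually look at the extracted text, here is a minimal sketch of reading the output file back in. It assumes words2.xml resolves relative to the current template&apos;s directory, which may not match where your server wrote it, so adjust the path as needed. The variable names (xmlPath, rawXml, wordsDoc) are just placeholders of mine.

&lt;code&gt;
&lt;!--- Assumption: words2.xml was written next to this template; change the path if not ---&gt;
&lt;cfset xmlPath = ExpandPath(&quot;words2.xml&quot;) /&gt;

&lt;cfif FileExists(xmlPath)&gt;
	&lt;!--- Read the raw XML text and parse it into a ColdFusion XML document object ---&gt;
	&lt;cffile action=&quot;read&quot; file=&quot;#xmlPath#&quot; variable=&quot;rawXml&quot; charset=&quot;utf-8&quot; /&gt;
	&lt;cfset wordsDoc = XmlParse(rawXml) /&gt;
	&lt;!--- Dump the parsed document to inspect how the extracted text is laid out ---&gt;
	&lt;cfdump var=&quot;#wordsDoc#&quot; /&gt;
&lt;cfelse&gt;
	&lt;cfoutput&gt;Output file not found: #xmlPath#&lt;/cfoutput&gt;
&lt;/cfif&gt;
&lt;/code&gt;

Dumping the parsed document first is the easy way to see what structure you got back before you write any code to walk it and push the data into Word or Excel.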
				</description>
				
				<category>PDF</category>				
				
				<category>ColdFusion</category>				
				
				<category>ColdFusion 8</category>				
				
				<category>Development</category>				
				
				<pubDate>Mon, 03 Dec 2007 13:48:00 -0400</pubDate>
				<guid>http://blog.cutterscrossing.com/index.cfm/2007/12/3/CF8-PDF-Manipulation-Pulling-Text-Out</guid>
				
				
			</item>
			</channel></rss>