February 15, 2008

Knowledgebase - File Uploading

On the boards, I often get questions about file uploads. These questions usually show a misunderstanding of how a file is handled on its way to the server. This brief tutorial is meant to help the new programmer understand what happens to the file as it travels from the user's computer to your server.

First off, the form page.
Try this simple form:
<form method="post" action="getfile.cfm" enctype="multipart/form-data">
<input type="file" name="uploadfile" size="40" />
<input type="submit" name="submit" value="Send the File Off" />
</form>
As you can see, this form is extremely simple. On a real upload page you would want at minimum some instructions to the user as well as a description of what this page will be doing.

When a user fills out the form in their browser and clicks the submit button, the browser automatically sends the file from the user's computer to your server. In the 'form' tag above, 'method=post' tells the browser to send the form data in the body of a 'post' request (not as part of the URL string). The 'action' parameter tells the browser which page to send the form to; that page is what loads after the form is submitted. The 'enctype=multipart/form-data' tells the browser to include the contents of the attached file in the request (without this, the file itself will not be received by the server at all).
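If you want to confirm that a form really arrived with the right enctype, one option is to check the request's content type at the top of the action page. This is only a minimal sketch: it assumes the action page is the 'getfile.cfm' named in the form above, and it relies on the standard CGI variable 'cgi.content_type', which for a multipart form normally starts with 'multipart/form-data'.

```cfml
<!--- Top of getfile.cfm: stop early if the form was not sent as multipart --->
<cfif NOT findNoCase("multipart/form-data", cgi.content_type)>
    <p>This page expects a form sent with enctype="multipart/form-data".</p>
    <cfabort>
</cfif>
```

A check like this turns a confusing cffile error into a readable message when someone forgets the enctype on the form.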

So, here are the steps so far:
  1. The user opens your 'uploadfile.html' page in their browser. (You could have called this page anything: 'upload.cfm', 'uf.html', etc.)
  2. The user clicks on the 'browse button' in the form and chooses which file on their local computer to send to your server.
  3. The user clicks on the 'submit' button.
  4. The file is sent with the form to your server.
  5. Your web server recognizes the 'multipart/form-data' enctype and accepts the file, saving it in a temporary (temp) directory on the web server.
  6. Your web server retrieves the 'action' page (getfile.cfm; it could have been called anything, as long as it is a CF page) and processes it.
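To see step 5 for yourself, you can look at the form field on the action page before doing anything else with the file. The following is a minimal sketch, assuming the form above posted to this page with a field named 'uploadfile'. In ColdFusion the field typically holds the path of the temp file rather than the file's contents; the exact path and temp file name depend on your server version and configuration.

```cfml
<!--- getfile.cfm: look at the upload BEFORE any cffile tag runs --->
<cfif structKeyExists(form, "uploadfile")>
    <cfoutput>
        <!--- Typically a path to a temp file, not the file's contents --->
        Temp file: #htmlEditFormat(form.uploadfile)#<br />
        <!--- getTempDirectory() returns the temp directory ColdFusion uses --->
        Temp directory: #htmlEditFormat(getTempDirectory())#
    </cfoutput>
</cfif>
```

Seeing a server-side path here is a good reminder that the file has already left the user's computer by the time your page starts running.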
Next, in order to actually use the file, you will need to save it into a 'permanent' directory on your webserver.
Try this simple CF page:
<cffile action = "upload" fileField = "UPLOADFILE" destination = "#GetDirectoryFromPath(expandpath('*.*'))#">
This one command is both 'misleading' and very powerful. First, the powerful part. This command finds the uploaded file in its current location in the temp directory, moves it from the temp directory to the destination directory, and then returns to the ColdFusion page a 'cffile structure' filled with information about the file that was just uploaded.
The misleading part? The action says 'upload'. Most people (myself included) read that to mean 'upload from the user's computer', which can lead the programmer to believe that the file stays on the user's computer until this command is run, and even that they have some minute control over the user's computer. What 'upload' really means is 'copy from the temp directory' or 'upload to a permanent directory'.

So we pick up our server activity steps from where we left off above. (web server retrieves the 'action' page)
  1. Since the action page is a .cfm page the web server sends the page to the ColdFusion engine.
  2. The ColdFusion engine sees the cffile upload command and:
    • Copies the file from the temp directory to the specified destination directory
    • If the copy was successful (no 'permission' errors), then the temp directory file is deleted
  3. ColdFusion creates a structure variable called 'cffile' to contain all of the information returned by the cffile action
  4. After the ColdFusion engine has finished processing the page, the resulting output is handed back to the web server, which sends it to the user's browser as a 'static' html page.
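The 'cffile' structure from step 3 can be inspected directly. Here is a minimal sketch that repeats the upload tag from above, dumps the whole structure, and then prints a few of its keys. It assumes the same 'uploadfile' field and destination used earlier; the exact set of keys may vary a little between ColdFusion versions.

```cfml
<cffile action="upload"
        fileField="uploadfile"
        destination="#GetDirectoryFromPath(ExpandPath('*.*'))#">

<!--- Show everything ColdFusion recorded about the upload --->
<cfdump var="#cffile#">

<cfoutput>
    Saved: #cffile.fileWasSaved#<br />
    Server path: #htmlEditFormat(cffile.serverDirectory)#/#htmlEditFormat(cffile.serverFile)#<br />
    Name on the user's computer: #htmlEditFormat(cffile.clientFile)#<br />
    Size in bytes: #cffile.fileSize#
</cfoutput>
```

The dump is only for learning and debugging; remove it before the page goes live, since it exposes server paths.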
This is how you upload a file from the user's browser to a directory on your web server. It is important to mention that this is the simplest possible case. There are other steps you should take when accepting an upload. For instance, you should decide whether you want the filename to be unique, and you should limit uploads to the file types you intend to allow. You probably don't want anyone to upload a .cgi, .pl, .exe, etc. that may 'compromise' your web server.
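As a rough illustration of those two precautions, the sketch below uses cffile's 'nameConflict' and 'accept' attributes and then double-checks the saved file's extension. The list of allowed extensions and MIME types is a made-up example, not a recommendation, and the destination is the same one used earlier in this tutorial; in practice you may prefer a directory outside the web root.

```cfml
<!--- Hypothetical whitelist; adjust to the types you actually want to allow --->
<cfset allowedExtensions = "jpg,jpeg,gif,png,pdf">

<cftry>
    <!--- MakeUnique renames the file if one with the same name already exists --->
    <cffile action="upload"
            fileField="uploadfile"
            destination="#GetDirectoryFromPath(ExpandPath('*.*'))#"
            nameConflict="MakeUnique"
            accept="image/jpeg,image/gif,image/png,application/pdf">

    <!--- Do not rely on accept alone; also check the saved file's extension --->
    <cfif NOT listFindNoCase(allowedExtensions, cffile.serverFileExt)>
        <cffile action="delete" file="#cffile.serverDirectory#/#cffile.serverFile#">
        <p>Sorry, that file type is not allowed.</p>
    <cfelse>
        <cfoutput><p>Saved as #htmlEditFormat(cffile.serverFile)#.</p></cfoutput>
    </cfif>

    <cfcatch type="any">
        <!--- cffile throws an error when the MIME type is not in the accept list --->
        <p>Sorry, that file could not be accepted.</p>
    </cfcatch>
</cftry>
```

Because 'MakeUnique' may rename the file, use 'cffile.serverFile' (not the name the user chose) whenever you store or link to the saved file.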

Something I would like to address, that has come up a few times, is this:
A certain browser (IE) made by the same company that makes a certain popular operating system (Windows) sends the user's directory information for the file along with that file to the web server. You can view it by displaying the 'cffile.CLIENTDIRECTORY' variable after the cffile tag has run. It is important not to rely on this information. Unless you can guarantee that every browser used with your system will be IE (or at least IE7), this information will not always be available. The reason is the 'sandbox' rules, which are there to protect the user from web sites that might want to do too much and possibly jeopardize the user's system. IE has always broken those rules where it sees fit, but other browsers such as Netscape (now dying off), Firefox, Safari, etc. respect the sandbox and don't send that information.
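One way to code around this is to depend only on the file name and treat the client directory as optional. The snippet below is a minimal sketch that assumes a cffile upload tag has already run earlier on the same page, so the 'cffile' structure exists.

```cfml
<cfoutput>
    <!--- The original file name, as reported by the browser --->
    Uploaded file name: #htmlEditFormat(cffile.clientFile)#<br />

    <!--- clientDirectory may be empty; treat it as optional, display-only data --->
    <cfif len(trim(cffile.clientDirectory))>
        Client directory (not sent by every browser): #htmlEditFormat(cffile.clientDirectory)#
    </cfif>
</cfoutput>
```

Nothing in your application logic should break when the directory is missing.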
