Large BLOB to MSSQL database
While trying to maintain security restrictions for a new system I’m developing to deliver data to our customers, I came up with the following code to insert data into a Microsoft SQL Server database in 256k chunks.
The problem was the unavoidable fact that the server couldn’t reserve a contiguous block of memory to handle the query; the query came out to nearly 120MB just for a 25MB file. Even with a recent upgrade to 2GB of memory the server still couldn’t handle it, and I searched high and low for an answer.
Out of desperation I felt I had to contact some SQL gurus, so I popped over to SQLTeam.com and posted up my problem. I received a great response from one of the developers of SQL Server 2005’s storage engine which pretty much said: split it into chunks to avoid the problem altogether.
Hmmm… chunks… data… how the f*** am I gonna do that then? Well, trial and error for the most part.
One of our developers pointed me in the right direction with a book called “Guru’s guide to SQL Server” and a section on storing BLOBs. Again it said to use chunking, but it told me exactly what I needed to get the job done: the UPDATETEXT function. One problem: it needs a text pointer to the field where you’re storing the data, and it needs an exact integer offset so it knows where to start inserting the information. (I’m avoiding calling it text or data at this point, you’ll see why shortly.)
Here’s the function’s syntax at MSDN: UPDATETEXT Syntax
Reading through that, it means that if I’m doing this in chunks I need to know exactly how many bytes I’ve already inserted, so I can give it the right offset value for each insert. OK, I can cover that bit because I know I’m inserting a specific amount at a time, in this case 256000 bytes.
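To make that offset arithmetic concrete, here’s a tiny illustration (the chunk values are made up, this isn’t part of the real insert code): each chunk starts at whatever offset equals the total number of bytes already written.

$offset = 0;
foreach (array('AAAA', 'BBBB', 'CC') as $i => $chunk) { // made-up chunks
    echo "chunk $i starts at offset $offset\n";
    $offset += strlen($chunk); // bytes written so far becomes the next chunk's offset
}
// chunk 0 starts at offset 0
// chunk 1 starts at offset 4
// chunk 2 starts at offset 8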
Next thing is… UPDATETEXT needs a text-pointer value so that it can update the text in the field. Well I wouldn’t even know where to begin to return that back to PHP so that’s that. I know that stored procedures exist in SQL Server and since the UPDATETEXT function needs this TEXTPTR value I’ll have to use a stored procedure to do what I need to do! If I tried to do it in PHP the result would be automatically lost as soon as it’s returned so there’s no point wasting my time.
So, we need a stored procedure that can get hold of the text-pointer value itself, and that takes a reference to the record, an offset value AND the data to insert. With some assistance from our dev guys, this is what I came up with:
The zipfile column’s data type is TEXT; I tried IMAGE but it was problematic.
CREATE PROCEDURE dbo.dds_writeBlob
    @dataChunk AS TEXT,
    @refCode AS VARCHAR(50),
    @offSet AS INT
AS
    DECLARE @dataPtr AS BINARY(16)

    SELECT @dataPtr = TEXTPTR(zipfile)
    FROM [dbo].[file] (UPDLOCK)
    WHERE [dbo].[file].ref = @refCode

    UPDATETEXT [dbo].[file].zipfile @dataPtr @offSet 0 @dataChunk
GO
OK, the stored procedure: @dataChunk is expected in TEXT format, @refCode is a unique reference to the record I’m updating, and we’ll be passing the @offSet value to it as well. Remember we’re doing this in chunks, so we’ll need to keep track of the offset; it’s basically a bit of simple maths in a loop, but you’ll see that in a minute. Now we know the syntax for our stored procedure, and this dictates what we need to fire at it when we run it from PHP.
The bit of the stored procedure that actually does anything useful is the line that starts with UPDATETEXT. From UPDATETEXT’s syntax we can see what each of the values I’m passing to it means.
The English:
UPDATETEXT [what I want to update] [the text-pointer value] [the offset value] [how much I want to delete, nothing of course] [and finally, the chunk of data that I want to append to the field]
Great, now we just need the PHP to pass these bits of information to the stored procedure and we’re up and running.
Firstly I have to convert the binary data to something the TEXT data type can accept; remember, that’s the type of field we’re inserting into, and it’s also the data type the stored procedure is expecting. base64_encode() (the PHP function) comes in handy for this purpose, but of course I need it in chunks, so I use chunk_split() to split it into chunks of 256000 bytes. I then explode() it into a numerically indexed array so that I can loop through with a for(), passing the data chunks, in order, one at a time to the stored procedure. Here’s the code:
$data = chunk_split(base64_encode(fread(fopen($file, "rb"), filesize($file))), 256000);
$arrData = explode("\r\n", $data);
$num = count($arrData);
$offset = 0;

for ($i = 0; $i < $num; $i++) {
    $buffer = $arrData[$i];
    $query = "EXEC [dds_writeBlob] '".$buffer."', '$reference', $offset";
    @mssql_query($query) or die(mssql_get_last_message());
    $offset = ($offset + strlen($buffer));
}
One line at a time:
- Open the file for reading binary data, read it all, base64 encode it and split it into chunks of 256000 bytes.
- Explode it at the point chunk_split() inserted its default split term ("\r\n"). explode() returns a numerically indexed array so we can loop through the data in order. (Note that chunk_split() also appends "\r\n" after the final chunk, so the array ends with one empty element; you can skip empty chunks if you like, see the sketch after this list.)
- Count the number of elements in the array and store the result in a variable.
- Set the offset value to zero because we’ll be starting to insert data into the field at offset 0.
- Set $i to 0 and, while $i is less than $num, run the following code, adding 1 to $i after each pass. Once $i is no longer less than $num, stop running the code.
- Set the variable $buffer to the array element $arrData[$i]. We get $i from the for loop, so the first loop will be $arrData[0], second is $arrData[1] and so on until we loop through the specified number of times.
- Set the query we’re going to run using the buffer value, the reference (to the record we’re updating) and the offset value which, on the first loop is going to be 0. So, first run, our $query really looks something like: EXEC [dds_writeBlob] ‘UHRENJOIDNV9u44b9g84…[etc for 256000 bytes]’, ‘895tng93b2nt9309’, 0
- Execute the query or die trying
- Update the offset value by adding the integer length of the $buffer variable. Bearing in mind the buffer’s length "should" be 256000 bytes, this sets $offset to 256000 so that on our next loop we pass an offset of 256000 into the stored procedure, 512000 on the next, and so on.
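For reference, here’s a minimal variation of the loop above (not my original code, just a tweak) that skips the empty trailing element chunk_split() leaves behind; everything else behaves exactly as described in the walkthrough.

for ($i = 0; $i < $num; $i++) {
    $buffer = $arrData[$i];
    if ($buffer === '') {
        continue; // skip the empty element left after the final "\r\n"
    }
    $query = "EXEC [dds_writeBlob] '".$buffer."', '$reference', $offset";
    @mssql_query($query) or die(mssql_get_last_message());
    $offset += strlen($buffer); // bytes written so far = next insert offset
}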
Right, hopefully you understand that and can implement it wherever you like. I’ve tried my best to explain it but if there’s anything you’re not sure of, please let me know and I’ll see what I can do to answer it.
Retrieving the data
Simply select the base64 encoded data from the record in question and base64 decode it before presenting it to the browser. Remember, if all went well with our UPDATETEXT code then all the data will be in one field, so we don’t need any fancy code to retrieve it and re-merge it or anything.
$query = "SELECT zipfile, job_code FROM [file] WHERE ref = '$ref'"; @$result = mssql_query($query) or die('File Download: Failed to get file from the database.'); $file = mssql_fetch_assoc($result); header('Content-type: application/zip'); header('Content-Disposition: attachment; filename="job-'.trim($file['job_code']).'.zip"'); echo base64_decode($file['zipfile']); |
Remember, though, that we need to tell the browser what type of file to expect. I’ve actually set the name by using another field from the record, but you can just create random names if you like or make it something permanent. If you’ve got your forward-thinking cap on, you’ll store the size of the file in the database as well, so that you can also issue a Content-Length header to the browser and the user knows how much data to expect!
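As a rough sketch of that last suggestion (the filesize column is hypothetical, it isn’t in the code above), the download script could become something like this:

// Assumes a hypothetical "filesize" column holding the original (decoded)
// byte count of the ZIP, stored at upload time alongside the base64 data.
$query = "SELECT zipfile, job_code, filesize FROM [file] WHERE ref = '$ref'";
@$result = mssql_query($query) or die('File Download: Failed to get file from the database.');
$file = mssql_fetch_assoc($result);

header('Content-type: application/zip');
header('Content-Disposition: attachment; filename="job-'.trim($file['job_code']).'.zip"');
header('Content-Length: '.(int)$file['filesize']); // lets the browser show proper progress
echo base64_decode($file['zipfile']);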
I hope this has been useful to you.
Lewis
What SQL driver did you use?
Are you using FreeTDS or something else?
We’re having trouble getting 2MB files out of the db, while smaller files come out just fine.
Hi Scott,
It was SQL Server 2000 as the back-end.
It sounds to me like you have omitted to make any changes in your php.ini. There are all kinds of limits in there regarding file uploads and how much data PHP will happily return from SQL Server.
HTH – Lewis
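(For anyone hitting the same limits, these are the sort of directives I mean. The values below are only examples, and the mssql.* ones apply to PHP’s mssql extension; FreeTDS users may also need to raise the text size in freetds.conf.)

// Example values only - raise them to suit your largest files.
ini_set('mssql.textlimit', '2147483647'); // client-side cap on TEXT data returned to PHP
ini_set('mssql.textsize', '2147483647');  // TEXTSIZE the extension asks the server to use
ini_set('memory_limit', '128M');          // the base64 data is ~33% bigger than the raw file
// upload_max_filesize and post_max_size can't be changed at runtime,
// so set those in php.ini itself if the upload side is the problem.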