An issue came up at work recently where my development team was asked to create a web interface for the storage, updating, and retrieval of PDF documents. After taking a look at my options, I decided to go with Attachment Fu. Getting started was pretty simple, just follow the directions in the README and away we go.
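For context, the setup is more or less what the README shows. Something along these lines (the model name and size limits are illustrative rather than our exact code), using the db_file storage backend since the PDFs need to live in the database:

```ruby
# Roughly the kind of model the Attachment Fu README walks you through.
# "Document" and the size range are just examples, not our production code.
class Document < ActiveRecord::Base
  has_attachment :storage      => :db_file,              # store the bytes in the db_files table
                 :content_type => 'application/pdf',     # only accept PDFs
                 :size         => 1.kilobyte..10.megabytes
  validates_as_attachment
end
```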
After doing a couple of uploads on my development machine with some random PDFs I happened to have lying around, it seemed as if only small PDFs would work properly. Larger PDFs, upon retrieval, came back to the browser as corrupt files. This wasn’t really an issue, more of a nagging concern, since the PDFs we needed to store are only a couple hundred kilobytes in size; the corruption I was encountering happened with files on the order of a couple of megabytes. While we did not have the actual files that were going to be uploaded at the time, the sample files we used worked perfectly.
That is, until we finally got our hands on the actual PDFs, and then everything went to hell. The 200KB file came back from the database corrupt, and a quick hotfix had to be devised to give users access to a hard copy of the file in question.
After about 2 days of investigation, throwing logger statements all over the code, and learning quite a bit more about connection adapters in Rails than I had previously known, I was able to trace the problem down a bit. The problem seems to stem from the null character \0, or more specifically \000, in the PDF and the pg gem’s string_to_binary method when used through ActiveRecord. To get around this issue, I modified my copy of Attachment Fu’s DbFile backend save_to_storage method (in RAILS_ROOT/vendor/plugins/attachment_fu/lib/technoweenie/attachment_fu/backends/db_file_backend.rb) as follows:
```ruby
# Saves the data to the DbFile model
def save_to_storage
  if save_attachment?
    # NOTE there seems to be a problem with the PostgreSQL gem.
    # Storing data manually on Create/Update.
    if self.connection.class == ActiveRecord::ConnectionAdapters::PostgreSQLAdapter
      unless db_file
        build_db_file
        db_file.save!
        self.class.update_all ['db_file_id = ?', self.db_file_id = db_file.id], ['id = ?', id]
      end
      # NOTE Use a slightly adjusted SQL call than Rails would use with ::bytea
      self.connection.update_sql "UPDATE \"db_files\" SET \"data\" = '#{ActiveRecord::ConnectionAdapters::PostgreSQLColumn.string_to_binary(temp_data)}'::bytea, \"updated_at\" = '#{Time.now.to_s(:db)}' WHERE \"id\" = #{self.db_file_id};"
    else
      (db_file || build_db_file).data = temp_data
      db_file.save!
      self.class.update_all ['db_file_id = ?', self.db_file_id = db_file.id], ['id = ?', id]
    end
  end
  true
end
```
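If you want to verify the workaround against your own data, a quick-and-dirty check from script/console is enough. Something like the sketch below, where Document, FakeUpload, and the file path are placeholders for your own attachment model and test file:

```ruby
# Quick sanity check from script/console, not a formal test: round-trip a PDF
# through the db_file backend and compare checksums. FakeUpload just gives
# attachment_fu the IO-ish object its uploaded_data= setter expects.
require 'digest/md5'
require 'stringio'

class FakeUpload < StringIO
  attr_reader :original_filename, :content_type
  def initialize(path, content_type = 'application/pdf')
    @original_filename = File.basename(path)
    @content_type      = content_type
    super(File.open(path, 'rb') { |f| f.read })
  end
end

path = '/tmp/large_sample.pdf'                      # any PDF that used to corrupt
doc  = Document.create!(:uploaded_data => FakeUpload.new(path))

stored   = Document.find(doc.id).db_file.data       # what actually landed in bytea
original = File.open(path, 'rb') { |f| f.read }
ok = Digest::MD5.hexdigest(stored) == Digest::MD5.hexdigest(original)
puts(ok ? 'round-trip OK' : 'still corrupting data')
```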
Update: Well, I ran into another issue when moving from our development and QA environments to our staging environment. The standard "data" = E'DATA GOES HERE' syntax was throwing an error; after checking the PostgreSQL website, I adjusted the query to be "data" = 'DATA GOES HERE'::bytea and all seems to be running smoothly again. It seems like more of the root problem is coming into view. If I get some time, I’ll see if I can create a test case where this fails and look further into a solution. In the meantime, my fix above does the job I need, and PDFs of all sizes work properly with PostgreSQL.
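The isolated test case I have in mind would look something along these lines: it bypasses Attachment Fu entirely and just round-trips null-laden bytes through the DbFile model’s binary column (names and payload are illustrative, and it assumes the DbFile model the db_file backend sets up):

```ruby
# test/unit/db_file_test.rb — sketch of the isolated failing case. On the
# broken adapter the assertion should fail for input containing \000 bytes;
# with the workaround above (or Rails 2.2) it should pass.
require File.dirname(__FILE__) + '/../test_helper'

class DbFileTest < ActiveSupport::TestCase
  def test_null_bytes_survive_a_round_trip
    # Fake "PDF" payload salted with the \000 bytes that trigger the bug
    blob = ("%PDF-1.4" + "\000\001\377binary") * 500

    record = DbFile.create!(:data => blob)
    assert_equal blob, DbFile.find(record.id).data
  end
end
```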
Update 2: Looks like somebody beat me to the punch. The PostgreSQL adapter is fixed in Rails 2.2; here is the commit message for those of you who are interested: Fix binary data corruption bug in PostgreSQL adaptor
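So if upgrading is an option for you, pinning the app to Rails 2.2 in config/environment.rb should pick up that fix and make my monkey-patched backend unnecessary (the point release shown is just an example):

```ruby
# config/environment.rb — request the Rails version that includes the adapter fix
RAILS_GEM_VERSION = '2.2.2' unless defined? RAILS_GEM_VERSION
```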