The EntitySpaces Community

Share and learn about the EntitySpaces Architecture.

General advice please: Store docs in database or on filesystem?

Last post 07-28-2008, 5:42 AM by Scott.Schecter. 5 replies.
  •  07-28-2008, 1:40 AM 10383

    General advice please: Store docs in database or on filesystem?

    Hi all

    Just wanted to get your thoughts on whether it's best to store (Word/PDF) documents in the database, or on the filesystem with a link to each file stored in the DB?
     
    In my app I'll need to store 2 versions of the same file - the original version, and a re-formatted version.  The docs will all be either PDF or Word at this point (but other formats such as HTML may be added).  I need to be able to search for matching text/data through both versions.  It's vital that the link between the database record and the files is not broken at any point (i.e. I can't afford to have a situation where the database record is pointing to a non-existent document because it's been deleted etc).
     
    I can see pros/cons of each approach (filesystem/DB) but wondered whether anyone has any advice/experience with doing it either way and can provide any comments or warn of potential gotchas etc.
     
    From my perspective, pros/cons are below:
     
    Stored in DB
    pros:
     - Single point of failure
     - If relationships are set up correctly, it won't be possible to 'orphan' documents from their associated records
     - Can search using fulltext
     - "Anywhere" availability - i.e. If I can access the db - I can access the documents
     
    cons:
     - Performance (possibly)
     - Using a db for what essentially is a filesystem task
     - If db is unavailable due to outage, everything is out
     - db size will be impacted heavily (number of documents likely to be large)
     
     
    Stored in filesystem
    pros:
     - Performance
     - No space taken in DB
     - Documents still available if DB becomes unavailable
     
    cons:
     - If file is deleted from outside of app, db will point to a file that no longer exists
     - Now have 2 points of failure
     - Searching through documents may be more difficult (I haven't checked this though)
     
    My "gut" feel is that I'd prefer to use the DB, but if I'm honest that's probably driven by my lack of experience with maintaining links to the filesystem and ensuring the integrity of those links remains solid (i.e. if a file is deleted outside of my app, how to deal with the link/missing file within the app).
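
    To make the "relationships set up correctly" point concrete, here is a minimal sketch of the DB approach. It is an illustration only: it uses SQLite from the Python standard library as a stand-in for whatever database the app actually runs on, and the table names (candidate, document) and helper names (create_schema, add_cv) are hypothetical.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical tables: one record plus its two document versions."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE candidate (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE document (
            id           INTEGER PRIMARY KEY,
            candidate_id INTEGER NOT NULL
                         REFERENCES candidate(id) ON DELETE CASCADE,
            version      TEXT NOT NULL
                         CHECK (version IN ('original', 'reformatted')),
            content      BLOB NOT NULL,
            UNIQUE (candidate_id, version)
        );
        """
    )


def add_cv(conn, name, original, reformatted):
    """Insert the record and both versions in one transaction."""
    with conn:  # commits on success, rolls back if any insert fails
        cur = conn.execute("INSERT INTO candidate (name) VALUES (?)", (name,))
        candidate_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO document (candidate_id, version, content) "
            "VALUES (?, ?, ?)",
            [
                (candidate_id, "original", original),
                (candidate_id, "reformatted", reformatted),
            ],
        )
    return candidate_id
```

    With this layout a document row cannot exist without its owning record, and deleting the record removes both versions with it. The reverse guarantee (a record never exists without its documents) comes from doing the inserts in a single transaction, not from the schema itself.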
     
    If anyone has any experience with either approach I'd really appreciate your comments etc
     
    Cheers
    Martin
  •  07-28-2008, 4:07 AM 10385 in reply to 10383

    Re: General advice please: Store docs in database or on filesystem?

    Hey Martin, I think the pros/cons you listed are perfectly valid. For me it usually comes down to two things: performance and security.

    On the performance front, you may want to perform some reads/writes against each type of storage, using sample docs that represent typical size/content, and compare your metrics for filesystem and db. Another consideration you also wisely mention is full text indexing. This would be trivial with the db approach, but if you went the filesystem route you would have to create/find a spider to index the content and then use something like lucene.net.

    The other thing I typically consider in this scenario is security. Documents are much easier to secure in the db than on the file system, and this is especially true for web applications. Given either route you are probably going to have to write some routines to check that the documents have not been removed (although I think this scenario is less likely with the db route, since typically fewer people have the ability to delete the binary object without the app knowing).

    These are my general thoughts on the subject at a high level; feel free to follow up with comments/questions.
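
    As a rough illustration of the kind of check routine mentioned above for the filesystem route, the sketch below walks the link table and reports files that are missing or whose contents no longer match a stored hash. It is written in Python with SQLite purely for brevity, and the table document_link, its columns, and the function find_broken_links are all made up for this example.

```python
import hashlib
import os
import sqlite3


def find_broken_links(conn, root):
    """Return (id, rel_path, reason) for every link that no longer holds.

    Assumes a hypothetical table document_link(id, rel_path, sha256) where
    sha256 is the hex digest recorded when the file was first stored.
    """
    broken = []
    rows = conn.execute(
        "SELECT id, rel_path, sha256 FROM document_link ORDER BY id"
    ).fetchall()
    for doc_id, rel_path, expected in rows:
        full_path = os.path.join(root, rel_path)
        if not os.path.isfile(full_path):
            # The file was deleted or moved outside the app.
            broken.append((doc_id, rel_path, "missing"))
            continue
        with open(full_path, "rb") as fh:
            actual = hashlib.sha256(fh.read()).hexdigest()
        if actual != expected:
            # The file exists but was edited or replaced outside the app.
            broken.append((doc_id, rel_path, "changed"))
    return broken
```

    A routine like this could run on a schedule, or just before a record is opened, so the app can flag the record instead of failing on a dead link. It only detects breakage after the fact; it does not prevent it.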

    Regards,

    Scott Schecter
    EntitySpaces | My Site
  •  07-28-2008, 4:23 AM 10387 in reply to 10385

    Re: General advice please: Store docs in database or on filesystem?

    Hi Scott

    Thanks for the reply. 

    FYI the docs I'll be storing are going to be CVs (resumes), so searching functionality is vital. I wish I had the time to investigate some of the information extraction stuff out there at the moment, as "intelligent" handling of this type of data would be a huge add-on for me and the users, i.e. being able to differentiate between someone having Project Management experience rather than just finding the words "Project Manager" in the CV (as in "reporting to the Project Manager") would be great - still, maybe something for v2 (given time/resources/understanding of the available technology!)

    This is why it's so key for me that the documents a) aren't orphaned, b) are searchable, and c) are secure "to a degree" (I say that as there wouldn't be an outside touchpoint - any externally provided CVs/resumes would get put into a separate holding database to check for any potentially malicious/badly formatted/corrupted data etc) - still, the added bonus of an extra level of security certainly isn't going to do me any harm.

    I think, unless I hear a good reason not to, that I'll likely go down the "store in the DB" route, as this appears to give me the most pros against my needs list.

    Cheers

    Martin

  •  07-28-2008, 4:33 AM 10389 in reply to 10387

    Re: General advice please: Store docs in database or on filesystem?

    I think performance should be your primary concern then. I know resumes generally aren't that large, but one thing you don't mention is how many users you think would be searching using your app at once, etc. I would probably create an integration test, set up some iterations, run a profiler and check your numbers. At least that way you should have some idea of what to expect with regard to performance before you invest much time/energy in committing to that route.
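
    A very rough sketch of that kind of timing test is below. It is only meant to show the shape of the comparison: it uses Python with SQLite as a stand-in for the real database, runs in a single process with no concurrent users, and the function name time_roundtrips and the table doc are invented for the example. Real numbers would need the actual database server, realistic document sizes and realistic load.

```python
import os
import sqlite3
import tempfile
import time


def time_roundtrips(payload, iterations=200):
    """Write then read `payload` `iterations` times per storage backend.

    Returns elapsed seconds for each backend as a dict.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Filesystem route: one file per document.
        start = time.perf_counter()
        for i in range(iterations):
            path = os.path.join(tmp, f"doc_{i}.bin")
            with open(path, "wb") as fh:
                fh.write(payload)
            with open(path, "rb") as fh:
                assert fh.read() == payload
        fs_seconds = time.perf_counter() - start

        # Database route: one BLOB row per document, committed per write.
        conn = sqlite3.connect(os.path.join(tmp, "docs.db"))
        conn.execute(
            "CREATE TABLE doc (id INTEGER PRIMARY KEY, content BLOB NOT NULL)"
        )
        start = time.perf_counter()
        for i in range(iterations):
            with conn:
                conn.execute(
                    "INSERT INTO doc (id, content) VALUES (?, ?)", (i, payload)
                )
            row = conn.execute(
                "SELECT content FROM doc WHERE id = ?", (i,)
            ).fetchone()
            assert row[0] == payload
        db_seconds = time.perf_counter() - start
        conn.close()

    return {"filesystem": fs_seconds, "database": db_seconds}
```

    For example, time_roundtrips(sample_bytes, iterations=500) with sample_bytes read from a typical CV gives two timings to compare. Repeating the run with several document sizes shows whether the gap grows as the files get larger.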

    Regards,

    Scott Schecter
    EntitySpaces | My Site
  •  07-28-2008, 5:27 AM 10390 in reply to 10389

    Re: General advice please: Store docs in database or on filesystem?

    Hi Scott

    My original intention was that the app would be used internally by the recruitment consultancy themselves, and therefore the number of users hitting the db at the same time is unlikely to be high at this point.  Thinking about your question, though, it may be useful in the future to open up the resume matching/searching to their client base (or a subset of it) via a web app, and obviously if that's the case then the number of users could increase significantly.  Based on that assumption, I think it is well worth carrying out your suggestion of testing performance when storing in the DB, so thanks for the suggestion - it's made me think a little(!) more about the future needs of the app - much appreciated!

    Cheers

    Martin

  •  07-28-2008, 5:42 AM 10391 in reply to 10390

    Re: General advice please: Store docs in database or on filesystem?

    You are very welcome, Martin. Database storage would also be beneficial if you do make it a web app, as it saves you from having to secure the filesystem against direct URL access. Usually people tack a .resources extension onto the filename to prevent IIS from serving it up, but if you store the documents in the database then you would not have to worry about that. However, the trade-off is performance, and scalability should always be a concern if you think your application might grow in the future. Glad I could be of some help.

    Regards,

    Scott Schecter
    EntitySpaces | My Site