File Storage Design Document

by Kevin Scaldeferri

I. Essentials

II. Introduction

We have our own file-storage application because we want all users to be able to collaboratively maintain a set of documents. Specifically, users can save their files to our server so that they may:

We want something that is relatively secure and that any ArsDigita programmer can extend and maintain, i.e., something that requires only AOLserver TCL and Oracle skills.

In ACS 4, File Storage is implemented on top of the Content Repository. Thus, File Storage has almost no data model of its own. It is only a UI and a small set of TCL and PL/SQL library procedures. The actual storage and versioning are delegated to the Content Repository.

III. Historical Considerations

File Storage was created to provide a mechanism for non-technical users to collaborate on a wide range of documents, with minimum sysadmin overhead. Specifically, it allowed clients to exchange design documents (often MS Word, Adobe PDF, or other proprietary desktop file formats) that changed frequently without having to get bogged down by sifting through multiple versions.

IV. Competitive Analysis

Why is a file-storage application useful?

If you simply give everyone FTP access to a Web-accessible directory, you are running some big security risks. FTP is insecure and passwords are transmitted in the clear. A cracker might sniff a password, upload Perl scripts or ADP pages, then grab those URLs from a Web browser. The cracker is now executing arbitrary code on your server with all the privileges that you've given your Web server.

The File Storage application is not a web-based file system, and cannot be fairly compared against such systems. The role of File Storage is to provide a simple web location where users can share a versioned document. It offers little functionality for aggregate file administration (e.g., selecting all files of a given type, or searching through specified file types).

V. Design Tradeoffs

Folder Permissions

Previous versions of File Storage did not include folder permissions. (However, they did have a concept of private group trees.) The reasons for this were to simplify the code and the user experience. However, this system actually caused some confusion (e.g., explicitly granting permission to an outsider on a file in a group's private tree did not actually give that person access to the file) and was not as flexible as people desired. The ACS 4 version includes folder read, write and delete permissions.

Note that this can create some funny results. For example, a user might have write permission on a folder, but not on some of its parent folders. This can cause the select box provided for moving and copying files to look odd or misleading.

Deletion of Files

Previous versions of File Storage allowed only administrators to actually delete content (although users could mark content as "deleted" using a toggle in the data model, deleted_p). However, the proper use of versioning should allow users to avoid accidentally losing their files. So, in this version, if a person asks to delete a version or a file, we really delete it.

Use of Content Repository

Basing this system on the Content Repository provides a wealth of useful functionality for File Storage with no additional development costs. However, it may also constrain the system somewhat.

Currently, the only example is that cr_revisions does not have a column for the size of the content. This requires that we call dbms_lob.getlength on the content of each revision each time we display information on a file. No timing information on this is currently available, but subjectively the response is somewhat sluggish. This could be avoided by subtyping content_revision and adding a size column. Alternatively, this might be considered generally valuable enough to warrant a change to the Content Repository data model.
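The trade-off described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the Revision class and its size field are hypothetical stand-ins for a subtype of content_revision with an added size column, and len() stands in for dbms_lob.getlength. It is not the Content Repository's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Revision:
    """Hypothetical stand-in for a revision that stores its own size."""
    content: bytes
    # Filled in once when the revision is created, like a size column
    # populated at upload time.
    size: int = field(init=False)

    def __post_init__(self) -> None:
        self.size = len(self.content)


def listing_sizes_uncached(contents: List[bytes]) -> List[int]:
    # Analogous to calling dbms_lob.getlength on every page view:
    # each display measures every blob again.
    return [len(c) for c in contents]


def listing_sizes_cached(revisions: List[Revision]) -> List[int]:
    # Analogous to reading a stored size column: the content itself
    # is never touched when the listing is displayed.
    return [r.size for r in revisions]


revisions = [Revision(b"first draft"), Revision(b"second, longer draft")]
print(listing_sizes_cached(revisions))
```

Both functions return the same numbers; the difference is only where the measuring cost is paid, once at upload or again on every display.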

As mentioned in the previous paragraph, we do not subtype content_revision or content_folder. It is possible that this will cause problems in the future. In particular, URL surgery might enable people to do some funny stuff with other items in the Content Repository. However, appropriate use of the permissions system should prevent people from doing anything which they couldn't achieve through other means.

Permissions Design

Permissions were chosen to make as much use as possible of the predefined privileges while keeping the connotative value of each privilege clear. The permissions scheme is loosely modeled on Unix file permissions, with somewhat less overloading. In particular, we define a delete privilege rather than overloading the write permission. Also, execute privileges have no meaning in this context.

          Folder                    File                    Version
read      view and enter folder     view file information   view and download version
write     add new files / folders   upload new versions     -----
delete    delete folder             delete file             delete version
admin     modify permission grants and read, write and delete privileges

Some notes: the admin privilege implies the read, write and delete privileges. It may be the case that a user has delete permission on a folder or file, but not on some of its child items. This will block attempts to delete the parent item. Finally, the write permission does not have any meaning for versions.
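The privilege scheme above can be sketched as a lookup table in Python. This is an illustrative model only: the names IMPLIED, MEANING, effective_privileges and allowed_actions are hypothetical, and the real checks are made through the ACS permissions system rather than in application code like this.

```python
# Privileges implied by each granted privilege.
# Per the notes above, admin implies read, write and delete.
IMPLIED = {
    "read": {"read"},
    "write": {"write"},
    "delete": {"delete"},
    "admin": {"admin", "read", "write", "delete"},
}

# Meaning of each privilege per object type, taken from the table above.
# There is deliberately no ("write", "version") entry.
MEANING = {
    ("read", "folder"): "view and enter folder",
    ("read", "file"): "view file information",
    ("read", "version"): "view and download version",
    ("write", "folder"): "add new files / folders",
    ("write", "file"): "upload new versions",
    ("delete", "folder"): "delete folder",
    ("delete", "file"): "delete file",
    ("delete", "version"): "delete version",
}


def effective_privileges(granted):
    """Expand a set of granted privileges with everything they imply."""
    result = set()
    for privilege in granted:
        result |= IMPLIED[privilege]
    return result


def allowed_actions(granted, object_type):
    """List the actions the granted privileges permit on an object type."""
    return sorted(
        action
        for (privilege, kind), action in MEANING.items()
        if kind == object_type and privilege in effective_privileges(granted)
    )


print(allowed_actions({"admin"}, "version"))
```

In this model a user with only admin on a version can view, download and delete it, while write contributes nothing for versions.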

VI. API

For the most part, File Storage will simply use the Content Repository APIs.

PL/SQL API

File Storage is not intended to provide any public PL/SQL APIs. There are two internal PL/SQL functions, get_root_folder and new_root_folder, defined in the file_storage PL/SQL package.

TCL API

children_have_permission_p

children_have_permission_p [ -user_id user_id ] item_id privilege
This procedure, given a content item and a privilege, checks to see if there are any children of the item on which the user does not have that privilege.

Switches:
-user_id (optional)

Parameters:
item_id
privilege
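A rough Python model of the check performed by children_have_permission_p follows. Several things here are assumptions rather than documented behaviour: the Item class and its grants mapping are hypothetical, the walk covers all descendants rather than only direct children, and the function returns True when no child lacks the privilege.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Item:
    """Hypothetical content item with per-user privilege grants."""
    item_id: int
    children: List["Item"] = field(default_factory=list)
    # user_id -> set of privileges held on this item
    grants: Dict[int, Set[str]] = field(default_factory=dict)


def children_have_permission(item: Item, user_id: int, privilege: str) -> bool:
    """Return True if the user holds `privilege` on every descendant."""
    stack = list(item.children)
    while stack:
        child = stack.pop()
        if privilege not in child.grants.get(user_id, set()):
            return False  # found a child the user may not act on
        stack.extend(child.children)
    return True


report = Item(item_id=2, grants={7: {"read", "delete"}})
notes = Item(item_id=3, grants={7: {"read"}})
folder = Item(item_id=1, children=[report, notes], grants={7: {"read", "delete"}})

# User 7 may delete the folder itself but not every file inside it,
# so an attempt to delete the folder would be blocked.
print(children_have_permission(folder, 7, "delete"))
```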

fs_context_bar_list

fs_context_bar_list [ -final final ] item_id
Constructs the list to be fed to ad_context_bar appropriate for item_id. If -final is specified, that string will be the last item in the context bar. Otherwise, the name corresponding to item_id will be used.

Switches:
-final (optional)

Parameters:
item_id
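The way such a list might be assembled can be sketched in Python. Everything concrete here is an assumption made for illustration: the ITEMS dictionary, the URL pattern, the representation of links as (url, label) pairs, and the choice to turn item_id itself into a link when a final string is supplied. The source only states which string ends up last.

```python
# item_id -> (name, parent_id); a hypothetical in-memory folder tree.
ITEMS = {
    1: ("File Storage", None),
    2: ("Designs", 1),
    3: ("spec.doc", 2),
}


def context_bar_list(item_id, final=None):
    """Build [(url, label), ..., last_label] for the given item."""
    name, parent_id = ITEMS[item_id]
    # Without a final string, the item's own name closes the bar.
    last = final if final is not None else name
    # Assumption: with a final string, the item itself becomes a link.
    current = item_id if final is not None else parent_id

    crumbs = []
    while current is not None:
        crumb_name, crumb_parent = ITEMS[current]
        crumbs.append((f"item?item_id={current}", crumb_name))
        current = crumb_parent

    crumbs.reverse()  # root first, nearest ancestor last
    return crumbs + [last]


print(context_bar_list(3))
print(context_bar_list(2, final="Upload New File"))
```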

fs_file_downloader

fs_file_downloader conn key
Sends the requested file to the user. Note that the path has the original file name, so the browser will have a sensible name if you save the file. Version downloads are supported by looking for the form variable version_id. We don't actually check that the version_id matches the path; we just serve it up.

Parameters:
conn
key

fs_file_p

fs_file_p file_id
Returns 1 if the file_id corresponds to a file in the file-storage system. Returns 0 otherwise.

Parameters:
file_id

fs_folder_p

fs_folder_p folder_id
Returns 1 if the folder_id corresponds to a folder in the file-storage system. Returns 0 otherwise.

Parameters:
folder_id

fs_get_folder_name

fs_get_folder_name folder_id
Returns the name of a folder.

Parameters:
folder_id

fs_maybe_create_new_mime_type

fs_maybe_create_new_mime_type file_name
The content repository expects the MIME type to already be defined when you upload content. We use this procedure to add a new type when we encounter something we haven't seen before.

Parameters:
file_name
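The "register on first sight" idea behind this procedure can be sketched in Python. The registered_types set is a hypothetical stand-in for the repository's list of known MIME types, and guessing the type from the file name with Python's mimetypes module (with a generic fallback) is an assumption about how the type is chosen, not a description of the actual procedure.

```python
import mimetypes

# Hypothetical stand-in for the MIME types the repository already knows.
registered_types = {"text/plain", "text/html"}


def maybe_create_new_mime_type(file_name):
    """Guess the MIME type of file_name and register it if it is new."""
    guessed, _encoding = mimetypes.guess_type(file_name)
    # Fall back to a generic binary type when nothing can be guessed.
    mime_type = guessed or "application/octet-stream"
    if mime_type not in registered_types:
        registered_types.add(mime_type)  # first time we have seen this type
    return mime_type


print(maybe_create_new_mime_type("design-spec.pdf"))
print(sorted(registered_types))
```

Calling the function before storing an upload means the later insert never refers to a type the repository has not seen.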

fs_root_folder

fs_root_folder [ -package_id package_id ]
Returns the root folder for the file storage system.

Switches:
-package_id (optional)

fs_version_p

fs_version_p version_id
Returns 1 if the version_id corresponds to a version in the file-storage system. Returns 0 otherwise.

Parameters:
version_id

VII. Data Model Discussion

File Storage uses only the Content Repository data model. There is one additional table, fs_root_folders, which maps between package instances and the corresponding root folders in the Content Repository.
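The mapping can be pictured with a short Python sketch. The dictionary stands in for the fs_root_folders table and the function names mirror the two internal PL/SQL functions, but the bodies are assumptions: in particular, having get_root_folder create a folder when none exists yet, and the way folder ids are allocated, are guesses made for illustration.

```python
import itertools

# Stand-in for the fs_root_folders table: package_id -> root folder_id.
fs_root_folders = {}

# Hypothetical id generator; real ids would come from the Content Repository.
_next_folder_id = itertools.count(1000)


def new_root_folder(package_id):
    """Create a root folder for a package instance and record the mapping."""
    folder_id = next(_next_folder_id)
    fs_root_folders[package_id] = folder_id
    return folder_id


def get_root_folder(package_id):
    """Return the package's root folder, creating it on first use (assumed)."""
    if package_id not in fs_root_folders:
        return new_root_folder(package_id)
    return fs_root_folders[package_id]


# Two package instances get separate trees; repeated lookups are stable.
print(get_root_folder(42), get_root_folder(43), get_root_folder(42))
```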

VIII. User Interface

The user interface attempts to replicate the file system metaphors familiar to most computer users, with folders containing files. Adding files and adding folders are hyperlinked options, and a web form is used to handle the search function. Files and folders are presented with size, type, and modification date, alongside hyperlinks to the appropriate actions for a given file. Admin functions are presented alongside the normal user actions when appropriate.

IX. Configuration/Parameters

There is only one configuration parameter in this version of File Storage, the maximum size of uploaded files. All of the other parameters in previous versions have been made obsolete by ACS 4 features like site-nodes and templating.

X. Future Improvements/Areas of Likely Change

XI. Authors

  • System creator:
    3.x : David Hill and Aurelius Prochazka
    4.x : Kevin Scaldeferri
  • System owner
    Kevin Scaldeferri
  • Documentation author
    Kevin Scaldeferri

XII. Revision History

Document Revision #   Action Taken, Notes                            When?        By Whom?
0.1                   Creation                                       11/6/2000    Kevin Scaldeferri
0.2                   Revisions and Additions after Implementation   11/15/2000   Kevin Scaldeferri
0.2                   Revised after review by Josh                   11/16/2000   Kevin Scaldeferri, Josh Finkler


kevin@arsdigita.com