Index: openacs-4/packages/file-storage/www/doc/design.html
===================================================================
RCS file: /usr/local/cvsroot/openacs-4/packages/file-storage/www/doc/design.html,v
diff -u -r1.1 -r1.2
--- openacs-4/packages/file-storage/www/doc/design.html 20 Apr 2001 20:51:10 -0000 1.1
+++ openacs-4/packages/file-storage/www/doc/design.html 25 Sep 2001 19:24:01 -0000 1.2
@@ -6,8 +6,11 @@
+ File Storage was created to provide a mechanism for non-technical users to collaborate on a wide range of documents, with minimum sysadmin overhead. Specifically, it allowed clients to exchange @@ -110,23 +114,19 @@ useful functionality for File Storage with no additional development costs. However, it may also constrain the system somewhat. -
Currently, the only example is that cr_revisions
does not
-have a column for the size of the content. This requires that we call
-dbms_lob.getlength
on the content of each revision each
-time we display information on a file. No timing information on this
-is currently availible, but subjectively the response is somewhat
-sluggish. This could be avoided by subtyping content_revision and
-adding a size column. Alternatively, this might be considered
-generally valuable enough to warrant a change to the Content Repository
-data model.
+
The Content Repository's datamodel has been extended to include an
+attribute to store the filesize. Unfortunately, the Content Repository
+does not populate this attribute automatically, since files may be stored on the
+filesystem (the Content Repository thus serving as a catalog to keep
+track of file location and some metadata, but not the filesize). The
+filesize is therefore calculated whenever a file is inserted in the
+Content Repository by the external program (the webserver's database
+driver) doing the insertion into the database.
-
As mentioned in the previous paragraph, we do not subtype -content_revision or content_folder. It is possible that this will -cause problems in the future. In particular, URL surgery might enable -people to do some funny stuff with other items in the Content -Repository. However, appropriate use of the permissions system should -prevent people from doing anything which they couldn't achieve through -other means. +
The content_revision is subtyped as a "file-storage-item" to allow +site-wide search to distinguish file storage objects in its search +results. This feature is not implemented yet, however.
For the most part, File Storage will simply use the Content Repository APIs. +
For the most part, File Storage will provide wrappers to the +Content Repository APIs.
File Storage is not intended to provide any public PL/SQL APIs.
-There are two internal PL/SQL functions,
+
+The main objects of File Storage are "folders" and "files". A "folder"
+is analogous to a subdirectory in the Unix/Windows-world filesystem.
+Folder objects are stored as Content Repository folders, thus folders
+are stored "as is" in the Content Repository.
+
+
+
+"Files", however, can cause some confusion when stored in the Content
+Repository. A "file" in File Storage consists of meta-data, and
+possibly multiple versions of the file's contents. The main meta-data
+of a "file" is its "title", which is stored in the Content
+Repository's "name" attribute of the cr_items table. The "title" of a
+file should be unique within a subdirectory, although a directory may
+contain a file and a folder with the same "title".
+
+
+
+Each version of a file is stored as a revision in the cr_revisions table
+of the Content Repository. The Content Repository also allows some
+meta-data about a version to be stored in this table, and File
+Storage does use attributes of the cr_revisions table. However,
+this is where the confusion arises. The name of the file
+uploaded from the client's computer, as a version of the file, is
+stored in the "title" attribute of cr_revisions. Note that "title" is
+also the File Storage term for the (unique within a folder) identifier
+of the file stored in cr_items. Thus, wrappers to the Content Repository API
+make sure that the naming convention is correct: the cr_items.name
+attribute stores the title of a file and all its versions, while the
+cr_revisions.title attribute stores the filename of the version
+uploaded into the Content Repository.
+
+
+
+Meta-data about a version of a file stored in Content Repository are
+the size of the version (stored in cr_revisions.content_length) and
+version notes (stored in cr_revisions.description).
+
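The naming convention described above can be illustrated with a minimal sketch. It uses Python's sqlite3 module as a stand-in for Oracle, and the two tables are heavily simplified assumptions: only the columns named in the text (name, title, description, content_length) plus the keys needed to join them are modeled. The helper add_version and the sample titles and filenames are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cr_items (
    item_id   INTEGER PRIMARY KEY,
    parent_id INTEGER,
    name      TEXT NOT NULL            -- File Storage "title" of the file
);
CREATE TABLE cr_revisions (
    revision_id    INTEGER PRIMARY KEY,
    item_id        INTEGER NOT NULL REFERENCES cr_items(item_id),
    title          TEXT NOT NULL,      -- filename as uploaded by the client
    description    TEXT,               -- version notes
    content_length INTEGER NOT NULL    -- size, computed at insertion time
);
""")

def add_version(conn, item_id, filename, notes, content):
    """Hypothetical helper: store one uploaded version and record its size."""
    conn.execute(
        "INSERT INTO cr_revisions (item_id, title, description, content_length) "
        "VALUES (?, ?, ?, ?)",
        (item_id, filename, notes, len(content)),
    )

# One file titled "Quarterly report" with two uploaded versions.
item_id = conn.execute(
    "INSERT INTO cr_items (parent_id, name) VALUES (NULL, ?)",
    ("Quarterly report",),
).lastrowid
add_version(conn, item_id, "report-v1.doc", "first draft", b"draft")
add_version(conn, item_id, "report_final.doc", "final", b"final text")

rows = conn.execute(
    "SELECT i.name, r.title, r.content_length "
    "FROM cr_items i JOIN cr_revisions r ON r.item_id = i.item_id "
    "ORDER BY r.revision_id"
).fetchall()
```

Every row carries the same cr_items.name (the file's title), while cr_revisions.title varies with the filename of each upload.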
+
+
+There are two internal PL/SQL functions that do not call the Content
+Repository API, however:
@@ -356,7 +402,56 @@
package instances and the corresponding root folders in the Content
Repository.
+
+A row is inserted into the table fs_root_folders the first time
+the package instance is visited. The reason is that there is no
+facility in APM to insert a row in the database every time a package
+instance is created (technically, there is no "on insert" trigger
+imposed by APM on Content Repository, since they are separate packages
+even though they are both part of the core). The solution to this
+deficiency is a bit hack-ish, but seems to be the only solution
+available (unless APM allows trigger functions to be registered, to be
+called at package instance creation). Whenever the package instance is
+visited, it calls a PL/SQL function that calculates the "root
+folder" of the File Storage. If this function detects that there is no
+"root folder" yet for this instance (as would be the case when the
+instance is first visited), it inserts the package id and a unique
+folder_id into the fs_root_folders table to serve as the root folder
+identifier. It also inserts meta-data information about this folder
+in the cr_items table. Finally, it returns the newly created folder
+identifier as the root folder for this package instance. Subsequent
+visits to the package instance will detect the root folder, and will
+then return the root folder identifier.
+
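The lazy creation of the root folder can be sketched as a get-or-create lookup. This is an illustration only, written in Python with sqlite3 rather than the package's PL/SQL; the table definitions are simplified assumptions, and the generated folder name is hypothetical.

```python
import sqlite3

def get_root_folder(conn, package_id):
    """Return the root folder id for a package instance, creating it on first visit."""
    row = conn.execute(
        "SELECT folder_id FROM fs_root_folders WHERE package_id = ?",
        (package_id,),
    ).fetchone()
    if row is not None:
        return row[0]  # subsequent visits: the mapping already exists

    # First visit: record the folder's meta-data, then map it to the package.
    folder_id = conn.execute(
        "INSERT INTO cr_items (parent_id, name) VALUES (NULL, ?)",
        (f"file-storage-root-{package_id}",),  # hypothetical naming scheme
    ).lastrowid
    conn.execute(
        "INSERT INTO fs_root_folders (package_id, folder_id) VALUES (?, ?)",
        (package_id, folder_id),
    )
    conn.commit()
    return folder_id

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cr_items (
    item_id   INTEGER PRIMARY KEY,
    parent_id INTEGER,
    name      TEXT NOT NULL
);
CREATE TABLE fs_root_folders (
    package_id INTEGER PRIMARY KEY,
    folder_id  INTEGER NOT NULL UNIQUE REFERENCES cr_items(item_id)
);
""")

first_visit = get_root_folder(conn, 101)
second_visit = get_root_folder(conn, 101)
mapping_count = conn.execute("SELECT COUNT(*) FROM fs_root_folders").fetchone()[0]
```

Calling the function twice for the same package id returns the same folder id and leaves a single row in fs_root_folders.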
+
+
+There is an "on delete cascade" constraint imposed on the package_id
+attribute of fs_root_folders. The reason for this is that whenever the
+package instance is deleted by the site administrator, it
+automatically deletes the mapping between APM and the Content
+Repository (i.e., the package identifier and the root folder
+identifier), and presumably the particular instance of File Storage.
+Unfortunately this has an undesirable effect. There is no
+corresponding "on delete cascade" on the Content Repository objects,
+so deleting the root folder does not delete everything under
+the root folder. Left on its own, the "on delete cascade" on the
+package identifier attribute of fs_root_folders will cause all objects
+belonging to the deleted instance of File Storage to be orphaned in
+the database, since the root folder is the crucial link from which all
+content is referenced!
+
+
+
+The solution is (hopefully) more elegant: a "before delete"
+trigger that first cleans up all contents under the root folder
+identifier before the root folder identifier is deleted by APM. This
+trigger walks through all the contents of the instance of File
+Storage, and starts deleting from the "leaves" or end nodes of the
+file tree up to the root folder. Later improvements in Content
+Repository will allow archiving of the contents instead of actually
+deleting them from the database.
+
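The leaf-first cleanup performed by the trigger can be sketched as a post-order walk of the folder tree. This is not the trigger itself: it is a Python/sqlite3 illustration over a simplified, assumed cr_items table, with foreign keys enabled so that deleting a parent before its children would fail. The function name delete_subtree and the sample items are hypothetical.

```python
import sqlite3

def delete_subtree(conn, item_id, deleted):
    """Delete item_id and everything beneath it, children before parents."""
    children = conn.execute(
        "SELECT item_id FROM cr_items WHERE parent_id = ? ORDER BY item_id",
        (item_id,),
    ).fetchall()
    for (child_id,) in children:
        delete_subtree(conn, child_id, deleted)
    # All descendants are gone, so this item is now a leaf and can be removed.
    conn.execute("DELETE FROM cr_items WHERE item_id = ?", (item_id,))
    deleted.append(item_id)

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
CREATE TABLE cr_items (
    item_id   INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES cr_items(item_id),
    name      TEXT NOT NULL
)
""")
conn.executemany(
    "INSERT INTO cr_items VALUES (?, ?, ?)",
    [
        (1, None, "root"),       # root folder of the package instance
        (2, 1, "docs"),          # sub-folder
        (3, 2, "spec.txt"),      # file inside the sub-folder
        (4, 1, "notes.txt"),     # file directly under the root
    ],
)

deleted = []
delete_subtree(conn, 1, deleted)
conn.commit()
remaining = conn.execute("SELECT COUNT(*) FROM cr_items").fetchone()[0]
```

The list deleted records the order of removal: the deepest items go first and the root folder goes last, so no item is ever left without a parent.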
@@ -372,12 +467,41 @@
-There is only one configuration parameter in this version of
-File Storage, the maximum size of uploaded files. All of the other
-parameters in previous versions have been made obsolete by ACS 4
-features like site-nodes and templating.
+There are two configuration parameters in this version of File
+Storage. The first parameter, MaximumFileSize, is the maximum
+size of uploaded files, which should be self-explanatory. The other
+parameter is a flag that indicates to the package whether files are
+stored in the database or in the webserver's filesystem. This second
+parameter, StoreFilesInDatabaseP, uses the new capability in
+Content Repository to use the Content Repository as a mere catalog to
+store file information while the actual file contents are stored in
+the webserver's filesystem. Note that when files are stored in the
+filesystem, backups of the database will only store the catalog, but
+not the contents. Thus, it is important for the site administrator to
+back up the entire directory containing the Content Repository files (in
+particular, pageroot/content-repository-content-files) when
+storing files in the filesystem.
+
+
+When a file is stored in the Content Repository, File Storage first queries the
+parameter StoreFilesInDatabaseP to determine how the new file
+will be stored. Thus, it is important that this parameter be
+changed only at package instance creation, or before any operation
+that uploads a file into Content Repository. Otherwise, the package
+instance will have files of different storage types, depending on the
+value of the parameter at the time the file is uploaded. Although all
+functionality provided by File Storage will continue to work (copy,
+move, delete, etc.), backing up the contents will be more complicated
+if the parameter is changed.
+
+
+
+All of the other parameters in previous versions have been made
+obsolete by ACS 4 features like site-nodes and templating.
+
+
get_root_folder
-and new_root_folder
, defined in the File Storage provides public PL/SQL APIs either as wrappers to the
+Content Repository API, or more involved functions that call multiple
+Content Repository PL/SQL functions. One reason for doing this is to
+abstract from the Content Repository datamodel and naming conventions,
+due to the different way File Storage labels its objects.
+
+get_root_folder
and
+new_root_folder
, defined in the file_storage
PL/SQL package
@@ -346,8 +394,6 @@
-
-
VII. Data Model Discussion
VIII. User Interface
IX. Configuration/Parameters
X. Future Improvements/Areas of Likely Change