Let's say you're a developer making a package for OpenACS. +You've heard statements like, "every package should use the content +repository", or maybe a developer has suggested that you use it. Or +maybe you just stumbled across it. Why would you want to spend your +time reading this document and wasting a good afternoon when you +could get started coding right away?
The simple answer is that the content repository (CR) gives you +many different things for free:
For the sake of an example, I'm going to use a Tasks +application. This application will keep track of all the tasks of +an organization, deadlines for those tasks, and who needs to work +on them.
The reason I might be interested in using the content repository +(CR) in this case is that I can keep track of all the changes to a +particular Task, so that if someone changes the deadline on an +item, I can see what the original deadline was. In addition, I as a +developer would like to have sub-tasks, and sub-sub-tasks, so I can +have a whole hierarchy of things that need to be accomplished. Big +tasks can be sub-divided so that several people can each do their +particular parts.
So I decide to create a Tasks table. Each of these Tasks has +various information associated with it, such as deadlines, +descriptions of what needs to be accomplished, and so on:
+Task + Title + Description + Task Number + +
First of all, let's get some terminology out of the way. Columns +of a table are referred to as attributes in content +repository-speak.
+The steps to set up your data model are as follows: +The first step is to decide on what part of a Task you'd you'd +like to have under revision control, and what portion you'd like to +have just one version of. In our case, the only thing we wouldn't +want under version control is the Task Number. This will be a +unique identifier for the task, and we don't want that changing +every time someone edits it.
For our simple example:
+ Title - want versions + Description - want versions + Task Number - do NOT want versions + +
You will have two tables: one with versioned attributes, and one +without versioned attributes.
+Convention: often, developers will name the first table +by what it is (in my case pm_tasks), and the second, +versioned table by the same name, but with _revisions at the end. +Thus, I'll name my second table pm_tasks_revisions.
+This is actually very easy: +Versioned portion:
+ create table pm_tasks_revisions ( + task_revision_id + integer + constraint pm_tasks_revisions_id_pk + primary key + constraint pm_tasks_revisions_id_fk + references cr_revisions(revision_id) + on delete cascade, + title + varchar(100), + description + varchar(4000) + ); + +
Unversioned portion:
+ create table pm_tasks ( + task_id + integer + constraint pm_tasks_id_pk + primary key + constraint pm_tasks_id_fk + references cr_items(item_id) + on delete cascade, + task_number + integer + ) + +
One thing you have to be careful of when creating these tables
+is that there are no columns that have the same names as any of the
+columns in the cr_items
and cr_revisions
+tables. For example, you can't call you key on the
+pm_tasks_revisions table revision_id
. Why? There are
+some views that are automatically generated that combine these
+tables for you, but they won't be created if the names conflict.
+I'll describe what these views are later, but they are useful. You
+were warned.
Notice that each table uses as its primary key a reference to
+either the cr_revisions
table or the
+cr_items
table. A content item is basically
+just some content: either text or binary data. The contents
+revisions table keeps track of which version from the
+tasks_revisions table is the most current, and which one is
+live.
All this is going inside the
+sql/postgresql/project-manager-create.sql
file. Your
+name will be different of course.
After we've created the two tables, we need to let the content +repository know that we have a new type of structured data that we +are storing in the content repository. Tasks are a "content type", +because they have data associated with them, such as when they are +due, and what needs to be done.
I thus need to to
+ --create the content type + select content_type__create_type ( + 'pm_task', -- content_type + 'content_revision', -- not sure what this is + 'Task', -- pretty_name + 'Tasks', -- pretty_plural + 'pm_tasks_revisions', -- table name + 'task_id', -- id_column + 'content_revision.revision_name' + ); + +
You then need to add in all the attributes, so that the content +repository can do some magic things behind the scenes. The content +repository doesn't know about what's inside of the pm_tasks +and pm_tasks_revisions tables, so we teach it:
+ -- add in attributes + + select content_type__create_attribute ( + 'pm_task', -- content_type + 'start_date', -- attribute_name + 'date', -- datatype (string, number, boolean, date, keyword, integer) + 'Start date', -- pretty_name + 'Start dates', -- pretty_plural + null, -- sort_order + null, -- default value + 'timestamptz' -- column_spec + ); + + select content_type__create_attribute ( + 'pm_task', -- content_type + 'end_date', -- attribute_name + 'date', -- datatype + 'End date', -- pretty_name + 'End dates', -- pretty_plural + null, -- sort_order + null, -- default value + 'timestamptz' -- column_spec + ); + + select content_type__create_attribute ( + 'pm_task', -- content_type + 'percent_complete', -- attribute_name + 'number', -- datatype + 'Percent complete', -- pretty_name + 'Percents complete', -- pretty_plural + null, -- sort_order + null, -- default value + 'numeric' -- column_spec + ); + +
+Side effect: once you've created the content type, the
+content repository creates a view for you called
+pm_tasks_revisionsx
. Note the x at the end of the
+name. If you're using Postgres, I believe it will also create a
+view for you called pm_tasks_revisionsi
+
Why are these two views created? the x view is created for +selection, and the i view is created for inserts. They join the +acs_objects, cr_revisions, and our pm_tasks_revisions tables +together. Try viewing them to get an idea of how they might be +useful.
It is also possible to dynamically create tables, and extend
+them with extra columns. You could do this by using create
+table
or alter table add column
statements in
+SQL, but this also adds in some meta-data that will be useful to
+you. The disadvantage is that you have to call the content
+repository API. The advantage is that someday you'll be able to do
+really cool stuff with it, like automatically generate interfaces
+that take advantage of the new columns and tables you've added.
+Another nice thing is that all that messy business of defining your
+attributes through the API is taken care of.
+Types is the content repository are another term for
+tables, although that doesn't explain it completely. Types are also
+kept track of within OpenACS, in the acs_object_types
+table, so the system knows about the tables you create, and can do
+some intelligent things with them.
A lot of the intelligent things you can do with this +information is still being built. But imagine for example that you +are using the project manager package I've written. You work at an +ice cream company, and every task that is done also has an +associated ice cream flavor with it (yeah, this isn't a good +example, but pay attention anyway). If I've written the project +manager to take advantage of it, when you add in this extra +attribute to the pm_tasks_revisions table, the UI aspects will be +automatically taken care of. You'll be able to select a flavor when +you edit a task, and it will be shown on the task view page. This +is the direction OpenACS development is going, and it will be +really really cool!
First, I'm going to describe how to extend other content +repository tables using the CR API. Then, I'll describe how to set +up your own tables as well:
As you recall from earlier in this page, attributes are just
+another term for columns in a table. The Content Repository has a
+mechanism for adding and removing columns via the pl/sql API. If
+you check your /api-doc:
+/api-doc/plsql-subprogram-one?type=FUNCTION&name=content%5ftype%5f%5fcreate%5fattribute
+, you'll see that there is a way to extend the columns
+programmatically.
Why would you want to do this? For project manager, I decided to +do this because I wanted to customize my local version of the +projects table, to account for company-specific information. That +way, I can have a separate edit page for those types, but not have +a separate table to join against.
+. Instead of doing this: ++I can do this: ++alter table pm_projects add column + target_date date; +
+A very important advantage of this method is that it recreates all +the views associated with the pm_projects table, like pm_projectsx. +If I did an alter table statement, all the views would not contain +the new column. Note that I believe you CAN create foreign key +constraints, by putting them in the column spec (the last column): ++select content_type__create_attribute( + 'pm_project', + 'target_date', + 'date', + 'Target date', + 'Target dates', + null, + null, + 'date' +); +
+I have no idea of whether or not that is supposed to be legal, but +I believe it works. Jun was the one who originally talked about +the +possibility of storing all the revisioned columns in a generic +table. ++select content_type__create_attribute( + 'pm_project', + 'company_id', + 'integer', + 'Company', + 'Companies', + null, + null, + 'integer constraint pm_project_comp_fk references organizations' +); +
pm_task__new_task_item
function (which we'll
+write in a little bit). This function creates both a new content
+item, and a new content revision. Information is actually stored in
+four tables, believe it or not: cr_revisions
,
+cr_items
, pm_tasks
, and
+pm_tasks_revisions
. The task number is stored in
+pm_tasks, the title and description are stored in
+pm_tasks_revisions, and some additional information like who
+entered the information is stored in cr_revisions and cr_items.
+Whenever you make a change to this item, you don't change the table
+yourself, but add a revision, using your
+pm_task__new_task_revision
function (which we'll write
+in a little bit). This function adds another revision, but
+not another item or cr_item. After you've added another
+revision, you'll have two revisions and one item. Two entries in
+cr_revisions (and pm_tasks_revisions), and one item in cr_items and
+pm_tasks. The cr_revisions table keeps track of which item is the
+most recent, and which item is "live". For the edit-this-page
+application, for example, this is used to keep track of which
+revision to a page is actually being served to users. In your code,
+you'll use your pm_tasks_revisionsx view, which joins the
+pm_tasks_revisions table with the cr_revisions table (and it might
+even join in cr_items -- I forget at the moment).
++select define_function_args('pm_task__new_task_item', 'task_id, project_id, title, description, end_date, percent_complete, estimated_hours_work, estimated_hours_work_min, estimated_hours_work_max, creation_date, creation_user, creation_ip, package_id'); + +create function pm_task__new_task_item ( + p_task_id integer, + p_project_id integer, + p_title varchar, + p_description varchar, + p_end_date timestamptz, + p_percent_complete numeric, + p_estimated_hours_work numeric, + p_estimated_hours_work_min numeric, + p_estimated_hours_work_max numeric, + p_creation_date timestamptz, + p_creation_user integer, + p_creation_ip varchar, + p_package_id integer +) returns integer +as $$ +declare + v_item_id cr_items.item_id%TYPE; + v_revision_id cr_revisions.revision_id%TYPE; + v_id cr_items.item_id%TYPE; + v_task_number integer; +begin + select acs_object_id_seq.nextval into v_id from dual; + + -- We want to put the task under the project item + + -- create the task_number + + v_item_id := content_item__new ( + v_id::varchar, -- name + p_project_id, -- parent_id + v_id, -- item_id + null, -- locale + now(), -- creation_date + p_creation_user, -- creation_user + p_package_id, -- context_id + p_creation_ip, -- creation_ip + 'content_item', -- item_subtype + 'pm_task', -- content_type + p_title, -- title + p_description, -- description + 'text/plain', -- mime_type + null, -- nls_language + null -- data + ); + + v_revision_id := content_revision__new ( + p_title, -- title + p_description, -- description + now(), -- publish_date + 'text/plain', -- mime_type + NULL, -- nls_language + NULL, -- data + v_item_id, -- item_id + NULL, -- revision_id + now(), -- creation_date + p_creation_user, -- creation_user + p_creation_ip -- creation_ip + ); + + PERFORM content_item__set_live_revision (v_revision_id); + + insert into pm_tasks ( + task_id, task_number) + values ( + v_item_id, v_task_number); + + insert into pm_tasks_revisions ( + task_revision_id, end_date, percent_complete, estimated_hours_work, estimated_hours_work_min, estimated_hours_work_max, actual_hours_worked) + values ( + v_revision_id, p_end_date, p_percent_complete, p_estimated_hours_work, p_estimated_hours_work_min, p_estimated_hours_work_max, '0'); + + PERFORM acs_permission__grant_permission( + v_revision_id, + p_creation_user, + 'admin' + ); + + return v_revision_id; +end; +$$ language plpgsql; + + +select define_function_args('pm_task__new_task_revision', 'task_id, project_id, title, description, end_date, percent_complete, estimated_hours_work, estimated_hours_work_min, estimated_hours_work_max, actual_hours_worked, creation_date, creation_user, creation_ip, package_id'); + +create function pm_task__new_task_revision ( + p_task_id integer, -- the item_id + p_project_id integer, + p_title varchar, + p_description varchar, + p_end_date timestamptz, + p_percent_complete numeric, + p_estimated_hours_work numeric, + p_estimated_hours_work_min numeric, + p_estimated_hours_work_max numeric, + p_actual_hours_worked numeric, + p_creation_date timestamptz, + p_creation_user integer, + p_creation_ip varchar, + p_package_id integer +) returns integer +as $$ +declare + v_revision_id cr_revisions.revision_id%TYPE; + v_id cr_items.item_id%TYPE; +begin + select acs_object_id_seq.nextval into v_id from dual; + + -- We want to put the task under the project item + + v_revision_id := content_revision__new ( + p_title, -- title + p_description, -- description + now(), -- publish_date + 'text/plain', -- mime_type + NULL, -- nls_language + NULL, -- data + p_task_id, -- item_id + NULL, -- revision_id + now(), -- creation_date + p_creation_user, -- creation_user + p_creation_ip -- creation_ip + ); + + PERFORM content_item__set_live_revision (v_revision_id); + + insert into pm_tasks_revisions ( + task_revision_id, end_date, percent_complete, estimated_hours_work, estimated_hours_work_min, estimated_hours_work_max, actual_hours_worked) + values ( + v_revision_id, p_end_date, p_percent_complete, p_estimated_hours_work, p_estimated_hours_work_min, p_estimated_hours_work_max, p_actual_hours_worked); + + PERFORM acs_permission__grant_permission( + v_revision_id, + p_creation_user, + 'admin' + ); + + return v_revision_id; +end; +$$ language plpgsql; + + +-- The delete function deletes a record and all related overhead. + +select define_function_args('pm_task__delete_task_item', 'task_id'); + +create or replace function pm_task__delete_task_item (p_task_id integer) +returns integer as $$ +declare +begin + delete from pm_tasks_revisions + where task_revision_id in (select revision_id from pm_tasks_revisionsx where item_id = p_task_id); + + delete from pm_tasks + where task_id = p_task_id; + + raise NOTICE 'Deleting pm_task...'; + + PERFORM content_item__delete(p_task_id); + return 0; +end; +$$ language plpgsql; +
+item_id - unique id for this item, will be +different than the revision_id+cr_revisions: +
parent_id - used to group items into a hierarchy (see +below)
name - this is used to make a URL by the content repository. +It must be unique per content folder. You can use a number, or +something like project_231. One way to do this is to set it equal +to a title plus the item_id.
locale - not sure, probably for internationalization +support
live_revision - this is equal to the cr_revision table's +revision_id that is the live version
latest_revision - this is equal to the cr_revision table's +revision_id that is the latest version
publish_status - not sure
content_type - not sure
storage_type - not sure, probably text or binary?
storage_area_key - not sure
tree_sortkey - a utility column used in hierarchical +queries.
+
+revision_id - a unique id for this revision.
item_id - a reference to the item_id for this revision
title - you can use this for your application. For example, +My Big Project
description - you can use this for your application, as a +longer description.
publish_date - the date this was published. Not sure if this +is for your use, or internal
mime_type - the mime type.
nls_language - I believe this is for +internationalization
lob - the binary content.
content - the text content.
content_length - the length of the text or binary +content?
+
Using this structure is optional, but useful in many +circumstances.
The facility for this is built into the cr_items
+data model. This makes sense, because you wouldn't want your
+hierarchy associated with each revision. Here's how Postgres
+describes the cr_items
table:
+The+ Table "public.cr_items" + Column | Type | Modifiers +------------------+------------------------+----------------------------- + item_id | integer | not null + parent_id | integer | not null + name | character varying(400) | not null + locale | character varying(4) | + live_revision | integer | + latest_revision | integer | + publish_status | character varying(40) | + content_type | character varying(100) | + storage_type | character varying(10) | not null default 'text' + storage_area_key | character varying(100) | not null default 'CR_FILES' + tree_sortkey | bit varying | +
parent_id
refers to either a content item
+(cr_items
), or a subclass of a content_item (such as
+cr_folders
). I'll explain more later about
+cr_folders
.
+One thing that you might want to do for your application is to +give the application its own root directory. Because the content +repository is shared among applications, this separates it off from +other applications. They can still use the items in your +application, but it must be a more deliberate process. If you don't +create your own root directory, you may see strange-looking data +from other applications in your application, or see your +application's data in other applications. There are times when +you'll want to do this, but probably not until you're much more +familiar with the content repository. Another reason for creating +your own root repository is that you application may be mounted +several times. If you want to separate the directory structure +between instances of your application, you need to create your own +root directory:
+Note that this example is for projects rather than tasks. This is +because for the application I'm writing, projects are what tasks +are stored inside of. A project has many component tasks. If you +were writing another application, or if I wasn't doing anythign +with projects, then this would be creating a folder for just tasks. ++-- Creates and returns a unique name for new project folders + +select define_function_args('pm_project__new_unique_name', 'package_id'); + +create function pm_project__new_unique_name (p_package_id integer) +returns text as $$ +declare + v_name cr_items.name%TYPE; + v_package_key apm_packages.package_key%TYPE; + v_id integer; +begin + select package_key into v_package_key from apm_packages + where package_id = p_package_id; + + select acs_object_id_seq.nextval into v_id from dual; + + -- Set the name + select v_package_key || '_' || + to_char(current_timestamp, 'YYYYMMDD') || '_' || + v_id into v_name; + + return v_name; +end; +$$ language plpgsql; + + +select define_function_args('pm_project__new_root_folder', 'package_id'); + +create function pm_project__new_root_folder (p_package_id integer) +returns integer as $$ +declare + v_folder_id cr_folders.folder_id%TYPE; + v_folder_name cr_items.name%TYPE; +begin + -- Set the folder name + v_folder_name := pm_project__new_unique_name (p_package_id); + + v_folder_id := content_folder__new ( + v_folder_name, -- name + 'Projects', -- label + 'Project Repository', -- description + p_package_id -- parent_id + ); + + -- Register the standard content types + PERFORM content_folder__register_content_type ( + v_folder_id, -- folder_id + 'pm_project', -- content_type + 'f' -- include_subtypes + ); + + -- TODO: Handle Permissions here for this folder. + + return v_folder_id; +end; +$$ language plpgsql; +
Typically, this definition would go in your
+sql/postgresql/project-manager-create.sql
file. If
+this file is broken in several parts, this would go in the
+project-manager-create-functions.sql portion.
Once you've created your root directory, you will set the
+parent_id
of your items to the id for the new root
+repository (in our case, it's returned from the
+pm_project__new_root_folder function
)
In the project-manager application, we'll create a root
+repository, and make all projects under that root repository. That
+means they'll all have a parent_id
set to the root
+repository. However, we also want to make projects that are
+sub-projects of other projects. In that case, we will set the
+parent_id
of the sub-project to the
+item_id
of the parent.
cr_items
, and the only real difference is that they
+contain no data, except for a label and description.
+If you create folders for your application, then you'll need to +make sure you manage them along with your other objects. For +example, if you were to add a folder for each of your objects, then +you would probably want to make sure you delete the folder when you +delete the object.
However, in many cases you are not creating more than one +folder. In fact, the only folder you might have will be the root +folder you create for each instance of your application (if you +install the project-manager in two parts of your web server, for +example, it should have two different root folders). When your +application is running, it can determine the root folder by +searching the cr_folders table. Here's the definition of that +table:
+Note that there is a+ Table "public.cr_folders" + Column | Type | Modifiers +--------------------+-------------------------+------------- + folder_id | integer | not null + label | character varying(1000) | + description | text | + has_child_folders | boolean | default 'f' + has_child_symlinks | boolean | default 'f' + package_id | integer | +
package_id
column. The nice thing
+about this column is that you can use it to find the root
+repository, if you only have one folder per instance of your
+application. You can get your package_id using this call within
+your .tcl file:
++Then you can find the root repository by using a query like this: ++set package_id [ad_conn package_id] +
+select folder_id from cr_folders where package_id = :package_id; +
Get latest revision (Tcl) | set live_revision_id [db_exec_plsql get_live_revision "select +content_item__get_live_revision(:item_id)"] | +
---|---|
Get latest revision (pl/sql) | live_revision_id := +content_item__get_live_revision(:item_id); | +
The item_id identifies the content item with which the revision +is associated.
Likewise, the most recent revision of a content item can be +obtained with the content_item__get_latest_revision function
cr_revisions
table, and a table that
+you set up to store specialized data. In this case, the title (I
+think), byline, and publication date would be stored in a
+specialized table.+The problem seems to be that there were still items in the ++Referential Integrity: attempting to delete live_revision: 658 +
cr_items
table. You can remove them using select
+content_item__delete(648);
in psql. You get the codes by
+doing a query like this:
++Really, however, what you need to do is make sure your __delete and +drop scripts first go through and delete all children of those +items. I'm not sure if you need to delete the items themselves -- I +believe they may be dropped by themselves when the tables are +dropped, because of the+select i.item_id, r.revision_id, r.title, i.content_type from cr_items i, cr_revisions r where i.item_id = r.item_id order by i.item_id, r.revision_id; +
cascade
portion of the SQL
+data model.
+When I was troubleshooting folders, I found this query +useful:
+select f.folder_id,f.label,f.description,i.content_type from cr_folders f, cr_items i where f.folder_id = i.item_id; +
Once again, thanks to daveb for help in tracking this down (he +rocks!).
+