Project Open Custom E-Mail Notifications - Part 3

In parts 1 and 2, I presented the queries to pull individual workflow steps and closed items out of Project Open. The results of those queries represented the raw data we'd need to send e-mails. Trouble is, every time we'd run the queries, we'd run the risk of sending and resending and resending e-mails. That could become problematic. Folks could unfairly blame Project Open for inundating their inboxes!

Obviously, we need a mechanism to track what e-mails we've already sent. If you remember the queries from the last two articles, you'll remember that the column task_id uniquely identified a given task. Because of how the queries join tables, it appears that the task_id can appear multiple times. But in reality, the task_id uniquely identifies a task. That means we need a mechanism to know for what task_ids we've sent e-mails.

I can think of three realistic ways to approach this. I'm going to present them so you can see the rationale I bring to the table any time I "extend" Project Open (or any other open source project).

  1. Extend Project Open: I could use Concurrent Version System (CVS) to checkout an appropriate version of Project Open and upgrade the SQL, tcl, and other source code to customize the notification subsystem. I could then offer the code back to the project for consideration as part of the code base. In this case, I didn't think this was a good option for two reasons. First, I don't know tcl well enough to inflict my code on the project team. They have families and lives -- they don't need that kind of horror! Second -- and more importantly -- Project Open already has a robust notification system. I just couldn't figure it out. I decided against this option.
  2. Add a column to wf_tasks: I could use an ALTER TABLE statement to add a column directly to wf_tasks. It could be a simple integer, and I could programmatically set the value to 1 after I send the e-mail. This would be simple to implement and use, but I decided against it. I don't like to directly manipulate a table whose structure affects a lot of downstream processes -- processes I don't fully understand. In other words, I'm afraid I'd screw something up, and then I'd be embarrassed and the people who depend on me to keep things running would be cross.
  3. Add an independent table: I could add a table with a column to hold the task_ids for which I've sent e-mails. It's easy to setup and if I keep it small (and well-indexed), it should have no impact on the application of PostgreSQL performance. I chose this option.

Even though I think creating a small table is a safe option, I still approach the task with respect for the overall system. Project Open is a huge application, and one false move on my part could have unintended consequences. I can say that I've used this e-mail solution for some time now, and it appears to have no negative impact on the system.

Once the table's in place, I can adjust the query to show open workflow steps to look like this:


SELECT a.task_id, a.workflow_key, a.transition_key, b.*, c.description, d.transition_name, acs_object__name(e.user_id) AS thename,
f.email
FROM wf_tasks a, wf_cases b, wf_workflows c, wf_transitions d, wf_user_tasks e, parties f
WHERE b.case_id = a.case_id
AND c.workflow_key = a.workflow_key
AND d.workflow_key = a.workflow_key
AND d.transition_key = a.transition_key
AND e.task_id = a.task_id
AND f.party_id = e.user_id
AND a.state = 'enabled'
AND a.task_id NOT IN (SELECT pns_task_id FROM pona_notifications_sent)
ORDER BY task_id DESC

The same query for closed items could look like this:


SELECT a.task_id, a.case_id, b.object_id, b.state, c.description, d.transition_name, f.email
FROM wf_tasks a, wf_cases b, wf_workflows c, wf_transitions d, im_tickets e, parties f
WHERE b.case_id = a.case_id
AND c.workflow_key = a.workflow_key
AND d.workflow_key = a.workflow_key
AND d.transition_key = a.transition_key
AND a.transition_key LIKE '%close%'
AND e.ticket_id = b.object_id
AND f.party_id = e.ticket_customer_contact_id
AND a.task_id NOT IN (SELECT pns_task_id FROM pona_notifications_sent)
ORDER BY task_id DESC

Now, I have four things to do:

  1. Create the table
  2. Create an index
  3. Populate the existing enabled items
  4. Populate the existing closed items

I created the table with this script:


CREATE TABLE pona_notifications_sent
(
pns_task_id INTEGER UNIQUE NOT NULL,
pns_when_sent DATE NOT NULL

);

It's a simple table to minimize the performance impact. I needed at least a column to hold the task_id; I added a date/time column so I had a little diagnostic information if someone complained about missing e-mails.

This script created the index:


CREATE UNIQUE INDEX pona_notifications_sent_ndx01 ON pona_notifications_sent (pns_task_id);

If I didn't have this index, after the first couple hundred thousand e-mails, I'm pretty sure performance (at least for the e-mail application, and potentially for all of PostgreSQL -- table scans are evil!) would perform terribly.

I found out during my first round of Unit Integration Testing (UIT) that I have to first mark the existing workflow tasks and closed items as having been sent. Otherwise, I end up with several hundred e-mails in my test system! I used this query to populate the open workflow steps:


INSERT INTO pona_notifications_sent
SELECT a.task_id, NOW()
FROM wf_tasks a, wf_cases b
WHERE b.case_id = a.case_id
AND a.state = 'enabled';

Finally, I used this query to populate the closed workflow steps:


INSERT INTO pona_notifications_sent
SELECT a.task_id, NOW()
FROM wf_tasks a, wf_cases b
WHERE b.case_id = a.case_id
AND a.transition_key LIKE '%close%'
AND a.task_id NOT IN (SELECT pns_task_id FROM pona_notifications_sent);

I had to add the NOT IN clause because I found. much to my chagrin, that I didn't follow my own advice about better living through standards when I built my early workflows. Some of them had items with an "enabled" status AND a transition_key that contained the word "close". By fashioning the INSERT statement this way, I didn't try to insert a value that violated my unique key index.

At this point, I have all of the Data Manipulation Language (DML) to process e-mail notifications. In the next part of this series, I'll present the program that'll actually send the e-mails.