make naming more consistent

differentiate between sent and received intra-server messages

added multiple delivery methods to intra-server talk

Here is some background information for my experiments with the delivery methods.

For this experiment, I compared five different means for this kind of communication:

- ns_http over HTTP (the standard setup, which is used in OpenACS 5.10)
- ns_http over HTTPS
- ns_connchan over HTTP using persistent connections
- ns_connchan over HTTPS using persistent connections
- ns_udp using UDP

 

To keep measurements simple, I tested this in a 2-node cluster consisting of the canonical server and one node listening on the following protocols/ports:

- http://127.0.0.1:8101
- https://127.0.0.1:8444
- udp://127.0.0.1:8101

The first test sends 1000 intra-server commands per call from the canonical server to the 2nd node over the various delivery methods:

# send 1000 commands per delivery method and collect the timings
set times 1000
lappend _ ns_http-[time {::acs::CS_127.0.0.1_8101 message set x ns_http} $times]
lappend _ ns_https-[time {::acs::CS_127.0.0.1_8444 message set x ns_https} $times]
lappend _ ns_connchan-http-[time {::acs::CS_127.0.0.1_8101 message -delivery connchan set x ns_http} $times]
lappend _ ns_connchan-https-[time {::acs::CS_127.0.0.1_8444 message -delivery connchan set x ns_https} $times]
lappend _ ns_udp-[time {::acs::CS_127.0.0.1_8101 message -delivery udp set x udp} $times]
join $_ \n

This leads to the following results:

ns_http             564.027083 microseconds per iteration
ns_https           1483.478916 microseconds per iteration
ns_connchan-http    147.688541 microseconds per iteration
ns_connchan-https    68.480875 microseconds per iteration
ns_udp              198.343416 microseconds per iteration

Since the commands are sent in sequence, the variants with persistent connections are the fastest, although they are implemented in Tcl. The slowest is the version with HTTPS via ns_http without persistent connections. Between the slowest and the fastest we see a factor of about 20 in performance.

When using ns_udp with the "-noreply" option, we would have a "fire and forget" solution, which might be OK when the packet loss rate is low. That would lead to 54 microseconds per iteration.
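
A minimal sketch of such a fire-and-forget send, assuming NaviServer's "ns_udp" with the "-noreply" option (host, port, and payload are illustrative; the actual intra-server message format is not shown):

# Fire and forget: send the command to the peer node without waiting
# for a reply; there is no delivery guarantee, so this is acceptable
# only when packet loss is rare.
ns_udp -noreply 127.0.0.1 8101 {set x udp}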

Clearly, the numbers for persistent connections look best, but this approach also has some disadvantages compared to the other solutions:

- the server has to keep a socket open to every node (but no
  connection thread)
- the keepalive setting of the server must be set sufficiently long to
  gain an advantage from persistent connections (e.g. 5 sec keepalive,
  heartbeat frequency of 1 sec)
- since the whole communication goes over a single connection, it is
  necessary to serialize the requests to prevent multiple connection
  threads from writing concurrently to the same connection and
  interfering with each other (as sketched after this list)
- it is probably necessary to have a separate thread handling the
  outgoing intra-server talk (implementing command queuing,
  async-handling, heart-beat, etc.). Since this has to be a Tcl thread,
  it will use up some memory (similar to a connection thread).
- this intra-server talk thread requires queuing and event handling
  that we have so far just in xotcl-core, so when implemented, it will
  require the xotcl-core package (maybe this can later be moved to
  acs-core)
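
A minimal sketch of the serialization point mentioned above, assuming one shared persistent "ns_connchan" channel per node and a NaviServer mutex (the helper name and the request/reply framing are hypothetical):

# Hypothetical helper: hold a mutex for the whole request/reply cycle,
# so concurrent connection threads cannot interleave their writes on
# the shared channel.
nsv_set cluster mutex [ns_mutex create cluster:send]

proc cluster_send {chan request} {
    ns_mutex eval [nsv_get cluster mutex] {
        ns_connchan write $chan $request
        set reply [ns_connchan read $chan]
    }
    return $reply
}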

As a second experiment, I've implemented a simple heart-beat service inside the request monitor that checks the liveliness of the nodes every second. So, in contrast to the back-to-back commands of the first experiment, these are single calls. Here are some random values for the 5 delivery methods:

[27/Dec/2022:20:29:34.171376][::throttle] Notice: -cluster: http://127.0.0.1:8101 set x ns_http sent total 2.907ms
[27/Dec/2022:20:29:34.182241][::throttle] Notice: -cluster: https://127.0.0.1:8444 set x ns_https sent total 10.798ms
[27/Dec/2022:20:29:34.183475][::throttle] Notice: -cluster: http://127.0.0.1:8101 set x ns_connchan sent total 1.161ms
[27/Dec/2022:20:29:34.183657][::throttle] Notice: -cluster: https://127.0.0.1:8444 set x https-connchan sent total 0.086ms
[27/Dec/2022:20:29:34.188564][::throttle] Notice: -cluster: udp://127.0.0.1:8101 set x udp sent total 4.861ms
[27/Dec/2022:20:30:25.494080][::throttle] Notice: -cluster: http://127.0.0.1:8101 set x ns_http sent total 2.049ms
[27/Dec/2022:20:30:25.516306][::throttle] Notice: -cluster: https://127.0.0.1:8444 set x ns_https sent total 21.903ms
[27/Dec/2022:20:30:25.517239][::throttle] Notice: -cluster: http://127.0.0.1:8101 set x ns_connchan sent total 0.814ms
[27/Dec/2022:20:30:25.522957][::throttle] Notice: -cluster: https://127.0.0.1:8444 set x https-connchan sent total 0.33ms
[27/Dec/2022:20:30:25.534274][::throttle] Notice: -cluster: udp://127.0.0.1:8101 set x udp sent total 11.099ms
[27/Dec/2022:20:31:54.993455][::throttle] Notice: -cluster: http://127.0.0.1:8101 set x ns_http sent total 2.431ms
[27/Dec/2022:20:31:55.003036][::throttle] Notice: -cluster: https://127.0.0.1:8444 set x ns_https sent total 9.499ms
[27/Dec/2022:20:31:55.010100][::throttle] Notice: -cluster: http://127.0.0.1:8101 set x ns_connchan sent total 6.981ms
[27/Dec/2022:20:31:55.010585][::throttle] Notice: -cluster: https://127.0.0.1:8444 set x https-connchan sent total 0.322ms
[27/Dec/2022:20:31:55.017764][::throttle] Notice: -cluster: udp://127.0.0.1:8101 set x udp sent total 7.13ms

We see in essence the same pattern. The approach with persistent connections looks best here as well. It is not clear to me why HTTPS over connchan performs best, but the communication seems OK. Maybe some buffering/Nagle algorithm is responsible for this. We see as well that the round-trip typically takes single to double-digit milliseconds. So when a single HTTP request to nsd triggers multiple cache-flush operations to multiple nodes, this will take some time. When, e.g., the request issues 5 cache-flush operations, which are sent to 5 nodes, and every request takes 1ms, the cache flushing will make the original request about 25ms slower. This might also be an argument for a separate thread doing these operations asynchronously.

improved clusterwide operations

improved support for cluster-wide operations

added cluster-awareness

NOTE: one should use the acs-cluster infrastructure here instead, to achieve better scalability and easier cache maintenance

update comments

improve robustness

Cleanup duplicated line

make parsing more robust

Set "softrecreate" in nsf o avoid full cleanup on redefinition of classes

fix typo

New package parameter for acs-tcl: DbLogMinDuration

This parameter can be used to adjust the time threshold for "long db" warnings about slow SQL statements in the log file. When SQL logging is turned on, it might also adjust that threshold, unless the configured value is already more sensitive.

Bump version number to 5.10.1d25.
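
A hedged usage sketch via the standard parameter API (the value syntax is assumed to be a time span, as elsewhere in NaviServer configurations):

# Illustrative: raise the threshold for "long db" warnings to 500ms
parameter::set_value \
    -package_id [apm_package_id_from_key acs-tcl] \
    -parameter DbLogMinDuration \
    -value 500ms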

improve spelling

improve resolution in fallback case

Moved the experimental disk-cache from utilities to the other caching infrastructure

Make use of new API "ad_mktmpdir" and "ad_opentmpfile" instead of "ad_tmpnam"

New API "ad_mktmpdir" and "ad_opentmpfile"

Since "ns_mktemp" is deprecated (on the C level) and is prone

to vulnerabilities. This effects as well "ad_tmpnam" in OpenACS,

which uses "ns_mktemp".

Newer C compilers complain about this more loudly:

Due to security concerns inherent in the design of mktemp(3), it is highly recommended that you use mkstemp(3) instead.

The security concern is that when ns_mktemp() is used to generate a (unique) file name, which is then used for opening a file, an attacker can intercept the running binary and sneak in a different file. Although ns_mktemp() guarantees to return a unique file name, there is no mechanism to prevent another process or an attacker from creating a file with the same name before the application attempts to open it.

The problem with using mkstemp() instead is that it has different semantics, since it returns the open file. So one cannot blindly replace these calls; some refactoring is required. Unfortunately, this also affects application code, since NaviServer offers "ns_mktemp" on the Tcl level.

To make it short: one has to separate out the different use cases of "ad_tmpnam":

(a) using it to obtain a name for creating a file, which is subsequently opened
(b) using it to obtain a name for creating a directory
(c) using it to provide a unique name to some external program

Case (a) is similar to the "mkstemp(3)" recommendation above. For this usage scenario, the call "file tempfile ..." in Tcl 8.6 can be used (but it should also respect the configured tmp directory). This function is also very similar to "ns_opentempdir" in NaviServer, which uses "file tempfile" as well. Therefore, we have created a new API call "ad_opentmpfile ...", which respects the OpenACS settings.

Case (b) can be addressed by "file tempdir" in Tcl 8.7, or by a function in tcllib. The new API function "ad_mktmpdir" respects the OpenACS settings and works for Tcl 8.6 or newer.

Case (c) is somewhat different, since it just wants to create a unique name. This case has not received a special API so far.
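
A usage sketch for cases (a) and (b); the return values are assumptions (a channel/name pair for "ad_opentmpfile", analogous to Tcl's "file tempfile", and the created directory name for "ad_mktmpdir"):

# Case (a): create and open the temporary file in one step, avoiding
# the race between name generation and open (return value assumed)
lassign [ad_opentmpfile] F tmpfile
puts $F "some content"
close $F

# Case (b): create a fresh directory under the configured OpenACS
# tmp directory (return value assumed)
set tmpdir [ad_mktmpdir]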

Bugfix: Get database name from postgres connection string keywords

The connection string can be provided for libpq either as the plain dbname (old style) or as a list of keywords (new style). In the latter case, the full keyword string was returned instead of the dbname.
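
A minimal illustration of the distinction (regexp-based; the actual parsing in the fix may differ):

# New style: keyword/value list; old style: the string is the dbname
set datasource "host=localhost port=5432 dbname=openacs user=nsadmin"
if {[regexp {dbname=(\S+)} $datasource . dbname]} {
    # new style: use just the value of the dbname keyword
} else {
    set dbname $datasource   ;# old style: plain dbname
}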

fixed typo (many thanks to Franz Penz for spotting this)

Replaced "ns_mktemp" by "ad_tmpnam" to ease code maintenance

The underlying C library API for "ns_mktemp" is deprecated for security reasons; we will have to do something about it.

document rights changes on new installs for api-doc

make behavior more robust when (erroneously) called without a connection

without handling the no-connection case, the error message is swallowed

improve Oracle compatibility

Minor CSP improvements

- provided the ability to add "trusted-types" and "require-trusted-types-for" directives (Trusted Types policies), as sketched below

For details, see:

https://blog.bitsrc.io/trusted-types-api-for-javascript-dom-security-fcdafa927e73

- changed default "object-src" from 'self' to 'none'
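
A hedged usage sketch, assuming the directives are registered per request via OpenACS's "security::csp::require" (the policy name is illustrative):

# Illustrative: allow only the named Trusted Types policy and require
# Trusted Types for script sinks (directive names per the CSP spec)
security::csp::require trusted-types myPolicy
security::csp::require require-trusted-types-for 'script'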

improve Oracle compatibility

De-escalation: the usage of the pairs in export_vars is not as dangerous as it looked at first sight.

The problem case originated from the call

lappend __vars [lindex $_var 0] [uplevel subst [lindex $_var 1]]

which calls Tcl's "uplevel" with two arguments. In this case, the arguments are concatenated and then evaluated in the caller's frame, with a substitution happening before the evaluation. When just one argument is passed in, there is only one evaluation:

lappend __vars [lindex $_var 0] [uplevel [list subst [lindex $_var 1]]]
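
A standalone demonstration of the difference (hypothetical procs "f" and "g"; the uplevel calls run inside a proc, since "uplevel" needs an enclosing frame):

proc f {} { return {[g]} }
proc g {} { return INJECTED }

proc demo {} {
    set value {[f]}
    # two arguments: "subst" and the value are concatenated and then
    # evaluated, so the result of f is substituted a second time and
    # g runs:
    puts [uplevel subst $value]           ;# -> INJECTED
    # one list argument: a single round of substitution by subst, so
    # the result of f is returned literally:
    puts [uplevel [list subst $value]]    ;# -> [g]
}
demo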

added warning to export_vars

added optional parameter "-timeout" to "CACHE eval ..." method

make ad_sanitize_filename more robust to filenames with parentheses + extend automated tests