Convert MS Office files to plain text: reuse the approach already employed for .pptx to extract text from .docx and .xlsx as well, since the formats are very similar.
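
The similarity mentioned above comes from all three formats being zip archives of XML parts, with the visible text wrapped in a dedicated tag. The following is a minimal sketch of that idea in Python, not the package's Tcl code. The function name `ooxml_to_text` and the table `OOXML_TEXT_PARTS` are hypothetical; the part names and tag names are the ones files written by MS Office typically use.

```python
import html
import re
import zipfile

# extension -> (regex for the archive members that hold text,
#               name of the tag that wraps the text)
OOXML_TEXT_PARTS = {
    ".docx": (r"^word/document\.xml$", "w:t"),
    ".xlsx": (r"^xl/sharedStrings\.xml$", "t"),
    ".pptx": (r"^ppt/slides/slide\d+\.xml$", "a:t"),
}


def ooxml_to_text(source, extension):
    """Return the text found in an OOXML file (path or file-like object)."""
    member_pattern, tag = OOXML_TEXT_PARTS[extension]
    name = re.escape(tag)
    # Keep only what sits between <tag ...> and </tag>.
    text_tag = re.compile(r"<%s(?:\s[^>]*)?>(.*?)</%s>" % (name, name), re.DOTALL)
    pieces = []
    with zipfile.ZipFile(source) as archive:
        # Members are visited in lexicographic order of their names.
        for member in sorted(archive.namelist()):
            if re.match(member_pattern, member):
                xml = archive.read(member).decode("utf-8")
                pieces.extend(html.unescape(t) for t in text_tag.findall(xml))
    return " ".join(pieces)
```

For .xlsx this sketch only sees the shared string table, so numeric cell values are not part of the result.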

Make the test more verbose on failure: show the extracted text in that case

Fix test and proc:

- make package_id mandatory for search::dotlrn::get_community_id, as the proc will otherwise simply fail when used under dotlrn

- fix the test by providing a package_id according to the expected behavior

Test leftover api

Test queuing, dequeuing

Test extra arg api from the search package

file test.docx was initially added on branch oacs-5-10.

    • -0
    • +0
    ./tcl/test/data/test.docx
file test.ppt was initially added on branch oacs-5-10.

    • binary
    ./tcl/test/data/test.ppt
file test.pdf was initially added on branch oacs-5-10.

    • binary
    ./tcl/test/data/test.pdf
file test.ott was initially added on branch oacs-5-10.

    • -0
    • +0
    ./tcl/test/data/test.ott
file test.ots was initially added on branch oacs-5-10.

    • -0
    • +0
    ./tcl/test/data/test.ots
file test.otp was initially added on branch oacs-5-10.

    • -0
    • +0
    ./tcl/test/data/test.otp
file test.odt was initially added on branch oacs-5-10.

    • -0
    • +0
    ./tcl/test/data/test.odt
file test.ods was initially added on branch oacs-5-10.

    • -0
    • +0
    ./tcl/test/data/test.ods
file test.odp was initially added on branch oacs-5-10.

    • -0
    • +0
    ./tcl/test/data/test.odp
file test.html was initially added on branch oacs-5-10.

    • -0
    • +0
    ./tcl/test/data/test.html
file test.xlsx was initially added on branch oacs-5-10.

    • -0
    • +0
    ./tcl/test/data/test.xlsx
file test.xls was initially added on branch oacs-5-10.

    • binary
    ./tcl/test/data/test.xls
file test.txt was initially added on branch oacs-5-10.

    • -0
    • +0
    ./tcl/test/data/test.txt
file test.pptx was initially added on branch oacs-5-10.

    • -0
    • +0
    ./tcl/test/data/test.pptx
Test binary to text conversion of various file types

file search-procs.tcl was initially added on branch oacs-5-10.

    • -0
    • +0
    ./tcl/test/search-procs.tcl
file test.doc was initially added on branch oacs-5-10.

    • binary
    ./tcl/test/data/test.doc
Whitespace cleanup

Make use of new API "ad_mktmpdir" and "ad_opentmpfile" instead of "ad_tmpnam"

Prefer util::which to retrieve the unzip executable

Complain in the logfile whenever an attempt is made to insert a null character into the syndication table

Implement a conversion from MS pptx to plaintext:

first, the slides are extracted from the presentation; then everything that is not the content of a text tag is removed.
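
The two steps can be sketched as follows. This is an illustration in Python under stated assumptions, not the package's Tcl implementation: the helper name `pptx_to_text` is hypothetical, slides are assumed to live under `ppt/slides/` in the archive, and text runs are assumed to be wrapped in `<a:t>` tags, as is usual for files written by PowerPoint.

```python
import html
import re
import zipfile

# Step 1: slide parts are the archive members named ppt/slides/slideN.xml.
SLIDE_NAME = re.compile(r"^ppt/slides/slide(\d+)\.xml$")
# Step 2: only the content of <a:t>...</a:t> is kept, all other markup is dropped.
TEXT_TAG = re.compile(r"<a:t(?:\s[^>]*)?>(.*?)</a:t>", re.DOTALL)


def pptx_to_text(source):
    """Return the text of a .pptx (path or file-like object), one line per slide."""
    with zipfile.ZipFile(source) as archive:
        slides = []
        for name in archive.namelist():
            match = SLIDE_NAME.match(name)
            if match:
                # Remember the slide number so slide10 sorts after slide2.
                slides.append((int(match.group(1)), name))
        lines = []
        for _, name in sorted(slides):
            xml = archive.read(name).decode("utf-8")
            runs = TEXT_TAG.findall(xml)
            # Decode XML entities such as &amp; in the extracted runs.
            lines.append(" ".join(html.unescape(run) for run in runs))
    return "\n".join(lines)
```

A regular expression is enough here because nothing of the XML structure needs to survive; a full XML parser would be the more robust choice if layout or ordering within a slide mattered.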

More targeted sanitizing, only on the variables that have a chance of containing the null character

Replace potential null characters in the syndication content with the empty string, so that we do not risk trying (and failing) to insert them into the database
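
A minimal sketch of this kind of sanitizing, written in Python rather than the package's Tcl. The function name `strip_null_characters`, the logger name, and the field names in the usage line are hypothetical. The motivation is that some databases, PostgreSQL among them, reject the NUL character in text values. The warning is an addition for illustration and is not part of the change described above.

```python
import logging

logger = logging.getLogger("syndication")  # hypothetical logger name


def strip_null_characters(fields):
    """Return a copy of `fields` with NUL removed from every string value."""
    cleaned = {}
    for name, value in fields.items():
        if isinstance(value, str) and "\x00" in value:
            # Leave a trace in the log so the source of the bad data can be found.
            logger.warning("null character removed from field %r", name)
            value = value.replace("\x00", "")
        cleaned[name] = value
    return cleaned
```

For example, `strip_null_characters({"title": "a\x00b", "object_id": 3})` returns `{"title": "ab", "object_id": 3}`; non-string values pass through unchanged.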