• last updated 15 hours ago
Document the decision to no longer test .ppt text extraction

Convert MS Office files to plain text: reuse the approach employed for .pptx to also extract text from .docx and .xlsx, as these formats are very similar.
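
For illustration only, here is a minimal Python sketch of how one extraction routine might cover both formats. It is not the package's Tcl code. The archive members and tag names in the table (`word/document.xml` with `w:t`, `xl/sharedStrings.xml` with `t`) are assumptions about where these formats usually keep their text, and `office_to_text` is a hypothetical name.

```python
import io
import re
import zipfile
from xml.sax.saxutils import unescape

# Assumed location of the text in each format: (archive member, text tag).
FORMATS = {
    "docx": (r"word/document\.xml", "w:t"),
    "xlsx": (r"xl/sharedStrings\.xml", "t"),
}


def office_to_text(data: bytes, extension: str) -> str:
    """Return the text found in the text tags of the relevant archive members."""
    member_pattern, tag = FORMATS[extension]
    member_re = re.compile(member_pattern)
    # Match <tag> or <tag attr="..."> only, not longer names such as <w:tbl>.
    text_re = re.compile(
        rf"<{re.escape(tag)}(?:\s[^>]*)?>(.*?)</{re.escape(tag)}>", re.DOTALL
    )
    chunks = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in sorted(archive.namelist()):
            if member_re.fullmatch(name):
                xml = archive.read(name).decode("utf-8")
                # Keep only the content of text tags, decoding XML entities.
                chunks.extend(unescape(text) for text in text_re.findall(xml))
    return " ".join(chunks)
```

Under these assumptions, numeric cells of a spreadsheet would not appear in the output, since only the shared strings are read.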

Make the test more verbose when there is a failure: show the extracted text in this case

Fix test and proc:

- make package_id mandatory for search::dotlrn::get_community_id, as the proc will otherwise just fail when used under dotlrn

- fix the test by providing a package_id according to the expected behavior

Test leftover api

Test queuing, dequeuing

Test extra arg api from the search package

file test.docx was initially added on branch oacs-5-10.

file test.ppt was initially added on branch oacs-5-10.

    • binary
    ./test/data/test.ppt
file test.pdf was initially added on branch oacs-5-10.

    • binary
    ./test/data/test.pdf
file test.ott was initially added on branch oacs-5-10.

file test.ots was initially added on branch oacs-5-10.

file test.otp was initially added on branch oacs-5-10.

file test.odt was initially added on branch oacs-5-10.

file test.ods was initially added on branch oacs-5-10.

file test.odp was initially added on branch oacs-5-10.

file test.html was initially added on branch oacs-5-10.

file test.xlsx was initially added on branch oacs-5-10.

file test.xls was initially added on branch oacs-5-10.

    • binary
    ./test/data/test.xls
file test.txt was initially added on branch oacs-5-10.

file test.pptx was initially added on branch oacs-5-10.

Test binary to text conversion of various file types

file search-procs.tcl was initially added on branch oacs-5-10.

    • -0
    • +0
    ./test/search-procs.tcl
file test.doc was initially added on branch oacs-5-10.

    • binary
    ./test/data/test.doc
Whitespace cleanup

Make use of new API "ad_mktmpdir" and "ad_opentmpfile" instead of "ad_tmpnam"

Prefer util::which to retrieve the unzip executable

Complain in the logfile whenever an attempt is made to insert the null character into the syndication table

Implement a conversion from MS pptx to plaintext:

First, the slides are extracted from the presentation; then everything that is not the content of a text tag is removed.
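
The two steps above can be sketched in Python as follows. This is an illustrative sketch, not the package's Tcl implementation. It assumes that slides live under `ppt/slides/slideN.xml` and that slide text sits in `a:t` elements; `pptx_to_text` is a hypothetical name.

```python
import io
import re
import zipfile
from xml.sax.saxutils import unescape

# Assumption: each slide is stored as ppt/slides/slide<N>.xml in the archive.
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml")
# Assumption: slide text is the content of <a:t> elements.
TEXT_RE = re.compile(r"<a:t(?:\s[^>]*)?>(.*?)</a:t>", re.DOTALL)


def pptx_to_text(data: bytes) -> str:
    """Extract the slides, then keep only the content of their text tags."""
    chunks = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Step 1: find the slides and order them by slide number.
        slides = []
        for name in archive.namelist():
            match = SLIDE_RE.fullmatch(name)
            if match:
                slides.append((int(match.group(1)), name))
        # Step 2: drop everything that is not the content of a text tag.
        for _, name in sorted(slides):
            xml = archive.read(name).decode("utf-8")
            chunks.extend(unescape(text) for text in TEXT_RE.findall(xml))
    return " ".join(chunks)
```

Sorting by the numeric part of the file name keeps `slide10.xml` after `slide2.xml`, which a plain string sort would not.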

More targeted sanitizing only on the variables that have a chance to contain the null character
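
As a rough illustration of targeted sanitizing, the Python sketch below removes the null character only from selected fields and logs a warning when it finds one. The function name, the logger name, and the field names in the usage note are hypothetical; the actual package is written in Tcl and may proceed differently.

```python
import logging

logger = logging.getLogger("syndication")  # hypothetical logger name


def strip_nulls(record: dict, fields: tuple) -> dict:
    """Return a copy of record with null characters removed from the given fields."""
    cleaned = dict(record)
    for field in fields:
        value = cleaned.get(field)
        # Only the listed fields are inspected; all others pass through untouched.
        if isinstance(value, str) and "\x00" in value:
            logger.warning("null character found in %r, removing it", field)
            cleaned[field] = value.replace("\x00", "")
    return cleaned
```

For example, `strip_nulls({"title": "a\x00b", "body": "ok"}, ("title",))` would return a copy whose `title` is `"ab"` and leave the input record unchanged.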