MODE-2081 Changed the license for ModeShape code to ASL 2.0.

  1. … 559 more files in changeset.
MODE-2022 - Updated the Tika text extractor to ignore audio/video/image files by default and log a specific error in case of a NoClassDefFound.

  1. … 6 more files in changeset.
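
The default exclusion described in MODE-2022 can be pictured as a MIME-type prefix check made before extraction, plus a guard around the extraction call. This is only a sketch under assumptions: the class `MimeTypeExclusions`, its methods, and the choice to return an empty string on failure are hypothetical and are not taken from the ModeShape code.

```java
import java.util.Locale;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of ModeShape.
final class MimeTypeExclusions {
    private static final Logger LOGGER = Logger.getLogger(MimeTypeExclusions.class.getName());

    // Assumed defaults: media types that rarely carry extractable text.
    private static final String[] EXCLUDED_PREFIXES = {"audio/", "video/", "image/"};

    private MimeTypeExclusions() {}

    // Returns true when the MIME type falls under one of the excluded prefixes.
    static boolean isExcluded(String mimeType) {
        if (mimeType == null) return false;
        String normalized = mimeType.trim().toLowerCase(Locale.ROOT);
        for (String prefix : EXCLUDED_PREFIXES) {
            if (normalized.startsWith(prefix)) return true;
        }
        return false;
    }

    // Runs the extraction and logs a specific message if a parser class is missing.
    // Returning "" in that case is an assumption made for this sketch.
    static String extractOrEmpty(Supplier<String> extraction) {
        try {
            return extraction.get();
        } catch (NoClassDefFoundError e) {
            LOGGER.log(Level.WARNING, "Text extraction skipped: a parser dependency is missing from the classpath", e);
            return "";
        }
    }
}
```

A caller would check `isExcluded(mimeType)` first and only then hand the content to `extractOrEmpty`.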
MODE-1561 - Added the writeLimit parameter to the TikaTextExtractor.

  1. … 6 more files in changeset.
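
A write limit of the kind MODE-1561 mentions caps how many characters of extracted text are kept (Tika's `BodyContentHandler` accepts such a limit in its constructor). The sketch below shows the idea with plain Java only. `LimitedTextBuffer` is a hypothetical name, and how TikaTextExtractor actually applies the parameter is not shown in this log.

```java
// Hypothetical illustration of a write limit; not the TikaTextExtractor code.
final class LimitedTextBuffer {
    private final StringBuilder text = new StringBuilder();
    private final int writeLimit;

    LimitedTextBuffer(int writeLimit) {
        if (writeLimit < 0) throw new IllegalArgumentException("writeLimit must not be negative");
        this.writeLimit = writeLimit;
    }

    // Appends as much of the chunk as still fits.
    // Returns false once the limit has been reached, so the caller can stop parsing.
    boolean append(String chunk) {
        int remaining = writeLimit - text.length();
        if (chunk.length() <= remaining) {
            text.append(chunk);
            return text.length() < writeLimit;
        }
        text.append(chunk, 0, remaining);
        return false;
    }

    String text() {
        return text.toString();
    }
}
```

With a limit of 5, appending "abc" and then "defg" leaves "abcde" in the buffer and the second call returns false.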
MODE-1527 - Updated the text extraction process to be triggered preemptively by the binary storage, when a binary value is created.

For this to be possible, the extractor's context cannot contain any node-specific information. This change also exposed an issue with the SharedLockingInputStream: if the stream is closed in the "read" methods, Tika's parsers keep reading it over and over (effectively reopening it each time), causing either OOM errors or duplicate text. The "close" call has therefore been removed from the read methods.

  1. … 13 more files in changeset.
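
The stream behaviour described above (no implicit close when a read hits end of stream, release only on an explicit close) might look roughly like this. It is a simplified stand-in, not the actual SharedLockingInputStream: the class name and the use of a plain `java.util.concurrent.locks.Lock` are assumptions.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.locks.Lock;

// Hypothetical stand-in for a lock-holding stream; not ModeShape's implementation.
final class LockReleasingOnCloseStream extends FilterInputStream {
    private final Lock lock;
    private boolean closed;

    LockReleasingOnCloseStream(InputStream in, Lock lock) {
        super(in);
        this.lock = lock;
        lock.lock(); // held until close(); with ReentrantLock, close() must run on the same thread
    }

    @Override
    public int read() throws IOException {
        // Reaching end of stream returns -1 but deliberately does NOT close the stream.
        return in.read();
    }

    @Override
    public int read(byte[] buffer, int offset, int length) throws IOException {
        // Same rule for bulk reads: no implicit close at end of stream.
        return in.read(buffer, offset, length);
    }

    @Override
    public void close() throws IOException {
        if (closed) return; // a second close() is a no-op
        closed = true;
        try {
            in.close();
        } finally {
            lock.unlock();
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

A parser that keeps calling `read` after the end of the stream simply gets -1 again, and the lock is released exactly once, when the owner calls `close()`.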
MODE-1527 - Updated the text extraction process to be asynchronous with the indexing process.

This meant that a few changes were needed:

- the text extractors configuration has been updated to resemble that of the sequencers

- the binary store interface has been updated to be able to store and retrieve extracted text for a given binary (source) value

- the TextExtractors class was changed to become the entry point into text extraction

  1. … 41 more files in changeset.
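
As a rough picture of the changes listed above, a store can keep extracted text keyed by the source binary and fill it in from a background task, so indexing does not wait on extraction. Everything named here (`ExtractedTextStore`, `storeBinary`, `getExtractedText`, the string key) is hypothetical and stands in for the real binary store interface, which this log does not show.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical sketch; not the ModeShape binary store interface.
final class ExtractedTextStore {
    private final Map<String, String> textByBinaryKey = new ConcurrentHashMap<>();
    private final Executor executor;
    private final Function<byte[], String> extractor;

    ExtractedTextStore(Executor executor, Function<byte[], String> extractor) {
        this.executor = executor;
        this.extractor = extractor;
    }

    // Schedules extraction on the executor and returns immediately.
    // The extractor must not return null, since ConcurrentHashMap rejects null values.
    CompletableFuture<Void> storeBinary(String binaryKey, byte[] content) {
        return CompletableFuture.runAsync(
                () -> textByBinaryKey.put(binaryKey, extractor.apply(content)), executor);
    }

    // Returns the extracted text, or null if extraction has not finished (or never ran).
    String getExtractedText(String binaryKey) {
        return textByBinaryKey.get(binaryKey);
    }
}
```

An indexer would call `getExtractedText` and treat a null result as "text not available yet" rather than blocking.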
MODE-1527 - Migrated the initial version of the text extractors from 2.x and updated the binary store to extract the text and mime-type of binary values.

Working on this exposed how fragile, lock-wise, working with the SharedLockingInputStream (FileSystemBinaryStore) is. Therefore, I've updated the mime-type detection so that mark & reset are avoided as much as possible, also making sure that streams are closed after each detector finishes with them.

The Tika version was bumped to 1.1, which also required updating the POI version to 3.8.

  1. … 59 more files in changeset.
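
One way to avoid mark & reset during mime-type detection, and to close streams after each detector, is to give every detector its own freshly opened stream. The following is a sketch of that approach only: the `Detector` interface and `SequentialMimeTypeDetection` class are hypothetical, and the actual ModeShape detectors may be structured differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; not the ModeShape mime-type detection code.
final class SequentialMimeTypeDetection {

    // A detector returns the MIME type it recognises, or null if it cannot tell.
    interface Detector {
        String detect(InputStream content) throws IOException;
    }

    private SequentialMimeTypeDetection() {}

    // Each detector reads from its own new stream, so no mark/reset is needed,
    // and try-with-resources closes that stream as soon as the detector is done.
    static String detect(Supplier<InputStream> streams, List<Detector> detectors) {
        for (Detector detector : detectors) {
            try (InputStream content = streams.get()) {
                String mimeType = detector.detect(content);
                if (mimeType != null) return mimeType;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return null; // no detector recognised the content
    }
}
```

The trade-off is that the content is opened once per detector, which costs extra opens but keeps each lock held only for the duration of a single detector.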