COR-331: Implement MSPPTXStreamDocumentReader using SAXParser

Problem analysis:

* Apache POI provides only an in-memory model for MS PPTX files.

In this model, SAXParser is invoked too many times (three times the number of slides), even just to read metadata.

It is therefore unsuitable for parsing very large files (files with many slides).

Fix description:

* Implement a new document reader for PPTX files that reads the stream directly.

* Get metadata directly from the corresponding file (core.xml), if this file exists.

* Parse and index text from only a limited number of leading slides.
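
The metadata step above can be sketched as follows. This is only an illustration, not the code from the changeset: the class name PptxCorePropsSketch and its helper methods are hypothetical, and it assumes the standard OOXML package layout, in which the core properties are stored in the ZIP entry docProps/core.xml. The stream is read entry by entry and only that one entry is handed to a SAX parser, so no in-memory slide model is built.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: reads core properties from a PPTX stream with SAX.
public class PptxCorePropsSketch {

    /** Walks the ZIP entries and SAX-parses docProps/core.xml if it exists. */
    public static Map<String, String> readCoreProperties(InputStream pptx) throws Exception {
        final Map<String, String> props = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(pptx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"docProps/core.xml".equals(entry.getName())) {
                    continue; // skip slides and every other part of the package
                }
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(true);
                // Guard against the parser closing the underlying ZIP stream.
                InputStream guarded = new FilterInputStream(zip) {
                    @Override public void close() { /* keep the ZIP stream open */ }
                };
                factory.newSAXParser().parse(guarded, new DefaultHandler() {
                    private final StringBuilder text = new StringBuilder();

                    @Override
                    public void startElement(String uri, String local, String qName, Attributes atts) {
                        text.setLength(0);
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        text.append(ch, start, length);
                    }

                    @Override
                    public void endElement(String uri, String local, String qName) {
                        // Store leaf values under their local name, e.g. "title", "creator".
                        String value = text.toString().trim();
                        if (!value.isEmpty()) {
                            props.put(local, value);
                        }
                        text.setLength(0);
                    }
                });
                break; // metadata found, nothing else needs to be read
            }
        }
        return props; // empty if the package has no core.xml
    }

    /** Test helper: builds a minimal in-memory ZIP holding only docProps/core.xml. */
    public static byte[] zipWithCoreXml(String coreXml) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("docProps/core.xml"));
            zip.write(coreXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<cp:coreProperties"
            + " xmlns:cp=\"http://schemas.openxmlformats.org/package/2006/metadata/core-properties\""
            + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
            + "<dc:title>Quarterly review</dc:title><dc:creator>Alice</dc:creator>"
            + "</cp:coreProperties>";
        System.out.println(readCoreProperties(new ByteArrayInputStream(zipWithCoreXml(xml))));
    }
}
```

The sketch covers only the metadata bullet. Limiting text extraction to the first slides would need a similar pass over the slide entries, with a counter that stops parsing once the limit is reached.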

  1. … 4 more files in changeset.
JCR-2209: Rely on auto-registration mechanism to reduce the configuration

  1. … 10 more files in changeset.
COR-281 : Cannot get properties: Your document contained more than 10240 characters, and so your requested limit has been reached.

  1. … 6 more files in changeset.
COR-280 : Cannot get properties: Your document contained more than 10240 characters, and so your requested limit has been reached.

  1. … 6 more files in changeset.
COR-280 : Cannot get properties: Your document contained more than 10240 characters, and so your requested limit has been reached.

  1. … 2 more files in changeset.
COR-278 : IllegalArgumentException when uploading a vsd file via CE

  1. … 1 more file in changeset.
COR-278 : IllegalArgumentException when uploading a vsd file via CE

  1. … 1 more file in changeset.
COR-278 : IllegalArgumentException when uploading a vsd file via CE core-2.6.x

  1. … 1 more file in changeset.
COR-278 : IllegalArgumentException when uploading a vsd file via CE core-2.6.x

  1. … 4 more files in changeset.
COR-278 : IllegalArgumentException when uploading a vsd file via CE

  1. … 4 more files in changeset.
EXOJCR-1532: Change the target xsd on xml file

  1. … 29 more files in changeset.
EXOJCR-1465: The namespace used in the configuration files is misspelled. (1) A fix was added in the kernel to support both the wrong and the correct spelling. (2) All the config files under src/main/resources have been fixed.

  1. … 19 more files in changeset.
EXOJCR-749: readers map is volatile now; configuration updated

  1. … 1 more file in changeset.
EXOJCR-749: TikaDocumentReaderService and TikaDocumentReader updated

  1. … 6 more files in changeset.
EXOJCR-749: TikaDocumentReader added; tests added

    ./portal/tika-config.xml (+162, -0)
    ./portal/tika-mimetype.xml (+4099, -0)
  1. … 37 more files in changeset.
EXOJCR-886: updating tests and DocumentReaders according to remarks. Implementing property extraction from OOXML (MS 2007) formats.

  1. … 33 more files in changeset.
EXOJCR-886: updating tests and DocumentReaders according to remarks. Implementing property extraction from OOXML (MS 2007) formats.

  1. … 16 more files in changeset.
EXOJCR-886: adding document readers for MS 2007 file formats

  1. … 12 more files in changeset.
EXOJCR-886: adding document readers for MS 2007 file formats

  1. … 12 more files in changeset.
EXOJCR-163: module names changed, projects moved

  1. … 274 more files in changeset.