COR-363 Tests related to Tika fail randomly

COR-363 Tests related to Tika fail randomly

COR-333: TikaDocumentReader causes 'Unparseable date'

  1. … 4 more files in changeset.
COR-333: TikaDocumentReader causes 'Unparseable date'

Fix description:

* Don't convert the date value extracted from the document's properties to the String form of Java's Date object. That format doesn't conform to the ISO 8601 standard used in JCR.

  1. … 4 more files in changeset.
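The COR-333 fix description above can be illustrated with a minimal sketch, assuming the goal is an ISO 8601 string rather than the `dow mon dd hh:mm:ss zzz yyyy` form that `Date.toString()` produces. The class name `Iso8601DateSketch`, the method `toIso8601`, and the choice of UTC are assumptions made for illustration; this is not the code from the changeset.

```java
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;

final class Iso8601DateSketch {
    // ISO 8601 pattern with milliseconds and a zone designator.
    // Rendering in UTC is an assumption of this sketch.
    private static final DateTimeFormatter ISO_8601 =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").withZone(ZoneOffset.UTC);

    // Formats the Date explicitly instead of relying on Date.toString(),
    // whose output depends on the default time zone and is not ISO 8601.
    static String toIso8601(Date date) {
        return ISO_8601.format(date.toInstant());
    }
}
```

For example, `Iso8601DateSketch.toIso8601(new Date(0L))` yields `1970-01-01T00:00:00.000Z`.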
COR-333: TikaDocumentReader causes 'Unparseable date'

  1. … 4 more files in changeset.
COR-332: Fixed the issue with the slide order

  1. … 2 more files in changeset.
COR-329: Fixed the issue with the slide order

  1. … 2 more files in changeset.
COR-331: Implement MSPPTXStreamDocumentReader using SAXParser

Problem analysis:

* Apache POI provides only an in-memory model for MS PPTX files.

In this model, SAXParser is invoked too many times (three times the number of slides), even just to read some metadata.

It is therefore unsuitable for parsing files with a very large number of slides.

Fix description:

* Implement a new document reader for PPTX files that reads the file as a stream.

* Read metadata directly from the corresponding file (core.xml) if this file exists.

* Parse and index the text of only the first slides, up to a fixed limit.

  1. … 4 more files in changeset.
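The approach described for COR-331 can be sketched as follows, assuming a PPTX file is a ZIP archive whose metadata lives in `docProps/core.xml` and whose slides are entries named `ppt/slides/slideN.xml`. The names `PptxStreamSketch`, `Result`, `TextCollector`, and the `maxSlides` parameter are hypothetical and are not taken from MSPPTXStreamDocumentReader. The sketch visits slides in archive order, which is not necessarily presentation order, and matches elements by local name only.

```java
import java.io.FilterInputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class PptxStreamSketch {
    /** Hypothetical result holder: title from core.xml plus text of the indexed slides. */
    static final class Result {
        String title = "";
        String text = "";
        int slidesIndexed;
    }

    /** Appends the character data of every element with the given local name. */
    static final class TextCollector extends DefaultHandler {
        private final String wanted;
        private final StringBuilder out;
        private boolean inside;

        TextCollector(String wanted, StringBuilder out) {
            this.wanted = wanted;
            this.out = out;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (wanted.equals(local)) inside = true;
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (wanted.equals(local)) {
                inside = false;
                out.append(' '); // separate consecutive text runs
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inside) out.append(ch, start, length);
        }
    }

    static Result read(InputStream pptx, int maxSlides) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so local names such as "title" and "t" are reported
        StringBuilder title = new StringBuilder();
        StringBuilder text = new StringBuilder();
        Result result = new Result();
        try (ZipInputStream zip = new ZipInputStream(pptx)) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                // Shield the archive stream: the parser may close its input when done.
                InputStream current = new FilterInputStream(zip) {
                    @Override
                    public void close() { }
                };
                String name = entry.getName();
                if (name.equals("docProps/core.xml")) {
                    factory.newSAXParser().parse(current, new TextCollector("title", title));
                } else if (name.matches("ppt/slides/slide\\d+\\.xml")
                        && result.slidesIndexed < maxSlides) {
                    factory.newSAXParser().parse(current, new TextCollector("t", text));
                    result.slidesIndexed++;
                }
            }
        }
        result.title = title.toString().trim();
        result.text = text.toString().trim();
        return result;
    }
}
```

Because each entry is parsed straight from the archive stream, memory use stays bounded by the SAX parser's buffers rather than by the size of the presentation, and slides beyond `maxSlides` are skipped without being parsed.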
COR-329: Streaming parser for MSXPPTDocumentReader

Fix description:

* Implement a streaming model to get the properties and content of Microsoft PowerPoint files (OOXML).

* Index the content of the first 500 slides.

  1. … 5 more files in changeset.
COR-334: TikaDocumentReader causes 'Unparseable date'

  1. … 4 more files in changeset.
COR-333: TikaDocumentReader causes 'Unparseable date'

  1. … 4 more files in changeset.
COR-281 : Can not get properties: Your document contained more than 10240 characters, and so your requested limit has been reached.

  1. … 5 more files in changeset.
COR-280 : Can not get properties: Your document contained more than 10240 characters, and so your requested limit has been reached.

  1. … 5 more files in changeset.
COR-280 : Can not get properties: Your document contained more than 10240 characters, and so your requested limit has been reached.

  1. … 3 more files in changeset.
COR-280 : Can not get properties: Your document contained more than 10240 characters, and so your requested limit has been reached.

  1. … 2 more files in changeset.
COR-278 : IllegalArgumentException when uploading a vsd file via CE

  1. … 1 more file in changeset.
COR-278 : IllegalArgumentException when uploading a vsd file via CE

  1. … 1 more file in changeset.
COR-278 : IllegalArgumentException when uploading a vsd file via CE core-2.6.x

  1. … 1 more file in changeset.
COR-278 : IllegalArgumentException when uploading a vsd file via CE core-2.6.x

  1. … 3 more files in changeset.
COR-278 : IllegalArgumentException when uploading a vsd file via CE

  1. … 3 more files in changeset.
EXOJCR-1889: logging cleanup

  1. … 24 more files in changeset.
EXOJCR-1771: Checked the scope and the dependency list of each project

    • -0
    • +4
    ./tsm-excludes.properties
  1. … 12 more files in changeset.
EXOJCR-1532: Change the target xsd on xml file

  1. … 30 more files in changeset.
COR-228

What is the problem to fix?

The iText-based implementation does not support many XMP metadata fields. Provide a new implementation of PdfDocumentReader.getProperties() that uses PdfBox instead of iText.

How is the problem fixed?

Use PdfBox to extract XMP metadata.

iText was removed from the code.

  1. … 4 more files in changeset.
EXOJCR-1373: pdf documents metadata UTF-16 encoding support added

    • binary
    ./Trait_union.06.Mai_2009.pdf
  1. … 2 more files in changeset.
EXOJCR-1175: Implement PDFDocumentReader.getProperties using PDFBox

  1. … 4 more files in changeset.
EXOJCR-1173: revert changes

  1. … 10 more files in changeset.
EXOJCR-1173: Wrap all Exceptions in DocumentReader.getProperties() into DocumentReadException

  1. … 10 more files in changeset.
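The wrapping described in EXOJCR-1173 can be sketched as below. This is a minimal illustration under stated assumptions: `DocumentReadException` here is a local stand-in for the class named in the commit message, and `PropertyParser` and `WrapExceptionsSketch` are hypothetical names for the format-specific parsing code and its host class. The message prefix is borrowed from the error text quoted in COR-280 and may not match what the changeset used.

```java
import java.io.InputStream;
import java.util.Properties;

final class WrapExceptionsSketch {
    /** Local stand-in for the DocumentReadException named in the commit message. */
    static class DocumentReadException extends Exception {
        DocumentReadException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical callback standing in for the format-specific parsing code. */
    interface PropertyParser {
        Properties parse(InputStream in) throws Exception;
    }

    static Properties getProperties(InputStream in, PropertyParser parser)
            throws DocumentReadException {
        try {
            return parser.parse(in);
        } catch (Exception e) {
            // Keep the original exception as the cause so its stack trace is not lost.
            throw new DocumentReadException("Can not get properties: " + e.getMessage(), e);
        }
    }
}
```

Callers then handle a single checked exception type, whatever the underlying parser threw, and can still inspect the original failure through `getCause()`.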
EXOJCR-1114: provided support for more MIME types

  1. … 12 more files in changeset.
COR-218: provided support for more MIME types

  1. … 13 more files in changeset.