modeshape-extractor-tika

Changed components versions to 2.8.2.GA.

  1. … 67 more files in changeset.
MODE-1561 - Updated code as per review

  1. … 1 more file in changeset.
MODE-1561 - Updated the TikaTextExtractor to support a "writeLimit" property which is passed down to Tika. If absent, the default Tika limit of 100k characters is used.

  1. … 4 more files in changeset.
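
The "writeLimit" behaviour described in MODE-1561 can be pictured with a small, self-contained sketch. It does not call Tika or reproduce the TikaTextExtractor code; the class name `WriteLimitSketch`, the method `extract`, and the use of a `null` argument to mean "property absent" are all assumptions made for illustration. Only the 100k-character default comes from the commit message.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration of a character write limit; not ModeShape or Tika code.
class WriteLimitSketch {

    // Default taken from the commit message: 100k characters when no limit is configured.
    static final int DEFAULT_WRITE_LIMIT = 100_000;

    /**
     * Collects text from the reader until the reader is exhausted or the limit is reached.
     * A null writeLimit stands for "property absent" and falls back to the default.
     */
    static String extract(Reader reader, Integer writeLimit) {
        int limit = (writeLimit != null) ? writeLimit : DEFAULT_WRITE_LIMIT;
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        try {
            while (text.length() < limit) {
                // Never request more characters than the remaining budget allows.
                int wanted = Math.min(buffer.length, limit - text.length());
                int read = reader.read(buffer, 0, wanted);
                if (read == -1) {
                    break; // end of input before the limit was reached
                }
                text.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // With an explicit limit of 5, only the first five characters are kept.
        System.out.println(extract(new StringReader("hello world"), 5));
    }
}
```

In this sketch the text is silently truncated at the limit. Whether the real extractor truncates, logs, or raises an error when the limit is hit is not stated in the changeset.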
'Release: update versions for modeshape-2.8.3.Final'

  1. … 70 more files in changeset.
'Release: update versions for modeshape-2.8.2.Final'

  1. … 70 more files in changeset.
MODE-1527 - Migrated the initial version of the text extractors from 2.x and updated the binary store to extract the text and mime-type of binary values

Working on this exposed how fragile, lock-wise, the SharedLockingInputStream (FileSystemBinaryStore) is to work with. I've therefore updated the mime-type detection so that mark & reset are avoided as much as possible, and made sure that streams are closed after each detector finishes with them.

The Tika version was bumped to 1.1, which also required updating the POI version to 3.8.

    • ./src/test/resources/log4j.properties (-12, +0)
    • ./src/test/resources/modeshape.docx (-168, +0)
    • ./src/test/resources/modeshape.ps (-1612, +0)
    • ./src/test/resources/modeshape_gs.pdf (binary)
  1. … 47 more files in changeset.
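
The mime-type detection change noted under MODE-1527 (avoid mark & reset, close the stream after each detector) can be sketched as follows. This is not the ModeShape implementation: the `Detector` interface, the `detectMimeType` method, and the idea of handing each detector a freshly opened stream from a `Supplier` are assumptions chosen to show one way of meeting those two goals.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical illustration; none of these names come from ModeShape or Tika.
class DetectorChainSketch {

    /** Assumed detector contract: inspect the stream, return a mime-type or null if unknown. */
    interface Detector {
        String detect(InputStream stream) throws IOException;
    }

    /**
     * Runs the detectors in order. Each detector gets its own freshly opened stream,
     * so no detector depends on mark/reset, and try-with-resources closes the stream
     * as soon as that detector is done with it.
     */
    static Optional<String> detectMimeType(Supplier<InputStream> streams, List<Detector> detectors) {
        for (Detector detector : detectors) {
            try (InputStream stream = streams.get()) {
                String mimeType = detector.detect(stream);
                if (mimeType != null) {
                    return Optional.of(mimeType);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] data = {'%', 'P', 'D', 'F'};
        // Toy detectors that only look at the first two bytes.
        Detector zip = s -> (s.read() == 'P' && s.read() == 'K') ? "application/zip" : null;
        Detector pdf = s -> (s.read() == '%' && s.read() == 'P') ? "application/pdf" : null;
        System.out.println(detectMimeType(() -> new ByteArrayInputStream(data), Arrays.asList(zip, pdf)));
    }
}
```

Reopening the stream for every detector trades some extra I/O for shorter-lived streams, which is the property that matters when the underlying stream holds a lock.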
'Release: update versions for modeshape-3.0.0.Alpha6'

  1. … 47 more files in changeset.
'Release: update versions for modeshape-3.0.0.Alpha5'

  1. … 47 more files in changeset.
'Release: update versions for modeshape-3.0.0.Alpha4'

  1. … 45 more files in changeset.
'Release: update versions for modeshape-2.8.1.Final'

  1. … 70 more files in changeset.
Released ModeShape 3.0.0.Alpha3

  1. … 45 more files in changeset.
MODE-1414 (related): promote version #'s in 2.5.x to 2.5.4.GA for BZ-786561 Roll up patch for EDS_5.2_20120320

  1. … 68 more files in changeset.
Release: update versions for modeshape-3.0.0.Alpha1

  1. … 45 more files in changeset.
Changed version to SNAPSHOT following release

  1. … 74 more files in changeset.
Changed the '2.8-SNAPSHOT' artifact version to '2.8.1.GA' for use in the product.

  1. … 74 more files in changeset.
'Release: update versions for modeshape-2.8.0.Final'

  1. … 70 more files in changeset.
MODE-1378 Further improved poms:

- moved OSGI bundle information generation in parent pom

- added default db test profile, based on H2

- added datasource testing support, based on a filtered properties file

- updated bundle plugin version to 2.3.7

  1. … 17 more files in changeset.
'Release: update versions for modeshape-3.0.0.Alpha1'

  1. … 43 more files in changeset.
Changed version to 2.8-SNAPSHOT after releasing 2.7.0.Final

  1. … 73 more files in changeset.
'Release: update versions for modeshape-2.7.0.Final'

  1. … 76 more files in changeset.
MODE-1353: promote version #'s in 2.5.x to 2.5.3.GA for SOA-3656

  1. … 68 more files in changeset.
MODE-1305 Remove unused i18n methods

- also a small refactoring of the I18n class

  1. … 23 more files in changeset.
MODE-1300 Updated Tika version to 1.0

- cleaned up Aperture mime type detector

- added an additional test to expose Tika problem parsing PDFContext pdfs

    • ./src/test/resources/modeshape_gs.pdf (binary)
    • ./src/test/resources/modeshape.pdf (-231, +0)
  1. … 3 more files in changeset.
Changed versions to prepare for 2.7-SNAPSHOT development

  1. … 77 more files in changeset.
'Release: update versions for modeshape-2.6.0.Final'

  1. … 69 more files in changeset.
Changed version to 2.5.2.GA, in preparation for release.

  1. … 73 more files in changeset.
MODE-1289 New approach for storing/caching JCR content

This is the first commit to start the 3.0 effort, which involves a major change to how the JCR layer stores and caches information. The new approach is based upon Infinispan and uses Infinispan's cache loaders for persistence, and JSON-like documents (in-memory structures that do not need to be parsed or written) are used to store information for each node.

There are several new Maven modules:

- modeshape-jcr-redux

- modeshape-schematic

The 'modeshape-jcr-redux' module will eventually replace the 'modeshape-jcr' module once the implementation is far enough along. The 'modeshape-schematic' module will likely move into the Infinispan project, so it needs to remain separate.

Although it may seem strange and unkempt to have the new JCR implementation in a new module, doing so means that we can continue to rebase from 'master' (and the 2.7 work) for at least some time. When the new module becomes complete enough, we'll move it and replace the existing 'modeshape-jcr' module. It's also convenient to have both the old and new implementations around in the same codebase.

The build was changed to focus upon the (few) modules that are oriented around the new implementation. So the following can be used to build the newer codebase:

    mvn clean install

However, the build has a new Maven profile called "legacy" that can be used to build the old modules. We kept this to make sure that any rebasing can be compiled and verified. For example, to build everything, including the new modules and the 2.x-style modules, use the following command:

    mvn clean install -Plegacy

As the newer 'modeshape-jcr-redux' progresses and other modules (e.g., sequencers, web, jboss, text extractors) are converted to use the new module, they should be moved from the 'legacy' profile into the main set of modules in the top-level 'pom.xml'.

  1. … 447 more files in changeset.
'Release: update versions for modeshape-2.6.0.Beta2'

  1. … 69 more files in changeset.