Close input streams passed to TikaNameOnlyDetector.detect()

MODE-2684 Removes the compile-time dependency of modeshape-core on Apache Tika. The mime type extraction functionality still works as before if Tika is present, but there is now also an independent, extension-based default that is used if Tika is not on the classpath at runtime.

    • -0
    • +75
    ./DefaultMimeTypeDetector.java
  1. … 16 more files in changeset.
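The runtime fallback described in MODE-2684 can be sketched roughly as follows. This is a hypothetical illustration, not ModeShape's actual DefaultMimeTypeDetector: the class and interface names, the tiny extension table, and the use of `org.apache.tika.Tika` as the marker class are all assumptions. The idea is to probe for a Tika class without initializing it, and to build a Tika-backed detector only when that probe succeeds.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch: prefer Tika when it is on the classpath, else use file extensions. */
final class MimeTypeFallbackSketch {

    /** Minimal detector abstraction (hypothetical, not ModeShape's real interface). */
    interface NameDetector {
        String mimeTypeOf(String fileName);
    }

    static final String UNKNOWN = "application/octet-stream";

    /** Illustrative extension table; a real default would cover many more types. */
    private static final Map<String, String> EXTENSIONS = Map.of(
            "txt", "text/plain",
            "xml", "application/xml",
            "pdf", "application/pdf",
            "png", "image/png");

    /** Extension-based default detector with no third-party dependencies. */
    static final NameDetector EXTENSION_BASED = fileName -> {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return UNKNOWN; // no usable extension
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.getOrDefault(ext, UNKNOWN);
    };

    /** True if the loader can find the class; the class is not initialized. */
    static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * Returns the Tika-backed detector if Tika can be loaded, otherwise the extension-based one.
     * The supplier is only invoked when the presence check succeeds.
     */
    static NameDetector choose(ClassLoader loader, Supplier<NameDetector> tikaBacked) {
        return isPresent("org.apache.tika.Tika", loader) ? tikaBacked.get() : EXTENSION_BASED;
    }
}
```

Passing the Tika-backed detector as a `Supplier` keeps its construction behind the presence check; in practice such a detector would usually live in its own class so that nothing Tika-related is loaded when Tika is absent.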
MODE-2528 Integrates the new relational provider with the ModeShape codebase. This is a huge commit that makes the changes needed to remove all Infinispan configuration and dependencies, replacing them with the new mechanism. It also contains several changes to the relational provider design, prompted by various failing tests. Among other things, ModeShape now needs to notify the provider once exclusive locks have been obtained as part of each transaction.

  1. … 305 more files in changeset.
MODE-2489 Updated the AS kit to expose the new mime-type detection configuration options.

  1. … 14 more files in changeset.
MODE-2489 Refactored mime-type handling and added the possibility of configuring the repository to use either "content", "name" or no mime-type detection at all.

    • -0
    • +89
    ./ContentDetector.java
    • -0
    • +74
    ./NameOnlyDetector.java
  1. … 31 more files in changeset.
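The three settings from MODE-2489 map naturally onto an enum. The sketch below is hypothetical: the enum name, the case-insensitive parsing, and the choice of `content` as the default for a missing value are assumptions for illustration, not ModeShape's configuration API.

```java
import java.util.Locale;

/** Hypothetical sketch of the three detection settings named in MODE-2489. */
enum MimeTypeDetectionMode {
    CONTENT, // inspect the binary content (a detector may also use the name)
    NAME,    // use only the node or file name, e.g. its extension
    NONE;    // no automatic mime-type detection

    /** Parses "content", "name" or "none", ignoring case and surrounding whitespace. */
    static MimeTypeDetectionMode fromConfig(String value) {
        if (value == null || value.trim().isEmpty()) {
            return CONTENT; // assumed default for this sketch only
        }
        // valueOf throws IllegalArgumentException for unknown settings
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    boolean readsContent() {
        return this == CONTENT;
    }

    boolean isEnabled() {
        return this != NONE;
    }
}
```

Rejecting unknown values with an exception, rather than silently falling back, surfaces configuration typos at startup.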
MODE-2221 Moved the SelfClosingInputStream to the common package and changed it so that it only wraps (and closes) an InputStream. Refactored the binary value classes to make sure any stream returned from a binary value is wrapped into a self-closing stream.

  1. … 15 more files in changeset.
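A minimal sketch of the self-closing idea from MODE-2221, assuming the goal is to release the underlying stream as soon as a reader hits end-of-stream, even if the caller never calls `close()`. The class name is hypothetical and this is not ModeShape's SelfClosingInputStream.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: wraps an InputStream and closes it as soon as EOF is reached. */
final class SelfClosingStreamSketch extends FilterInputStream {

    private boolean closed;

    SelfClosingStreamSketch(InputStream delegate) {
        super(delegate);
    }

    @Override
    public int read() throws IOException {
        if (closed) {
            return -1; // already exhausted and closed
        }
        int b = super.read();
        if (b == -1) {
            close(); // EOF: release the underlying stream immediately
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (closed) {
            return -1;
        }
        int n = super.read(buf, off, len);
        if (n == -1) {
            close();
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        if (!closed) {
            closed = true; // make repeated close() calls harmless
            super.close();
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

`FilterInputStream.read(byte[])` delegates to the three-argument `read`, so both read paths trigger the close at EOF.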
MODE-2221 Moved the SelfClosingInputStream to the common package and changed it so that it only wraps (and closes) an InputStream. Refactored the binary value classes to make sure any stream returned from a binary value is wrapped into a self-closing stream.

  1. … 16 more files in changeset.
MODE-2221 Moved the SelfClosingInputStream to the common package and changed it so that it only wraps (and closes) an InputStream. Refactored the binary value classes to make sure any stream returned from a binary value is wrapped into a self-closing stream.

  1. … 16 more files in changeset.
Moved test JTA settings from a base test class into an XML configuration file, which provides much more flexibility.

  1. … 16 more files in changeset.
MODE-2081 Changed the remaining files over to the ASL 2.0 license

  1. … 1046 more files in changeset.
Corrected JavaDoc errors and compiler warnings.

  1. … 11 more files in changeset.
MODE-2033 Added the ability to expose non-critical errors and warnings that occur during repository startup, including those caused by schema validation.

  1. … 16 more files in changeset.
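One way to expose non-critical startup problems, as MODE-2033 describes, is to collect them in a container the caller can inspect after startup instead of only logging them. This is a hypothetical sketch; the class, the `Severity` values, and the method names are assumptions, not ModeShape's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: collect non-fatal startup problems instead of only logging them. */
final class StartupProblemsSketch {

    enum Severity { WARNING, ERROR }

    /** One recorded problem. */
    static final class Problem {
        final Severity severity;
        final String message;

        Problem(Severity severity, String message) {
            this.severity = severity;
            this.message = message;
        }

        @Override
        public String toString() {
            return severity + ": " + message;
        }
    }

    private final List<Problem> problems = new ArrayList<>();

    void addWarning(String message) {
        problems.add(new Problem(Severity.WARNING, message));
    }

    void addError(String message) {
        problems.add(new Problem(Severity.ERROR, message));
    }

    boolean hasErrors() {
        return problems.stream().anyMatch(p -> p.severity == Severity.ERROR);
    }

    /** Read-only view that a caller can inspect once the repository has started. */
    List<Problem> all() {
        return Collections.unmodifiableList(problems);
    }
}
```

Keeping problems in recording order lets callers report them in the sequence they occurred during startup.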
MODE-1639 Added log message.

MODE-1639 Minor improvements and fixes

A log message is output when no Tika mime type detectors could be found (meaning Tika is not on the classpath), stating that automatic MIME type detection will be disabled. Also ensured that the input stream used by Tika is always closed.

All unit and integration tests pass.

  1. … 4 more files in changeset.
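The two MODE-1639 fixes (a warning when no detector is available, and always closing the stream handed to the detector) can be illustrated with plain JDK APIs. In this hypothetical sketch, `ContentDetector` stands in for whatever wraps Tika, a `null` detector means none was found, and the log wording is paraphrased from the commit message. Try-with-resources guarantees the stream is closed on every path, including when the detector throws.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Hypothetical sketch: warn when no detector exists, and always close the stream used. */
final class DetectionWithLoggingSketch {

    /** Hypothetical content detector; returns null when it cannot decide. */
    interface ContentDetector {
        String detect(InputStream content, String name) throws IOException;
    }

    private static final Logger LOG = Logger.getLogger(DetectionWithLoggingSketch.class.getName());

    private final ContentDetector detector; // null when none could be found

    DetectionWithLoggingSketch(ContentDetector detector) {
        this.detector = detector;
        if (detector == null) {
            LOG.warning("No MIME type detectors found (is Tika on the classpath?); "
                    + "automatic MIME type detection will be disabled");
        }
    }

    /** Returns null when detection is disabled; the stream is not even opened then. */
    String detect(Supplier<InputStream> content, String name) throws IOException {
        if (detector == null) {
            return null;
        }
        try (InputStream in = content.get()) { // closed even if detect() throws
            return detector.detect(in, name);
        }
    }
}
```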
MODE-1639, MODE-1640, MODE-1634 Replaced the Aperture-based MIME type detector with a Tika-based one

This required quite a bit of dependency gymnastics, since Tika has quite a few more transitive dependencies than the Aperture library (which we had successfully pared down several years ago). Tika references about 25 dependencies (including transitive dependencies), but in 'modeshape-jcr' this was reduced to about 8 for basic MIME type detection. Note that Tika usually includes two BouncyCastle libraries in its dependencies (used for encrypted PDFs, among other things), but ModeShape intentionally excludes these, as we don't want to ship or depend on any security-related JARs.

Not only do we get Tika's substantial MIME type database, we've also made it possible for users to edit the 'org/modeshape/custom-mimetypes.xml' file and provide the updated version on the application classpath. The contents of that file override all of the other sources (namely Tika's built-in file and its customization file, both of which are found on the classpath), so the easiest approach is simply to provide an updated version of this file at 'org/modeshape/custom-mimetypes.xml'.

Be sure not to remove any of the (few) customizations that ModeShape includes; those are important.
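To confirm which copies of 'org/modeshape/custom-mimetypes.xml' the application class loader can actually see (for instance, that an edited copy is on the classpath), a small diagnostic using the standard `ClassLoader.getResources` method may help. The class name is hypothetical, and the sketch only lists visible copies; it does not reproduce how Tika or ModeShape merges or prioritizes them.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic: list every copy of a resource visible to a class loader. */
final class ResourceCopiesSketch {

    static final String CUSTOM_MIME_TYPES = "org/modeshape/custom-mimetypes.xml";

    /**
     * Returns all matching URLs. With standard parent-first loaders, copies found by
     * parent loaders are listed before the loader's own copies.
     */
    static List<URL> find(ClassLoader loader, String path) throws IOException {
        return Collections.list(loader.getResources(path));
    }
}
```

For example, printing `find(Thread.currentThread().getContextClassLoader(), ResourceCopiesSketch.CUSTOM_MIME_TYPES)` at startup shows whether the edited file was picked up and from which JAR or directory.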

As we upgrade Tika, we'll get updated versions of the media type data. This is far preferable to maintaining a ModeShape-specific version.

The MIME type related interfaces in ModeShape's public API (e.g., 'modeshape-jcr-api') have been removed. They were added in one of the 3.0 releases, so removing them will not introduce compatibility issues for users.

We've decided to get out of the MIME type detection framework business and to switch to Tika for all MIME type detection. You can still write your own MIME type detector, but you do so by implementing Tika's interface and referencing the implementation class(es) in the corresponding service loader file in your JAR. (See the Tika documentation for details.)

However, internally we still have an abstraction. This is because it is possible to remove Tika (and its transitive dependencies) from a ModeShape installation, as long as your applications do not expect any kind of automatic MIME type detection. This is a perfectly valid use case: for example, using a repository to store data but not files (and not using sequencers).

The AS7 kits required a bit more modification. There is now a new AS7 module for 'org.apache.tika' that contains all of the JARs; it is used by both the ModeShape module and the Tika text extractor module.

All unit and integration tests pass with these changes. Several new tests were added.

    • -52
    • +0
    ./ExtensionBasedMimeTypeDetector.java
    • -0
    • +136
    ./TikaMimeTypeDetector.java
  1. … 64 more files in changeset.
Corrected compiler and JavaDoc warnings.

  1. … 22 more files in changeset.
MODE-1544 - Extracted the Tika-based mime-type detector and updated the way mime-type detectors are loaded and initialized.

Because of the AS7 support, the detectors need to be loaded via the Environment class loader. Also, because text extraction (and implicitly mime-type detection) can be triggered preemptively, some of Tika's excluded dependencies needed to be added back (e.g. for .java and .class files).

    • -0
    • +48
    ./NullMimeTypeDetector.java
  1. … 16 more files in changeset.
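Loading detectors through a specific class loader, as MODE-1544 requires for AS7, can be sketched with the JDK's `ServiceLoader`, which reads the same kind of `META-INF/services` file mentioned above for registering custom detectors. The interface and class names here are hypothetical, and `environmentLoader` merely stands in for the Environment class loader, whose actual API is not shown in this log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical sketch: discover detector implementations through a specific class loader. */
final class DetectorLoadingSketch {

    /**
     * Hypothetical service interface. Providers would be listed in a META-INF/services file
     * named after this interface's binary name.
     */
    public interface MimeTypeDetector {
        String mimeTypeOf(String name);
    }

    /**
     * Loads all providers visible to the given loader rather than relying on the
     * thread context class loader, which may not see module-specific JARs.
     */
    static List<MimeTypeDetector> load(ClassLoader environmentLoader) {
        List<MimeTypeDetector> found = new ArrayList<>();
        for (MimeTypeDetector detector : ServiceLoader.load(MimeTypeDetector.class, environmentLoader)) {
            found.add(detector);
        }
        return found;
    }
}
```

An empty result is the natural signal for the "no detectors found" case, at which point detection can be disabled with a log message.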
MODE-1527 Migrated the initial version of the text extractors from 2.x and updated the binary store to extract the text and mime-type of binary values

Working on this exposed how fragile, lock-wise, working with the SharedLockingInputStream (FileSystemBinaryStore) is. Therefore, I've updated the mime-type detection so that mark & reset are avoided as much as possible, and so that streams are closed after each detector finishes with them.

The Tika version was bumped to 1.1, which also required updating POI to version 3.8.

    • -4
    • +10
    ./ExtensionBasedMimeTypeDetector.java
  1. … 57 more files in changeset.
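The MODE-1527 approach of avoiding mark/reset and closing each stream once a detector is done can be sketched by giving every detector a freshly opened stream over the same content. All names here are hypothetical; `StreamSource` stands in for whatever reopens a binary value (for example, from a file-system binary store).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

/** Hypothetical sketch: each detector gets its own stream instead of mark()/reset() on a shared one. */
final class PerDetectorStreamSketch {

    /** Opens a new stream over the same binary content each time it is called. */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    /** Hypothetical detector; returns null if it cannot determine a mime type. */
    interface Detector {
        String detect(InputStream in, String name) throws IOException;
    }

    static String detect(List<Detector> detectors, StreamSource source, String name) throws IOException {
        for (Detector detector : detectors) {
            // Each detector reads from the start of a fresh stream, and that stream is
            // closed (releasing whatever it holds, such as a file lock) before the next one runs.
            try (InputStream in = source.open()) {
                String mimeType = detector.detect(in, name);
                if (mimeType != null) {
                    return mimeType;
                }
            }
        }
        return null; // no detector could decide
    }
}
```

Reopening costs an extra open per detector, but it avoids holding one shared stream (and its lock) open across all detectors.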
MODE-1508 Cleaned up optional dependency on Aperture

The Aperture library was changed to 'provided' scope, which means developers can add this library to the classpath, and ModeShape will automatically enable the Aperture-based MIME type detector. However, it is no longer included automatically.

  1. … 1 more file in changeset.
MODE-1386 Updated Maven assemblies, corrected dependencies, and added examples

The Maven assemblies were corrected (a bit; there is still work to do to create usable distributions), several dependencies were removed, and two examples were added to the codebase (but not yet to the build).

  1. … 59 more files in changeset.
MODE-1368 Removed all legacy modules no longer needed in 3.x

ModeShape 3.x will not need a number of the 2.x modules. In particular:

- since 3.x will only have an AS7 kit, the AS5 and AS6 artifacts were removed
- all the connectors were removed, since they're no longer used
- the connector benchmark tests module was removed, replaced by our new performance test suite
- the JPA DDL generator utility has been removed
- the 'modeshape-graph', 'modeshape-repository', 'modeshape-search-lucene' and 'modeshape-clustering' modules have all been removed, since the new 'modeshape-jcr' module no longer uses them
- the DocBook modules were removed and are replaced by the Confluence space
- the two JDBC modules were moved out of the 'utils' directory to top-level modules

The build still works, but not all components have been included in the build. This is because the query functionality doesn't work yet, and quite a few web and JDBC driver modules depend on it.

The assembly profile has not yet been changed or corrected.

  1. … 3649 more files in changeset.