dna-search

DNA-467 Restructured the SearchEngine functionality to simplify it, and to change over to have implementations be extensions. To specialize, an implementation now simply subclasses the SearchEngine, and this is instantiated directly.

The SearchEngine is also now oriented around a RequestProcessor. It's possible to obtain a RequestProcessor to issue multiple requests within a single connection, and this will make it significantly easier to reuse within a connector implementation (as well as within the 'dna-repository' module). Also, the SearchEngine's abstract RequestProcessor implementation provides default implementations for crawling and indexing entire subgraphs to generate the required UpdatePropertiesRequest and CreateNodeRequest stream. Specializations are responsible for fully implementing the RequestProcessor to handle all the different kinds of requests, including AccessQueryRequest and FullTextSearchRequest.

Thus, the LuceneSearchEngine can be instantiated directly, as it is now a subclass of SearchEngine. It also exists in a new 'extensions/dna-search-lucene' project, and continues to use the two-index design that was implemented previously. This new implementation uses a customized RequestProcessor implementation. This also should make it easier to process Changes via the request processor's methods.
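
The subclass-and-instantiate design described above can be sketched roughly as follows. This is only an illustration: every type and method here (SearchEngineSketch, InMemorySearchEngine, createProcessor, indexSubgraph, fullTextSearch) is a hypothetical, simplified stand-in and not the actual 'dna-graph' or 'dna-search' API, and a map stands in for the Lucene indexes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; NOT the real 'dna-graph'/'dna-search' types.
final class SearchEngineSketch {

    /** Handles a series of requests within one "connection" to the indexes. */
    static abstract class RequestProcessor implements AutoCloseable {
        abstract void createNode(String path, String text);

        abstract List<String> fullTextSearch(String term);

        /** Default crawl: turn a whole subgraph into a stream of createNode calls. */
        void indexSubgraph(Map<String, String> subgraph) {
            subgraph.forEach(this::createNode);
        }

        @Override
        public void close() {
            // A real specialization would commit or release index resources here.
        }
    }

    /** Base class; a specialization subclasses it and is instantiated directly. */
    static abstract class SearchEngine {
        abstract RequestProcessor createProcessor();
    }

    /** Toy specialization backed by a map instead of Lucene indexes. */
    static final class InMemorySearchEngine extends SearchEngine {
        private final Map<String, String> index = new LinkedHashMap<>();

        @Override
        RequestProcessor createProcessor() {
            return new RequestProcessor() {
                @Override
                void createNode(String path, String text) {
                    index.put(path, text);
                }

                @Override
                List<String> fullTextSearch(String term) {
                    List<String> hits = new ArrayList<>();
                    for (Map.Entry<String, String> entry : index.entrySet()) {
                        if (entry.getValue().contains(term)) hits.add(entry.getKey());
                    }
                    return hits;
                }
            };
        }
    }

    public static void main(String[] args) {
        SearchEngine engine = new InMemorySearchEngine();
        // One processor serves several requests before it is closed.
        try (RequestProcessor processor = engine.createProcessor()) {
            Map<String, String> subgraph = new LinkedHashMap<>();
            subgraph.put("/a", "graph search");
            subgraph.put("/a/b", "lucene index");
            processor.indexSubgraph(subgraph);
            System.out.println(processor.fullTextSearch("lucene"));
        }
    }
}
```

The point of the sketch is the shape: the base processor supplies the crawl, and the specialization supplies only the request handling.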

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1418 76366958-4244-0410-ad5e-bbfabb93f86b

  1. … 84 more files in changeset.
DNA-562 Upgraded Lucene to the latest release (3.0.0), which was made just last week and contains only removal of deprecated APIs.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1376 76366958-4244-0410-ad5e-bbfabb93f86b

Changed the ${pom.version} to ${project.version}, since the former has been deprecated.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1356 76366958-4244-0410-ad5e-bbfabb93f86b

  1. … 28 more files in changeset.
DNA-467 Changed the way IndexRules are defined to address the different datatypes, allowing the queries to be properly created using the appropriate Lucene query objects.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1342 76366958-4244-0410-ad5e-bbfabb93f86b

    • -312
    • +338
    ./src/main/java/org/jboss/dna/search/IndexRules.java
DNA-467 Refactored the query functionality to minimize dependencies on the rest of 'dna-graph', meaning that we can probably extract it and that it would be useful for other projects. This impacted a fair bit of code, but it does clean up the assumptions and dependencies of the query engine. Search is still not very decoupled. But everything still works and all unit tests pass.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1340 76366958-4244-0410-ad5e-bbfabb93f86b

  1. … 90 more files in changeset.
DNA-467 Integrated the Lucene implementation of the SearchEngine component and added a number of unit tests.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1339 76366958-4244-0410-ad5e-bbfabb93f86b

  1. … 3 more files in changeset.
DNA-467 Removed use of a method that was deprecated in 2.9 (or maybe 2.9.1)

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1338 76366958-4244-0410-ad5e-bbfabb93f86b

DNA-467 Changed the implementation of the method to delete nodes under a specified branch, and added several test cases to verify that the content can be indexed, re-indexed (multiple times), and searched.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1337 76366958-4244-0410-ad5e-bbfabb93f86b

DNA-467 Renamed unit test to reflect the name of the class it's testing.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1330 76366958-4244-0410-ad5e-bbfabb93f86b

DNA-467 Refactored the search engine components, moving the general-purpose classes (i.e., those that don't depend on Lucene) into 'dna-graph'.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1329 76366958-4244-0410-ad5e-bbfabb93f86b

    • -0
    • +74
    ./src/main/java/org/jboss/dna/search/LuceneException.java
    • -0
    • +368
    ./src/main/java/org/jboss/dna/search/LuceneSession.java
  1. … 7 more files in changeset.
DNA-467 Added query to constrain the length of a field.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1328 76366958-4244-0410-ad5e-bbfabb93f86b

DNA-552 Added to the abstract query model a new Constraint subclass 'Between' that represents a constraint on a DynamicOperand such that the values are within a certain range. Also added the corresponding support to the SQL parser for 'BETWEEN x AND y' and 'NOT BETWEEN x AND y', which is the syntax commonly used in other SQL grammars. (Note that this is an extension beyond JCRSQL2.) Also added support for 'BETWEEN x EXCLUSIVE AND y' and 'BETWEEN x AND y EXCLUSIVE' to be able to specify that the ranges do or do not include the boundary value. This feature allows the Lucene search/query implementation to apply a more efficient query to the indexes than does an AND of two Comparison constraints.
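
As a rough illustration of the range semantics, the sketch below models a 'Between' constraint with independently inclusive or exclusive bounds. The class, constructor, and method names are hypothetical and are not taken from the 'dna-graph' query model. It also assumes that EXCLUSIVE applies to the bound it follows, so 'BETWEEN x EXCLUSIVE AND y' excludes x and 'BETWEEN x AND y EXCLUSIVE' excludes y.

```java
// Hypothetical stand-in for the 'Between' constraint; not the real query-model class.
final class BetweenSketch {

    static final class Between<T extends Comparable<T>> {
        private final T lower;
        private final boolean lowerIncluded;
        private final T upper;
        private final boolean upperIncluded;

        Between(T lower, boolean lowerIncluded, T upper, boolean upperIncluded) {
            this.lower = lower;
            this.lowerIncluded = lowerIncluded;
            this.upper = upper;
            this.upperIncluded = upperIncluded;
        }

        /** True when the value lies within the range, honoring each bound's inclusiveness. */
        boolean matches(T value) {
            int againstLower = value.compareTo(lower);
            int againstUpper = value.compareTo(upper);
            boolean aboveLower = lowerIncluded ? againstLower >= 0 : againstLower > 0;
            boolean belowUpper = upperIncluded ? againstUpper <= 0 : againstUpper < 0;
            return aboveLower && belowUpper;
        }
    }

    public static void main(String[] args) {
        // BETWEEN 1 AND 10: both bounds included.
        Between<Integer> inclusive = new Between<>(1, true, 10, true);
        // BETWEEN 1 EXCLUSIVE AND 10: lower bound excluded (assumed reading).
        Between<Integer> lowerExclusive = new Between<>(1, false, 10, true);
        System.out.println(inclusive.matches(1));
        System.out.println(lowerExclusive.matches(1));
    }
}
```

Carrying both bounds and their inclusiveness in one constraint is what lets an implementation issue a single range query instead of intersecting the results of two separate comparisons.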

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1323 76366958-4244-0410-ad5e-bbfabb93f86b

  1. … 8 more files in changeset.
DNA-467 Removed fields that are no longer used

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1322 76366958-4244-0410-ad5e-bbfabb93f86b

DNA-467 Removed dependency that was accidentally added/committed

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1321 76366958-4244-0410-ad5e-bbfabb93f86b

DNA-467 Continued implementation of the dna-search components. Current status is that the SearchEngine is nearly complete (feature-wise), but has had little testing.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1320 76366958-4244-0410-ad5e-bbfabb93f86b

    • -0
    • +1503
    ./src/main/java/org/jboss/dna/search/DualIndexLayout.java
    • -0
    • +66
    ./src/main/java/org/jboss/dna/search/IndexLayout.java
    • -0
    • +162
    ./src/main/java/org/jboss/dna/search/IndexSession.java
    • -0
    • +656
    ./src/main/java/org/jboss/dna/search/KitchenSinkIndexLayout.java
  1. … 33 more files in changeset.
DNA-467 Additional refactoring to move toward a working search engine.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1319 76366958-4244-0410-ad5e-bbfabb93f86b

    • -32
    • +32
    ./src/main/java/org/jboss/dna/search/IndexRules.java
    • -0
    • +115
    ./src/main/java/org/jboss/dna/search/KitchenSinkIndexStrategy.java
    • -0
    • +161
    ./src/main/java/org/jboss/dna/search/SearchContext.java
  1. … 17 more files in changeset.
DNA-467 Changed how the Queryable and QueryEngine use Schemata. Before, the schemata instance was passed into the engine, and used for all queries. But this meant that the schemata instance could change the Table objects it returns. This would be more difficult to implement than if a different Schemata were passed with each query, since that Schemata instance can be a reflection of the tables at the time the query is implemented. And, Schemata can then be made to be immutable, not only simplifying the implementation but also ensuring that the schema information doesn't change during the processing of a query.

So, the Queryable and QueryEngine were changed to always pass a Schemata in with each QueryCommand. This rippled down to the SearchEngine, but this also cleaned things up a bit there, too.

It also allows different Schemata instances to be used for different languages (if that would ever make sense).
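
A minimal sketch of the idea, assuming an immutable snapshot is built from the current table definitions and handed to the engine with each query. All names and signatures here are hypothetical simplifications and are not the real Schemata, QueryCommand, or QueryEngine types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplifications; NOT the real 'dna-graph' query classes.
final class SchemataSketch {

    /** Immutable snapshot of table definitions (table name to column names). */
    static final class Schemata {
        private final Map<String, List<String>> tables;

        Schemata(Map<String, List<String>> tables) {
            // Defensive deep copy, so later changes by the caller are not visible here.
            Map<String, List<String>> copy = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> entry : tables.entrySet()) {
                copy.put(entry.getKey(),
                         Collections.unmodifiableList(new ArrayList<>(entry.getValue())));
            }
            this.tables = Collections.unmodifiableMap(copy);
        }

        List<String> columnsOf(String table) {
            return tables.get(table);
        }
    }

    /** Stand-in for a query command; it only names the table it selects from. */
    static final class QueryCommand {
        final String table;

        QueryCommand(String table) {
            this.table = table;
        }
    }

    /** The engine holds no schemata of its own; each call supplies one. */
    static final class QueryEngine {
        List<String> execute(QueryCommand command, Schemata schemata) {
            List<String> columns = schemata.columnsOf(command.table);
            if (columns == null) {
                throw new IllegalArgumentException("Unknown table: " + command.table);
            }
            return columns;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> live = new LinkedHashMap<>();
        live.put("nodes", new ArrayList<>(Arrays.asList("path", "name")));
        Schemata snapshot = new Schemata(live);
        live.get("nodes").add("depth"); // a later change to the live definitions
        // The query still sees the columns as they were when the snapshot was taken.
        System.out.println(new QueryEngine().execute(new QueryCommand("nodes"), snapshot));
    }
}
```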

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1318 76366958-4244-0410-ad5e-bbfabb93f86b

  1. … 3 more files in changeset.
DNA-467 Add search/query support to the graph API

Broke out the language parsing from the QueryEngine and into a new QueryParsers. The QueryEngine should do everything in terms of a QueryCommand, since it's possible for the query engines to be layered, and parsing languages is completely orthogonal to execution.
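
The separation can be pictured with the sketch below, in which a registry of parsers turns query text into a command and the engine only ever sees the command. The interfaces, the "toy-sql" language, and the parsing logic are all hypothetical and only loosely modeled on the names in this commit message.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; these are not the real QueryParsers/QueryEngine signatures.
final class QueryParsersSketch {

    /** Language-neutral result of parsing; here it only records the table queried. */
    static final class QueryCommand {
        final String table;

        QueryCommand(String table) {
            this.table = table;
        }
    }

    interface QueryParser {
        String language();

        QueryCommand parse(String query);
    }

    /** Registry that selects a parser by language name. */
    static final class QueryParsers {
        private final Map<String, QueryParser> parsers = new HashMap<>();

        void add(QueryParser parser) {
            parsers.put(parser.language().toLowerCase(Locale.ROOT), parser);
        }

        QueryCommand parse(String language, String query) {
            QueryParser parser = parsers.get(language.toLowerCase(Locale.ROOT));
            if (parser == null) {
                throw new IllegalArgumentException("No parser for language: " + language);
            }
            return parser.parse(query);
        }
    }

    /** Toy parser that only extracts the table named after FROM. */
    static final class ToySqlParser implements QueryParser {
        @Override
        public String language() {
            return "toy-sql";
        }

        @Override
        public QueryCommand parse(String query) {
            String[] tokens = query.trim().split("\\s+");
            for (int i = 0; i + 1 < tokens.length; i++) {
                if (tokens[i].equalsIgnoreCase("FROM")) return new QueryCommand(tokens[i + 1]);
            }
            throw new IllegalArgumentException("No FROM clause in: " + query);
        }
    }

    /** The engine knows nothing about languages; it works purely on commands. */
    static final class QueryEngine {
        String execute(QueryCommand command) {
            return "scan " + command.table;
        }
    }

    public static void main(String[] args) {
        QueryParsers parsers = new QueryParsers();
        parsers.add(new ToySqlParser());
        QueryCommand command = parsers.parse("TOY-SQL", "SELECT * FROM nodes");
        System.out.println(new QueryEngine().execute(command));
    }
}
```

Because the engine accepts only a command, engines can be layered or swapped without any of them needing to know which language the query was originally written in.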

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1291 76366958-4244-0410-ad5e-bbfabb93f86b

  1. … 7 more files in changeset.
DNA-467 Add search/query support to the graph API
DNA-468 Add XPath query language support

Added a lot more implementation and unit tests behind the XPath parser, the XPath AST objects, and the translator that converts an XPath AST into a SQL query model. Found and fixed a number of issues in several of the 'dna-graph' classes, including some tweaks to the SQL query model, the SQL parser, minor improvements to the JodaDateTime class, and some changes to the exception handling in the 'dna-graph' query model and builder.

At this point, the XPath support is pretty good, though still not complete. It may be good enough to use for a while - until we have more examples. The unit tests are verifying not only that the XPath expressions can be parsed, but that they're also converted correctly to an expected SQL representation.

The XPath functionality has not yet been integrated into the JCR implementation, since that requires hooking up the LuceneQueryEngine (which still needs some work). However, I plan to start putting more of the pieces together and focusing on wrapping up the LuceneQueryEngine.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1286 76366958-4244-0410-ad5e-bbfabb93f86b

  1. … 30 more files in changeset.
Merge branch 'dna-529'

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1276 76366958-4244-0410-ad5e-bbfabb93f86b

DNA-529 Upgrade to Lucene 2.9

Uploaded the Lucene 2.9.0 jars, sources jars, and POM files into the JBoss Maven repository, and changed the 'dna-search' POM file to reference this new version.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1274 76366958-4244-0410-ad5e-bbfabb93f86b

  1. … 1 more file in changeset.
DNA-467 Add search/query support to the graph API

Added an initial version of the query engine functionality. This design provides a way to define and supply queries (in various languages) and have the engine parse them into a single Abstract Query Model (equivalent to the abstract syntax tree for a query), plan, validate, optimize and then process the atomic portions of the query plan. This whole system was designed to be easily reused as-is or extended and customized to provide the desired behavior. But because this is a generalized query engine capable of querying over a 'graph', the actual processing of the atomic portions of the queries must be provided when the engine is used. Part of this commit includes a new 'dna-search' project containing a specialization of the query engine with a processor capable of using a set of Lucene search indexes, along with utility and management methods to populate and update the indexes (by indexing the entire content and/or by updating the content based upon events).

A number of packages were added to 'dna-graph', including: an abstract query model (AQM) based upon the JSR-283 specification in 'o.j.dna.graph.query.model'; a query engine component in 'o.j.dna.graph.query'; a query planning module in 'o.j.dna.graph.query.plan'; an extensible rule-based optimization module in 'o.j.dna.graph.query.optimize'; a simple way to define the schemata that is being queried in 'o.j.dna.query.validate'; a flexible processing plan model and execution framework in 'o.j.dna.graph.query.process'; and a framework for different query language parsers (including a JCR-SQL2 parser) in 'o.j.dna.query.parse'. This entire query engine framework was designed to be reused and/or extended in multiple places, and so includes a way to accept and execute queries from a number of different languages, along with an abstraction of the actual processing of the low-level atomic queries. Numerous unit tests were added to test each of the components, including a large number of tests for the SQL parser.
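
To give a feel for what "extensible rule-based optimization" can mean, here is a minimal sketch in which a plan is just an ordered list of step names and each rule rewrites it. The rule, the plan representation, and all class names are assumptions made for illustration; the actual classes in 'o.j.dna.graph.query.optimize' are not shown in this log and may look quite different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only; not the real optimizer classes.
final class OptimizerSketch {

    /** A rule rewrites a plan (here an ordered list of step names) into an equivalent one. */
    interface OptimizerRule {
        List<String> apply(List<String> plan);
    }

    /** Hypothetical rule: collapse consecutive duplicate steps such as SORT, SORT. */
    static final class CollapseDuplicateSteps implements OptimizerRule {
        @Override
        public List<String> apply(List<String> plan) {
            List<String> result = new ArrayList<>();
            for (String step : plan) {
                if (result.isEmpty() || !result.get(result.size() - 1).equals(step)) {
                    result.add(step);
                }
            }
            return result;
        }
    }

    /** Applies each registered rule in order; new rules plug in without changing this loop. */
    static final class RuleBasedOptimizer {
        private final List<OptimizerRule> rules = new ArrayList<>();

        RuleBasedOptimizer addRule(OptimizerRule rule) {
            rules.add(rule);
            return this;
        }

        List<String> optimize(List<String> plan) {
            List<String> current = plan;
            for (OptimizerRule rule : rules) {
                current = rule.apply(current);
            }
            return current;
        }
    }

    public static void main(String[] args) {
        RuleBasedOptimizer optimizer = new RuleBasedOptimizer().addRule(new CollapseDuplicateSteps());
        List<String> plan = Arrays.asList("ACCESS", "SELECT", "SORT", "SORT", "PROJECT");
        System.out.println(optimizer.optimize(plan));
    }
}
```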

A new 'dna-search' project was created and the initial Lucene-based query engine functionality was added. Quite a few tests were added to verify the desired behavior.

At this point, the general query engine and the Lucene-based specialization are for the most part complete and thoroughly tested, but these components need to be integrated into the larger connector framework and JCR implementation. All of the Lucene index generation and management needs to be coordinated and integrated with the administration and lifecycle of the DNA connectors and JCR engine. Additionally, while there are methods to create/update the indexes, the ability to extract text from binary property values still needs to be added. In short, there still is a lot of outstanding work.

git-svn-id: https://svn.jboss.org/repos/modeshape/trunk@1234 76366958-4244-0410-ad5e-bbfabb93f86b

    • -0
    • +386
    ./src/main/java/org/jboss/dna/search/DirectoryConfigurations.java
    • -0
    • +278
    ./src/main/java/org/jboss/dna/search/IndexContext.java
    • -0
    • +628
    ./src/main/java/org/jboss/dna/search/IndexingRules.java
    • -0
    • +121
    ./src/main/java/org/jboss/dna/search/IndexingStrategy.java
    • -0
    • +112
    ./src/main/java/org/jboss/dna/search/LuceneQueryEngine.java
    • -0
    • +305
    ./src/main/java/org/jboss/dna/search/SearchEngine.java
    • -0
    • +68
    ./src/main/java/org/jboss/dna/search/SearchI18n.java
  1. … 142 more files in changeset.