IBM Content Harvester Free Download

IBM Content Harvester Overview

A document created with a word processor is a collection of character sequences and embedded objects interspersed with formatting information, which makes the content difficult to access. Many such documents nonetheless follow an underlying structure, whether the document is a resume written by an individual or project documentation written by a team.

IBM Content Harvester allows you to harvest such unstructured, formatted documents by:

* Extracting the content
* Cleansing sensitive information
* Tagging based on user-defined names
* Querying for selected tags, and
* Publishing information of interest in any open format.

Using rules, you specify three things: the regions of content that are of interest, identified by textual markers; the tag to assign to each extracted region; and the terms to cleanse from the extracted content. The tool then cleanses and tags the content. The output is an XML file that can be queried with XQuery for any assigned tag and published in an open format such as HTML using XSL transforms.
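The extract, cleanse, tag, and query steps can be illustrated with a minimal Python sketch. It is not IBM Content Harvester's actual rule syntax or API: the `RULES` structure, the `harvest` function, the `[REDACTED]` mask, and the sample text are all hypothetical, and the standard library's limited XPath support in `findall` stands in for XQuery.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule format (an assumption, not the tool's real syntax):
# each rule names a tag and the textual markers that bound a region.
RULES = [
    {"tag": "name", "start": "Name:", "end": "\n"},
    {"tag": "skills", "start": "Skills:", "end": "\n"},
]
CLEANSE_TERMS = ["555-0100"]  # terms to mask in the extracted content


def harvest(text, rules, cleanse_terms, mask="[REDACTED]"):
    """Extract marked regions, cleanse them, and return a tagged XML tree."""
    root = ET.Element("document")
    for rule in rules:
        # Capture everything between the start and end markers, non-greedily.
        pattern = re.escape(rule["start"]) + r"(.*?)" + re.escape(rule["end"])
        for match in re.finditer(pattern, text, flags=re.DOTALL):
            content = match.group(1).strip()
            for term in cleanse_terms:
                content = content.replace(term, mask)
            ET.SubElement(root, rule["tag"]).text = content
    return root


sample = "Name: Jane Doe, phone 555-0100\nSkills: XML, XSLT\n"
tree = harvest(sample, RULES, CLEANSE_TERMS)

# Serialize the tagged content as XML.
xml_output = ET.tostring(tree, encoding="unicode")

# Query for one assigned tag (a stand-in for an XQuery expression).
skills = [el.text for el in tree.findall("skills")]

print(xml_output)
print(skills)
```

Keeping the rules as plain data, separate from the extraction code, mirrors the description above: changing what is harvested means editing rules rather than code.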

NEW

Fixed some bugs.

IBM Content Harvester Information

Version: 1.0
Date: 05.28.10
License: Free
Language: English
File Size: 13.91 MB
Developer: IBM Corporation
