
Tech Blog

Get information on new and exciting happenings in the world of JSTOR.

Digitization for the rest of us? by Keith Kiser, posted on Tuesday, March 31, 2009
Decapod is a highly automated, intelligent capture system for bound documents, based on low-cost hardware and open-source software.

The process JSTOR uses to produce its high-quality images is documented here. That process is complex and time-intensive. But what about the small historical society, or the holder of a smaller collection, that would like to place its content online for broader access? With limited resources, such groups may find the process a poor fit.

Enter Decapod. 

Decapod is a Mellon-funded project, with a team drawn from the Universities of Kaiserslautern and Toronto and from JSTOR, to build a “one click” document capture and conversion system. Small and medium physical collections can use it to generate web-ready corpora and to prepare scanned content for consumption on mobile devices. This will make the generation of contributed collections easier and improve some of JSTOR’s production processes. Delivery will be in stages through the end of 2010.

Decapod will be an inexpensive, attaché-case-sized hardware and software solution that can be readily procured and assembled, then taken into the stacks by local staff or volunteers to capture material quickly and unobtrusively and deliver it in a usable format. It will be open source, easy to use, and will provide an out-of-the-box method of digitizing small to medium archives of scholarly material.
 
The project will develop new components while integrating existing components from DFKI's OCRopus system and ATRC's FLUID to create a high-quality, low-cost solution for digitization. The primary output from the system will be reflow-tagged PDF/A files that can either present high-fidelity reproductions of the original pages or reflow for use on any supported PDF platform, including mobile devices. Intermediate results, such as archive-quality TIFFs, can also be generated and fed into any digitization workflow via a standard 'watch folder' interface.
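
The post does not describe how Decapod's watch folder is implemented, so the following is only a rough sketch of the general pattern: a downstream workflow checks a directory for TIFF files it has not yet picked up. The function name `find_new_tiffs`, the `seen` set, and the file extensions checked are all assumptions made for illustration, not part of Decapod.

```python
from pathlib import Path


def find_new_tiffs(watch_dir, seen):
    """Return TIFF files in watch_dir that have not been reported before.

    `seen` is a set of file names already handed to the workflow. It is
    updated in place, so the next call reports only later arrivals.
    (Hypothetical helper; not a Decapod API.)
    """
    new_files = []
    for path in sorted(Path(watch_dir).iterdir()):
        is_tiff = path.suffix.lower() in {".tif", ".tiff"}
        if path.is_file() and is_tiff and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files
```

A workflow might call this function every few seconds and pass each returned path to its own ingest step. A production version would also need to make sure a file has finished being written before picking it up.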
 
Stay tuned for more. 
Have an idea you'd like to see talked about here? Send us an email at .

Welcome by Admin, posted on Friday, February 20, 2009
Announcing the JSTOR showcase.

We're pleased to announce the opening of the showcase, where JSTOR's advanced technology team are bringing their projects into the bright light of day for the first time.

This is not just from JSTOR -- we recognize that innovation is taking place everywhere, and so much of what appears here comes from collaborations with partners elsewhere in the community.

Our role is to advance research and to bring you the tools and facilities that you need, so let us know how we can help. Whether you need access to data from JSTOR, have tools to share that make using JSTOR easier, or want to request tools, we will do our best to help.

Just to let you know, here are a few of the things that we are up to:

  • We're working with partners in Germany and Canada to make the job of digitizing dramatically easier. Using cameras and state-of-the-art document understanding, we hope to make turning paper into real digital documents as easy as pressing a button. This is a hard problem, and we'll report on progress over the next year.
  • With partners from across the globe, we're looking at ways to make it easy to annotate almost anything, anywhere and anyhow and to seamlessly share those annotations between platforms and colleagues.
  • We're looking at ways to make our image resources more useful -- we'd like to become a workbench as well as a bookshelf. That includes the extensive plants database and document images as well as the wide-ranging collection of images from the journal collection.
  • As you can see elsewhere on this showcase, we're trying to expose data in useful ways for corpus analysis and mining. Go check out the Data for Research facility here on Showcase and at http://dfr.jstor.org.
  • We've got tools that allow you to use OAI-ORE over the JSTOR collections. 
  • We're trying to develop ways of exposing the structure of the JSTOR collections by analyzing relationships over time and presenting them graphically. That is a particularly rich vein and we'd welcome any suggestions or contributions on how to compare or relate items in the collection and how to present or analyze the structure.

Not everything we work on will make it into the Showcase, and not everything that appears on Showcase will make it onto the main JSTOR site. This is, after all, where we work with fellow researchers and not everything works out (or even works), but it is a place where you can see and contribute to our shared future.

On which note, if you want to meet some of us in person, we will be attending various conferences over the year, the next of which is OAH in Seattle from March 26th to 28th. We'll be at the JSTOR booth, so come by and share your thoughts with us.

-john burns
