Section 3. Creating Digital Images
3.7 Image digitisation process: workflow, procedures and good practices
Introducing the production process
Choosing whether to undertake digital image creation 'internally', which may require recruiting specialist staff and equipment, or to 'out-source' it will usually be influenced by the volume of images to be created, the nature of the original objects to be digitised, and the staff skills and technology already available 'on-tap'.
Whichever route is taken, it is important to be aware of good practices and procedures in the actual production process of creating digital images.
The image digitisation process runs from identifying and assessing the original analogue sources to producing a digital 'primary capture' image with accompanying metadata. From this, surrogate 'archival/master' and 'delivery' images can then be derived.
Quality Assurance mechanisms should be in place throughout the digitisation process to ensure minimum error and maximum consistency in the digital images created.
The total image digitisation process, or workflow, can be separated into the following three phases, each subject to QA:
1. Pre-digitisation feasibility study
2. Digitisation processes
This is the actual primary image capture procedure, including the application of initial metadata and storage of files. It also includes the preparation and handling of analogue originals prior to capture and on their return to their original source.
3. Post-digitisation processes
Procedure guidelines for digitisation
These guidelines are adapted from a report on the experiences of both HEDS (Higher Education Digitisation Service: http://heds.herts.ac.uk/) and TASI in advising and guiding the JISC Image Digitisation Initiative (JIDI) project. The JIDI project relates to large volume digitisation of existing analogue resources, using scanners as the image capture device. The original report has been adapted herein for more generic application. The principles discussed can be transferred to other types of digital creation scenarios.
For the unedited JIDI workflow report see: http://www.tasi.ac.uk/building/workflow1.html
For the full JIDI Feasibility Study report see: http://heds.herts.ac.uk/Guidance/JIDI_fs.html
The guidelines below relate to the digitisation phase of the creation process, after a feasibility study and prior to any major post-digitisation processes, on which the rest of the guide advises. (The procedures assume that a digitisation feasibility study has been undertaken.)
Digitisation production procedures
Good preparation of analogue originals and technical systems is essential to help avoid common errors or quality defects encountered in large scale digitisation projects. Preparation falls into two categories:
Preparation of analogue materials
Before the original items are transported to the site of digitisation (whether internal to the organisation or out-sourced), it is important that every individual item is in the best possible condition, clearly labelled and suitably packaged. This will speed operations at the digitisation point and enable the digitisation operator to differentiate between the originals, which is essential for naming the created files accurately.
In addition, an inventory of materials for digitisation should be maintained, to be reflected across the whole production process. The inventory should be created by those supplying/responsible for the analogue items. (Metadata records, if established, may be used as the basis of an inventory.)
The inventory should include:
The inventory can be used to:
A spreadsheet or simple database is useful for creating inventories, but it is also useful for the items to be accompanied by a paper inventory through all stages. This will aid checking-in, annotating, and 'signing off' items as they pass through the process.
The inventory can also be used as the basis for file creation and recording mechanisms for the management of originals through the digitisation process.
The inventory should be reflected back to the originating body at the point of digitisation, to ensure that all items have been safely received. There should likewise be a further inventory check at the point of return of materials to the originator.
An inventory is thus a useful management tool. It can provide a full record of all production processes undertaken on analogue items and their digital versions. It is recommended that items moved for digitisation, however near or far (even within the same building), be fully inventoried to this level to guard against loss or misidentification.
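As an illustration, the check-in and check-out reconciliation described above can be sketched in a few lines of code. The inventory fields and item identifiers below are hypothetical, standing in for whatever scheme a project adopts:

```python
import csv
import io

def reconcile(sent, received):
    """Compare the supplier's inventory against items checked in at the
    digitisation point; returns (missing, unexpected) item identifiers."""
    sent_ids, received_ids = set(sent), set(received)
    return sorted(sent_ids - received_ids), sorted(received_ids - sent_ids)

# Hypothetical inventory rows: item identifier, description, condition note.
inventory_csv = """id,description,condition
VA-0001,Watercolour sketch,good
VA-0002,35mm slide,fragile
VA-0003,Glass negative,fragile
"""
sent = [row["id"] for row in csv.DictReader(io.StringIO(inventory_csv))]
missing, unexpected = reconcile(sent, ["VA-0001", "VA-0003"])
print(missing)   # → ['VA-0002']: not checked in at the digitisation point
```

The same routine can be run again at the point of return, swapping the roles of the two lists.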
Packaging of originals
The 'supplier' of the original materials should be responsible for the packaging of the items for movement. This should be done with the distance and type of transport to be used in mind.
The fragility of materials and any special handling requirements should be clearly indicated, to prevent damage either in storage or in production.
Summary of preparation of analogue materials
All of the above factors help to ensure that no items are misplaced or damaged, either in transit or during digitisation, and guard against errors in naming the files created. Further, they make quality assurance at all levels more efficient by providing easily comparable input records against the end product.
The recording mechanism (inventory) for the throughput of originals will need to be designed, possibly at the media/format level. The inventory can then enable automatic filename creation at the point of scanning, by picking filenames from a pre-created list of materials to be processed. This reduces scanner operator interpretation and reduces file naming errors.
The directory file structures for digital images should be created on the capture system before any scanning takes place to enable items to be placed easily into the correct data structures and again to reduce operator interpretation.
Every possible mechanism should be set up to ensure that the scan operators can concentrate their core skills on handling materials, scanning and producing acceptable images, and not on constructing filenames or data structures, which should be ready in place.
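A minimal sketch of such pre-created naming and directory structures, assuming the inventory supplies a collection code and an ordered list of item identifiers (both hypothetical here):

```python
import os
import tempfile

def prepare_capture_paths(root, collections):
    """Pre-create the directory tree and derive one target filename per
    inventory entry, so the scan operator never constructs names or data
    structures by hand. `collections` maps a collection code to its
    ordered item identifiers."""
    paths = {}
    for code, item_ids in collections.items():
        folder = os.path.join(root, code)
        os.makedirs(folder, exist_ok=True)
        for item_id in item_ids:
            paths[item_id] = os.path.join(folder, item_id + ".tif")
    return paths

root = tempfile.mkdtemp()
paths = prepare_capture_paths(root, {"slides": ["VA-0001", "VA-0002"]})
print(sorted(paths))   # → ['VA-0001', 'VA-0002']
```

At the scan station the operator then simply takes the next path from the pre-created list, rather than typing a filename.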
Preparation local to digitisation
Preparation of scanner capture systems and software settings
Every digitisation service will use different equipment in slightly different configurations, but there are some common themes and methods that can ensure the best set-up.
Memory
On production equipment it is important to reduce the number of extraneous programs or memory resident applications that are running and to include only those that are going to assist in the digitisation process.
Scanning/manipulation software location
It is recommended that software resources required for scanning be locally available on the hard disk of the production machine, even if the machine is networked. (It may even be beneficial to disconnect the machine from a network during production.)
Light conditions
Ambient light can have a substantial effect on the operator's ability to assess tones and colours accurately, both in the originals and on screen. Natural light is preferable to artificial light, but the working environment should allow the operator to control and modify the prevailing light conditions. Reflections and bright light should be avoided, as they affect the fidelity of on-screen images.
Monitor configuration
The computer monitor configuration is important in gaining the most accurate results from the scanned images. The following are recommendations:
Colour management
Image digitisation should aim for accurate, repeatable colour from the input phase, through display, to output. This is where Colour Management Software (CMS) comes into play, as the best means currently available for colour matching.
Colour perception is complicated by environmental conditions, and by psychological and physical factors in the operator. CMS assists in reducing these factors to ensure more quantifiably accurate outputs by relating output to a standard colour definition.
There are a number of CMS packages and systems available. With this variation in CMS operating environments it is not possible to give a definitive 'how to' guide for using CMS, but there are some common features that should be utilised:
Colour and greyscale targets are provided by Kodak and enable the scanner and monitor to be configured to reproduce accurately the colours and tones on the target cards or slides. As the outputs are measurable against the targets, scanner profiles can be created and saved for use in production set-up.
Output profiles can be used and modified to ensure that the output conforms to the International Color Consortium (ICC) colour standard whilst replicating the colour of the original as accurately as possible.
Colour and greyscale targets can be used to 'record' the correct colour or tone of an original before it goes into the scanner. This provides a reference point when the original is no longer available in the scanner for direct viewing. The record can be compared to the displayed image to ensure the tone and colour representation in the recorded section is accurate.
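As a rough illustration of comparing a recorded target against its reference values, the following sketch averages the per-channel differences over neutral patches. The reference numbers are illustrative only, not Kodak's published target values:

```python
def channel_deviation(measured, reference):
    """Average per-channel difference between patches sampled from the
    scanned target and the target's reference values; positive numbers
    mean the scan is too strong in that channel."""
    n = len(measured)
    return tuple(
        sum(m[c] - r[c] for m, r in zip(measured, reference)) / n
        for c in range(3)
    )

# Hypothetical reference RGB values for three neutral target patches.
reference = [(243, 243, 243), (128, 128, 128), (52, 52, 52)]
measured  = [(249, 240, 244), (134, 125, 129), (58, 49, 53)]
print(channel_deviation(measured, reference))  # → (6.0, -3.0, 1.0): a red cast
```

A consistent non-zero deviation across patches suggests a systematic cast that profiling or scanner adjustment should correct.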
Handling and optimising originals at scanner point
The handling of the originals has to be considered quite carefully and the digitisation environment must be the first step in ensuring safe handling. The recommendations for handling at the digitisation point are:
Slides and other transparencies can become quite dusty and attract hairs. These can be removed with the careful application of a clean-air device such as a puff brush.
Loading the scanner with original materials
It is preferable to use multi-frame holders for 35mm slides and negatives as this enables quicker transference of originals into the scan area and reloading of the holders whilst the scan is taking place.
It is recommended that two transparency holders for each size and media be used to enable loading of one whilst the other is in use, optimising production times.
Originals scanned on flatbed devices yield the best images when weighted flat onto the scan bed. If this is an acceptable handling method for the material, backing paper or glass can be used to hold the original flat.
When scanning very large-format originals on a flatbed, more than one person may be required to position and care for the original, with a further person operating the scanner as needed.
Technical issues for scanning
Prior to capturing a digital image a 'pre-scan' is made to adjust and set the capture parameters in the scanner/imaging software. The following settings need to be established (for each scan, unless a scanning setting has been established for multiple items):
1. White point (highlight)
The white and black points are the areas of the image used to maintain a full range of tonal values. The 'white point', or highlight, is taken from the whitest area of the image that contains the most detail. Setting the white point function on this point allows the scanner to adjust the tonal values of the whole image.
It is important to choose a good white point, preferably towards the centre of the image. Do not use specular/reflective or overexposed areas as highlight points.
If the scanner provides a histogram or densitometer function then add about 5-10% to the white point value to give a little headroom for other highlight areas in the image and to ensure their details are not lost in the output image.
2. Brightness
The overall 'weight' of the image is controlled by the brightness setting. Most scanners allow adjustment of brightness or gamma to lighten or darken the image. This can help ensure that the colour saturation or tones of the image are closest to the target tones of the original. However, after adjustment it is essential to check the highlight and shadow settings, as they may have been affected by any changes made.
3. Shadow point (black point)
These are the areas of the image that are dark yet contain the most detail. It is essential to maintain maximum detail in dark areas without them becoming too black or looking greyer than they actually are in the original. On an RGB readout, values of around 7-10 would be considered usual.
This function is not quite as important as the white point value in the scanning device, but devices without this support will be more limited in adjustment capacity.
4. Cropping
The images on the preview scan can be optimised through cropping to ensure that only the original item is scanned and not the scanner bed, original mounting/frame or other extraneous matter.
It is essential that the cropping does not remove any information content from the original. For art images, a trained eye with subject knowledge will be able to make decisions about inclusion of frames and how to crop non-regular forms without jeopardising content.
At this point, all the adjustments deemed to best represent the original have been made, and the original is ready to be scanned. The full scan will give the operator a more complete image with which to assess whether the tonal range and colours are representative of the original.
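The white- and shadow-point logic above can be sketched from an image histogram. The percentile cut-off and the 5% headroom in this example are assumptions for illustration, not fixed rules:

```python
def suggest_points(histogram, clip=0.005, headroom=0.05):
    """Pick black and white points from a 256-bin luminance histogram,
    ignoring the extreme `clip` fraction of pixels at each end (so a
    specular or overexposed spike is not chosen), then backing the white
    point off by `headroom` to preserve other highlight detail."""
    total = sum(histogram)
    cutoff = total * clip
    running = 0
    for black, count in enumerate(histogram):  # darkest meaningful level
        running += count
        if running > cutoff:
            break
    running = 0
    for white in range(255, -1, -1):           # lightest meaningful level
        running += histogram[white]
        if running > cutoff:
            break
    white = round(white * (1 - headroom))      # ~5% headroom, as suggested
    return black, white

# Toy histogram: pixels spread evenly over levels 10..240.
hist = [0] * 256
for level in range(10, 241):
    hist[level] = 4
print(suggest_points(hist))   # → (11, 227)
```

On a real scanner these values would be set through the capture software's densitometer or histogram tools rather than computed externally.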
Scanning Quality Assurance (QA)
It is recommended that the first scan of a new item and media type is done at quite a low resolution and the results assessed for colour and tonal fidelity against the original. This will provide useful guidance for further scanning from the collection and save time if any adjustment is required.
The operator should assess the image against the original or the colour target to assess accuracy of representation.
The operator should also look at the image's histogram to assess whether the highlight values have been 'stretched' or 'clipped', as this can affect the tonal range of the image and the level of information recorded.
When the operator is satisfied then the image can be saved to disk. If not satisfied then a re-scan is required.
A factor for 'bracketing'/'rescanning' should be accounted for in the budgeting of the digitisation process. This does not imply failure on the operator's part, but a realistic assessment of digitisation output.
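The check for 'stretched' or 'clipped' highlight values can also be done programmatically. The sketch below assumes an 8-bit, 256-bin histogram and uses an arbitrary 1% threshold:

```python
def clipping_report(histogram, threshold=0.01):
    """Flag a scan whose histogram piles up at either extreme, a sign
    that highlight or shadow detail was clipped during capture."""
    total = sum(histogram) or 1
    return {
        "shadow_clipped": histogram[0] / total > threshold,
        "highlight_clipped": histogram[255] / total > threshold,
    }

hist = [0] * 256
hist[255] = 50                  # a burnt-out highlight spike
for level in range(30, 220):
    hist[level] = 5
print(clipping_report(hist))    # → highlight_clipped: True
```

A flagged image is a candidate for the re-scan allowance budgeted above.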
Saving to disk and recording information
The output file should be created by saving the image to disk. The recommended format for the 'primary capture image' is RGB, uncompressed TIFF, with the thumbnail option activated.
A recommended directory structure for primary capture images is:
This would work quite well for organising the output of an imaging project as a whole. However, bear in mind that each filename should be unique to each digital image. No filename should be repeated, otherwise confusion about the identity of digital images will result.
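The uniqueness requirement can be verified automatically. A sketch using only the standard library (the directory and file names are hypothetical):

```python
import collections
import os
import tempfile

def duplicate_filenames(root):
    """Walk the image directory tree and report any filename that appears
    more than once, regardless of which sub-directory it sits in."""
    counts = collections.Counter(
        name
        for _dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
    )
    return sorted(name for name, n in counts.items() if n > 1)

# Demonstrate with a deliberately repeated filename in two batch folders.
root = tempfile.mkdtemp()
for sub in ("batch1", "batch2"):
    os.makedirs(os.path.join(root, sub))
    open(os.path.join(root, sub, "VA-0001.tif"), "w").close()
print(duplicate_filenames(root))   # → ['VA-0001.tif']
```

Running such a check before delivery catches name clashes while the originals are still to hand for re-identification.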
Adding technical image metadata
Technical image metadata are the data which describe the digital image itself, e.g. format, resolution, file-size etc. They can also include 'capture' details, such as creation date, creator (scanner operator).
Technical image metadata can be noted throughout the scanning process, as the relevant information becomes available, e.g. scan pixel dimension can be taken at the scan point from the software display, and the complete record added at the point where the 'primary capture image' is written to disk.
Image metadata should be entered onto a separate machine/storage space to enable the scan station to remain free for continuous production.
Every resource creation project will record varying levels of image metadata and hence may require different methodologies to keep the scan production process free of interruption. (For information on metadata relating to the intellectual content of the image, see Section 4.)
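A minimal technical-metadata record might look like the following. The field names are illustrative rather than a formal schema, and each project will extend them to its own requirements:

```python
import json
from datetime import date

def capture_record(filename, width, height, dpi, operator, fmt="TIFF"):
    """Build a minimal technical-metadata record for one primary capture
    image: file details plus capture date and operator."""
    return {
        "filename": filename,
        "format": fmt,
        "pixel_dimensions": [width, height],
        "resolution_dpi": dpi,
        "capture_date": date.today().isoformat(),
        "operator": operator,
    }

record = capture_record("VA-0001.tif", 4800, 3600, 600, "scanner-op-1")
print(json.dumps(record, indent=2))
```

Records in this shape can be accumulated on the separate metadata machine and later merged with the content metadata discussed in Section 4.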
Writing to media (production-data storage and data delivery)
This section deals with two aspects of writing data to media:
Storage of captured images in the production environment
Hard disk space on production machines is precious. It may therefore be required to write production data to another storage device or portable storage medium on a regular basis to free production resources.
Transferring such data volumes across a network may be detrimental to the network performance. In an ideal situation, this could be written in overnight processes. However, this may not be possible; thus a local or portable storage medium is recommended to move the data quickly and free up production resources.
Transfer of digital images created
Sending data of large volume and file size via FTP or other networked solutions would be untenable. The most efficient method is to write to a physical medium agreed by the parties involved.
CD-ROM is fairly standard, but each disc can take up to 15 minutes to write, the failure rate of the media is quite high, and large data volumes would require a very large number of CD-ROMs. This adds to the costs and increases the chance of data being missed in the delivery of the end product.
The recommended transfer medium is thus one that holds more data than a CD-ROM, such as JAZ drive formats.
(See Section 7.1, storage and media issues, for a wider discussion of storage issues.)
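The trade-off can be put in rough numbers. The file size and media capacities below are illustrative assumptions, not fixed figures:

```python
import math

def discs_needed(total_mb, disc_capacity_mb, minutes_per_disc):
    """Estimate how many discs, and how much writing time, a delivery
    of `total_mb` megabytes requires on a given medium."""
    discs = math.ceil(total_mb / disc_capacity_mb)
    return discs, discs * minutes_per_disc

# Illustrative delivery: 1,000 uncompressed TIFFs of ~40 MB each.
cds, cd_minutes = discs_needed(1000 * 40, 650, 15)     # ~650 MB CD-ROM
jaz, jaz_minutes = discs_needed(1000 * 40, 2000, 15)   # assumed 2 GB JAZ
print(cds, jaz)   # → 62 20: far fewer cartridges than CD-ROMs
```

Even with generous assumptions, the CD-ROM count, and the writing time it implies, grows quickly with collection size.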
Return of originals
The handling of originals on their return to their sites of origin is often forgotten when arranging workflow. Originals need to be checked against inventories both when leaving the digitisation point and on return to their source, to ensure that all are present and free of damage.
On their return to 'suppliers', originals will need not only to be checked but also reintegrated into their storage systems. Time for this should be accounted for in any planning.
QA procedures on completed datasets
Depending on how closely the original source providers and the digitisation point are connected, this QA may need to be carried out at the digitisation point, the original source point, or both.
Newly created digital image files should be subjected to QA procedures to assess the fidelity of the images against the originals, to discover any digitisation anomalies and to ensure filenames and any metadata added are correct for the corresponding image.
A cost balance must be struck over whether every capture file is opened and viewed, whether in detail or only superficially. This is usually considered prohibitively expensive, but for collections with many distinct items it may be appropriate.
A 15% random sample on all images created is recommended as a suitable proportion to pick up systemic defects or problems with operator technique. This would enable a focused check on certain file areas, should a problem become apparent.
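The 15% random sample can be drawn programmatically. The fixed-seed idea below is an added suggestion, to make the sample reproducible for audit, rather than part of the original guidance:

```python
import math
import random

def qa_sample(filenames, fraction=0.15, seed=None):
    """Draw a random sample (default 15%) of image files for QA checking;
    a fixed seed makes the same sample reproducible for audit."""
    rng = random.Random(seed)
    k = max(1, math.ceil(len(filenames) * fraction))
    return sorted(rng.sample(list(filenames), k))

files = [f"VA-{n:04d}.tif" for n in range(1, 201)]
sample = qa_sample(files, seed=42)
print(len(sample))   # → 30 files drawn from 200
```

If a defect is found, the sample can then be widened around the affected operator, session or file area, as suggested above.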
The equipment and software for quality assurance should be able to load, view and provide information on the image files quickly.
For QA checks on individual newly created digital images, access to the originals is appropriate. QA should include looking for the following:
A workflow diagram, adapted from the JIDI reports, and depicting the primary image capture sequence is available in Appendix 2.
The capture/digitisation phase of the digital image creation process results in the 'primary capture image' and its accompanying metadata. This is the key content-creation exercise in creating digital images for visual arts resources.
However, it is only a part of the overall resource creation project, coming after planning and rights management has been completed and alongside or prior to the establishment of metadata and delivery and archive systems. The remainder of this guide considers the other areas of resource creation.
The procedures outlined above, coupled with the information in the preceding sections, should act as a guide to help ensure that high-quality digital images suitable for a project's needs can be planned for and achieved, with requisite insights into the issues involved. Idiosyncrasies of some projects may, however, benefit from individual consultancy on their digital image requirements. (See Appendix 1 for a list of useful organisations.)