[PRCo] Re: Generic Description

Dwight Long dwightlong at verizon.net
Sun May 22 11:44:06 EDT 2011


Jim/Derrick (Not sure who wrote this--appears to be Jim but came from Derrick)

All that is well and good for a large business organization or perhaps even a large, well funded and staffed government or private museum.  But do you really think that any significant percentage of the enthusiast community is going to go to all that trouble?  I sincerely doubt it.  If CDs and the like have only an anticipated life span of a decade--even two decades--there are going to be a lot of wails of anguish down the road when those prized, never to be repeated shots have disappeared or become so corrupted as to become unusable.  It will make the complaints about color shifts in early Ektachrome and fading of Anscochrome seem like faint praise, by comparison.

Nor is the solution likely to be to print out each image and store it.  Unless one is willing and able to purchase and use archival inks and paper, the printed images won't last either.  At least not in the average enthusiast's storage conditions.  And probably not in the typical railway enthusiast museum's storage conditions, either.

We are accustomed to, and spoiled by, the relative permanence of silver halide prints.  We now live in an age where that technology is rapidly disappearing in common use.  If the projections you make, and I have seen similar ones elsewhere, are even half accurate, our progeny is likely to be confronted with a situation in, say, fifty years where there are ample photographs available (with or without captions) of twentieth century tramways and railways, but very little of the same for the early twenty-first century!  Having gone digital in 2000, I am no exception.

But unless we are more successful in regenerating the stock of tramway enthusiasts than we have been, few will care anyway by then!

Dwight
  ----- Original Message ----- 
  From: Derrick Brashear 
  To: pittsburgh-railways at dementia.org 
  Sent: Sunday, 22 May, 2011 11:19
  Subject: [PRCo] Re: Generic Description


  The MIME multipart apparently confused Ecartis.  Here's what he said
  before it got encapsulated and sealed away.


  ---

  The longevity of digital records is a valid concern.  Home-burned CDs
  and DVDs are only good for a decade or two, though newer media are
  better (http://www.thexlab.com/faqs/opticalmedialongevity.html).  Hard
  disks fail; computers die.

  The best way to handle digital information is to make it redundant.
  Additionally, the best way to make sure it will be readable is to use
  formats that are open and will still have programs that can read them X
  years from now.

  I'll try to find concrete requirements from other organizations, but in
  my experience the usual practice is:

     1. Multiple RAID-6*  or NAS systems in different buildings that sync
        often (hourly or nightly)
     2. Have backups* at buildings different from ones with the systems
     3. Have copies of backups at different buildings
     4. Test backups regularly
     5. Keep graduated backups
     6. Storage space required to back up data is ~3-4 times the actual data

  * RAID-6: given X identically sized disks (each of size y), provides
  (X-2)*y of storage space and tolerates up to 2 disks failing; beyond
  that, information can no longer be recovered with 100% accuracy.  Often
  a "hot spare" is also used, which stores no data, but if a disk fails
  the system automatically rebuilds the RAID array using the spare disk.
  RAID is not a backup.  RAID helps recover from hardware failure.

  * Backups: Can be another running system, optical media, or tapes stored
  at a different location from the main system.
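As a rough illustration of the RAID-6 footnote above, the capacity rule (X-2)*y and the two-disk failure limit can be written out in a few lines of Python.  This is only a sketch: the function names and the six-disk, 2 TB example are assumptions made for illustration, not anything specified in the thread.

```python
def raid6_usable_capacity(disk_count: int, disk_size_tb: float) -> float:
    """Usable space of a RAID-6 array: two disks' worth is spent on parity."""
    if disk_count < 4:
        raise ValueError("RAID-6 needs at least 4 disks")
    return (disk_count - 2) * disk_size_tb


def raid6_survives(failed_disks: int) -> bool:
    """RAID-6 can still rebuild after at most two simultaneous disk failures."""
    return 0 <= failed_disks <= 2


if __name__ == "__main__":
    # Hypothetical array: six identical 2 TB disks.
    print("usable TB:", raid6_usable_capacity(6, 2.0))
    print("survives 2 failures:", raid6_survives(2))
    print("survives 3 failures:", raid6_survives(3))
```

Note that a hot spare does not add to the usable figure; it only shortens the time the array spends degraded.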

  I'm estimating that with 200,000 images (slides, negatives, prints,
  glass slides) (correct me if I'm too far out of the ball park), each
  scan will be on the order of 30MB (Kodachrome slide at 2000 dpi, 4 bytes
  per channel, and some fudge).  We would also have some compressed
  versions and thumbnails, so let's say another 6MB.  We'll probably have
  about a megabyte of metadata (praise anyone who's typing more than a
  million characters as a caption per slide?)  So, we're at about 37MB
  per image, call it 40MB (give or take).  That gives us a grand total of
  about 7 to 8TB of data to store.
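The arithmetic behind that estimate can be checked with a short Python sketch.  The per-image figures (30MB scan, 6MB derivatives, 1MB metadata) and the 200,000 image count come from the paragraph above; the decimal conversion (1 TB = 1,000,000 MB) and the function names are assumptions for illustration.

```python
IMAGE_COUNT = 200_000   # estimated number of slides, negatives, prints
SCAN_MB = 30            # full-resolution scan
DERIVATIVES_MB = 6      # compressed versions and thumbnails
METADATA_MB = 1         # generous allowance for captions and records


def archive_size_tb(image_count: int, per_image_mb: float) -> float:
    """Total archive size in TB, assuming 1 TB = 1,000,000 MB."""
    return image_count * per_image_mb / 1_000_000


def backup_space_tb(data_tb: float, factor: float) -> float:
    """Space needed once backups are included (the list above suggests 3-4x)."""
    return data_tb * factor


per_image_mb = SCAN_MB + DERIVATIVES_MB + METADATA_MB
data_tb = archive_size_tb(IMAGE_COUNT, per_image_mb)

if __name__ == "__main__":
    print("per image (MB):", per_image_mb)
    print("archive (TB):", data_tb)
    print("rounded up to 40MB per image (TB):", archive_size_tb(IMAGE_COUNT, 40))
    print("with 3x to 4x for backups (TB):",
          backup_space_tb(data_tb, 3), "to", backup_space_tb(data_tb, 4))
```

Changing IMAGE_COUNT or the per-image figures shows quickly how sensitive the total is to the scan resolution chosen.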

  A system capable of storing that much data would run (with commodity
  hardware) around one to two thousand dollars (see end of email).  Tape
  drives that store TBs of data can be expensive.  NAS (Network attached
  storage) are also very useful.

  It might be best to make DVDs incrementally as we add the initial
  data.  As metadata changes (I assume images won't change often) we can
  add those to DVDs incrementally as they are changed or at the end of the
  day.  The DVDs should be made in duplicate and tested yearly, though.
  I don't have enough information on the longevity of BluRay disks to
  recommend them at this point.

  The key points with digital information though are:

     1. Have duplicates stored separately and securely
     2. Have spare (hot) media on hand
     3. Test those duplicates regularly

  Whether it be DVDs or hard disks, those 3 points apply equally.
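One possible way to "test those duplicates" is to record a checksum for every file in the main archive and then confirm that each copy still matches.  The sketch below uses SHA-256 from Python's standard library; the function names and the directory layout are hypothetical, and this is one approach among several rather than an established procedure from the thread.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large scans never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_problems(manifest: dict[str, str], copy_root: Path) -> list[str]:
    """List files in the manifest that are missing or altered in a copy."""
    problems = []
    for rel, expected in manifest.items():
        candidate = copy_root / rel
        if not candidate.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(candidate) != expected:
            problems.append(f"corrupt: {rel}")
    return problems
```

The manifest itself would need to be stored with each duplicate (and ideally on paper or a third medium), since a check is only as trustworthy as the reference it compares against.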

  The graduated backup part above is important so that, if there is a
  problem found (data corruption, malware (though unlikely), or just
  "someone deleted that!?!?!"), we can go back in time and find the file
  at some point (hopefully in the not so distant past).  The amount of time
  we want to go back will also affect how much space we need.  Though, if
  the images don't change often/at all, storing all the changes in
  metadata and system configuration is next to trivial.
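To make the idea of graduated backups concrete, here is a minimal Python sketch of one possible retention rule: keep every snapshot from the last week, one per ISO week for the next four weeks, and one per month after that.  The tiers, the default numbers, and the function name are all assumptions chosen for illustration; the thread does not specify a schedule.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshots, today, daily_days=7, weekly_weeks=4):
    """Return the snapshot dates a graduated retention policy would keep."""
    keep = set()
    seen_weeks = set()
    seen_months = set()
    for day in sorted(snapshots, reverse=True):  # newest first
        age = (today - day).days
        if age < 0:
            continue  # ignore snapshots dated in the future
        if age < daily_days:
            keep.add(day)  # recent: keep every snapshot
        elif age < daily_days + 7 * weekly_weeks:
            week = tuple(day.isocalendar()[:2])  # (ISO year, ISO week)
            if week not in seen_weeks:
                seen_weeks.add(week)
                keep.add(day)  # newest snapshot of that week
        else:
            month = (day.year, day.month)
            if month not in seen_months:
                seen_months.add(month)
                keep.add(day)  # newest snapshot of that month
    return sorted(keep)


if __name__ == "__main__":
    today = date(2011, 5, 22)
    nightly = [today - timedelta(days=n) for n in range(100)]
    for kept in snapshots_to_keep(nightly, today):
        print(kept.isoformat())
```

How far back the monthly tier reaches is the main lever on storage: the longer the window, the more old versions are available, and the more space they take.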

  Having the backups stored at separate facilities is very important in
  the case of natural disaster and fire.

  I'm still looking for proper documentation from universities, museums,
  libraries, or government on proper procedures, but this is a good primer
  on it.  The basic idea is to accept that things fail, and instead of
  trying to prevent it (like you would with hard copies (concrete
  buildings, fire suppression, &c)) you simply live with it (through
  physical redundancy, though fire suppression &c helps too;) ).

  Just to stave off any possible misinterpretations, securing the physical
  media is a definite must.  I feel that having digital copies and records
  of that media will aid in securing it, reduce the number of times it
  actually needs to be accessed, and allow the records to be better
  curated and edited, because anyone (we let), anywhere, could look over a
  record and make it better.  If need be we could also keep hard copies of
  all of the (metadata) records, though that's a lot of paper.

  This is not limited just to images, either.  The books and other paper
  records at the museum can also be scanned, OCRed (optical character
  recognition, image to text basically), and made searchable and viewable
  without having to handle the paper medium.  Everything I just said
  applies equally well to scanned documents.

  Hope any or all of this is helpful,
  Jim

  On 5/22/11 12:51 AM, Dwight Long wrote:
  > Fred
  >
  > Just keep those glassine negatives away from humidity.  I had all my older
  > (50s-70s) in same but somewhere along the line Delaware humidity caused the
  > glue on the envelopes to seal, but worse, in many cases the envelopes to
  > adhere to the negatives.  Major restoration effort required to salvage them.
  >
  > My later ones are in archival sleeves.  Hopefully the material will perform
  > as advertised.
  >
  > Of somewhat greater concern is the longevity of electronically recorded
  > data.  Will images recorded on CDs, hard drives, DVDs, etc., still be around
  > in x number of years?
  >
  > Dwight
  >
  > ----- Original Message -----
  > From: "Fred Schneider" <fwschneider at comcast.net>
  > To: <pittsburgh-railways at dementia.org>
  > Sent: Friday, May 20, 2011 7:17 PM
  > Subject: [PRCo] Re: Generic Description
  >
  >
  >> Jim:
  >>
  >> I was trying to throw out the complications.
  >>
  >> Ed Lybarger, with his fabulous sense of humor,  explains how complicated
  >> it can really be by placing a number on the door of the library at PTM.
  >> The number on that door is the Dewey Decimal System number for railways.
  >> He is telling us that everything in that room is one number in the time
  >> honored library cataloging system and by inference that the standard
  >> system doesn't work at all when you have 10,000 square feet of floor space
  >> covered with stuff all meeting the same definition.   (Actually Dewey used
  >> 385 and 625 and we probably would not know how to split them.    The first
  >> was transportation; the second was technology.  Can you visualize the guys
  >> arguing over which is which?  He had no separate category for trolleys;
  >> just railroads, although the online reference I have only shows the first
  >> three digits ... railroads and highways all are in 625.  After the decimal
  >> we might split it into trolleys.)
  >>
  >> Now you have to find a new system and it needs to be a system that works
  >> not only for the aficionados who collected that crap but for the people
  >> who know absolutely nothing about it.   The hired educator for the museum
  >> who has to teach children about trolleys has to be able to find what she
  >> wants in the library without becoming discouraged.   The director needs to
  >> be able to use it to answer a newspaper's query.   The librarian needs to
  >> be able to find the pictures we have from Williamsport when someone wants
  >> to do a book.   Hopefully the library will also be a resource that
  >> contains more than just pictures and an occasional engineering drawing;
  >> wouldn't it be nice if it also contains financial and business records
  >> about the industry?
  >>
  >> I can tell you a lot of the problems but I personally cannot be there much
  >> of the time because I live four and a half hours away.  If I were five
  >> miles away, I would probably be there two days a week but I'm not there.
  >>
  >> The ideal way of archiving collections is to put everything in a standard
  >> data base.   Sometimes you simply don't do what you know you should
  >> because you have so little free time that you must attack those things
  >> that were not done in any way at all and ignore those that were done.
  >>
  >> For example ... my trolley negatives are there.   They are already in
  >> individual glassine envelopes with the negative and the envelope each
  >> bearing a file number.   Perhaps it is not the best storage medium but it
  >> is workable as long as they are all safety film (and all except a handful
  >> are).   All have been numbered from T-1 up to T-3000 something.   There is
  >> a loose leaf index that describes each one in numerical order.   There is
  >> also a file of photographic proof sheets in order by company, i.e. all of
  >> the Pittsburgh negatives were pulled out and proofed and then those 8x10
  >> proofs are in a Pittsburgh folder.   All the Washington negatives were
  >> proofed and in a Washington folder.  And so forth.   Now, even if that
  >> does not meet your standards, do you mess with it or do you simply leave
  >> Fred's filing alone?   Answer, until everything else is done, you probably
  >> wisely leave Fred's system alone because you don't have the money to redo
  >> it.    You spend precious resources on the
  >> negatives that are not identified and those that are not in acid free
  >> envelopes.   So what do you do with Fred's?   You probably put an FS in
  >> front of his number (or something else unique to help you find them) and
  >> then copy his file into a data base as simply as possible and scan them
  >> ... you make it a KISS project because there are too many other projects
  >> screaming for help.
  >>
  >> More important might be to take all of the thousands of negatives I
  >> brought over from the Goldsmith and Watts collections that are mostly on
  >> non-safety film (highly combustible) and refile them in open sided, acid
  >> free envelopes and then build a concrete vault away from the main building
  >> to house all the combustible negatives....  Can you see the need for
  >> millions of dollars?
  >>
  >> If you are not familiar, remember the words SAFETY FILM on the edge of
  >> films produced in the 1940s and 1950s?    As long as there were still some
  >> older combustible materials produced, the newer cellulose acetate
  >> materials were labeled SAFETY FILM.   When we moved from glass plates to
  >> flexible materials, the films were made of cellulose nitrate.   It
  >> will, if stored in stacks, spontaneously combust.  It needs to breathe.  If
  >> you get enough of that crap, it will blow the roof off a building.
  >> Theater movie projectors were designed with very sophisticated light
  >> baffles so that if the motor quit running, the light would also be shut
  >> off to prevent combustion of the film.   My father remembered a major
  >> theater fire in Cleveland in the late 1920s.   I've been told that an
  >> entire 800 foot, 20 minute reel of 35mm film could easily go up in smoke
  >> in seconds.   The Hippodrome in Lancaster was gutted in the early 1920s
  >> ... same reason.   Eventually it became law that projection
  >> booths in theaters had to be surrounded by concrete!
  >>
  >> By the late 1930s we were producing films on cellulose acetate ... but
  >> some photographers still bought the cheaper stuff.   I know my father
  >> still found some nitrate base 35mm film right after World War II ... he
  >> had that 35mm film rolled up and it basically turned to jello.
  >>
  >> That should give you a clue that a lot of the collections from older
  >> railfans are time bombs.
  >>
  >> Those images on sheet films made of cellulose nitrate are largely
  >> lost because the thicker the base, the more likely it was to decompose.
  >> I remember Harold Cox telling me that most of the Philadelphia Rapid
  >> Transit archive from the end of the glass era until the beginning of the
  >> safety film era had virtually vanished because it was professionally done
  >> on thick sheet film negatives and they simply decomposed to flammable
  >> dust!  (Thinner roll film negatives were more permanent.)
  >>
  >> So, Jim, do you worry about what Fred did with his collection?   I don't
  >> think so.  It is not done in a fashion which I believe suitable for future
  >> users.   I wrote it like a railfan.   The journal reads:  "Company, car
  >> number, direction, location, date and any other relevant items we might
  >> like."   If I were redoing it today for a new generation of users, I would
  >> probably put city, county, state, date right in the first position.    But
  >> there is a record that someone can work with in a few years when I'm gone.
  >>
  >> Fred
  >>
  >> (Only proof read once ... if you don't understand something, ask.)
  >>
  >>
  >>
  >> On May 20, 2011, at 4:29 PM, Jim Keener wrote:
  >>
  >>> Sorry for my naïveté.  I guess I'm trying to jump into a discussion I
  >>> haven't been involved in before and might not know pre-existing
  >>> protocols.  I've done databasing and cataloguing of things, but never
  >>> really archiving before.  I'm also not familiar with how other museums
  >>> arrange their archives.
  >>>> 1)   The title that includes company and car number is bad because you
  >>>> might have, in a museum such as ours, a hundred identical titles.
  >>>>
  >>>> 2)  That description: "West Penn.  FT 3.  Connellsville Shops."  is
  >>>> apparently what Frank put on the slide and it means nothing to the
  >>>> average person.   If you come to the museum from Pocatello, Idaho, what
  >>>> does Connellsville shops mean?   But a descriptor that reads "Company
  >>>> car repair facility in Connellsville, PA" might be understandable.   And
  >>>> what does that FT 3 indicate?   Be damned if I have a clue.
  >>>>
  >>> While not an ideal situation, it's at least something.  For instance,
  >>> "West Penn.  FT 3.  Connellsville Shops." doesn't really mean much to an
  >>> outsider.  However, someone can come along later and flesh it out.
  >>> Especially if these are all scanned in and in a database, it's
  >>> trivial to change the captions and keep track of the changes.  Even if
  >>> they are captions on paper, it can be changed later, but at least
  >>> something is there and initial time can be spent towards ones with
  >>> poorer captions (e.g.: company and car number with no location).
  >>>> A description should probably start with a file number or archive
  >>>> number.   Next we probably need to figure out who the user is and what
  >>>> he wants.   Does he want to find West Penn Railways?   Or does he want
  >>>> to find trolleys from Uniontown, PA?   Or might he be interested in
  >>>> trolleys from Fayette County, Pennsylvania?   Or Southwestern
  >>>> Pennsylvania?   All of these are possible descriptors that we might wish
  >>>> to use to help the user find something.   Remember guys, we're looking
  >>>> at this as rail fanatics.   The ultimate user might not be one of us.
  >>>> He might simply be a transport historian or a historian in general 50
  >>>> years from now.  Incorporating the car number into the descriptor might
  >>>> be a minor thing for the user we will be serving.   (I am a railway
  >>>> historian trying to think how someone else might want to use our files
  >>>> when we are not here.   I can look at the declining number of hobbyists
  >>>> in groups like the NRHS or the ERA and understand that we won't be
  >>>> here.)
  >>> Will the database be electronic, or do you want a lot of information on
  >>> the physical slide and in the record number?  If it's electronic, a record
  >>> ID on the slide might suffice.  Otherwise, the identifier on the slide
  >>> could contain encoded information: <map grid> <company> <car #> <year>
  >>> <record id>.  The map grid could be designed to flow so that someone
  >>> looking through the physical archives wouldn't have to skip around all
  >>> too much to view something geographically close. Lexically sorting by the
  >>> order suggested would have the records sorted in a pseudo-geographic
  >>> manner and then grouped by company and car.
  >>>
  >>>> Countless hours?   Again, nothing is impossible for those who are not
  >>>> doing it.     If you have 200,000 photos that need to be captioned and
  >>>> it takes an average of 15 minutes to do a caption, we are talking 24 man
  >>>> years.  Is that a safe number for the collection?   Might be.   My own
  >>>> collection is close to 50,000 prints and I am simply extrapolating from
  >>>> the number of file cases.
  >>>> I have not hauled the other file cases out to Washington yet.   I might
  >>>> add that PTM also has my albums already and that might include another
  >>>> 5,000 prints or six months worth of full time data entry.   Did I hear
  >>>> anyone volunteering?
  >>>>
  >>> I'd be near useless identifying places outside of the city, but I would
  >>> be able to scan and/or enter descriptions into a database.  Doubly so if
  >>> I could take a small deck of slides home each week and do them at nights
  >>> and mornings when I have small bits of time to spare, though I don't
  >>> have a slide scanner at home.
  >>>> Ray, a simple description is fine.   One that reads West Penn 700-type
  >>>> car on the Fairchance line believed to be near Hopwood about 1948 is OK
  >>>> until you refine it.   But it requires historians willing to write such
  >>>> words as "believed"  or "unknown" or "suspected" or "circa" or "about"
  >>>> when we do not know for certain.
  >>> Is it uncommon for people to mark their captions with uncertainty? Do
  >>> they just refuse to write them or write them with certainty?
  >>>> Perhaps trolley near Hopwood, Fayette County, Pennsylvania circa 1948
  >>>> might even be better for the future user with the railfan details buried
  >>>> farther down in the description.
  >>>>
  >>>> Regardless, what is written needs to be correct and there are thousands
  >>>> of pictures and slides which were never captioned.  The guys that
  >>>> volunteer simply look at Ed and say what's this.   Then he throws them
  >>>> in a pile and waits for Fred to appear.   There are still going to be a
  >>>> large number that I don't know.   We need more resources.
  >>>>
  >>>> When I edited Headlights magazine 40 years ago and someone gave me a
  >>>> picture that they couldn't identify, I used it to fill space.   It
  >>>> became a Can you identify this? feature.   But we had national
  >>>> circulation.   We usually found out.   Unfortunately doing the same in
  >>>> Trolley Fare probably won't get us the same following.
  >>>>
  >>> A friend of a friend did this: http://retrographer.org/  I don't know
  >>> how useful it would be in helping us though.  I'm not sure of their
  >>> traffic volume.
  >>>
  >>> Also, wouldn't it be OK to scan in slides and negatives as-is and
  >>> caption them with all the information on the slide (if any) and caption
  >>> them later?  It would be easier on the physical media to not have to be
  >>> handled as people try to figure out where it was taken and what is in
  >>> it.  It would also make it easier for the general public to browse.
  >>>
  >>> I could also imagine some computer vision (CV) or artificial
  >>> intelligence (AI) students at CMU or Pitt having fun (doing a school
  >>> project) trying to guess locations, which would then have to be approved
  >>> by a human.  It'd only be useful with a reference of some kind in part
  >>> of the picture, however, but there are good/decent archives of much of
  >>> what's in the city as well as how extensive Google Street View is around
  >>> the city which could help. Just a thought ::shrug::
  >>>
  >>> Jim
  >>>
  >>>
  >>>
  >>>
  >>>
  >>
  >>
  >


  On Sun, May 22, 2011 at 11:16 AM, Jim Keener <jimktrains at gmail.com> wrote:
  > -- Attached file removed by Ecartis and put at URL below --
  > -- Type: text/plain
  > -- Size: 19k (19473 bytes)
  > -- URL : http://lists.dementia.org/files/pittsburgh-railways/ecartjjX7Wl
  >
  >
  >
  > -- Attached file removed by Ecartis and put at URL below --
  > -- Type: application/pgp-signature
  > -- Desc: OpenPGP digital signature
  > -- Size: 906 bytes
  > -- URL : http://lists.dementia.org/files/pittsburgh-railways/03-signature.asc
  >
  >
  >
  >



  -- 
  Derrick





