Wednesday, August 29, 2012

Indexing and cross-referencing

MyHeritage recently introduced their new SuperSearch technology, and hopefully they will at some point also include newspapers and other sources from Sweden (and the rest of the world).

But is that a task that is simply too big to handle?

Here in Sweden, genealogy is not yet as big a business as in the USA and other parts of the world.
But many Swedes are interested in doing genealogical research. Our church records are (mostly) quite good and free to use at the archives.
These days most people want to have the sources available in digital form. The most important sources have been scanned, and to access them you need to pay for the service. That's OK with me, although many people here argue that the digital sources should be free as well, claiming it is a constitutional right.

Indexing is an interesting thing. It takes the digital media one step further by enabling searching. And searching a database is of course a huge advantage compared to going through the books manually.

But still, OCR (optical character recognition) and similar software are not yet able to read old handwritten historical records and produce useful output. I am not even sure whether this is being worked on, but I hope it is.

Anyway, that means that indexing is still done manually by humans, which of course makes it time-consuming and expensive. And since not everyone who performs the indexing is an expert in reading and understanding old handwritten documents, the results can vary in quality and be unreliable in many ways.

There are a lot of indexing projects here in Sweden today, and I guess that is great. Some are very local and some cover the whole country.

For example, there are CDs available for the 1880, 1890, 1900, 1970, 1980 and 1990 censuses here in Sweden. These CDs are not reliable as a primary source, but they are great tools for locating someone you have lost track of.

There are also ongoing indexing projects for old Swedish newspapers. There they try to use OCR software, since printed media are generally more uniform and readable. That works rather well, but not perfectly. See an example from the Swedish National Library (KB).

So when we finally have many indexed sources, the most important thing to establish and develop is the ability to do cross-referencing searches. And that, I guess, is exactly what MyHeritage's SuperSearch is all about.
Being able to search for a person and have that search retrieve and present all relevant information from all indexed sources will probably be one of the most important future features in genealogy.
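
To make the idea a bit more concrete, here is a minimal sketch in Python of what such a cross-referencing search could look like. It is only an illustration and has nothing to do with how SuperSearch actually works. The record layout, the names `Record`, `normalize` and `cross_search`, the one-year tolerance on the birth year, and all the sample people are my own made-up assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One indexed entry (hypothetical layout, invented for this sketch)."""
    name: str
    birth_year: int
    detail: str


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.casefold().split())


def cross_search(sources, name, birth_year, tolerance=1):
    """Search every indexed source and return the hits grouped by source name.

    A record matches when the normalized name is identical and the birth year
    is within `tolerance` years (an assumption, since indexed years often differ).
    """
    wanted = normalize(name)
    hits = {}
    for source_name, records in sources.items():
        matches = [
            r for r in records
            if normalize(r.name) == wanted
            and abs(r.birth_year - birth_year) <= tolerance
        ]
        if matches:
            hits[source_name] = matches
    return hits


# Made-up sample data standing in for separately indexed sources.
sources = {
    "1880 census": [
        Record("Anders  Svensson", 1852, "farmhand"),
        Record("Karin Olsdotter", 1860, "maid"),
    ],
    "1890 census": [
        Record("anders svensson", 1853, "farmer"),
    ],
    "newspaper index": [
        Record("Anders Svensson", 1820, "obituary"),
    ],
}

result = cross_search(sources, "Anders Svensson", 1852)
for source_name, matches in result.items():
    for record in matches:
        print(source_name, "|", record.name, "|", record.birth_year, "|", record.detail)
```

With this sample data, one search finds the same person in both census indexes even though the spelling and the birth year differ slightly, while the newspaper entry for an older namesake is left out. A real service would of course need much smarter matching (spelling variants, patronymics, parishes), which is exactly where the hard work lies.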

Will this ever happen? I don't know, but it would be nice.
And if all these CDs could instead become online databases where you can make corrections and add information... Well, then we are talking...

The future will be exciting in many ways when it comes to genealogy. DNA testing is already here and will surely become a natural and integrated part of traditional research.