Wednesday, April 05, 2006

Making Genealogy Accessible

The predominant approach of those trying to help novices engage in genealogy is to attempt to train them to do research. There are some that are extremely skilled in this area and have made a substantial contribution to the ranks of genealogy hobbyists and credentialed genealogists. There will always be a need for those with the rare talent of helping someone along the path toward sound genealogical research.

Last summer issued a press release indicating that according to a recent poll, 73% of Americans are interested in discovering their family history. Most of us have seen similar studies before. In the context of taking genealogy to common people, the question could be asked, “How do we get 73% of Americans to participate meaningfully in family history?” It is probably not feasible to get them all through a genealogy class or paired with a great mentor. It also seems that a relatively small percentage of those interested in their ancestors are able to do genealogical research. This is not a derogatory comment about average intellect but a recognition that there are many factors beyond cognitive skills which prevent people from doing genealogical research.

So if the intention is to take genealogy to common people, how can it be done? The philosophy behind this blog is not to attempt to turn 73% of Americans into researchers. Rather, it is to encourage innovation that makes the experience of genealogy substantially easier and more engaging than it is today. Make it so that individuals don’t have to become researchers in order to have success finding their family history. Simplify genealogy for researchers or those that want to become researchers as much as possible.

This will require technology to do much of the heavy lifting for the uninitiated. There are many examples that could be drawn upon to illustrate this type of shift. Some that we are all familiar with include:
  • the shift from paying a telegraph operator to send a message for you to picking up the phone and calling someone
  • the shift from professional typists to ubiquitous e-mail
  • the shift from professional photography to point and shoot to digital cameras

One of the most interesting things about these shifts from a domain of experts toward pervasive use is the increasing rate at which they happen. Some recent shifts that are happening quickly but not yet complete:
  • blogging
  • video editing
  • podcasting

We are not attempting to teach 73% of Americans to be genealogists. We are attempting to make genealogy accessible to 73% of Americans.


Enochville said...

I agree that technology needs to be involved in order to make genealogy more accessible. The LDS Church is digitizing their entire granite vault collection which contains microfilm and fiche of original vital and church records throughout the world.

This is advantageous for many reasons. One, no one has to wait a couple of months for the film to make it to a local family history center, or go to the FHC during their hours, or pay for the film to use it. Two, once the info is in a digital form, it will be easier to index it so that search engines will be able to locate one's ancestor in a particular file.

Ben Crowder said...

It seems like the first thing to do is identify all (or at least the main) obstacles getting in the way of making genealogy easy and engaging. Why do people stop doing genealogy? What irritates them? What makes them avoid genealogy and find other things to do instead? Sure, there's no way to make genealogy be 100% easy, because some things in life are hard, but I agree that there's a great deal of improvement and innovation to be made.

Case in point: I have a hand-coded website and a blog. To make changes to the website, I have to go in and edit the PHP source by hand, then copy it over to my web server via FTP. Not huge, but when compared to working on my blog, which runs on WordPress, it's a pain. And as a result I end up working on my blog far more often than on my website. Psychologically, WordPress feels smooth, whereas editing a website by hand feels like gravel inside my sock.

And genealogy can be smooth and easy.

Mark Butler said...

A very serious problem in making family history accessible to more people is the practical inability of any desktop family history software to enable multiple family members to work on the same project.

The current state of GEDCOM-based match/merge is primitive in the extreme. Are there any software packages that merge based on transitive relationship closure? Or display the differences between two files? Or group modifications into independently mergeable change sets? There isn't even something resembling a 'patch' format - i.e. an appliable representation of a set of changes to a base file.
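To make the "display the differences between two files" question concrete, here is a minimal sketch in Python. It assumes both files are revisions of the same logical file, so that xref identifiers such as @I1@ refer to the same record in each. The function names parse_records and diff_records are hypothetical and not taken from any existing genealogy package.

```python
def parse_records(gedcom_text):
    """Group GEDCOM lines into top-level records keyed by xref id (e.g. '@I1@')."""
    records = {}
    current = None
    for raw in gedcom_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(" ", 2)
        if parts[0] == "0":
            # A level-0 line that starts a record looks like: 0 @I1@ INDI
            if len(parts) >= 2 and parts[1].startswith("@"):
                current = parts[1]
                records[current] = [line]
            else:
                current = None  # HEAD, TRLR and similar lines are ignored here
        elif current is not None:
            records[current].append(line)
    return records


def diff_records(base_text, other_text):
    """Report which xref ids were added, removed, or changed between two files.

    Assumption: the same xref id names the same record in both files.
    """
    base = parse_records(base_text)
    other = parse_records(other_text)
    return {
        "added": sorted(k for k in other if k not in base),
        "removed": sorted(k for k in base if k not in other),
        "changed": sorted(k for k in base if k in other and base[k] != other[k]),
    }


BASE = "\n".join([
    "0 HEAD",
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "0 @I2@ INDI",
    "1 NAME Mary /Jones/",
    "0 TRLR",
])
OTHER = "\n".join([
    "0 HEAD",
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1850",
    "0 @I3@ INDI",
    "1 NAME Ann /Smith/",
    "0 TRLR",
])

if __name__ == "__main__":
    print(diff_records(BASE, OTHER))
```

A record-level report like this only works while the xref ids stay stable. Once two files have been renumbered independently, the ids no longer line up and a matching step is needed first.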

Right now, family history software is generally about as collaborative as software development tools were about forty years ago. It is generally still "personal" software, not "family" software in any real sense.

Even something as uncontrolled as Wikipedia would be an improvement, but the level of editorial control supported by distributed revision control systems is a much better model.

Dan Lawyer said...

I couldn't agree more. I think Dallan Quass was talking about these same issues when he suggested synchronization as an 11th item for the top 10 list. Currently the primary mechanism (as you point out) for transferring data between users is GEDCOM. Could a mechanism be created to facilitate this type of control using GEDCOM or by making small extensions to GEDCOM? Would it require a new file format, database structure? What would it take to solve the collaboration issues described by Mark at a data level?

Ben Crowder said...

For what it's worth, I'm starting work on a CVS/revision control system for genealogy, called Beyond. It was originally going to be an only-for-Mac record manager, but something like CVS for genealogy is really needed and would make life much, much easier for collaborating genealogists, so I've changed the focus of the project. Revision control is the right direction for collaboration -- it's worked for years for software developers, and I can't see any reason why it wouldn't work for genealogy.

No, I don't think GEDCOM is a good base format for something like this. I envision a central server (with the data stored in an SQL database, probably), with various clients synchronizing via diffs sent in XML. Something like that. But definitely not GEDCOM. :)

Dan Lawyer said...


I agree that revision control is critical for collaboration. I would expect multi-user revision control in an asynchronous environment to be challenging from a user perspective. The trick will be to make it seem natural and intuitive to the user. A good friend of mine talks about this as a 'user tax'. User tax: the amount of pain a user must endure to derive the value of a feature.

Anyone know of any good implementations of a multi-user revision control for an asynchronous environment that ordinary people can use?

Ben Crowder said...

Revision control does indeed need to be seamless and transparent to the user. Perhaps the software should sync automatically with the server/repository, since the user would be able to roll back any undesired changes. (Hmm, this provides unlimited undo, basically. A nice bonus.)

The question is, do users want to have to click Upload and Download buttons? The only reason I can see is that you may not want to commit your changes until you've finished making them, but with rollback that doesn't seem as important. Something behind-the-scenes would be better.

The trick then is dealing with conflicts. Maybe the best way to deal with it is in the pedigree (and other views) -- flag individuals with conflicts (or even individuals who've been modified by someone other than yourself). A dialog box would get in the way and be annoying, so the flagging should be in the background, just part of the UI. That way the user can fix them when he/she wants to.
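A rough sketch of how the background flagging could be computed, assuming the client keeps three snapshots of each individual: the last synced copy (base), the local copy (mine), and the server copy (theirs). The function name classify_changes and the flag names are hypothetical, and each record is reduced to a plain string for brevity.

```python
def classify_changes(base, mine, theirs):
    """Return a flag for each individual whose local and server copies differ.

    Each argument maps an individual's id to a comparable record value.
    Individuals that are identical on both sides get no flag.
    """
    flags = {}
    for key in set(base) | set(mine) | set(theirs):
        b, m, t = base.get(key), mine.get(key), theirs.get(key)
        if m == t:
            continue  # both sides agree, nothing to show in the pedigree
        if m == b:
            flags[key] = "modified-by-other"  # only the server copy changed
        elif t == b:
            flags[key] = "modified-by-me"     # only the local copy changed
        else:
            flags[key] = "conflict"           # both changed, in different ways
    return flags


if __name__ == "__main__":
    base = {"I1": "John Smith, b. 1850", "I2": "Mary Jones", "I3": "Ann Smith"}
    mine = {"I1": "John Smith, b. 1851", "I2": "Mary Jones", "I3": "Ann Smyth"}
    theirs = {"I1": "John Smith, b. 1849", "I2": "Mary Jones, d. 1900", "I3": "Ann Smith"}
    print(classify_changes(base, mine, theirs))
```

The pedigree view could then draw a small marker on any individual present in the returned dictionary, leaving the user free to resolve conflicts later.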

Dallan Quass said...

I think all of these are good comments, but I think we're missing the point. In the survey, 73% said they were "interested" in their family history, but only something like 29% of those had ever sat down to draw out a pedigree. I once heard a very wise man say that "genealogy needs to be like a *game*." I think about this every day. If we want to get more people doing genealogy, we need to make it more fun, and improving how we create pedigrees isn't enough. Pedigrees are like tax forms. Accountants love tax forms, but how many other people do? Genealogists love pedigrees, but what about everyone else? We need to think about the kinds of things that most people find fun: looking at photos, reading stories, discovering and nurturing personal relationships, being creative and sharing what they know.

We need to be thinking much further outside the "record manager box." For example, what if there were a website where people could upload pictures and stories about their ancestors, and those pictures and stories were tagged with regard to time and location and experience. Now suppose I know that my great-grandmother was born in Norway in the late 1800's and that she traveled to Minnesota when she was a young girl, but I don't know anything about what living back then or taking that journey would have been like. But let me search on the uploaded stories/photos to find individuals that were born in a similar time+location, or had similar experiences crossing the plains as a young girl, and I can learn more about what my great-grandmother's life might have been like through the lives of others. That experience might make me want to see if I can learn more about my own great-grandmother. It might even make me want to fill out a tax form (I mean pedigree chart).

Mark Butler said...

Although potentially intrusive and requiring some users to learn special skills, distributed revision control does make genealogy more accessible. Often, one person in a family will build up a reference file, and other members will get a copy of it from time to time. However, there is no way for them to monitor what has changed or what actual research has been done. Short of re-printing thousand page books, there is not a lot they can do with the updated file.

Just being able to see the change history would do a lot to improve the visibility of any research being undertaken, with numerous attendant benefits.

Mark Butler said...

One problem with using a derivative of GEDCOM or something similar as a patch format is providing context for sub-record changes. Basically, sub-objects like events do not have unique identifiers, so identifying them for updates and deletes is problematic. One way to partially mitigate the problem is to carry the unchanged lines from the base record as context and substitute the ambiguous section completely if necessary.
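The idea of carrying unchanged lines as context can be sketched as follows. This is only an illustration in Python under simple assumptions: a record is a list of GEDCOM lines, and a change is a pair of blocks (the old lines, including their context, and the new lines that replace them). The function name apply_hunk is hypothetical. The change is refused when the old block does not identify exactly one place in the record.

```python
def apply_hunk(record_lines, old_block, new_block):
    """Replace the single occurrence of old_block in record_lines with new_block.

    Raises ValueError when the context is missing or ambiguous, because
    sub-objects such as events carry no identifier of their own.
    """
    if not old_block:
        raise ValueError("a hunk needs at least one line of context")
    n = len(old_block)
    matches = [
        i for i in range(len(record_lines) - n + 1)
        if record_lines[i:i + n] == old_block
    ]
    if len(matches) != 1:
        raise ValueError(f"context matched {len(matches)} times; cannot apply")
    i = matches[0]
    return record_lines[:i] + new_block + record_lines[i + n:]


RECORD = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 RESI",
    "2 DATE 1850",
    "2 PLAC Ohio",
    "1 RESI",
    "2 DATE 1860",
    "2 PLAC Iowa",
]

if __name__ == "__main__":
    # The DATE line is unchanged and serves as context that picks out
    # the second RESI event rather than the first.
    old = ["1 RESI", "2 DATE 1860", "2 PLAC Iowa"]
    new = ["1 RESI", "2 DATE 1860", "2 PLAC Des Moines, Iowa"]
    for line in apply_hunk(RECORD, old, new):
        print(line)
```

With only "1 RESI" as the old block, the same call would fail, since two residence events match. In that case the ambiguous section would have to be carried and substituted whole.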

It is also important that the xref identifiers not be duplicated or re-used, at least not for a reasonable amount of time, or the record identification problem becomes intractable.

Context-diff style differential GEDCOM would be adequate for distributed edits to the same file. Making the format base-file independent would require considerably more work along the same lines as the work required to make importing an arbitrary GEDCOM file a clean and trouble free process - in particular interactive inference-driven (i.e. relation transitive) alias identification prior to or during the merge process, instead of uncoordinated one-at-a-time hunting and elimination after the fact.
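One possible reading of "relation transitive" alias identification is that pairwise matches between records in different files should be closed under transitivity before any merging starts. The Python sketch below does only that grouping step with a union-find structure. It does not infer new matches from family relationships, which a real tool would also need. The function name alias_groups and the "file:xref" labels are hypothetical.

```python
def alias_groups(pairs):
    """Group record labels into alias sets, given pairwise 'same person' matches.

    If A matches B and B matches C, then A, B and C end up in one group,
    even though A and C were never compared directly.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for label in list(parent):
        groups.setdefault(find(label), set()).add(label)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    matches = [
        ("mom:I1", "sis:I7"),    # same person in Mom's and a sister's file
        ("sis:I7", "aunt:I42"),  # the sister's record also matches the aunt's
        ("mom:I2", "sis:I9"),
    ]
    for group in alias_groups(matches):
        print(group)
```

Resolving all aliases up front like this is what would let a patch written against one file be retargeted at another, instead of hunting duplicates one at a time after an import.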

John Vilburn said...

It is true that there isn't a "revision control for genealogists". But, to answer one of Mark Butler's questions in the affirmative, there is a software package, PAF Insight, that displays differences between two PAF files. It also lets you selectively synchronize the changes that were made. There are also other efforts at collaborative environments. I believe that Ancestral Quest has something like this.

Those are steps in the right direction, but only baby steps. Making genealogy engaging for the common person requires a broader vision. A big part of that is using technology to make genealogy easier. But the more important question is "What makes genealogy engaging to someone who has no experience?"

Those who have gotten "the bug" generally talk about genealogy like it is a fun puzzle they are working on. This group of people is excited when they hear about more records becoming instantly available online. They like tools that help them find the right records. They want to collaborate with other genealogists.

We have to remember, though, that this is not the group that this blog is focused on. If we improve the tools that dedicated genealogists use, we are helping a few of the 73% by lowering the entry barriers. But what can be done to make family history fun/exciting/engaging for the vast majority of those who aren't really "into" it yet?

Is the answer to just make everything easier? Possibly. Could the answer lie in other aspects of family history such as photos, stories, and historical context? Maybe. Is there something we're missing? Probably.

Maybe we should each spend some time talking to some people who are interested but not involved. Ask them why they are interested and what might make them more interested.

Maybe part of the answer lies not in lowering barriers but rather in raising interest.

Dallan Quass said...

Well said John!

I wonder if people in the early days of automobiles or airplanes asked themselves a similar question - how to get more people involved in the hobby. And now here we are: almost everyone's driven in a car or been in an airplane. However, only a few know how to rebuild an engine or fly a plane, which was essential knowledge in the early days. I wonder what doing family history will be like 10-20 years from now.

Mark Butler said...

I am not suggesting revision control technology is "the answer" - just that from my experience the lack of it is the number one obstacle to collaboration within extended families. I consult for a private genealogical research firm that runs into variants of this problem on a daily basis.

They often generate gedcom files so that clients can have a copy of their own genealogy. Quite often clients have two separate files, one for the husband/father's side and one for the mother/wife's side. We use custom software to present a unified view for printing and gedcom generation.

So within a family, we might typically distribute three different gedcom files, H/F side, M/W side, and combined. Then of course the family members import the files into their favorite genealogy program and start making changes - perhaps just entering information regarding their own children.

As you might imagine - all these files rapidly get out of sync, not only with each other, but with the original copy back at the research firm, where very often active research is being conducted and entered in.

With or without our help, the family members have no reasonable way of exchanging this information except to write it down and have it manually re-entered everywhere.

In this case there are only three logically distinct files being worked on. In practice, whenever a new family is formed, they want to merge the genealogy from the husband's and wife's sides together into a new file, so a family with N married children deals with N+1 logically independent files - different record numbers and everything, such that merge capability that works with revisions of the same logical file no longer applies.

So now one person wants to get the relevant work that has been entered into, say, one of their sibling's files. The first obvious problem is how they find it. A simple list of changes is not enough, because their sibling's file contains lots of information about lines they are not interested in, but whose status is not obvious.

Now suppose a great aunt does some research - this process would have to be manually repeated by every interested party - perhaps twenty or thirty times. In practice it just doesn't happen: non-incidental collaboration between members of the same extended family is essentially nonexistent, and the files in the possession of each member rapidly grow stale even with respect to research / data entry of their own near relatives.

Now if this can all be made transparent to the user, fine, but it is untenable to say it shouldn't be done at all because it is too complex - anything serious would improve the status quo, even if it has a considerable learning curve, because even careful use by a few would have a considerable impact on the quality and extent of the data available to other extended family members.

Mark Butler said...

In terms of the more "gee whiz" bells and whistles that might interest more people, I would suggest that automated, network based distribution of not just base genealogy data, but pictures, documents, map overlays, etc. is a potent "force multiplier".

Who wants to do a lot of work that they can't share with others? The ability to share effectively is a strong motivation to actually produce something rather than just learn about it.

Logan Allred said...

First, let me say that I think synchronization and collaboration will really enable a whole new level of community and excitement around genealogy. It's just way too hard right now to share little chunks of data with each other in any kind of reliable or standardized way.

As an example, my mother-in-law is an avid genealogist who just borders on computer literate. She's gotten pretty good at entering data into Legacy, but it took me forever on the phone to talk her through exporting a small subset of her family into a GEDCOM to send to me via email to merge in with my data. This really needs to be easier. I realize there are many complexities hidden below the surface, but these need to be worked out.

Second, we need to find some quick and easy tasks that can be good starters for what might be called the 20-minute group. I see this often amongst my siblings, who get excited and want to do something, but have no experience or training and only little chunks of time. They need tasks that can be done in about 15-30 minutes without too much experience (or scaled up experience, where they do more difficult tasks over time), yet are meaningful and interesting. I think some of these tasks exist, but I haven't had the time to organize or prepare things for them when they suddenly call me up with the bug. Instead, the moment is past and they feel overwhelmed or inadequate. I'm reminded of something I saw last year, I think on Amazon, where for a few pennies each, anyone could log in and look at photographs to determine their suitability for one of their search tools. Things like transcription, photo tagging, and certain searching tasks should be very accessible to many people. They will give them a sense of accomplishment in a short time and start getting names and stories floating through their heads; they just need some infrastructure.

Third, forgive all the programmer talk, but here is a thought that occurred to me recently and that I've yet to try out. I was doing some pair programming (in this case teaming up a junior and a senior programmer to code together), where each one takes turns typing while the other is thinking, and both discuss as they go. It's a great way for the junior programmer to get good hands-on experience (as long as the senior programmer can stand it ;) ), and there are tools for doing this remotely, such as SubEthaEdit and Netbeans/Java Studio Enterprise in the programming world. Genealogy could use some kind of shared workspace that we can work in together remotely when they have a few minutes (NetMeeting/VNC/Remote Desktop might work for now). They can transcribe some information we found while I perform a more advanced search or prepare the source citation. They gain confidence, and then one night you'll get that excited call: they remembered the website you searched, went on by themselves, and found something interesting. You say great, fire up the shared workspace, check it out with them, and help them add it properly to the records with sources (and ideally an entry in your research log, etc.).