Friday, June 01, 2007

Source-Centricity

Ex Nihilo?
Years ago I worked on a project using ethnographic research methodologies to study the life cycle of the creative process of knowledge workers. One of the primary takeaways from this research was that there is no ex nihilo creation (ex nihilo is Latin for "out of nothing"). I'm not sure why we got hung up on the Latin phraseology. The point is that every 'new' creation or discovery is always built from something; it never comes out of nothing.

Genealogy is the same way. There is always some piece of evidence or source material that leads us to draw a conclusion about an ancestor. Some of you may wish to dispute the no-ex-nihilo principle on the grounds that you've seen many conclusions that appear to have come out of nothing, but technically even those came from somewhere. For many reasons, keeping track of this evidence is a core requirement for successful genealogical research.

Warning: I'm about to make a critical comment. Please don't take offense.

It would seem that the genealogy industry, or rather those who provide tools to the genealogical community, have fallen short on this requirement. I have yet to see a system that accurately maps the relationship between genealogical evidence and the conclusions based on that evidence, and does so in a way that assures integrity (you should not be able to add, edit, or, heaven forbid, share conclusions without accurately associating the evidence). Nor have I seen a system for tracking evidence that is intuitive and usable enough not to require staunch discipline. There is, of course, a reason for this: it is complex and hard. I don't believe it is impossible, though. In fact, it seems like a medium-hard problem.
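To make the integrity requirement concrete, here is a minimal sketch, assuming a hypothetical data model in which a conclusion simply cannot exist without at least one piece of evidence attached. The class and method names (`Evidence`, `Conclusion`, `revise`) are illustrative only and do not come from any existing genealogy product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """A piece of source material, e.g. a census page or a baptism entry (hypothetical model)."""
    citation: str


@dataclass
class Conclusion:
    """A statement about an ancestor that must always be tied to evidence."""
    statement: str
    evidence: list[Evidence] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Refuse to create (and therefore to share) a conclusion with no supporting evidence.
        if not self.evidence:
            raise ValueError(f"Conclusion {self.statement!r} has no supporting evidence")

    def revise(self, new_statement: str, evidence: list[Evidence]) -> "Conclusion":
        # Edits follow the same rule: the revised conclusion must cite evidence too.
        return Conclusion(new_statement, list(evidence))


census = Evidence("1850 U.S. census, household of John Smith")
born = Conclusion("John Smith was born about 1820", [census])
print(born.statement, "<-", born.evidence[0].citation)

try:
    Conclusion("John Smith married Mary Jones")  # no evidence supplied
except ValueError as err:
    print("Rejected:", err)
```

Enforcing the rule at construction time means no add, edit, or export path can bypass it; the harder, unsolved part is making that rule feel natural rather than burdensome to the user.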

While I can't offer statistics on this, I can say that the overwhelming majority of lineage-linked pedigrees submitted to the Church do not have source citations. I doubt this is a surprise to you. Why is this the case? Sure, people pass around GEDCOMs, download pedigrees, etc., but that is just a proliferation of the problem, not its root. I believe the root of the problem is that the tools we use to organize our family history do not offer a usable tool set for tracking evidence. Their user interfaces and data models do not accurately represent the real relationship between evidence and conclusions. They have totally missed the boat on what the user needs to accomplish here. They have built square pegs for round-holed problems, and users have been forcing the pegs through the holes ever since. The process is so hard that it deters all but the most diligent. I am acquainted with many skilled genealogists who have given up on it and simply file their evidence and notes in their filing cabinets.

I want to throw down the gauntlet. The manufacturers of these tools need to step up to the plate and solve this problem. We hope to take our first stab at this as part of our continued efforts on the Life Browser prototype on the FamilySearch Labs site. I'm sure we'll experience many failures before we get it right.

I'd love to hear your feedback on this issue. Have I missed some great tool out there? Does someone have a solution already? Have I finally gone completely off the deep end? I know these aren't mutually exclusive questions, but please, share your thoughts.

15 comments:

Anonymous said...

I agree with much of your article. While I am impressed with the progress in making original records available online, most of the genealogy-specific tools are ancient and cumbersome technologies. Where are the innovators?

My most useful research tool is Diigo, a social bookmarking system for researchers - all researchers, not just genealogists. Using Flickr, I can post current family photos along with my archive of old photos and documents. AND, as a Floridian currently in the middle of a tropical storm, it's a cheap and easy way to safely archive my collection - old and new. Add to that my family blog using WordPress. I'm presenting these documents/photos and the stories that go with them in small, manageable chunks - not waiting until I can do a complete "formal" family history.

Each of these systems is easy to use and either free or reasonably priced. The one thing that really ties them together is tagging. Adding a few keywords to each article or photo makes it searchable for everyone.

These tools may not be the answer for connecting structured sources locked in genealogy databases to our research, but they are significant in releasing the huge collection of personal records hiding in closets and trunks all over the world. That's a source that's largely been ignored.

Logan said...

I too have been surprised that nothing has really come out in this space. However, I am not surprised that it hasn't come from existing vendors. It doesn't make financial sense for an existing vendor to expend the effort on a product that could flop terribly (remember, this is a market where out-of-date PAF is still far more widely used even though there is a nice selection of affordable, superior products). I expect something new like this to come out of a new vendor trying to create a new market, or out of academia or the Church, where the need for financial reward is not as high.

I know of several people who want to build tools like this but don't have the resources. And apparently this aspect of the genealogical tools market isn't seen as viable enough for a startup to take a leap at it (face it, there is a lot of lower-hanging fruit for a startup).

I'm glad that the Church is starting to take interest in the tools market again, because I think they are uniquely positioned to do things like this. They have the resources (if they choose to use them) and they have a vested interest in the growth of this space.

But I think the gauntlet has been thrown to the wrong group. Only existing vendors who are losing their market are likely to take a risk like this, and they may be too tainted by the existing paradigm to really innovate successfully here. I don't see why there's any real responsibility on existing vendors to step up to the plate and solve this problem, only an opportunity for a new market.

In fact, I don't see the need for any gauntlet at all. The Church has finally picked up their gauntlet and is pushing forward. Instead let's get a conversation started with the startup market, LDSOSS, existing vendors, and academia (especially BYU) and tackle this together. There are a lot of smart people who care about this, but there just doesn't seem to be enough momentum in any one place to get something moving.

Anonymous said...

Dan,

I was much impressed with your presentation at the National Genealogical Society in Richmond - with the way you stepped in at the last moment as a substitute, the way you disregarded the lack of connectivity in the lower depths of the conference hotel, with the quality of questions and suggestions you received from the audience, and the aplomb with which you answered them.

You are clearly headed in the right direction by seeking to provide a mechanism so that the ordinary family historian - even the novice - can provide documentation.

As a genealogy librarian, I frequently hear negative comments about people who simply download GEDCOM files that have no notes and no sources listed. Your point is well taken. Whether listed or not - there is some reason that the original person put that person in the chart or used that particular estimate of birth date.

Possibly an interface could include a source field to be filled in either with a source citation or by checking one of a number of boxes - say, computed from age at death, estimated by guessing the age of a relative in person, estimated from a photograph, etc. That would relieve the common person of their perceived need to come up with a book, author, and page number, or a census date, state, district, and page number, when that is not at all how they derived the information.

Anonymous said...

Hurray! He's blogging again!

>>> I'm not sure why we got hung up on the Latin phraseology. <<<

Used a lot by other religions to talk about the creation. Seems appropriate in this context.

I had a strange thought (maybe because of the hour). Currently in all of our genealogy programs, we enter data on people - our conclusions. It's a throwback to the old paper days when we filled out family group sheets.

BUT... what about a program where we enter our sources and what they contain, and let the program fill in the group sheets? Instead of being people-centric, it would be source-centric. Wouldn't that greatly simplify source citation? It would also allow us to enter "scraps" of information and let the program help put them in context, perhaps at a later date when more info is found. It would also encourage people to do research to add more sources. A GEDCOM download would be just one source, and a grading of sources could show that the GEDCOM is unsubstantiated.
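A minimal sketch of this source-centric idea, assuming a hypothetical model in which each source records only the facts it contains and a per-person summary is derived from those sources rather than typed in directly. People are matched by exact name here, a deliberate simplification; a real family group sheet would also need relationship facts. All names and sample values are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    """One piece of information as it appears in a source (hypothetical model)."""
    person: str  # name as written in the source; no identity matching is attempted
    kind: str    # e.g. "birth", "marriage", "death"
    value: str   # e.g. a date or place, copied as found


@dataclass
class Source:
    citation: str
    facts: list[Fact] = field(default_factory=list)
    grade: str = "unrated"  # e.g. "original record" or "unsubstantiated"


def build_summaries(sources: list[Source]) -> dict[str, dict[str, list[tuple[str, str, str]]]]:
    """Derive per-person summaries from sources instead of entering conclusions.
    Every value keeps the citation and grade of the source it came from."""
    summaries: defaultdict = defaultdict(lambda: defaultdict(list))
    for source in sources:
        for fact in source.facts:
            summaries[fact.person][fact.kind].append((fact.value, source.citation, source.grade))
    # Convert the nested defaultdicts to plain dicts for callers.
    return {person: dict(kinds) for person, kinds in summaries.items()}


sources = [
    Source("Parish baptism register, 1820", [Fact("John Smith", "birth", "1820")], grade="original record"),
    Source("Downloaded GEDCOM file", [Fact("John Smith", "birth", "1821")], grade="unsubstantiated"),
]
summaries = build_summaries(sources)
for value, citation, grade in summaries["John Smith"]["birth"]:
    print(f"{value} ({citation}, {grade})")
```

Because the summary is computed, a scrap of "stray data" can be entered as a source today and will show up in a person's summary as soon as it is linked to that person.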

I know it would help me when I do my research. I keep finding that sources are like peering at your genealogy through a toilet paper core. It's only showing you one fragment. Without a big chart of all my known ancestors, it can be difficult to know what this small piece of the puzzle means. But with a source-centric program, I can just enter it. If it can't find its own place in the puzzle, then later I can manually move it into position. Or leave it as "stray data" that I might be able to tie in later.

We can take this a step further and link sources together. Either individually or collaboratively. As we link sources together, we will build our genealogy - the right way.

bjs said...

I haven't actually experimented with it because I don't have a Mac, but the software that is tempting me to get one is Tinderbox from Eastgate whose creator is Marc Bernstein.

It uses hyperlinks to connect things that are related in ways that may not be hierarchical. And it seems as if it can be used in many different capacities. As a fiction writer, I'm as interested in it for novel structure and making sure loose ends are tied up as I am intrigued by its possibility as a way of keeping track of collaterals and neighborhood clusters.

I haven't seen any hyperlink technology that addresses genealogical concerns specifically (which may mean I've simply missed it), but it would seem to me a very useful way of allowing researchers to show "underlying connections" and perhaps indirect as well as direct evidence for statements they make regarding family makeup.

Anonymous said...

This is an important topic.
A genealogy needs to be built up from the facts. The GENTECH Genealogy Data Model addressed this need in 1999, but to my knowledge no genealogy program has been written that works on that basis. One problem is found in "The Innovator's Dilemma": the current companies are successful at giving customers the current conclusion-based programs but are not set up to give customers something else.
Another problem is that I don't think there is an understanding of what a source based program should look like. We all know what a pedigree and family group record should look like. We know what the source records can look like when entered. We don't know what the analysis screens that take us from one to the other should look like.
Current programs are good at presenting conclusions (i.e. pedigrees). The extraction programs used by the LDS church seem to be good for recording source information (along with images). What is needed are programs in the middle. Tying sources together, recording conclusions (positive and negative) and building pedigrees from those conclusions.

eric said...

The Gramps project has goals that seem in line with this discussion. (see: http://www.gramps-project.org/wiki/index.php?title=Portal:Using_GRAMPS)

I've used Gramps a little, and like it so far.

Anonymous said...

>>> We don't know what the analysis screens that take us from one to the other should look like. <<<

Actually, I think PAF Insight and nFS give us some ideas. Both are based on matching people to additional data. I'd think a source-centric program would be similar: start matching everything to everything else. (Although that leaves a problem that grows geometrically.)

What I see:
1) Enter source image, text, or at minimum a citation.
2) Extract source data into a flexible group sheet/pedigree.
3) Match individuals between sources (this would also be recorded as a source - who and when)
4) Computer crunches the results.

Step 3 could be done as I suggested matching individuals, or it could be done right after entering the source in a GUI interface, drawing a link from the source-group sheet fragment to another source or the completed group sheet.

In step 2, I say "flexible group sheet" because some sources may be narrative. We need to enter "grandpa", "uncle", "cousin", etc. without having to define whether they are related on the mom's or dad's side. I'm proposing the group sheet as a way of standardizing the back-end data for crunching. Nothing prevents custom forms for specialized things like birth certificates, censuses, etc.
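Step 4 ("computer crunches the results") is left open above. One plausible approach, which is an assumption and not something stated in the comment, is to treat each recorded match from step 3 as a link between two source fragments and merge linked fragments into individuals with a union-find structure. Only recorded matches are processed, so nothing has to be compared against everything else. The fragment id scheme (`"baptism-1820#child"`) and the `Match` fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Match:
    """Step 3: a claim that two source fragments describe the same person.
    Like a source, the match records who made it and when."""
    a: str  # fragment id, e.g. "baptism-1820#child" (hypothetical id scheme)
    b: str
    made_by: str
    made_on: date


def group_individuals(fragments: list[str], matches: list[Match]) -> list[set[str]]:
    """Step 4: merge matched fragments into individuals using union-find."""
    parent = {f: f for f in fragments}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for m in matches:
        root_a, root_b = find(m.a), find(m.b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: dict[str, set[str]] = {}
    for f in fragments:
        groups.setdefault(find(f), set()).add(f)
    return list(groups.values())


fragments = ["baptism-1820#child", "census-1850#head", "will-1870#testator", "census-1850#wife"]
matches = [
    Match("baptism-1820#child", "census-1850#head", "Jane Doe", date(2007, 6, 2)),
    Match("census-1850#head", "will-1870#testator", "Jane Doe", date(2007, 6, 3)),
]
groups = group_individuals(fragments, matches)
for g in groups:
    print(sorted(g))
```

An unmatched fragment, like the wife above, simply stays in a group of its own until someone links it, which fits the "stray data" idea from the earlier comment.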

Anonymous said...

First of all, I like the tagging approach being used on social network sites like Geni and Facebook. When you upload a photo, you're asked to tag the photo with the name of each person in your tree / network who appears in the photo. The software then links that photo to the profile page of each person tagged in the picture. Could we do something like that with vital records documents? Marriage certificates could be "tagged" with the names of the husband and wife. Birth certificates could be tagged with the names of the child and the parents.

I think genealogy software should be oriented to drive the user to identify which sources they have not obtained rather than simply which events are missing.

As we're getting more repositories (both personal and commercial) online, I think we need to find a way to link to images of the original document using some identifier like a DOI.

Marian Johnson said...

I once prepared a spreadsheet showing the content of all the columns of the censuses 1850-1930. I listed the census years on the top and the column contents along the side, and showed which column the information was located in for each census year. Then it occurred to me that I could actually enter data instead of column numbers.

I think a genealogy program should offer something like that for inputting all data on an individual. One could choose which source and a form would pop up with spaces to enter the data contained in the source. The information would be correlated by the computer with information from other sources in a small spreadsheet. By looking across each row one could see each version of the name, date, place, etc. contained in each source. At the top would be listed the various sources used. One could specify the country the person lived in so that the appropriate census forms would be accessible. The genealogist could then look across the row, select which version of the name, date, place etc. he wanted to be the default to be entered into his Family Groups and Pedigrees. When each source type is selected, the user is then given the opportunity to enter the specifics of his source - page, film number, county office, etc.

For example, across the top of the spreadsheet would be birth certificate, marriage certificate, death certificate, obituary, each census year the person was alive, will, etc. Each column would show the contents of that document that applied to that individual. One could immediately see discrepancies between the documents. Notes could be added to explain discrepancies.
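As a rough sketch of the spreadsheet idea, assuming each source's data for one person has already been entered as field/value pairs (the field names and sample values below are invented for illustration), the grid and the discrepancy check might look like this. Values are compared as exact strings, so "Mary A. Smith" and "Mary Ann Smith" count as a discrepancy; deciding which version becomes the default is left to the genealogist, as the comment suggests.

```python
def correlate(records: dict[str, dict[str, str]]) -> tuple[list[str], dict[str, list[str]]]:
    """Arrange one person's data as a grid: one column per source, one row per field."""
    sources = list(records)
    fields = sorted({name for data in records.values() for name in data})
    rows = {name: [records[src].get(name, "") for src in sources] for name in fields}
    return sources, rows


def discrepancies(rows: dict[str, list[str]]) -> list[str]:
    """Fields where the sources that mention them disagree (blank cells are ignored)."""
    return [name for name, values in rows.items() if len({v for v in values if v}) > 1]


records = {
    "Birth certificate": {"name": "Mary Ann Smith", "birth date": "3 May 1881"},
    "1900 census": {"name": "Mary A. Smith", "birth date": "May 1880"},
    "Obituary": {"name": "Mary Ann Smith", "death date": "12 Jan 1950"},
}
sources, rows = correlate(records)

# Print the grid: sources across the top, one row per field.
print(f"{'':<12}" + "".join(f"{src:<20}" for src in sources))
for name, values in rows.items():
    print(f"{name:<12}" + "".join(f"{v:<20}" for v in values))

print("Discrepancies:", discrepancies(rows))  # → Discrepancies: ['birth date', 'name']
```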

Dan Lawyer said...

Sorry for not replying to these comments individually. I have read them and considered them multiple times. I just haven't had the time to compose responses to all of them. I'll try to do better moving forward. Thank you so much for your thoughts. Many valuable insights.

Anonymous said...

Like Blake Christensen, your comments about Source-centricity reminded me of the GENTECH model from the National Genealogical Society.

http://www.ngsgenealogy.org/ngsgentech/projects/Gdm/Gdm.cfm

They started with "documents". I prefer your use of the word "artifact". To most beginners, documents are only made of paper, so they really don't know what to do with an oral history interview, for instance, or a photo album. Regardless, the term "artifact" is a little more vague, and might get people thinking a little more creatively about sources. I like it (not that I feel like I have to approve).

On the other hand, "conclusions", doesn't quite feel right. While GENTECH also uses that word, in addition they use the word "Assertions", which I think more accurately describes what we do when we read a document.

I have several source-centric databases that are modeled in different manners in The Master Genealogist (TMG). While not specifically designed to be source-centric, many users have invented ways to use it in that manner. It should be noted that the author of TMG, Bob Velke, is one of the authors of the GENTECH model.

I have advocated the source-centric approach for over a decade, long before I discovered GENTECH. In my experience, the biggest obstacle to doing research in this manner is that most people are too focused on their own lineage to be interested in sources per se, which, sadly, can be one of their biggest challenges in extending their lineage. Sources are merely a means to an end, and thus are considered a distraction.

Interestingly, some people have attended my various "Data Mining" presentations for years, and have FINALLY caught the vision of what I'm trying to get them interested in.

This is another facet of the question you posed regarding the lack of interest in entering sources. Some of it has to do with the LDS approach of "taking (a laundry list of) names to the Temple", as opposed to "turning the hearts of the children to the fathers". Anyway, I've got to go.

If you want to discuss this further, let me know. I have many more comments, materials, etc. about the primitive genealogical lifeform known as the "hunter/gatherer".

Jorge said...

This was a very interesting post. I research a lot of Brazilian/Portuguese ancestry, and names/surnames change quite frequently. For example: Antonio Siqueira married to Barbara de Jesus = Antonio de Siqueira married to Barbara Jesus = Antonio de Siqueira married to Barbara Davila. Very often one needs to check the grandparents and even other family relationships (such as those evidenced by the choice of godparents in Catholic records) to make sure two individuals or two couples are the same.

Thus, I want to document in which record I find each name; especially should I ever come to the conclusion that some two individuals are really not the same, I will then want to separate the individuals and their children.

So, what I started doing recently (I have been researching family history for seven years now) is to record a complete summary of the names of parents and grandparents, along with godparents, for each new record. In PAF, I am including all this information in the "Comments" area of the source citation. (Film/Volume/Page number for microfilm number, then volume and page of original record; I use "date record made" as "date record found" so I can follow my own logic someday in the future).

I am not sure this makes sense or whether this is what you had in mind with the post, but these are the thoughts the post prompted to me.

Adam Short said...

I'm in the process of building a source-centric genealogy app at the moment. Don't want to give too much away, but based on what you and others have said here, it should meet your needs. Mind if I let you know when it's done?

Dan Lawyer said...

Adam,
I'd love to hear more about it when you're ready.