Modified June 1, 2008

Attribute Elicitation Exercise

"The most distinctive mark of a cultured mind is the ability to take another's point of view; to put one's self in another's place and see life and its problems from a point of view different from one's own. To be willing to test a new idea; to be able to live on the edge of difference in all matters intellectually; to examine without heat the burning question of the day; to have imaginative sympathy, openness and flexibility of mind, steadiness and poise of feeling, cool calmness of judgment, is to have culture." -- A.H.R. Fairchild

Abstract

In this entirely online discussion exercise (there is no formal document to turn in, and nothing will be graded beyond participation) we will practice our online collaboration skills and explore our assumptions and expectations as we each elicit two sets (Part 1 and Part 2) of values for one set of 12 graphic images. In Part 1 we will elicit simple values based on groups of three cards: the first two cards will share a value, and the third card will be the logical opposite of that value. After we have elicited and discussed our first set of values, we will move on to Part 2 of the exercise, in which we will combine and condense these values to devise a set of overarching attributes that will serve the collection as a whole. We will then use these elicited attributes to discuss potential fields for a hypothetical datastructure for the card collection. This is a thinking and sharing brainstorming assignment. There is no "right" or "wrong" answer.

 Value: 25 points for sharing your ideas and actively participating in the Blackboard discussion forum.

 

"Your assumptions are your windows on the world. Scrub them off every once and a while, or the light won't come in." --Alan Alda

 

PURPOSES:

  1. To develop insights into the ways that human beings discriminate and aggregate by assigning values, attributes, and fields to objects.

  2. To consider the relationship of data structures to this discrimination/aggregation process.

  3. To brainstorm a hypothetical data structure to provide access to objects through fields, attributes, and values.

This activity (eliciting values, attributes, and fields, and creating a hypothetical data structure) is not cataloging, nor is it indexing. It is one of the first steps in rudimentary database design and information retrieval system design. It is the first creative step. Cataloging comes later and involves taking each postcard and, using the data structure created, generating the actual record for each card. Indexing is the act of assigning values for the fields devised.

Definitions:

Datastructure - The architecture of a database.

Database - A collection of records and (perhaps) the software required to use them.

Field - A standardized placeholder or "empty place" in each record into which data specific to that record is entered. (Fields are what are searched.)

Attribute - The type or the nature of data that is allowed in a field. For instance: text only, numbers only, date in this format xx/xx/xx, a list of allowed words, whether words are capitalized, whether a field can be searched, etc. A field's attributes are the "rules" for that field.

Value - the actual data entered into a field for each record, for instance, for a field named "color" we might have the attribute "text only, must be from a validation list, required" and the value "red." Each record's fields contain data specific to a particular information object. When values are the same, records can be aggregated during search; when values are different, records can be differentiated during search.

Record - The unique collection of data, arranged in standardized fields, for each individual information object in a collection of objects. Records "represent" actual information objects, such as books, and are surrogates for those objects.
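
The definitions above can be sketched in a few lines of Python. This is only an illustration, not part of the exercise: the names FieldSpec, DATASTRUCTURE, and make_record are hypothetical, and the "caption" field and its sample text are invented for the example. The "color" field and its rule (text only, from a validation list, required) follow the definition of Value above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """A field: a named placeholder plus the attributes (rules) for it."""
    name: str
    allowed_values: tuple = ()   # empty tuple means free text (no validation list)
    required: bool = False

    def accepts(self, value):
        """Return True if the value obeys this field's attributes."""
        if value is None:
            return not self.required
        if not isinstance(value, str):   # the "text only" rule
            return False
        return not self.allowed_values or value in self.allowed_values


# The datastructure: the set of fields that every record shares.
DATASTRUCTURE = (
    FieldSpec("color", allowed_values=("red", "not red"), required=True),
    FieldSpec("caption"),  # hypothetical free-text, optional field
)


def make_record(**values):
    """Build one record (field name -> value), enforcing each field's rules."""
    record = {}
    for spec in DATASTRUCTURE:
        value = values.get(spec.name)
        if not spec.accepts(value):
            raise ValueError(f"{value!r} is not allowed in field {spec.name!r}")
        record[spec.name] = value
    return record


# The database: a collection of records, each a surrogate for one card.
database = [
    make_record(color="red", caption="barn at sunset"),
    make_record(color="not red"),
]
```

Calling make_record(color="blue") raises an error in this sketch, because "blue" is not on the validation list for the "color" field.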

PART I:

1.  Elicitation of values

You’ll be thinking individually about the same 12 cards.  Our task is to identify (elicit) values for these information-bearing objects, with as few preconceptions or assumptions as possible.  To provide a systematic way of considering these values, and to encourage the elicitation of a wide range of values, we will consider each group of three postcards separately.

Instructions:
Consider the first group of three cards (cards #1, #2, and #3). 
What value can you identify that is held in common by the first two cards but NOT by the 3rd?  (NOT = logical opposite)

For instance, for the following three cards, I might elicit the value "red" for the first two cards, and "not red" for the third.
Note: "blue" is NOT the logical opposite of "red." The logical opposite of "red" is "NOT red."

     


List that value in the column provided.  Next, list its logical opposite.  (Example:  If two of your postcards have a brick structure and the third one has a wood structure, “brick” might be the value you identify; and the logical opposite in this case is “not brick” -- the logical opposite of that value is NOT “wood.” )
To restate this in another way: your value list would show brick in the value column and not brick in the logical opposite column. The idea of a logical opposite is important because indexing terms (the approved list of values allowed for any field) must be comprehensive. For instance, if you were to have the values "brick" and "wood" for a field called "Construction Material," there would be no accurate value for a structure that is built of stone, so your set of values could not accommodate all the possibilities...it would not be comprehensive. If your values are "brick" and "not brick," you have a term available to use that accurately describes all possible materials.
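
A small Python sketch may make the point about comprehensiveness concrete. The function names assign_from_list and assign_value are hypothetical, chosen only for this illustration; the materials are the ones from the example above.

```python
def assign_from_list(observed, allowed):
    """Index against a fixed list; returns None when no listed value fits."""
    return observed if observed in allowed else None


def assign_value(observed, value):
    """Index against a value and its logical opposite; something always fits."""
    return value if observed == value else f"not {value}"


materials = ["brick", "wood", "stone"]

# "brick" / "wood" leaves the stone structure with no accurate value.
print([assign_from_list(m, ("brick", "wood")) for m in materials])
# ['brick', 'wood', None]

# "brick" / "not brick" covers every possible material.
print([assign_value(m, "brick") for m in materials])
# ['brick', 'not brick', 'not brick']
```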

Work your way through the four sets of cards.

Can you see any pattern to the kinds of values you elicited? 

What were your assumptions or biases?

Share your work in the "Attribute Elicitation Exercise" forum in our Blackboard; read and comment on others' posts.

PART 2:

Now, let's take it up a notch!

In Part 2 of the exercise, AFTER we have elicited and discussed potential values for the sets of cards, we will elicit a higher order scheme that will work for our image collection as a whole, and then we will turn that scheme into a simple datastructure by naming fields and, if we have time, ascribing attributes to those fields.

Look at the following graphic, which illustrates the basic components of a simple datastructure. First, there is a database (in green), which is simply a collection of records, and sometimes includes the software or hardware that makes it function. Notice in the database that there are three records. Each record in our database represents a specific and individual "information object"...in this case, people. Each person has their own record, but all the records share the same set of fields, and each field shares the same attributes. Notice, however, that each record contains distinct data in each field...data that represents the individual.

Notice, too, that each field has a set of rules or attributes concerning what is allowed in that field (in our example, the field "last name" has the attribute "text only," meaning that only text can be entered in the field; no numbers or graphics are allowed). When we actually create a record for a specific person, we enter data about that person into the fields, following each field's rules or attributes. The actual data we enter in a field is the "value" for that field on that record. In our example below, the field "last name" (which has the attribute, and thus allows, text only) would have the value "Modar" (someone's last name) on the first record. Each record would contain its own "value" in each field, representing the data about that person.

Let's say we have twelve people's records in our database who all have the last name "Smith." If we search the "last name" field for the value "Smith" all twelve records would be retrieved. Let's suppose we also searched on the zip code field for the value 90210...only the records for those people with the last name "Smith" who also lived in the zip code "90210" would be retrieved!
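
The search just described can be sketched in Python as a filter over records. This is a minimal illustration with made-up data (three records rather than twelve), and the search function is a hypothetical helper, not a feature of any particular database product.

```python
# Hypothetical sample records; every record shares the same fields.
people = [
    {"last_name": "Smith", "zip_code": "90210"},
    {"last_name": "Smith", "zip_code": "10001"},
    {"last_name": "Modar", "zip_code": "90210"},
]


def search(records, **criteria):
    """Return the records whose fields match every criterion (a logical AND)."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in criteria.items())
    ]


# Searching one field aggregates every record that shares the value.
smiths = search(people, last_name="Smith")

# Adding a second field discriminates among those records.
smiths_in_90210 = search(people, last_name="Smith", zip_code="90210")
```

Here smiths holds two records and smiths_in_90210 holds one: the same value aggregates records, and a second field narrows the set.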

At first, the most obvious datastructure for our 12 record collection of graphic images might seem to be a set of simple "yes/no" fields for the values we generated in Part 1. For instance, we might have a field named "food" and allow the values yes or no. We might also have a field named "animal" and allow the values yes or no. But it quickly becomes obvious that a list of yes/no fields becomes unwieldy and uninformative...particularly if we think that we might want or need to grow our database to include hundreds, thousands, even millions of records! What's the point in having a "food" field for a collection of graphic images, most of which aren't about food?

So, we must start abstracting our values and combining them into something a bit more comprehensive or inclusive, something that helps us better aggregate and discriminate among our records.

For instance, we might have elicited the value "flowers" and the logical opposite, "not flowers" for one set of cards, and we might have the value "trees" and the logical opposite "not trees" for another. For a higher order organization, it might make sense to combine these values into a single field called “Vegetation,” and then to allow the terms bush, grass, flower, tree as values for that field.  But our list of values must be comprehensive!...what about vines? dead vegetation? plastic vegetation? Other fields might continue to work well as simple binary (yes/no) fields – for instance, we might want a field for “human” with yes/no values.  It depends.

Attributes

As we combine and abstract the values we elicited in Part 1, our goal is to elicit fields for a proposed datastructure for the collection of 12 cards, so we need to begin to exert some control, or impose a finer structure over what is allowed in our fields. Enter field attributes. Among other things, we must consider which fields we might want to assign content validation (i.e., a validation or "drop down" list, or a list of values that are allowed in any given field.) This is also known as a controlled vocabulary.

Some fields we might want to leave without validation, and allow free-form text. The more we control what is allowed as a value for a field, the more specific and the more relevant our retrieval results can become, but at the same time, the more rigidly we must search the fields.

Keep in mind that the values allowed in any field must be mutually exclusive. In other words, an item can be one and only one of the allowed values (unless we decide to allow multiple values in a field, or we decide to make that field a "free text" field that will accept whatever value anyone might want to include). In the case of a "free text" field, one person might enter the value "big" for an object, while another person might see that same object as "small" and enter that, so free text is usually not a good choice!

In this exercise we want to control what values a field can contain, thus allowing more specific discrimination and aggregation, and improved search results. So, an item in our vegetation field cannot be a bush AND be a flower, and if we have “building” on our original value list and also “church,” we may need to make some changes by combining them. In other words, an item cannot be a "building" AND a "church," so we’ll need to come up with a list of values that are both comprehensive and mutually exclusive.

Also consider entry validation as an attribute for a field. "Entry" simply means what is entered into a field. When an entry is "validated," it means it comes from an approved list of words, and no other words are allowed or accepted.

For which fields should indexers be required to enter a value? Are there fields for which the indexer could be allowed to just skip the field if it doesn't apply? Perhaps it might be better to have “not applicable” or NA as one of the possible values and to specify that a value should always be required. Are there any fields for which we would want to require unique entries? Which fields should be single entry only, and which fields might be more useful if we could repeat them, as repeatable fields?
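
One way to think about these questions is to write the rules down and check a record against them. The Python sketch below is only an illustration under assumed names: the Rules class, the FIELDS table, the field names "card_id" and "subject," and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rules:
    """Attributes (rules) for one field; all names here are illustrative."""
    required: bool = False     # must the indexer enter a value?
    repeatable: bool = False   # may the field hold more than one entry?
    unique: bool = False       # must the entry differ from every other record's?
    allowed: tuple = ()        # validation list; empty means free text


FIELDS = {
    "card_id": Rules(required=True, unique=True),
    "vegetation": Rules(
        required=True,
        allowed=("bush", "grass", "flower", "tree", "other", "NA"),
    ),
    "subject": Rules(repeatable=True),
}


def problems(record, existing=()):
    """Return a list of rule violations for one record (empty means valid)."""
    found = []
    for name, rules in FIELDS.items():
        entries = record.get(name, [])
        if not isinstance(entries, list):
            entries = [entries]
        if rules.required and not entries:
            found.append(f"{name}: a value is required")
        if len(entries) > 1 and not rules.repeatable:
            found.append(f"{name}: only one entry is allowed")
        if rules.allowed:
            found += [
                f"{name}: {e!r} is not on the validation list"
                for e in entries if e not in rules.allowed
            ]
        if rules.unique and any(
            e == other.get(name) for e in entries for other in existing
        ):
            found.append(f"{name}: the entry must be unique")
    return found


good = {"card_id": "c1", "vegetation": "NA", "subject": ["church", "snow"]}
bad = {"card_id": "c1", "vegetation": ["bush", "flower"]}
```

In this sketch problems(good) returns an empty list, while problems(bad, existing=[good]) reports two violations: the duplicated card_id and the second entry in the single-entry vegetation field. Requiring a value while offering "NA" on the list is one way to keep indexers from silently skipping a field.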

These decisions are part of the process called "delimiting" or setting the rules or limits for a datastructure.

All these decisions must be considered before we begin building a datastructure.

 Questions for discussion: 

You created your list of values for Part 1 under a constraint:  the attribute had to be present in two postcards and not present in the third.  Did the values you elicited for one set of cards actually “fit” the other sets of cards in the collection as well? – in other words, did they make sense for ALL the post cards?

When might this "attribute elicitation" technique be useful in generating potential fields for a database?  (useful as a brain-storming technique, to break out of  habits of expectation, etc.)

What are the disadvantages of this technique?   (one big one is that we do not have a defined purpose or user group and there are usually multiple, heterogeneous user groups in any single info environment.)

What about literary warrant (what the information object itself warrants) vs. user warrant (what the user, rather than the information object, warrants)? Do they sometimes conflict? Did you experience this?

What kind of search engine could be used with the simple datastructure or list of fields that we came up with?  (Simple, single field look-up?  Multiple access point (simultaneous searching on multiple fields?)  Weighted indexing?)

Comprehensiveness – think about every possibility; we might want to use ‘catch-all’ terms like other, none, not applicable, unknown, even blank...these can be valid and even meaningful terms in some situations, but can cause ambiguity. For example: a postcard database might have the field "layout" with the allowed terms portrait or landscape, but there are other possibilities. What about the field "Time of day," which might have the allowed terms day and night? But what about indoor shots, still life, art reproduction? And for geographic location, how would we make allowances for pictures with no geographical or temporal element?

In a datastructure for books and other documents, also known as a "bibliographic datastructure," we create records that represent those books or documents. We might have a field called "type of book" and the allowed terms might be fiction or non-fiction, but we could also have mixed, unknown, hybrids, poetry, sacred texts, etc. There are lots of possibilities...but remember, the values allowed in a field must be mutually exclusive. For example, a field "Border" might be a single entry field with the allowed entries Wide, Narrow, Colored, White, Plain, Design, Labeled...but what about a border that is a narrow, white design? Or what about an object that doesn't have a border?

In other words, you can’t say "redwood" and "evergreen" but we can have "redwood, other evergreen".
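
A quick check for this kind of overlap can be sketched in Python. The BROADER table below is a hypothetical hand-made list of "is a kind of" links, built from the examples in this exercise; a real collection would need its own.

```python
# Hypothetical "is a kind of" links: narrower term -> broader term.
BROADER = {"church": "building", "redwood": "evergreen"}


def overlapping_pairs(vocabulary):
    """Return (narrower, broader) pairs that both appear in the vocabulary."""
    terms = set(vocabulary)
    return sorted(
        (narrow, broad)
        for narrow, broad in BROADER.items()
        if narrow in terms and broad in terms
    )


print(overlapping_pairs(["redwood", "evergreen"]))
# [('redwood', 'evergreen')]  -> not mutually exclusive

print(overlapping_pairs(["redwood", "other evergreen"]))
# []  -> no overlap among the allowed values
```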

This method of eliciting attributes is a form of brainstorming – a preliminary step done without making judgments about what is appropriate. It helps us get all the options onto the table for consideration. It is much easier to think of things ahead of time than to have to reconstruct a datastructure later, when we discover flaws.

When we begin actually constructing the data structure, then we must make judgments – based on who our users are. The practice of using triads in a systematic way forced us to find less obvious attributes. Is this the only way? – certainly not! The best way? – not necessarily! In a real life situation, we could take what we already know about our collection, add to that what we can learn by talking to our actual users, and construct a data structure to accommodate both. One problem with this method is that users may not be able to envision all the ways our database might be useful to them, nor see possibilities they are not already familiar with. Another problem is that we may not see possibilities for other groups of users, or be able to foresee other ways our database could be used.

Can groupings be incorporated into a data structure? This is usually done through the interface design: how the information about records is presented, or how the query forms are designed. Metadata schemes separate attributes of the original work from those of the reproduction of the work. (The process of combining certain binary attributes into a more refined data structure – or not combining them – is not aggregation and discrimination. We aggregate/discriminate among documents in the retrieval process, not among attributes.)

Knowing the users 

All of this sort of careful consideration is useless until we know who our users are. The broader and more diverse our user base, the harder it is to design the database. We may have to balance the goal of trying to attract the greatest number of users against trying to do a great job of meeting the needs of one particular group. Also, we must consider things from the indexer’s point of view: how easily can they determine the correct values? How much time will it take? How much domain knowledge is required?

Fields, attributes, and values are important. Even more important is having clear definitions of what the fields, attributes, and values represent. This may require documentation – guidelines or rules. Does ‘house’ include a teepee? Does ‘man’ include a picture of a sculpture of a man? The terms we use can’t always do the job of defining themselves. (A "good" value may not simply be ‘house, including teepees, yurts, huts, mansions, but not hotels or caves.’) Who decides? We do – based on the needs of our users.

We don’t have to restrict ourselves to the values present in our sample card set.  We can include values that our common sense tells us are likely to occur in other documents that MIGHT be eventually added to the collection.

We may have rejected certain attributes because too many (or too few) postcards had them. Don’t worry about this too much. If 95% of our collection depicts California, it still may be OK to use that as a value. "California" can be "AND-ed" with other attributes to narrow it down, or the user might want to search ‘not California’ (only 5% retrieved). Conversely, if only one or two postcards share a value (scalloped border), that attribute may still be important to some users. So don’t start counting up how many postcards share the attribute; instead, ask if the attribute is something that some user might find significant and want to be able to search on.
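
The "AND" and "not" searches mentioned above behave like set operations, as this Python sketch suggests. The 20 cards here are invented for the illustration (19 California cards, so 95%), and ids_where is a hypothetical helper.

```python
# Hypothetical collection: 19 of 20 cards (95%) depict California,
# and only two cards have a scalloped border.
cards = [
    {
        "id": i,
        "state": "California" if i < 19 else "not California",
        "border": "scalloped" if i in (3, 19) else "plain",
    }
    for i in range(20)
]


def ids_where(field, value):
    """Return the set of card ids whose field holds the given value."""
    return {card["id"] for card in cards if card[field] == value}


everything = {card["id"] for card in cards}
california = ids_where("state", "California")
scalloped = ids_where("border", "scalloped")

both = california & scalloped             # California AND scalloped border
not_california = everything - california  # NOT California
```

Even though "California" alone retrieves 19 of the 20 cards, AND-ing it with the rare "scalloped" value retrieves a single card, and the "not California" search retrieves the remaining 5% of the collection.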

Not all attributes will have predetermined values.  Having predetermined values (a validation list, a drop down list, a controlled vocabulary) is good when the number of possibilities is limited and we want to control what values are assigned.  But some attributes (title, date) are best left up to the cataloger/indexer.

Let's talk about these and other considerations in our "Attribute Elicitation Exercise" forum in Blackboard.