San José State University

SCHOOL OF LIBRARY & INFORMATION SCIENCE

Data Structures in DB/TextWorks

First be sure you're clear on the difference between an information-bearing object, a record, and a data structure.

The data structure is the design plan for the fields which the records will contain. Fields have 3 major characteristics - type, indexing, and validation lists. You will need to make decisions about these three things in defining your Textbase structure.

This is the screen which shows your current structure and allows you to make changes (graphic in popup window).

Type

There are several field types available in DB/TextWorks. The three you will need for this class are

This decision is pretty straight-forward. If the field is going to contain text (words and letters), then it should be a text field! If it's going to contain numbers, it should be a number field. The automatic number feature is used primarily to assign a unique ID# to a record/object. If you select the automatic number option, the first record you enter will be assigned the number 1, and each subsequent record will receive the next number. If you delete a record, you can't go back and re-use its number; the numbers will continue to be assigned sequentially.

Indexing

This refers to how the field itself is indexed by DB/TextWorks - in other words, how the various values you will enter for the different records/objects will be stored and retrieved. The indexing of a field is the inverted file for that field. There are two possibilities - term indexing and word indexing.

Term indexing. This option indexes the contents of the field (the value) as a whole. For instance, if you have a field for color and it is term indexed, the value "blue green" will be stored as one single value. You will be able to search for the term "blue green" and retrieve the records with that value, but you will not be able to retrieve the records with that value if you search on the term "blue" alone or the term "green" alone.

Word indexing. This option indexes each word (a string of characters with a space before it and a space behind it) separately. That is, it stores each word as a separate entity. So, for instance, if you have a field for color and it is word indexed, the value blue green will be stored as three values - blue green, blue, and green. Because it is stored in this way, it can be retrieved in the same ways - if you search on blue, you will retrieve records with the value blue green, if you search on green, you'll retrieve the values for blue green, and if you search on blue green, you will retrieve the records with that value.

The indexing is (to repeat) what determines how the values entered in this field will be stored, and therefore how they may be retrieved. Using the example above, if you want the database search engine to retrieve only records with the value blue when you search on blue, then you should term index the field. If you want to be able to retrieve any record with the word blue in the color field, including blue green, Navy blue, and blue gray, then you should word index the field.

Another example: Many journals have the word "Libraries" in the title. If you have a title field in your data structure and it is word indexed, searching on the word libraries would retrieve the following titles: "Libraries," "American Libraries," "School Libraries," "Special Libraries," and "Libraries and Computers." (I've made up some of these titles.) If the field were term indexed, a search on the word libraries would retrieve only the journal "Libraries," and none of the others.

Validation lists

Each field may have a validation list. For some fields, validation lists are very useful; for others, you may not want to have one. A validation list is the list of allowable values for that field. If someone creating a record tries to enter a value that is not on the validation list, an error message will be generated, and the person will be able to browse the list of valid terms. A validation list thus creates a controlled vocabulary for that field.

For instance, in a database of photographs, you might want to have a field for general subject (of the photo), and restrict the values to family, work, buildings, and scenery. This would prevent your indexer who is creating the actual records from inventing subjects such as church or landscape. Or in a database of students, you might want to have a field for native state, and force the indexer to use the full state name rather than the 2-letter abbreviation (in this case, you would have to enter all 50 state names in the validation list). Having a validation list assures consistency in terms which may be selected, although it doesn't guarantee that the correct term will be selected from the list.

However, in the database of photographs, you might want to have an open field for notes or description, and then you would not create a validation list because you want the indexer to be able to enter any information he or she thought important. The indexer might, for instance, enter "picture of church showing arches, windows, and doorway; 4 people standing in front; tree in background."

The decision about whether or not to have a validation list is up to you; it depends on the level of standardization you want to impose on the field values. Having a validation list increases the likelihood that your indexer will use the proper values in the records she or he creates.

Another approach to standardizing content is to create a rule. Instead of entering the three primary colors into a validation list, you could simply create a rule for your indexer that says to use primary colors only (and hope that your indexer knows what the primary colors are!). Rules are not part of DB/TextWorks; you would create a list of rules as a separate guide, just as a word processed document. Obviously, rules work only if your indexer follows them.

An example of a rule might be "enter the name of the state spelled out fully; do not use abbreviations." This would save you the trouble of creating a validation list with all 50 states listed. A rule can also be something like "measure the object from top to bottom to the nearest 1/4", and round up if the measurement falls between the quarter inch marks." This has obvious value for a field where it would be extremely cumbersome to list all the possible measurements.

 Back