The Differences Slavery Made -- Thomas and Ayers -- American Historical Review

Development of a GIS Database For Augusta Co., Virginia, and Franklin Co., Pennsylvania, 1860-1870:

Overview, Outline, and Detailed Discussion of Procedures For Data Automation

Aaron Sheehan-Dean and Scott Crocker, September 2000-November 2001
Steve Thompson and Ariel Lambert, June 1999-August 1999
Overview

The Geographic Information System (GIS) databases created for Augusta County, Va., and Franklin County, Pa., are based upon mid-nineteenth-century maps produced in each county. In 1870, Jedediah Hotchkiss, an Augusta County resident and Confederate cartographer, published a detailed map of the county derived in large measure from surveys he conducted during the Civil War. Like the Hotchkiss map, the Franklin map, created by D.H. Davidson, shows the locations of commercial establishments and of residences for over 4,000 households. In addition to showing major and minor roads as well as rivers, streams, and smaller watercourses, the maps are significant in that they show the locations of over 2,000 named structures. Although mills (flouring, saw, and paper), churches, schools, mines, and a variety of manufacturing establishments (blacksmith shops, potteries, forges) appear on the maps, the vast majority of named structures are private residences, with the name corresponding to either the property's owner or its inhabitant. Viewed alone, the maps provide many insights into the physical and cultural geography of Augusta and Franklin Counties during the Civil War period. The major goal of this project, however, is to use the maps as a basis for projecting detailed Census records (population, agricultural, manufacturing, and slaveholding) of the two counties for 1860 and 1870 into space. The family names recorded on the maps provide the key that enables us to link Census records to inhabited space.

Photographing and Scanning the Maps
Augusta

The original Hotchkiss map was photographed by Special Collections in Alderman Library. In its published form, the map consists of twenty-four paper sections (referred to as "quads" by the photographer) arranged in six rows of four sections each, all affixed to a single canvas backing. So that the map could be easily folded, gaps of ca. 1/2 inch were left between individual map sections. The Special Collections photographer shot the map in twenty-one sections, with each photograph corresponding to a 1:1 reproduction of a section/quad of the original map. The photographer numbered the map sections quad01 through quad24, beginning in the upper left-hand corner of the map and proceeding from left to right and from top to bottom. Thus, the quads were numbered as follows:

1	2	3	4
5	6	7	8
9	10	11	12
13	14	15	16
17	18	19	20
21	22	23	24

The upper left (NW), upper right (NE), and lower right (SE) quads (numbers 01, 04, and 24) were not photographed because the margins of the county map did not extend into these sections. A map of the city of Staunton, drawn at a smaller scale, occupies the lower left-hand corner of the map (quads 17, 18, 21, and 22). The twenty-one photographed sections were delivered to the VCDH as full-color TIFF images, each about 17 MB in size.
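The numbering scheme can be checked mechanically. The Python sketch below is an illustration only, not part of the project's workflow, and its constant and function names are hypothetical. It converts a quad number into its row and column on the six-row, four-column sheet and lists the quads that were actually photographed.

```python
# Illustrative check of the Hotchkiss quad numbering: 24 quads in 6 rows
# of 4, numbered left to right and top to bottom from the upper left.
N_ROWS, N_COLS = 6, 4

# Corner quads that were not photographed (NW, NE, SE), per the text.
NOT_PHOTOGRAPHED = {1, 4, 24}

# Quads occupied by the Staunton city map (lower left-hand corner).
STAUNTON_INSET = {17, 18, 21, 22}


def quad_position(quad: int) -> tuple[int, int]:
    """Return the 1-based (row, column) of a quad number."""
    if not 1 <= quad <= N_ROWS * N_COLS:
        raise ValueError(f"quad number out of range: {quad}")
    row, col = divmod(quad - 1, N_COLS)
    return row + 1, col + 1


def quad_filename(quad: int) -> str:
    """The photographer's name for a quad, e.g. 5 -> 'quad05'."""
    return f"quad{quad:02d}"


# The twenty-one quads actually photographed.
photographed = [q for q in range(1, N_ROWS * N_COLS + 1)
                if q not in NOT_PHOTOGRAPHED]
```

For example, quad_position(24) returns (6, 4), the SE corner, and photographed contains 21 quads.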

Franklin

The original Franklin map consists of twelve paper sections arranged in three rows and four columns. For scanning purposes, the sections were numbered sequentially, beginning in the upper left-hand corner of the map and proceeding from left to right and from top to bottom, as one would read a manuscript. Because the map sections were larger than the scanning bed, however, each section was scanned in halves. Each half was named by appending a letter to the section number: "a" for the top half and "b" for the bottom half. For example, the top half of section 1 was saved as 1a and the bottom half as 1b. Some overlap was left between the two halves to facilitate edge-matching later. All twelve map sections were scanned in this fashion, and the resulting twenty-four images were cropped as closely as possible to the borders of individual map sections and saved as color TIFFs.
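The Franklin naming convention is regular enough to generate or parse programmatically. A minimal Python sketch, assuming only the layout described above (three rows of four sections, "a" for the top half and "b" for the bottom); the function names are illustrative.

```python
# Naming scheme for the Franklin scans: 12 sections in 3 rows x 4 columns,
# numbered in reading order, each scanned as a top half ("a") and a
# bottom half ("b").
N_ROWS, N_COLS = 3, 4
HALVES = ("a", "b")  # a = top half, b = bottom half


def scan_names() -> list[str]:
    """All 24 scan names in order: 1a, 1b, 2a, 2b, ..., 12b."""
    return [f"{section}{half}"
            for section in range(1, N_ROWS * N_COLS + 1)
            for half in HALVES]


def parse_scan_name(name: str) -> tuple[int, int, int, str]:
    """Split e.g. '7b' into (section, row, column, half), rows/columns 1-based."""
    section, half = int(name[:-1]), name[-1]
    if half not in HALVES or not 1 <= section <= N_ROWS * N_COLS:
        raise ValueError(f"unexpected scan name: {name!r}")
    row, col = divmod(section - 1, N_COLS)
    return section, row + 1, col + 1, half
```

parse_scan_name("7b") gives (7, 2, 3, "b"): the bottom half of section 7, in the second row and third column.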

Constructing a single digital image of the Maps
Augusta

A single image file comprising the whole of the county was "stitched together" in Photoshop. Individual quads were first aligned and their margins cropped as closely to the borders of individual map sections as possible. Quads were then edge-matched to one another, first in "blocks" of up to four contiguous map sections (three where a corner quad had not been photographed). These first-order recombinations were labeled "block01" through "block06." Block01 comprised quads 02, 05, and 06; block02, quads 03, 07, and 08; block03, quads 09, 10, 13, and 14; block04, quads 11, 12, 15, and 16; block05, quads 17, 18, 21, and 22; and block06, quads 19, 20, and 23. The six blocks were thus arrayed as follows:

01	02
03	04
05	06
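The block membership listed above follows from the quad numbering: each block gathers the photographed quads from one 2 x 2 group of rows and columns. The Python sketch below, with hypothetical function names, reproduces the lists given in the text.

```python
# Illustrative derivation of the Hotchkiss "blocks": each block gathers the
# photographed quads from one 2 x 2 group of rows and columns.
N_ROWS, N_COLS = 6, 4
NOT_PHOTOGRAPHED = {1, 4, 24}


def block_of(quad: int) -> int:
    """Block number (1-6) for a quad number (1-24)."""
    row, col = divmod(quad - 1, N_COLS)  # 0-based row and column
    blocks_per_row = N_COLS // 2
    return (row // 2) * blocks_per_row + col // 2 + 1


def block_members() -> dict[int, list[int]]:
    """Photographed quads grouped by block, in quad order."""
    members: dict[int, list[int]] = {}
    for quad in range(1, N_ROWS * N_COLS + 1):
        if quad not in NOT_PHOTOGRAPHED:
            members.setdefault(block_of(quad), []).append(quad)
    return members
```

block_members() returns {1: [2, 5, 6], 2: [3, 7, 8], 3: [9, 10, 13, 14], 4: [11, 12, 15, 16], 5: [17, 18, 21, 22], 6: [19, 20, 23]}, matching the composition of block01 through block06.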

The six blocks were saved both as full-color, compressed TIFF images (block01-06.tif) and as black-and-white, uncompressed TIFF images (block01b-06b.tif). Combining the six blocks into a single image required, for reasons of file size, that the blocks be converted to black and white, while the geo-registration and rectification of the resulting image (see below) required an uncompressed TIFF.

The final stage entailed edge-matching and joining the six blocks into a single black-and-white image file. This image file, named "augmap," was saved in both compressed (augmap.tif) and uncompressed (augmap2.tif) formats. The inset of Staunton, wholly contained in block05, was clipped and saved as the uncompressed "stntn.tif."

Edge-matching and joining the quads and blocks was a tedious process that could never be made perfect: both the paper and the canvas backing are very elastic, and individual sections of the map appear to have been variably stretched and distorted over the past 130 years. In addition, the edges of the paper sections were often frayed and worn, so that perfect matches frequently could not be achieved.

Franklin

A single image file was made by "stitching together" the twenty-four individual images in Photoshop. The complementary halves (a and b) were first combined by decreasing the opacity of one half and overlaying it on the other. The top and/or bottom images were rotated slightly as necessary to compensate for any errors in the scanning process, and features such as text labels, roads, and rivers were then used to align the two halves. The opacity was then restored to 100%, and the image was flattened and saved under the appropriate section number; for example, images 1a and 1b were combined and saved as 1.tif. The sections were then edge-matched to each other, although this was somewhat more difficult than matching the halves, since there was no overlap between sections. Once the sections had been combined into a single large image, the image was cropped just outside the county boundary to create a composite map of Franklin County. The town insets were also cropped out and saved individually. The results were saved as uncompressed grayscale TIFFs so that they could be properly geo-referenced.
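The halves were aligned by eye in Photoshop. The same correction can be expressed numerically, which may help when re-creating or checking the composite: given the pixel positions of a few features (a road crossing, the start of a text label) picked in both halves, a least-squares rigid fit gives the rotation and shift that best superimpose them. This is a swapped-in illustration, not the procedure the project used; the function name and inputs are hypothetical.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation and shift carrying src points onto dst points.

    src, dst: equal-length lists of (x, y) pixel positions of the same
    features picked in the two overlapping halves. Returns (theta, tx, ty)
    such that each dst point is approximately R(theta) @ src point + (tx, ty).
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    n = len(src)
    sx = sum(x for x, _ in src) / n
    sy = sum(y for _, y in src) / n
    dx = sum(x for x, _ in dst) / n
    dy = sum(y for _, y in dst) / n
    # Sums of cross and dot products of the centered point pairs.
    s_cross = s_dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        s_cross += px * qy - py * qx
        s_dot += px * qx + py * qy
    theta = math.atan2(s_cross, s_dot)  # angle that best lines up the pairs
    c, s = math.cos(theta), math.sin(theta)
    # The shift carries the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty
```

Because pixel rows increase downward, a positive theta appears as a clockwise turn on screen.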

Geo-referencing and rectification of the digital images
Augusta

Assigning real-world coordinate values to the individual pixels of the augmap2.tif image, a process known as "geo-referencing," was carried out in Arc/Info using the Arc commands REGISTER and RECTIFY. Before this process could begin, however, it was necessary to obtain stable control points from a source that had already been geo-referenced. These control points should be features such as buildings, bridges, and road intersections that can be found on both the target (Hotchkiss) map and the geo-referenced source, and that are known to occupy the same location on both. For example, an old church that has stood in the same location for hundreds of years would make a good control point. Within REGISTER, links were initially made to county boundary and hydrology vector line coverages from the U.S. Census Bureau (TIGER/Line data), and reasonable results were obtained. Better results were achieved by establishing links between the Augusta Co. image and geo-referenced TIFF files of 1:24,000-scale USGS quadrangle maps (Digital Raster Graphics, or DRGs), as numerous stable points such as churches and road intersections could be located on both the target (Hotchkiss) and source (USGS) maps.
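At the heart of registration is a transformation fitted to the control-point links. The sketch below fits a first-order (affine) transformation by least squares, in plain Python, mapping image coordinates (x, y) to world coordinates (X, Y). It assumes an affine model, the simplest common choice for this kind of registration; it is not Arc/Info's REGISTER code, and the function names are hypothetical.

```python
def _det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, v):
    """Solve m @ x = v for a 3 x 3 system by Cramer's rule."""
    d = _det3(m)
    if d == 0:
        raise ValueError("degenerate links: need 3+ points not all on one line")
    solution = []
    for k in range(3):
        mk = [row[:] for row in m]
        for r in range(3):
            mk[r][k] = v[r]  # replace column k with the right-hand side
        solution.append(_det3(mk) / d)
    return solution


def fit_affine(links):
    """Least-squares affine fit from image (x, y) to world (X, Y).

    links: list of ((x, y), (X, Y)) control-point pairs.
    Returns (a, b, c, d, e, f) with X = a*x + b*y + c, Y = d*x + e*y + f.
    """
    # Normal equations (A^T A) p = A^T X, design-matrix rows [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (wx, wy) in links:
        row = (x, y, 1.0)
        for i in range(3):
            atx[i] += row[i] * wx
            aty[i] += row[i] * wy
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    a, b, c = _solve3(ata, atx)
    d, e, f = _solve3(ata, aty)
    return a, b, c, d, e, f


def apply_affine(coeffs, x, y):
    """World coordinates of image point (x, y) under a fitted transform."""
    a, b, c, d, e, f = coeffs
    return a * x + b * y + c, d * x + e * y + f
```

Three well-spread links determine an affine fit exactly; the roughly twenty links used here over-determine it, and the leftover residuals are what the RMS error summarizes.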

Perfect geo-referencing of the Hotchkiss image was not possible due to various factors. First, as mentioned, the original paper map appears to have been stretched and distorted significantly. Second, distortions were undoubtedly compounded both during photography and subsequent editing, edge-matching, and joining of map sections and blocks. Third, the cartographic precision of the original Hotchkiss map appears to be less than that of modern maps of the county. This is particularly notable along the northwestern and southeastern borders of the county, both of which lie in mountainous terrain. The most significant departures in the actual contours of the county's boundary between the Hotchkiss map and modern maps occur at the southwestern and southeastern corners of the county.

Approximately twenty links were established between the DRG source images and the Hotchkiss image, including a number of points along the county's boundary and throughout the interior of the county. Links were added and deleted until the RMS (root-mean-square) error of all links was less than 500 meters; despite much experimentation, a lower RMS error could not be achieved. In general, then, points on the geo-referenced Hotchkiss image (as indicated by their x,y coordinates) lie on the order of 500 meters or less from their "actual" locations, and often significantly closer; because the RMS error describes the typical discrepancy rather than a maximum, an individual point may occasionally lie farther off.
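The RMS figure can be made concrete. Assuming RMS error is the root mean square of the distances between each link's transformed position and its control position (the usual definition), a short Python sketch with hypothetical function names computes it and identifies the link with the largest residual, typically the first candidate to re-pick or delete.

```python
import math


def rms_error(predicted, actual):
    """Root-mean-square distance (map units, e.g. meters) between points
    predicted by a fitted transform and the control points."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need equal, non-empty lists of points")
    squares = [(px - ax) ** 2 + (py - ay) ** 2
               for (px, py), (ax, ay) in zip(predicted, actual)]
    return math.sqrt(sum(squares) / len(squares))


def worst_link(predicted, actual):
    """Index of the link with the largest residual distance."""
    residuals = [math.hypot(px - ax, py - ay)
                 for (px, py), (ax, ay) in zip(predicted, actual)]
    return max(range(len(residuals)), key=residuals.__getitem__)
```

With predicted points [(0, 0), (0, 0)] and control points [(3, 4), (0, 0)], the residuals are 5 and 0, and rms_error returns about 3.54: less than the worst residual, which is why an RMS error describes the typical rather than the maximum discrepancy.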

The rectified Hotchkiss image file is labeled "augmap2r.tif" (with corresponding world file "augmap2r.tfw"), and it is this rectified image that was used as the background for all subsequent digitizing of vector data.
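A world file such as augmap2r.tfw holds six numbers that tie image pixels to map coordinates. In file order they are A (pixel width), D and B (rotation terms), E (pixel height, normally negative), and C and F (map coordinates of the center of the upper-left pixel). A minimal Python sketch of the conversion, with hypothetical function names:

```python
def read_world_file(path):
    """Return the six world-file values in file order: A, D, B, E, C, F."""
    with open(path) as fh:
        values = [float(line) for line in fh if line.strip()]
    if len(values) != 6:
        raise ValueError(f"expected 6 values in {path}, found {len(values)}")
    return values


def pixel_to_map(coeffs, col, row):
    """Map coordinates of the center of pixel (col, row), 0-based from the upper left."""
    a, d, b, e, c, f = coeffs
    return a * col + b * row + c, d * col + e * row + f


def map_to_pixel(coeffs, x, y):
    """Inverse of pixel_to_map, returning a fractional (col, row)."""
    a, d, b, e, c, f = coeffs
    det = a * e - b * d
    col = (e * (x - c) - b * (y - f)) / det
    row = (-d * (x - c) + a * (y - f)) / det
    return col, row
```

For instance, coeffs = read_world_file("augmap2r.tfw") followed by pixel_to_map(coeffs, col, row) would place any pixel of the rectified image in the map's coordinate system.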

Franklin

Assigning real-world coordinate values to the individual pixels of the franklin.tif image, a process known as "geo-referencing," was carried out in Arc/Info using the Arc commands REGISTER and RECTIFY. Numerous control points could be located on scanned 1:24,000-scale USGS topographic quadrangles known as Digital Raster Graphics (DRGs), and these DRGs were used as the source from which the control points were digitized. During the registration process, approximately twenty links were established between the Franklin image and the control points taken from the DRG source images. These included a number of points along the county's boundary and throughout the interior of the county. Links were added and deleted until the RMS error of all links was less than 50 meters, meaning that a given point on the geo-referenced Franklin image lies, in a root-mean-square sense, about 50 meters or less from its actual location. Numerous factors prevented a lower RMS error from being achieved, the most significant of which were probably errors in relative distances between points on the Franklin map introduced by the scanning and edge-matching process.

The rectified Franklin image file is labeled franklinrec.tif and is accompanied by a corresponding world file franklinrec.tfw. This rectified image was used as a background for all subsequent digitization of vector data.

Creating Digital Vector Coverages from the Geo-Referenced Image

A series of digital vector coverages were produced using the rectified raster images of the two county maps. All digitizing was carried out within the ArcEdit module of Arc/Info. Features were traced from the rectified image, with the resultant digital "coverages" being in the same real-world coordinate system as the source image. All the Franklin digitization was done in the Albers projection and the coverages were subsequently reprojected into the UTM coordinate system to match the work done earlier on Augusta County.

Line Coverages
Augusta

Three county-wide line coverages have been digitized: one detailing hydrology (stream1870), one roadways (roads1870), and one the railroad (rail1870). Line features representing watercourses in the hydrology coverage have all been coded, within a field named "RANK" added to the arc attribute table (AAT) of the coverage, so that all streams are classified into one of three types (major, lesser, and minor). Stream length was the criterion for this classification (> 12,000 m = Rank 1/major, 6,000-12,000 m = Rank 2/lesser, and < 6,000 m = Rank 3/minor). A second field named "NAME" was also added to the AAT of the hydrology coverage to contain stream names as they appear on the Hotchkiss map.
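The length rule translates directly into code. The Python sketch below treats exactly 6,000 m and exactly 12,000 m as Rank 2, an assumption, since the text does not say which class the boundary values fall in. Because a stream is usually digitized as several arcs, it also sums arc lengths per stream before ranking; the grouping key is a hypothetical stand-in for however streams were identified.

```python
from collections import defaultdict


def stream_rank(length_m: float) -> int:
    """RANK code from stream length in meters: 1 = major (> 12,000 m),
    2 = lesser (6,000-12,000 m, boundaries included), 3 = minor (< 6,000 m)."""
    if length_m > 12_000:
        return 1
    if length_m >= 6_000:
        return 2
    return 3


def rank_streams(arcs):
    """Total the arc lengths of each stream, then rank each stream.

    arcs: iterable of (stream_key, arc_length_m); stream_key is a
    hypothetical identifier shared by all arcs of one stream.
    """
    totals = defaultdict(float)
    for key, length in arcs:
        totals[key] += length
    return {key: stream_rank(total) for key, total in totals.items()}
```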

Digitized roadways have also been classified according to a tripartite scheme, recorded in a field named "RD_TYPE" added to the AAT of the roads coverage. Roads classed as type 1 are considered "major roads" and are represented by double solid lines on the Hotchkiss map. Type 2 roads are "minor" and are represented by single solid lines on the original map. Finally, type 3 roads, or "paths," are those routes shown by single dashed lines by Hotchkiss. As with the hydrology coverage, an RD_NAME field was also added to the AAT to contain road names, though most of these features are not named on the original map.

Since the county had only one railroad, this coverage did not require additional coding by class.

Franklin

Three county-wide line coverages were digitized, one detailing hydrology (Rivers), one roadways (Roads), and one railroads (Railroads). A field named Rank was added to the arc attribute table (AAT) of the Rivers coverage and contains a code that classifies the streams and rivers as either major, lesser, or minor based on stream length. A stream that was longer than 12,000 meters was coded as Rank 1/major, lengths of 6,000 to 12,000 meters were assigned Rank 2/lesser, and streams less than 6,000 meters long were given Rank 3/minor. A second field named Name was also added to the AAT of the Rivers coverage to contain stream names as they appear on the Franklin map.

Digitized roadways were classified as major (1) or minor (2) in a field named Rd_type in the AAT of the Roads coverage. Major roads are represented by double solid lines on the original map, while minor roads are represented by single solid lines. There were no distinguishable paths (coded as Rd_type 3 on the roads coverage digitized from the Augusta map) on the Franklin County map. As with the Rivers coverage, an Rd_name field was also added to the AAT to contain road names, although most of these features are not named on the original map.

Since the county had only one railroad, this coverage did not require additional coding by class.

Polygon Coverages
Augusta

The most basic polygon coverage digitized represents the boundaries of Augusta County. This coverage is named "bord1870".

The boundaries of six electoral districts plus Staunton are portrayed on the Hotchkiss map and these have been digitized into a single county-wide coverage named "dist1870." A single field named "DISTRICT" was added to this coverage's polygon attribute table (pat) to contain the name of each district polygon.

A coverage named Soils was digitized from a general soil map of Augusta County produced by the U.S. Department of Agriculture's Soil Conservation Service in 1974. Before digitizing, the soils map was registered to the Border coverage digitized from the Augusta map using the procedure described in the geo-referencing section above. The county Border coverage was also used as the border of the Soils coverage to ensure that the Soils coverage would overlay properly with the other digitized data. A Code field was added to the polygon attribute table (PAT) of the Soils coverage and contains the numerical code (1-14) associated with each color in the legend of the original map. The Type field was also added to contain the actual name of the soil type (association) that corresponds with each color and code in the legend.

The historical maps we used as the basis for the GIS we constructed for each county did not include any information on soil type or productivity. By incorporating soil type into our GIS, we could compare residents of the two counties with one another and help isolate the difference slavery might have made in Augusta. Lacking reliable historic soil type or soil quality maps, we decided to use current U.S. Department of Agriculture soil association maps for each county. The Augusta Soil Survey included suitability rankings for crops; we applied these when we created new variables within the GIS/Census database ranking the soils by their suitability for agriculture. We relied upon the expertise of the Augusta County Cooperative Extension Agent, Tom Stanley, for help in interpreting the suitability of different soil associations. Drawing on the Virginia Nutrient Management Standards and Criteria, produced by the Virginia Department of Conservation and Recreation, Tom Stanley provided us with soil suitability rankings, by crop, for the different soil types in the Augusta County Soil Survey.

We identified polling stations for Augusta from newspaper reports following the 1860 presidential election. The reports listed voting returns by party for each polling station. We created a .dbf file for each county with this data, adding variables that calculated the percentage of the total vote given to each candidate. In the GIS, we created new coverages for the polling stations (20 in Augusta). All of the Augusta polling stations were located in towns. Though we did not know the exact location of each station, we digitized a single point as close to the town center as possible, since voting probably occurred at some prominent, central location in each place. We then created Thiessen polygons around each polling station, in essence recreating the voting precincts. We could then aggregate household socio-economic and demographic data by precinct in order to build a profile of the districts that supported each candidate in the 1860 election.
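Two of the calculations in this paragraph are easy to state precisely. A candidate's vote share at a station is that candidate's votes divided by the station's total, times 100. And a Thiessen polygon is, by construction, the set of locations closer to its station than to any other, so assigning a household to a precinct amounts to finding its nearest station. A plain-Python sketch under those definitions; the household record layout (an "xy" key plus the attribute being summed) is hypothetical.

```python
import math


def vote_shares(returns):
    """Percent of a station's total vote won by each candidate."""
    total = sum(returns.values())
    if total == 0:
        return {cand: 0.0 for cand in returns}
    return {cand: 100.0 * n / total for cand, n in returns.items()}


def nearest_station(point, stations):
    """Name of the station closest to `point`.

    A point lies inside a station's Thiessen polygon exactly when that
    station is its nearest one (ties on polygon edges go to the first
    station listed).
    """
    return min(stations, key=lambda name: math.dist(point, stations[name]))


def precinct_totals(households, stations, field):
    """Sum one household attribute by Thiessen precinct."""
    totals = {name: 0.0 for name in stations}
    for hh in households:
        totals[nearest_station(hh["xy"], stations)] += hh[field]
    return totals
```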

The final polygon coverage represents elevation. This coverage was not digitized but was instead created from USGS Digital Elevation Models (DEMs) using the capabilities of both Arc/Info and ArcView. To cover the entire county, all DEMs containing any part of the county were merged using the Grid command MOSAIC. The GRIDCLIP command was then issued in Grid (not Arc) to clip the merged DEM with the county's Border coverage, yielding a single large grid in the shape of the county. This grid was then reclassified as follows using ArcView's Spatial Analyst extension:

1 = less than 226 meters
2 = 227-331
3 = 332-435
4 = 436-540
5 = 541-644
6 = greater than 645 meters

The reclassified grid was then converted to a shapefile. This process created a polygon for every cell in the original grid, and allowed the Gridcode field (containing values 1-6, as described above) to be carried over from the reclassified grid cells and assigned to each polygon. ArcView's Geoprocessing Wizard was then used to dissolve the polygons based on the Gridcode attribute, so that all polygons having the same Gridcode value were grouped together in a single polygon. Finally, Arc's SHAPEARC command was used to convert the shapefile to the Elevation coverage while keeping the Gridcode attribute as a field in the PAT of the newly created polygon coverage.
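The reclassification step can be sketched cell by cell. The published class ranges leave two whole-meter values unassigned (226 m and 645 m fall between classes), so the Python sketch below assumes whole-meter elevations, with 226 m placed in class 1 and 645 m in class 6; None stands for NoData cells outside the clipped county boundary. The names are hypothetical, and Spatial Analyst performed the actual reclassification.

```python
from bisect import bisect_left

# Upper bounds (meters) of classes 1-5; anything higher is class 6.
# Assumption: 226 m goes to class 1 and 645 m to class 6, since the
# published ranges ("less than 226", "227-331", ..., "541-644",
# "greater than 645") leave those two values unassigned.
UPPER_BOUNDS = [226, 331, 435, 540, 644]


def elevation_class(elev_m: float) -> int:
    """Gridcode (1-6) for one elevation value."""
    return bisect_left(UPPER_BOUNDS, elev_m) + 1


def reclassify(grid):
    """Apply elevation_class to every cell; None marks NoData (outside the county)."""
    return [[None if v is None else elevation_class(v) for v in row]
            for row in grid]
```

Cells sharing a Gridcode are what the later dissolve step merges into a single polygon per class.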

Franklin

The most basic polygon coverage digitized represents the boundary of Franklin County and is named Border.

The boundaries of the fifteen electoral districts are portrayed on the Franklin map and these have been digitized into a county-wide coverage named Districts.

A coverage named Soils was digitized from a general soil map of Franklin County produced by the U.S. Department of Agriculture's Soil Conservation Service in 1974. Before digitizing, the soils map was registered to the Border coverage digitized from the Franklin map using the procedure described in the geo-referencing section above. The county Border coverage was also used as the border of the Soils coverage to ensure that the Soils coverage would overlay properly with the other digitized data. A Code field was added to the polygon attribute table (PAT) of the Soils coverage and contains the numerical code (1-6) associated with each color in the legend of the original map. The Type field was also added to contain the actual name of the soil type (association) that corresponds with each color and code in the legend.

The historical maps we used as the basis for the GIS we constructed for each county did not include any information on soil type or productivity. By incorporating soil type into our GIS, we could compare residents of the two counties with one another and help isolate the difference slavery might have made in Augusta. Lacking any historic soil type or soil quality maps, we decided to use current U.S. Department of Agriculture soil association maps for each county. For Franklin County, we contacted Scott Metzger at the Franklin County office of the Natural Resources Conservation Service, who provided us with the most recent Soil Survey of the county. The Franklin Soil Survey included suitability rankings for crops; we applied these when we created new variables within the GIS/Census database ranking the soils by their suitability for agriculture.

We identified polling stations for Franklin from newspaper reports following the 1860 presidential election. The reports listed voting returns by party for each polling station. We created a .dbf file for each county with this data, adding variables that calculated the percentage of the total vote given to each candidate. In the GIS, we created new coverages for the polling stations (23 in Franklin). Where a station was identified by town, we did not know its exact location, so we digitized a single point as close to the town center as possible, since voting probably occurred at some prominent, central location in each place. Eleven of the twenty-three Franklin polling stations, however, were identified only by the name of the township in which they lay. This required more discretion in locating them: for these eleven stations, we placed the point in the center of the largest town within the township. We then created Thiessen polygons around each polling station, in essence recreating the voting precincts. We could then aggregate household socio-economic and demographic data by precinct in order to build a profile of the districts that supported each candidate in the 1860 election.

Point Coverages
Augusta

The heart of this project entails the digitization of point coverages from the Hotchkiss image that record the locations of all named (and unnamed) structures and establishments that appear on the map. It is through the establishment of links between map names and census names that a fully spatially referenced statistical database was generated.

All point features on the map were digitized and assigned a unique identifier that was used to join/relate the GIS point coverages to a series of data files containing information regarding matches between points on the Hotchkiss map and records contained in the 1860 population, agricultural, and slaveholding censuses. The initial task of matching map points to census records was carried out by VCDH staff before work on this GIS database began. The compilers of the resulting spreadsheet worked systematically by election district, typically proceeding from point to point along roadways, recording named points along with general locational information (toponym and reference within a grid superimposed over the map) and indicating whether the point could be matched to a record in any of the three censuses (population, agricultural, and slave).

Within ArcEdit, adding a point feature to a coverage causes the software to assign it a unique "user-id" automatically. The user-id field is the fourth field in the coverage's point attribute table (PAT). Its name is always the coverage name followed by -ID (COVER-ID for a coverage named COVER), and it should not be confused with the third PAT field, COVER#. By default, ArcEdit assigns user-ids sequentially, beginning with "1" each time a new coverage is created, and, again by default, the user-ids of added features that are later deleted are NOT reused. The digitizer can, however, control the assignment of user-ids; for example, both the start number and the interval of a sequence can be specified. User-ids can also be changed for individual points or a series of points using the CALCULATE command. In this project, digitizing, and thus the assignment of a series of user-ids to point features, follows exactly the record sequence of the Excel files (and therefore the MAP-IDs contained there). Essentially, digitizing moves from point to point in the same sequence followed by the compilers of the Excel files. In addition to the automatic assignment of a user-id to each point, the digitizer also enters a value in the added field named "PNT_TYPE." This allowed us to identify all the buildings represented on the map by type (e.g., residence, commercial, public) as well as by their specific owner or use.

Because point digitizing is extremely repetitive (there are approximately 1,000 points to be digitized within each of the six electoral districts), the process has been automated and is now carried out with an AML script named "points.aml." This AML cannot, of course, correct entry errors automatically; these must be corrected manually outside the script. Once all named points contained within the Excel file for a given district have been digitized, points.aml can be used to add any additional (usually unnamed) locations within the district and assign their point types. This typically entails adding several hundred points to those contained in the Excel file. User-ids can be assigned to these additional points without regard to sequence. Once the last point within a district has been digitized, the next number in the id sequence can be used as the first user/map-id in the next district to be digitized.
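The hand-off of ids between districts can be modeled with a simple counter. The Python class below is a hypothetical stand-in that mimics the default behavior described above (sequential ids from a chosen start and interval, never reused); it is not ArcEdit's API or the contents of points.aml.

```python
class UserIdSequence:
    """Hypothetical model of default user-id assignment (not ArcEdit's API):
    ids start at `start` and advance by `interval`; the counter never hands
    out an id twice, so ids of deleted points are not reused."""

    def __init__(self, start: int = 1, interval: int = 1):
        self.next_id = start
        self.interval = interval

    def add_point(self) -> int:
        uid = self.next_id
        self.next_id += self.interval
        return uid


def digitize_district(n_named: int, n_unnamed: int, start: int):
    """Ids for one district: named points first, in Excel (MAP-ID) order,
    then the additional unnamed points. Returns the two id lists and the
    id at which the next district should start."""
    seq = UserIdSequence(start)
    named = [seq.add_point() for _ in range(n_named)]
    unnamed = [seq.add_point() for _ in range(n_unnamed)]
    return named, unnamed, seq.next_id
```

For example, a district with three named and two unnamed points, started at 1, receives ids 1-3 and 4-5, and the next district begins at 6.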

An additional point coverage, Towns, represents town centers. This coverage was used for a simple proximity analysis and also aided in the creation of a polling stations coverage that served as the points from which Thiessen polygons were generated (see the Polygon Coverages section).

We used the newspapers to confirm the locations of railroad depots that we had identified from the maps alone. Franklin County had only two depots: one in Chambersburg and one in Greencastle. Augusta had five: Waynesborough, Fishersville, Staunton, Swoopes Depot, and Craigsville.

Franklin

The heart of this project entails the digitization of a Points coverage from the Franklin image that records the locations of all structures and establishments that appear on the map. It is through the establishment of links between map names and census names that a fully spatially referenced statistical database could be generated.

All point features on the original map were digitized and assigned a unique identifier that can be used to join/relate the GIS Points coverage to a series of data files containing information from the 1860 population, agricultural, and slaveholding censuses. The digitization of point features began in the northwestern corner of the map and proceeded from west to east and north to south, following a grid that was overlaid on the map. To join the Points coverage to the data file containing information on census records, unique identifiers were created in both the GIS and Excel files that link each point with a corresponding Excel record. IDs were assigned sequentially to each record in the Excel file, beginning with the number 1. Corresponding IDs were then added to the point attribute table (PAT) of the Points coverage by creating a Map-id field. This field was coded for each point by assigning it the number associated with its corresponding record in the Excel file. In this project, digitizing, and thus the assignment of a series of Map-ids to point features, follows exactly the record sequence of the Excel file (and therefore the MAP-IDs contained there). Essentially, digitizing moves from point to point in the same sequence followed by the compilers of the Excel file.

After all of the point features had been digitized and coded, it was discovered that several hundred labeled points on the Franklin map did not have corresponding records in the Excel spreadsheet that had served as a guide for the digitization process. These records had to be added to the spreadsheet and then digitized as an addition to the Points coverage. As a result, the Map-ids of these points are out of order relative to the left-to-right, top-to-bottom sequence of the other points. What is important, however, is that the Map-ids assigned to these points are still unique and therefore relate to the proper record in the Excel spreadsheet. In addition to assigning a unique Map-id to each digitized point, the digitizer also entered a value in the field Pnt_type, which was created to hold numerical codes for all point types, such as residences, schools, and churches. This allowed us to identify all the buildings represented on the map by type (e.g., residence, commercial, public) as well as by their specific owner or use. Once every point had been digitized and coded with a unique Map-id and Pnt_type, the Excel spreadsheet was joined to the PAT of the Points coverage as described later. It was then fairly simple to select certain point types, such as residences, churches, and schools, for use in further analyses.

Checking and Cleaning Census Match Excel Files

Once all points within the counties have been successfully digitized, the next step is to join the data records contained in the corresponding Excel file to the coverage's point attribute table. Before this is carried out, however, the Excel file MUST BE checked and cleaned of any erroneous or ambiguous entries.

Occasionally, transcription or spelling errors are encountered in the Excel file during digitizing, and these should be corrected. Very occasionally, Excel file records may be encountered for which no clear point on the map can be found (and thus none can be digitized). In this case, the Excel record can be deleted. Leaving it in place has no consequence for the GIS database: lacking a match among the ids in the point attribute table, the record will simply not be imported, as long as the digitizer did not assign its user-id to another point. A more serious problem arises when the compilers of the Excel file have (erroneously) assigned two or more records containing Census matches to a single point feature on the map, so that multiple map-ids have been assigned to that single point. In this case, all but one of the records (and their map-ids) must be deleted from the Excel file before it is joined to the coverage's point attribute table.
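Some of these checks can be automated before the join. The Python sketch below assumes the point attribute table has been exported as a mapping from id to (x, y) and the Excel rows as dictionaries with a "map_id" key (both hypothetical formats). Depending on how the digitizer handled a doubly assigned feature, the extra map-id shows up either as an Excel row with no point or as two points digitized at nearly the same spot; the sketch flags both, then performs a left join of records onto points.

```python
from collections import defaultdict


def records_without_points(point_xy, records):
    """Excel map-ids that have no digitized point. Such rows simply fail
    to join, so deleting them is optional."""
    return sorted({r["map_id"] for r in records} - set(point_xy))


def stacked_ids(point_xy, cell=1.0):
    """Groups of point ids whose coordinates round to the same `cell`-sized
    square: a rough test for one map feature digitized under several ids.
    Approximate by design: two points straddling a cell edge are missed."""
    groups = defaultdict(list)
    for pid, (x, y) in point_xy.items():
        groups[(round(x / cell), round(y / cell))].append(pid)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def join_records(point_xy, records):
    """Left join: each point id gets its Excel row, or None if it has none."""
    by_id = {r["map_id"]: r for r in records}
    return {pid: by_id.get(pid) for pid in point_xy}
```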

Much of the labor and time required in cleaning the Excel files results from the fact that the compilers frequently matched multiple point features with a single Census record. That is, multiple point features on the maps share references to a single, unique Census Page#/Family# (Population Census) or Page#/Line# (Agricultural Census). This is a "many-to-one" match and may have arisen for various reasons.
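Many-to-one matches can be listed mechanically by grouping map points on the Census reference they claim. A minimal Python sketch, assuming each match is stored as a map-id plus a tuple such as ("pop", page, family) or ("ag", page, line); that tuple format is hypothetical.

```python
from collections import defaultdict


def many_to_one(matches):
    """Census references claimed by two or more map points.

    matches: iterable of (map_id, census_key); census_key is a tuple such
    as ("pop", page, family) or ("ag", page, line), or None if unmatched.
    Returns {census_key: sorted list of map_ids} for shared keys only.
    """
    claimed = defaultdict(list)
    for map_id, key in matches:
        if key is not None:  # points with no Census match are ignored
            claimed[key].append(map_id)
    return {key: sorted(ids) for key, ids in claimed.items() if len(ids) > 1}
```

Each flagged group still calls for the judgment discussed in the following paragraphs: deciding which point, if any, should keep the match.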

One common cause is that multiple features on the map are often labeled with identical names and thus appear to be owned by a single individual. For instance, many points on the Hotchkiss map are labeled with the possessive form of an individual's name (e.g., "A. Crawford's"). Invariably, however, there will be one point in such a spatial cluster that is not labeled in the possessive.

Although the interpretation cannot be verified, our working assumption is that points labeled with a possessive represent properties of the named individual, while points not labeled possessively indicate the named individual's place of residence. The compilers of the Census Match files, however, ignoring the possessive labels, typically assigned all points with the same name to the same individual (unique record) in the Census records. While some, perhaps even all, of the features labeled with a possessive may be dwellings owned by the individual indicated, they need not be. That such points represent barns, outbuildings, or other agricultural or manufacturing installations cannot be ruled out without more information. Even if these features are residences, however, it is important that they be associated with the Census data related to their OCCUPANTS RATHER THAN THEIR OWNERS.

If reconstructing the routes of the Census takers and infilling additional point-to-record matches proves worthwhile, it may be possible to associate some of these points with Census households: families that rented their residences can be detected in the Population Census because they report zero Real Estate wealth.
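The renter test described above reduces to a simple filter. The sketch below assumes each Population Census household is held as a small record with hypothetical field names (`page`, `family`, `head`, `real_estate_value`); it only flags candidates, since zero real estate is an indicator of renting, not proof of it.

```python
from dataclasses import dataclass


@dataclass
class Household:
    # Hypothetical field names for one 1860 Population Census household.
    page: int
    family: int
    head: str
    real_estate_value: int  # dollars, as recorded by the enumerator


def probable_renters(households):
    """Return households whose head reports zero real estate wealth (likely renters)."""
    return [h for h in households if h.real_estate_value == 0]


if __name__ == "__main__":
    sample = [
        Household(12, 85, "Crawford, A.", 4500),  # illustrative values only
        Household(12, 86, "Smith, W.", 0),
    ]
    for h in probable_renters(sample):
        print(h.page, h.family, h.head)
```

Candidates flagged this way would still need to be placed on the map by other evidence, such as their position in the enumerator's sequence.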

Many-to-one matches also appear to have other causes. Quite understandably, multiple individuals within the county may have shared the same first and last names, so the compilers of the Census match files may inadvertently have matched more than one person/dwelling to the same Census data record. Such cases can be resolved, if at all, only by examining the Page/Family Numbers of nearby matches.
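One way to make "examining the Page/Family Numbers of nearby matches" concrete is sketched below. It assumes, as the route-reconstruction idea above implies, that enumerators worked through neighborhoods so that neighboring households tend to have nearby page numbers. The functions, the choice of `k` neighbors, and the median rule are hypothetical illustrations, not the project's procedure.

```python
import math
from statistics import median


def nearest_pages(candidate, matched, k=5):
    """Census page numbers of the k already-matched points closest to candidate (x, y).

    matched is a list of (x, y, census_page) tuples for points with secure matches.
    """
    ranked = sorted(matched, key=lambda m: math.dist(candidate, (m[0], m[1])))
    return [m[2] for m in ranked[:k]]


def best_candidate(record_page, candidates, matched, k=5):
    """Among same-name candidate points, pick the one whose neighbors' median
    census page is closest to the page of the disputed census record."""
    def gap(point):
        return abs(median(nearest_pages(point, matched, k)) - record_page)
    return min(candidates, key=gap)
```

The result is only a suggestion for a human reviewer; where neighboring pages are mixed, the case may remain unresolved, as the text warns.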

Cases have also been encountered (with the 1860 Census records) in which map points belonging to individuals who share a family name but have different first (and middle) names or initials (e.g., A. Crawford and T. Crawford) have been matched to the same unique Page#/Family# or Line#. The only explanation for this situation seems to be the time lag between the recording of the 1860 Census and the drawing of the 1870 Hotchkiss map. That is, in 1860 A. Crawford and T. Crawford were listed as belonging to the same family because they were sons of the same father and lived with him under the same roof, but by 1870 the father had died and his estate had been divided among his heirs, whose names Hotchkiss recorded. The difficulty then becomes deciding which point (if any) among those matched to the Census record should be retained as the most probable 1860 residence of the family's head of household.

All cases of "many to one" matches are problematic and MUST BE rectified at this point. Otherwise, joining the data tables (.pat and Excel file) will simply join the first occurrence of an ID in the .pat with its first occurrence in the Census Match data file, and which records those are depends only on the (arbitrary) order of the records in the two files. EACH DIGITIZED POINT ON THE MAP (EACH USER-ID) CAN BE MATCHED TO ONLY ONE UNIQUE RECORD IN EACH OF THE CENSUSES. LIKEWISE, A UNIQUE CENSUS RECORD CAN BE MATCHED TO ONLY ONE POINT IN SPACE. Matching a single Census record to more than one point on the map replicates its statistical data in any aggregation above the level of the household. In other words, unresolved many-to-one matches cause individuals to be counted more than once whenever statistical data are aggregated or summarized at higher-order spatial scales.
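The one-to-one rule above can be checked mechanically in both directions, and the double-counting consequence can be shown with a toy aggregation. This is a minimal sketch assuming the matches are available as (map ID, census key) pairs, with a Population Census key taken to be a (page, family) tuple; the function names and sample numbers are hypothetical.

```python
from collections import defaultdict


def many_to_one(matches):
    """Census keys matched to more than one digitized point (map ID)."""
    points_by_record = defaultdict(set)
    for map_id, census_key in matches:
        points_by_record[census_key].add(map_id)
    return {k: sorted(v) for k, v in points_by_record.items() if len(v) > 1}


def one_to_many(matches):
    """Map IDs matched to more than one census record."""
    records_by_point = defaultdict(set)
    for map_id, census_key in matches:
        records_by_point[map_id].add(census_key)
    return {k: sorted(v) for k, v in records_by_point.items() if len(v) > 1}


def total_persons(matches, persons):
    """Sum household sizes over points, as any aggregation above the household would."""
    return sum(persons[census_key] for _, census_key in matches)


if __name__ == "__main__":
    # Points 101 and 102 both carry the household on page 12, family 85.
    matches = [(101, (12, 85)), (102, (12, 85)), (103, (12, 86))]
    persons = {(12, 85): 6, (12, 86): 4}
    print(many_to_one(matches))              # the offending census key and its points
    print(total_persons(matches, persons))   # inflated by the duplicate
```

Both dictionaries should be empty before the join; any remaining entry is a record that would be counted twice or joined arbitrarily.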

As most replications of unique census records within the Excel matching files are probably due to repeated names on the map (whether intentional or due to a multiplicity of W. Smiths, for example), one way to check for them is to sort the Excel files alphabetically by last name (but only after Map-IDs have been established). The file can then be examined for duplicate last names and duplicate matches to single census records. Census match information should be removed from all such records except the one deemed most likely to represent the primary residence of the individual in question. Census records may, however, also be duplicated in the Match files because of transcription or typographical errors (either in the original documents or in subsequent versions). A more complete check therefore entails sorting the files by page# and fam#/line# for each of the three censuses and checking for additional duplicated matches. This second sort should, however, follow the sort on Last Name, First Name.
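The two-pass sort described above can be sketched as follows. The census fields `PopCen60Page` and `PopCen60Fam` come from the text; `LastName`, `FirstName`, and `MapID` are assumed spellings of the file's name and Map-ID columns. The sketch returns the name-sorted list for visual review and, separately, every census reference shared by more than one Map-ID.

```python
from itertools import groupby


def name_key(record):
    return (record["LastName"], record["FirstName"])


def flag_duplicate_matches(records, key_fields=("PopCen60Page", "PopCen60Fam")):
    """Return (records sorted by last/first name, [(census_key, [MapID, ...]), ...])."""
    # Pass 1: alphabetical by Last Name, First Name so repeated names sit together.
    by_name = sorted(records, key=name_key)

    def census_key(record):
        return tuple(record[f] for f in key_fields)

    # Only records that actually carry a census reference can be duplicates.
    matched = [r for r in by_name if all(r.get(f) is not None for f in key_fields)]
    # Pass 2: sort by page and family/line number; sorted() is stable, so ties keep name order.
    duplicates = []
    for key, group in groupby(sorted(matched, key=census_key), key=census_key):
        ids = [r["MapID"] for r in group]
        if len(ids) > 1:
            duplicates.append((key, ids))
    return by_name, duplicates
```

The same function can be rerun for the other censuses by passing, for example, `key_fields=("AgCen60Page", "AgCen60Row")`.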

Before the Census Match Excel files are imported into Arc/Info, it is imperative that all potentially confusing characters be removed or replaced. Commas, since they will be used as field delimiters (see below), must be replaced with colons (:). Forward slashes (/) should be replaced with underscores (_), and single right quotation marks (') with single left quotation marks (`). The fields that match records to census page, line, and family numbers (PopCen60Page, PopCen60Fam, AgCen60Page, AgCen60Row, SlvCen60Row) must contain only numerical data, although the original composite fields in which this information was first recorded may contain other characters. Thus, remove all "M"s, "N"s, "/"s, and any other character information from these fields. Occasionally these fields may contain references to more than one census record. When this occurs, all but one of these references must be removed (though the removed information should be retained in another field, such as the Orig or Name fields).
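The cleaning rules above can be expressed as a small per-row function. This sketch assumes each Excel row has been read into a Python dict keyed by field name. The character substitutions and the five numeric field names are taken from the text; keeping the *first* of several census references and appending the discarded text to `Orig` are assumptions, since the text leaves the choice of which reference to keep to the editor.

```python
import re

# Character substitutions described in the text.
REPLACEMENTS = {",": ":", "/": "_", "'": "`"}
# Census link fields that must contain numbers only.
NUMERIC_FIELDS = ("PopCen60Page", "PopCen60Fam", "AgCen60Page", "AgCen60Row", "SlvCen60Row")


def clean_text(value):
    """Replace delimiter-like characters so Arc/Info will not misread them."""
    for old, new in REPLACEMENTS.items():
        value = value.replace(old, new)
    return value


def clean_record(record):
    """Return a cleaned copy of one Excel row (a dict of field name -> value)."""
    cleaned, overflow = {}, []
    for field, value in record.items():
        text = "" if value is None else str(value)
        if field in NUMERIC_FIELDS:
            numbers = re.findall(r"\d+", text)  # drops "M", "N", "/" and other characters
            cleaned[field] = int(numbers[0]) if numbers else None
            if len(numbers) > 1:  # several census references: keep the first (assumption)
                overflow.append(f"{field}={text}")
        else:
            cleaned[field] = clean_text(text)
    if overflow:  # preserve the discarded references in the Orig field
        parts = [cleaned["Orig"]] if cleaned.get("Orig") else []
        cleaned["Orig"] = clean_text("; ".join(parts + overflow))
    return cleaned
```

For example, a row with `PopCen60Page = "M12"` and `PopCen60Fam = "85/86"` comes out with the numbers 12 and 85, and `PopCen60Fam=85_86` is appended to `Orig` for later review.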

At this point, it also makes sense to add the unique MAP-ID associated with each point in the GIS coverage to its corresponding Page#/Dwelling# record in the aggregated Census Data base. Adding MAP-IDs directly into the Census data bases allows these files to be sorted on this field, thus providing an additional necessary check to make sure that individual records in the Census data bases have been matched to ONE AND ONLY ONE point on the map. (see above regarding many-to-one matches).

Please see DEVELOPMENT OF A DIGITAL CENSUS DATABASE FOR AUGUSTA CO., VIRGINIA, AND FRANKLIN CO., PENNSYLVANIA, 1860-1870 for a full explanation of the methodology and creation instructions for the SPSS data file that was joined to the GIS.

Importing Data files and Joining with Arc/Info Point Attribute Tables (.pat)

Once the Excel file has been checked and cleaned of erroneous data, it can be prepared for import into the GIS. There are several ways to do this. The Excel file can be saved in comma-delimited text format (.csv), since Arc/Info reads ASCII text files with comma delimiters by default. The corrected file should be saved in this format and ftp'ed to the vdhc/augusta/data directory on ptolemy (sending the file as ASCII rather than binary data prevents record delimiter characters (^M) from appearing at the end of each record of the text file). Once on ptolemy, the .csv file should be inspected (use the UNIX command "more "). Before import, the first line of the comma-delimited text file (containing the field names) must be deleted. In xedit, with the cursor positioned at the beginning of a line will delete the line in its entirety. If extraneous characters exist in the text file (such as the ^M record delimiters mentioned above), these must also be removed; such characters are best removed using the vi text editor.
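The two manual edits above (deleting the header line and stripping ^M characters) can also be done in one scripted step. This is a sketch, not the project's procedure; it assumes the .csv uses DOS-style CRLF record endings, so that every carriage return is debris rather than a record separator. The function names are hypothetical.

```python
from pathlib import Path


def clean_csv_bytes(data: bytes) -> bytes:
    """Remove carriage returns (^M) and the header row from a .csv export."""
    lines = data.replace(b"\r", b"").split(b"\n")
    body = lines[1:]  # line 1 holds the field names, which must not be read as data
    while body and body[-1] == b"":
        body.pop()  # a final newline leaves an empty last element
    return b"\n".join(body) + b"\n" if body else b""


def prepare_csv(src_path, dst_path):
    """Write a header-free, ^M-free copy of src_path to dst_path."""
    Path(dst_path).write_bytes(clean_csv_bytes(Path(src_path).read_bytes()))
```

Working on bytes rather than decoded text avoids guessing the file's character encoding.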

Alternatively, the Excel file can be saved in dBase format and joined to the GIS through ArcInfo using the DBASEINFO command.

The simplest method is to save the Excel file in dBase format (which also makes data analysis in SPSS much easier) and join it to the attribute table of the relevant shape file in ArcView. Saving the shape file, with the newly joined data, as a coverage makes the join permanent. The resulting coverage can then be saved back into a shape file for easier manipulation within ArcView.

Towns

The towns are singled out for treatment here because on both maps several of the larger towns were drawn as insets on the main county map. A discussion of how we handled this for each county follows.

Augusta

Within Augusta, an enlarged map indicating household residences and commercial and public buildings was available only for the county seat of Staunton. The remaining towns had their residences and other buildings noted on the general map. Compiling data for Staunton and then digitizing the city's points involved a process similar to the one explained in Checking and Cleaning Census Match Excel Files, which was used to complete the digitization of each of the county's electoral districts. However, the VCDH staff had to adapt this process to accommodate Staunton's unique circumstances. Before reading the following explanation of these changes, be sure to study the procedures that were used to compile data for and then digitize the rest of the county.

Staunton, located in the Beverley District, exists on the Hotchkiss map in two forms: on the "augmap2r.tif" image and in more detail as an inset, which was clipped and saved as "stntn.tif." To include points within the inset, "stntn.tif" was georeferenced and rectified, resulting in "stntnbr.tif." Inevitably, the two images did not fit perfectly. That is, when "stntnbr.tif" was drawn over the larger map, the various line coverages (streams, roads, and railroads) did not exactly follow the features on the inset. To remedy this, the digitizer edited the coverages.

Compiling an Excel file for Staunton was difficult and time-consuming. Prior to the initiation of the data base, VCDH staff produced an Excel spreadsheet for Staunton by cross-referencing census and tax record information. The file, called "Stcynew.xls," included the following information for all of the city's taxpayers: last name, first name, other, population census information, agricultural census information, slaveowner census information, acres, rods, poles, residence, estate, lot number, building value, lot and building value, tax amount, city tax amount, and notes. While the "aumap.xls" file maintained the record order in which points were entered, the "Stcynew.xls" file did not. It was therefore impossible to locate names on the map simply by following the list of names in the file. The "Stcynew" file also included a number of names that did not exist on the map and therefore could not be included in the final Excel file.

A second source of information for Staunton was the Staunton Fire Insurance Depositions. Compiled between 1850 and 1860 by "The Mutual Assurance Society Against Fire on Buildings of the State of Virginia," the depositions include the following information: policy number, policy holder's name, location of building, bordering homes or businesses, occupant's name, building value, total value of the policy, and a description of the one or more buildings included in the policy. Company agents also drew sketches of individual insured buildings, and each policy is linked on the Valley site to a preliminary drawing of the buildings on that block. These sketches and their associated policy information allowed the project's staff to associate names (and various information) with points that were not labeled on the Hotchkiss map.

The first step in compiling Staunton's data was to scan the "stntnbr.tif" image for labels and produce a list of these names. These names were then cross-referenced with "Stcynew.xls" to determine whether tax record information was available. The tax records use a coding system for the locations of buildings: N for "New Town," O for "Old Town," B for "Beverley Addition," S for "Staunton," and OL for "Outlying." This coding system also includes numbers that refer to tax grid blocks. An image of Staunton's tax grid, called "taxgrid.tif," can be accessed through the Valley's insurance deposition index. The coding system used in the "Stcynew.xls" file made it possible to determine which tax record was associated with which building. Unfortunately, the tax grid does not include areas classified as "Staunton" or "Outlying." Thus, it was impossible to associate a tax record with a name that appears more than once on the map in either of these areas.

Cross-referencing labeled points with the tax record information produced a list of 226 points. Some of these names were successfully associated with tax record data, while others were not. Points were often clustered near a label. In these cases, it was assumed that all of the points within a lot (which is contained within a polygon on the Hotchkiss map) belonged to the person whose label appeared in that lot. Each of these points was then linked to its appropriate census record information (population, agricultural, and/or slaveowner), if possible. Because only one point can be associated with each census record (see Checking and Cleaning Census Match Excel Files), one of the points within a lot was matched to its occupant's census record while the rest of the points were classified as the property of that person.
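The lot rule above (one point per lot carries the occupant's census match; the others are classed as that person's property) can be sketched as below. The text does not say how the occupant's point was chosen within a lot; choosing the point nearest the label is an assumption made here for illustration, and the function name and tuple layout are hypothetical.

```python
import math


def assign_lot_roles(label_xy, name, points):
    """Assign roles to the points inside one lot polygon.

    label_xy: (x, y) position of the name label in the lot.
    points:   list of (map_id, x, y) for every point inside the lot.
    Returns {map_id: (role, name)}, with exactly one "occupant" per lot.
    """
    # Assumption: the point closest to the label is taken as the dwelling.
    occupant_id = min(points, key=lambda p: math.dist(label_xy, (p[1], p[2])))[0]
    return {
        map_id: ("occupant" if map_id == occupant_id else "property", name)
        for map_id, _, _ in points
    }
```

Only the "occupant" point would then receive the census match, which keeps the one-point-per-census-record rule intact.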

The city of Staunton raised a number of issues for the VCDH staff concerning occupancy versus ownership. The staff used the following reasoning in determining how to be consistent and accurate with regard to data compilation:

Labels outside of the city most likely refer to the property's owner. But, because we have no information regarding the occupancy of each of these properties, it was assumed that the owner was also the occupant. Thus, census information is associated with this occupant/owner, and his or her name appears under "Last Name" and "First Name." Note: Although there are categories for "Owner's Last Name" and "Owner's First Name," the name is not repeated under these headings.
Within the city of Staunton, tax records and insurance depositions have allowed the staff to determine in certain instances if a building is occupied by one individual, but owned by another. In these cases, two names appear. The "Last Name" and "First Name" refer to the occupant, while the "Owner's Last Name" and "Owner's First Name" refer to the owner. Again, census information is associated with the occupant. These points are unique in that tax record and insurance deposition data indicate not how much the occupant pays, but how much the owner pays. The VCDH staff does not view this as an inconsistency within the data base because, most importantly, the information associated with these records gives the audience a better understanding of the building, its value, its physical qualities, etc.

After cross-referencing and inputting tax record and census data, the spreadsheet compiler accessed the Staunton Fire Insurance Depositions on the web and added appropriate information to the list of 226 labeled points. Information relating to the insurance depositions includes: policy number, building, location, bordering properties, building value, total policy value, building type, year, and description.

At this point, an Excel file existed which included 226 points as well as their census, tax record, and/or insurance deposition information. In order to digitize these points, however, it was necessary to create a field for the "Map ID" and fill this field with a series of unique numbers. The first "Map ID" for the "pntsstan" coverage was 2037, because the last point in the "pntsbev" coverage was 2036. Although the Excel file for Staunton was not complete at this point, the VCDH staff went ahead and began the digitizing process in Arcedit. The "pntsstan" coverage was created and the "pntsstan-id" field was added to the "pntsstan.pat." The digitizer then added the 226 points to the coverage, making sure that each point was given the appropriate "pntsstan-id."
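The ID bookkeeping above (Staunton's first Map ID is 2037 because "pntsbev" ended at 2036) amounts to handing out consecutive numbers after the last ID used. A minimal sketch, assuming rows are dicts with a hypothetical `MapID` key; it also skips any IDs already entered by hand, so no two points can share one.

```python
def assign_map_ids(rows, last_used):
    """Fill missing 'MapID' values with consecutive numbers after last_used.

    Returns the last ID handed out (or last_used if no row needed one).
    """
    taken = [row["MapID"] for row in rows if row.get("MapID") is not None]
    if len(taken) != len(set(taken)):
        raise ValueError("input already contains a duplicated MapID")
    taken_set = set(taken)
    next_id = last_used + 1
    for row in rows:
        if row.get("MapID") is None:
            while next_id in taken_set:  # skip IDs that were typed in by hand
                next_id += 1
            row["MapID"] = next_id
            taken_set.add(next_id)
            next_id += 1
    return next_id - 1
```

Called with the 226 labeled Staunton rows and `last_used=2036`, this would number them 2037 through 2262; the returned value is the starting point for the next batch, such as the unlabeled points added later.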

The next phase of the Staunton project was to associate unlabeled points on the map with information from the census records, tax records, and insurance depositions. The tax grid and the preliminary drawings of blocks available on the insurance deposition page allowed the VCDH staff to relate names and other information to points that Hotchkiss failed to label. The process for this phase of the project was reversed. Instead of compiling the data and then digitizing the points, the VCDH staff began by adding the points to the "pntsstan" coverage in Arcedit using the next number in the sequence of "pntsstan-id" values. While adding the points to the "pntsstan" coverage, the digitizer recorded each "pntsstan-id" on the hard copy of the GIF images next to the appropriate building. After over 50 unlabeled points were digitized, these points were added to the Excel file with their respective data. The "Map ID" for each point corresponded to the "pntsstan-id" recorded on the GIF images during the digitization process.

After inputting this data into the Excel file, it was necessary to return to Arcedit and "CALC" point types for each of the digitized points. To determine the point type for points that had been associated with insurance depositions, the digitizer referred to the "MASbuildtype" in the Excel file. Many of these points were classified as both a dwelling and a business. Thus, a "pnt_type" for "Residence and Business" (or 46) was added to the "points.aml" list. When necessary, other point types were added to this same list. Throughout the rest of the county, points labeled with a first and/or last name are classified as "residences." The same rule has been used in Staunton.

Next, the digitizer returned to Arcedit. At this point, two types of points remained undigitized. First, there were numerous unlabeled points that had not been associated with data and therefore did not require a place in the Excel file. These points were digitized and their point types were classified as "unknown" (or 99). Throughout the rest of the county, points like these (unlabeled and unassociated with data) were given the classification of "residence" (or 1) due to the high likelihood that these points were indeed residences. In the Staunton area, where businesses existed in greater numbers, this assumption could not be made.
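The point-type rules for name-labeled and unlabeled points can be summarized as a small decision function. The codes 1 (residence), 46 (residence and business), and 99 (unknown) come from the text; the function and its boolean inputs are hypothetical, and the `dwelling_and_business` flag stands in for the digitizer's reading of "MASbuildtype". Points labeled with something other than a personal name (churches, mills, and so on) are coded from their label and are outside this sketch.

```python
RESIDENCE = 1
RESIDENCE_AND_BUSINESS = 46
UNKNOWN = 99


def pnt_type(has_person_name, in_staunton, dwelling_and_business=False):
    """Return the pnt_type code for a name-labeled or unlabeled point."""
    if dwelling_and_business:  # insurance deposition shows both uses
        return RESIDENCE_AND_BUSINESS
    if has_person_name:  # county-wide rule, also used in Staunton
        return RESIDENCE
    # Unlabeled, no associated data: residence in the county, unknown in Staunton,
    # where businesses were too common to assume a dwelling.
    return UNKNOWN if in_staunton else RESIDENCE
```

Writing the rule out this way makes the one place where Staunton differs from the rest of the county explicit.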

Finally, the last points that required digitization were the churches, mills, factories, cemeteries, etc. That is, points that were labeled, but not by a first and/or last name. These points were coded according to their "l_name" (or label) and "pnt_type," using the "CALC" command. After all of these points were digitized, the "pntsstan" coverage included 1014 records.

Franklin

Unlike Staunton's, the inset map of Chambersburg (Franklin's county seat) did not contain labeled residences. Without knowing the precise location of individual households within the town, but having census information on approximately 1,200 town residents, we were left with the problem of deciding how, or whether, to digitize these residences. Ultimately, we decided that we had to include Chambersburg residents, even if their locations within the town's borders were arbitrary, because they comprised a crucial part of the county. Consequently, the locations of all residences within Chambersburg are arbitrary and do not represent any historical relation between the household named and the location it was given. For the purposes of the analyses we conducted, identifying the location of a household to the correct block was not necessary. The finest-grained analysis we completed was determining urban v. rural settlement ratios, using buffers drawn at 1-mile radii around the towns of each county. Since all the residences within Chambersburg fell within this definition, not having their precise locations did not affect the outcome.

For the remaining Franklin towns we were able to digitize the residences with the same degree of accuracy as we did for all county residences. The town inset maps that accompany the main Franklin map include names for almost all features in each town. On the accompanying map, corresponding unlabeled points can be found. By cross-referencing the two maps, we were able to accurately identify almost all of the town residents for the rest of Franklin County.

GIS Analysis

Once the census dataset was connected to the related features on the GIS, we began our analysis. We used the GIS to add data calibrating the geographical and spatial relations between points on the map (private residences as well as public institutions and commercial establishments, roads, railroads, etc.) and natural features (rivers, elevation, soil type, etc.). These were done through the creation of buffers around points or line features (a standard GIS approach) or, in the case of polygon items (as with the digital elevation models or the soil type coverages), through assigning variables denoting location inside or outside specific polygons. For both Augusta and Franklin we added the following variables to the Census database: proximity to the railroad and railroad depots; proximity to a major road; proximity to a church; proximity to a school; proximity to a town (all with 1 mile buffers around the relevant points or lines); elevation; soil type; and voting precinct.

We created buffers around many of the features in both counties using the ArcView create buffers command. We drew 1-mile-radius buffers around schools, churches, towns, roads, railroads, and railroad depots, and a 5-mile-radius buffer around railroad depots. All residences were coded according to their inclusion in or exclusion from these buffers: "1" if they fell within a buffer and "0" if they fell outside it.
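The 0/1 buffer coding above can be reproduced outside ArcView with straight-line distances. This sketch assumes coordinates in a projected system measured in meters (as in the UTM projection used for both counties) and uses 1 mile = 1609.344 m. Features are given as hypothetical lists of vertices, where a single vertex is a point feature (school, church, depot) and two or more form a line (road, railroad).

```python
import math

MILE_M = 1609.344  # meters in one statute mile


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.dist(p, a)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def within_buffer(p, features, radius_m=MILE_M):
    """Return 1 if residence p lies within radius_m of any feature, else 0."""
    for vertices in features:
        if len(vertices) == 1:
            if math.dist(p, vertices[0]) <= radius_m:
                return 1
        else:
            for a, b in zip(vertices, vertices[1:]):
                if point_segment_distance(p, a, b) <= radius_m:
                    return 1
    return 0
```

The 5-mile depot variable would be `within_buffer(p, depots, 5 * MILE_M)`. Results may differ slightly from ArcView at buffer edges, since ArcView approximates curved buffer boundaries with vertices.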

To calculate the distance between objects within each county, we used ArcInfo's POINTDISTANCE command. We wanted to measure how far residences were from specific features more precisely than buffers allowed. The command takes an input coverage (in this case, the residence coverage for each county) and a location to measure the distance to (in this case, the county seats of Chambersburg and Staunton), and produces a single value for each point in the input coverage. These values can then be averaged to determine the average distance between residences and the county seat. We used this measure to analyze the degree of dispersal in each county.
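The averaging step above can be sketched as follows, assuming residence and county-seat coordinates in meters in a projected system and straight-line distance. This illustrates the calculation only; it does not reproduce POINTDISTANCE's output table, and the function names are hypothetical.

```python
import math
from statistics import mean

MILE_M = 1609.344


def distances_to_seat(residences, seat_xy):
    """residences: {map_id: (x, y)}. Return {map_id: straight-line distance in meters}."""
    return {map_id: math.dist(xy, seat_xy) for map_id, xy in residences.items()}


def mean_distance_miles(residences, seat_xy):
    """Average residence-to-seat distance, converted to miles, as a dispersal measure."""
    return mean(distances_to_seat(residences, seat_xy).values()) / MILE_M
```

A larger mean indicates a more dispersed settlement pattern around the county seat; comparing the two counties' means is the comparison the text describes.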

Projection Data

The Franklin coverages and images were initially in the following projection:

Projection: Albers
Datum: NAD83
Units: Meters
X-shift: 0
Y-shift: 0
1st standard parallel: 40
2nd standard parallel: 42
Central Meridian: -78
Latitude of Origin: 39
False Easting: 0 Meters
False Northing: 0 Meters

They were reprojected to bring them in line with the Augusta projection, which is as follows:

Projection: UTM
Zone: 17
Datum: NAD27
Units: Meters
