Conflation and Integration of Archived Geologic Maps and Associated Uncertainties

Old, archived geologic maps are often available with little or no associated metadata. This creates special problems in terms of extracting their data to use with a modern database. This research focuses on some problems and uncertainties associated with conflating older geologic maps in regions where modern geologic maps are, as yet, non-existent as well as vertically integrating the conflated maps with layers of modern GIS data (in this case, The National Map of the U.S. Geological Survey). Ste. Genevieve County, Missouri was chosen as the test area. It is covered by six archived geologic maps constructed in the years between 1928 and 1994. Conflating these maps results in a map that is internally consistent with these six maps, is digitally integrated with hydrography, elevation and orthoimagery data, and has a 95% confidence interval useful for further data set integration.


Introduction
While the push from paper to digital geologic maps has encouraged new mapping efforts among many state and federal mapping agencies, large areas of United States coverage are best thought of as a "patchwork quilt" of maps authored by different compilers, constructed at different times, with differing scales, and even using different stratigraphic units (Soller, Berg, & Wahl, 2000). This implies that, at least for the foreseeable future, it will be necessary to rely on archived geologic maps for geological and societal interpretation for many areas of the United States. Indeed, a large number of geologic maps are stored (rather than formally archived) in geological surveys, university faculty file drawers, theses and dissertations in university libraries, engineering reports, and elsewhere (Hatcher, 2005) and, as a result, represent an underutilized, and occasionally unknown, resource.
These archived geologic maps may contain a wealth of information about a region, but due to losses during storage or in institutional knowledge, they often have little or no associated metadata. While accessing these maps has become considerably easier with online database constructions such as the National Geologic Map Database (NGMD) project of the United States Geological Survey (USGS) (Soller, Berg, & Wahl, 2000; Soller & Stamm, 2014), moving from a paper geologic map used by geologists to a digital on-line geologic map easily accessible to the general public introduces several challenges to a mapping institution (Wunderlich & Hatcher, 2009). Further, considerable added value can be obtained by integrating these digital maps with other types of data on a modern GIS platform.
A general problem arises in the use of on-line digital geologic maps when accessed by individuals or agencies not familiar with their limitations. Geologic maps represent the interpretation of the surface and, to some extent, subsurface geology, generally based on limited geological and geophysical data observed or collected by the creator of the map at a given scale. Digital geologic maps, however, can be scaled down to a single point where an exact latitude and longitude are delivered along with a feature element (rock type, age, formation, etc.), which may imply an unwarranted certitude of bedrock knowledge. Three-dimensional digital geologic models add to these challenges by extending this problem to depth within Earth. For example, a digital geologic 3-D model can show a constrained aquifer at a depth of 100.35 m beneath a given surface location, but that precision may be far too great considering the uncertainties inherent in the model.
This research looks at some of the problems and uncertainties associated with conflating archived geologic maps together (in regions where modern maps are, as yet, non-existent) as well as vertically integrating these maps with layers of modern GIS data (in this case, The National Map of the U.S. Geological Survey).

Example Area
Ste. Genevieve County, Missouri was chosen as the test area. It is covered, in part, by six archived geologic maps constructed in the years between 1928 and 1994. Conflating these maps results in a map that is internally consistent with these six maps and is digitally integrated with hydrography, elevation and orthoimagery data from The National Map.
The archived geologic maps to be combined cover all of Ste. Genevieve County and parts of St. Francois and Perry Counties, all in southeastern Missouri about 95 kilometers (km) south of the city of St. Louis. The area of investigation is bounded by latitude 38.12° N on the north, longitude 89.75° W on the east, latitude 37.50° N on the south, and longitude 90.50° W on the west, covering about 2000 km² (Figure 1). This area includes a small section of the St. François Mountains in its southern extent, as well as the Avon igneous intrusions, and Hawn State Park.

Data
The data used in this study include six paper geologic maps (shown in Figure 2 and cited in Table 1), and elevation, hydrography and orthoimagery data. The paper maps were digitized by the Missouri Department of Natural Resources, Geological Survey Program and were obtained through the U.S. Geological Survey National Geologic Map Database Portal's link to the Missouri Geologic Survey Map Index (Soller & Stamm, 2014). Of these six maps, one was published (Weller & St Clair, 1928). The other five maps were either new geologic surveys (Harrison & Schultz, 1994; Schultz & Harrison, 1994) or proposed updates to older surveys (Stewart, Aid, Kidwell, & Robinson, 1951; James, 1951; Satterfield, 1981), none of which had been rendered into a final map product. The metadata available for the evaluation of the maps were extensive for the published map but sparse for the other five maps. Each map, after downloading, had to be georegistered, during which process its Root Mean Squared (RMS) error was recorded; where necessary, the maps were also transformed into Universal Transverse Mercator coordinates based upon the 1927 North American Datum.
Data layers on hydrography, elevation and orthoimagery were downloaded from the U.S. Geological Survey web site that contains The National Map viewer (Dollison, 2010). The area downloaded consisted of sufficient border area (more than four times the RMS error) to preclude any anomalous edge effects.

Geologic Map Uncertainties
In this paper, the term 'cartographic uncertainty' is applied to that uncertainty which is entirely related to map properties that can be completely defined at the point of use. These properties include georeferencing, line resolution, and pixel size. The cartographic uncertainty is independent of the uncertainties inherent in the field data, such as the geographic uncertainty, associated with ascertaining the location of a feature on the surface of the earth, or the geologic uncertainties, associated with the correct identification, measurement, and extension of the feature. These latter two uncertainties can only be recorded during the generation of the map data. In a perfect world, these uncertainties would be reflected in the metadata of the digital map, but typically for an archived map, they are not. The cartographic uncertainties described in this study can be tabulated by the user of the maps without relying on how the generating map data were obtained. As the cartographic uncertainties represent only a small part of the total uncertainty, they do not truly reflect the uncertainty of the geologic map, but they do provide a useful tool for the conflation of several geologic maps and their integration with different data sets. The cartographic uncertainty used here is a combination, in quadrature, of the georeferencing uncertainty and the feature location uncertainty.

Georeferencing
Many archived geologic maps have been digitized, usually without any metadata available on the process. Some of these maps are stored in databases linked to the internet (for example, Soller & Stamm, 2014), while others remain in books, theses, dissertations, or personal files that are far less easy to access. In any case, it will be necessary to geo-register these files. For maps based on USGS topographic quadrangles, the geo-registration can be based upon at least the 16 standard fiducial points. Other maps may have fewer, but in either case, the RMS errors of the fit to these fiducial points during registration are saved. Unless there is reason to suspect non-randomness of the RMS errors, they may be considered as coming from a Gaussian distribution and, therefore, confidence limits may be determined from these data (Table 2). The part of the cartographic uncertainty associated with the registration of any two maps can then be obtained by adding the RMS errors from each map in quadrature (as the errors associated with each map are assumed to be independent of the others). The correct relation between the two maps would then have a 95% probability of occurring within this space (Table 2).
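The registration bookkeeping described here can be sketched in a few lines of Python. This is a minimal illustration rather than the software used in the study: the function names and the fiducial coordinates are hypothetical, and the 1.96 multiplier is the usual Gaussian 95% factor that the procedure applies to the RMS misfit.

```python
import math

def rms_error(registered, reference):
    """RMS distance between registered fiducial points and their reference
    positions (result is in the same units as the coordinates)."""
    if len(registered) != len(reference) or not registered:
        raise ValueError("need two non-empty point lists of equal length")
    squared = [(xr - xt) ** 2 + (yr - yt) ** 2
               for (xr, yr), (xt, yt) in zip(registered, reference)]
    return math.sqrt(sum(squared) / len(squared))

def confidence_limit_95(rms):
    """95% limit for a Gaussian misfit with the given RMS (assumed random)."""
    return 1.96 * rms

def combined_registration_rms(rms_a, rms_b):
    """Registration misfit of two independent maps, added in quadrature."""
    return math.hypot(rms_a, rms_b)

# Hypothetical fiducial points (meters); each residual is 5 m long.
reference = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
registered = [(3.0, 4.0), (97.0, 4.0), (104.0, 97.0), (-4.0, 103.0)]
rms = rms_error(registered, reference)
limit = confidence_limit_95(rms)
```

Keeping the residuals, rather than forcing them to zero, is what preserves the information needed for the confidence limits.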

Linear features and Scale
Geologic contacts and faults as projected onto the surface of the Earth are essentially lines. As such, these one-dimensional features are unaffected by map scale. The depiction of these lines by pen-mark, however, does have width, and this width, in meters, is scale dependent. For this study we consider the width of the contact depiction on each map as an essential cartographic uncertainty (measuring only the ambiguity in the contact positioning on the map, and saying nothing about the uncertainty of the actual location of the contact in the field). This uncertainty is combined with the geo-registration uncertainty to give a radius of cartographic uncertainty which also incorporates scale changes between maps.
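As a hedged illustration of this scale dependence, consider a hypothetical pen line 0.5 mm wide (an assumed value, not one measured from the maps in this study) drawn at the standard USGS scales of 1:24,000 for 7.5-minute quadrangles and 1:62,500 for 15-minute quadrangles. With $w_{\text{map}}$ the line width on paper and $S$ the scale denominator, the width on the ground is

$$w_{\text{ground}} = w_{\text{map}} \times S$$

$$w_{\text{ground}}(1{:}24{,}000) = 0.5\ \text{mm} \times 24{,}000 = 12{,}000\ \text{mm} = 12\ \text{m}$$

$$w_{\text{ground}}(1{:}62{,}500) = 0.5\ \text{mm} \times 62{,}500 = 31{,}250\ \text{mm} \approx 31\ \text{m}$$

Under these assumptions the same pen stroke is ambiguous over about 12 m on the larger-scale map and about 31 m on the smaller-scale map.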

Cartographic Uncertainty
The total cartographic uncertainty then will be the summation (again, in quadrature, as the two uncertainty measures are linearly independent of each other) of the geo-registration uncertainty with the scale-dependent feature depiction uncertainty. It is this total cartographic uncertainty that will be used to constrain the models of integration and conflation of the various geologic maps and digital databases (Table 2).
Once these uncertainty regions have been mapped out, a target map can be translated throughout the uncertainty region to gain the best fit to the base map based on any number of misfit criteria. For the conflation of the different geologic maps, this simply constitutes an edge or body matching algorithm.
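The quadrature sum can be written out as a short Python sketch. The function names and the input values below are hypothetical and chosen only for illustration; the 1.96 factor is the 95% multiplier that the procedure applies to the RMS registration misfit.

```python
import math

Z_95 = 1.96  # Gaussian 95% multiplier applied to the RMS registration misfit

def cartographic_uncertainty(rms_misfit_m, contact_line_width_m):
    """Per-map cartographic uncertainty (m): the 95% registration limit and
    the contact line width combined in quadrature."""
    return math.hypot(Z_95 * rms_misfit_m, contact_line_width_m)

def total_cartographic_uncertainty(*map_uncertainties_m):
    """Quadrature sum of the cartographic uncertainties of the maps being
    conflated (assumes the per-map uncertainties are independent)."""
    return math.sqrt(sum(u ** 2 for u in map_uncertainties_m))

# Hypothetical values (meters), not taken from Table 2.
base_map = cartographic_uncertainty(rms_misfit_m=5.0, contact_line_width_m=12.0)
target_map = cartographic_uncertainty(rms_misfit_m=8.0, contact_line_width_m=31.25)
radius_m = total_cartographic_uncertainty(base_map, target_map)
```

Because the terms add in quadrature, the larger of the two contributions dominates the result, so a coarse-scale map with wide contact lines largely sets the radius of the uncertainty region.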

Conflation and Integration Procedure
For the integration of the geologic maps with a database (for example, one of the layers of the U.S. Geological Survey's The National Map), the maps are translated throughout the uncertainty space and misfit is measured against a given misfit criterion. These criteria might include lithological mapping of limestone boundaries to cliff breaks in the elevation layer, stream lengths in the hydrologic layer contained within Quaternary alluvium markers, glacial ridges and cirques with topographic layers, and so forth. For this region, which exhibits low rolling hill topography (total relief = 360.7 m) and a dendritic drainage pattern, the hydrographic layer from The National Map was chosen as the integration interface with the Quaternary alluvium units from the geologic maps.
The step-by-step procedure that was used for the conflation and integration is given below.
1) All maps are geo-registered based upon at least 16 points.
Note: Registration is based upon linear interpolation. Other methods, such as rubber sheeting, are usually continued until the fit of registration points to control points is perfect (Rosen & Saalfeld, 1985) and, as a result, inference on registration uncertainty is lost.
2) RMS misfit of the map registration is recorded (in meters).
3) All maps are projected into the same coordinate system (if necessary).
4) Width of contact lines is measured (in meters).
Note: This will be scale dependent, and makes no statement on the accuracy or uncertainty in the placement of the line, only on the uncertainty represented in the actual line itself.
5) For each map a cartographic uncertainty is calculated by combining contact line width with the 95% registration confidence limit (assuming the misfit is random) in quadrature.
$$\text{Cartographic Uncertainty (CU)} = \left[ (1.96 \times \text{RMS misfit})^2 + (\text{contact line width})^2 \right]^{1/2}$$
6) Total cartographic uncertainty is the combination in quadrature of the uncertainties of all of the maps being conflated (in this example maps were conflated in sequence starting with the most recent map; as a result, only two cartographic uncertainties were combined in any given conflation).
$$\text{Total Cartographic Uncertainty} = \left[ (\text{CU}_{\text{Map1}})^2 + (\text{CU}_{\text{Map2}})^2 + (\text{CU}_{\text{Map3}})^2 + \dots \right]^{1/2}$$
7) Total cartographic uncertainty is then used as the radius of the uncertainty space (a gridded circle about which map translation is allowed in order to minimize contact misfit for conflation and hydrography misfit for integration).
8) One map is chosen as the target to be translated to each of the uncertainty grid points. The sum of the misfit of each edge point (for edge fitting) or body point (for map overlays) is calculated (Figure 3).
Note: Fitting (edge or body) requires 1:1 mapping of geologic units; therefore, any name changes or stratigraphic code changes as a function of time must be rectified before this process can proceed.
Figure 3. Example of the conflation of two geologic maps. In this case an edge fitting algorithm is used to find the minimum misfit across geologic contacts for these two maps. A similar process is used within the map for overlays.
9) Simultaneously, in this study, the length of stream lines from the National Hydrography Dataset (NHD) contained within the Quaternary alluvium on the most reliable map (taken as the more recent map unless other evidence presents itself) is calculated at each uncertainty grid point (Figure 4). This is done by translating the origin of the alluvium to each point on the uncertainty grid (each point dislocation being one lag, which in this case was one meter) and determining misfit.
Note: As suggested above, in other regions, different geographic markers might be better as a tie-in (e.g., mesas, buttes, canyons, hogbacks and possibly even cuestas in arid environments; horns, drumlins, eskers and moraines in glacial environments; sink holes and disappearing streams in karst; highly erosion-resistant beds contacting weakly erosion-resistant beds; or any truly distinctive geologic marker).
11) To prevent either the conflation or the integration misfit from dominating the results, both misfits are normalized by the maximum misfit for each (e.g., fractional conflation misfit at grid point 1 = conflation misfit at grid point 1, in meters, divided by maximum conflation misfit over all grid points, in meters).
12) The misfits are summed in quadrature to generate the combined misfit statistic, and the minimum denotes the translation that best matches the maps and The National Map data.
13) Once the maps have been translated to the optimal position, remaining misfit is minimized by linear interpolation within the uncertainty region of the conflated map if possible. If the misfit is too large for reasonable accommodation within the uncertainty region, the misfit is left unmapped and noted in the metadata.
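The translation search at the core of the steps above (building the gridded uncertainty circle, normalizing each misfit by its maximum, and combining the two in quadrature) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the two misfit measures are passed in as placeholder callables rather than computed from real contacts and stream lines.

```python
import math

def uncertainty_grid(radius_m, lag_m=1.0):
    """Offsets (dx, dy) on a square lattice of spacing lag_m that fall
    inside a circle of the given radius centered on the origin."""
    n = int(math.floor(radius_m / lag_m))
    return [(i * lag_m, j * lag_m)
            for i in range(-n, n + 1)
            for j in range(-n, n + 1)
            if math.hypot(i * lag_m, j * lag_m) <= radius_m]

def best_translation(radius_m, conflation_misfit, integration_misfit, lag_m=1.0):
    """Return the offset minimizing the combined misfit, and that misfit.

    Each misfit is divided by its maximum over the grid so that neither
    dominates, then the two fractions are combined in quadrature."""
    grid = uncertainty_grid(radius_m, lag_m)
    conf = [conflation_misfit(dx, dy) for dx, dy in grid]
    integ = [integration_misfit(dx, dy) for dx, dy in grid]
    conf_max = max(conf) or 1.0    # guard against an all-zero misfit surface
    integ_max = max(integ) or 1.0
    combined = [math.hypot(c / conf_max, g / integ_max)
                for c, g in zip(conf, integ)]
    best = min(range(len(grid)), key=combined.__getitem__)
    return grid[best], combined[best]

# Toy misfit surfaces (hypothetical), both smallest at 2 m east, 1 m north.
offset, misfit = best_translation(
    radius_m=5.0,
    conflation_misfit=lambda dx, dy: abs(dx - 2.0) + abs(dy - 1.0),
    integration_misfit=lambda dx, dy: math.hypot(dx - 2.0, dy - 1.0),
)
```

An exhaustive search is affordable here because the uncertainty circle holds only a few thousand grid points at one-meter spacing for radii of tens of meters.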

Results
Misfit for the conflation of the geologic maps is measured by the distance between aligning points (the spatial distance between these two points, when minimized, is equivalent to the L1 Norm, which in statistics is the absolute value of the difference between points). For edge-mapping conflation, these points are typically the edge-terminus of geologic formations or fault lines. For body-matching conflation, misfit is measured by the misalignment distance of selected identical points along matching formation boundaries or fault lines. The misfits are summed for each position in the geographical uncertainty space (one-meter spacing) and the minimum misfit location is found (Figure 5). The distance and direction from the origin (representing no movement of the conflating map features, shown by a red triangle in Figure 5) to the minimum misfit position (shown by the black cross in Figure 5) is the amount of translation that all the features in the conflating map undergo. These vectors are shown for each map conflation in Table 3. Figure 5 shows a marked asymmetry to the misfit. Such an asymmetry should be expected, in that Figures 5A, B, C and D are edge-joined and Figure 5E is body-joined. If the conflated map is joined along an eastern or western boundary of the origin map, then the least misfit axis should be aligned in a direction with a significant east-west component (see Figures 5A and 5C), since features aligned in north-south strike would not cross such a boundary. Maps conflated along a northern or southern boundary, by contrast, should show a distinct north-south alignment (Figure 5B). Where maps are conjoined along both a north or south boundary and an east or west boundary (Figure 5D), or are body-conflated (Figure 5E), a more symmetrical pattern should be apparent.
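The point-pair misfit and the resulting translation vector can be illustrated with a short Python sketch. The function names and coordinates are hypothetical. It assumes that matching points on the two maps have already been paired, and it reports direction as a compass bearing, which may differ from the convention used in Table 3.

```python
import math

def conflation_misfit(target_pts, base_pts, dx, dy):
    """Sum of distances (m) between paired contact points after the target
    map is shifted dx meters east and dy meters north."""
    return sum(math.hypot(xt + dx - xb, yt + dy - yb)
               for (xt, yt), (xb, yb) in zip(target_pts, base_pts))

def translation_vector(dx, dy):
    """Length (m) and compass bearing (degrees clockwise from north) of the
    shift from the origin to the minimum-misfit grid point."""
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return distance, bearing

# Hypothetical contact terminations along a shared map edge (meters).
target = [(0.0, 0.0), (10.0, 0.0)]
base = [(2.0, 1.0), (12.0, 1.0)]
unshifted = conflation_misfit(target, base, 0.0, 0.0)
shifted = conflation_misfit(target, base, 2.0, 1.0)
distance, bearing = translation_vector(2.0, 1.0)
```

In this toy case a shift of 2 m east and 1 m north removes the misfit entirely; with real maps the minimum is generally non-zero and the residual is handled as described in step 13.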
Integration misfit is parameterized by how much of the total stream length does not lie upon Quaternary alluvium. While not all streams produced Quaternary alluvium, and some streams may have meandered outside the alluvium mapped in the past, it is assumed that a best fit model is the one that minimizes the length of the stream lines that lie outside this stratum over a large area. The National Map layers are considered to be the data upon which the conflated geologic map features will be integrated. Figure 6 shows the sum of the misfits for a set of lags of the geologic map around the origin. The drainage pattern is dendritic for this region and, as a result, a more random misfit pattern is seen, with very little variation shown within 10 meters of the origin. The best fit location is denoted by the large black cross. This is 6 m east-southeast of the origin (Table 4). Translating the conflated geologic map through the vector described in Table 4 provides the best fit to the stream lines of the National Hydrography Dataset (NHD). These lines have been extensively integrated with the other data layers of The National Map (Usery, Finn, & Starbuck, 2009; Sugarbaker & Carswell, 2011) and, as such, the conflated geologic map is integrated with all data layers for The National Map (Figure 7). The NHD accuracy is in compliance with the National Map Accuracy Standards of 90% of all features within 12.2 m (40 ft) (Simley & Carswell, 2009). Further, since both conflation misfits and integration misfits were normalized by the maximum misfit of each before summation, no one statistic is seen to dominate. The best fit is seen to involve little or no movement of the maps, implying that the original digitization of the maps was done fairly consistently.
A standard chi-squared (χ²) analysis yielded the 95% confidence limit shown in Figure 8. This confidence limit is roughly elliptical in shape, with its major axis trending WNW to ESE (Table 5), a semi-major axis of 10.6 m, and a semi-minor axis of 6.8 m, which yields an area of about 226 m². This uncertainty ellipse is the minimum combined uncertainty (as it is associated with the anchor map in the conflation process) and therefore gives the tightest constraint for the integration of additional datasets. The conflated geologic map can now be digitized and the resulting digital dataset overlain on The National Map layers as a fully integrated dataset (Figure 9). The NHD is also integrated with the NED data (shown in Figure 7), as well as layers on orthoimagery, land use, transportation, structures, and geologic names. Associatively, the digital geographic map is also integrated with these layers.
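The integration misfit described at the start of this section (the stream length lying outside mapped Quaternary alluvium) can be approximated with plain Python. This is a sketch under stated assumptions, not the GIS workflow used in the study: streams are taken as simple polylines, alluvium units as simple polygons without holes, the function names are hypothetical, and the length is estimated by sampling each stream segment at roughly one-meter pieces.

```python
import math

def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices (unclosed)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def length_outside(stream, alluvium_polygons, dx=0.0, dy=0.0, step_m=1.0):
    """Approximate stream length (m) outside every alluvium polygon after
    the polygons are shifted dx meters east and dy meters north."""
    outside = 0.0
    for (xa, ya), (xb, yb) in zip(stream, stream[1:]):
        seg_len = math.hypot(xb - xa, yb - ya)
        pieces = max(1, math.ceil(seg_len / step_m))
        for k in range(pieces):
            t = (k + 0.5) / pieces  # midpoint of this piece
            # Shifting the polygons by (dx, dy) is equivalent to shifting
            # the sample point by (-dx, -dy).
            px = xa + t * (xb - xa) - dx
            py = ya + t * (yb - ya) - dy
            if not any(point_in_polygon(px, py, poly)
                       for poly in alluvium_polygons):
                outside += seg_len / pieces
    return outside

# Hypothetical 20 m stream reach and a 10 m x 10 m alluvium polygon.
stream = [(0.0, 5.0), (20.0, 5.0)]
alluvium = [[(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]]
misfit_at_origin = length_outside(stream, alluvium)
misfit_shifted_north = length_outside(stream, alluvium, dy=20.0)
```

Evaluating `length_outside` at each point of the uncertainty grid gives a misfit surface of the kind shown in Figure 6; a production workflow would more likely use a GIS overlay operation than point sampling.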

Discussion
The division of the uncertainties associated with geologic map usage into three categories (cartographic, geographic and geological) allows for the distinction between uncertainties associated with plotting and storage (cartographic), those associated with location in the field (geographic), and those associated with measurements and interpretation (geological). The cartographic uncertainty, while useful for conflation and integration purposes, will typically be minor compared with non-GPS based geographic uncertainties (Clegg et al., 2006; Whitmeyer, Nicoletti, & De Paor, 2010), particularly when field notes are not associated with the metadata, and the measuring and interpretive geological uncertainties.
The latter of these (the geological uncertainties) are likely to be the largest, particularly when subsurface models are codified. Trying to constrain these uncertainties is a topic of active research with the most prominent methods involving:
• Perturbation methods where the uncertainty is derived from end-members of multiple possible model simulations (e.g., Lindsay, Aillère, Jessell, de Kemp & Betts, 2012; Lindsay, Aillère, Jessell, de Kemp & Betts, 2013)
• Withholding a subset of geologic data from the model construction which is used afterwards to quantify model accuracy (e.g., Sturkell, Hokobsson, & Gyllencreutz, 2008)
• Building stochastic models and other statistical approaches (e.g., López-Vázquez, 2012; Wellman, Horowitz, Schill & Regenauer-Lieb, 2010; Wellman & Regenauer-Lieb, 2012; Leung, Goodchild & Lin, 1993)
• Using expert analyses to estimate model uncertainty (e.g., Lark et al., 2013; Lark, Mathers, Marchant, & Hulbert, 2014)
• Using geophysical data to constrain geologic model uncertainty (e.g., Jessell, Aillères, & de Kemp, 2012; Joly, Martelet, Chen, & Faure, 2008)
The common thread through these methods is the importance of understanding the uncertainties inherent in the use of geologic maps and models. This becomes paramount in the digital age, as any user has the ability to call up a map or model designation down to the meter level on any of a variety of electronic devices. While geologists have long known the interpretative and scale-dependent nature of geologic maps, this knowledge may be lacking among other users such as developers, planners, or home owners.
The uncertainty associated with any feature on a geologic map must then be a function of the cartographic, geographic and geologic uncertainties.In general, these are from independent sources of error and, as such, can be combined in quadrature.
$$\text{Combined uncertainty} = \left[ (\text{cartographic uncertainty})^2 + (\text{geographic uncertainty})^2 + (\text{geologic uncertainty})^2 \right]^{1/2}$$
Once uncertainties have been established, there remains the question of how best to represent these in a digital database. Again, several methods have been suggested, including the inclusion of statistical parameters in the data (Kennelly, 2002), presenting pixel size (or, in the case of a three-dimensional model, the volume element ('voxel', Lindsay et al., 2013) size), or simply not presenting an image when the image representation drops below the uncertainty. In any case, it is important to relate the uncertainty in the model to the potential user.
The 95% confidence limit for the combined conflation and integration process (as given in Table 5) used here has the added value of providing the basis for further data integration associated with this particular map. This confidence ellipse represents the uncertainty space through which additional data sets can be translated in order to achieve a best fit with the host data. The utility of integrating survey data to a base such as The National Map layers has been previously demonstrated in general by Shoberg, Stoddard, & Finn (2013), for gravity surveys in particular by Shoberg & Stoddard (2014), and for disaster restoration modeling by Ramachandran, Long, Shoberg, Corns, & Carlo (2015; 2016).

Conclusions
Using cartographic uncertainties derived from six digitized archive geologic maps as constraints, these maps were conflated with each other and integrated with The National Map data sets for parts of Ste. Genevieve, St. Francois, and Perry counties in southeastern Missouri. Analysis of the conflation and integration process yields a 95% confidence interval that is useful for further data integration efforts.
This confidence interval, based upon cartographic uncertainties alone, suggests the position of any particular geologic feature on the combined map can be located with 95% confidence within an area of 226 m². This uncertainty does not incorporate positional uncertainties due to the field work or geologic measurements or interpretations. Nor does it assume that field errors are greater or less than cartographic errors, merely that it is difficult for a third party to quantify these errors, even if the literature describing outcrop locations exists. However, there is considerable work in progress to quantify these other sources of uncertainty, the representation of which will become increasingly important as more and more digital geologic maps and models become available to all users (Goodchild, Lin, & Leung, 1994).
Given the parameters by which the cartographic uncertainties were constructed, it is safe to say that the conflation and integration process described here does not improve the accuracy of any geologic interpretation from these maps. What it does provide, however, is a self-consistent method that limits additional degradation of this accuracy while at the same time yielding the valuable addition of both wider geologic map coverage and other diverse data sets.

Figure 1. Location map of Missouri with inset box showing the example area.

Figure 2. Maps used in the study and their relative locations with respect to the Ste. Genevieve County geologic map. Inset maps include: red box, a portion of the Farmington 15' Quadrangle; pink box, Coffman 7 ½' Quadrangle; green box, Minnith 7 ½' Quadrangle; blue box, Higdon 15' Quadrangle; and orange box, Fredericktown 15' Quadrangle.

Figure 4. Integration of the NHD stream line data with the geologic map. The length of stream lines that do not fall within the Qal areas is minimized.

Figure 5. L1-norm misfit space for each stage of the conflation process. X-marks show 1-meter spacing, red triangles show the origin, and black crosses show least misfit locations. Hot colors (red, orange, yellow) show the greater misfit; cold colors (white, magenta, blue) show the lesser misfit.

Figure 6. L1 norm for misfit through uncertainty space of the geologic map integration with the hydrography dataset (NHD). X-marks denote 1-meter spacing. Again, hot colors (red, orange, yellow) show the greater misfit; cold colors (white, magenta, blue) show the lesser misfit.

Figure 7. Integration of the conflated geologic map features with the data layers from The National Map. Blue lines show NHD flowlines; the shaded region is a hillshade version of the Digital Elevation Model (DEM).

Figure 8. L1 norm for misfit through uncertainty space of the combined conflation and integration statistic. Circle = origin, red triangle = best fit, solid black line = chi-squared 95-percent confidence limit. Crosses show 2-meter spacing.

Figure 9. Conflated and integrated digital geologic surface bedrock map for areas of Ste. Genevieve, St. Francois and Perry counties in SE Missouri. The geologic maps have been conflated to give a larger areal content than each of the individual maps and have been integrated with the NHD of The National Map of the U.S. Geological Survey. The NHD is also integrated with the NED data (shown in Figure 7), as well as layers on orthoimagery, land use, transportation, structures, and geologic names. Associatively, the digital geographic map is also integrated with these layers.

Table 2. Cartographic uncertainty parameters for each map.

Table 3. Conflation parameters that minimize misfit for each map addition.