Comparing attribute tables with Python
- Problem #1: You are merging layers that have different attribute table structures and you don’t want to lose any important information.
- Problem #2: You want to use Python to return a list of shapefiles that match a search phrase.
- Problem #3: You want to use Python to compare two lists and return the differences.
The following will solve all three!
I learned a lot from taking Introduction to Geoprocessing Scripts Using Python. When I returned to work, I got the grand idea that I was ready to write a script that would make one looming task more manageable.
As is always the case, the solution seemed easier in my head than it turned out to be when I sat down to write it. The ESRI course had given me just enough information to be dangerous. I had most of the pieces I needed to put together this puzzle, but not all. Those missing pieces are what I would like to share with you today.
The Background
I had created a geodatabase feature class and needed to append a bunch of shapefiles to it. All of these shapefiles had similar attribute tables, but they were not exactly alike: some had extra fields. I needed a list of all of those extra fields so that I could add them to the feature class before I started loading shapefiles.
The Solution
The Python arcgisscripting module has a ListFields() function that forms the basis of my solution. But before I could get to that point, I needed to make a list of the shapefiles whose fields I wanted to list.
Depending on your workflow, that might be all the shapefiles in a particular directory. For me, it was all the shapefiles in a particular directory that matched a file name search phrase. One example: I wanted to merge together all the shapefiles that had “bridge” in the name. My teacher introduced us to the os.walk() function, but I needed to adapt it slightly to do exactly what I wanted. Here’s how:
# import modules and create the geoprocessor object
# (some of these you won't use until later)
import arcgisscripting, os, fnmatch, sys
gp = arcgisscripting.create(9.3)
# returns a list of shapefiles in the specified folder
# that match the specified search pattern
folder = r"F:\StagingArea\shapefiles"
pattern = sys.argv[1]
shpList = []
for path, dirs, files in os.walk(folder):
    for filename in fnmatch.filter(files, pattern):
        shpList.append(os.path.join(path, filename))
In my setup, the search folder is hard-coded and the search pattern is a variable that the user inputs. This worked for my purposes because I was searching through the same folder on a lot of different file names. Modify according to your purposes. Note: The search pattern needs to be in the form *bridge*.shp. You can use * wildcards and you need to put the .shp on the end.
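The same search can be wrapped in a reusable function. This is a minimal sketch that uses only the standard library, so it runs without the geoprocessor; the name find_shapefiles is hypothetical, and sorting the results is an addition so the output order doesn't depend on how the folders happen to be traversed.

```python
import fnmatch
import os


def find_shapefiles(folder, pattern):
    """Return full paths of files under `folder` whose names match `pattern`.

    `pattern` uses fnmatch wildcards, e.g. "*bridge*.shp". fnmatch.filter
    applies os.path.normcase, so matching is case-insensitive on Windows
    but case-sensitive on Linux and macOS.
    """
    matches = []
    # os.walk visits `folder` and every subfolder beneath it
    for path, dirs, files in os.walk(folder):
        for filename in fnmatch.filter(files, pattern):
            matches.append(os.path.join(path, filename))
    # Sort so the result does not depend on directory traversal order
    return sorted(matches)
```

Called as find_shapefiles(r"F:\StagingArea\shapefiles", "*bridge*.shp"), it returns the same full paths the loop above collects, so the result can feed straight into ListFields().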
The code so far has only created the list. To make it visible, use this:
gp.AddMessage("Shapefiles returned:")
for shp in shpList:
    gp.AddMessage(shp)
Both segments of code assume you will be running the script from within ArcToolbox. Use a print statement if you will be running it from within IDLE or PythonWin. Note that this list returns not just the shapefile names, but their full paths. This makes the list usable as input into the next function.
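One way to avoid editing the script when you switch between ArcToolbox and IDLE is to pass the output function in as a parameter. This is a small sketch; the helper name show is hypothetical, and it simply defaults to print.

```python
def show(title, items, emit=print):
    """Send a heading followed by one line per item to `emit`.

    `emit` defaults to print for IDLE or PythonWin; inside an ArcToolbox
    script tool you could pass gp.AddMessage instead.
    """
    emit(title)
    for item in items:
        emit(str(item))
```

Inside a script tool the call would be show("Shapefiles returned:", shpList, gp.AddMessage); from IDLE, show("Shapefiles returned:", shpList) prints the same lines.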
The code to create a list of attribute field names is pretty simple. Here’s how it looks for my single target feature class (the one I’m going to be appending everything to):
targetLayer = sys.argv[2]
targetList = gp.ListFields(targetLayer)
targetNames = []
for field in targetList:
    targetNames.append(field.Name)
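The loop above can also be written as a list comprehension. Because arcgisscripting is only available inside ArcGIS, this sketch uses a hypothetical Field namedtuple as a stand-in for the field objects ListFields() returns, modeling only the Name attribute the loop reads; the field names are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for the field objects gp.ListFields() returns;
# only the Name attribute used above is modeled here.
Field = namedtuple("Field", ["Name"])


def field_names(fields):
    """Return the Name of every field object, preserving table order."""
    return [field.Name for field in fields]


# Made-up fields for a target feature class
target_fields = [Field("OBJECTID"), Field("Shape"), Field("BRIDGE_ID"), Field("NAME")]
targetNames = field_names(target_fields)
print(targetNames)  # ['OBJECTID', 'Shape', 'BRIDGE_ID', 'NAME']
```

In ArcGIS the equivalent call would be targetNames = field_names(gp.ListFields(targetLayer)).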
To create a list of attribute field names for a list of shapefiles, you add another loop. Here’s how it looks using the shpList I created earlier.
fieldNames = []
for shp in shpList:
    fieldList = gp.ListFields(shp)
    for field in fieldList:
        fieldNames.append(field.Name)
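The nested loop can be packaged so the field-listing function is passed in, which makes it testable without ArcGIS. This is a sketch under assumptions: collect_field_names and the Field stand-in are hypothetical, and a dictionary of made-up tables plays the role of gp.ListFields. Iterating over the paths directly means no index counter is needed.

```python
from collections import namedtuple

# Hypothetical stand-in for ListFields() results (only Name is modeled)
Field = namedtuple("Field", ["Name"])


def collect_field_names(paths, list_fields):
    """Return every field name from every layer in `paths`, duplicates included.

    `list_fields` is any callable that takes a path and returns field
    objects with a Name attribute; in ArcGIS this would be gp.ListFields.
    """
    names = []
    for path in paths:
        for field in list_fields(path):
            names.append(field.Name)
    return names


# Made-up field lists keyed by made-up shapefile paths
fake_tables = {
    "a_bridge.shp": [Field("Shape"), Field("BRIDGE_ID"), Field("NAME")],
    "b_bridge.shp": [Field("Shape"), Field("BRIDGE_ID"), Field("MATERIAL"), Field("INSPECTED")],
}
fieldNames = collect_field_names(sorted(fake_tables), fake_tables.get)
print(fieldNames)
# ['Shape', 'BRIDGE_ID', 'NAME', 'Shape', 'BRIDGE_ID', 'MATERIAL', 'INSPECTED']
```

Duplicates are kept at this stage on purpose; they are removed when the two lists are compared.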
Now we have two lists. The first list is all the field names in the target layer and the second list is all the field names in every layer we want to add on to the target. What we really need is the difference between these two lists. In other words: we need to know which fields appear in the second list and not in the first list.
That sounds pretty complicated, but guess what: you can do it with two lines of code:
diffList = [fname for fname in fieldNames if fname not in targetNames]
diffList = sorted(set(diffList))
The first line keeps every field name that appears in fieldNames but not in targetNames. In the second line, set() removes any duplicates and sorted() returns the result as an alphabetized list. (A set has no guaranteed order, so the sort has to be done explicitly.)
Finally, to display your results:
gp.AddMessage("Missing fields:")
for field in diffList:
    gp.AddMessage(field)
My finished code does a few extra things, like:
- removing shapefiles with non-matching geometry types from the shapefile list
- eliminating empty fields from the field name list
- listing the field length and type in the final output
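Two of those extras can be sketched in plain Python. This is not the author's finished code, only one possible approach under assumptions: the helper names are hypothetical, the Field namedtuple stands in for field objects with Name, Type and Length properties, and the geometry type is supplied by a callable (inside ArcGIS that might be gp.Describe(p).ShapeType). All data below is made up.

```python
from collections import namedtuple

# Hypothetical stand-in for ListFields() results with Name, Type and Length
Field = namedtuple("Field", ["Name", "Type", "Length"])


def same_geometry(paths, target_type, shape_type_of):
    """Keep only the paths whose geometry type equals `target_type`."""
    return [p for p in paths if shape_type_of(p) == target_type]


def describe_missing(paths, list_fields, target_names):
    """Map each field name absent from the target to its (Type, Length),
    keeping the first definition encountered."""
    missing = {}
    for path in paths:
        for field in list_fields(path):
            if field.Name not in target_names and field.Name not in missing:
                missing[field.Name] = (field.Type, field.Length)
    return missing


# Made-up geometry types and attribute tables
shape_types = {"a_bridge.shp": "Polyline", "b_bridge.shp": "Polyline", "c_bridge.shp": "Point"}
tables = {
    "a_bridge.shp": [Field("BRIDGE_ID", "Integer", 4), Field("NAME", "String", 50)],
    "b_bridge.shp": [Field("BRIDGE_ID", "Integer", 4), Field("MATERIAL", "String", 20)],
    "c_bridge.shp": [Field("SIGN_CODE", "String", 10)],
}
lines = same_geometry(sorted(shape_types), "Polyline", shape_types.get)
missing = describe_missing(lines, tables.get, ["OBJECTID", "Shape", "BRIDGE_ID", "NAME"])
for name in sorted(missing):
    ftype, length = missing[name]
    print("%s (%s, %d)" % (name, ftype, length))
# MATERIAL (String, 20)
```

Keeping the first definition is a simplification; if two shapefiles define the same field with different lengths, you would probably want the longest one before adding it to the feature class.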
Perhaps I will talk about those in a later entry. What I’ve shown here are the parts that were the most difficult to figure out.
Tags: attribute table, geodatabase, list, python, search, shapefile