Posts Tagged ‘python’

Creating a list of keywords from the attribute table: New & Improved Version

Problem: You need a better way to summarize the contents of a dataset without wasting time typing a long list.

Back in March, I posted about creating a list of keywords from the attribute table. I knew that:

The solution I have come up with is not perfect. I keep thinking there must be a better way.

I had explained how to make a list of the administrative area types in the Natural Earth Admin 1 shapefile, using ArcMap’s “Summarize” tool. The end result was a text file that had extra quotation marks, commas, and other characters to remove.

There is a better way — with Python! I now have the power to structure the text file the way I want it from the start. Here’s the entire code:

import arcgisscripting
gp = arcgisscripting.create(9.3)

report = file(r"C:\gis\AdminTypes.txt", "w")
shape = r"C:\gis\10m-admin-1-states-provinces-shp.shp"



cur = gp.SearchCursor(shape)
row = cur.Next()
adminList = []

while row <> None:
     adminList.append(row.ENGTYPE_1)
     row = cur.Next()

adminList = list(set(adminList))
adminList.sort()

for admin in adminList:
     report.write(admin + "\n")
report.close()

print "Finished"

Change only the portions marked in red to make this work with any data layer. They represent, in order from top to bottom:

  1. The output text file
  2. The input data layer
  3. The name of the attribute table field to summarize

The text file generated by this code is a list of each unique entry in the ENGTYPE_1 field (all duplicates removed) — one per line, in alphabetical order. It’s the exact same thing you would see in the table of contents if you symbolized by category on that field. Only now it’s in a form you can copy and paste into other places, like a metadata record.

Python makes everything better.

Tags: , , , , ,

Comparing attribute tables with Python

  • Problem #1: You are merging layers that have different attribute table structures and you don’t want to lose any important information.
  • Problem #2: You want to use Python to return a list of shapefiles that match a search phrase.
  • Problem #3: You want to use Python to compare two lists and return the differences.

The following will solve all three!

I learned a lot from taking Introduction to Geoprocessing Scripts Using Python. When I returned to work, I got the grand idea that I was ready to write a script that would make one looming task more manageable.

As is always the case, the solution seemed easier in my head than it turned out to be when I sat down to write it. The ESRI course had given me just enough information to be dangerous. I had most of the pieces I needed to put together this puzzle, but not all. Those missing pieces are what I would like to share with you today.

The Background
I had created a geodatabase feature class and I needed to append and bunch of shapefiles to it. All of these shapefiles had attribute tables that were close to each other, but not exactly alike. Some had extra fields. I needed to make a list of all of those extra fields so that I could add them to the feature class before I started loading shapefiles.

The Solution
The Python arcgisscripting module has a ListFields() function that forms the basis of my solution. But before I could get to that point, I needed to make a list of the shapefiles whose fields I wanted made into a list.

Depending on your work flow, that might be all the shapefiles in a particular directory. For me, it was all the shapefiles in a particualr directory that matched a file name search phrase. One example: I wanted to merge together all the shapefiles that had “bridge” in the name. My teacher introduced us to the os.walk() function, but I needed to modify it slightly to do exactly what I wanted. Here’s how:

# import modules and create the geoprocessor object
# (some of these you won't use until later)

import arcgisscripting, os, fnmatch, sys
gp = arcgisscripting.create(9.3)

# returns a list of shapefiles in the specified folder
# that match the specified search pattern

folder = r"F:\StagingArea\shapefiles"
pattern = sys.argv[1]
shpList = []

for path, dirs, files in os.walk(folder):
     for filename in fnmatch.filter(files, pattern):
         shpList.append(os.path.join(path, filename))

In my setup, the search folder is hard-coded and the search pattern is a variable that the user inputs. This worked for my purposes because I was searching through the same folder on a lot of different file names. Modify according to your purposes. Note: The search pattern needs to be in the form *bridge*.shp. You can use * wildcards and you need to put the .shp on the end.

The code so far has only created the list. To make it visible, use this:

gp.AddMessage("Shapefiles returned:")
for shp in shpList:
     gp.AddMessage(shp)

Both segments of code assume you will be running the script from within ArcToolbox. Use a print statement if you will be running it from within IDLE or PythonWin. Note that this list returns not just the shapefile names, but their full paths. This makes the list usable as input into the next function.

The code to create a list of attribute field names is pretty simple. Here’s how it looks for my single target feature class (the one I’m going to be appending everything to):

targetLayer = sys.argv[2]
targetList = gp.ListFields(targetLayer)
targetNames = []
for field in targetList:
     targetNames.append(field.Name)

To create a list of attribute field names for a list of shapefiles, you add another loop. Here’s how it looks using the shpList I created earlier.

x = 0
fieldNames = []
for shps in shpList:
     fieldList = gp.ListFields(shpList[x])
     for field in fieldList:
          fieldNames.append(field.Name)
     x = x + 1

Now we have two lists. The first list is all the field names in the target layer and the second list is all the field names in every layer we want to add on to the target. What we really need is the difference between these two lists. In other words: we need to know which fields appear in the second list and not in the first list.

That sounds pretty complicated, but guess what, you can do it with two lines of code:

diffList = [fname for fname in fieldNames if not fname in targetNames]
diffList = list(set(diffList))

The first line compares the two lists. The second line’s list(set()) function removes any duplicates and alphabetizes the final list of differences.

Finally, to display your results:

gp.AddMessage("Missing fields:")
for field in diffList:
     gp.AddMessage(field)

My finished code does a few extra things, like:

  • eliminating unmatching geometry types from the shapefile list
  • eliminating empty fields from the field name list
  • listing the field length and type in the final output

Perhaps I will talk about those in a later entry. What I’ve shown here are the parts that were the most difficult to figure out.

Tags: , , , , ,

Removing Empty Feature Classes with Python

The Problem: You have a geodatabase with a lot of empty feature classes

This often happens if the geodatabase has been created from a template of standard themes, but this particular version doesn’t contain information to stick into all the available slots. It can be difficult to get a handle on what’s actually there with all the empty feature classes clogging up space. It can take up extra processing time or cause errors depending on what you’re doing with the data later. And, it’s just bad organization. The best practice is to get rid of them. You can do so by moving down the feature class list in ArcCatalog, checking for empties and deleting them one by one. But with large databases (the kind that need trimming down the most) this can take a lot of time.

The slow manual method:
Right click to delete feature class

The fast automated method: use this python code that an associate shared with me!

Copy and paste the following into any text editor and save it as a python .py file.

##############################################################
##  Delete Empty Feature Classes from Multiple Geodatabases
##
##  This script loops through all geodatabases within a directory
##  and removes empty featureclasses from each geodatabase.
##
##  User must hardcode directory.  Run this script in IDLE.
##
##  Python Version 2.4
##############################################################

# Create the Geoprocessor object
import arcgisscripting, os, sys, string
gp = arcgisscripting.create()

# Load required toolboxes...
gp.AddToolbox("C:\\Program Files\\ArcGIS\\ArcToolbox\\Toolboxes\\Conversion Tools.tbx")

# Set workspace
gp.workspace = "C:\\Documents and Settings\\user\\Desktop\\gis\\"

# list personal geodatabases in the current workspace
listmdb = gp.listworkspaces("*", "access")
mdb = listmdb.next()

# loop through the personal geodatabases
while mdb:
    print mdb
    gp.workspace = mdb
    # list feature datasets in personal geodatabase
    listfdset=gp.listdatasets()
    fdset=listfdset.next()
    if not fdset == None:
        gp.workspace = mdb + "\\" + fdset
    # list feature classes in the personal geodatabase
    fcs = gp.listfeatureclasses()
    fcs.reset()
    fc = fcs.next()

    # loop through the featureclasses
    while fc:
       # print fc

        # if the feature class table has no records, delete the featureclass
        if gp.GetCount_management(fc) == 0:
            gp.Delete_management(fc)
            print fc + "deleted from" + mdb
        fc = fcs.next()

    mdb = listmdb.next()


The only thing you will need to change is the directory under # Set workspace. Put the path to your geodatabase(s) in between the gp.workspace = “” quotes, remembering to use two slashes \\ in between folder names.

This works with personal geodatabases. If you are using file geodatabases, go to the next line under # list personal geodatabases… and replace “access” with “FileGBD” between the gp.listworkspaces(“*”, “”) quotes.

Now, start Python IDLE, which ships with the ArcGIS suite. Open up your .py file. Then, select Run –> Run Module (or hit F5). This is what you will see:
Python module in IDLE
Python results in IDLE
It may take some time to run, but that is time you can spend doing other things. When it’s done, those empty feature classes are history.

You can edit, save and rerun the .py file as many times as you need if your geodatabases are spread between different directories. (Or, for even more automation, copy them all into the same directory before you start, and let the script loop through them).

Python is an extremely handy tool. If you are interested in learning more about it, see the resources below.

Tags: , , ,