Posts Tagged ‘attribute table’
Problem: You need a better way to summarize the contents of a dataset without wasting time typing a long list.
Back in March, I posted about creating a list of keywords from the attribute table. I knew that:
The solution I have come up with is not perfect. I keep thinking there must be a better way.
I had explained how to make a list of the administrative area types in the Natural Earth Admin 1 shapefile, using ArcMap’s “Summarize” tool. The end result was a text file that had extra quotation marks, commas, and other characters to remove.
There is a better way — with Python! I now have the power to structure the text file the way I want it from the start. Here’s the entire code:
gp = arcgisscripting.create(9.3)
report = file(r"C:\gis\AdminTypes.txt", "w")
shape = r"C:\gis\10m-admin-1-states-provinces-shp.shp"
cur = gp.SearchCursor(shape)
row = cur.Next()
adminList = 
while row <> None:
row = cur.Next()
adminList = list(set(adminList))
for admin in adminList:
report.write(admin + "\n")
Change only the portions marked in red to make this work with any data layer. They represent, in order from top to bottom:
- The output text file
- The input data layer
- The name of the attribute table field to summarize
The text file generated by this code is a list of each unique entry in the ENGTYPE_1 field (all duplicates removed) — one per line, in alphabetical order. It’s the exact same thing you would see in the table of contents if you symbolized by category on that field. Only now it’s in a form you can copy and paste into other places, like a metadata record.
Python makes everything better.
- Problem #1: You are merging layers that have different attribute table structures and you don’t want to lose any important information.
- Problem #2: You want to use Python to return a list of shapefiles that match a search phrase.
- Problem #3: You want to use Python to compare two lists and return the differences.
The following will solve all three!
I learned a lot from taking Introduction to Geoprocessing Scripts Using Python. When I returned to work, I got the grand idea that I was ready to write a script that would make one looming task more manageable.
As is always the case, the solution seemed easier in my head than it turned out to be when I sat down to write it. The ESRI course had given me just enough information to be dangerous. I had most of the pieces I needed to put together this puzzle, but not all. Those missing pieces are what I would like to share with you today.
I had created a geodatabase feature class and I needed to append and bunch of shapefiles to it. All of these shapefiles had attribute tables that were close to each other, but not exactly alike. Some had extra fields. I needed to make a list of all of those extra fields so that I could add them to the feature class before I started loading shapefiles.
The Python arcgisscripting module has a ListFields() function that forms the basis of my solution. But before I could get to that point, I needed to make a list of the shapefiles whose fields I wanted made into a list.
Depending on your work flow, that might be all the shapefiles in a particular directory. For me, it was all the shapefiles in a particualr directory that matched a file name search phrase. One example: I wanted to merge together all the shapefiles that had “bridge” in the name. My teacher introduced us to the os.walk() function, but I needed to modify it slightly to do exactly what I wanted. Here’s how:
# import modules and create the geoprocessor object
# (some of these you won't use until later)
import arcgisscripting, os, fnmatch, sys
gp = arcgisscripting.create(9.3)
# returns a list of shapefiles in the specified folder
# that match the specified search pattern
folder = r"F:\StagingArea\shapefiles"
pattern = sys.argv
shpList = 
for path, dirs, files in os.walk(folder): for filename in fnmatch.filter(files, pattern): shpList.append(os.path.join(path, filename))
In my setup, the search folder is hard-coded and the search pattern is a variable that the user inputs. This worked for my purposes because I was searching through the same folder on a lot of different file names. Modify according to your purposes. Note: The search pattern needs to be in the form *bridge*.shp. You can use * wildcards and you need to put the .shp on the end.
The code so far has only created the list. To make it visible, use this:
gp.AddMessage("Shapefiles returned:") for shp in shpList: gp.AddMessage(shp)
Both segments of code assume you will be running the script from within ArcToolbox. Use a print statement if you will be running it from within IDLE or PythonWin. Note that this list returns not just the shapefile names, but their full paths. This makes the list usable as input into the next function.
The code to create a list of attribute field names is pretty simple. Here’s how it looks for my single target feature class (the one I’m going to be appending everything to):
targetLayer = sys.argv targetList = gp.ListFields(targetLayer) targetNames =  for field in targetList: targetNames.append(field.Name)
To create a list of attribute field names for a list of shapefiles, you add another loop. Here’s how it looks using the shpList I created earlier.
x = 0 fieldNames =  for shps in shpList: fieldList = gp.ListFields(shpList[x]) for field in fieldList: fieldNames.append(field.Name) x = x + 1
Now we have two lists. The first list is all the field names in the target layer and the second list is all the field names in every layer we want to add on to the target. What we really need is the difference between these two lists. In other words: we need to know which fields appear in the second list and not in the first list.
That sounds pretty complicated, but guess what, you can do it with two lines of code:
diffList = [fname for fname in fieldNames if not fname in targetNames]
diffList = list(set(diffList))
The first line compares the two lists. The second line’s list(set()) function removes any duplicates and alphabetizes the final list of differences.
Finally, to display your results:
gp.AddMessage("Missing fields:") for field in diffList: gp.AddMessage(field)
My finished code does a few extra things, like:
- eliminating unmatching geometry types from the shapefile list
- eliminating empty fields from the field name list
- listing the field length and type in the final output
Perhaps I will talk about those in a later entry. What I’ve shown here are the parts that were the most difficult to figure out.
Problem: You need to summarize the contents of a dataset without wasting time typing a long list.
Natural Earth is a crisp basemap dataset available for the world at 1:10m, 1:50m, and 1:110m scales. I downloaded one of their most detailed layers, the 1:10m Admin 1 – States, Provinces shapefile for use as my example.
If you were asked to write a metadata record or some other type of report about this dataset, you would want your description to be as complete as possible. You might find it necessary to generate a long list of place or topic keywords. Often, the attribute table already includes this information. You just need a good way to extract it.
For instance, if I symbolize by category on the ENGTYPE_1 field of my natural earth layer, I get a long list of administrative area types. There’s way more than just states and provinces here! If only I could select this list and paste it somewhere else to have in a usable format. Unfortunately, it’s not that easy.
The solution I have come up with is not perfect. I keep thinking there must be a better way. If any of my readers know of one, I would love the feedback. For now, this method certainly beats having to type the entire list:
First, open attribute table and right click on the field you want to turn in to a keyword list. Then, select Summarize…
Chose one summary statistic field. It doesn’t matter which, but works a little better if you chose something that’s always going to be the same. I chose “CheckMe” because that’s basically a flag (yes or no) field. It doesn’t matter which summary statistic (minimum, maximum, first, last etc.) you pick either. You won’t be using any of this information–it’s just a way to get ArcMap to make a list for you that includes each category one time.
Hit the browse button in the Specify output table section. Then, change Save as type to Text File.
Open up your text file in Notepad and remove the first line (the field names). Then, do Edit — > Replace and replace all commas, quotation marks, numerals 1-9 and whatever text phrase was in your summary statistic field with nothing.
When you are done, you end up with this. If you want to add commas and move these all into one row, it’s pretty easy from here.
This is a continuation of my last post about different ways to access flood zone data for the non-ArcGIS Desktop user.
Method 4: DFIRM Shapefiles
Digital Flood Rate Insurance Maps are available to download from FEMA for $10. They’ve offered a few free samples and Fairfax City happens to be one of them.
The data comes in several formats including shapefile. ArcGIS Explorer can read shapefiles. However, it will not let you add them to your map unless they have a defined projection.
The shapefiles in the Fairfax City DFIRM that I downloaded didn’t have their projections defined. I would assume this is the case with all of them. Luckily, they tell you the projection in the metadata. And luckily, projections can be defined with a file you can create using any text editor.
To find the projection, open the _metadata file in the Document folder. If you scroll down about 2/3 of the way you’ll find the Spatial_Reference_Information section. The most important parts are the Grid_Coordinate_System_Name, UTM_Zone_Number, and Horizontal_Datum_Name. The Fairfax City DFIRM is in Universal Transverse Mercator (UTM) Zone 18, NAD 1983 datum.
I used that information to have ArcGIS desktop create a projection definition file in the format used by all ESRI GIS software (including ArcGIS Explorer). It looks like this:
All you need to do is copy and paste that text into a text editor, remove any spaces, then save it as a .prj file. The name before the file extension should match the shapefile you are trying to use. The main DFIRM shapefile is S_Fld_Haz_Ar.shp, the flood hazard zone areas. So the projection definition file should be called S_Fld_Haz_Ar.prj
If you don’t want to copy and paste you can download it and put it the same folder with the shapefile.
FEMA uses UTM for all of its DFIRMs, but they do not always use the same datum. If you download a different one from Fairfax City, you will need to check the metadata for the UTM Zone Number and whether the datum is NAD 1983 or NAD 1927. If it’s NAD 1983, you can use the same text from above as a template to create your .prj file. Just change the two red areas to match what the metadata says:
If it’s NAD 1927, use this template and change the red areas
(Remove any line breaks that I’ve entered for readability).
Now, Go to the Add Content button and select Shapefiles… Then browse to the ArcShapes folder and add S_Fld_Haz_Ar.shp. The shapefile starts out looking like this…
…which isn’t very helpful, but just wait. If you right click on the layer in the Contents window, you can change the symbol to something with edges. Now you will be able to see the flood zone borders.
And, if you right click on the layer again and this time bring up the Properties window, you will be able to select certain attributes to show as Popup Content.
When you open this dialog box, a list of all the available attributes will come up. You will be able to select which ones you want to appear in a little pop-up window whenever you click on a feature. I picked all of them. Then, in the bottom half of the box you can select one attribute that will appear whenever you hover over a feature with your mouse. I picked FLD_ZONE because this is the most important piece of information. Now, if I type the address of City Hall into the Find box and press enter, I have everything I wanted at the beginning.
There’s a “you are here” symbol, and if I mouse-over I see it’s located in Zone X (not Flood Zone). I can mouse over other areas to see where the nearest 0.2 pct annual chance flood hazard zone is. And if I click in the zone, I can get any more information that exists about it.
Knowing how to use shapefiles in ArcGIS Explorer opens up a world of information. You can watch a free basic overview of the software at http://blogs.esri.com/Support/blogs/esritrainingmatters/archive/2009/12/03/explore-arcgis-explorer-in-a-free-training-seminar.aspx
This week I am writing about yet another plugin that has proven quite useful to me. I have only scratched the surface of its capabilities, but the few features I did try out, I found reasons to use over and over. The plugin is ET GeoWizards, developed by Ianko Tchoukanski, and available at http://www.ian-ko.com/
This tool duplicates some of the capabilities already present in ArcGIS, however it makes them all available at the ArcView license level. This can be a real boon to people who don’t have ArcEditor or ArcInfo licenses. In addition, you might just prefer the way it handles things better.
For example, my favorite function is the most basic: “Create New Shapefile.” I love it because it lets me create new shapefiles right in ArcMap! No need to disrupt my workflow to start up ArcCatalog and then drag the file into ArcMap so I can start editing it. When you initiate shapefile creation, it lets you chose a spatial reference based on your current map or other layers.
Then, it lets you add attribute fields! This too would be a separate step doing it the ArcCatalog way.
Then, it dumps your new shapefile right into the Table of Contents, ready for you to start using it. This is so much more convenient that you will never want to go back to the old way. It alone is reason to get the plugin, but there is more.
My second favorite function is “Redefine Fields,” which lets you change the length of string fields, or the precision of number fields, in your attribute table. I have run into many cases where I needed my text fields to be longer, and this is the easy answer. The only way to increase field length otherwise is to delete the field and re-add it with different definitions. This is problematic if the field is already populated with data. You end up having to create “holder cells” and migrate the data back in.
This tool removes the need for all those intermediate steps. However, it does save the results into a new shapefile, instead of updating your existing shapefile. That isn’t ideal, but I think there’s no way around it. It is still a much simpler solution overall.
My third favorite function is “Order Fields” which changes the order of the fields in your attribute table. There is no other way to do this. Sometimes, you want the most important information to be in front, especially if you have lots of fields to sort through. With ArcMap, you can drag fields to reorder them, but they snap back to their original position once you close the attribute table. This tool changes their order for good. Also, if you want to remove any fields while you’re at it, you can do so by leaving them over on the left side. Like “Redefine Fields”, it saves the results into a new shapefile.
A couple other functions that I haven’t tried, but which look really good:
- “Generalize,” which reduces the number of vertices used to represent a polyline or polygon. Sometimes you will end up with a feature you need to change the shape of that has vertices packed so tightly you are going to be there all day dragging things. Problem solved!
- “Shape to ShapeZ” conversion, which will add the Z dimension to a shapefile. Z values allow for the storage of elevation data. You can’t load Z-enabled data into non Z-enabled feature classes. You have to drill down into the Environment Settings in order to enable Z values. It’s tricky enough that I will probably blog about it at some point. This looks to be an easier way.
As I said above, I have only begun to explore the free ET GeoWizards functions. The registered version has even more capabilities. And, there are ET GeoTools for inline editing and ET Surface tools for working with raster elevation data. Plenty of reasons to see what’s at http://www.ian-ko.com
Problem: You need to consolidate datasets based on theme, region, or some other criteria
This is a comment data management task. You may receive layers that are unnecessarily split into a lot of small files. For instance, I often see roads divided into separate layers for each road type. Highway, major, minor, trail, etc. — all in their own shapefile. It is useful to be able to distinguish between road types on a map, but it is better done through the use of symbols or definition queries than by having an individual layer for each.
By the same token, themes are often split up by region. You might have all the road types together, but in a separate shapefile for each county, state or country. You might rather have a single worldwide or countrywide coverage. But as with the road types, it is a good idea to maintain the ability to distinguish between region, in case the need should arise.
The way to do this is to add a new attribute field, populate it with the characteristic you are trying to preserve, and then merge. In other words, you’re moving the distinguishing characteristic from the file name or file folder to the attribute table, so it is not lost, prior to merging.
For example, I have a list of basemap layers for the Oceanic countries of Australia, Paupa New Guinea, New Zealand and Samoa. I want to merge all of the like layers into one for the entire region, so I have an Oceanic road layer, an Oceanic river layer, etc. But before I do that, I want to add a country field and put the country name in there. This way if I need to select out just the Australian roads later, I can.
It’s pretty tedious to open each attribute table, add a field, and then run the field calculator to stick the country name in there. This task can be sped up considerably through the use of model builder. Model builder allows you to batch process ArcToolbox commands, and to chain commands so that the output of one tool into feeds into the input of the next tool. We will be able to add new “Country” fields to all our Australia layers at once, and immediately insert the value “Australia” into those fields, with a single click of mouse.
Let’s get started. Open up ArcCatalog and activate ArcToolbox. If you have never used model builder before, you will need to create a new toolbox to hold your model. Then, create a new model within that toolbox.
Now, drag the tools that you want to involve into the model screen. In this case, that’s “Add Field” and “Calculate Field.”
We’ll be starting with Add Field. It’s time to choose your input layer. By default, the tools dragged into model builder accept a single data layer as input, but you can change them to accept a list of layers (batch mode). To do this, right click on the Add Field tool to make the Input Table a variable. Setting something as a variable “exposes” it, which means it makes it editable independent of the rest of the settings that you will configure later. Model builder indicates a variable has been exposed by giving it its own symbol (an oval) in the model flowchart. The other settings remain hidden inside the tool symbol (rectangle).
Now, right click on the Input Table variable to bring up its Properties. Then, change it to a list of values.
The Input Table symbol will now change to a stack of ovals. If you double click on the stack, you will be able to cue up as many inputs as you like. The fastest way to generate a list of inputs is to drag them from windows explorer. Note: You can’t drag them from ArcCatalog because the input box maintains focus until you dismiss it.
You can navigate to the folder, and manually select all the .shp files. Or, even faster, you can do a file search for them. This gets any subfolders too! I’ve searched within my Australia folder.
If it works right, your stack of ovals will turn blue. Now it is time to set the rest of the Add Field settings. You can access them by double clicking on the Add Field rectangle. Here, you will see your list of settings. The Input Table column is filled in but the rest of the settings (field name, field type, etc) are grayed out, except the first row. Double click on that first row to edit the other settings. Here, input your field name (“Country”), field type (“TEXT”) and field length (“50″) just like you would when activating the tool from ArcToolbox. Note: If you leave the text field length blank, it defaults to 50.
Because you don’t have any of these parameters set as variables, the settings you enter here will propagate down through the entire list of inputs. When you finish, Add Field should be yellow and the output oval stack should be green. This indicates all of the settings are valid, and they are ready to run.
Next, configure the settings for the Calculate Field tool. You’ll want to set the input of this tool as the output of the Add Field tool. Just look at the name on the oval extending outwards from Add Field. (The name is generated from the first layer in your input list). In my case, the output of Add Field is called “admin_area (3)”. I’ve set that as the Input Table for Calculate Field. Then, use the dropdown list to select the Field Name you want to populate. Your new field, “Country”, will show up in the list of options because model builder runs the commands in order and knows you’re planning to add that field in the previous step. Finally, type the string you would like to use to populate the field in the Expression box, with quotes around it. In this case, “Australia.”
Now, you can run your entire model from the model menu, or one tool at a time by right clicking it. I like to do one tool at a time at first so I can be confident each step is working correctly. The model parameters will turn red while running, and will form drop shadows once they have finished.
Now, you can rerun the same model for the other 3 countries. All you need to change is the initial (blue) input list and the Expression string within Calculate Field. If you want to do even less babysitting, you can make the Expression a variable and do all four countries at once. And of course, if you want to add and calculate more fields, you’d just string them along down the chain.
Remember any ArcToolbox tool can be dragged onto Model Builder, so the potential uses are limited only by your imagination. For more information, see (both PFDs):
- Geoprocessing in ArcGIS (ESRI Tutorial)
- Model Builder: An Introduction (ESRI Conference Presentation)
The Problem: You have a geodatabase with a lot of empty feature classes
This often happens if the geodatabase has been created from a template of standard themes, but this particular version doesn’t contain information to stick into all the available slots. It can be difficult to get a handle on what’s actually there with all the empty feature classes clogging up space. It can take up extra processing time or cause errors depending on what you’re doing with the data later. And, it’s just bad organization. The best practice is to get rid of them. You can do so by moving down the feature class list in ArcCatalog, checking for empties and deleting them one by one. But with large databases (the kind that need trimming down the most) this can take a lot of time.
The slow manual method:
The fast automated method: use this python code that an associate shared with me!
Copy and paste the following into any text editor and save it as a python .py file.
############################################################## ## Delete Empty Feature Classes from Multiple Geodatabases ## ## This script loops through all geodatabases within a directory ## and removes empty featureclasses from each geodatabase. ## ## User must hardcode directory. Run this script in IDLE. ## ## Python Version 2.4 ############################################################## # Create the Geoprocessor object import arcgisscripting, os, sys, string gp = arcgisscripting.create() # Load required toolboxes... gp.AddToolbox("C:\\Program Files\\ArcGIS\\ArcToolbox\\Toolboxes\\Conversion Tools.tbx") # Set workspace gp.workspace = "C:\\Documents and Settings\\user\\Desktop\\gis\\" # list personal geodatabases in the current workspace listmdb = gp.listworkspaces("*", "access") mdb = listmdb.next() # loop through the personal geodatabases while mdb: print mdb gp.workspace = mdb # list feature datasets in personal geodatabase listfdset=gp.listdatasets() fdset=listfdset.next() if not fdset == None: gp.workspace = mdb + "\\" + fdset # list feature classes in the personal geodatabase fcs = gp.listfeatureclasses() fcs.reset() fc = fcs.next() # loop through the featureclasses while fc: # print fc # if the feature class table has no records, delete the featureclass if gp.GetCount_management(fc) == 0: gp.Delete_management(fc) print fc + "deleted from" + mdb fc = fcs.next() mdb = listmdb.next()
The only thing you will need to change is the directory under # Set workspace. Put the path to your geodatabase(s) in between the gp.workspace = “” quotes, remembering to use two slashes \\ in between folder names.
This works with personal geodatabases. If you are using file geodatabases, go to the next line under # list personal geodatabases… and replace “access” with “FileGBD” between the gp.listworkspaces(“*”, “”) quotes.
Now, start Python IDLE, which ships with the ArcGIS suite. Open up your .py file. Then, select Run –> Run Module (or hit F5). This is what you will see:
It may take some time to run, but that is time you can spend doing other things. When it’s done, those empty feature classes are history.
You can edit, save and rerun the .py file as many times as you need if your geodatabases are spread between different directories. (Or, for even more automation, copy them all into the same directory before you start, and let the script loop through them).
Python is an extremely handy tool. If you are interested in learning more about it, see the resources below.
- HowTo: Get started writing Python geoprocessing scripts in ArcGIS 9x (ESRI Support)
- A Guide to the Python Universe for ESRI Users (PDF File)