MassGIS Data — Building Structures (2-D, from 2011-2013 Ortho Imagery)



This dataset consists of 2-dimensional roof outlines ("roofprints") for all buildings larger than 150 square feet, as interpreted by a contractor (Rolta) for the whole area of the Commonwealth using 30-cm color DigitalGlobe ortho images obtained in 2011 and 2012, supplemented with LiDAR (Light Detection And Ranging) data collected from 2002 to 2011 for the eastern half of the state.

The roofprints as delivered were enhanced by MassGIS using Normalized Digital Surface Models (NDSMs) derived from the same LiDAR data. Other layers, including the Level 3 Parcels, were used to aid in review, especially where LiDAR data were not available. See the section "Roofprint Shifting" below for details on MassGIS’ work to edit the roofprints. For information on other updates, see the "Maintenance" section.


Criteria Used for Creating a Roofprint

The following is a summary of the guidelines used in creating roofprints (as described in the Request for Response for this project):

A roofprint is a map polygon, with real world coordinates, representing the perimeter outline as it appears in aerial imagery of every structure or portion of a structure which has a roof. Roofprints shall be mapped for all structures equal to or larger than 150 square feet including the following:

  • Residential, commercial, and industrial structures (including roof over porches and decks)
  • Trailer homes and offices
  • Mobile homes
  • Garages, sheds, and other isolated structures
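The 150-square-foot minimum can be checked against a polygon's area. The following is a minimal sketch, assuming vertex coordinates in meters (as in a projected, meter-based coordinate system) and using the shoelace formula. The function names are hypothetical and are not part of the dataset or of any MassGIS or Rolta tooling.

```python
# Hypothetical sketch: check a roofprint against the 150 sq ft minimum size.
# Assumes vertices are (x, y) pairs in meters, listed in order around the ring.

SQ_M_PER_SQ_FT = 0.3048 ** 2          # 1 ft = 0.3048 m exactly
MIN_AREA_SQ_M = 150 * SQ_M_PER_SQ_FT  # about 13.94 square meters


def polygon_area(vertices):
    """Shoelace formula; returns the absolute area of a simple polygon."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def meets_size_threshold(vertices):
    """True if the polygon covers at least 150 square feet."""
    return polygon_area(vertices) >= MIN_AREA_SQ_M


# A 3 m x 4 m shed (12 m^2, about 129 sq ft) falls below the threshold;
# a 4 m x 4 m shed (16 m^2, about 172 sq ft) meets it.
small_shed = [(0, 0), (3, 0), (3, 4), (0, 4)]
large_shed = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(meets_size_threshold(small_shed), meets_size_threshold(large_shed))  # False True
```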


Features that do not have a roof covering usable areas, such as an open deck, the top surface of an electrical transmission or cell tower base, platforms for utility equipment, or other structures which do not have a usable “interior” or covered volume, shall not be interpreted for mapping. Also, vehicles, including truck trailers that are parked with or without a tractor attached, boats, airplanes, etc. should not be mapped. However, trailers with any kind of residential or business use such as temporary classrooms, construction site field offices and the like must be captured.

Greenhouses were generally not considered structures unless attached to a roofed structure. Roofed dugouts of sufficient size were also included as structures. Tanks, covered reservoirs, and pools with temporary covers were not considered structures.

Polygon creation had the following guidelines:

  • Outlines will usually be made up of orthogonal segments (all segments parallel or at right angles) unless the building is octagonal, round, triangular, etc.
  • Outlines are to be traced at the elevation of the eaves or lowest part of the roof adjacent to the exterior vertical walls. If there are multiple roof levels for a single structure then internal boundaries created by joining the separate roofprints must be dissolved. Any part of the structure which is covered by a roof is included, so two buildings connected by a covered walkway are to be represented as one polygon.
  • Any roof offset, jog or projection for which all sides are more than 3 feet in length should be captured.
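The orthogonality and 3-foot guidelines lend themselves to simple geometric checks. This is a hypothetical QA sketch, not a description of Rolta's or MassGIS' actual procedure. It assumes coordinates in meters, measures each edge's direction relative to the first edge (so rotated buildings still pass), and uses an arbitrary 2-degree tolerance. Edges shorter than 3 feet are only flagged for review, since the guideline requires capturing jogs whose sides all exceed 3 feet.

```python
import math

FT_TO_M = 0.3048
MIN_JOG_M = 3 * FT_TO_M  # 3 feet = 0.9144 m


def edges(vertices):
    """Consecutive vertex pairs around a closed ring (last vertex joins the first)."""
    n = len(vertices)
    return [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]


def is_orthogonal(vertices, tol_deg=2.0):
    """True if every edge is parallel or perpendicular to the first edge, within tol_deg."""
    (x0, y0), (x1, y1) = edges(vertices)[0]
    base = math.degrees(math.atan2(y1 - y0, x1 - x0))
    for (ax, ay), (bx, by) in edges(vertices):
        angle = math.degrees(math.atan2(by - ay, bx - ax)) - base
        # Deviation from the nearest multiple of 90 degrees, in [0, 45].
        deviation = abs((angle + 45) % 90 - 45)
        if deviation > tol_deg:
            return False
    return True


def short_edges(vertices):
    """Lengths of edges shorter than 3 feet, which may merit a second look."""
    return [math.dist(a, b) for a, b in edges(vertices) if math.dist(a, b) < MIN_JOG_M]


# Roof outline (meters) with a 0.5 m offset on the east side.
notched = [(0, 0), (10, 0), (10, 6), (9.5, 6), (9.5, 8), (0, 8)]
print(is_orthogonal(notched), short_edges(notched))  # True [0.5]
```

A triangular building would fail `is_orthogonal`, which is expected: the guideline explicitly exempts octagonal, round, and triangular buildings, so such a check would only flag candidates for review.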

In the creation of the outlines, building “lean” was not compensated for. (MassGIS addressed this issue for part of the state. See the section "Roofprint Shifting" below.) No attributes were included in the creation of the polygons.

Criteria for Acceptance

The acceptance standard was an interpretation error rate of less than 0.5%. Conformance to this standard was determined as follows:

For each of the six delivery areas, MassGIS selected tiles randomly (using a ‘randomizing’ spreadsheet created within MassGIS) from the 2008/2009 ortho imagery. Tiles were selected until the total number of structures in the selected area exceeded 15,000. The roof outlines in the selected tiles were then reviewed against the DigitalGlobe imagery. Additional layers were used to supplement the review, including the LiDAR datasets and the Level 3 Parcel dataset, especially where LiDAR data were not available.
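The sampling rule (draw random tiles until the selected tiles hold more than 15,000 structures) can be sketched in Python as a stand-in for the randomizing spreadsheet. The tile IDs and per-tile structure counts below are hypothetical; the source does not describe how the spreadsheet ordered or drew tiles.

```python
import random


def sample_tiles(structure_counts, threshold=15_000, seed=None):
    """Draw tiles at random, without replacement, until the running total of
    structures in the selected tiles exceeds `threshold`.

    structure_counts: dict mapping a tile ID to the number of structures in it.
    Returns (selected tile IDs, total structures in them).
    """
    rng = random.Random(seed)
    tile_ids = list(structure_counts)
    rng.shuffle(tile_ids)  # random order stands in for the randomizing spreadsheet
    selected, total = [], 0
    for tile_id in tile_ids:
        selected.append(tile_id)
        total += structure_counts[tile_id]
        if total > threshold:
            break
    return selected, total


# Hypothetical delivery area: 40 tiles with 1,000 structures each.
counts = {f"tile_{i:03d}": 1_000 for i in range(40)}
tiles, total = sample_tiles(counts, seed=42)
print(len(tiles), total)  # 16 16000
```

If an area has too few structures to pass the threshold, the sketch simply returns every tile.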

The error rate was defined using two statistics from the review of the sample tiles for each delivery:

Eo = The number of errors of omission – structures that were missed

Ec = The number of errors of commission – features mapped as structures that are not in fact structures (as defined above).

The combined error rate for interpretation was calculated to be Eo + Ec.
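The text gives the combined figure as Eo + Ec. To compare it with the 0.5% threshold it presumably has to be expressed relative to the size of the sample, so the sketch below assumes the denominator is the number of structures reviewed in the sample tiles. That normalization is an assumption, not something the source states, and the error counts in the example are made up.

```python
def combined_error_rate(omissions, commissions, structures_reviewed):
    """(Eo + Ec) as a fraction of structures reviewed (the denominator is an assumption)."""
    if structures_reviewed <= 0:
        raise ValueError("structures_reviewed must be positive")
    return (omissions + commissions) / structures_reviewed


def passes_acceptance(omissions, commissions, structures_reviewed, max_rate=0.005):
    """True if the combined rate is below the 0.5% acceptance threshold."""
    return combined_error_rate(omissions, commissions, structures_reviewed) < max_rate


# Hypothetical delivery: 40 omissions and 25 commissions among 15,200 reviewed structures.
rate = combined_error_rate(40, 25, 15_200)
print(f"{rate:.3%}", passes_acceptance(40, 25, 15_200))  # 0.428% True
```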

Roofprint Shifting

Elevated objects such as roof outlines in aerial imagery may appear displaced with respect to the base of the structure. In order to minimize or eliminate the effects of such displacement (often referred to as "building lean"), MassGIS undertook several automated processing steps to shift roofprint polygons as delivered by Rolta. Building lean effect may cause some buildings to cross over into adjacent parcels or overlap other features such as streets and water bodies. The shifting process was performed only in areas where MassGIS’ LiDAR Terrain Data were available (Eastern Mass. inside of I-495; for details see the section "Adjustment Method" below). As a result, many of the shifted polygons better approximate building footprints.

Background on Building Lean

Ortho image data layers are really mosaics made up of portions of many overlapping aerial photo frames. The yellow lines represent the seam lines of these photos:

The principal point of an aerial photo is the intersection of the optical axis of the camera lens and the photo image. The nadir is the point directly beneath the camera at the time of exposure. On a vertical aerial photograph (one taken looking straight down), the nadir and the principal point are at the same location.

If a building is close to the principal point, the roof and base will appear to coincide (the base and sides of the building will not be visible; note the 26-story JFK Federal Building in Boston, at left in the image below):

If a building is far from the principal point, toward the edge of the photo, the top of the building will appear to be farther away from the principal point than the bottom of the building. The building will appear to "lean" away from the principal point:

The red line in the above image is the "seam" between two different photos. The buildings on either side of this line are from different photos, so the buildings seem to lean away from their respective principal points.

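The magnitude of the lean, d, can be determined from similar triangles:

$$\frac{H}{D + d} = \frac{h}{d}$$

where H is the camera height, h is the building height, D is the distance of the building from the principal point, and d is the displacement (lean) of the roof relative to the base of the building.

Or, since d is usually much smaller than D, D + d ≈ D, so

$$d \approx \frac{D}{H} \times h$$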
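Both forms of the lean relation can be evaluated directly. Solving H / (D + d) = h / d exactly gives d = hD / (H - h); the approximation drops d from the denominator. This is a minimal sketch, assuming all quantities are ground distances in meters and defaulting H to the 5,000-meter altitude MassGIS assumed; the function names are hypothetical.

```python
CAMERA_HEIGHT_M = 5_000.0  # the aircraft altitude MassGIS assumed


def lean_exact(D, h, H=CAMERA_HEIGHT_M):
    """Solve H / (D + d) = h / d for d, which gives d = h * D / (H - h)."""
    if not 0 <= h < H:
        raise ValueError("building height must be non-negative and below the camera")
    return h * D / (H - h)


def lean_approx(D, h, H=CAMERA_HEIGHT_M):
    """Approximation d ≈ (D / H) * h, valid when d is much smaller than D."""
    return D / H * h


# A 100 m tall building 2,000 m from the principal point:
print(round(lean_approx(2_000, 100), 2), round(lean_exact(2_000, 100), 2))  # 40.0 40.82
```

For ordinary building heights (a few meters to tens of meters) the two results differ by well under one percent, which is why the approximation is adequate here.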

The average height of each building has been obtained from a LiDAR Normalized Difference Surface Model (NDSM). This raster is the difference between the LiDAR last-return elevations and the LiDAR model of the ground.
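The NDSM is a cell-by-cell subtraction of two co-registered rasters. The sketch below uses small nested lists as stand-ins for the 1.0-meter rasters and assumes both have already been interpolated onto the same grid; MassGIS did this step with ArcGIS Terrains and rasters, not with this code. Treating `None` as NoData is also an assumption of the sketch.

```python
def ndsm(last_return, bare_earth):
    """Cell-by-cell Last Return minus Bare-Earth for two aligned rasters.

    Rasters are lists of rows of elevations (meters); None marks NoData,
    and a NoData cell in either input stays NoData in the output.
    """
    if len(last_return) != len(bare_earth) or any(
        len(a) != len(b) for a, b in zip(last_return, bare_earth)
    ):
        raise ValueError("rasters must share the same grid")
    return [
        [None if lr is None or be is None else lr - be for lr, be in zip(row_lr, row_be)]
        for row_lr, row_be in zip(last_return, bare_earth)
    ]


# Two rows of three 1 m cells; the 7 m values are a rooftop above the ground.
last = [[52.0, 58.5, 51.0], [50.5, 58.0, None]]
bare = [[51.0, 51.5, 51.0], [50.5, 51.0, 50.0]]
print(ndsm(last, bare))  # [[1.0, 7.0, 0.0], [0.0, 7.0, None]]
```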

It was assumed that the location of each principal point is at the mean center of each seam polygon, and that the aircraft altitude is 5,000 meters.
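The source does not say exactly how the "mean center" of a seam polygon was computed. One reasonable reading is the area-weighted centroid, sketched below with the standard shoelace-based centroid formula. The function name and the example polygon are hypothetical, and coordinates are assumed to be in meters.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as ordered (x, y) vertices."""
    signed_area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        signed_area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if signed_area2 == 0:
        raise ValueError("degenerate polygon")
    # Centroid = sum / (6 * A), where A = signed_area2 / 2.
    return cx / (3 * signed_area2), cy / (3 * signed_area2)


# An L-shaped seam polygon covering 3 square kilometers.
seam = [(0, 0), (2000, 0), (2000, 1000), (1000, 1000), (1000, 2000), (0, 2000)]
print(tuple(round(c, 1) for c in polygon_centroid(seam)))  # (833.3, 833.3)
```

Averaging the six vertices instead would give (1000, 1000), so for irregular seam polygons the choice of "mean center" definition shifts the assumed principal point.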

The lean also has a direction, so polygons representing "roofprints" have been moved a distance d in a direction opposite to the apparent displacement in the photo.

In the image above, the red polygon is the original roofprint; the green polygon is the rectified (shifted) roofprint.
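Moving a roofprint "opposite to the apparent displacement" means translating it toward the principal point, since lean points radially away from it. The following is a minimal sketch of that translation, assuming the polygon is shifted rigidly (no rotation or reshaping) by the lean distance d. The function is hypothetical; the source only says an ArcGIS Python script produced the shifted polygons.

```python
import math


def shift_toward_principal_point(polygon, centroid, principal_point, d):
    """Translate every vertex of `polygon` by distance d along the line from
    `centroid` toward `principal_point`, i.e. opposite to the radial lean."""
    cx, cy = centroid
    px, py = principal_point
    dx, dy = px - cx, py - cy
    dist = math.hypot(dx, dy)
    if dist == 0:
        return list(polygon)  # a building at the principal point shows no lean
    ux, uy = dx / dist, dy / dist  # unit vector toward the principal point
    return [(x + d * ux, y + d * uy) for x, y in polygon]


roof = [(999, -1), (1001, -1), (1001, 1), (999, 1)]
print(shift_toward_principal_point(roof, (1000, 0), (0, 0), 2.0))
# [(997.0, -1.0), (999.0, -1.0), (999.0, 1.0), (997.0, 1.0)]
```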

Adjustment Method

MassGIS used five input datasets:

1. DigitalGlobe 2011-2012 Orthoimagery (six blocks)

2. LiDAR data from the following project areas:

  • Worcester High Value Area
  • Charles River
  • LiDAR for the Northeast
  • 2004 SE Massachusetts Pilot
  • Buzzards Bay (parts of Bristol and Plymouth Counties)

Manageable processing areas were determined based on the intersections of these regions.

For each LiDAR project area, all LiDAR returns were filtered to create two ArcGIS Terrains:

  • Any return classified as "Ground" was used in a "Bare-Earth" Terrain.
  • The last returns classified as "Ground" or as "Unclassified" went into a "Last Return" Terrain.

These Terrains were then linearly interpolated to two 1.0-meter rasters. Finally, the Bare-Earth raster was subtracted from the Last Return raster, resulting in a Normalized Difference Surface Model (NDSM).

3. Orthoimage polygon tiles (irregularly shaped "seam polygons") corresponding to each DigitalGlobe area

NDSMs were cut into smaller subimages (tiles) using the seam polygons for the corresponding area.

4. Seam center points were determined for each seam polygon.

5. Unrectified roofprint polygons (unshifted polygons as delivered by Rolta)

A model was developed in Trimble eCognition Developer 8.7.2 and run on eCognition Server that determined the distance and direction from each roofprint centroid to the tile’s seam center, as well as the mean height of the building. Output was a point shapefile.
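The three values the eCognition model produced per roofprint (distance, direction, and mean height) can be illustrated in plain Python. This is a stand-in for illustration, not the eCognition rule set. The azimuth convention (degrees clockwise from grid north), the handling of NoData as `None`, and the function name are all assumptions.

```python
import math
from statistics import fmean


def lean_inputs(roof_centroid, seam_center, ndsm_values):
    """Return (distance, azimuth_deg, mean_height) for one roofprint.

    distance and azimuth run from the roofprint centroid to the seam center;
    azimuth is in degrees clockwise from grid north. ndsm_values are the NDSM
    cells (meters) inside the roofprint, with None marking NoData.
    """
    dx = seam_center[0] - roof_centroid[0]
    dy = seam_center[1] - roof_centroid[1]
    distance = math.hypot(dx, dy)
    azimuth = math.degrees(math.atan2(dx, dy)) % 360  # atan2(east, north) gives a bearing
    heights = [v for v in ndsm_values if v is not None]
    mean_height = fmean(heights) if heights else None
    return distance, azimuth, mean_height


dist, az, height = lean_inputs((300.0, 400.0), (0.0, 0.0), [6.0, 7.0, 8.0, None])
print(dist, round(az, 2), height)  # 500.0 216.87 7.0
```

With these three values, D is the distance, h is the mean height, and the roofprint would be moved toward the seam center along the reported azimuth.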

An ArcGIS Toolbox script that prepared the output points and roofprints for rectification was run, followed by an ArcGIS Python script that created a dataset of shifted roofprint polygons.

The two sets of shifted roofprints in overlapping processing areas were examined, and where there were differences, the roofprints with the more accurate shift were kept.

Roofprints straddling seam lines usually contain two (or more) points with different values for angle and distance. These roofprints were generally not moved, but were coded TOUCH_SEAM = 1 so they could be tracked after processing.
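The seam-straddling rule can be sketched as a grouping step over the model's output points: a roofprint with a single point gets that point's shift, while one with two or more points (which usually disagree on angle and distance) is left in place and coded TOUCH_SEAM = 1. The tuple layout below is hypothetical, and the sketch leaves every straddling roofprint unmoved, whereas the source says such roofprints were only "generally" not moved.

```python
from collections import defaultdict


def plan_shifts(points):
    """Group output points by roofprint and decide which roofprints to shift.

    points: iterable of (roof_id, azimuth_deg, distance_m, mean_height_m) tuples,
    one per seam tile the roofprint falls in (hypothetical layout).
    Returns {roof_id: {"TOUCH_SEAM": 0 or 1, "shift": values or None}}.
    """
    grouped = defaultdict(list)
    for roof_id, *values in points:
        grouped[roof_id].append(tuple(values))
    plan = {}
    for roof_id, rows in grouped.items():
        if len(rows) > 1:
            # Straddles a seam: conflicting lean estimates, so leave unshifted and flag it.
            plan[roof_id] = {"TOUCH_SEAM": 1, "shift": None}
        else:
            plan[roof_id] = {"TOUCH_SEAM": 0, "shift": rows[0]}
    return plan


plan = plan_shifts([
    (1, 216.87, 500.0, 7.0),
    (2, 90.0, 800.0, 5.0),
    (2, 270.0, 650.0, 5.2),
])
print(plan[1]["TOUCH_SEAM"], plan[2]["TOUCH_SEAM"], plan[2]["shift"])  # 0 1 None
```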

Sources of possible error in the shifting process include:

  • The orthoimage used to determine the roofprint
  • The roofprint polygon as drawn
  • The estimated position of the principal point
  • The estimated camera altitude
  • The LiDAR NDSM raster
  • The estimate of the building height derived from the LiDAR NDSM raster

Situations which may cause the roofprint to shift more or less than it should:

  • The building represented by a roofprint was not built at the time of the LiDAR acquisition.
  • Trees may overhang a building, so that the elevation obtained may be higher than the building height.
  • Greenhouses may be represented in the roofprints layer, but not in the LiDAR.
  • A single roofprint representing a complex roof with different elevations may be shifted based on a single elevation value.

In a small number of cases, the shifting process caused some polygons to overlap others. These were found using ArcGIS topology and the polygons were moved manually so that no overlaps were present. Once the shifting process was complete, the shifted polygons replaced those in a copy of the original Rolta deliverable. The version of the Structures dataset distributed by MassGIS, therefore, is a hybrid of as-delivered polygons and those shifted by MassGIS. Finally, MassGIS took the hybrid layer and performed an Identity operation with the Survey-based Communities layer to populate the TOWN_ID fields.


Polygons in this layer contain the following fields, all added by MassGIS:
