
Outcome


Project Motivations

Data Category Chosen : Real Estate

Having spent most of my time here at CMU working on real estate projects, this seemed like an obvious choice, and it also aligns with my career interests.

Data Set Chosen : Market Value Assessment data 2016 for Pittsburgh

This data, available on WPRDC, is very valuable because it gives us insight into the causes and effects of property assessment values across the city. It could also be used to decide which neighborhood to locate a project in and to estimate the expected future value of that property. (https://data.wprdc.org/dataset/market-value-analysis-urban-redevelopment-authority)

Area Chosen : Hill District

The Hill District is a collection of five neighborhoods and presents a unique opportunity to compare neighborhoods with widely varying income levels and housing standards.

Visualization Type Used : Data Portraits - Author Lines (https://www.connectedaction.net/tag/author-lines/)

This type of data seemed better suited to Data Portraits, as three different dimensions could be easily represented and understood. This form can also portray characteristics beyond the two-dimensional comparisons on the map. The visualization seemed interesting and, at the same time, easy to understand.

Below are three images which show the raw data-set, data dictionary and simplified data respectively:

Data set.png.thumb
Data dictionary.png.thumb
Data set cut.png.thumb

The Story that ‘Raw’ Data Tells Us about the Hill District!

The United States of America has roughly 132 million housing units on a total land area of about 3.5 million square miles. This gives a housing density of 37.3 houses per square mile of land area.

We compare this data with the Hill District in Pittsburgh, which is taken to comprise five neighborhoods: Polish Hill, Upper Hill, Middle Hill, Bedford Dwellings and Crawford Roberts. The Hill District has a vacancy rate of 6% to 25%, compared to the city rate of 7.7%. So even with an above-median number of houses per unit area, vacancy is high in much of the district. This suggests shortcomings in the area's housing sector.

Pennsylvania has a housing density of 124 houses per square mile and Pittsburgh has a housing density of 5,521 houses per square mile. The Hill District has a housing density of 7,250 to 19,320 houses per square mile, making it a very dense residential area, at least 35% above the city median. This gives us a brief idea of how important housing data is for this part of the city and how big an impact a successful housing analysis could have.
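As a quick check on the comparison above, the densities quoted in the text can be set against each other directly. This is only a sketch: it measures the Hill District against the 5,521 citywide density, while the 35% in the text is measured against a city median that is not quoted here, so the percentages are not expected to match. The function names are my own.

```python
# Housing densities quoted in the text, in houses per square mile.
PENNSYLVANIA = 124
PITTSBURGH = 5_521
HILL_LOW, HILL_HIGH = 7_250, 19_320


def percent_above(value, baseline):
    """Return how far `value` sits above `baseline`, as a percentage."""
    return (value - baseline) / baseline * 100


def times_as_dense(value, baseline):
    """Return how many times denser `value` is than `baseline`."""
    return value / baseline


if __name__ == "__main__":
    for label, value in (("low end", HILL_LOW), ("high end", HILL_HIGH)):
        pct = percent_above(value, PITTSBURGH)
        print(f"Hill District {label}: {pct:.0f}% above the citywide density")
    ratio = times_as_dense(PITTSBURGH, PENNSYLVANIA)
    print(f"Pittsburgh is about {ratio:.1f}x as dense as Pennsylvania")
```

Against the citywide figure, the low end of the Hill District range is about 31% higher and the high end is about 250% higher.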

These vacancy rates can be attributed to the high concentration of houses in poor condition (as defined by Allegheny County). The census data puts the percentage of housing in poor condition at 5%. This translates to roughly 500 houses per square mile in poor condition. That is a lot of housing and, based on average household size, it could house at least an additional 1,250 residents per square mile.
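The arithmetic behind the 500 houses and 1,250 residents can be written out as below. Two of the inputs are assumptions I have back-calculated from the figures in the text, not values stated in the source: a base density of 10,000 houses per square mile (500 divided by 5%) and an average household size of 2.5 (1,250 divided by 500).

```python
POOR_CONDITION_SHARE_PCT = 5   # from the text: 5% of housing in poor condition
HOUSES_PER_SQ_MILE = 10_000    # ASSUMPTION: back-calculated as 500 / 5%
AVG_HOUSEHOLD_SIZE = 2.5       # ASSUMPTION: back-calculated as 1250 / 500


def poor_condition_houses(houses_per_sq_mile, share_pct):
    """Houses per square mile that fall in the poor-condition share."""
    return houses_per_sq_mile * share_pct / 100


def residents_housed(houses, household_size):
    """Residents those houses could hold at the given household size."""
    return houses * household_size


if __name__ == "__main__":
    houses = poor_condition_houses(HOUSES_PER_SQ_MILE, POOR_CONDITION_SHARE_PCT)
    residents = residents_housed(houses, AVG_HOUSEHOLD_SIZE)
    print(f"{houses:.0f} poor-condition houses per square mile")
    print(f"{residents:.0f} additional residents per square mile")
```

Since the Hill District density ranges from 7,250 to 19,320, the same 5% share would give a different count at either end of that range.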

These problems in turn affect the prices of houses in the area. As a result, the median residential sale price of a house in the area is $56,500, even as the cost of new construction exceeds $100,000.

To be Noted!!

Expected Outcomes/Preconceptions :

1. A linear relationship between the number of vacant houses in an area and the condition of those houses
2. An effect of these vacant houses on the median sale price of houses in the area
3. To justify or dismiss the preconception that owner-occupied housing is generally in better condition than rentals and hence higher valued

Assumptions :

1. There is a cause-and-effect relationship between the data compared and the results seen in the visualizations
2. The data given at the source is accurate and not just an estimate
3. The data is at the scale of the problem
4. All houses are of the same size

Data Collection and Analysis Process

1. Log in to the Western Pennsylvania Regional Data Center’s website (http://www.wprdc.org/)
2. Search for Market Value Assessment Data and download the zipped shapefile
3. Import the zip file into Carto and extract a CSV file
4. Open the file in Excel and filter for the five neighborhoods of the Hill District
5. Cut out the additional data columns and save as a simplified CSV file
6. Use the data to form the desired visualizations comparing one set of data to another
7. Compare assumptions/preconceptions to the visible results
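Steps 4 and 5 were done by hand in Excel, but the same filtering could be scripted. The sketch below is one possible way to do it and not the process I actually followed. The column names passed in are hypothetical, since the real headers depend on the CSV exported from Carto, and the neighborhood spellings in the data may differ from the ones listed here.

```python
import csv

# The five neighborhoods this write-up treats as the Hill District.
# ASSUMPTION: the data spells them this way (matching ignores case and
# surrounding spaces, but not hyphens or abbreviations).
HILL_DISTRICT = {
    "Polish Hill",
    "Upper Hill",
    "Middle Hill",
    "Bedford Dwellings",
    "Crawford Roberts",
}


def keep_hill_district(rows, neighborhood_col, keep_cols):
    """Keep Hill District rows (step 4) and only the wanted columns (step 5)."""
    wanted = {name.lower() for name in HILL_DISTRICT}
    kept = []
    for row in rows:
        name = (row.get(neighborhood_col) or "").strip().lower()
        if name in wanted:
            kept.append({col: row[col] for col in keep_cols})
    return kept


def simplify_csv(in_path, out_path, neighborhood_col, keep_cols):
    """Read the exported CSV, filter it, and write the simplified CSV."""
    with open(in_path, newline="", encoding="utf-8") as src:
        rows = keep_hill_district(csv.DictReader(src), neighborhood_col, keep_cols)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=keep_cols)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call might look like simplify_csv("mva.csv", "hill_district.csv", "neighborhood", ["neighborhood", "vacancy"]), where both file names and both column names are placeholders.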


The next three pages show visualizations based on the expected outcomes as discussed in the previous section.

S3.png.thumb
S2.png.thumb
S1.png.thumb

Insights from the Visualizations

1. A seemingly linear relationship does exist between the number of vacant houses in the Hill District and the condition of those houses.
2. The effect of these vacant houses on the median sale price of houses in the area is also visible: neighborhoods with higher vacancy rates sold houses at lower prices, and those with lower vacancy rates sold at higher prices.
3. The preconception that owner-occupied housing is generally in better condition than rentals and higher valued was not supported by the visualizations. Houses in areas with low owner occupancy sold at both the higher and lower ends of the spectrum, which leaves this visualization inconclusive.
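The first insight was judged by eye from the visualization. One way to put a number on "seemingly linear" would be a Pearson correlation coefficient between vacancy and the share of houses in poor condition. The sketch below was not part of the original analysis, and with only five neighborhoods any result would be a rough indication at best.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    Returns a value between -1 and 1; values near 1 or -1 indicate a
    close-to-linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariance / sqrt(var_x * var_y)
```

It would be called with one list of vacancy rates and one list of poor-condition percentages, one value per neighborhood, taken from the simplified CSV.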

Other possible visualizations from the data set:

1. Comparison of sale price/ houses in poor conditions to foreclosures?

2. Comparison of condition of houses to % of houses receiving subsidies?

Additional reflections on the use of Caricatures:

The graphs and plots that we as graduate students are used to depend heavily on the accuracy of the data. Caricatures are the first form of data highlighting I have been introduced to that works differently. For a simple data set with one or two dimensions, I think a caricature could be a very effective and easy-to-understand visualization. However, I would refrain from using caricatures for complex data. With more than two dimensions to display, a caricature may end up exaggerating one particular dimension, missing the point the author wanted to make and causing distraction.

City data used for simple visualizations, such as population densities, housing densities or the number of restaurants in an area, can very well be visualized using caricatures. However, any analysis of overlapping systems and data layers should use other forms of representation that are more accurate and simpler to read.
