Collecting, Visualizing and Communicating data about gentrification in Pittsburgh
We’ve all seen data visualizations — some of them effective; many of them simply pleasing graphics. We’re going to dive into the process of conceiving and designing visuals that communicate information in ways that are useful, usable, and desirable by crafting the form in ways that intuitively match the content of the piece.
Learning goals of this project:
- Storytelling
- String pieces of data together
- Build cognitive models of data for conveying information by understanding relationships, and then helping other people understand them
- Highlighting the patterns that are inherent in the content but not visible
- Build interaction for the data
Stacie and Brett gave us the freedom to choose our own topics from a list. I chose Gentrification because of my interest in understanding how cities evolve and the impact that it has on people.
Step 1: Finding the Data & Raising Questions
Week 1: November 07, 2019 — November 09, 2019
During the first class, the students who had selected the same topic got together to draw a mind map. The group looking at Gentrification includes Amanda Sanchez, Hannah Koenig and myself. I refined our mind map for my reference.
Assume there are 3 neighborhoods that have had new buildings built in the past 5 years. Assume that you also have data on the new businesses that have emerged in these neighborhoods. Other kinds of data you could look at to raise questions:
- Income/Ethnicity/Education Levels
- Crime Rate
- Amenities
- Price of Homes
- Rents
- Evictions
If this census data is from 2015, you might need to look at these metrics for the years before 2015 to identify patterns. For example, starting in 2010 or 2005, how did these evolve? You could decide to narrow it down and look at just one neighborhood. Comparing the data would help you get closer to the questions that you want to ask through the data. You could also make comparisons on a larger scale, maybe at the national level.
Possible lenses that interest me for Gentrification:
- Has the arrival of large tech companies in Pittsburgh caused displacement of low-income groups?
- The Strip District and Bakery Square have evolved into hubs for large tech companies like Google, Facebook, Apple and Uber. Is there a relationship between the evolution of the built environment in the two neighborhoods and the arrival of large companies?
- Do CMU and UPitt contribute to gentrification of Pittsburgh?
- Does the use of technology to augment better infrastructure lead to gentrification?
In order to choose one of these, I brainstormed on the kind of data I might need to find for each of these.
(Add Image)
Remember: Look for 3–5 types of data; that’s enough to work with. Even within that, pick 30 to a maximum of 50 data points. This might mean multiple zipcodes that together cover 3–4 neighborhoods. Also begin to frame very specific questions about the relationships you want to look for. Refer to Richard Saul Wurman’s piece and use the following as a framework for those questions:
L — Location
A — Alphabet
T — Time
C — Category
H — Hierarchy
(Additional — Parts to a whole)
As you collect data, also think about which of the following is an anchor for your communication of data:
Cartesian
Polar
Geographic
Week 2: November 10, 2019 — November 17, 2019
After the last class, I decided to first spend a bit more time understanding gentrification as a phenomenon. My knowledge of the subject right now is limited to a vague understanding, and there might be culturally different meanings associated with the term. This little bit of digging into the term ‘gentrification’ clarified two things for me:
- The underlying process and the nature of its impact on neighborhoods differ little with geographical location. For example, the gentrification process in Mumbai has a lot of similarities with the gentrification process that I read about for US cities.
- I came across an article that challenged my assumption that gentrification is by all means evil. This article, by CityLab, is based on recent research (2019) by Quentin Brummet and Davin Reed. This particular research was conducted as part of the Census Longitudinal Infrastructure Project (CLIP) while Brummet was an employee of the US Census Bureau. Their finding was:
“In this paper, we use new longitudinal census microdata to provide the first causal evidence of how gentrification affects a broad set of outcomes for incumbent adults and children. Gentrification modestly increases out-migration, though movers are not made observably worse off and aggregate neighborhood change is driven primarily by changes to in-migration. At the same time, many original resident adults stay and benefit from declining poverty exposure and rising house values. Children benefit from increased exposure to neighborhood characteristics known to be correlated with economic opportunity, and some are more likely to attend and complete college. Our results suggest that accommodative policies, such as increasing housing supply in high-demand urban areas, could increase the opportunity benefits we find, reduce out-migration pressure, and promote long-term affordability.”
This made me think of an additional question that I hadn’t thought of before: “How do original residents of a neighborhood benefit from the gentrification process?” Taking some feedback from Stacie, I refined my research questions.
Revised Possible Questions:
- How has the arrival of large tech companies in Pittsburgh impacted the demographics of existing neighborhoods and the displacement of prior residents?
- How does the evolution of the built environment in two Pittsburgh neighborhoods — The Strip District and Bakery Square — relate to Pittsburgh as a current hub for large technology companies?
- How does the student population of universities based in Pittsburgh impact gentrification in local neighborhoods?
- How do original residents of a neighborhood benefit from the gentrification process?
I also looked at the projects by students from the last two cohorts at CMU to understand the approach they took, the challenges they faced with data and what kind of tools they used. That got me thinking about what medium and what software I’d use to represent my data. In the last project I combined my comfort zone (making physical objects) with exploring new software (AfterEffects), but I’m seeing this project as an opportunity to push myself out of my comfort zone and do something purely digital. I realize the need to expand my software skills, even if I don’t end up using these tools on a daily basis in the future. When I mentioned this to Stacie over email for feedback, she recommended not thinking about this aspect right now and letting the data guide the choice. So I’m keeping my mind open.
Reading Ema’s Medium post for the same project from last year, I realized that I’m thinking along similar lines: the changes in Pittsburgh due to the arrival of large tech companies. I’m wondering if there’s value in choosing a slightly different question than hers, but access to data is going to be a deciding factor for that. I also learnt that I could find specific data that I’m missing by writing to people, a possibility that I hadn’t given much consideration earlier.
In order to measure the effects of gentrification, it is important to map the changes over a period of time. The challenge that I’m facing right now is finding data for the same parameter over a period of time. In the absence of that, the questions that I was thinking about become irrelevant. Additionally, I want to be mindful of the fact that the whole point of this exercise is not just data collection, but synthesizing and communicating it. So I’ve decided to let the data reform/guide my question. Going through the clean data sets that have been provided to us, I’m collating data types that generally relate to the idea of gentrification. I’ve added population change since the 1960s, change in median income from 1999–2009, age group, education levels etc.
I have hit kind of a dead end with my approach because I’m trying too hard to align the questions that interest me with the data types I’ve collated above. Talking to Anuprita has helped me see this, and the flaw in this approach, more clearly. She recommends that I keep my interest aside for a minute and just look at all the cleaned data sets with a fresh eye to find things that could have an interesting relationship with each other. I told her about my fear of taking that approach: going for a generic and obvious question that may have already been explored as full-fledged research by someone. Her response, ‘So What?’, has brought me back to realizing that there could still be interesting ways of representing a narrative that I might find boring at a glance. Since the focus is again to learn about effective communication, I think it could be a meaty challenge to distance myself from my interest and put energy into doing the visualization really well. I’ve decided to let the data guide me to formulate my question.
Fear of going for an obvious question!
In class discussion
Yau’s data points:
- Logarithmic: powers of 10, like 1, 10, 100, 1000…
- Linear: 0,1,2,3…
- Categorical: like types of weather (cloudy, sunny, raining)
- Time: Day/Month/Year, Linear/Cyclical, seasonal
- Percentage: parts of a whole
- Ordinal: Spectrum like Bad, neutral, good
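To make the difference between the linear and logarithmic scales concrete for myself, here is a minimal sketch of how the same values would be placed on each. The function names and the 1 to 1000 range are my own assumptions for illustration, not something from Yau or from class.

```python
import math

def linear_position(value, lo, hi):
    """Map a value to a 0..1 position on a linear scale from lo to hi."""
    return (value - lo) / (hi - lo)

def log_position(value, lo, hi):
    """Map a value to a 0..1 position on a base-10 log scale (all inputs > 0)."""
    return (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))

# Powers of 10, as in the logarithmic example above
values = [1, 10, 100, 1000]
linear = [linear_position(v, 1, 1000) for v in values]
logarithmic = [log_position(v, 1, 1000) for v in values]

for v, lin, log in zip(values, linear, logarithmic):
    print(f"{v:>5}: linear {lin:.3f}  log {log:.3f}")
```

On the linear scale the first three values crowd together near zero, while on the logarithmic scale they are evenly spaced, which is why powers of 10 read better on a log scale.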
Relationship with LATCH
- C (Category) and T (Time) are the same in both
- Ordinal can be mapped to Hierarchy
- Linear could be mapped to Wurman’s Alphabet, Time and Hierarchy
- Location
Moving to form buckets/groups of data points
- Ways to whittle down the data and bracket it
- Keep these buckets evenly spaced. If you have data for 100 years, you can’t treat the first 50 years as one bucket and then split the next 50 years into buckets of 10 years each. Instead, break up the 100 years evenly, for example into buckets of 10 years each
- The level of granularity: even within a bucket, there might be a finer level of granularity that can be achieved. For example, if the population is sorted by percentage increase or decrease, the first two buckets could be ‘positive’ and ‘negative’. But even these two buckets can be broken up further.
- Organize all the data into buckets
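The bucketing rules above can be sketched in a few lines. This is only an illustration under my own assumptions: the start year of 1920 and the function names are hypothetical, and the bucket width of 10 years is taken from the class example.

```python
def decade_bucket(year, start=1920, width=10):
    """Return the (first_year, last_year) of the evenly sized bucket holding year."""
    first = start + ((year - start) // width) * width
    return (first, first + width - 1)

def sign_bucket(pct_change):
    """Coarsest split for a percentage change in population."""
    if pct_change > 0:
        return "positive"
    if pct_change < 0:
        return "negative"
    return "no change"

print(decade_bucket(1920))  # first bucket of the 100-year span
print(decade_bucket(2019))  # last bucket of the 100-year span
print(sign_bucket(-3.5))
```

Every bucket returned by `decade_bucket` has the same width, which is the point of the even-spacing rule; `sign_bucket` is the coarse first split that could later be broken up further.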
Coordinate Systems
- Start thinking about the coordinate system: Cartesian, Polar, Geographical
We also saw examples in class to understand the idea of good fit of form and content, and they use different coordinate systems. The ones that I liked and some learnings:
- Crayola Color chart was interesting because it shows the granularity of the content on a single page using just the cartesian coordinate system
- Front Page News coverage — Different locations as the buckets. It morphed the idea of a physical layout with an abstract representation to codify different topics given different attention in different cities
- When do things become so abstract that people might find it hard to wrap their heads around them, and how do you facilitate that? Remember, you can leverage the sound channel too, as we learned in the last project
- You could also do experiential and tactile visualization, like the New York High School Dropouts. This was more of an art piece, but it is possible to do experiential representation of data
- For the positive and negative changes in population, you could show pieces that go up and down. Maybe they are pieces that can be added or taken away, like Josh Lefevre’s project from last year
- State of the Union — taking a cognitive leap. Using the colors and graphics that can be recognized as a flag and using that to weave the data together for things that popped up over time in the American history
- What the Planet looks like breathing — I like this one particularly because it did something so simple (overlaying images of the earth from different seasons) but told such an interesting story. This reminds me that for this project the way the story is being told is more important than the story itself
- The Causes of Mass Gun Violence Identified: the interesting aspect of this was that the climax was effective in indicating the huge disparity between the assumed gun violence by terrorists and all other cases. The comparison was really powerful and helped me learn that boring data has the potential to be represented in a powerful manner
- You can show distance in terms of time — The Abortion Clinic example
Tasks for Thursday: Map each of your columns to a scale (LATCH and Yau), then for each piece of information decide which buckets you are proposing, leading to the choice of 50 data points. Lastly, refine the question if it needs another iteration.
Thursday is here and I have made some progress, yayy!!
After Tuesday’s class I have a better understanding of what the next step in the process is. I’m keeping that in mind when going through the cleaned data sets to formulate my question. The idea of bucketing data and apples-to-apples comparison is helping me filter what I should pick up. For example, I’m looking for data sets that have neighborhoods so that I don’t spend time on matching zipcodes to neighborhoods.
Keeping Neighborhoods as one of the data types, I have identified 3 other data sets that could have an interesting correlation: Education levels, Employment and Transportation. While going through the Education data set provided to us, I noticed that there was information on different levels of education of residents mapped to the neighborhoods. This seems interesting, and I have decided to map the relationship of differing levels of education of residents to their transportation choices.
To build on this idea further and narrow down my topic, I’m making some assumptions based on secondary research:
- Transportation access: A study conducted in August 2018 by Annie Barreck of the University of Montreal’s School of Industrial Relations concluded that more than 20 minutes of commute time to work increases burnout. Since my question is about the post-gentrification scenario, I have assumed that the residents who moved to these neighborhoods decided to do so because of the presence of a good level of infrastructure, including transportation. Access to a good transport network to get to work within the above-mentioned healthy time range is often a huge factor that people consider while deciding on a new neighborhood to move to.
- Environment Friendly transportation: The various modes of transportation are put on a spectrum of environment friendly commute options to see if there might be any relationship of those choices with respect to neighborhoods.
- Driving as a means of commute: The data for commute to work by ‘driving alone’ does not include electric and hybrid cars
- Narrowing down on my neighborhoods: I’m choosing to look at the neighborhoods that have been identified as gentrifying neighborhoods in different sources. Based on Pittsburgh City Paper, the National Community Reinvestment Coalition, and the locations of large tech companies and big startups, the following names come up often:
- Downtown (caused by Black displacement)
- Mount Oliver
- Mt. Washington
- Lawrenceville
- Polish Hill
- Bloomfield
- Garfield
- parts of the North Side (caused by Black displacement)
- St. Clair (caused by Black displacement)
- Bakery Square — Autodesk, Google
- Oakland — Facebook was here earlier, Philips
- Strip District — Apple, Argo AI, Facebook, Uber ATG
- East Liberty — Duolingo, AlphaLab
Questions & Doubts: Which year do the cleaned data sets cover? This matters because some of the tech giants came to Pittsburgh after 2010.
The question I’ve narrowed down to:
In gentrifying neighborhoods of Pittsburgh, what is the relationship between the level of education of residents and the transportation choices they make?
In class discussion
Data Types
- What type of data have you collected that you believe would be good to use for this project?
Research Questions:
- What is the question you aim to investigate through the data you collected?
- What types of relationships exist between Pittsburgh neighborhood residents and
Visual Cues:
- What representations communicate the information as well as possible without needing a key?
Tasks for Tuesday: Map all your data types with respect to Scales and Range. Post the buddy work here. After that, based on the visual cues discussed in class, make options for visual cues for your data set. Think about the cognitive connections and layering. How many of them can you combine together? Only think about the layering
The in-class buddy activity was really helpful in refining my research question. I have revised the list of data types with respect to this.
“In gentrified neighborhoods of Pittsburgh, is there a relationship between the education and income levels of residents and the transportation that they use?”
Data sets to look at:
- Neighborhoods
- Education Levels
- Income levels — I need to narrow down whether I want to use multiple buckets (like range of income levels) or use two buckets (like below and above poverty line). Using only one bucket (median income) would not be helpful since this is a parameter for drawing comparison
- Modes of transportation
- Access to Transportation
- Place of Work (still questioning if this is relevant)
I also realized that I need to revisit the assumptions that I made earlier and check whether they warrant modification. The first is the one regarding access to public transport. Based on the discussion in class, I have decided to add data about access to transport through the number of bus stops and the number of bike stops.
For the number of bus stops: Christianne converted the data for around 8,000 bus stops, mapped by latitude and longitude, to their respective neighborhoods. I asked her if she’d be willing to share that with me.
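I don’t know how Christianne actually did the conversion, but one common way to assign a latitude/longitude point to a neighborhood is a point-in-polygon test against the neighborhood boundaries. Below is a minimal sketch of that idea using ray casting. The neighborhood shapes, names and stop coordinates are made up for illustration; real boundaries would come from a boundary file.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting test. polygon is a list of (lon, lat) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def count_stops(stops, neighborhoods):
    """Count how many stops fall inside each neighborhood polygon."""
    counts = {name: 0 for name in neighborhoods}
    for lon, lat in stops:
        for name, polygon in neighborhoods.items():
            if point_in_polygon(lon, lat, polygon):
                counts[name] += 1
                break  # a stop belongs to at most one neighborhood
    return counts

# Hypothetical rectangles standing in for real neighborhood boundaries
neighborhoods = {
    "Neighborhood A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "Neighborhood B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
# Hypothetical stops; the last one lies outside both shapes
stops = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (5.0, 5.0)]
counts = count_stops(stops, neighborhoods)
print(counts)
```

The same function could count bike stations per neighborhood if I get their coordinates.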
For the number of bike stops: I have two options to do this:
- Overlay the neighborhood map of Pittsburgh with the map of bike stops for Healthy Ride (https://healthyridepgh.com/stations/)
- I’ve written to the Pittsburgh Department of City Planning to check if they have any data for the bike stations in the city
I have spent some time engaging with my data to see what preliminary patterns emerge. I’m hoping this will help me get some ideas about the visual cues I could use.
- Around 40% of the population of Pittsburgh lives in gentrified areas
- Most of the population in these areas are high school graduates
- Four neighborhoods have residents with a Bachelor’s degree as their largest demographic: Allegheny West, Friendship, Shadyside, Strip District. For these neighborhoods, the population under the poverty line is between 2% and 20%
- Most of North Oakland’s residents hold a postgraduate degree, but 39.2% of them are under the poverty line
- 52.6% of the residents of North Oakland walk to work
- Central Oakland has the highest share of people who walk to work (62.4%), which is probably because 64.8% of them are under the poverty line
- Central Oakland has the highest number of people under the poverty line, followed by California-Kirkbride and Northview Heights
- The maximum number of people drive alone as their primary mode of transport
- The neighborhoods where the maximum number of residents use public transport do not have the highest number of bus stops compared to the others. These neighborhoods are Manchester, Northview Heights and St. Clair
Some initial sketches:
In class discussion
Stacie realized that a lot of us could benefit from individual discussion and guidance at this point. This was really helpful since I was feeling stuck moving from the data to visualization. I felt that I had understood the process conceptually but couldn’t really figure out how to proceed with my project. This was a weird space to be in, where I understood something yet it didn’t connect and tie together for what I was trying to do.
Talking to Stacie has helped me move forward, although it still took me some time and multiple questions to get there. I have realized that I need to break down each of my data types into smaller components. These smaller components can then be easily thought of as visual cues, and then I can layer them with more complexity. For example, each level of education can be divided into smaller, equally sized buckets of percentage. Since all of this data is in terms of percentage, the buckets could be 0–10%, 10–20%, 20–30%… and so on. Each bucket of 10% can then be represented through one visual cue (for example a line, circle, shape of a book, scroll etc.). These can then be mapped on to the neighborhoods in terms of amounts or scales more easily to draw a visual connection.
I have done a similar task for transportation and am starting to think about visual cues for both of these. I still don’t have a clear image in my head about how to layer different data types into a coherent visualization or what I’m going to find through my data at the end, but I’m sticking to Stacie’s advice, ‘Just take baby steps’.
I have taken a lot of pictures for inspiration from the amazing books that Stacie had brought to class. The ones that I find really interesting are the ones that do not have a literal representation of the elements. For example, the following visualization is about parking violations by diplomats between 1998 and 2005 with respect to the neighborhoods around the United Nations Headquarters in New York. It uses a polar coordinate system instead of the obvious geographic coordinate system, and just lines instead of a literal representation of anything related to parking: cars, no-parking signs, violation symbols etc.
This also reminds me of Giorgia Lupi’s visualizations, which I love, and I feel this might be a way to do something in line with the visual style of my artwork. The following work of hers has a very human and artistic quality to it.
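Here is a minimal sketch of the 10% bucketing described above, and of turning a bucket into a count of repeated visual cues. The function names are my own, and two details are assumptions: a value that sits exactly on a boundary (say 10%) goes into the higher bucket, and 100% is kept in the top bucket. The two sample values are the poverty and walking percentages I noted earlier.

```python
def percent_bucket(pct, width=10):
    """Return the (low, high) bounds of the bucket that contains pct."""
    if not 0 <= pct <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Lower bound is inclusive; 100 is clamped into the last bucket
    index = min(int(pct // width), 100 // width - 1)
    return (index * width, (index + 1) * width)

def cue_count(pct, width=10):
    """Number of repeated marks to draw: one for every bucket reached so far."""
    low, _ = percent_bucket(pct, width)
    return low // width + 1

print(percent_bucket(39.2), cue_count(39.2))
print(percent_bucket(62.4), cue_count(62.4))
```

So a neighborhood at 39.2% would get four marks and one at 62.4% would get seven, which keeps the comparison between neighborhoods readable without showing exact numbers.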
Tasks for Thursday: The Shedroff reading, start moving from sketches to digital and prototyping, take a look at the prototyping tools.
Based on my discussion with Stacie and Brett in class, I have done two things:
- tried some visual cues for all my data types. This has helped me lay out some options right in front of me to mix and match.
- added a column for income and separated it into buckets similar to the smaller range separation for Education and Transportation.
It is time to move to digital to start visualizing different layers as a coherent form. I think much better through hand sketches, so I have started thinking of the overall visualization through preliminary sketches. Alongside, I’m looking at different prototyping tools to gauge which one would allow me to achieve the result I intend to have.
In class discussion
Goodness of fit between the form and the content: Do a visualization, put it in front of someone else and don’t tell them all the details. Give them a broad context and ask them if they can guess what that content represents. If they are close, the form is working fine.
Stacie asked the following questions as a re-centering exercise for us during the class.
Question 1: What question or questions are you exploring?
‘In gentrified neighborhoods, how do the education and income levels of residents relate to the forms of transportation that they predominantly use?’
Question 2: What types of data are you using?
- Neighborhoods: 36 neighborhoods identified as ‘gentrified’ based on various resources
- Population (Still questioning this)
- Education levels
- Income levels
- Types of transportation: 9 categories that sum up to 100% for each neighborhood
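Since the transportation categories are meant to sum to 100% for each neighborhood, a quick check on each row is worth doing before treating them as parts of a whole. This is a small sketch under my own assumptions: the function names, the 0.5 point rounding tolerance and the sample values are all made up, and only a few of the 9 categories are shown.

```python
def shares_sum_to_whole(shares, tolerance=0.5):
    """True if the percentage shares add up to 100 within a rounding tolerance."""
    return abs(sum(shares.values()) - 100.0) <= tolerance

def normalize(shares):
    """Rescale the shares so that they add up to exactly 100."""
    total = sum(shares.values())
    return {mode: 100.0 * value / total for mode, value in shares.items()}

# Hypothetical neighborhood row; the values are placeholders, not real data
example = {"drive alone": 48.3, "public transport": 21.4, "walk": 24.9, "other": 5.2}

print(shares_sum_to_whole(example))
print(normalize(example))
```

A row that fails the check probably has a missing category or a data entry error, and should be looked at before it is drawn.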
Question 3: What coordinate system are you using and Why?
I’m using a geographic coordinate system because it helps to show all these parameters on the same plane. The positions might be absolute, or relative to each other with a general sense of geographic direction (north, south, east, west)
Question 4: What scales do you plan to use for each type of data?
- Neighborhood: Location / Categorical / Alphabet
- Education: Linear / Alphabet / Percentage / Categorical / Ordinal
- Income level: Categorical / Linear / Ordinal
- Transportation: Categorical / Parts of a whole since the sum of all categories is 100%
Question 5: What are the ranges you plan to use for each scale? (These are the buckets)
- Neighborhoods
- Education: 5 categories: High school dropout, High school, Associate degree, Bachelor’s degree, Postgraduate degree. This data is in the form of percentage of the total population of the neighborhood
- Income levels: Median income levels per neighborhood divided into categories separated by $10,000
- Types of transportation: 9 categories of transportation
Question 6: What would be the indexical structure of your narrative?
This question relates to how the layering of information would happen in the visualization. Would some things be turned off and some turned on? How would you tell a coherent story through these data types.
Think about what is the projected pathway through your data? What would someone learn and discover at each step?
- Start by showing neighborhoods
Question 7: What visual/oral/temporal cues do you imagine to use for your content and Why?
What is your projected pathway through your data (diagram)? What will they see/uncover/learn at each step?
Do you propose using a narrative and/or indexical structure for your visualization? Why? What data will viewers be able to see simultaneously (layer)?
Question 8: What variables (visual, temporal, aural) do you plan to use for each piece of data? How are they good cognitive matches?
- Neighborhood: Position on map
- Education: Color and Length
- Income level: Shape
- Transportation: Color type or pattern
Learnings from Interactive Sensory Patterns by Stacie Rohrbach:
Pattern Detection: Number Hierarchy + Temporal Building
- People can usually detect around 7 different
- Whatever you are making for this visualization is not a static piece; there’s a sequence to introducing parts. In order for your audience to deeply engage, it is important to build things over time and not show everything at once.
Representation: Number Hierarchy + Temporal Building
- Categorization: how are you building the categories
- Pacing + Simultaneity: wherever you want relationships to be made between different types of data, or wherever it is important to see information at the same time as opposed to seeing everything
- Narrative+Indexical Structures
- Expectation & Perception: The appropriateness principle by Donald Norman. In simple words, what is the goodness of fit for form and content? One good idea is to take two opposite adjectives, make a spectrum, and see where you want your form to lie on it.
Interaction:
- Customization: You really have to consider the role of your audience. How much do you want your audience to engage with the data? How much choice or freedom do they have?
- Mimicking Known behaviors: When is it valuable to use known conventions and when is it detrimental to not be pushing the common conventions?
Experience
- Recall and Engagement: setting the context
- Discovery and Critical Thinking:
Tasks for Tuesday: Answer 2 main questions: 1) Narrative and/or indexical structure, 2) The visual/oral/temporal cues and layering. Also, your core question can be broken up into smaller questions; this might help you arrive at the narrative structure.
In class presentation
Feedback and Learning:
- An overarching piece of feedback was about the use of an additional shape for each of the data types (spectrum for transportation, coins for income and words for education). The choice of visuals for each of these was based on the close cognitive connections of form with the subject. I’m trying to figure out how to layer the data types without losing these close cognitive connections of form.
- Visual style: In line with my idea of making a visualization with an artistic quality, my first attempt was to create a collage-like visual style. From the feedback I received, this seemed to have been successful. Collage, by its nature, means a collection of multiple elements in a composition, hence my challenge right now is to figure out how to achieve this character through layering of elements as opposed to addition of shapes.
- Research Question: From the beginning, I’ve intended to look at the choice of transportation from the perspective of environmental impact, and hence if there is a relationship between that choice and the education level. Trying the visualization from that perspective has helped me convey that slightly more effectively than earlier. This also means that my question probably needs re-adjustment too.
- I’m still questioning whether I need the geographic coordinate system or if I can use the cartesian coordinate system.
Other Notes for self:
- Make a clear table for Data — Scale — Range
- What are the toggles I want to use?
- Start layering visual and aural cues
In class discussion
4 things that need to be done and thought of simultaneously:
- Overall Framework of how you are presenting the patterns
- Setting a context/ premise for your topic
- Representation
- Layering
Transportation spectrum: each value is scaled through a line instead of an area, since scaling area leads to distortion of data
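My understanding of the distortion is this: if a value is mapped to the radius (or side) of a shape, the area grows with the square of the value, so a value twice as large looks four times as big. A line avoids that because its length grows in direct proportion. The sketch below compares the two; the function names and the pixel sizes are assumptions I made up for illustration.

```python
import math

def bar_length(value, max_value, max_length=100.0):
    """Length of a line whose length is proportional to the value."""
    return max_length * value / max_value

def circle_radius_naive(value, max_value, max_radius=50.0):
    """Radius proportional to the value, so the area exaggerates differences."""
    return max_radius * value / max_value

def circle_radius_by_area(value, max_value, max_radius=50.0):
    """Radius chosen so that the area is proportional to the value."""
    return max_radius * math.sqrt(value / max_value)

def circle_area(radius):
    return math.pi * radius * radius

small, large = 20.0, 40.0  # two shares, one twice the other

length_ratio = bar_length(large, 100) / bar_length(small, 100)
naive_ratio = circle_area(circle_radius_naive(large, 100)) / circle_area(circle_radius_naive(small, 100))
fixed_ratio = circle_area(circle_radius_by_area(large, 100)) / circle_area(circle_radius_by_area(small, 100))

print(length_ratio, naive_ratio, fixed_ratio)
```

The line keeps the 2 to 1 relationship, the naive circle turns it into 4 to 1, and the square-root version brings the area back to 2 to 1 if I ever do want to use areas.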
List of References for the project: