Cities are rich in citizen data that can be used to improve programs and services, say Michael Luca and his research colleagues.
by Michael Blanding
Although citizens are generating more data than ever before, most cities have not fully used this flow of information to improve their services and become more efficient. “In the past, cities operated in analog mode, trying to make do with incomplete data in low-information environments,” says Harvard Business School Assistant Professor Michael Luca.
This can change. Through the Internet, mobile apps, and a variety of useful online programs, residents enrich the information pool with every click or tap on their computer or smartphone. In addition, cities are developing their own data collection and processing capabilities through advances such as sensor networks and sophisticated modeling software.
What if cities could use all of this data to give residents what they need, for example by using Google Street View to measure economic development or Yelp restaurant reviews to target hygiene inspections?
“There are all sorts of data now,” says Luca, “and if you use it carefully, you can change the way every policy is evaluated and every operation is performed.”
In a recent discussion paper titled Big Data and Big Cities: The Promises and Limitations of Improved Measures of Urban Life, Luca and three collaborators say cities have never been better positioned to use the large volumes of data being generated about the world around them. The key is how to use it. Luca, Harvard's Edward L. Glaeser and Scott Duke Kominers (PhDBE 2011), and Nikhil Naik, a Ph.D. candidate at the Massachusetts Institute of Technology’s Media Lab, cite three trends that make cities particularly well placed to capitalize on big data.
First, the shift toward open data has led cities to digitize more of their own information, putting everything from tax records to public health inspection results online. “They take a dataset that was in an obscure database or on paper, and the public can now do innovative things with it,” says Luca.
The Google Street View camera makes its way to remote Wyoming. Source: Kevin Dooley
Second, citizens generate what Luca calls “digital exhaust,” the data produced online as a by-product of their daily activities. Cities could capture this data for clues to the behavior of their citizens. “Yelp is used to help people find restaurants, not to tell cities where to inspect, but it could be used for that purpose,” says Luca. Similarly, Google searches in different geographic regions could give decision makers an overview of their citizens’ concerns.
Finally, private companies are more willing than ever to share their own internal data with governments to help them better understand workers, for example by offering insight into the health behavior of workers in different neighborhoods.
“There’s so much data now, it’s exciting and scary,” says Luca. “Cities need to think carefully about what data should be used, how it will be used, and when it will not be used.”
Tame the data flow
To master all this data and better predict policy outcomes, Luca believes cities need to develop algorithms to coordinate their own data with online information.
For example, in earlier work, his colleague Nikhil Naik used machine learning to analyze images. Using these techniques, he took about 3,600 block-level images of New York City, obtained from the Google Street View Image API, and taught the computer to recognize various features, including streets, sidewalks, buildings, and trees.
In their study, the Luca team linked Naik's images with household income data for about 2,400 blocks, which the city had made available online. “The income serves as a label for the images, and the machine then discovers the connection between the features and the income,” explains Naik.
After some “training,” the computer uses the algorithm it generates to predict income from images of blocks for which income data is not available. When the researchers analyzed the numbers, they found that image analysis can predict income much more accurately than other metrics. Statistically, the images explained 77% of the variation in income. By contrast, other measures such as race and education explained only 25% of the variation.
More interesting still, the Luca team was able to apply the algorithm developed in New York to Street View imagery in Boston, finding that it predicted income in that city accurately, explaining 86% of the variation. By using such an algorithm, cities can more effectively determine the impact of economic development initiatives on a block-by-block basis without having to wait for annual income surveys.
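The "share of variation explained" figures above are values of the R-squared statistic. The sketch below shows how such a number can be computed when a model is fitted in one city and then applied, unchanged, in another. It is a minimal illustration only: the single image-derived score per block, the income figures, and the use of a one-variable linear fit are all assumptions made for the example, not the researchers' actual data or model.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys explained by the given line."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: one image-derived score per block,
# and that block's household income in thousands of dollars.
city_a_scores = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
city_a_income = [32, 41, 48, 63, 70, 88]
city_b_scores = [0.20, 0.35, 0.60, 0.80]
city_b_income = [38, 50, 61, 85]

# Fit on city A only, then score the same line on both cities.
slope, intercept = fit_line(city_a_scores, city_a_income)
r2_in_sample = r_squared(city_a_scores, city_a_income, slope, intercept)
r2_transfer = r_squared(city_b_scores, city_b_income, slope, intercept)

print(f"R-squared in city A (training data): {r2_in_sample:.2f}")
print(f"R-squared in city B (model transferred): {r2_transfer:.2f}")
```

Note that the transferred R-squared is computed against city B's own average income, so it can fall sharply, or even go negative, if the relationship learned in city A does not hold elsewhere.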
For example, if city officials wanted to know the effect of licensing three new businesses on a particular block, they could monitor changes in Google Street View, cross-referencing that data with online data from the neighborhood and real estate valuations from sites like Zillow.
“They can then create an algorithm to continually evaluate the quality of life of the neighborhood and see how that quality changes over time,” says Luca. “This can be particularly useful for cities that have detailed survey data every few years, but are interested in policy changes that happen much more frequently.”
Using Yelp to target dirty restaurants
In his own work, Luca has used similar techniques for textual analysis of Yelp reviews to help cities determine which restaurants should be inspected. “Cities are currently sampling,” he says, meaning that restaurants are chosen for inspection more or less at random. If you theoretically divide the restaurants into two halves by their hygiene ratings (given by Yelp users), with the top half clean and the bottom half dirty, then “they have a 50% chance of finding a dirty spot.”
To improve that percentage, Luca trained the computer to read Yelp reviews. It compared certain word combinations with the number of violations that restaurants had received, teaching the computer to identify factors that could indicate a dirty restaurant. When he applied the algorithm to a second group of restaurants, the chances of finding a “dirty” restaurant increased to 80%.
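The idea of learning which words go with violations can be sketched with a small text classifier. The article does not say which model was used, so the example below substitutes a basic naive Bayes classifier as a stand-in; the reviews, the labels, and the function names `train` and `predict` are all invented for illustration.

```python
import math
from collections import Counter


def train(reviews):
    """reviews: list of (text, label) pairs, where label is 'dirty' or 'clean'."""
    word_counts = {"dirty": Counter(), "clean": Counter()}
    doc_counts = Counter()
    for text, label in reviews:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["dirty"]) | set(word_counts["clean"])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the higher naive Bayes log-score for this text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # prior for the label
        for word in text.lower().split():
            if word in vocab:  # skip words never seen in training
                # Add-one smoothing so unseen word/label pairs are not zero.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training set: review text paired with the inspection outcome.
training_reviews = [
    ("sticky tables and a dirty floor", "dirty"),
    ("saw a roach near the kitchen", "dirty"),
    ("food was cold and the bathroom was filthy", "dirty"),
    ("spotless dining room and friendly staff", "clean"),
    ("fresh food and a clean kitchen", "clean"),
    ("great service and fresh ingredients", "clean"),
]

model = train(training_reviews)
print(predict(model, "filthy bathroom and a roach"))
print(predict(model, "clean room and fresh food"))
```

A real system would need far more reviews, a held-out group of restaurants for testing (as the article describes), and probably word combinations rather than single words.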
In a forthcoming paper, Luca, Glaeser, Kominers, and Harvard graduate student Andrew Hillis further refined the algorithm to develop a tool for the city of Boston. Focusing on the most worrying health violations, the researchers and the city of Boston organized an open tournament to crowdsource the best Boston-specific algorithm. Over 700 people participated in the competition.
The research team tested the algorithms of 23 finalists. The tournament was won by a London statistician. “Using a Boston-specific algorithm, we found you can reduce the number of inspectors by 40% and find the same number of violations,” says Luca. In other words, cities could be much more efficient with existing resources, either achieving the same results with less money or improving their performance at no extra cost. The researchers are currently working with the city to refine and test this algorithm, with the aim of implementing a version for practical use.
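The efficiency argument can be made concrete with a toy comparison: count how many inspections it takes to find a fixed number of violations when restaurants are visited in an arbitrary order versus in order of predicted risk. The risk scores and outcomes below are made up, and the helper `inspections_needed` is a hypothetical name used only for this sketch.

```python
def inspections_needed(outcomes, target):
    """Walk the inspection order and return how many inspections it takes
    to find `target` violations, or None if the target is never reached."""
    found = 0
    for count, violated in enumerate(outcomes, start=1):
        found += violated  # True counts as 1, False as 0
        if found >= target:
            return count
    return None


# Hypothetical restaurants: (predicted risk score, violation actually present?)
restaurants = [
    (0.91, True), (0.15, False), (0.78, True), (0.40, False), (0.66, True),
    (0.22, False), (0.85, False), (0.30, True), (0.55, True), (0.10, False),
]

# Baseline: inspect in the arbitrary order the list happens to be in.
baseline_order = [violated for _, violated in restaurants]

# Targeted: inspect the highest predicted risk first.
by_risk = sorted(restaurants, key=lambda r: r[0], reverse=True)
targeted_order = [violated for _, violated in by_risk]

target = 4
baseline_n = inspections_needed(baseline_order, target)
targeted_n = inspections_needed(targeted_order, target)
print(f"Inspections to find {target} violations: "
      f"baseline {baseline_n}, targeted {targeted_n}")
```

The size of the saving depends entirely on how well the risk scores track real violations; the numbers here are chosen only to show the mechanism and say nothing about the Boston results.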
In principle, according to Luca, many city operations could be made more effective through such an approach. Using Google searches tied to specific geographic areas, government analysts could identify the types of jobs people are looking for and use that information over time to predict unemployment and decide which kinds of training programs are needed.
In the same way that the algorithm trained on New York’s Street View data was applied in Boston, policymakers could apply algorithms created in data-rich places to places with insufficient data in order to better guide policy. Naik is working on a way to apply the Street View technique in rural areas, such as villages in Indonesia, where it is expensive and difficult to conduct accurate surveys.
Results may vary
Of course, applying such data comes with some important caveats. An algorithm created for the features found in New York may not transfer easily to small Indonesian villages, which have radically different characteristics.
Creating algorithms that can make accurate predictions in a variety of environments requires a lot of work and experimentation.
“People believe big data can solve any problem,” warns Luca. “Basically, we believe that the right questions and tools need to be linked to the right data.”
However, with a bit of creativity and ingenuity, big data could provide cities with detailed information that helps city authorities improve the quality of life in the city.
“You could know street by street how people are doing,” says Luca. “Then operations could come in and say, ‘How can we do it better?’”