For each project we gather a team with complementary skills and varying backgrounds. We prefer to work together with passionate, involved, flexible and independent professionals. 




Technology review

Four new hacking groups have joined an ongoing offensive against Microsoft’s email servers

A Chinese government-linked hacking campaign revealed by Microsoft this week has ramped up rapidly. At least four other distinct hacking groups are now attacking critical flaws in Microsoft’s email software in a cyber campaign the US government describes as “widespread domestic and international exploitation” with potential impact on hundreds of thousands of victims worldwide.

In January 2021, Chinese hackers known as Hafnium began exploiting vulnerabilities in Microsoft Exchange servers. But since the company publicly revealed the campaign on Tuesday, four more groups have joined in, and the original Chinese hackers have dropped the pretense of stealth and increased the number of attacks they’re carrying out. The growing list of victims includes tens of thousands of US businesses and government offices targeted by the new groups.

“There are at least five different clusters of activity that appear to be exploiting the vulnerabilities,” says Katie Nickels, who leads an intelligence team at the cybersecurity firm Red Canary that is investigating the hacks. When tracking cyberthreats, intelligence analysts group clusters of hacking activity by the specific techniques, tactics, procedures, machines, people, and other characteristics they observe. It’s a way to track the hacking threats they face. 

Hafnium is a sophisticated Chinese hacking group that has long run cyber-espionage campaigns against the United States, according to Microsoft. They are an apex predator—exactly the sort that is always followed closely by opportunistic and smart scavengers.

Activity quickly kicked into higher gear once Microsoft made its announcement on Tuesday. But exactly who these hacking groups are, what they want, and how they’re accessing these servers remain unclear. It’s possible that the original Hafnium group sold or shared their exploit code or that other hackers reverse-engineered the exploits based on the fixes that Microsoft released, Nickels explains.

“The challenge is that this is all so murky and there is so much overlap,” Nickels says. “What we’ve seen is that from when Microsoft published about Hafnium, it’s expanded beyond just Hafnium. We’ve seen activity that looks different from tactics, techniques, and procedures from what they reported on.” 

By exploiting vulnerabilities in Microsoft Exchange servers, which organizations use to operate their own email services, hackers are able to create a web shell, a remotely accessible hacking tool that gives them back-door access to and control of the infected machine over the internet. From there they can pivot to steal data from throughout their target’s network. The web shell means that even though Microsoft has issued fixes for the flaws (which only 10% of Exchange customers had applied by Friday, according to the company), the adversary still has back-door access to their targets.

Applying Microsoft’s software fixes is a crucial first step, but the total cleanup effort is going to be much more complicated for many potential victims, especially when the hackers move freely to other systems on the network.

“We are working closely with CISA [the Cybersecurity and Infrastructure Security Agency], other government agencies, and security companies, to ensure we are providing the best possible guidance and mitigation for our customers,” a Microsoft spokesperson says. “The best protection is to apply updates as soon as possible across all impacted systems. We continue to help customers by providing additional investigation and mitigation guidance. Impacted customers should contact our support teams for additional help and resources.” 

With multiple groups now attacking the vulnerabilities, the hacks are expected to disproportionately affect organizations that can least afford to defend against them, like small businesses, schools, and local governments, said former US cybersecurity official Chris Krebs. 

“Why, though?” Krebs asked on Twitter. “Is this a flex in the early days of the Biden admin to test their resolve? Is it an out of control cybercrime gang? Contractors gone wild?”

With potentially hundreds of thousands of victims worldwide, this Exchange hacking campaign has affected more targets than the SolarWinds hack that the US government is currently struggling to clean up. But, as with the SolarWinds hack, numbers aren’t everything: the Russian hackers behind SolarWinds were highly disciplined and went after specific high-value targets even though they had potential access to many thousands.

The same is true here. Even if the total numbers are alarming, not all compromises are catastrophic. 

“All of these are not created equal,” Nickels says. “There are vulnerable Exchange servers where the door is open but we don’t know if an adversary has gone through it. There are slightly compromised servers; maybe a web shell is dropped, but nothing beyond that. Then there is the other end of the spectrum, where adversaries had follow-on activity and moved to other systems.”

It’s rare for the White House to comment on cybersecurity issues, but the Biden administration has had cause to talk a lot about hacking in its first two months in office, between the SolarWinds hack and this latest incident.

“We are concerned that there are a large number of victims and are working with our partners to understand the scope of this,” White House press secretary Jen Psaki said during a Friday afternoon press conference. “Network owners also need to consider whether they have already been compromised and should immediately take appropriate steps.”

Four new hacking groups have joined an ongoing offensive against Microsoft’s email servers 2021/03/06 21:27

As the Texas power crisis shows, our infrastructure is vulnerable to extreme weather

On Valentine’s Day, a rare burst of Arctic air spread across the central US and into Texas, dropping temperatures there into the single digits and nearly causing the state’s power grid to collapse. A state known for its abundant energy resources saw widespread failures of natural-gas and electricity systems that left more than four million Texans without power for days.

The proximate cause of Texas’s grid failure is now well understood. Frigid temperatures drove electricity demand to a new winter record that exceeded even the “extreme” demand scenario considered by the state’s power grid operator, the Electric Reliability Council of Texas, or ERCOT. Then dozens of natural-gas power plants and some wind turbines rapidly went offline, plunging the Texas grid into crisis. To prevent the whole grid from going down, ERCOT ordered utilities to initiate emergency blackouts and disconnect millions of customers. 

Scientists are still working to determine whether the fast-warming Arctic is driving more frequent breakdowns of the “polar vortex,” which precipitated the Texas freeze. But we know that climate change is making extreme weather like heat waves, droughts, wildfires, and flooding more frequent and more severe. Any of these events can push our critical infrastructure to the breaking point, as happened in Texas. How can we prepare?

Climate resilience will require investment of up to $100 billion per year globally in our infrastructure and communities. But careful planning can help our scarce resources go further. 

Looking back, Texas’s troubles offer several key lessons for how to make both critical infrastructure and vulnerable communities everywhere more resilient to climate extremes. 

Assessing future risks

First, it’s worth noting that grid failure alone did not lead to the intense suffering and loss of life Texas residents faced.

Natural-gas wells and gathering lines also froze, cutting gas production and supply for the state’s pipelines and power plants in half just as demand soared. Elsewhere, water treatment plants lost power, and frozen pipes caused water distribution networks to lose pressure. Frozen roadways prevented residents from traveling safely.

The connections between these infrastructure systems keep the lights on and taps flowing in good times but can compound failure when things go bad.

Extreme weather also tends to cause multiple parts of critical systems to fail at the same time. These kinds of simultaneous failures are far more probable than one might think. If 10 power plants each have a 10% chance of failure but these probabilities are all independent, the chance that they all fail simultaneously is infinitesimal (0.00000001%).

But extreme weather can hit many plants at once, so their failures are not independent. A 1% chance that 10 power plants all fail at once is far more worrisome. So building resilient infrastructure means paying close attention to extreme events that can slam large parts of the system all at once, whether that’s a winter storm, wildfire, hurricane, or flood.
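The arithmetic behind this comparison can be checked with a short sketch. The independent case uses the article's own numbers (10 plants, 10% each). The common-cause case is an illustrative assumption, not something the article specifies: it supposes a 1% chance of a single extreme event that takes every plant offline, and the function names are hypothetical.

```python
def prob_all_fail_independent(p_single: float, n_plants: int) -> float:
    """Chance that every plant fails when failures are independent."""
    return p_single ** n_plants


def prob_all_fail_common_cause(p_event: float, p_all_fail_given_event: float,
                               p_single: float, n_plants: int) -> float:
    """Chance that every plant fails when a shared extreme event can occur.

    Law of total probability: either the event happens and takes all plants
    down with probability p_all_fail_given_event, or it does not happen and
    the plants fail independently of one another.
    """
    no_event = (1.0 - p_event) * prob_all_fail_independent(p_single, n_plants)
    return p_event * p_all_fail_given_event + no_event


# Article's numbers: 10 plants, each with a 10% chance of failure.
independent = prob_all_fail_independent(0.10, 10)

# Assumed for illustration: a 1% chance of a storm that knocks out every plant.
correlated = prob_all_fail_common_cause(0.01, 1.0, 0.10, 10)

print(f"independent:  {independent * 100:.8f}%")
print(f"common cause: {correlated * 100:.2f}%")
```

Under these assumptions the shared event dominates the result: the independent term is so small that the chance of a total failure is essentially the chance of the extreme event itself.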

Lastly, the worst human impacts of any infrastructure failure don’t come from the outage itself. They come from exposure to freezing temperatures, a lack of clean water to drink, dwindling food supplies, and the fear that help may not come soon enough. So the magnitude of suffering is determined not only by the magnitude of the infrastructure failure but also by each community’s ability to weather the storm.

Historically marginalized communities usually have the fewest resources to protect against the human costs of infrastructure failures. In Texas, people experiencing homelessness were the most likely to be exposed to freezing temperatures. Shelters, limited by social-distancing requirements, quickly reached capacity. Many low-income neighborhoods were among the first to face power outages. And people of color are disproportionately represented in both groups in Texas.

In light of what happened in Texas and the ongoing threat of climate change everywhere, how can communities shore up their local resources and critical systems to prevent the same thing from happening where they live?

Tomorrow’s resiliency starts now

We should start with the weakest links in our infrastructure. Energy systems can and should be made resilient to extreme weather. Wind turbines operate in Antarctica, gas plants in Alberta, and gas wells in Alaska. Weatherization can be costly, but the most affordable steps, such as winterizing wind turbines or using heat tracing and insulation to keep pressure sensors from freezing up at natural-gas or nuclear power plants, can be well worth it.

Deciding how much to invest to reduce the impact of rare events is a tricky calculus, but it’s one that should hinge not just on the likelihood of an event, but on the severity of its consequences.

At the same time, we can never protect every inch of our infrastructure against the full range of possible disasters. So we should also diversify the supply of critical resources such as electricity wherever possible. Natural-gas power plants, which make up two-thirds of Texas’s generation capacity, were the most significant contributor to supply shortages there. If the grid has a mix of generation sources in different locations, each of which is susceptible to different types of extremes, it will be more resilient to any single event. 

Going forward, any new infrastructure we invest in has to be prepared for not only today’s climate, but also the climate we’ll have decades into the future. For each upgrade we make, we must decide what range of climate extremes it should be able to withstand—and recognize that the past is no longer a safe guide to future extremes.

For things like pipelines, which are expensive to upgrade once they’re in the ground but relatively inexpensive to weatherize at the outset, new projects should plan for the worst-case scenario based on climate projections over their expected lifetime.

For components that are easier to replace or retrofit, or for operational changes like altering reservoir operations at hydropower dams, we can take a wait-and-see approach. In these cases, we may spend less on upgrades now but should still put processes in place today that will allow us to make them when it becomes clear they’re needed. Smart preparation and adaptation can reduce the cost of resilience.

Our planning also cannot end with physically hardening our systems. No matter what improvements we make to the electric grid, we have to also be prepared for the reality that it will, at some point, fail again. 

Being prepared requires a thorough accounting of all the possible reasons power grids and other vital systems might fail. For each reason, we should map out how compounding simultaneous failures could affect other infrastructure systems and communities. Grid failure will have different effects depending on whether it’s driven by extreme cold or extreme heat. There’s no one-size-fits-all solution.

For cold weather, preparedness means spending the money to weatherize homes so people can stay warm. It means insulating and replacing water pipes to withstand the cold. It means making plans to open warming centers and distribute bottled water. It means providing emergency transportation for people who rely on electricity for medical treatments like oxygen, and having a strategy to reach and help those who are homeless. And it has to start with the most vulnerable communities, who have the most to lose.

Resilience is more than just preparing for disasters. It’s an opportunity to invest in our communities for fair weather as well as foul. Climate adaptation comes with a hefty price tag, but it can make our cities more livable, our water cleaner, and our homes safer. The cost of inaction—in both dollars and lives—is far greater.

Sarah Fletcher is an assistant professor at Stanford University. She studies water resources, infrastructure planning, and climate adaptation. She tweets at @SFletcherH2O.

Jesse Jenkins is an assistant professor at Princeton University. He studies macro-scale energy systems engineering and policy and tweets at @JesseJenkins.

As the Texas power crisis shows, our infrastructure is vulnerable to extreme weather 2021/03/06 14:00

How to poison the data that Big Tech uses to surveil you

Every day, your life leaves a trail of digital breadcrumbs that tech giants use to track you. You send an email, order some food, stream a show. They get back valuable packets of data to build up their understanding of your preferences. That data is fed into machine-learning algorithms to target you with ads and recommendations. Google cashes your data in for over $120 billion a year of ad revenue.

Increasingly, we can no longer opt out of this arrangement. In 2019 Kashmir Hill, then a reporter for Gizmodo, famously tried to cut five major tech giants out of her life. She spent six weeks being miserable, struggling to perform basic digital functions. The tech giants, meanwhile, didn’t even feel an itch.

Now researchers at Northwestern University are suggesting new ways to redress this power imbalance by treating our collective data as a bargaining chip. Tech giants may have fancy algorithms at their disposal, but they are meaningless without enough of the right data to train on.

In a new paper being presented at the Association for Computing Machinery’s Fairness, Accountability, and Transparency conference next week, researchers including PhD students Nicholas Vincent and Hanlin Li propose three ways the public can exploit this to their advantage:

  • Data strikes, inspired by the idea of labor strikes, which involve withholding or deleting your data so a tech firm cannot use it—leaving a platform or installing privacy tools, for instance.
  • Data poisoning, which involves contributing meaningless or harmful data. AdNauseam, for example, is a browser extension that clicks on every single ad served to you, thus confusing Google’s ad-targeting algorithms.
  • Conscious data contribution, which involves giving meaningful data to the competitor of a platform you want to protest, such as by uploading your Facebook photos to Tumblr instead.

People already use many of these tactics to protect their own privacy. If you’ve ever used an ad blocker or another browser extension that modifies your search results to exclude certain websites, you’ve engaged in data striking and reclaimed some agency over the use of your data. But as Hill found, sporadic individual actions like these don’t do much to get tech giants to change their behaviors.

What if millions of people were to coordinate to poison a tech giant’s data well, though? That might just give them some leverage to assert their demands.

There may have already been a few examples of this. In January, millions of users deleted their WhatsApp accounts and moved to competitors like Signal and Telegram after Facebook announced that it would begin sharing WhatsApp data with the rest of the company. The exodus caused Facebook to delay its policy changes.

Just this week, Google also announced that it would stop tracking individuals across the web and targeting ads at them. While it’s unclear whether this is a real change or just a rebranding, says Vincent, it’s possible that the increased use of tools like AdNauseam contributed to that decision by degrading the effectiveness of the company’s algorithms. (Of course, it’s ultimately hard to tell. “The only person who really knows how effectively a data leverage movement impacted a system is the tech company,” he says.)

Vincent and Li think these campaigns can complement strategies such as policy advocacy and worker organizing in the movement to resist Big Tech.

“It’s exciting to see this kind of work,” says Ali Alkhatib, a research fellow at the University of San Francisco’s Center for Applied Data Ethics, who was not involved in the research. “It was really interesting to see them thinking about the collective or holistic view: we can mess with the well and make demands with that threat, because it is our data and it all goes into this well together.”

There is still work to be done to make these campaigns more widespread. Computer scientists could play an important role in making more tools like AdNauseam, for example, which would help lower the barrier to participating in such tactics. Policymakers could help too. Data strikes are most effective when bolstered by strong data privacy laws, such as the European Union’s General Data Protection Regulation (GDPR), which gives consumers the right to request the deletion of their data. Without such regulation, it’s harder to guarantee that a tech company will give you the option to scrub your digital records, even if you remove your account.

And some questions remain to be answered. How many people does a data strike need to damage a company’s algorithm? And what kind of data would be most effective in poisoning a particular system? In a simulation involving a movie recommendation algorithm, for example, the researchers found that if 30% of users went on strike, it could cut the system’s accuracy by 50%. But every machine-learning system is different, and companies constantly update them. The researchers hope that more people in the machine-learning community can run similar simulations of different companies’ systems and identify their vulnerabilities.
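As a rough sketch of the first step in such a simulation, the snippet below withholds all data from a chosen share of users before a model would be retrained. It does not reproduce the researchers' setup or measure accuracy. The function name, the `(user_id, item_id, rating)` layout, and the toy data are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

# (user_id, item_id, rating); this layout is an assumption for the sketch.
Rating = Tuple[str, str, float]


def apply_data_strike(ratings: List[Rating], strike_fraction: float,
                      seed: int = 0) -> List[Rating]:
    """Drop every rating contributed by a randomly chosen share of users."""
    users = sorted({user for user, _, _ in ratings})
    n_strikers = round(strike_fraction * len(users))
    strikers = set(random.Random(seed).sample(users, n_strikers))
    return [row for row in ratings if row[0] not in strikers]


# Toy data: 10 hypothetical users, each rating the same 3 hypothetical movies.
toy_ratings = [(f"user{u}", f"movie{m}", float((u + m) % 5 + 1))
               for u in range(10) for m in range(3)]

remaining = apply_data_strike(toy_ratings, strike_fraction=0.30)
remaining_users = {user for user, _, _ in remaining}
print(len(toy_ratings), len(remaining), len(remaining_users))
```

A fuller experiment would train the same recommender on `toy_ratings` and on `remaining` and compare their accuracy on held-out data. How large the drop is depends on the system being studied.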

Alkhatib suggests that scholars should do more research on how to inspire collective data action as well. “Collective action is really hard,” he says. “Getting people to follow through on ongoing action is one challenge. And then there’s the challenge of how do you keep a group of people who are very transient—in this case it might be people who are using a search engine for five seconds—to see themselves as part of a community that actually has longevity?”

These tactics might also have downstream consequences that need careful examination, he adds. Could data poisoning end up just adding more work for content moderators and other people tasked with cleaning and labeling the companies’ training data?

But overall, Vincent, Li, and Alkhatib are optimistic that data leverage could turn into a persuasive tool to shape how tech giants treat our data and our privacy. “AI systems are dependent on data. It’s just a fact about how they work,” Vincent says. “Ultimately, that is a way the public can gain power.”

How to poison the data that Big Tech uses to surveil you 2021/03/05 17:18

Why reopening US schools is so complicated

Across the country, schools are wrestling with the difficult choice of whether to reopen, and how to do it with reduced risk. In Kalamazoo, Michigan, not far from one of the main sites where Pfizer is frantically manufacturing vaccines, schools plan to stay virtual through the end of the school year. In Iowa, a state without a mask mandate, kids can now go back to in-person learning full time. Meanwhile, in a school district in San Mateo County, California, that borders Silicon Valley, there’s no clear decision, and low-income and affluent parents are clashing over what to do.

It’s been a difficult journey. Since March 2020, when most schools closed, districts have been asked to adjust over and over—to new science about how the virus behaves, new policy recommendations, and the different needs of families, kids, teachers, and staff. 

Now, as President Biden forges ahead with his promise to reopen most schools within his first 100 days, the debates sound as complicated as ever—and offer a glimpse into many of the difficulties of reopening society at large. 

The limits of “guidance”

Schools across the country have looked to the Centers for Disease Control and Prevention for guidance on how to operate in the pandemic. In its latest recommendations, the CDC says a lot of the things we’ve heard all year: that everyone in a school building should wear masks, stay at least six feet apart, and wash their hands frequently. But schools have found that even when guidelines seem relatively straightforward on paper, they are often much harder—or downright impossible—to put into practice. 

“There’s a difference between public health mitigation policies when we think them through and when we write them down, and then when we try to implement them,” says Theresa Chapple, an epidemiologist in Washington, DC. “We see that there are barriers at play.”

Chapple points to a recent study by the CDC that looked at elementary schools in Georgia. After just 24 days of in-person learning, the researchers found nine clusters of covid-19 cases that could be linked back to the school. In all, about 45 students and teachers tested positive. How did that happen? Classroom layouts and class sizes meant physical distancing wasn’t possible, so students were less than three feet apart, separated only by plastic dividers. And though students and teachers mostly wore masks, students had to eat lunch in their classrooms. 

Researchers also note that teachers and students may have infected each other “during small group instruction sessions in which educators worked in close proximity to students.”

Following the CDC’s best practices might be inherently difficult, but it’s also complicated by the fact that they are just guidelines: states and other jurisdictions make the rules, and those often conflict with what the CDC says to do. Since February 15, Iowa schools have been required to offer fully in-person learning options that some school officials say make distancing impossible. Because the state no longer has a mask mandate, students aren’t required to wear masks in school.

Jurisdictions following all these different policies have one thing in common: although case totals have dipped since their peak in January, the vast majority of the US still has substantial or high community spread. A big takeaway from the CDC’s latest guidance is that high community transmission is linked to increased risk in schools. 

“If we are opening schools,” Chapple says, “we are saying that there’s an acceptable amount of spread that we will take in order for children to be educated.”

Meeting different needs

Some schools are trying alternative tactics that they hope will reduce the risks associated with in-person learning. 

In Sharon, a Massachusetts town just south of Boston where about 60% of public school students are still learning remotely, pods of students and staff are called down to a central location in their school building twice a week for voluntary covid-19 testing. One by one, children as young as five turn up, sanitize their hands, lower their mask, swab their own nostrils, and place their swab in a single test tube designated for their whole cohort. To make room for everyone, sometimes even the principal’s office becomes a testing site: one person in, one person out. The tubes are then sent to a lab for something called “pooled testing.”

Pooled testing allows a small group of samples to be tested for covid all at once. In Sharon, each tube holds anywhere from 5 to 25 samples. If the test for that small group comes back negative, the whole group is cleared. If it’s positive, each group member is tested until the positive individual is found. Meg Dussault, the district’s acting superintendent, says each pool test costs the school between $5 and $50, and over a third of Sharon Public Schools students and staff participate. 
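The two-stage procedure described above can be sketched in a few lines. This is a simplified illustration, not the lab's actual protocol. It assumes a perfectly accurate test, represents each sample as a boolean, and retests every member of a positive pool. The function name is hypothetical.

```python
from typing import List, Tuple


def pooled_test(samples: List[bool]) -> Tuple[int, List[int]]:
    """Return (tests_used, positions_of_positive_samples) for one pool.

    Each entry in `samples` is True if that person's swab is positive.
    Assumes a perfectly accurate test, which real assays are not.
    """
    tests_used = 1  # one test on the combined tube
    if not any(samples):
        return tests_used, []  # negative pool: the whole group is cleared

    positives = []
    for position, is_positive in enumerate(samples):
        tests_used += 1  # follow-up test for each group member
        if is_positive:
            positives.append(position)
    return tests_used, positives


# A pool of 10 with no infections needs a single test.
print(pooled_test([False] * 10))

# A pool of 10 with one infection needs the pool test plus 10 follow-ups.
print(pooled_test([False] * 4 + [True] + [False] * 5))
```

The saving comes from the negative pools: when infections are rare, most tubes clear an entire cohort with one test, and only the occasional positive tube triggers individual follow-ups.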

“I’ve seen the benefits of this,” she says. “And I believe it’s essential.”

Because schools are funded unequally and largely through taxes, access to resources is a common theme in discussions of school reopening. The state paid for Sharon’s pilot period, but not every district or school has the money or staffing to mount large-scale programs—and Dussault says the district will need to foot the bill for any testing once this program ends in April. It will also need to keep relying on the goodwill of the parent volunteers who wrangle students and swabs for testing each week. 

In the seven weeks since pooled testing began, Dussault says, only one batch has come back positive. It’s given her peace of mind.

And even with mitigation measures in place, there are stark demographic differences in opinion on reopening. A recent Pew study found that Black, Asian, and Hispanic adults are more likely to support holding off until teachers have access to vaccines. Those groups are also more likely than white adults to say that the risk of covid-19 transmission “should be given a lot of consideration” when weighing reopening.

Chapple worries that these parents’ concerns will be overlooked, or that funds for remote learning will dwindle because some districts decide to move to in-person learning.

She says: “School districts need to keep in mind that if they’re reopening but a small percentage of their minority students are coming back, what does that look like in terms of equity?” 

Balancing different needs can be particularly difficult in larger, more diverse districts, says Thomas Friedrich, a professor at the University of Wisconsin–Madison School of Veterinary Medicine, part of a team at the school’s AIDS Vaccine Research Laboratory that is sequencing virus samples from Wisconsin.

“The burden of disease and death has fallen very unequally, disproportionately affecting people of high socioeconomic vulnerability, people of color,” Friedrich says. People who have already seen a lot of loss and disease among loved ones, he says, may see more of a risk in quickly reopening schools and other places the virus could circulate.

After all, even the most rigorous efforts have holes and gaps—the human factor. This week, Biden announced his plan to prioritize vaccines for educators, something Dussault says is her number one priority even with the new information testing has brought.

“All of our collective energy is going toward trying to make sure that we have the vaccine for our staff,” Dussault says.

Strain on the system

There’s one more layer of complexity that concerns experts and school districts: the spread of variants. 

For example, the B.1.1.7 strain, originally discovered in the UK last year, is still relatively rare in the US, but experts estimate it could take over by the end of March. Scientists think it’s more transmissible and possibly more deadly. That could affect not just how schools reopen, but how long the reopening lasts. 

The US can look to Europe for how this played out: European countries tried in-person learning last fall but began closing schools as B.1.1.7 swept through the continent. By December, countries including the Netherlands and Germany had shut down their schools in the face of rising case numbers. The CDC says it may need to update school reopening guidelines in light of new information about variants. 

This task is made more difficult because tracking the spread of variants in the US is tough right now. Compared with other countries, it has very few labs doing this work, and while more funding will help, Friedrich says there will still be a gap.

“If B.1.1.7 becomes the dominant strain by the end of March, then even if $2 billion in additional funds for genomic surveillance is enacted tomorrow, we may not be able to ramp up capacity to … detect its displacement of other strains in real time across the United States,” he says. 

In the absence of clear-cut answers on what variants are spreading in the US, Chapple says it’s important for schools to monitor community spread as much as possible—and to plan carefully, to avoid spiraling into a new crisis. In fact, she recommends that when schools create their reopening plans, “they also create their closing plans.”

“What are they going to be looking for, to know if this is not working?” she says. It’s advice that could apply to all public places and institutions.

This story is part of the Pandemic Technology Project, supported by the Rockefeller Foundation.

Why reopening US schools is so complicated 2021/03/05 12:00

I asked an AI to tell me how beautiful I am

I first came across Qoves Studio through its popular YouTube channel, which offers polished videos like “Does the hairstyle make a pretty face?,” “What makes Timothée Chalamet attractive?,” and “How jaw alignment influences social perceptions” to millions of viewers.

Qoves started as a studio that would airbrush images for modeling agencies; now it is a “facial aesthetics consultancy” that promises answers to the “age-old question of what makes a face attractive.” Its website, which features chalky sketches of Parisian-looking women wearing lipstick and colorful hats, offers a range of services related to its plastic surgery consulting business: advice on beauty products, for example, and tips on how to enhance images using your computer. But its most compelling feature is the “facial assessment tool”: an AI-driven system that promises to look at images of your face to tell you how beautiful you are—or aren’t—and then tell you what you can do about it.

Last week, I decided to try it. Following the site’s instructions, I washed off the little makeup I was wearing and found a neutral wall brightened by a small window. I asked my boyfriend to take some close-up photos of my face at eye level. I tried hard to not smile. It was the opposite of glamorous.

I uploaded the most bearable photo, and within milliseconds Qoves returned a report card of the 10 “predicted flaws” on my face. Topping the list was a 0.7 probability of nasolabial folds, followed by a 0.69 probability of under-eye contour depression, and a 0.66 probability of periocular discoloration. In other words, it suspected (correctly) that I have dark bags under my eyes and smile lines, both of which register as problematic with the AI.

My results from the Qoves facial assessment tool

The report helpfully returned recommendations that I might take to address my flaws. First, a suggested article about smile lines informed me that they “may need injectable or surgical intervention.” If I wished, I could upgrade to a fuller report of surgical recommendations, written by doctors, at tiers of $75, $150, and $250. It also suggested five serums I could try first, each featuring a different skin-care ingredient—retinol, neuropeptides, hyaluronic acid, EGF, and TNS. I’d only heard of retinol. Before bed that night I looked through the ingredients of my face moisturizer to see what it contained.

I was intrigued. The tool had broken my appearance down into a list of bite-size issues—a laser trained on what it thought was wrong with my appearance.

Qoves, however, is just one small startup with 20 employees in an ocean of facial analysis companies and services. There is a growing industry of facial analysis tools driven by AI, each claiming to parse an image for characteristics such as emotions, age, or attractiveness. Companies working on such technologies are a darling of venture capital, and such algorithms are used in everything from online cosmetic sales to dating apps. These beauty scoring tools, readily available for purchase online, use face analysis and computer vision to evaluate things like symmetry, eye size, and nose shape to sort through and rank millions of pieces of visual content and surface the most attractive people.

These algorithms train a sort of machine gaze on photographs and videos, spitting out numerical values akin to credit ratings, where the highest scores can unlock the best online opportunities for likes, views, and matches. If that prospect isn’t concerning enough, the technology also exacerbates other problems, say experts. Most beauty scoring algorithms are littered with inaccuracies, ageism, and racism—and the proprietary nature of many of these systems means it is impossible to get insight into how they really work, how much they’re being used, or how they affect users.

Qoves recommended certain actions to fix my “predicted flaws”

“Mirror, mirror on the wall …”

Tests like the ones available from Qoves are all over the internet. One is run by the world’s largest open facial recognition platform, Face++. Its beauty scoring system was developed by the Chinese imaging company Megvii and, like Qoves, uses AI to examine your face. But instead of detailing what it sees in clinical language, it boils down its findings into a percentage grade of likely attractiveness. In fact, it returns two results: one score that predicts how men might respond to a picture, and the other that represents a female perspective. Using the service’s free demo and the same unglamorous photo, I quickly got my results. “Males generally think this person is more beautiful than 69.62% of persons” and “Females generally think this person is more beautiful than 73.877%”.

It was anticlimactic, but better than I had expected. A year into the pandemic, I can see the impact of stress, weight, and closed hair salons on my appearance. I retested the tool with two other photos of myself from Before, both of which I liked. My scores improved, nudging me close to the top 25%.

Beauty is often subjective and personal: our loved ones appear attractive to us when they are healthy and happy, and even when they are sad. Other times it’s a collective judgment: ranking systems like beauty pageants or magazine lists of the most beautiful people show how much we treat attractiveness like a prize. This assessment can also be ugly and uncomfortable: when I was a teenager, the boys in my high school would shout numbers from one to 10 at girls who walked past in the hallway. But there’s something eerie about a machine rating the beauty of somebody’s face—it’s just as unpleasant as shouts at school, but the mathematics of it feel disturbingly un-human.

My beauty score results from Face++

Under the hood

Although the concept of ranking people’s attractiveness is not new, the way these particular systems work is a relatively fresh development: Face++ released its beauty scoring feature in 2017.

When asked for detail on how the algorithm works, a spokesperson for Megvii would only say that it was “developed about three years ago in response to local market interest in entertainment-related apps.” The company’s website indicates that Chinese and Southeast Asian faces were used to train the system, which attracted 300,000 developers soon after it launched, but there is little other information.

A spokesperson for Megvii says that Face++ is an open-source platform and it cannot control the ways in which developers might use it, but the website suggests “cosmetic sales” and “matchmaking” as two potential applications.

The company’s known customers include the Chinese government’s surveillance system, which blankets the country with CCTV cameras, as well as Alibaba and Lenovo. Megvii recently filed for an IPO and is currently valued at $4 billion. According to reporting in the New York Times, it is one of three facial recognition companies that assisted the Chinese government in identifying citizens who might belong to the Uighur ethnic minority.

Qoves, meanwhile, was more forthcoming about how its face analysis works. The company, which is based in Australia, was founded as a photo retouching firm in 2019 but switched to a combination of AI-driven analysis and plastic surgery in 2020. Its system uses a common deep-learning technique known as a convolutional neural network, or CNN. The CNNs used to rate attractiveness typically train on a data set of hundreds of thousands of pictures that have already been manually scored for attractiveness by people. By looking at the pictures and the existing ratings, the system infers what factors people consider attractive so that it can make predictions when shown new images.
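To make the idea of learning from human-scored examples concrete, here is a minimal sketch. It is not Qoves's system and it is not a CNN: it swaps in a simple linear model fitted by gradient descent, and it assumes each face has already been reduced to two made-up measurements. The feature names, the data, and the scores are all hypothetical.

```python
# Hypothetical toy data: each "face" is reduced to two invented numeric
# features, paired with a score that human raters assigned to it.
# A real system would learn from raw pixels with a CNN, not two numbers.
TRAINING_SET = [
    # (eye_width, jaw_sharpness), human_score
    ((0.2, 0.1), 2.8),
    ((0.4, 0.3), 4.0),
    ((0.6, 0.5), 5.2),
    ((0.8, 0.9), 7.2),
    ((0.9, 0.4), 5.4),
    ((0.3, 0.8), 5.8),
]


def predict(weights, bias, features):
    """Weighted sum of the features plus a constant offset."""
    return bias + sum(w * x for w, x in zip(weights, features))


def fit(data, steps=20000, learning_rate=0.05):
    """Fit the weights by gradient descent on mean squared error."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, score in data:
            error = predict(weights, bias, features) - score
            grad_b += error
            for i, x in enumerate(features):
                grad_w[i] += error * x
        # Move each parameter a small step against its average gradient.
        bias -= learning_rate * grad_b / len(data)
        weights = [
            w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)
        ]
    return weights, bias


weights, bias = fit(TRAINING_SET)
print("learned weights:", [round(w, 3) for w in weights])
print("score for an unseen face:", round(predict(weights, bias, (0.5, 0.5)), 3))
```

Nobody tells the model which feature matters more. The invented raters happened to weight the second feature twice as heavily as the first, and the fitted weights recover that preference from the scores alone, then apply it to a face the raters never saw.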

Other big companies have invested in beauty AIs in recent years. They include the American cosmetics retailer Ulta Beauty, valued at $18 billion, which developed a skin analysis tool. Nvidia and Microsoft backed a “robot beauty pageant” in 2016, which challenged entrants to develop the best AI to determine attractiveness.

According to Evan Nisselson, a partner at LDV Capital, vision technology is still in its early stages, which creates “significant investment opportunities and upside.” LDV estimates that there will be 45 billion cameras in the world by next year, not including those used for manufacturing or logistics, and claims that visual data will be the key data input for AI systems in the near future. Nisselson says facial analysis is “a huge market” that will, over the course of time, involve “re-invention of the tech stack to get to the same or closer to or even better than a human’s eye.”

Qoves founder Shafee Hassan claims that beauty scoring might be even more widespread. He says that social media apps and platforms often use systems that scan people’s faces, score them for attractiveness, and give more attention to those who rank higher. “What we’re doing is doing something similar to Snapchat, Instagram, and TikTok,” he says, “but we’re making it more transparent.”

He adds: “They’re using the same neural network and they’re using the same techniques, but they’re not telling you that [they’ve] identified that your face has these nasolabial folds, it has a thin vermilion, it has all of these things, therefore [they’re] going to penalize you as being a less attractive individual.”

I reached out to a number of companies—including dating services and social media platforms—and asked whether beauty scoring is part of their recommendation algorithms. Instagram and Facebook have denied using such algorithms. TikTok and Snapchat declined to comment on the record.

“Big black boxes”

Recent advances in deep learning have dramatically changed the accuracy of beauty AIs. Before deep learning, facial analysis relied on feature engineering, where a scientific understanding of facial features would guide the AI. The formula for an attractive face, for example, might be set to reward wide eyes and a sharp jaw. “Imagine looking at a human face and seeing a Leonardo da Vinci–style depiction of all the proportions and the spacing between the eyes and that type of thing,” says Serge Belongie, a computer vision professor at Cornell University. With the advent of deep learning, “it became all about big data and big black boxes of neural net computation that just crunched on huge amounts of labeled data,” he says. “And at the end of the day, it works better than all the other stuff that we toiled on for decades.”
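As a rough illustration of the older feature-engineering approach, the sketch below scores a face with a fixed formula that rewards wide eyes and a sharp jaw. Everything in it is an assumption made for the example: the two measurements, their 0-to-1 scale, and the weights, which a designer would pick by hand rather than learn from data.

```python
from dataclasses import dataclass


@dataclass
class FaceMeasurements:
    """Hypothetical hand-measured proportions, each scaled from 0 to 1."""
    eye_width: float
    jaw_sharpness: float


# Weights chosen by a designer, not learned from data (illustrative values).
EYE_WEIGHT = 0.6
JAW_WEIGHT = 0.4


def engineered_score(face: FaceMeasurements) -> float:
    """Score a face with a fixed formula that rewards wide eyes and a sharp jaw."""
    return EYE_WEIGHT * face.eye_width + JAW_WEIGHT * face.jaw_sharpness


wide_eyed = FaceMeasurements(eye_width=0.9, jaw_sharpness=0.8)
narrow_eyed = FaceMeasurements(eye_width=0.4, jaw_sharpness=0.3)
print(engineered_score(wide_eyed) > engineered_score(narrow_eyed))
```

The appeal of a formula like this is that anyone can read it and see exactly what is being rewarded. Deep learning replaced such hand-set weights with ones learned from huge amounts of labeled data.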

But there’s a catch. “We’re still not totally sure how it works,” says Belongie. “Industry’s happy, but academia is a little puzzled.” Because beauty is highly subjective, the best a deep-learning beauty AI can do is to accurately regurgitate the preferences of the training data used to teach it. Even though some AI systems now rate attractiveness as accurately as the humans in a training set, that means the systems also display an equal amount of bias. And importantly, because the system is inscrutable, placing guardrails on the algorithm that might minimize the bias is a difficult and computationally costly task.

Belongie says there are applications of this sort of technology that are more anodyne and less problematic than scoring a face for attractiveness—a tool that can recommend the most beautiful photograph of a sunset on your phone, for example. But beauty scoring is different. “That, to me, is a very scary endeavor,” he says.

Even if training data and commercial uses are as unbiased and safe as possible, computer vision has technical limitations when it comes to human skin tones. The imaging chips found in cameras are preset to process a particular range of them. Historically “some skin tones were simply left off the table,” according to Belongie, “which means that the photos themselves may not have even been developed with certain skin tones in mind. Even the noblest of ambitions in terms of capturing all forms of human beauty may not have a chance because the brightness values aren’t even represented accurately.”

And these technical biases manifest as racism in commercial applications. In 2018, Lauren Rhue, an economist who is an assistant professor of information systems at the University of Maryland, College Park, was shopping for facial recognition tools that might aid her work studying digital platforms when she stumbled on this set of unusual products.

“I realized that there were scoring algorithms for beauty,” she says. “And I thought, that seems impossible. I mean, beauty is completely in the eye of the beholder. How can you train an algorithm to determine whether or not someone is beautiful?” Studying these algorithms soon became a new focus for her research.

Looking at how Face++ rated beauty, she found that the system consistently ranked darker-skinned women as less attractive than white women, and that faces with European-like features such as lighter hair and smaller noses scored higher than those with other features, regardless of how dark their skin was. The Eurocentric bias in the AI reflects the bias of the humans who scored the photos used to train the system, codifying and amplifying it—regardless of who is looking at the images. Chinese beauty standards, for example, prioritize lighter skin, wide eyes, and small noses.

A comparison of two photos of Beyoncé Knowles from Lauren Rhue’s research using Face++. Its AI predicted the image on the left would rate at 74.776% for men and 77.914% for women. The image on the right, meanwhile, scored 87.468% for men and 91.14% for women in its model.

Beauty scores, she says, are part of a disturbing dynamic between an already unhealthy beauty culture and the recommendation algorithms we come across every day online. When scores are used to decide whose posts get surfaced on social media platforms, for example, it reinforces the definition of what is deemed attractive and takes attention away from those who do not fit the machine’s strict ideal. “We’re narrowing the types of pictures that are available to everybody,” says Rhue.

It’s a vicious cycle: with more eyes on the content featuring attractive people, those images are able to gather higher engagement, so they are shown to still more people. Eventually, even when a high beauty score is not a direct reason a post is shown to you, it is an indirect factor.
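The cycle can be sketched with a toy simulation. This is not any platform's actual algorithm: it assumes a hypothetical feed that splits a fixed number of impressions among posts in proportion to the engagement each has already gathered, and it assumes each post has a fixed chance (`appeal`) that a viewer engages with it. All names and numbers are invented.

```python
def simulate_feed(appeal, rounds, impressions_per_round=1000.0):
    """Return each post's share of impressions at the start of every round.

    appeal[i] is a hypothetical chance that a viewer engages with post i.
    The feed never looks at appeal directly; it only sees past engagement.
    """
    engagement = [1.0] * len(appeal)  # every post starts on equal footing
    history = []
    for _ in range(rounds):
        total = sum(engagement)
        shares = [e / total for e in engagement]
        history.append(shares)
        for i, share in enumerate(shares):
            impressions = share * impressions_per_round
            # Expected number of engagements earned this round.
            engagement[i] += impressions * appeal[i]
    return history


history = simulate_feed(appeal=[0.10, 0.12], rounds=20)
for round_number in (0, 1, 5, 19):
    print(round_number, [round(s, 3) for s in history[round_number]])
```

Both posts begin with half of the impressions. The second post's slightly higher appeal earns it a little more engagement each round, which earns it a larger share of the next round's impressions, so its share keeps climbing even though the feed itself ranks only on engagement.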

In a study published in 2019, Rhue looked at how two algorithms, one for beauty scores and one for age predictions, affected people’s opinions. Participants were shown images of people and asked to evaluate the beauty and age of the subjects. Some of the participants were shown the score generated by an AI before giving their answer, while others were not shown the AI score at all. She found that participants without knowledge of the AI’s rating did not exhibit additional bias; however, knowing how the AI ranked people’s attractiveness made people give scores closer to the algorithmically generated result. Rhue calls this the “anchoring effect.”

“Recommendation algorithms are actually changing what our preferences are,” she says. “And the challenge from a technology perspective, of course, is to not narrow them too much. When it comes to beauty, we are seeing much more of a narrowing than I would have expected.”

At Qoves, Hassan says he has tried to tackle the issue of race head on. When conducting a detailed facial analysis report—the kind that clients pay for—his studio attempts to use data to categorize the face according to ethnicity so that everyone won’t simply be evaluated against a European ideal. “You can escape this Eurocentric bias just by becoming the best-looking version of yourself, the best-looking version of your ethnicity, the best-looking version of your race,” he says.

But Rhue says she worries about this kind of ethnic categorization being embedded deeper into our technological infrastructure. “The problem is, people are doing it, no matter how we look at it, and there’s no type of regulation or oversight,” she says. “If there is any type of strife, people will try to figure out who belongs in which category.”

“Let’s just say I’ve never seen a culturally sensitive beauty AI,” she says.

Recommendation systems don’t have to be designed to evaluate for attractiveness to end up doing it anyway. Last week, German broadcaster BR reported that one AI used to evaluate potential employees displayed biases based on appearance. And in March 2020, the parent company of TikTok, ByteDance, came under criticism for a memo that instructed content moderators to suppress videos that displayed “ugly facial looks,” people who were “chubby,” those with “a disformatted face” or “lack of front teeth,” “senior people with too many wrinkles,” and more. Twitter recently released an auto-cropping tool for photographs that appeared to prioritize white people. When tested on images of Barack Obama and Mitch McConnell, the auto-cropping AI consistently cropped out the former president.

“Who’s the fairest of them all?”

When I first spoke to Qoves founder Hassan by video call in January, he told me, “I’ve always believed that attractive people are a race of their own.”

When he started out in 2019, he says, his friends and family were very critical of his business venture. But Hassan believes he is helping people become the best possible version of themselves. He takes his inspiration from the 1997 movie Gattaca, which takes place in a “not-too-distant future” where genetic engineering is the default means of conception. Genetic discrimination segments society, and Ethan Hawke’s character, who was conceived naturally, has to steal the identity of a genetically perfected person in order to get around the system.

It’s usually considered a deeply dystopian film, but Hassan says it left an unexpected mark.

“It was very interesting to me, because the whole idea was that a person can determine their fate. The way they want to look is part of their fate,” he says. “With how far modern medicine has come, I didn’t see any reason for not evaluating your flaws, because there are ways you can fix it.”

His clients seem to agree. He claims that many of them are actors and actresses, and that the company receives anywhere from 50 to 100 orders for detailed medical reports each day—so many it is having trouble keeping up with demand. For Hassan, fighting the coming “classism” between those who are deemed beautiful and those society thinks are ugly is core to his mission. “What we’re trying to do is help the average person,” he told me.

There are other ways to “help the average person,” however. Every expert I spoke to said that disclosure and transparency from companies that use beauty scoring are paramount. Belongie believes that pressuring companies to reveal the workings of their recommendation algorithms will help keep users safe. “The company should own it and say yes, we are using facial beauty prediction and here’s the model. And here’s a representative gallery of faces that we think, based on your browsing behavior, you find attractive. And I think that the user should be aware of that and be able to interact with it.” He says that features like Facebook’s ad transparency tool are a good start, but “if the companies are not doing that, and they’re doing something like Face++ where they just casually assume we all agree on beauty … there may be power brokers who simply made that decision.”

Of course, the industry would first have to confess that it uses these scoring models at all, and the public would have to be aware of the issue. And though the past year has brought attention and criticism to facial recognition technology, several researchers I spoke with said that they were surprised by the lack of awareness about this use of it. Rhue says the most surprising thing about beauty scoring has been how few people are examining it as a topic. She is not persuaded that the technology should be developed at all.

As Hassan reviewed my own flaws with me, he assured me that a good moisturizer and some weight loss should do the trick. And though the aesthetics of my face won’t determine my career trajectory, he encouraged me to take my results seriously.

“Beauty,” he reminded me, “is a currency.”

I asked an AI to tell me how beautiful I am 2021/03/05 11:00
