Monday, September 25, 2017

"How Vector Space Mathematics Reveals the Hidden Sexism in Language"


"the relationships between words could be captured by simple vector algebra. For example, “man is to king as woman is to queen” or, using the common notation, “man : king :: woman : queen.” Other relationships quickly emerged too such as “sister : woman :: brother : man,” and so on. These relationships are known as word embeddings.

This data set is called Word2vec and is hugely powerful. Numerous researchers have begun to use it to better understand everything from machine translation to intelligent Web searching.

But today Tolga Bolukbasi at Boston University and a few pals from Microsoft Research say there is a problem with this database: it is blatantly sexist.

And they offer plenty of evidence to back up the claim. This comes from querying the vector space to find word embeddings. For example, it is possible to pose the question: “Paris : France :: Tokyo : x” and it will give you the answer x = Japan.

But ask the database “father : doctor :: mother : x” and it will say x = nurse. And the query “man : computer programmer :: woman : x” gives x = homemaker.

In other words, the word embeddings can be dreadfully sexist. This happens because any bias in the articles that make up the Word2vec corpus is inevitably captured in the geometry of the vector space. Bolukbasi and co despair at this. “One might have hoped that the Google News embedding would exhibit little gender bias because many of its authors are professional journalists,” they say."
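
To make the "a : b :: c : x" queries concrete, here is a minimal sketch of how such an analogy can be answered with vector arithmetic: compute b - a + c and pick the nearest remaining word by cosine similarity. The three-dimensional vectors, the axis meanings, and the function names are invented for illustration; real Word2vec vectors have hundreds of dimensions and are learned from text, not written by hand.

```python
import math

# Toy vectors, made up for this example (NOT real Word2vec values).
# Hypothetical axes: [gender, royalty, medicine]
VECTORS = {
    "man":    [1.0, 0.0, 0.0],
    "woman":  [-1.0, 0.0, 0.0],
    "king":   [1.0, 1.0, 0.0],
    "queen":  [-1.0, 1.0, 0.0],
    "doctor": [0.4, 0.0, 1.0],   # deliberately tilted toward "man"
    "nurse":  [-0.6, 0.0, 1.0],  # deliberately tilted toward "woman"
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(a, b, c, vectors=VECTORS):
    """Answer 'a : b :: c : x' with the word nearest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words, as analogy tools commonly do.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


royal = analogy("man", "king", "woman")
job = analogy("man", "doctor", "woman")
print(royal, job)
```

With these toy numbers the first query lands on "queen" and the second on "nurse". The second answer comes purely from the tilt built into the "doctor" and "nurse" vectors, which is the same mechanism the article describes: whatever bias sits in the geometry comes straight back out of the query.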


FB: The cool thing is that a team at BU found a way to fix it: an anti-bias transformation of the vector space.
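
One way to picture such a transformation is as a projection: estimate a "gender direction" from a word pair and subtract each neutral word's component along it. The sketch below shows only that single step, under stated assumptions. The vectors are invented, the function name `remove_direction` is hypothetical, and this is a simplification for intuition rather than the team's actual procedure.

```python
import math


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def remove_direction(v, direction):
    """Subtract from v its component along the given direction."""
    d = normalize(direction)
    scale = dot(v, d)
    return [x - scale * y for x, y in zip(v, d)]


# Toy vectors, made up for this example (NOT real Word2vec values).
man = [1.0, 0.0, 0.0]
woman = [-1.0, 0.0, 0.0]
programmer = [0.5, 0.0, 1.0]  # hypothetical: leans toward "man"

# Assumption: a single word pair is enough to define the gender direction.
gender_direction = [m - w for m, w in zip(man, woman)]

debiased = remove_direction(programmer, gender_direction)
print(debiased)
```

After the projection the toy "programmer" vector has no component along the gender direction, so it sits equally close to "man" and "woman". Anything the vector encodes along the other axes is left untouched.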


"That has important applications. Any bias contained in word embeddings like those from Word2vec is automatically passed on in any application that exploits it. One example is the work using embeddings to improve Web search results. If the phrase “computer programmer” is more closely associated with men than women, then a search for the term “computer programmer CVs” might rank men more highly than women. “Word embeddings not only reflect stereotypes but can also amplify them,” say Bolukbasi and co."
