Tuesday, January 31, 2017

Time To Upgrade Your Python: TLS v1.2 Will Soon Be Mandatory

If you're using an older Python without the most secure TLS implementation, this is the year to get serious about upgrading. Otherwise next June you may not be able to "pip install" packages from PyPI.

PyPI's maintainer Donald Stufft recently announced that python.org and related sites will begin disabling the old TLS versions 1.0 and 1.1. This change was imposed on us by our content delivery network, Fastly, in response to a change imposed on them by the Payment Card Industry Security Standards Council. In order to continue serving websites that take credit card payments, Fastly is required to disable the old, insecure versions of TLS. Since the PSF's servers, including PyPI, use Fastly, the old versions of TLS will be disabled as well.

Fastly wrote in October 2015,
"There have been serious and systemic security issues with earlier versions of TLS and its predecessor, SSL, including POODLE, Heartbleed, and LOGJAM. These threatened to break trust in fundamental methods of secure communication, exposing both you and your customers to breaches in security. The actions of the PCI DSS Council to maintain a high minimum bar are a step towards ensuring the security of all online business transactions."
There are two deadlines to upgrade your Python to a version with the latest TLS. The first comes soon, on April 30, 2017, when python.org sites without Extended Validation Certificates will stop supporting TLS 1.0 and 1.1. These sites include:

  • testpypi.python.org
  • test.pypi.org
  • files.pythonhosted.org

Warehouse, the future successor to PyPI, will also be affected by April's deadline, since Warehouse serves files from files.pythonhosted.org.

The more crucial deadline comes June 30, 2018. On that date all remaining python.org sites, including PyPI, will no longer support TLS 1.0 and 1.1. Older Python versions that do not implement TLSv1.2 will be unable to access PyPI.

See below for instructions to check your interpreter's TLS version. 1

Stufft writes, "I am going to see about possibly organizing some scheduled 'brown outs' of TLSv1.0 and TLSv1.1 prior to the cut off dates to try and help folks find places that will need updates. Any scheduled brownouts will be posted to status.python.org prior to happening."

Mac users should pay special attention. So far, the system Python shipped with macOS does not support TLSv1.2 in any macOS version; beginning next June these system Pythons will no longer be able to "pip install" packages. 2 Fortunately, it's easy to install a modern Python alongside the macOS system Python. Either download Python 3.6 from python.org, or for Python 2.7 with the latest TLS, use Homebrew. Both methods of installing Python will continue working after June 2018.

1. To check your Python interpreter's TLS version, install the "requests" package and run a command. For example, for Python 2:

python2 -m pip install --upgrade requests
python2 -c "import requests; print(requests.get('https://www.howsmyssl.com/a/check', verify=False).json()['tls_version'])"

Or Python 3:

python3 -m pip install --upgrade requests
python3 -c "import requests; print(requests.get('https://www.howsmyssl.com/a/check', verify=False).json()['tls_version'])"

If you see "TLS 1.2", your interpreter's TLS is up to date. If you see "TLS 1.0" or an error like "tlsv1 alert protocol version", then you must upgrade.
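As a quick local cross-check that needs no network access, the standard library's ssl module reports which OpenSSL build the interpreter is linked against. The sketch below assumes that TLS 1.2 support corresponds to OpenSSL 1.0.1 or later; the function name tls_support_summary is made up for illustration. The howsmyssl.com check above remains the more direct test, since it performs a real handshake.

```python
import ssl


def tls_support_summary():
    """Return the linked OpenSSL version string and a TLS 1.2 guess."""
    # ssl.OPENSSL_VERSION_INFO is a tuple of five integers describing the
    # OpenSSL library this interpreter was built against.
    # Assumption: TLS 1.2 is available from OpenSSL 1.0.1 onward.
    has_tls12 = ssl.OPENSSL_VERSION_INFO >= (1, 0, 1)
    return ssl.OPENSSL_VERSION, has_tls12


version, has_tls12 = tls_support_summary()
print(version)
print("TLS 1.2 likely available: %s" % has_tls12)
```

If the version string reports something like OpenSSL 0.9.8, the interpreter is almost certainly too old to reach PyPI after the deadline.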

2. The reason Python's TLS implementation is falling behind on macOS is that Python continues to use OpenSSL, which Apple has stopped updating on macOS. In the coming year, the Python Packaging Authority team will investigate porting pip to Apple's own "SecureTransport" library as an alternative to OpenSSL, which would allow old Python interpreters to use modern TLS with pip only. "This is a non-trivial amount of effort," writes Stufft, "I'm not sure it's going to get done."

In the long run, the Python interpreter itself would easily keep up with TLS versions, if it didn't use OpenSSL on platforms like macOS and Windows where OpenSSL is not shipped with the OS. Cory Benfield and Christian Heimes propose to redesign the standard library's TLS interfaces to make it easier to swap OpenSSL with platform-native TLS implementations.

Thursday, January 26, 2017

“I use Python to help build the kind of world I want to live in” - Shannon Turner, Community Service Award Winner Q4

Shannon Turner has been fascinated with programming since she was a child, thanks in part to her grandmother, who loved video games. Watching her grandmother play, Turner would draw pictures on paper and ask, “Wouldn’t this be cool if this were part of the game?” “Yes,” her grandmother would agree, “so you’ll need to get very good with computers if you want to make games of your own someday.”

As an adult, Turner’s interest in programming grew. She taught herself to program and attended tech events but it didn't feel right. She grew frustrated at being one of the only women in the room, being talked down to, and not taken seriously. Then, after speaking with other women at the events, she would realize that it wasn’t just her, “...[that] we all had this shared experience of being talked down to and not taken seriously. That's when I decided to start teaching other women what I'd taught myself.” This is what motivated Turner to start Hear Me Code (HMC), a group that offers free, beginner-friendly coding classes for women in the Washington DC area.

The Python Software Foundation awards the 4th Quarter 2016 Community Service Award to Shannon Turner for her work on Hear Me Code:
RESOLVED, that the Python Software Foundation award the 4th Quarter 2016 Community Service Award to Shannon Turner. Shannon is the founder of Hear Me Code, an organization offering free, beginner-friendly Python coding classes for over 2000 women in Washington, DC. She teaches all the classes with the help of women who have previously taken the classes. She empowers hundreds of women to code with Python by lowering barriers to entry. More than just a class where women learn to build websites, Hear Me Code focuses on leadership development, peer mentoring, and turning students into teachers.

Hear Me Code

What started in 2013 as an informal class with a few friends seated at the kitchen table has grown to a group of over 2000 in the Washington DC area. Turner developed the curriculum, slides, and exercises for five lessons, making incremental changes and improvements each time she taught. In the beginning she taught all the classes herself, but quickly realized she would do even more good by helping her students become instructors themselves. To date, over 100 women who started as students have moved on to be teaching assistants and teachers in the group. “In our first two years,” says Turner, “over two dozen women credited Hear Me Code with providing them the skills and experience they needed to land a job in tech.”

At HMC, programming courses are taught with Python. Why Python? As Turner was teaching herself to code with a variety of languages, Python felt different. “I still struggled to learn it,” Turner recalls, “but it was much more intuitive than other languages I'd used.”

Helping Female Developers

HMC student Sonia Hinson started taking classes in January 2014. Since then she has completed most of the courses and moved on to being a teaching assistant and teacher. She says Turner encourages her students to become teachers by “promoting the idea that you learn best from teaching someone else programming and working with your neighbors to solve bugs and problems in code.”

Student Haynes Bunn would agree. She values Turner’s ability to identify people’s strengths and encourage students to get involved in teaching positions. By doing this, says Haynes, Turner is not just teaching women to code, “she’s also helping them to teach, to help others, and to be leaders.”

Turner would rather spend her time helping women through her networks than seek praise for all of her work. That’s not what motivates her, says Stephanie Nguyen, “her impact in the Python community and the women who she has empowered to code are all examples that speak loudly for her.”

Other Projects

“Now, in addition to running Hear Me Code,” says Turner, “I use Python to help build the kind of world I want to live in.” Some of her other projects include a visualization of 500 schools that aren't taking campus sexual assault seriously and a searchable database of 6000 museums across the US.

Shannon Turner, CSA Winner Q4
Turner lives in Washington DC with her pet bird, who she keeps tabs on with her Raspberry Pi.

Thursday, January 19, 2017

Sheila Miguez and Will Kahn-Greene and their love for the Python Community: Community Service Award Quarter 3 2016 Winners

There are two elements which make Open Source function:
  1. Technology
  2. An active community.
The primary need for a successful community is a good contributor base. The contributors are our real heroes, who work persistently, on many (if not most) occasions without any financial benefits, just for the love of the community. The Python Community is blessed with many such heroes. The PSF's quarterly Community Service Award honors these heroes for their notable contributions and dedication to the Python ecosystem.

The PSF is delighted to give the 2016 Third Quarter Community Service Award to Sheila Miguez and Will Kahn-Greene:
Sheila Miguez and William Kahn-Greene for their monumental work in creating and supporting PyVideo over the years.

Community Service Award for 3rd Quarter

Will Kahn-Greene
Taken by Erik Rose, June 2016
The PSF funds a variety of conferences and workshops worldwide throughout the year to educate people about Python. But not everyone can attend all of these events. Two people, Sheila Miguez and Will Kahn-Greene, wanted to solve this problem for Pythonistas. Will came up with the brilliant idea of PyVideo, and Sheila later joined the mission. PyVideo works as a warehouse of videos from Python conferences, local user groups, screencasts, and tutorials.

The Dawn of PyVideo

Back in 2010, Will started a Python video site using the Miro Community video-sharing platform. PSF encouraged his work with an $1800 grant the following year. As Will recalls, "I was thinking there were a bunch of Python conferences putting out video, but they were hosting the videos in different places. Search engines weren't really finding it. It was hard to find things even if you knew where to look." He started with Miro Community, and later wrote a whole new codebase for generating the data and another codebase for the front end of the website.
With these tools he started PyVideo.org. "This new infrastructure let me build a site closer to what I was envisioning."

When Sheila joined the project, she contributed both to its technology and to helping the community find Python videos more easily. Originally, she intended to work only on the codebase, but she found herself dedicating a lot of time to adding content to the site.

What is PyVideo?
PyVideo is a repository that indexes and links to thousands of Python videos. It also provides a website pyvideo.org where people can browse the collection, which is more than 5000 Python videos and growing. The goals for PyVideo are:

  1.  Help people get to Python presentations easier and faster
  2.  Focus on education
  3.  Data collection and categorization.
  4.  Aim to give people an easy, enjoyable experience contributing to open source on PyVideo's GitHub repo

The Community Response

The Python community has welcomed Will and Sheila's noble endeavor enthusiastically. Pythonistas around the world never have to miss another recorded talk or tutorial. Sheila and Will worked relentlessly to give shape to their mammoth task. When I asked Will about the community’s response, he said, "Many learned Python by watching videos they found on pyvideo.org. Many had ideas for different things we could do with the site and other related projects. I talked with some folks who later contributed fixes and corrections to the data."

Will and Sheila worked on pyvideo.org only in their spare time, but it has become a major catalyst in the growth of the Python community worldwide. According to Will, pyvideo.org has additional, under-publicized benefits:

  • PyVideo is a primary source to survey diversity trends among Python conference speakers around the globe.
  • Since its videos are solely Python, it is easily searchable and provides more helpful results than other search engines.
  • It offers a preview of conferences: By watching past talks people can choose if they want to go.

PyVideo: The End?

With a blog post Will and Sheila announced the end of pyvideo.org. "I'm pretty tired of working on pyvideo and I haven't had the time or energy to do much on it in a while," Will wrote.

Though they were shutting down the site, they never wanted to lose or waste the valuable data. Will says, "In February 2016 or so, Sheila and I talked about the state of things and I just felt bad about everything. So we decided to focus on extracting the data from PyVideo and make sure that even if the site didn't live on, the data did. We wrote a bunch of tools and infrastructure for a community of people to add to, improve and otherwise work on the data. We figured someone could take the data and build a static site around it." Will wrote a blog post about the status of the pyvideo.org data and invited new maintainers to replace the site.

The end of pyvideo.org broke the hearts of many Pythonistas, including Paul Logston. Paul used to begin his mornings by watching a talk on the site, and he couldn't give up his morning entertainment. He resolved to replace pyvideo.org. To begin, he wrote a project called "PyTube" for storing videos. Though initially his interest was personal, its educational outreach aspect drove him to finish and publicize the project. Sheila remembers first noticing Paul when she saw his fork of the pyvideo data repository. She was excited to see that he'd already built a static site generator based on PyVideo data. She read Paul’s development philosophy and felt he was the right person to carry on the mission.

In May 2016, at PyCon US, there was a lightning talk on PyVideo and its situation. Paul met some fellow PyVideo followers who, just like him, did not want to lose the site. They decided to work on it during the Sprints. Though the structure of the website was ready, a lot remained to be done, such as gathering and curating data and designing the website. So the contributors divided the work among themselves.

Both Sheila and Will were committed to PyVideo's continued benefit for the community as they passed it to new hands. They were satisfied with Paul's work and transferred the domain to his control. Paul's PyTube code became the replacement for pyvideo.org on August 13, 2016.

Emergence of the Successor: The Present Status of PyVideo

Now the project has 30 contributors, with Paul serving as project lead. These contributors have kept the mission alive. Though PyVideo's aim is still the same, there is a difference in its technology. The old Django app is replaced with a static site generated with Pelican, and it now has a separate repository for data in JSON files. The team's current work emphasizes making the project hassle-free to maintain.

Listen to Paul talking about PyVideo and its future on Talk Python to Me.

The Wings to Fly

Every community needs someone with a vision for its future. Will and Sheila have shown us a path to grow and help the community. It is now our responsibility to take the new PyVideo further. Paul describes its purpose beautifully: "PyVideo's deeper 'why' is the desire to make educating oneself as easy, affordable, and available as possible." Contributors: please come and join the project, and give Paul and the team a hand to help move this great endeavor forward.

Wednesday, January 04, 2017

"Weapons of Math Destruction" by Cathy O'Neil

In a 1947 lecture on computing machinery, Alan Turing made a prediction: "The new machines will in no way replace thought, but rather they will increase the need for it."

Someday, he said, machines would think for themselves, but the computers of the near future would require human supervision to prevent malfunctions:
"The intention in constructing these machines in the first instance is to treat them as slaves, giving them only jobs which have been thought out in detail, jobs such that the user of the machine fully understands in principle what is going on all the time." 1
It is unclear now whether machines remain slaves, or if they are beginning to be masters. Machine-learning algorithms pervasively control the lives of Americans. We do not fully understand what they do, and when they malfunction they harm us, by reinforcing the unjust systems we already have. Usually unintentionally, they can make the lives of poor people and people of color worse.

In "Weapons of Math Destruction", Cathy O'Neil identifies such an algorithm as a "WMD" if it satisfies three criteria: it makes decisions of consequence for a large number of people, it is opaque and unaccountable, and it is destructive. I interviewed O'Neil to learn what data scientists should do to disarm these weapons.

Automated Injustice

Recidivism risk models are a striking example of algorithms that reinforce injustice. These algorithms purport to predict how likely a convict is to commit another crime in the next few years. The model described in O'Neil's book, called LSI-R, assesses offenders with 54 questions, then produces a risk score based on correlations between each offender's characteristics and the characteristics of recidivists and non-recidivists in a sample population of offenders.

Some of LSI-R's factors measure the offender's past behavior: Has she ever been expelled from school, or violated parole? But most factors probably aren't under the individual's control: Does she live in a high-crime neighborhood? Is she poor? And many factors are not under her control at all: Has a family member been convicted of any crimes? Did her parents raise her with a "rewarding" parenting style?

Studies of LSI-R show it gives worse scores to poor black people. Some of its questions directly measure poverty, and others (such as frequently changing residence) are proxies for poverty. LSI-R does not know the offender's race. It would be illegal to ask, but, O'Neil writes, "with the wealth of detail each prisoner provides, that single illegal question is almost superfluous." For example, it asks the offender's age when he was first involved with the police. O'Neil cites a 2013 New York Civil Liberties Union study that young black and Hispanic men were ten times as likely to be stopped by the New York City police, even though only a tiny fraction were doing anything criminal.

So far, the LSI-R does not automatically become destructive. If it is accurate, and used for benign choices like spending more time treating and counselling offenders with high risk scores, it could do some good. But in many states, judges use the LSI-R and models like it to decide how long the offender's sentence should be. This is not LSI-R's intended use, and it is certainly not accurate enough for it: a study this year found that LSI-R misclassified 41% of offenders. 2

Success, According to Whom?

O'Neil told me that whether an algorithm becomes a WMD depends on who defines success, and according to whom. "Over and over again, people act as if there's only one set of stakeholders."

When a recidivism risk model is used to sentence someone to a longer prison term, the sole stakeholder respected is law enforcement. "Law enforcement cares more about true positives, correctly identifying someone who will reoffend and putting them in jail for longer to keep them from committing another crime." But our society has a powerful interest in preventing false positives. Indeed, we were founded on a constitution that considered a false positive—that is, being punished for a crime you did not commit—to be extremely costly. Principles including the presumption of innocence, the requirement that guilt is proven beyond reasonable doubt, and so on, express our desire to avoid unjust punishment, even at the cost of some criminals being punished too little or going free.

However, this interest is ignored when an offender is punished for a bad LSI-R score. His total sentence accounts not only for the crime he committed, but also for future crimes he is thought likely to commit. Furthermore, he is punished for who he is: Being related to a criminal or being raised badly are circumstances of birth, but for many people facing sentencing, such circumstances are used to add years to their time behind bars.

Statistically Unsound

Cathy O'Neil says weapons of math destruction are usually caused by two failures. The first is when only one stakeholder's interests define success. LSI-R is an example of this. The other is a lack of actual science in data science. For these algorithms, she told me, "We actually don't have reasonable ways of checking to see whether something is working or not."

A New York City public school program begun in 2007 assessed teachers with a "value added model", which estimated how much a teacher affected each student's progress on standardized tests. To begin, the model forecast students' progress, given their neighborhood, family income, previous achievement, and so on. At the end of the year their actual progress was compared to the forecast, and the difference was attributed to the teacher's effectiveness. O'Neil tells the story of Tim Clifford, a public school teacher who scored only 6 out of 100 the first year he was assessed, then 96 out of 100 the next year. O'Neil writes, "Attempting to score a teacher's effectiveness by analyzing the test results of only twenty-five or thirty students is statistically unsound, even laughable." One analysis of the assessment showed that a quarter of teachers' scores swung by 40 points in a year. Another showed that, with such small samples, the margin of error made half of all teachers statistically indistinguishable.
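The small-sample problem can be illustrated with a toy simulation. This is only a sketch under invented assumptions, not the actual New York City model: each student's gap between actual and forecast progress is drawn as random noise with a standard deviation of 10 points, and the teacher's true effect is fixed at zero. The function name and all numbers are hypothetical.

```python
import random
import statistics


def simulated_teacher_score(true_effect, class_size, noise_sd, rng):
    """Average (actual - forecast) progress over one simulated class."""
    residuals = [true_effect + rng.gauss(0, noise_sd) for _ in range(class_size)]
    return statistics.mean(residuals)


rng = random.Random(0)  # fixed seed so the run is repeatable

# The same hypothetical teacher (true effect 0) scored over 1000 simulated years,
# once with a class of 25 students and once with an unrealistic 400 students.
small = [simulated_teacher_score(0.0, 25, 10.0, rng) for _ in range(1000)]
large = [simulated_teacher_score(0.0, 400, 10.0, rng) for _ in range(1000)]

spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)
print("year-to-year spread, 25 students:  %.2f" % spread_small)
print("year-to-year spread, 400 students: %.2f" % spread_large)
```

Under these assumptions the score of an unchanged teacher varies roughly four times as much with 25 students as with 400, because the spread of an average shrinks only with the square root of the sample size.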

Nevertheless, the score might determine if the teacher was given a bonus, or fired. Although its decision was probabilistic, appealing it required conclusive evidence. O'Neil points out that time and again, "the human victims of WMDs are held to a higher standard of evidence than the algorithms themselves." The model is math so it is presumed correct, and anyone who objects to its scores is suspect.

New York Governor Andrew Cuomo put a moratorium on these teacher evaluations in 2015. We are starting to see that some questions require too subtle an intelligence for our current algorithms to answer accurately. As Alan Turing said, "If a machine is expected to be infallible, it cannot also be intelligent."

Responsible Data Science

I asked Cathy O'Neil about the responsibilities of data scientists, both in their daily work and as reformers of their profession. Regarding daily work, O'Neil drew a sharp line: "I don't want data scientists to be de facto policy makers." Rather, their job is to explain to policy makers the moral tradeoffs of their choices. The same as any programmer gathers requirements before coding a solution, data scientists should gather requirements regarding the relative cost of different kinds of errors. Machine learning algorithms are always imperfect, but they can be tweaked for either more false positives or more false negatives. When the stakes are high, the choice between the two is a moral one. Data scientists must pose these questions frankly to policy makers, says O'Neil, and "translate moral decisions into code."
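The tradeoff between the two kinds of error can be made concrete with a small sketch. The scores and outcomes below are made-up data, and confusion_counts is a hypothetical helper rather than part of any real risk model; the point is only that moving a decision threshold trades one kind of error for the other.

```python
def confusion_counts(scores, labels, threshold):
    """Count false positives and false negatives at a given threshold."""
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return false_pos, false_neg


# Invented risk scores and true outcomes (True means the predicted event happened).
scores = [0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [False, False, True, False, True, False, True, True]

results = {}
for threshold in (0.35, 0.55, 0.75):
    results[threshold] = confusion_counts(scores, labels, threshold)
    print("threshold %.2f -> false positives %d, false negatives %d"
          % ((threshold,) + results[threshold]))
```

With this data, the lowest threshold wrongly flags two people and misses no one, while the highest misses two and wrongly flags no one. Which threshold is "right" is the moral question the code cannot answer.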

Tradeoffs in the private sector often pit corporate interests against human ones. This is especially dangerous to the poor because, as O'Neil writes, "The privileged are processed more by people, the masses by machines." She told me that when the boss asks for an algorithm that optimizes for profit, it is the data scientist's duty to mention that the algorithm should also consider fairness.

"Weapons of Math Destruction" tells us how to recognize a WMD once it is built. But how can we predict whether an algorithm will become a WMD? O'Neil told me, "The biggest warning sign is if you're choosing winners and losers, and if it's a big deal for losers to lose. If it's an important decision and it's a secret formula, then that's a set-up for a weapon of math destruction. The only other ingredient you need in that setup is actually making it destructive."


Cathy O'Neil says the top priority, for data scientists who want to disarm WMDs, is to develop tools for analyzing them. For example, any EU citizen harmed by an algorithmic decision may soon have the legal right to an explanation, but so far we lack the tools to provide one. We also need tools to measure disparate impact and unfairness. O'Neil says, "We need tools to decide whether an algorithm is being racist."
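As a sketch of what one very simple such tool might look like, the snippet below compares the rate of favorable decisions between two groups. The data is invented, the function names are hypothetical, and a single ratio like this is only one of many possible fairness measures; it is not a method described by O'Neil.

```python
def selection_rate(decisions):
    """Fraction of favorable decisions (1 = favorable, 0 = unfavorable)."""
    return sum(decisions) / float(len(decisions))


def disparate_impact_ratio(group_a, group_b):
    """Lower selection rate divided by the higher one (1.0 means parity)."""
    rate_a, rate_b = selection_rate(group_a), selection_rate(group_b)
    higher = max(rate_a, rate_b)
    if higher == 0:
        return 1.0  # nobody in either group received a favorable decision
    return min(rate_a, rate_b) / higher


# Invented decisions for two groups of ten people each.
group_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
group_b = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]

ratio = disparate_impact_ratio(group_a, group_b)
print("disparate impact ratio: %.2f" % ratio)
```

Here one group receives favorable decisions half as often as the other. A number like this does not explain why the gap exists, but it flags where to look.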

New data scientists should enter the field with better training in ethics. Curricula usually ignore questions of justice, as if the job of the data scientist were purely technical. Data-science contests like Kaggle also encourage this view, says O'Neil. "Kaggle has defined the success and the penalty function. The hard part of data science is everything that happens before Kaggle." O'Neil wants more case studies from the field, anonymized so students can learn from them how data science is really practiced. It would be an opportunity to ask: When an algorithm makes a mistake, who gets hurt?

If data scientists take responsibility for the effects of their work, says O'Neil, they will become activists. "I'm hoping the book, at the very least, gets people to acknowledge the power that they're wielding," she says, "and how it could be used for good or bad. The very first thing we have to realize is that well-intentioned people can make horrible mistakes."

1. Quoted in "Alan Turing: The Enigma", by Andrew Hodges. Princeton University Press.

2. See also ProPublica's analysis of bias in a similar recidivism model, COMPAS.