Julien Maudet & Franck Ngamkan
First look at this video of Oriel Ceballos selling art in the subway:
and this photo:
We had the idea for this project, which may sound a bit odd at first, when we met Oriel Ceballos at an art show in Harlem earlier this year. After a successful career as a professor, he decided to retire early and start a new life as a full-time artist, collector and curator. More information about him can be found on his Instagram page: https://www.instagram.com/or1el/?hl=fr
In order to broaden his audience, engage with people and sell his artworks, Oriel regularly - several days a week - goes to a subway station in either Manhattan or Brooklyn, displays his artworks and paints live.
What about his station selection process? He simply tries busy stations where there is enough space to display the pieces. This rang a bell in our data science-sensitized ears.
The next step for us was to gather data about subway stations relevant to our use case and to come up with a way for artists to select the subway station that best suits their requirements!
To go from inspiration to a data visualization task, we needed to gather datasets. But first, we needed to decide what kind of data we were looking for, and in particular which features of subway stations were relevant to our analysis.
Here are our hypotheses on the features that matter and that are not too complicated to access:
Is there a lot of traffic in the station?
How easy and convenient is the access to the station?
Are the people commuting here interested in Art?
Are they wealthy?
We are not stating that the best station is the station with most traffic, in a very arty place, and with very rich people. The point here is to be able to discuss those variables in order to find the best match between an artist and a subway station.
We got our data from different sources: MTA turnstile data, NYC Open Data platform, and by crunching some information manually.
In this task, we started from the great work by Henri Dwyer, that can be found here: https://henri.io/posts/new-york-subway-traffic-data-part-1.html. The original data is here: http://web.mta.info/developers/turnstile.html
In its final format, for each subway station, it includes the mean daily traffic, as well as the daily traffic for 6 consecutive days in April 2017.
In order to link a subway station with an appeal for art, we decided to count the number of art galleries in a radius of 0.2 miles around the subway station. This would be a great indicator of the artiness of the zone.
We found the data to do so on NYC OpenData: https://data.cityofnewyork.us/Recreation/New-York-City-Art-Galleries/tgyc-r5jh/data
The dataset lists all art galleries in New York, with information on each gallery such as its name, telephone number and GPS coordinates.
For each station, we needed the GPS coordinates, to link them to the art galleries and the neighborhood, the name of the station and the type of Entrance.
We found this information on NYC OpenData: https://data.cityofnewyork.us/Transportation/Subway-Stations/arq3-7z49/data
To gain insight into the wealth of commuters at each station, one indicator is the median income in the neighborhood of the station. We found this information on this website: http://statisticalatlas.com/county-subdivision/New-York/New-York-County/Manhattan/Household-Income#figure/neighborhood and scraped it manually.
We then collected the GPS coordinates of the centroid of each neighborhood using Google Maps, in order to link each station to the neighborhood of the centroid it is closest to.
We've had to go through the following preprocessing steps:
Prepare all GPS coordinates to the same format
Transform the traffic data in a usable format. From turnstile events to daily traffic, per station
Format all Subway station names - Entity recognition problem - Mapping between datasets
Deduplicate the subway stations, in the case where there are different entrances and entrance types
These steps are not included in the following report as they don't involve visualization, but they are available in the GitHub repository.
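As an illustration of the second step, here is a minimal sketch of how cumulative turnstile counter readings could be turned into daily traffic per station. It is not the code from the repository: the record layout, the sample values and the function name `daily_traffic` are assumptions made for this example. It relies only on the fact that MTA turnstile files report cumulative entry and exit counters per turnstile.

```python
from collections import defaultdict

# Hypothetical sample readings:
# (station, turnstile_id, date, time, cumulative_entries, cumulative_exits)
readings = [
    ("CANAL ST", "A", "2017-04-08", "00:00", 1000, 500),
    ("CANAL ST", "A", "2017-04-08", "12:00", 1300, 650),
    ("CANAL ST", "A", "2017-04-09", "00:00", 1600, 900),
    ("CANAL ST", "B", "2017-04-08", "00:00", 200, 100),
    ("CANAL ST", "B", "2017-04-09", "00:00", 260, 130),
]

def daily_traffic(readings):
    """Return {(station, date): entries + exits} from cumulative counter readings."""
    # Group the readings of each turnstile so they can be ordered in time
    by_turnstile = defaultdict(list)
    for station, turnstile, date, time, entries, exits in readings:
        by_turnstile[(station, turnstile)].append((date, time, entries, exits))

    traffic = defaultdict(int)
    for (station, _), rows in by_turnstile.items():
        rows.sort()  # ISO dates and HH:MM times sort chronologically as strings
        # Each interval between two consecutive readings is credited
        # to the day on which the interval starts
        for (date, _, e0, x0), (_, _, e1, x1) in zip(rows, rows[1:]):
            delta = (e1 - e0) + (x1 - x0)
            if delta >= 0:  # skip counter resets; real data needs more care
                traffic[(station, date)] += delta
    return dict(traffic)

traffic = daily_traffic(readings)
print(traffic)
```

Summing over turnstiles before summing over days keeps a reset on one turnstile from affecting the others.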
Below is a snapshot of the different datasets in their preprocessed format, before merging.
from IPython.display import HTML
HTML('''<script>
code_show=true;
function code_toggle() {
if (code_show){
$('div.input').hide();
} else {
$('div.input').show();
}
code_show = !code_show
}
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from sklearn.preprocessing import StandardScaler
import math
import operator
from geopy.distance import vincenty
import distance
import json
import colorlover as cl
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
nei = pd.read_csv('data/neighborhoods.csv', sep=';')
nei['X'] = nei['X'].apply(lambda x: "-{:.6f}".format(x))
nei['Y'] = nei['Y'].apply(lambda x: "{:.6f}".format(x))
meancome = np.mean(nei['Median Income'])
nei.index = nei['Neighborhood']
nei.head()
gal = pd.read_csv('data/galleries_untouched.csv', sep=',')
gal['Y'] = gal['the_geom'].apply(lambda x: x.split(' ')[2].strip(')')[:9])
gal['X'] = gal['the_geom'].apply(lambda x: x.split(' ')[1].strip('(')[:10])
del gal['the_geom']
del gal['ADDRESS2']
gal = gal[['NAME','Y','X','TEL','URL','ADDRESS1','CITY','ZIP']]
gal.head()
sta = pd.read_csv('data/stations_coord_entrance.csv', sep=';')
types={'Stair':0,'Door':1,'Walkway':2,'Ramp': 3, 'Easement':4, 'Escalator':5, 'Elevator': 6}
sta['Entrance'] = sta['Entrance'].map(types)
types_rev={'0':'Stair','1':'Door','2':'Walkway','3':'Ramp','4':'Easement','5':'Escalator','6':'Elevator'}
types_col={'Stair':'r','Door':'g','Walkway':'b','Ramp': 'y', 'Easement':'b', 'Escalator':'r', 'Elevator': 'g'}
ent = {nam:[] for nam in sta['Name']}
for k in range(len(sta)):
    ent[sta['Name'][k]].append(sta['Entrance'][k])
# For each station, keep the entrance type with the highest code (Elevator ranks highest)
ent = {name: types_rev[str(max(ent[name]))] for name in ent.keys()}
del sta['Entrance']
sta = sta.groupby('Name').mean()
sta['Name'] = sta.index
sta.index = range(len(sta))
sta['Entrance'] = sta['Name'].apply(lambda x: ent[x])
sta['X'] = sta['X'].apply(lambda x: "{:.6f}".format(x))
sta['Y'] = sta['Y'].apply(lambda x: "{:.6f}".format(x))
sta = sta[['Name','Y','X','Entrance']]
sta.head()
For each station, we have the traffic (sum of entries and exits per day) for 6 consecutive days:
April 8th 2017
April 9th 2017
April 10th 2017
April 11th 2017
April 12th 2017
April 13th 2017
sta_traffic = pd.read_csv('data/station_traffic.csv')
# Map those station names to the ones in the DataFrame sta, which has all information on stations
sta_traffic['Name'] = sta_traffic['Name'].apply(lambda x: x.lower())
with open('data/map_station_names.json','r') as f:
    map_names = json.load(f)
# .items() works on both Python 2 and Python 3 (iteritems() is Python 2 only)
map_names_rev = {str(bad):str(good) for good,bad in map_names.items()}
map_names_normal = {orig_name: orig_name.lower() for orig_name in sta['Name']}
map_names_normal_rev = {low:orig for orig, low in map_names_normal.items()}
map_final = {bad: map_names_normal_rev[good] for bad, good in map_names_rev.items()}
sta_traffic['Name'] = sta_traffic['Name'].map(map_final)
sta_traffic = sta_traffic.dropna()
sta_traffic['traffic_mean'] = (sta_traffic['traffic_april8']+sta_traffic['traffic_april9']+sta_traffic['traffic_april10']+sta_traffic['traffic_april11']+sta_traffic['traffic_april12']+sta_traffic['traffic_april13'])/6
sta_traffic.head()
The final dataset is a dataset where for each subway station, we have all required information:
Name
Number of art galleries
Median Income
Traffic
Entrance Type
Basically, we started with the dataset where each subway station is described and went through the following steps:
Compute the number of art galleries within 0.2 miles of the station, using GPS Coordinates of the galleries
Assign a neighborhood and a median income, using GPS Coordinates of the neighborhood centroids
Join the resulting dataset with the dataset containing the traffic data, using a mapping between the two different formats of station names, which we computed using stemming and Levenshtein distance
Note that some stations don't have a neighborhood, as we focused on neighborhoods in and near Manhattan. We only kept the stations that do have one. We also filtered out the stations with two or fewer galleries nearby, in order to make the visualizations more readable and because those stations are not very interesting, based on our assumptions above.
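To give an idea of how the name mapping in the last step can work, here is a minimal sketch of matching station names with Levenshtein distance. It is not the code we used: the helpers `normalize` and `closest_name`, the small replacement table (a crude stand-in for real stemming) and the sample names are assumptions made for this example.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]

def normalize(name):
    """Lowercase and shorten common suffixes; a crude stand-in for stemming."""
    replacements = {"street": "st", "avenue": "av", "ave": "av"}
    tokens = name.lower().replace("-", " ").replace(".", "").split()
    return " ".join(replacements.get(t, t) for t in tokens)

def closest_name(name, candidates):
    """Return the candidate whose normalized form is nearest to `name`."""
    target = normalize(name)
    return min(candidates, key=lambda c: levenshtein(target, normalize(c)))

# Hypothetical reference names, in the format of the stations dataset
reference = ["Canal St", "Spring St", "Lexington Av"]
match = closest_name("LEXINGTON AVE", reference)
print(match)
```

In practice the closest match is not always the right one, so a mapping built this way is worth checking by hand before it is used for the join.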
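# This function computes the distance, in miles, between two GPS points.
# X is the longitude and Y the latitude; geopy expects (latitude, longitude), so Y goes first.
# Note: vincenty was removed in geopy 2.0; geopy.distance.geodesic is its replacement.
def dist(x1, x2, y1, y2):
    return vincenty((y1, x1), (y2, x2)).miles

sta_nbgal = {sta_name: 0 for sta_name in sta['Name']}
sta_neighborhood = {sta_name: '' for sta_name in sta['Name']}
for k in range(len(sta)):
    x1 = sta['X'][k]
    y1 = sta['Y'][k]
    station = sta['Name'][k]
    # Count the galleries within 0.2 miles of the station
    for j in range(len(gal)):
        dz = dist(x1, gal['X'][j], y1, gal['Y'][j])
        if dz < 0.2:
            sta_nbgal[station] += 1
    # Assign the closest neighborhood centroid, provided it lies within 4 miles
    min_dz = 999
    for l in range(len(nei)):
        # nei is indexed by neighborhood name, so use positional access
        dz = dist(x1, nei['X'].iloc[l], y1, nei['Y'].iloc[l])
        if dz < min_dz and dz < 4:
            min_dz = dz
            sta_neighborhood[station] = nei['Neighborhood'].iloc[l]
sta['Nb_gal'] = sta['Name'].map(sta_nbgal)
sta['Neighborhood'] = sta['Name'].map(sta_neighborhood)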
Below is a snapshot of the dataset of the Subway stations, before merging the traffic data, but after computing the number of galleries and joining with the neighborhood dataset
sta = sta.join(nei, on='Neighborhood', how='left', lsuffix='', rsuffix='_nei')
sta = sta[['Name','Y','X','Entrance','Nb_gal','Neighborhood','Median Income']]
sta.head()
sta_traffic.index = sta_traffic['Name']
sta = sta.join(sta_traffic, on='Name', how='left', lsuffix='', rsuffix='_nei')
sta = sta.dropna()
del sta['Name_nei']
sta.index = sta['Name']
sta = sta[sta['Nb_gal']>2]
Here is a snapshot of the final dataset
sta.head()
import plotly.plotly as py
import cufflinks as cf
import plotly.graph_objs as go
from plotly.graph_objs import *
import plotly
plotly.offline.init_notebook_mode()
cf.set_config_file(offline=False, world_readable=True, theme='ggplot')
Type of Entrance
Density in Art Galleries
Median Income of the neighborhood
Traffic
data = [go.Bar(
x=sta['Entrance'].value_counts().index,
y=list(sta['Entrance'].value_counts()),
marker=dict(color='rgb(62,57,193)')
)]
layout = go.Layout(
title="Type of Entrance - Frequency"
)
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig)
Most stations have stairs, which can be a problem if the artworks are very heavy or large for instance. An artist may want to select a station that has an elevator.
data = [go.Bar(
y=sta['Nb_gal'].sort_values(ascending=False)[:30].index[::-1],
x=sta['Nb_gal'].sort_values(ascending=False)[:30][::-1],
marker=dict(color='rgb(62,57,193)'),
text= ['galleries around<br><b>'+sta['Neighborhood'][sta['Nb_gal'].sort_values(ascending=False)[:30].index[::-1][k]]+'</b>' for k in range(30)],
orientation = 'h'
)]
layout = go.Layout(
autosize=False,
width=1000,
height=700,
margin=go.Margin(
l=170,
r=30,
b=100,
t=100,
pad=4
),
title="Number of galleries around each Subway Station",
yaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showticklabels=True,
ticks='outside',
autotick=False,
),
xaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showgrid=True,
showticklabels=True,
ticks='outside',
title='Number of art galleries within 0.2 miles'
),
bargap=0.4
)
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig)
We find reassuring insights: 68th St-Hunter College and Lexington Av, for instance, are on the Upper East Side, in the museum area (MET, Guggenheim...), where the art gallery density is indeed very high.
Spring St, Canal St and Prince St are in very arty areas of downtown Manhattan, so it is reassuring to find them near the top of our ranking.
data = [go.Bar(
y=sta['traffic_mean'].sort_values(ascending=False)[:30].index[::-1],
x=sta['traffic_mean'].sort_values(ascending=False)[:30][::-1],
marker=dict(color='rgb(62,57,193)'),
text= ['<b>'+sta['Neighborhood'][sta['traffic_mean'].sort_values(ascending=False)[:30].index[::-1][k]]+'</b>' for k in range(30)],
orientation = 'h'
)]
layout = go.Layout(
autosize=False,
width=900,
height=700,
margin=go.Margin(
l=210,
r=0,
b=100,
t=100,
pad=4
),
title="Mean daily traffic for each Subway Station",
yaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showticklabels=True,
ticks='outside',
autotick=False,
),
xaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showgrid=True,
showticklabels=True,
ticks='outside',
title='Mean daily traffic'
),
bargap=0.4
)
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig)
There is not much surprise in the traffic either. Grand Central, 34th St and 42nd St are known to be the main train stations in Manhattan, with massive daily traffic. We still see that the slope is steep at the top of the ranking, meaning that a few stations have massive traffic while many stations have more homogeneous traffic, around 50k people per day.
data = [go.Bar(
y=sta['Median Income'].sort_values(ascending=False)[:30].index[::-1],
x=sta['Median Income'].sort_values(ascending=False)[:30][::-1],
marker=dict(color='rgb(62,57,193)'),
text= ['<b>'+sta['Neighborhood'][sta['Median Income'].sort_values(ascending=False)[:30].index[::-1][k]]+'</b>' for k in range(30)],
orientation = 'h'
)]
layout = go.Layout(
autosize=False,
width=900,
height=700,
margin=go.Margin(
l=210,
r=0,
b=100,
t=100,
pad=4
),
title="Median Income around each Subway Station (k$)",
yaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showticklabels=True,
ticks='outside',
autotick=False,
),
xaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showgrid=True,
showticklabels=True,
ticks='outside',
title='Median Income'
),
bargap=0.4
)
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig)
The horizontal bars here are obviously grouped by neighborhood. This is due to the way we computed the median income for each station, as we assigned the income of its neighborhood to each station.
We find the stations in the richest neighborhoods on top (Chambers St in Tribeca, Whitehall St in Battery Park, Lexington Av in N. Sutton Area...)
In the following charts, we combine the different variables, in order to gain insights on the subway stations to pick.
In this first scatter plot,
dot: a subway station
y coordinate: number of art galleries around
x coordinate: mean traffic
data = [go.Scatter(
x = sta['traffic_mean'],
y = sta['Nb_gal'],
text = sta['Name'],
mode = 'markers',
name = 'Subway station',
marker = dict(
size = 13,
color = 'rgb(62,57,193)'
)
)]
layout = go.Layout(
hovermode="closest",
autosize=False,
width=1000,
height=700,
margin=go.Margin(
l=50,
r=50,
b=100,
t=100,
pad=4
),
title="Scatter plot of the Subway Stations",
yaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showticklabels=True,
ticks='outside',
title='Density of art Galleries'
),
xaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showticklabels=True,
ticks='outside',
title='Mean Traffic'
),
bargap=0.4
)
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig)
# IPython notebook
# py.iplot(fig, filename='pandas/multiple-scatter')
Based on this plot, we can combine our observations on the traffic and the density of art galleries. The subway stations are typically polarized along the axes: no station has both massive traffic and a large number of galleries. There is mostly a trade-off between an arty station and a station with a lot of traffic, except for a few stations such as Canal St or West 4. We will go deeper into the analysis in the charts below.
However, we have absolute values for both features, which is not optimal for our task of selecting the subway station that would match with an artist, based on their criteria.
By scaling the variables Number of art galleries and Traffic (subtracting the mean and dividing by the standard deviation), we will be able to see which stations are more arty than average, and which ones have more traffic than average!
In the following plot, we have scaled the data. As a consequence, the dots in the upper right part of the graph have more traffic and more galleries than average, those in the upper left part have more galleries but less traffic, those in the lower left part have fewer galleries and less traffic, and those in the lower right part have more traffic but fewer galleries.
These four groups have been assigned different colors
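In symbols, the scaling and the four groups can be written as follows. The notation is ours and introduced only for this illustration: for station i, t_i is the mean daily traffic and g_i the number of galleries, with means μ and standard deviations σ taken over all stations kept in the final dataset.

```latex
z_i^{\text{traffic}} = \frac{t_i - \mu_t}{\sigma_t},
\qquad
z_i^{\text{art}} = \frac{g_i - \mu_g}{\sigma_g}

\text{group}(i) =
\begin{cases}
\text{Low art / Low traffic}   & z_i^{\text{art}} < 0,\; z_i^{\text{traffic}} < 0 \\
\text{High art / Low traffic}  & z_i^{\text{art}} \ge 0,\; z_i^{\text{traffic}} < 0 \\
\text{Low art / High traffic}  & z_i^{\text{art}} < 0,\; z_i^{\text{traffic}} \ge 0 \\
\text{High art / High traffic} & z_i^{\text{art}} \ge 0,\; z_i^{\text{traffic}} \ge 0
\end{cases}
```

A scaled value of 0 corresponds to the mean, so a station exactly at the mean falls in the "high" group for that variable.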
# Standardize both variables: subtract the mean, divide by the standard deviation
sta['traffic_scaled'] = (sta['traffic_mean'] - np.mean(sta['traffic_mean'])) / np.std(sta['traffic_mean'])
sta['nbgal_scaled'] = (sta['Nb_gal'] - np.mean(sta['Nb_gal'])) / np.std(sta['Nb_gal'])
# Quadrants: 1 = low traffic / low art, 2 = low traffic / high art,
#            3 = high traffic / low art, 4 = high traffic / high art
low_traffic = sta['traffic_scaled'] < 0
low_art = sta['nbgal_scaled'] < 0
sta['quad'] = np.where(low_traffic, np.where(low_art, 1, 2), np.where(low_art, 3, 4))
data = [go.Scatter(
x = sta[sta['quad']==1]['traffic_scaled'],y = sta[sta['quad']==1]['nbgal_scaled'],
text = sta[sta['quad']==1]['Name'],mode = 'markers',
name = 'Low art / Low traffic',marker = dict(size = 13,color = 'rgb(255,215,0)',opacity=0.9)),
go.Scatter(
x = sta[sta['quad']==2]['traffic_scaled'],y = sta[sta['quad']==2]['nbgal_scaled'],
text = sta[sta['quad']==2]['Name'],mode = 'markers',
name = 'High art / Low traffic',marker = dict(size = 13,color = 'rgb(34,139,34)',opacity=0.9)),
go.Scatter(
x = sta[sta['quad']==3]['traffic_scaled'],y = sta[sta['quad']==3]['nbgal_scaled'],
text = sta[sta['quad']==3]['Name'],mode = 'markers',
name = 'Low art / High traffic',marker = dict(size = 13,color = 'rgb(240,128,128)',opacity=0.9)),
go.Scatter(
x = sta[sta['quad']==4]['traffic_scaled'],y = sta[sta['quad']==4]['nbgal_scaled'],
text = sta[sta['quad']==4]['Name'],mode = 'markers',
name = 'High art / High traffic',marker = dict(size = 13,color = 'rgb(100,149,237)',opacity=0.9)),
]
layout = go.Layout(
hovermode="closest",
autosize=False,
width=1000,
height=700,
margin=go.Margin(
l=50,
r=50,
b=100,
t=100,
pad=4
),
title="Scatter plot of the Subway Stations",
yaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showticklabels=True,
ticks='outside',
title='Density of art Galleries'
),
xaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showgrid=True,
showticklabels=True,
ticks='outside',
title='Mean Traffic'
),
showlegend=True
)
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig)
As described above and in the legend of the plot, we have defined four groups of subway stations.
Below is an example of how to read this plot:
An artist who doesn't want to engage with arty people but wants to reach the largest possible audience would probably pick a station in the red group, such as 42nd St.
An artist who wants a lot of traffic as well as engaging with arty people would try a station in the blue group, such as West 4 or Canal St.
An artist who wants to be in a rather calm station - low traffic - but with arty people may want to go to Spring St or Lexington Av!
And an artist who wants neither traffic nor arty commuters may pick a station in the yellow group!
Yet this plot says nothing about the median income of the people living near the station, who are thus likely to commute through it. We will add this feature in the next plot.
In this plot, the size of each dot reflects the median income of the neighborhood the station is located in: the larger the dot, the higher the income bracket.
# Map the median income (in k$) to a marker size.
# Chained upper bounds so that boundary values (50, 100, 150) fall in the right bin.
def bin_income(x):
    if x < 50:
        return 11
    elif x < 100:
        return 15
    elif x < 150:
        return 20
    else:
        return 27

sta['income_binned'] = sta['Median Income'].apply(bin_income)
data = [go.Scatter(
x = sta[sta['quad']==1]['traffic_scaled'],y = sta[sta['quad']==1]['nbgal_scaled'],
text = sta[sta['quad']==1]['Name'],mode = 'markers',name = 'Low art / Low traffic',
marker = dict(size = sta[sta['quad']==1]['income_binned'],color = 'rgb(255,215,0)',opacity=0.9)),
go.Scatter(
x = sta[sta['quad']==2]['traffic_scaled'],y = sta[sta['quad']==2]['nbgal_scaled'],
text = sta[sta['quad']==2]['Name'],mode = 'markers',name = 'High art / Low traffic',
marker = dict(size = sta[sta['quad']==2]['income_binned'],color = 'rgb(34,139,34)',opacity=0.9)),
go.Scatter(
x = sta[sta['quad']==3]['traffic_scaled'],y = sta[sta['quad']==3]['nbgal_scaled'],
text = sta[sta['quad']==3]['Name'],mode = 'markers',name = 'Low art / High traffic',
marker = dict(size = sta[sta['quad']==3]['income_binned'],color = 'rgb(240,128,128)',opacity=0.9)),
go.Scatter(
x = sta[sta['quad']==4]['traffic_scaled'],y = sta[sta['quad']==4]['nbgal_scaled'],
text = sta[sta['quad']==4]['Name'],mode = 'markers',name = 'High art / High traffic',
marker = dict(size = sta[sta['quad']==4]['income_binned'],color = 'rgb(100,149,237)',opacity=0.9)),
]
layout = go.Layout(
hovermode="closest",
autosize=False,
width=1000,
height=700,
margin=go.Margin(
l=50,
r=50,
b=100,
t=100,
pad=4
),
title="Scatter plot of the Subway Stations",
yaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showticklabels=True,
ticks='outside',
title='Density of art Galleries'
),
xaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showgrid=True,
showticklabels=True,
ticks='outside',
title='Mean Traffic'
),
showlegend=True
)
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig)
We can now complete our analysis!
Let's say the artist really wants to sell his artworks and not only display them.
If he finds himself in the blue group, he may prefer West 4th over Canal St.
If he chose the red group, he may stay away from Bedford Ave and go to 59th Columbus Circle or 42nd st.
If the green group took his preference, Lexington Av would be a better option than Prince St for instance!
We replace the feature 'Income' with the feature 'Entrance' to keep the graph readable. As there are 5 types of Entrance, we assign a color to each type, as described in the legend.
data = [go.Scatter(
x = sta[sta['Entrance']=='Stair']['traffic_scaled'],y = sta[sta['Entrance']=='Stair']['nbgal_scaled'],
text = sta[sta['Entrance']=='Stair']['Name'],mode = 'markers',name = 'Stair',
marker = dict(size = 15,color = 'rgb(255,192,203)',opacity=0.9)),
go.Scatter(
x = sta[sta['Entrance']=='Door']['traffic_scaled'],y = sta[sta['Entrance']=='Door']['nbgal_scaled'],
text = sta[sta['Entrance']=='Door']['Name'],mode = 'markers',name = 'Door',
marker = dict(size = 15,color = 'rgb(0,0,205)',opacity=0.9)),
go.Scatter(
x = sta[sta['Entrance']=='Easement']['traffic_scaled'],y = sta[sta['Entrance']=='Easement']['nbgal_scaled'],
text = sta[sta['Entrance']=='Easement']['Name'],mode = 'markers',name = 'Easement',
marker = dict(size = 15,color = 'rgb(138,43,226)',opacity=0.9)),
go.Scatter(
x = sta[sta['Entrance']=='Escalator']['traffic_scaled'],y = sta[sta['Entrance']=='Escalator']['nbgal_scaled'],
text = sta[sta['Entrance']=='Escalator']['Name'],mode = 'markers',name = 'Escalator',
marker = dict(size = 15,color = 'rgb(139,0,139)',opacity=0.9)),
go.Scatter(
x = sta[sta['Entrance']=='Elevator']['traffic_scaled'],y = sta[sta['Entrance']=='Elevator']['nbgal_scaled'],
text = sta[sta['Entrance']=='Elevator']['Name'],mode = 'markers',name = 'Elevator',
marker = dict(size = 15,color = 'rgb(255,20,147)',opacity=0.9))
]
layout = go.Layout(
hovermode="closest",
autosize=False,
width=1000,
height=700,
margin=go.Margin(
l=50,
r=50,
b=100,
t=100,
pad=4
),
title="Scatter plot of the Subway Stations - Entrance Type",
yaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showticklabels=True,
ticks='outside',
title='Density of art Galleries'
),
xaxis=dict(
titlefont=dict(
family='Arial, sans-serif',
size=18,
color='lightgrey'
),
showgrid=True,
showticklabels=True,
ticks='outside',
title='Mean Traffic'
),
showlegend=True
)
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig)
This artist should look for the Light Pink dots, in the upper left corner, as he doesn't particularly want to sell.
We would recommend him to try Lexington Av or 34 St Hudson Yards!
In the near future, we plan to build a recommendation algorithm that would recommend a subway station to an artist, based on their preferences. We would then add this recommendation system to this web app :)