{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Visualización Declarativa"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Es un paradigma de visualización en donde se busca preocuparse de los datos y sus relaciones, más que en detalles sin mayor importancia. Algunas características son:\n",
"\n",
"* Se especifica lo que se desea hacer.\n",
"* Los detalles se determinan automáticamente.\n",
"* Especificación y Ejecución están separadas.\n",
"\n",
"A modo de resumen, se refiere a construir visualizaciones a partir de los siguientes elementos:\n",
"\n",
"* _Data_\n",
"* _Transformation_\n",
"* _Marks_\n",
"* _Encoding_\n",
"* _Scale_\n",
"* _Guides_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Para una visualización declarativa adecuata, los datos deben encontrarse en el formato _*Tidy*_, es decir:\n",
"\n",
"* Cada variable corresponde a una columna.\n",
"* Cada observación corresponde a una fila.\n",
"* Cada tipo de unidad de observación corresponde a una tabla.\n",
"\n",
"Más detalles puedes ser encontrados en el siguiente [link](http://vita.had.co.nz/papers/tidy-data.pdf)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Un ejemplo de datos _Tidy_:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" ID \n",
" Color \n",
" Duración \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 0 \n",
" Azul \n",
" 1 \n",
" \n",
" \n",
" 1 \n",
" 1 \n",
" Rojo \n",
" 1 \n",
" \n",
" \n",
" 2 \n",
" 2 \n",
" Azul \n",
" 3 \n",
" \n",
" \n",
" 3 \n",
" 3 \n",
" Azul \n",
" 3 \n",
" \n",
" \n",
" 4 \n",
" 4 \n",
" Rojo \n",
" 3 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" ID Color Duración\n",
"0 0 Azul 1\n",
"1 1 Rojo 1\n",
"2 2 Azul 3\n",
"3 3 Azul 3\n",
"4 4 Rojo 3"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"df = pd.DataFrame({\n",
" \"ID\": range(5),\n",
" \"Color\": [\"Azul\", \"Rojo\", \"Azul\", \"Azul\", \"Rojo\"],\n",
" \"Duración\": [1, 1, 3, 3, 3]\n",
" })\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Un ejemplo de datos __NO__ _Tidy_"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Color \n",
" Azul \n",
" Rojo \n",
" \n",
" \n",
" ID \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 1.0 \n",
" NaN \n",
" \n",
" \n",
" 1 \n",
" NaN \n",
" 1.0 \n",
" \n",
" \n",
" 2 \n",
" 3.0 \n",
" NaN \n",
" \n",
" \n",
" 3 \n",
" 3.0 \n",
" NaN \n",
" \n",
" \n",
" 4 \n",
" NaN \n",
" 3.0 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
"Color Azul Rojo\n",
"ID \n",
"0 1.0 NaN\n",
"1 NaN 1.0\n",
"2 3.0 NaN\n",
"3 3.0 NaN\n",
"4 NaN 3.0"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.pivot(index=\"ID\", columns=\"Color\", values=\"Duración\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"No es _Tidy_ puesto que la variable \"Color\" utiliza más de una columna."
]
},
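{
"cell_type": "markdown",
"metadata": {},
"source": [
"Going the other way, a wide table like the one above can usually be brought back to _Tidy_ form with `melt`. The following is a minimal sketch, assuming the same toy data as above; the names `wide` and `tidy` are only illustrative.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Same toy data as in the cells above.\n",
"df = pd.DataFrame({\n",
"    'ID': range(5),\n",
"    'Color': ['Azul', 'Rojo', 'Azul', 'Azul', 'Rojo'],\n",
"    'Duración': [1, 1, 3, 3, 3],\n",
"})\n",
"\n",
"# Wide (non-tidy) layout: one column per color.\n",
"wide = df.pivot(index='ID', columns='Color', values='Duración')\n",
"\n",
"# Back to tidy: one row per observation, dropping the empty combinations.\n",
"tidy = (\n",
"    wide.reset_index()\n",
"        .melt(id_vars='ID', var_name='Color', value_name='Duración')\n",
"        .dropna(subset=['Duración'])\n",
"        .astype({'Duración': int})\n",
"        .sort_values('ID')\n",
"        .reset_index(drop=True)\n",
")\n",
"```\n",
"\n",
"After these steps `tidy` has the columns `ID`, `Color` and `Duración` again, with one row per observation."
]
},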
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Idea:__ Buenas implementaciones pueden influir en buenas conceptualizaciones."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diferencias entre enfoques\n",
"\n",
"| Imperativa | Declarativa | \n",
"| ------|------------ | \n",
"| Especificar _cómo_ se debe hacer algo | Especificar _qué_ se quiere hacer |\n",
"| Especificación y ejecución entrelazadas | Separar especificación de ejecución |\n",
"| _Colocar un círculo rojo aquí y un círculo azul acá_ | _Mapear `x` como posición e `y` como el color_ |\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"El _Iris dataset_ es un conjunto de datos famoso por ser un buen ejemplo, por lo que nos servirá para mostrar una de las mayores diferencias entre una visualización imperativa (como `matplotlib`) versus una declarativa (como `altair`)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ThemeRegistry.enable('opaque')"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import altair as alt\n",
"from vega_datasets import data # Una librería con muchos datasets\n",
"alt.themes.enable('opaque') # Para quienes utilizan temas oscuros en Jupyter Lab"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\u001b[0;31mSignature:\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0miris\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0muse_local\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mType:\u001b[0m Dataset\n",
"\u001b[0;31mString form:\u001b[0m \n",
"\u001b[0;31mFile:\u001b[0m /opt/conda/lib/python3.8/site-packages/vega_datasets/core.py\n",
"\u001b[0;31mDocstring:\u001b[0m \n",
"Loader for the iris dataset.\n",
"\n",
"This classic dataset contains lengths and widths of petals and sepals\n",
"for 150 iris flowers, drawn from three species. It was introduced\n",
"by R.A. Fisher in 1936 [1]_.\n",
"\n",
"This dataset is bundled with vega_datasets; it can be loaded without web access.\n",
"Dataset source: https://vega.github.io/vega-datasets/data/iris.json\n",
"\n",
"Usage\n",
"-----\n",
"\n",
" >>> from vega_datasets import data\n",
" >>> iris = data.iris()\n",
" >>> type(iris)\n",
" \n",
"\n",
"Equivalently, you can use\n",
"\n",
" >>> iris = data('iris')\n",
"\n",
"To get the raw dataset rather than the dataframe, use\n",
"\n",
" >>> data_bytes = data.iris.raw()\n",
" >>> type(data_bytes)\n",
" bytes\n",
"\n",
"To find the dataset url, use\n",
"\n",
" >>> data.iris.url\n",
" 'https://vega.github.io/vega-datasets/data/iris.json'\n",
"\n",
"Attributes\n",
"----------\n",
"filename : string\n",
" The filename in which the dataset is stored\n",
"url : string\n",
" The full URL of the dataset at http://vega.github.io\n",
"format : string\n",
" The format of the dataset: usually one of {'csv', 'tsv', 'json'}\n",
"pkg_filename : string\n",
" The path to the local dataset within the vega_datasets package\n",
"is_local : bool\n",
" True if the dataset is available locally in the package\n",
"filepath : string\n",
" If is_local is True, the local file path to the dataset.\n",
"\n",
"References\n",
"----------\n",
".. [1] R. A. Fisher (1936). 'The use of multiple measurements in\n",
" taxonomic problems'. Annals of Eugenics. 7 (2): 179-188.\n",
"\u001b[0;31mClass docstring:\u001b[0m Class to load a particular dataset by name\n",
"\u001b[0;31mCall docstring:\u001b[0m \n",
"Load and parse the dataset from remote URL or local file\n",
"\n",
"Parameters\n",
"----------\n",
"use_local : boolean\n",
" If True (default), then attempt to load the dataset locally. If\n",
" False or if the dataset is not available locally, then load the\n",
" data from an external URL.\n",
"**kwargs :\n",
" additional keyword arguments are passed to data parser (usually\n",
" pd.read_csv or pd.read_json, depending on the format of the data\n",
" source)\n",
"\n",
"Returns\n",
"-------\n",
"data :\n",
" parsed data\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Una breve descripción\n",
"data.iris?"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" sepalLength \n",
" sepalWidth \n",
" petalLength \n",
" petalWidth \n",
" species \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 5.1 \n",
" 3.5 \n",
" 1.4 \n",
" 0.2 \n",
" setosa \n",
" \n",
" \n",
" 1 \n",
" 4.9 \n",
" 3.0 \n",
" 1.4 \n",
" 0.2 \n",
" setosa \n",
" \n",
" \n",
" 2 \n",
" 4.7 \n",
" 3.2 \n",
" 1.3 \n",
" 0.2 \n",
" setosa \n",
" \n",
" \n",
" 3 \n",
" 4.6 \n",
" 3.1 \n",
" 1.5 \n",
" 0.2 \n",
" setosa \n",
" \n",
" \n",
" 4 \n",
" 5.0 \n",
" 3.6 \n",
" 1.4 \n",
" 0.2 \n",
" setosa \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" sepalLength sepalWidth petalLength petalWidth species\n",
"0 5.1 3.5 1.4 0.2 setosa\n",
"1 4.9 3.0 1.4 0.2 setosa\n",
"2 4.7 3.2 1.3 0.2 setosa\n",
"3 4.6 3.1 1.5 0.2 setosa\n",
"4 5.0 3.6 1.4 0.2 setosa"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris = data.iris()\n",
"iris.head()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 150 entries, 0 to 149\n",
"Data columns (total 5 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 sepalLength 150 non-null float64\n",
" 1 sepalWidth 150 non-null float64\n",
" 2 petalLength 150 non-null float64\n",
" 3 petalWidth 150 non-null float64\n",
" 4 species 150 non-null object \n",
"dtypes: float64(4), object(1)\n",
"memory usage: 6.0+ KB\n"
]
}
],
"source": [
"iris.info()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" sepalLength \n",
" sepalWidth \n",
" petalLength \n",
" petalWidth \n",
" species \n",
" \n",
" \n",
" \n",
" \n",
" count \n",
" 150.000000 \n",
" 150.000000 \n",
" 150.000000 \n",
" 150.000000 \n",
" 150 \n",
" \n",
" \n",
" unique \n",
" NaN \n",
" NaN \n",
" NaN \n",
" NaN \n",
" 3 \n",
" \n",
" \n",
" top \n",
" NaN \n",
" NaN \n",
" NaN \n",
" NaN \n",
" virginica \n",
" \n",
" \n",
" freq \n",
" NaN \n",
" NaN \n",
" NaN \n",
" NaN \n",
" 50 \n",
" \n",
" \n",
" mean \n",
" 5.843333 \n",
" 3.057333 \n",
" 3.758000 \n",
" 1.199333 \n",
" NaN \n",
" \n",
" \n",
" std \n",
" 0.828066 \n",
" 0.435866 \n",
" 1.765298 \n",
" 0.762238 \n",
" NaN \n",
" \n",
" \n",
" min \n",
" 4.300000 \n",
" 2.000000 \n",
" 1.000000 \n",
" 0.100000 \n",
" NaN \n",
" \n",
" \n",
" 25% \n",
" 5.100000 \n",
" 2.800000 \n",
" 1.600000 \n",
" 0.300000 \n",
" NaN \n",
" \n",
" \n",
" 50% \n",
" 5.800000 \n",
" 3.000000 \n",
" 4.350000 \n",
" 1.300000 \n",
" NaN \n",
" \n",
" \n",
" 75% \n",
" 6.400000 \n",
" 3.300000 \n",
" 5.100000 \n",
" 1.800000 \n",
" NaN \n",
" \n",
" \n",
" max \n",
" 7.900000 \n",
" 4.400000 \n",
" 6.900000 \n",
" 2.500000 \n",
" NaN \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" sepalLength sepalWidth petalLength petalWidth species\n",
"count 150.000000 150.000000 150.000000 150.000000 150\n",
"unique NaN NaN NaN NaN 3\n",
"top NaN NaN NaN NaN virginica\n",
"freq NaN NaN NaN NaN 50\n",
"mean 5.843333 3.057333 3.758000 1.199333 NaN\n",
"std 0.828066 0.435866 1.765298 0.762238 NaN\n",
"min 4.300000 2.000000 1.000000 0.100000 NaN\n",
"25% 5.100000 2.800000 1.600000 0.300000 NaN\n",
"50% 5.800000 3.000000 4.350000 1.300000 NaN\n",
"75% 6.400000 3.300000 5.100000 1.800000 NaN\n",
"max 7.900000 4.400000 6.900000 2.500000 NaN"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris.describe(include=\"all\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"El ejemplo clásico consiste en graficar _sepalWidth_ versus _petalLength_ y colorear por especie. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Imperativo\n",
"\n",
"En `matplotlib` sería algo así:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"color_map = dict(zip(iris[\"species\"].unique(), \n",
" [\"blue\", \"green\", \"red\"]))\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"\n",
"for species, group in iris.groupby(\"species\"):\n",
" plt.scatter(group[\"petalLength\"], \n",
" group[\"sepalWidth\"],\n",
" color=color_map[species],\n",
" alpha=0.3,\n",
" edgecolor=None,\n",
" label=species,\n",
" )\n",
" \n",
"plt.legend(frameon=True, title=\"species\")\n",
"plt.xlabel(\"petalLength\")\n",
"plt.ylabel(\"sepalWidth\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Declarativo\n",
"\n",
"En `altair` sería algo así:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(iris).mark_point().encode(\n",
" x=\"petalLength\",\n",
" y=\"sepalWidth\",\n",
" color=\"species\"\n",
")"
]
},
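{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to see that specification and execution are separate is to inspect a chart as plain data. The sketch below assumes a small made-up DataFrame (`points` and its columns are hypothetical) and uses `to_dict()` to obtain the Vega-Lite specification without drawing anything.\n",
"\n",
"```python\n",
"import altair as alt\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data, used only for illustration.\n",
"points = pd.DataFrame({'x': [1, 2, 3], 'y': [2, 4, 8], 'kind': ['a', 'b', 'a']})\n",
"\n",
"chart = alt.Chart(points).mark_point().encode(x='x', y='y', color='kind')\n",
"\n",
"# The chart is only a description at this point; nothing has been rendered.\n",
"spec = chart.to_dict()\n",
"channels = sorted(spec['encoding'])\n",
"```\n",
"\n",
"The resulting dictionary only describes the mark and the encodings (here the channels `color`, `x` and `y`); rendering is left to the notebook frontend."
]
},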
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Spoiler: Solo bastan un par de líneas extras para crear un gráfico interactivo!"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(iris).mark_point().encode(\n",
" x=\"petalLength\",\n",
" y=\"sepalWidth\",\n",
" color=\"species\",\n",
" tooltip=\"species\"\n",
").interactive()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Altair \n",
"\n",
"_Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite, and the source is available on GitHub._\n",
"\n",
"_With Altair, you can spend more time understanding your data and its meaning. Altair’s API is simple, friendly and consistent and built on top of the powerful Vega-Lite visualization grammar. This elegant simplicity produces beautiful and effective visualizations with a minimal amount of code._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data\n",
"Los datos en Altair son basados en Dataframe de Pandas, los cuales deben ser _Tidy_ para una mejor experiencia.\n",
"\n",
"El objeto _*Chart*_ es el fundamental, pues tiene como argumento los datos."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# Utilizaremos estos datos como ejemplo\n",
"import pandas as pd\n",
"df = pd.DataFrame({'a': list('CCCDDDEEE'),\n",
" 'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]})"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"import altair as alt\n",
"chart = alt.Chart(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mark\n",
"\n",
"¿Cómo queremos que se vean los datos? La respuesta está en los _marks_, que en Altair corresponden a un método de un objeto _Chart_. "
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(df).mark_point()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"La representación anterior consiste en un solo punto, pues aún no se ha especificado las posiciones de los puntos."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Encoding\n",
"\n",
"Canales asociados a columnas de los datos con tal de separar visualmente los elementos (que para estos datos se están graficando puntos). \n",
"\n",
"Por ejemplo, es posible codificar la variable `a` con el canal `x`, que representa el eje horizontal donde se posicionan los puntos. Esto es posible mediante el método `encode` de los objetos _Charts_."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(df).mark_point().encode(\n",
" x='a',\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Los principales canales de _encoding_ son `x`, `y`, `color`, `shape`, `size`, etc. los cuales se pueden designar utilizando el nombre de la columna asociada a los datos."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finalmente, separemos la posición vertical asignando el canal `y`, que como te imaginas, corresponde al eje vertical."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(df).mark_point().encode(\n",
" x='a',\n",
" y='b'\n",
")"
]
},
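{
"cell_type": "markdown",
"metadata": {},
"source": [
"Besides the column name, the shorthand string can carry a data type: `:Q` (quantitative), `:O` (ordinal), `:N` (nominal) and `:T` (temporal). When the type is omitted, Altair infers it from the DataFrame. A minimal sketch, reusing the example data from above (the name `typed_chart` is only illustrative):\n",
"\n",
"```python\n",
"import altair as alt\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'a': list('CCCDDDEEE'),\n",
"                   'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]})\n",
"\n",
"typed_chart = alt.Chart(df).mark_point().encode(\n",
"    x='a:N',  # nominal: unordered categories\n",
"    y='b:O',  # ordinal: b treated as ordered categories, not as a number\n",
"    color=alt.Color('b', type='quantitative'),  # same idea, long form\n",
")\n",
"\n",
"encoding = typed_chart.to_dict()['encoding']\n",
"```\n",
"\n",
"The same column can therefore be drawn differently depending on the declared type, which is why later examples write things like `year:O` or `yield:Q`."
]
},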
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformación\n",
"\n",
"Altair permite incluso transformar datos con tal de entregar mayor flexibilidad, para ello dispone de una sintaxis incorporada para `Agregaciones`. Por ejemplo, para calcular el promedio."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(df).mark_point().encode(\n",
" x='a',\n",
" y='average(b)'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Aunque en realidad es más acertado utilizar gráficos de barra para mostrar agregaciones. Es tan fácil como cambiar el método `mark_point()` por `mark_bar()`."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(df).mark_bar().encode(\n",
" x='a',\n",
" y='average(b)'\n",
")"
]
},
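{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `'average(b)'` shorthand is one of several ways to obtain the same bars. As a rough sketch under the same example data, the aggregation could also be done up front with pandas, or declared explicitly with `transform_aggregate`; the names `means`, `chart_pandas`, `chart_transform` and `mean_b` are illustrative.\n",
"\n",
"```python\n",
"import altair as alt\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'a': list('CCCDDDEEE'),\n",
"                   'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]})\n",
"\n",
"# Option 1: aggregate with pandas and plot the already summarized table.\n",
"means = df.groupby('a', as_index=False)['b'].mean()\n",
"chart_pandas = alt.Chart(means).mark_bar().encode(x='a', y='b')\n",
"\n",
"# Option 2: keep the raw data and declare the aggregation as a transform.\n",
"# The new field is not in the DataFrame, so its type (:Q) must be given.\n",
"chart_transform = alt.Chart(df).transform_aggregate(\n",
"    mean_b='mean(b)',\n",
"    groupby=['a'],\n",
").mark_bar().encode(x='a', y='mean_b:Q')\n",
"```\n",
"\n",
"Keeping the aggregation inside the chart means the specification still refers to the original data, which tends to be convenient when the same DataFrame feeds several charts."
]
},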
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Personalización\n",
"\n",
"Por defecto, Altair a través de Vega-Lite realiza algunas elecciones sobre las propiedades por defecto en cada visualización. Sin embargo, Altair también provee una API para personalizar los gráficos. Por ejemplo, es posible especificar el título de cada eje utilizando los atributos de los canales. Inclusive es posible escoger el color de los _marks_."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(df).mark_bar(color='firebrick').encode(\n",
" y=alt.Y('a', axis=alt.Axis(title='category')),\n",
" x=alt.X('average(b)', axis=alt.Axis(title='avg(b) by category'))\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"En el ejemplo anterior no basta con solo el nombre de la columna, es necesario crear el objeto `alt.__` correspondiente a los canales."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Otro ejemplo útil consiste en juntar dos gŕaficos en una misma figura."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.HConcatChart(...)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vertical_chart = alt.Chart(df).mark_bar().encode(\n",
" x='a',\n",
" y='average(b)'\n",
")\n",
"\n",
"horizontal_chart = alt.Chart(df).mark_bar(color='firebrick').encode(\n",
" y=alt.Y('a', axis=alt.Axis(title='category')),\n",
" x=alt.X('average(b)', axis=alt.Axis(title='avg(b) by category'))\n",
")\n",
"\n",
"vertical_chart | horizontal_chart "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Inclusive se puden sumar!"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vertical_chart_point = alt.Chart(df).mark_point(color='firebrick').encode(\n",
" x='a',\n",
" y='average(b)'\n",
")\n",
"\n",
"vertical_chart + vertical_chart_point"
]
},
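{
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition to `|` (side by side) and `+` (layering), charts can be stacked vertically with `&`. A short sketch, assuming the same example data; the variable names are illustrative:\n",
"\n",
"```python\n",
"import altair as alt\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'a': list('CCCDDDEEE'),\n",
"                   'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]})\n",
"\n",
"bars = alt.Chart(df).mark_bar().encode(x='a', y='average(b)')\n",
"dots = alt.Chart(df).mark_point(color='firebrick').encode(x='a', y='average(b)')\n",
"\n",
"stacked = bars & dots        # one chart above the other\n",
"side_by_side = bars | dots   # one chart next to the other\n",
"layered = bars + dots        # both marks drawn on shared axes\n",
"```\n",
"\n",
"Each operator returns a new compound chart object, so the results can themselves be combined further."
]
},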
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Gráfico a Gráfico"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gráfico de Barras"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source = pd.DataFrame({\n",
" 'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],\n",
" 'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]\n",
"})\n",
"\n",
"alt.Chart(source).mark_bar().encode(\n",
" x='a',\n",
" y='b'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Añadir una capa de complejidad, ya sea diferenciar por color, es tan simple como:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source = data.barley() # Datos de cultivos\n",
"\n",
"alt.Chart(source).mark_bar().encode(\n",
" x='year:O',\n",
" y='sum(yield):Q',\n",
" color='year:N',\n",
" column='site:N'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Histogramas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Este ejemplo muesrta un histograma con una línea superpuesta indicando la media global."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source = data.cars()\n",
"\n",
"base = alt.Chart(source)\n",
"\n",
"hist = base.mark_bar().encode(\n",
" x=alt.X('Horsepower:Q', bin=True),\n",
" y='count()'\n",
")\n",
"\n",
"rule = base.mark_rule(color='red').encode(\n",
" x='mean(Horsepower):Q',\n",
" size=alt.value(5)\n",
")\n",
"\n",
"hist + rule\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gráfico de Líneas\n"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"x = np.arange(100)\n",
"source = pd.DataFrame({\n",
" 'x': x,\n",
" 'f(x)': np.sin(x / 5)\n",
"})\n",
"\n",
"alt.Chart(source).mark_line().encode(\n",
" x='x',\n",
" y='f(x)'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Este ejemplo muestra un gráfico de líneas de series múltiples de los precios de cierre diarios de las acciones de AAPL, AMZN, GOOG, IBM y MSFT entre 2000 y 2010."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source = data.stocks()\n",
"\n",
"alt.Chart(source).mark_line().encode(\n",
" x='date',\n",
" y='price',\n",
" color='symbol'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scatter Plot\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source = data.cars()\n",
"\n",
"alt.Chart(source).mark_circle(size=60).encode(\n",
" x='Horsepower',\n",
" y='Miles_per_Gallon',\n",
" color='Origin',\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Un scatter plot con una superposición media continua. En este ejemplo, se utiliza una ventana de 30 días para calcular la media de la temperatura máxima alrededor de cada fecha."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source = data.seattle_weather()\n",
"\n",
"line = alt.Chart(source).mark_line(\n",
" color='red',\n",
" size=2\n",
").transform_window(\n",
" rolling_mean='mean(temp_max)',\n",
" frame=[-15, 15]\n",
").encode(\n",
" x='date:T',\n",
" y='rolling_mean:Q'\n",
")\n",
"\n",
"points = alt.Chart(source).mark_point().encode(\n",
" x='date:T',\n",
" y=alt.Y('temp_max:Q',\n",
" axis=alt.Axis(title='Max Temp'))\n",
")\n",
"\n",
"points + line"
]
},
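{
"cell_type": "markdown",
"metadata": {},
"source": [
"To build intuition for `frame=[-15, 15]`, the same kind of centered window can be sketched in pandas. This is only an approximate analogue under stated assumptions: the series `temps` is made up, and the window counts rows rather than calendar days, so it matches a span in days only when there is one row per day.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical daily series: one value per row.\n",
"temps = pd.Series(np.arange(100, dtype=float))\n",
"\n",
"# 15 rows before + the current row + 15 rows after = 31 rows.\n",
"# min_periods=1 lets the window shrink at both ends of the series.\n",
"rolling_mean = temps.rolling(window=31, center=True, min_periods=1).mean()\n",
"```\n",
"\n",
"Doing this inside the chart with `transform_window` keeps the smoothing as part of the specification instead of adding an extra column to the DataFrame."
]
},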
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gráfico de Barras de Error"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Este ejemplo muestra barras de error con desviación estándar utilizando diferentes datos de rendimiento de cultivos en la década de 1930."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source = data.barley()\n",
"\n",
"error_bars = alt.Chart(source).mark_errorbar(extent='stdev').encode(\n",
" x=alt.X('yield:Q', scale=alt.Scale(zero=False)),\n",
" y=alt.Y('variety:N')\n",
")\n",
"\n",
"points = alt.Chart(source).mark_point(filled=True, color='black').encode(\n",
" x=alt.X('yield:Q', aggregate='mean'),\n",
" y=alt.Y('variety:N'),\n",
")\n",
"\n",
"error_bars + points"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gráficos de Area\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Este ejemplo muestra como hacer un gráfico de área apilado y normalizado."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source = data.iowa_electricity()\n",
"\n",
"alt.Chart(source).mark_area().encode(\n",
" x=\"year:T\",\n",
" y=alt.Y(\"net_generation:Q\"),\n",
" color=\"source:N\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Este ejemplo muestra como hacer un gráfico de área apilado y normalizado."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source = data.iowa_electricity()\n",
"\n",
"alt.Chart(source).mark_area().encode(\n",
" x=\"year:T\",\n",
" y=alt.Y(\"net_generation:Q\", stack=\"normalize\"),\n",
" color=\"source:N\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mapas de Calor\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Este ejemplo muestra un mapa de calor simple para mostrar datos cuadriculados."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Compute x^2 + y^2 across a 2D grid\n",
"x, y = np.meshgrid(range(-5, 5), range(-5, 5))\n",
"z = x ** 2 + y ** 2\n",
"\n",
"# Convert this grid to columnar data expected by Altair\n",
"source = pd.DataFrame({'x': x.ravel(),\n",
" 'y': y.ravel(),\n",
" 'z': z.ravel()})\n",
"\n",
"alt.Chart(source).mark_rect().encode(\n",
" x='x:O',\n",
" y='y:O',\n",
" color='z:Q'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Este ejemplo muestra un gráfico de texto en capas sobre un mapa de calor utilizando el conjunto de datos de automóviles."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source = data.cars()\n",
"\n",
"# Configure common options\n",
"base = alt.Chart(source).transform_aggregate(\n",
" num_cars='count()',\n",
" groupby=['Origin', 'Cylinders']\n",
").encode(\n",
" alt.X('Cylinders:O', scale=alt.Scale(paddingInner=0)),\n",
" alt.Y('Origin:O', scale=alt.Scale(paddingInner=0)),\n",
").properties(\n",
" width=400,\n",
" height=400\n",
")\n",
"\n",
"# Configure heatmap\n",
"heatmap = base.mark_rect().encode(\n",
" color=alt.Color('num_cars:Q',\n",
" scale=alt.Scale(scheme='viridis'),\n",
" legend=alt.Legend(direction='horizontal')\n",
" )\n",
")\n",
"\n",
"# Configure text\n",
"text = base.mark_text(baseline='middle').encode(\n",
" text='num_cars:Q',\n",
" color=alt.condition(\n",
" alt.datum.num_cars > 100,\n",
" alt.value('black'),\n",
" alt.value('white')\n",
" )\n",
")\n",
"\n",
"# Draw the chart\n",
"heatmap + text"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mapas\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Este ejemplo muestra un mapa coroplético de la tasa de desempleo por condado en los EE. UU."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counties = alt.topo_feature(data.us_10m.url, 'counties')\n",
"source = data.unemployment.url\n",
"\n",
"alt.Chart(counties).mark_geoshape().encode(\n",
" color='rate:Q'\n",
").transform_lookup(\n",
" lookup='id',\n",
" from_=alt.LookupData(source, 'id', ['rate'])\n",
").project(\n",
" type='albersUsa'\n",
").properties(\n",
" width=500,\n",
" height=300\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Este ejemplo muestra una visualización geográfica en capas que muestra las posiciones de los aeropuertos de EE. UU. En un contexto de estados de EE. UU."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"airports = data.airports.url\n",
"states = alt.topo_feature(data.us_10m.url, feature='states')\n",
"\n",
"# US states background\n",
"background = alt.Chart(states).mark_geoshape(\n",
" fill='lightgray',\n",
" stroke='white'\n",
").properties(\n",
" width=500,\n",
" height=300\n",
").project('albersUsa')\n",
"\n",
"# airport positions on background\n",
"points = alt.Chart(airports).transform_aggregate(\n",
" latitude='mean(latitude)',\n",
" longitude='mean(longitude)',\n",
" count='count()',\n",
" groupby=['state']\n",
").mark_circle().encode(\n",
" longitude='longitude:Q',\n",
" latitude='latitude:Q',\n",
" size=alt.Size('count:Q', title='Number of Airports'),\n",
" color=alt.value('steelblue'),\n",
" tooltip=['state:N','count:Q']\n",
").properties(\n",
" title='Number of airports in US'\n",
")\n",
"\n",
"background + points"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### El límite es la imaginación??\n",
"\n",
"Todos los ejemplos anteriores fueron recopilados de la Galería de Ejemplos de Altair ([link](https://altair-viz.github.io/gallery/index.html)), como podrás darte cuenta, son muchos menos que los ofrecidos por matplotlib. Altair es una librería nueva, alrededor de 3 años, versus los 17 años de matplotlib. \n",
"\n",
"Si bien crear gráficos \"comunes\" es mucho menos verboso, al querer realizar gráficos de bajo nivel matplotlib es el claro ganador. Dependiendo de tus necesidades del momento, debes saber escoger una u otra librería, __son complementarias y no enemigas!__\n",
"\n",
"Sin embargo, eso no quita el hecho que en Altair se puedan realizar gráficos \"poco convencionales\"."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\"\"\"\n",
"Isotype Visualization shows the distribution of animals across UK and US, \n",
"using unicode emoji marks rather than custom SVG paths \n",
"(see https://altair-viz.github.io/gallery/isotype.html). \n",
"This is adapted from Vega-Lite example https://vega.github.io/vega-lite/examples/isotype_bar_chart_emoji.html.\n",
"\"\"\"\n",
"\n",
"source = pd.DataFrame([\n",
" {'country': 'Great Britain', 'animal': 'cattle'},\n",
" {'country': 'Great Britain', 'animal': 'cattle'},\n",
" {'country': 'Great Britain', 'animal': 'cattle'},\n",
" {'country': 'Great Britain', 'animal': 'pigs'},\n",
" {'country': 'Great Britain', 'animal': 'pigs'},\n",
" {'country': 'Great Britain', 'animal': 'sheep'},\n",
" {'country': 'Great Britain', 'animal': 'sheep'},\n",
" {'country': 'Great Britain', 'animal': 'sheep'},\n",
" {'country': 'Great Britain', 'animal': 'sheep'},\n",
" {'country': 'Great Britain', 'animal': 'sheep'},\n",
" {'country': 'Great Britain', 'animal': 'sheep'},\n",
" {'country': 'Great Britain', 'animal': 'sheep'},\n",
" {'country': 'Great Britain', 'animal': 'sheep'},\n",
" {'country': 'Great Britain', 'animal': 'sheep'},\n",
" {'country': 'Great Britain', 'animal': 'sheep'},\n",
" {'country': 'United States', 'animal': 'cattle'},\n",
" {'country': 'United States', 'animal': 'cattle'},\n",
" {'country': 'United States', 'animal': 'cattle'},\n",
" {'country': 'United States', 'animal': 'cattle'},\n",
" {'country': 'United States', 'animal': 'cattle'},\n",
" {'country': 'United States', 'animal': 'cattle'},\n",
" {'country': 'United States', 'animal': 'cattle'},\n",
" {'country': 'United States', 'animal': 'cattle'},\n",
" {'country': 'United States', 'animal': 'cattle'},\n",
" {'country': 'United States', 'animal': 'pigs'},\n",
" {'country': 'United States', 'animal': 'pigs'},\n",
" {'country': 'United States', 'animal': 'pigs'},\n",
" {'country': 'United States', 'animal': 'pigs'},\n",
" {'country': 'United States', 'animal': 'pigs'},\n",
" {'country': 'United States', 'animal': 'pigs'},\n",
" {'country': 'United States', 'animal': 'sheep'},\n",
" {'country': 'United States', 'animal': 'sheep'},\n",
" {'country': 'United States', 'animal': 'sheep'},\n",
" {'country': 'United States', 'animal': 'sheep'},\n",
" {'country': 'United States', 'animal': 'sheep'},\n",
" {'country': 'United States', 'animal': 'sheep'},\n",
" {'country': 'United States', 'animal': 'sheep'}\n",
" ])\n",
"\n",
"\n",
"alt.Chart(source).mark_text(size=45, baseline='middle').encode(\n",
" alt.X('x:O', axis=None),\n",
" alt.Y('animal:O', axis=None),\n",
" alt.Row('country:N', header=alt.Header(title='')),\n",
" alt.Text('emoji:N')\n",
").transform_calculate(\n",
" emoji=\"{'cattle': '🐄', 'pigs': '🐖', 'sheep': '🐏'}[datum.animal]\"\n",
").transform_window(\n",
" x='rank()',\n",
" groupby=['country', 'animal']\n",
").properties(width=550, height=140)"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
},
"toc-autonumbering": false,
"toc-showcode": false,
"toc-showmarkdowntxt": false,
"toc-showtags": true
},
"nbformat": 4,
"nbformat_minor": 4
}