
IT Salaries Survey Cleaning

Sysarmy is a prominent technical community in Argentina that promotes collaboration and knowledge-sharing among IT professionals. The organization has several initiatives, including its biannual Salary Survey, a crowdsourcing project designed to provide transparency and insights into salaries and working conditions in the tech industry.
The Salary Survey collects anonymous data from IT workers, primarily in Argentina and other Spanish-speaking regions. Its main goal is to equip professionals with tools to compare working conditions, identify market trends, and detect inequalities, such as gender gaps or geographic disparities.
In my position as a data analysis intern at Fillex, an Argentine IT solutions firm, I have been assigned the task of cleaning, transforming, and analyzing the 2023.2 edition of the Sysarmy Salary Survey. This assignment aims to evaluate my technical skills and determine my potential role within the team.

Goals

This project aims to conduct an exploratory data analysis (EDA) of Sysarmy's 2023.2 Salary Survey and extract meaningful insights about the IT market in Argentina.

The following specific sub-objectives are outlined:

     1. Data Cleaning and Transformation

  • Detect and correct missing values, inconsistencies, and errors in the dataset.

  • Standardize formats to facilitate analysis.

     2. Exploratory Data Analysis (EDA)

  • Identify patterns and trends in salaries, roles, and technologies.

  • Analyze the distribution of demographic and labor data.

  • Assess inequalities, such as salary gaps by gender or geographic location.

     3. Process Documentation

  • Record each stage of the analysis, from data preparation to final findings.

  • Create clear and visual reports to communicate the results effectively.

     4. Technical Skills Evaluation

  • Demonstrate proficiency in data handling and analysis using learned tools and techniques.

  • Define a potential role within the team based on the project's outcomes.

Stakeholders

  • Fillex SA | Data Team

    • Role: They act as mentors and supervisors of the project, reviewing each stage of the analysis to ensure the quality and accuracy of the results. They are also responsible for providing feedback to the intern.

    • Interest: Obtain a well-structured and documented project that reflects the intern's technical skills and knowledge of the processes involved. This project could serve as a reference to define the intern's role within the team and organization and highlight their strengths and weaknesses.

  • Fillex SA | Intern Simón Zanetti

    • Role: Execute the technical tasks and effectively document the analysis. Act as a bridge between the raw data and the final insights, working with the learned tools and techniques.

    • Interest: Demonstrate skills in data cleaning, exploratory analysis, and visualization while gaining practical experience in a realistic project. Documenting the process is also crucial to measuring professional growth and establishing a potential role within the team.

  • IT Community

    • Role: Indirect beneficiaries of the analysis, as the insights and findings can add extra value to the open results of the Salary Survey.

    • Interest: Access new insights that can help IT professionals make informed decisions about their careers, salaries, and working conditions. This also highlights the importance of the survey as a transparency tool in the industry.

Data

Data Specifications

Dataset Dictionary

 

 

Tools

We’ll be handling the analysis and data cleanup for Sysarmy’s 2023.2 Salary Survey using a mix of Python tools and libraries. All tasks will be performed in an interactive Jupyter Notebook environment, which allows us to document the analysis process in an organized and reproducible way.

Below are the main tools and libraries used:

  • re: Used to apply cleaning patterns through regular expressions, making it easier to work with text data and tricky formats.

  • warnings: Employed to filter out unnecessary warning messages, especially those generated by visualizations in matplotlib, ensuring a cleaner working environment.

  • pandas: The core library for data manipulation. It allows for efficient data transformation, cleaning, and preliminary analysis.

  • seaborn: A key tool for data visualization. It helps identify data distribution and patterns, guiding decisions during cleaning.

  • matplotlib: Complements seaborn by offering greater control and customization over graphical visualizations.

  • scipy: Essential for performing statistical analysis during data cleaning, ensuring that adjustments are based on solid foundations.

  • display (from IPython.display): Renders datasets as formatted tables in the notebook, making it easy to inspect the changes made at each step of the analysis.

Together, these tools support an efficient, reproducible workflow for cleaning and analysis, and they produce reliable, clearly visualized results.
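As a rough illustration of how re, warnings, and pandas can work together during cleaning, the sketch below normalizes free-text values. The helper name normalize_text and the sample values are assumptions made for this example; they are not taken from the project notebooks.

```python
import re
import warnings

import pandas as pd

# Hide noisy UserWarning messages (plotting libraries often raise them).
warnings.filterwarnings("ignore", category=UserWarning)


def normalize_text(value):
    """Collapse repeated whitespace, trim, and lowercase a text value."""
    if pd.isna(value):
        return value  # leave missing values untouched
    return re.sub(r"\s+", " ", str(value)).strip().lower()


# Invented sample values, used only to show the effect of the helper.
roles = pd.Series(["  Developer ", "DEVELOPER", "Data   Analyst", None])
clean_roles = roles.map(normalize_text)
print(clean_roles.tolist())
```

Keeping the pattern in one small function makes it simple to reuse across several text columns.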

​​

Methodology

The project follows a structured methodology, divided into two main stages: data cleaning and exploratory data analysis (EDA). These stages are organized into separate working files to ensure traceability and modularity.

Data Cleaning
Cleaning is essential to guarantee the quality and reliability of the analysis results. The main steps include:

  • Renaming Columns:
    The dataset columns will be renamed using a clear, descriptive, and consistent naming convention, reflecting the content and purpose of each one.

  • Reordering Columns:
    Columns will be reorganized to prioritize the most relevant information, facilitating the understanding and analysis of the dataset.

  • Removing Unnecessary Columns:
    Identification and removal of columns that do not contribute to the analysis, such as those with redundant or irrelevant information.

  • Handling Missing Values (NaNs):
    Missing values will be handled according to context: filled in where a reasonable value can be inferred, and removed otherwise.

  • Reviewing and Removing Duplicates:
    Verification of duplicate entries and their removal to preserve the dataset's integrity.

  • Normalizing Values:
    Data is standardized and transformed as needed to ensure consistency. For example, salaries are converted to a common currency, and date and time formats are adjusted.
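
The cleaning steps above could be chained roughly as follows. This is a minimal sketch, not the project's actual notebook: the short column names, the choice of which column to drop, the "Unknown" fill value, and the exchange rate of 490 ARS per USD are all assumptions made for illustration, and the sample rows are invented.

```python
import pandas as pd

# Invented sample rows; only the column names come from the dataset dictionary.
raw = pd.DataFrame({
    "Tengo (edad)": [25, 25, 41, 33],
    "Trabajo de": ["Developer", "Developer", "SysAdmin", None],
    "Último salario mensual o retiro BRUTO (en tu moneda local)":
        [345000.0, 345000.0, None, 900000.0],
    "Salir o seguir contestando?": ["Terminar encuesta"] * 4,
})

# Hypothetical short names; the project's real naming convention is not shown.
RENAMES = {
    "Tengo (edad)": "age",
    "Trabajo de": "role",
    "Último salario mensual o retiro BRUTO (en tu moneda local)": "gross_salary",
    "Salir o seguir contestando?": "survey_flow",
}


def clean_survey(df, ars_per_usd=490.0):
    """Apply the cleaning steps in order; ars_per_usd is an assumed rate."""
    df = df.rename(columns=RENAMES)                           # rename columns
    df = df[["role", "age", "gross_salary", "survey_flow"]]   # reorder columns
    df = df.drop(columns=["survey_flow"])                     # drop unneeded column
    df = df.assign(role=df["role"].fillna("Unknown"))         # fill where reasonable
    df = df.dropna(subset=["gross_salary"])                   # remove rows without salary
    df = df.drop_duplicates().reset_index(drop=True)          # remove duplicate rows
    # Normalize: express the salary in a common currency.
    return df.assign(gross_salary_usd=(df["gross_salary"] / ars_per_usd).round(2))


cleaned = clean_survey(raw)
print(cleaned)
```

In an anonymous survey, two identical rows may be two real respondents, so the duplicate-removal step deserves a manual check before it is applied to the full dataset.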

Exploratory Data Analysis (EDA)
In this stage, the clean dataset generated in the previous phase will be used to extract meaningful insights that provide a better understanding of the IT market dynamics reflected in the survey.

This analysis will be conducted in a separate notebook, organized to facilitate the interpretation and documentation of results.

The main EDA approaches will include:

  • Distribution of Key Variables:
    Variables such as salaries, years of experience, age, and educational level will be analyzed to understand their distribution and range.
    Graphs such as histograms, box plots, and violin plots will be used to identify possible outliers and general patterns.

  • Relationship Between Demographic and Labor Variables:
    Exploration of the relationship between personal characteristics (gender, age, location) and work variables (role, contract type, salary).
    Visualizations like scatter plots, heat maps, and stacked bar charts will be used to detect correlations and significant differences.

  • Identifying Relevant Trends and Patterns:
    Analysis of how variables change based on different factors, such as salary evolution according to years of experience or the most in-demand roles in the sector.
    Identification of the most common technologies in specific salary ranges or roles.

  • Evaluating Inequalities:
    Study of possible salary gaps by gender and region.
    Analysis of disparities in working conditions, such as access to benefits depending on location or company type.
    Visual representation of these inequalities through comparative graphs.
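
Two of the ideas above, spotting outliers and comparing salaries across groups, can be sketched numerically before any chart is drawn. The example uses the 1.5 × IQR rule, the convention box plots commonly use for their whiskers, and a ratio of group medians as one possible definition of a salary gap. The column names, group labels, and figures are invented for illustration.

```python
import pandas as pd

# Invented sample; column names and group labels are assumptions.
df = pd.DataFrame({
    "gender": ["Man", "Man", "Man", "Woman", "Woman", "Woman", "Man"],
    "gross_salary": [500_000, 600_000, 700_000, 450_000, 550_000, 650_000, 9_000_000],
})


def iqr_outlier_mask(series, k=1.5):
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


mask = iqr_outlier_mask(df["gross_salary"])
trimmed = df[~mask]

# Median salary per group, then the gap as a percentage of the men's median.
medians = trimmed.groupby("gender")["gross_salary"].median()
gap_pct = (1 - medians["Woman"] / medians["Man"]) * 100
print(medians)
print(f"Median gap: {gap_pct:.1f}%")
```

Medians are used instead of means because a few extreme salaries can distort a mean even after trimming.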

The decision to divide the project into two notebooks — cleaning and analysis — stems from the need for modularity, which not only simplifies project maintenance but also enables the reuse of clean data for future analyses. Additionally, this structure allows for clearer documentation of each stage, which is key for both evaluating the intern's performance and ensuring the process's reproducibility.

Results

Summary

The analysis of the Sysarmy 2023.2 Salary Survey provided several important insights into the IT market in Argentina:

  • About 50% of respondents live in the Autonomous City of Buenos Aires and another 20% live in Buenos Aires Province, so 7 out of 10 respondents are concentrated in that region.

  • Respondents' ages range from 18 to 55, with most falling between 28 and 39 and a mean of 33.

  • 75% of respondents are men and 18% are women. Only 2% of respondents do not feel represented by

  • Most respondents are satisfied with their workplace.

Salary Insights

Tools Insights

Studies Insights

75% of respondents are men, which matches statistics for the sector, where men are the majority, although the inclusion of women in IT roles grows stronger every year. Sexual minorities who did not identify with the listed genders were grouped under the label 'Otros'.
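
One way to express this grouping in pandas is sketched below. Only "Varón Cis" appears in the dataset dictionary; "Mujer Cis", "No binarie", and the group labels "Hombres" and "Mujeres" are assumptions made for illustration, so the real category names may differ.

```python
import pandas as pd

# "Varón Cis" comes from the dataset dictionary; "Mujer Cis" is assumed.
GENDER_GROUPS = {"Varón Cis": "Hombres", "Mujer Cis": "Mujeres"}


def group_gender(series):
    """Map known answers to their group, other answers to 'Otros', keep NaN."""
    grouped = series.map(GENDER_GROUPS)
    # Keep mapped values and original missing values; relabel everything else.
    return grouped.where(grouped.notna() | series.isna(), "Otros")


# Invented answers, including one assumed minority label and one missing value.
answers = pd.Series(["Varón Cis", "Varón Cis", "Mujer Cis", "No binarie", None])
groups = group_gender(answers)

# Share of each group among respondents who answered the question.
shares = groups.value_counts(normalize=True)
print(shares)
```

Leaving unanswered rows as missing keeps them out of the 'Otros' share, so the percentages describe only people who actually answered.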

Tools

Specification | Details
Shape | 5805 rows x 43 columns
Format | CSV
Size | 3 MB
Encoding | UTF-8
Date Range | 01/07/2023 - 31/12/2023
Geographical Location | Argentina
Licence | Public Domain
Source | https://github.com/simonzanetti/2023.2-SysArmy-IT-Salaries-Survey/blob/main/dataset.csv
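
A loader that checks the file against these specifications might look like the sketch below. The function name load_survey and the shape check are assumptions, and the example reads a tiny in-memory CSV so it can run without the real file. Note that a GitHub `blob` URL serves an HTML page, so the loader needs a downloaded copy of the file or its raw equivalent.

```python
import io

import pandas as pd

EXPECTED_SHAPE = (5805, 43)  # rows x columns, from the specification table


def load_survey(source, expected_shape=None):
    """Read the survey CSV as UTF-8 and optionally verify its shape."""
    df = pd.read_csv(source, encoding="utf-8")
    if expected_shape is not None and df.shape != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {df.shape}")
    return df


# Tiny invented sample standing in for dataset.csv.
sample = io.StringIO("Tengo (edad),Trabajo de\n25,Developer\n41,SysAdmin\n")
preview = load_survey(sample, expected_shape=(2, 2))
print(preview.dtypes)
```

With the real file, the call would be `load_survey("dataset.csv", expected_shape=EXPECTED_SHAPE)`, assuming the CSV has been saved locally under that name.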
Column Name | Type | Example
Estoy trabajando en | object | Argentina
Dónde estás trabajando | object | Chaco
Dedicación | object | Full-Time
Tipo de contrato | object | Contractor
Último salario mensual o retiro BRUTO (en tu moneda local) | float64 | 345000
Último salario mensual o retiro NETO (en tu moneda local) | float64 | 330000
Pagos en dólares | object | Cobro todo el salario en dólares
Si tu sueldo está dolarizado ?Cuál fue el último valor del dólar que tomaron? | object | 490
Recibís algún tipo de bono | object | No
A que está atado el bono | object | No recibo bono
Tuviste actualizaciones de tus ingresos laborales durante 2023? | object | No
De que % fue el ajuste total acumulado? | float64 | 0
En que mes fue el último ajuste? | object | No tuve
Cómo considerás que están tus ingresos laborales comparados con el semestre anterior | int64 | 3
Contás con beneficios adicionales? | object | Capacitaciones y/o cursos, Clases de idiomas, ...
Qué tan conforme estás con tus ingresos laborales? | int64 | 3
Trabajo de | object | Developer
Años de experiencia | float64 | 1
Antiguedad en la empresa actual | float64 | 2
Tiempo en el puesto actual | float64 | 1
Cuántas personas a cargo tenés? | int64 | 0
Plataformas que utilizas en tu puesto actual | object | Docker, Linux
Lenguajes de programación o tecnologías que utilices en tu puesto actual | object | PHP
Frameworks, herramientas y librerías que utilices en tu puesto actual | object | Laravel
Bases de datos | object | MySQL
QA / Testing | object | PHPUnit
Cantidad de personas en tu organización | object | De 11 a 50 personas
Modalidad de trabajo | object | 100% remoto
Si trabajós bajo un esquema híbrido Cuántos días a la semana vas a la oficina? | int64 | 0
La recomendás como un buen lugar para trabajar? | int64 | 9
Qué tanto estás usando Copilot, ChatGPT u otras herramientas de IA para tu trabajo? | int64 | 4
Salir o seguir contestando? | object | Responder sobre mis estudios
Máximo nivel de estudios | object | Secundario
Estado | object | Completo
Carrera | object | Ingeniería en Sistemas de Información
Institución educativa | object | UTN - Universidad Tecnológica Nacional
Salir o seguir contestando sobre las guardias? | object | Responder sobre guardias
Tenés guardias? | object | No
Cuánto cobrás por guardia | float64 | 0
Aclará el número que ingresaste en el campo anterior | object | Porcentaje de mi sueldo bruto
Salir o seguir contestando sobre estudios? | object | Terminar encuesta
Tengo (edad) | int64 | 25
Me identifico (género) | object | Varón Cis
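
Two patterns in this dictionary usually need extra handling: columns that pack several answers into one comma-separated string (for example, "Docker, Linux") and numeric answers stored as object. The sketch below shows one possible treatment of each; the sample values and the short name usd_rate_raw are invented for illustration.

```python
import pandas as pd

col = "Plataformas que utilizas en tu puesto actual"

# Invented sample rows for a multi-answer column.
df = pd.DataFrame({col: ["Docker, Linux", "Linux", None, "AWS, Docker , Linux"]})

# Split each answer on commas, put one item per row, and trim stray spaces.
platforms = df[col].dropna().str.split(",").explode().str.strip()
counts = platforms.value_counts()
print(counts)

# Numeric answers stored as text: unparseable entries become NaN.
usd_rate_raw = pd.Series(["490", "No aplica", None, "950.5"])  # hypothetical values
usd_rate = pd.to_numeric(usd_rate_raw, errors="coerce")
print(usd_rate)
```

Coercing instead of raising keeps the load step from failing on free-text answers, at the cost of turning them into missing values that the cleaning stage must then account for.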