- Ability to design and implement workflows for Linear and Logistic Regression and Ensemble Models (Random Forest, Boosting) using R/Python
- Demonstrable competency in Probability and Statistics, including the ability to apply Data Distributions, Hypothesis Testing and other Statistical Tests.
- Must have experience in dealing with outliers, denoising data and handling the impact of pandemic-like disruptions.
- Should be able to perform EDA on raw data and feature engineering wherever applicable
- Demonstrable competency in Data Visualisation using the Python/R Data Science Stack.
- Should be able to leverage cloud platforms for training and deploying large-scale solutions.
- Should be able to train and evaluate ML models using various machine learning and deep learning algorithms.
- Should be able to retrain models and maintain model accuracy in deployment.
- Should be able to package and deploy large-scale models on on-premise systems using multiple approaches, including Docker.
- Should be able to take complete ownership of the assigned project
- Experience working in Agile environments
- Well versed with JIRA or an equivalent project tracking tool
- Knowledge of cloud platforms (AWS, Azure and GCP)
- Exposure to NoSQL databases (MongoDB, Cassandra, Cosmos DB, HBase)
- Forecasting experience in products like SAP, Oracle, Power BI, Qlik, etc.
- Proficiency in Excel (Power Pivot, Power Query, Macros, Charts)
- Experience with large data sets and distributed computing (Hive/Hadoop/Spark)
- Transfer learning using state-of-the-art models in different domains: vision, NLP and speech
- Integration with external services and cloud APIs
- Working with data annotation approaches and tools for text, images and videos