Job Description
You will be responsible for the entire process, from data collection and preprocessing to model training and deployment to production.
➔ Understand the business objectives, and design and deploy scalable ML models and NLP applications to meet them
➔ Apply NLP techniques for text representation, semantic analysis, and information extraction to meet business objectives efficiently, and define metrics to measure progress
➔ Extend existing ML libraries and frameworks, and use effective text representations to transform natural language into useful features
➔ Define and supervise the data collection process, verify data quality, and apply data augmentation techniques
➔ Define the preprocessing or feature engineering to be applied to a given dataset
➔ Analyze model errors and design strategies to overcome them
➔ Research and implement suitable algorithms and tools for ML/NLP tasks
➔ Collaborate with engineering and product development teams
➔ Represent Contify at external ML industry events and publish thought-leadership articles
Requirements
➔ Skills and Experience
To succeed in this role, you should possess outstanding skills in statistical analysis, machine learning methods, and text representation techniques.
● Deep understanding of text representation techniques (such as n-grams and bag of words), sentiment analysis, statistics, and classification algorithms
● Hands-on experience in feature extraction techniques for text classification and topic mining
● Knowledge of text analytics, with a strong understanding of NLP and ML algorithms and models (GLMs, SVMs, PCA, Naive Bayes, clustering, decision trees) and the computational and probabilistic foundations underlying them:
○ Text representations and word embeddings such as TF-IDF, Word2Vec, GloVe, and fastText
○ Language models such as BERT, GPT, RoBERTa, and XLNet
○ Neural network architectures such as RNNs, GRUs, LSTMs, and Bi-LSTMs
○ Classification algorithms such as LinearSVC, SVM, logistic regression (LR), XGBoost (XGB), and MultinomialNB
○ Other algorithms such as PCA and clustering methods
● Excellent knowledge of and experience with NLP and deep learning packages such as NLTK, spaCy, Gensim (including its Word2Vec implementation), Stanford CoreNLP, and TensorFlow/PyTorch
● Experience in setting up supervised & unsupervised learning models, including data cleaning, data analytics, feature creation, model selection & ensemble methods, performance metrics and visualization
● Knowledge of evaluation metrics and tools such as root mean squared error (RMSE), confusion matrices, F-score, and ROC AUC
● Understanding of knowledge graphs is a plus
Qualifications
● Education: Bachelor's or Master's degree in Computer Science, Mathematics, Computational Linguistics, or a related field
● At least 4 years of experience building Machine Learning & NLP solutions on open-source platforms such as scikit-learn, TensorFlow, Spark ML, etc.
● At least 2 years of experience designing and developing enterprise-scale NLP solutions in one or more of the following areas: Named Entity Recognition, Document Classification, Feature Extraction, Triplet Extraction, Clustering, Summarization, Topic Modelling, Dialog Systems, Sentiment Analysis
● Self-starter who can see the big picture and prioritize work to make the greatest impact on the business, in line with customers' vision and requirements
● Being a committer or a contributor to an open-source project is a plus