I’m a Fulbright Scholar at the University of Southern California pursuing my Master’s in Computer Science, where I work with Prof. Nanyun Peng and Prof. Emilio Ferrara at the USC Information Sciences Institute. My work primarily focuses on leveraging unsupervised and transfer learning for natural language processing and computational social science in low-resource settings. I also serve as President of GRIDS, USC’s student data science organization.

Previously, I was a research assistant at the Center for Language Engineering with Prof. Sarmad Hussain, where I worked on improving language processing for Urdu, Pakistan’s national language, to make information accessible to more people and help bridge the digital divide. I received my Bachelor’s degree in Computer Science from the University of Engineering and Technology, Lahore, in 2016. My senior thesis, advised by Prof. Haroon Babri, applied deep learning techniques to facial keypoint detection.

My research interests are natural language processing, machine learning, and deep learning.

News

I will be joining USC in Fall 2018 as a Fulbright Scholar!

My paper on Urdu Word Embeddings was published at LREC 2018. [PDF]

I gave a talk and tutorial on deep learning with Python at PyCon Pakistan 2017. [Slides]

I gave a talk titled “Improving NLP for Low-resource Languages Using Word Embeddings” at the University of San Francisco’s Seminar Series in Analytics. [Link]

My work on training Urdu word embeddings was mentioned in a Forbes story on the impact of democratizing artificial intelligence on the developing world. [Link]

I received a mention in Andrew Ng’s #LearnML challenge on the Coursera Blog. [Link]

Publications

Urdu Word Embeddings, Samar Haider, International Conference on Language Resources and Evaluation (LREC), 2018.

Projects

Convolutional Neural Networks for Facial Keypoint Detection

Used deep learning to predict facial keypoints as a building block for applications such as face tracking and expression recognition. Tackled the Kaggle Facial Keypoints Detection challenge, experimenting with a variety of network architectures and hyperparameters, along with techniques such as dropout, batch normalization, and data augmentation, to improve performance.
[Code]

Data Science for Soccer

Used machine learning on a decade of English Premier League match statistics, betting odds, and in-game events to predict match outcomes, learn association rules between half-time and full-time results, and infer players’ starting positions.
[Dataset][Report]

Automatic License Plate Recognition

Built an image processing system that localizes license plates in natural images of various vehicles under different lighting conditions and recognizes their plate numbers.
[Code][Report]

Twitter Sentiment Analysis

Parsed real-time data from the Twitter global feed and performed basic sentiment analysis on English tweets using the AFINN-111 word-sentiment lexicon.
[Code]
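Lexicon-based scoring of this kind can be sketched as summing the valence of every known word in a tweet. The sketch below is a minimal illustration, not the project's actual code: the small `LEXICON` dictionary and its scores are hypothetical placeholders in the AFINN style (integer valences from -5 to +5), and `load_afinn` and `tweet_sentiment` are hypothetical helper names. It assumes the lexicon file is tab-separated (term, tab, score per line) and, for simplicity, only matches single-word terms.

```python
import re

# Hypothetical AFINN-style subset: word -> integer valence in [-5, +5].
# These scores are illustrative placeholders, not copied from AFINN-111.
LEXICON = {"good": 3, "great": 3, "bad": -3, "love": 3, "hate": -3, "awful": -3}


def load_afinn(path):
    """Load an AFINN-style file, assumed to hold one 'term<TAB>score' entry per line."""
    scores = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            # rsplit keeps multi-word terms (which contain spaces) intact
            term, score = line.rstrip("\n").rsplit("\t", 1)
            scores[term] = int(score)
    return scores


def tweet_sentiment(text, lexicon=LEXICON):
    """Sum the valences of lexicon words in the tweet; unknown words contribute 0.

    Only single-word terms are matched; multi-word lexicon entries are ignored.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)


print(tweet_sentiment("I love this, great game!"))      # 6
print(tweet_sentiment("Awful refereeing, I hate it"))   # -6
```

A positive total suggests positive sentiment and a negative total negative sentiment. This simple scheme ignores negation ("not good") and sarcasm, which is why such an approach counts as elementary sentiment analysis.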

Teaching

Teaching Assistant for:

  • CS375 Data Mining, Fall 2016
  • CSE473 Digital Image Processing, Spring 2016, 2017
  • CS630 Pattern Recognition, Spring 2016