7B Mountain Ave Bar Harbor ME 04609 * m @ sapir.us * http://sapir.us
Problem analysis and formalization, work with domain experts
Development of algorithms for classification, ranking, survival analysis
Statistical data analysis, data mining, discovery of patterns and outliers in data
Design of predictive features
Formulation and technical leadership of research projects
Scientific programming, creation of visualization software and GUIs
Keywords: machine learning, bioinformatics, statistics, data analysis and data mining, scientific software and algorithm development, programming with C++, MatLab, SAS, R, MS Access, Photoshop
Machine learning research. Creation of new methods and software for risk modeling, survival analysis and prediction for small-sample, high-dimensional data. Submitted an SBIR grant application. Work as an independent contractor applying statistical analysis to financial problems.
Machine Learning, R: Discovery of rules that distinguish toxic from non-toxic compounds. Built efficient predictive models and designed new features.
The startup company successfully predicts prostate cancer development using biomarkers. I contributed to the company's success in the following areas:
Machine Learning:
Invented an unsupervised ranking method for survival analysis which consistently outperforms alternatives
Developed formal and expert-based approaches for feature selection
Enhanced rule-based classification and ranking methods for heterogeneous data
Data Mining:
Integrating medical knowledge with data analysis, I developed several performance-boosting predictive features
Created visualization software facilitating statistical analysis and data mining
Biomedical Image analysis:
Developed methods for high-throughput quantification and localization of biomarkers in IF images, when none were available
Created GUI software in MatLab to assist in the development of new image classification rules
Development of computational methods for discovery of new knowledge in medical data. The methods are implemented in the software toolkit LogicMill (VC++). Application of the software to several benchmark datasets demonstrated its superiority in terms of interpretability and accuracy of the solutions.
Provided assistance in statistical analysis of genomics data, including tutorials for biologists, personal consulting and data analysis. Served as a liaison between vendors and users of bioinformatics software. Developed data preprocessing and interpretation software.
Developed software for computerized drug discovery. The software features analysis of molecular libraries and high-throughput screening, discovery of patterns of biological activity, and search for compounds by similarity. The software was applied in a research project for a major pharmaceutical company. As a result, the effectiveness of the search for new leads increased 100-fold compared with the original selective testing.
In collaboration with Dr. Churchill, I developed a method for assessing the probability of differential gene expression for RNA microarrays based on a single array. The idea served as the foundation of many further statistical studies at various institutions, including the University of California, Berkeley. Designed and developed GUI software for modeling microarray data. Presented the results at an international conference and gave invited talks on this subject at universities and commercial companies.
Ph.D., Computer Science. Dissertation: Discovery of optimal logical rules in data. Institute of Automated Control, Russian Academy of Sciences, Moscow, Russia
Master of Science, Mathematics, Ural State University, Russia
Residence status: US citizen.
Copyright (c) Marina Sapir, 2010