About Marina Sapir, Ph.D.


Marina Sapir: data researcher with wide expertise in machine learning, data analysis, and signal processing

Solving customers' machine learning and data analysis problems in economics, finance, ecology, signal processing, and other areas of study.

Analysis of mass spectrometry data for medical diagnostics. New algorithms for spectrometry data processing. Machine learning with small datasets and a large number of features.

Medical image processing, development of methods and aggregated predictive features for survival analysis with medical applications.

Developed methods and software for computerized drug discovery. Built a commercial application in C++.

Assessment of differential gene expression for RNA microarrays based on a single array. The idea served as the foundation for many further statistical studies at various institutions, including the University of California, Berkeley. Designed and developed GUI software for modeling microarray data. Presented the results at a conference and gave invited talks on the subject at universities and commercial companies.

  • Ph.D., Computer Science. Dissertation: Discovery of optimal logical rules in data. Institute of Automated Control, Russian Academy of Sciences, Moscow, Russia

  • Master of Science, Mathematics, Ural State University, Russia

Consulting Services

Problem formalization

Most real-life problems do not fit easily into the machine learning paradigm. We will learn your domain in order to formulate the task in machine learning terms. Our goal is to provide you with a solution optimized for your decision-making process.

Development of algorithms

Every problem is unique, and so should be its solution. Algorithms will be customized or developed anew, as necessary.

Data preprocessing

Quite often, data need to be cleaned or transformed before they can be used. This includes developing meaningful aggregated predictive features in cooperation with the client.

Statistical analysis

We will find the statistical properties of your data that are relevant to your tasks.

Knowledge discovery

The full range of machine learning methods, including supervised, unsupervised, and semi-supervised learning, is used to find the best solution.


The results are presented in intuitive, easy-to-comprehend ways. If necessary, special forms of visual presentation are developed.

Explanation of the solutions

Clients can count on us for an explanation of the methods used and the results.

Software development

We deliver scientific software that implements the methods we develop. Most of the time, high-level languages (such as MATLAB, R, or Python) are used. The code can then be translated into the language of your system or used as is. In the latter case, we build a flexible user interface with data input and report output.

Let's Work Together!



Sumeet Thadani. Founder of Nudge at Samsung Research USA

"Marina is a seasoned Machine Learning Expert. She's diligent, hard working and innovative. I highly recommend her for difficult Machine Learning problems."

July 3, 2015, Sumeet managed Marina as an independent contractor

James Mentele. Product Development at Predictive Fleet Technologies

"Marina is a fast learner and uses that subject matter knowledge with her mathematical skills to build effective models. Her work involved mathematical analysis of parallel 'noisy' time series with lags to make assessments of system performance. Her processing was efficient and her reports were very intuitive."

April 10, 2014, James was Marina's client

Maxim Tsypin. Senior Scientist at Biodesix

"Marina has a strong background in machine learning and statistical analysis. She shows initiative and creativity in finding and developing custom solutions for practical problems. Marina is eager to learn and apply new knowledge, be it a new computer language, or specifics of a particular biomedical problem."

September 25, 2013, Maxim managed Marina at Biodesix, Inc.

Mikhail Teverovskiy. R&D in Image/Signal Processing, Data Modeling

"Marina is a talented and creative scientist 100% focusing on a problem. She has a lot of original ideas leading to interesting solutions."

April 2, 2009, Mikhail worked with Marina at Aureon

Olivier Saidi. Partner & Managing Director at CRT Capital Holdings

"Marina is one of the top Machine Learning scientists I had the pleasure to interact with. Marina has a gift for translating the abstract into the practical and an innate way to constant ingenuity."

March 20, 2009, Olivier managed Marina indirectly at Aureon

Publications


  1. M. Sapir (2017) Optimal choice: new machine learning problem and its solution. arXiv:1706.08439.
  2. M. Sapir (2011) Smooth Rank: A Method for Robust Risk Modeling for Smaller Samples. Webmed Central, Biostatistics; 2 (9): WMC002167
  3. M. Sapir (2011) Ensemble Risk Modeling Method for Robust Learning on Scarce Data. arXiv:1108.2820.
  4. M. Sapir (2011) Bias plus variance decomposition for survival analysis problems. arXiv:1109.5311v1. The work was presented at NIPS 2011 (Granada, Spain) as a poster.
  5. M. Sapir, F. M. Khan, Y. Vengrenyuk, G. Fernandez, R. Mesa-Tejada, S. Hamann, M. Teverovskiy, M. J. Donovan. (2010) Improved Automated Localization and Quantification of Protein Multiplexes via Multispectral Fluorescence Imaging in Heterogeneous Biopsy Samples. ISBI: Biomedical Imaging: From Nano to Macro, 2010 IEEE International Symposium: 157-160.
  6. M. J. Donovan, F. Khan, G. Fernandez, R. Mesa-Tejada, M. Sapir, V. Bayer Zubek, D. Powell, S. Fogarasi, Y. Vengrenyuk, M. Teverovskiy, M. R. Segal, R. J. Karnes, T. A. Gaffey, C. Busch, M. Haggman, P. Hlavcak, S. J. Freedland, R. T. Vollmer, P. Albertsen, J. Costa, C. Cordon-Cardo. (2009) Personalized Prediction of Tumor Response and Cancer Progression on Prostate Needle Biopsy. Journal of Urology: July, 182(1):125-32.
  7. Teverovskiy M, Vengrenyuk Y, Tabesh A, Sapir M, Fogarasi S, Pang H, Khan F, Hamann S, Capodieci P, Clayton M, Kim R, Fernandez G, Mesa-Tejada R, Donovan M (2008) Automated Localization and Quantification of Protein Multiplexes via Multispectral Fluorescence Imaging. ISBI: Biomedical Imaging: From Nano to Macro, IEEE International Symposium: 200-203.
  8. M. J. Donovan, S. Hamann, M. Clayton, F. Khan, M. Sapir, V. Bayer-Zubek, G. Fernandez, R. Mesa-Tejada, V. Reuter, P. Scardino, C. Cordon-Cardo. (2008) Systems Pathology Approach for the Prediction of Prostate Cancer Progression after Radical Prostatectomy. J Clin Oncol 26(24): 3923-3929.
  9. Cordon-Cardo C, Kotsianti A, Verbel D, Teverovskiy M, Capodieci P, Hamann S, Jeffers Y, Clayton M, Elkhettabi F, Khan F, Sapir M, Bayer V, Vengrenyuk Y, Fogarasi S, Saidi O, Reuter V, Scher H, Kattan M, Bianco F, Wheeler T, Ayala G, Scardino P, Donovan M. (2007) Improved Prediction of Prostate Cancer Recurrence Through Systems Pathology. Journal of Clinical Investigation 117: 1876-1883.
  10. M Sapir, M Teverovskiy (2007) Validity of Probabilistic Rules. CIDM 2007: 6-9
  11. M. Sapir, M. Teverovskiy. (2006) Finding Digest of Rules: Toward Data-Driven Data Mining. IASTED: 360-365.
  12. Sapir M, Verbel D, Kotsianti A, Saidi O. (2005) Live Logic: Method for Approximate Knowledge Discovery and Decision Making. Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing. 10th International Conference. Part I. Lecture Notes in Computer Science 3641: 532-540.
  13. M. Sapir (2004) Formalization of Induction Logic in Biomedical Research. International Symposium on Robotics and Automation. ISRA'2004: 1-8.
  14. M. Sapir, G. A. Churchill (2000) Estimating the posterior probability of differential gene expression from microarray data. Poster. Jackson Laboratory.

Patents


  1. Methods and systems for feature selection in machine learning based on feature contribution and model fitness. (US Patent 7599893, 2009).
  2. Systems and methods for segmentation and processing of tissue images and feature extraction from same for treating, diagnosing, or predicting medical conditions. Publication number WO2012016242 A3.

Copyright © 2017 Marina Sapir