Today’s infertility clinics gather a lot of data, ranging from demographic details about the couples, through treatment protocol details, all the way down to the specific timings of individual embryos. In short, the era of big data has been upon us for quite some time. However, there is one thing that is very important to keep in mind:
It’s not the size of your data that matters, but how you use it.
And in order to use your data, you first need to get access to it. This sounds easy in principle but is unfortunately not always so in the real world. Most computer systems come with predefined reports, yet good analysis requires the raw data, not least because the reports do not always contain the information you need. If you want to compare the cumulative birth rate in patients younger than 37 years between those who followed a freeze-all strategy, an eSET strategy or a DET strategy, you should be able to do so. The same applies if you introduced endometrial scratching for some patient groups and want to see the results for those specific groups. However, not all reports are tailor-made to suit your clinic’s needs, nor are they updated for the latest advances in ART. This is where access to your raw data comes into play, as does knowing how to handle that data.
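To make the cumulative birth rate comparison concrete, here is a minimal Python sketch of the kind of query raw data allows. It assumes a hypothetical export with one row per treatment (stimulation) and one row per transfer, fresh or frozen, linked by a treatment ID. The class names, field names and strategy labels are illustrative assumptions, not the layout of any particular clinic system. A treatment counts as a success if at least one of its transfers led to a live birth.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Treatment:
    treatment_id: str
    age_at_start: float
    strategy: str  # hypothetical labels, e.g. "freeze-all", "eSET", "DET"


@dataclass
class Transfer:
    treatment_id: str  # links fresh and frozen transfers back to the stimulation
    live_birth: bool


def cumulative_birth_rate(treatments, transfers, max_age=37):
    """Share of treatments in patients younger than max_age with at least one
    live birth from any fresh or frozen transfer, grouped by strategy."""
    treatments_with_birth = {t.treatment_id for t in transfers if t.live_birth}
    totals = defaultdict(int)
    successes = defaultdict(int)
    for tr in treatments:
        if tr.age_at_start >= max_age:
            continue  # only patients younger than max_age
        totals[tr.strategy] += 1
        if tr.treatment_id in treatments_with_birth:
            successes[tr.strategy] += 1
    # Every strategy in totals has at least one treatment, so no division by zero.
    return {s: successes[s] / totals[s] for s in totals}
```

Because the function works on raw rows, changing the question (a different age cut-off, or grouping by endometrial scratching instead of strategy) means changing a filter rather than waiting for a new report.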
Another very important reason for having access to your raw data is that it lets you validate the data itself. If you did a single embryo transfer and also froze the remaining embryos, was the transfer recorded as an eSET? If you recorded that the drug used was a GnRH antagonist, was the treatment protocol also recorded as antagonist, or could a slip in the drop-down menu have recorded it as an agonist cycle? If you make decisions based on data, the data needs to be correct.
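Checks like these can be run automatically over the whole export. The Python sketch below encodes the two examples above as simple consistency rules. The record layout and the labels ("eSET", "antagonist" and so on) are hypothetical and would need to be mapped to whatever your own system records; the point is that each rule flags a record for manual review rather than correcting it.

```python
from dataclasses import dataclass


@dataclass
class CycleRecord:
    treatment_id: str
    embryos_transferred: int
    embryos_frozen: int
    transfer_type: str  # hypothetical labels, e.g. "eSET", "SET", "DET"
    gnrh_drug: str      # drug used for pituitary suppression, e.g. "antagonist"
    protocol: str       # recorded protocol, e.g. "antagonist", "long agonist"


def check_record(rec):
    """Return a list of human-readable inconsistencies found in one record."""
    problems = []
    # One embryo transferred while surplus embryos were frozen: expect eSET.
    if (rec.embryos_transferred == 1 and rec.embryos_frozen > 0
            and rec.transfer_type != "eSET"):
        problems.append(
            f"1 embryo transferred with {rec.embryos_frozen} frozen, "
            f"but recorded as {rec.transfer_type}"
        )
    # The GnRH antagonist drug and the recorded protocol should agree.
    if rec.gnrh_drug == "antagonist" and rec.protocol != "antagonist":
        problems.append(
            f"GnRH antagonist given, but protocol recorded as {rec.protocol}"
        )
    return problems


def flag_inconsistencies(records):
    """Map treatment_id -> problems, keeping only records with an issue."""
    flagged = {}
    for rec in records:
        problems = check_record(rec)
        if problems:
            flagged[rec.treatment_id] = problems
    return flagged
```

The sketch deliberately does not flag the reverse case (a GnRH agonist in an antagonist protocol), since an agonist may legitimately appear there, for example as a trigger. Any rule set like this needs to be written with your own clinical practice in mind.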
Provided that you have access to your data, the possibilities are many. They range from checking published research (What proportion of our patients fit the study population? Does our data fit published morphokinetic timelines, or do we need to establish our own?) to finding additional parameters for lowering twin rates (We have increased our SET rate; what now?) to identifying more patient-specific criteria (Can we identify which subgroups of our patients need extra monitoring for OHSS?).
Further, with good analysis you can also check and verify the standard reports. Take fertilization rates with ICSI, for instance. Is the rate calculated per egg injected or per egg collected? Or, in a split IVF/ICSI cycle, per egg allocated to ICSI? This distinction matters. If you are doing quality control with lab performance indicators (LPIs), the fertilization rate should be based on the number of eggs injected, since the embryologist should not be held accountable for failing to fertilize immature eggs. However, it could be a good idea to also calculate the rate per egg collected, to see whether embryologists differ in how well they identify GV and MI eggs. On the other hand, if you are comparing IVF and ICSI techniques in your clinic, you should look at the fertilization rate per egg collected (or per egg allocated to IVF or ICSI in split cycles), because that is what matters for the patient. If I had 10 eggs collected, three were discarded before ICSI and five of the remaining seven eggs were fertilized, my fertilization rate was 50% (5/10), even though your LPI says the embryologist’s fertilization rate was 71.4% (5/7).
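The difference between the two denominators is easy to make explicit in an analysis script. The Python sketch below computes both rates from the same hypothetical per-cycle counts; the class and field names are assumptions for illustration. Rates are pooled (total fertilized divided by total denominator) across cycles, which is one common choice; averaging per-cycle rates instead would weight small cycles more heavily.

```python
from dataclasses import dataclass


@dataclass
class IcsiCycle:
    oocytes_collected: int   # eggs retrieved (or allocated to ICSI in a split cycle)
    oocytes_injected: int    # eggs actually injected (mature, MII)
    oocytes_fertilized: int  # normally fertilized eggs, e.g. scored as 2PN


def _rate(numerator, denominator):
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def lab_fertilization_rate(cycles):
    """LPI view: fertilized eggs per egg injected, pooled over cycles."""
    return _rate(sum(c.oocytes_fertilized for c in cycles),
                 sum(c.oocytes_injected for c in cycles))


def patient_fertilization_rate(cycles):
    """Patient view: fertilized eggs per egg collected (or allocated), pooled."""
    return _rate(sum(c.oocytes_fertilized for c in cycles),
                 sum(c.oocytes_collected for c in cycles))
```

For the example above, `[IcsiCycle(10, 7, 5)]` gives a patient rate of 0.5 and a lab rate of 5/7, about 71.4%.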
In short, good data and good data analysis can reveal a great deal. More blog posts will follow on some of the possibilities and limitations of working with data.
One very important last note on raw data: patient privacy! Your extracted data should not identify the patient. That means that instead of names or personal ID numbers, your extracted data should contain internal registration numbers and treatment ID numbers. That way you can refer back to the patient using the software, but the data itself does not identify the patient. Preferably, the extract should also contain no date of birth, only the age at the start of treatment. These identifying parameters need to be in the computer system you use, but they should not be in the data extracted for analysis.
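As an illustration, the Python sketch below turns an exported record into an analysis row along the lines described above: names, personal ID numbers and date of birth are dropped, the internal registration number and treatment ID are kept, and age at the start of treatment is added. The field names are hypothetical, and the sketch only removes the fields named here; a real extract should be reviewed against your own privacy requirements.

```python
from datetime import date

# Hypothetical field names; map these to whatever your system exports.
IDENTIFYING_FIELDS = {"name", "personal_id", "date_of_birth"}


def age_in_years(birth, on):
    """Completed years of age on a given date."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in that year
    return years


def to_analysis_row(record):
    """Copy a record without direct identifiers, adding age at treatment start.

    Internal keys such as the registration number and treatment ID are kept,
    so the row can still be traced back to the patient inside the clinic system.
    """
    row = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    row["age_at_start"] = age_in_years(record["date_of_birth"],
                                       record["treatment_start"])
    return row
```

Doing this conversion at export time means the identifying fields never leave the clinic system in the first place.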