Molecular Pathway Generator

Sunday, November 6, 2022

OChemdle

In light of the recent popularity of games such as Wordle and its offshoots (Worldle, Octordle, Semantle, Redactle, etc.), a conversation began among a number of my friends concerning what our own idiosyncratic *tle/*dle games might be. One friend suggested, for me specifically, OChemdle. The wheels started turning. I knew that the closest existing game to what I envisioned for OChemdle would be Semantle. That is, the basic gameplay would involve a user entering the IUPAC name of an organic molecule supported by the interface (as opposed to a word) and being given a similarity score in response. This process would then repeat until the user guessed the exact mystery organic molecule.

Developing the first draft of OChemdle was rather straightforward. I figured the game would be easier and more interesting if, after the user typed the guessed IUPAC name, a visual representation of that molecule was displayed, so I utilized the existing molecule drawing engine to make this feature possible. One friend suggested I make this feature update in real time while the name was typed, so that too was implemented. Coming up with an algorithm for how the similarity score is calculated took a bit of trial and error. I won't divulge the full details of how this works, for the sake of not providing an advantage, but essentially the calculation looks at the functional groups and atoms present in both the user's guess and the secret organic molecule to determine the similarity. The final task for the first draft involved coming up with a list of secret organic molecules. https://organicchemmaster.com/chemdle
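Since the actual scoring algorithm is deliberately not published, the following is only a generic illustration of how a score based on shared atoms and functional groups could be computed. It uses a weighted Jaccard similarity over feature counts; the function name, the feature lists, and the 0 to 100 scale are all assumptions made for this sketch, not the site's implementation.

```python
from collections import Counter


def similarity(guess_features, secret_features):
    """Weighted Jaccard similarity, scaled to 0..100, between two feature lists.

    Each list holds atom symbols and functional group labels (hypothetical
    labels chosen for this sketch). Repeated entries count separately.
    """
    guess = Counter(guess_features)
    secret = Counter(secret_features)
    shared = sum((guess & secret).values())  # per-feature minimum counts
    total = sum((guess | secret).values())   # per-feature maximum counts
    if total == 0:
        return 100.0
    return 100.0 * shared / total


# Hand-written feature lists for three small molecules (heavy atoms only).
ethanol = ["C", "C", "O", "hydroxy"]
propan_1_ol = ["C", "C", "C", "O", "hydroxy"]
acetic_acid = ["C", "C", "O", "O", "carboxy"]

print(similarity(ethanol, propan_1_ol))  # 80.0
print(similarity(ethanol, acetic_acid))  # 50.0
```

A guess that matches the secret molecule exactly scores 100, and the score drops as atoms or functional groups are missing or extra.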

After coming up with a basic prototype for how the game would work, I decided to publicize it a bit to see if I could get some enthusiasm or at least feedback. And I did get some. https://www.reddit.com/r/OrganicChemistry/comments/v267vg/ochemdle/

One immediate suggestion was adding hints to the game. The ability to view both the degrees of unsaturation of the molecule and the chemical formula of the molecule was added. A leaderboard was also added. And finally, the suggestion of running an entire "OChemdle tournament" was implemented. Thanks to https://www.reddit.com/user/Bubzoluck for these suggestions!
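For readers unfamiliar with the degrees of unsaturation hint, it can be derived directly from the chemical formula with the standard relation (2C + 2 + N - H - X) / 2, where X counts halogens. Below is a minimal sketch of that calculation; the function names are made up for illustration, and the formula parser only handles simple formulas without parentheses or charges.

```python
import re


def parse_formula(formula):
    """Parse a simple formula such as 'C18H26ClN3' into element counts."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def degrees_of_unsaturation(formula):
    """Rings plus pi bonds, valid for C, H, N, O, S and halogen compounds."""
    counts = parse_formula(formula)
    carbons = counts.get("C", 0)
    hydrogens = counts.get("H", 0)
    nitrogens = counts.get("N", 0)
    halogens = sum(counts.get(x, 0) for x in ("F", "Cl", "Br", "I"))
    return (2 * carbons + 2 + nitrogens - hydrogens - halogens) / 2


print(degrees_of_unsaturation("C6H6"))        # benzene: 4.0
print(degrees_of_unsaturation("C18H26ClN3"))  # chloroquine: 7.0
```

Oxygen and divalent sulfur do not change the result, which is why they do not appear in the relation.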

The results of the OChemdle tournament can be viewed here: https://www.reddit.com/r/OChemdle/comments/ynkmwx/tournament_winner_and_wrapup/

Future Considerations: Will there be another OChemdle tournament? Any suggestions for future themes? Further feedback on OChemdle?

Wednesday, September 29, 2021

Introduction of pinch zoom controls for small screen devices

Release 4.4.1

A very quick update. Per feedback from various sources, I decided it was time to add pinch zoom functionality to the small screen version of the app. Not only did people directly comment that this would be a more intuitive approach, I had also observed people naturally attempting the pinch gesture to zoom in and out of a molecule while using a phone to view the app.

Some tweaking was done to adjust how sensitively the app reacts to the pinch gesture, as well as to prevent any panning from being performed simultaneously. An update was also made to not automatically re-center the molecule upon a zoom in or zoom out. Finally, the plus and minus icons were removed from the small screen version of the app as they are no longer needed for the zoom functionality. This removal has freed up precious small screen real estate.
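As a rough sketch of the kind of logic involved (not the app's actual code), a pinch gesture can be reduced to the ratio between the current and starting distance of the two touch points, damped by a sensitivity setting, with panning allowed only while a single finger is down. The names `pinch_scale`, `sensitivity`, `dead_zone`, and `allow_pan` are hypothetical, and how the resulting factor is mapped onto zooming in or out is left to the caller.

```python
import math


def touch_distance(p1, p2):
    """Euclidean distance between two (x, y) touch points."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


def pinch_scale(start_touches, current_touches, sensitivity=1.0, dead_zone=0.02):
    """Scale factor for a two-finger gesture.

    Returns 1.0 (no change) for tiny finger movements inside the dead zone,
    otherwise the distance ratio damped or amplified by `sensitivity`.
    """
    start = touch_distance(*start_touches)
    current = touch_distance(*current_touches)
    if start == 0:
        return 1.0
    ratio = current / start
    if abs(ratio - 1.0) < dead_zone:
        return 1.0
    return 1.0 + (ratio - 1.0) * sensitivity


def allow_pan(touch_count):
    """Panning is only allowed while exactly one finger is on the screen."""
    return touch_count == 1


start = ((0, 0), (100, 0))
print(pinch_scale(start, ((0, 0), (200, 0))))                   # 2.0
print(pinch_scale(start, ((0, 0), (200, 0)), sensitivity=0.5))  # 1.5
```

Lowering `sensitivity` makes the zoom respond more gently to the same finger movement, which is one simple way to tune the feel on small screens.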

Controls: A pinch in gesture will now cause a zoom in effect on the molecule. Similarly, a pinch out gesture will cause a zoom out. This was primarily designed to be effective on small screen devices, but other devices that register touch events, such as a laptop with a touch screen, will also support this control.

Future Considerations: Controls to rotate the molecule are being considered as well.

Saturday, August 28, 2021

Support for diesters, Benzoin condensation reaction, and Dieckmann condensation reaction

Release 4.4.0

For this update, as a change of pace, attention was turned towards the educational application of the site, namely the modeling of two new reactions: the Benzoin condensation reaction and the Dieckmann condensation reaction. Inspiration for modeling these reactions came directly from an organic chemist and lecturer from the region of West Bengal in India! I am actually very excited about this request and highly encourage other chemists and chemistry teachers to provide similar requests!

Benzoin Condensation Reaction: Modeling of the Benzoin condensation reaction was straightforward. Some work was done to verify that both the interface and search engine could support the molecule benzoin, but after that the work done was primarily to introduce the rules used for the reaction. An example search for a pathway from benzaldehyde to benzoin can be found here: https://www.organicchemmaster.com/Molgen/Reaction/benzaldehyde/benzoin?options=Calc,Reac

Dieckmann condensation reaction: Modeling of the Dieckmann condensation reaction proved more difficult, primarily because support for diesters (and similarly diethers) was a prerequisite. The molecule chosen to model for the reaction was ethyl,methyl hexanedioate. The most difficult implementation challenge of adding support for diesters was to invent a means to distinguish between the two ester radicals in the model, namely the ethyl and methyl radicals, and to track which oxygen each radical was attached to. Otherwise, the modeling for the reaction itself was relatively straightforward, though some special care was taken when converting the molecule to a cycloalkane.

As a product of the Dieckmann condensation reaction applied to ethyl,methyl hexanedioate is methyl (1S)-2-oxocyclopentane-1-carboxylate, it was also necessary to introduce support in the interface and search engine for cycloalkanes with an attached ester. This required updates to the nomenclature engine.

An example search for a pathway involving Dieckmann condensation can be found here: https://tinyurl.com/5cet2mpv.

Standards: The usual IUPAC naming standards were followed. In particular, nomenclature support for cycloalkanes with ester side chains was added, e.g., carboxylate and benzoate.

Controls: No new controls were added for this update. However, an auto-scaling feature was added to display the proper size of more complex molecules in the pathways view.
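One common way to auto-scale a drawing so that it fits its container is to take the smaller of the horizontal and vertical fit ratios and never enlarge beyond the normal size. The sketch below illustrates that idea only; `fit_scale`, `padding`, and `max_scale` are hypothetical names, and the pathways view may well work differently.

```python
def fit_scale(mol_width, mol_height, view_width, view_height,
              padding=0.9, max_scale=1.0):
    """Scale factor that fits a molecule's bounding box inside a view.

    `padding` leaves a margin around the molecule, and `max_scale` stops
    small molecules from being blown up beyond their normal size.
    """
    if mol_width <= 0 or mol_height <= 0:
        return max_scale
    scale = min(view_width / mol_width, view_height / mol_height) * padding
    return min(scale, max_scale)


print(fit_scale(400, 100, 200, 200, padding=1.0))  # 0.5 (wide molecule shrinks)
print(fit_scale(50, 50, 200, 200))                 # 1.0 (small molecule unchanged)
```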

Future Considerations: Hopefully there will be requests for more reactions and more complex molecules from chemists and lecturers in the future!

Saturday, November 28, 2020

COVID-19: II. Remdesivir

Release 4.3.0

With hydroxychloroquine no longer a candidate for treating COVID-19, it was time to turn attention towards modeling the more promising, yet more complex, molecule Remdesivir. This proved to be a daunting task. Rather than attempting to model Remdesivir entirely at once, the approach was taken to divide the molecule into four separate moieties. After modeling each of the four moieties individually, the plan would then be to combine the four moieties into the larger final molecule, Remdesivir.

More concretely, the steps taken for modeling were the following:

1. Model the most complex of the four moieties, the fused ring structure containing a pyrrole azine fusion.
2. Model the second most complex of the moieties, the structure containing the furan ring to which the pyrrole azine is attached.
3. Model the phosphoryl group to which the furan is attached.
4. Model the structure containing the ester linkage to which the phosphoryl group is attached.
5. Once all four groups were properly modeled individually, model all three combinations of two adjacent groups connected to each other. That is: the pyrrole azine and furan ring group, the furan ring and phosphoryl group, and finally the phosphoryl group and the ester containing group.
6. Once the three combinations of adjacent groups had been properly modeled, model the two combinations of three adjacent groups. That is: the pyrrole azine, furan ring, and phosphoryl group combination; and the furan ring, phosphoryl group, and ester containing group combination.
7. Finally, model all four individual groups as attached to each other thus forming Remdesivir.
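The build-up order in the steps above amounts to enumerating every run of adjacent moieties, from single groups up to the full chain. A small sketch of that enumeration follows; the helper name and the moiety labels are chosen here for illustration.

```python
def adjacent_combinations(parts, size):
    """All runs of `size` neighbouring items, in chain order."""
    return [tuple(parts[i:i + size]) for i in range(len(parts) - size + 1)]


# The four moieties in the order they are connected in the molecule.
moieties = ["pyrrole azine", "furan ring", "phosphoryl group", "propanoate ester"]

for size in range(1, len(moieties) + 1):
    for combo in adjacent_combinations(moieties, size):
        print(size, " + ".join(combo))
```

For four moieties this yields four single groups, three pairs, two triples, and the single full molecule, which matches the counts in the modeling steps.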

While steps 5 and 6 were not strictly necessary for joining the four individual moieties together to synthesize the overall model of Remdesivir, they did serve as very useful test cases. A bit more depth on each of the four moieties follows:

Pyrrole Azine moiety: Initially, the nomenclature of this moiety was beyond the scope of my organic chemistry knowledge. As such, my first step was to personally study a guide for fused-ring arenes and heterocycles. Once I felt confident enough, I went ahead and created the nomenclature logic, which is as follows:

1. When checking the locant numbering for a bicyclo fused ring, check if the ring is naphthalene. If not, proceed to step 2.
2. Check if the ring is aromatic. If so, proceed to step 3.
3. Determine the name of the components of the fused ring (each individual ring):
   a. Determine the main component (larger bridge length) and side component (smaller bridge length).
   b. Name the main and side components.
   c. Generate the fusion numbering, following the format [matching locant 1 of side component, matching locant 2 of side component - matching face of main component], where the matching locants and face refer to the two atoms that are found in both components.
   d. Generate the full fusion name of both components including fusion numbering.
4. Recheck the locant numbering for the newly created fused ring using proper fused ring locant rules.
5. Name the entire fused ring using the full fusion name as the name of the primary skeleton (thus ignoring all heteroatoms and pi bonds in the fused ring as they have already been accounted for). Normal rules for naming primary and auxiliary functional groups as well as radical locants apply.

Furan Ring moiety: This moiety was certainly less complex than the previous one. The main challenge was to apply the nomenclature specific to furan molecules: in particular, first detecting whether the ring is of the furan family, then determining how many of the normal two double bonds were saturated, and applying the locants for the hydrogenated carbons appropriately. One other challenge was properly handling side skeletons when determining the stereochemistry of each carbon in the tetrahydrofuran.

Phosphoryl Group moiety: The phosphoryl group moiety was less complex still than the furan ring, but it had one tricky part, namely the fact that the primary skeleton contained zero carbon atoms. This challenge was overcome by recognizing a phosphoryl component via its length of one (a phosphorus atom) and the attachments of two hydroxy groups and one carbonyl group. Once this detection was accomplished, all attachments could be named as normal following the (attachment 1 name - attachment 2 name)phosphoryl convention. Specifically, an extra methane was used while modeling this group to be able to name the phosphoryl group properly as a radical.
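The detection rule described above can be pictured as a simple predicate. The sketch below assumes a made-up representation in which a skeleton is a list of element symbols and its attachments are a list of group labels; the real engine's data structures are not shown in this post.

```python
def is_phosphoryl(skeleton_atoms, attachments):
    """True for a one-atom phosphorus skeleton carrying two hydroxy groups
    and one carbonyl-style oxygen (labels are hypothetical)."""
    return (
        len(skeleton_atoms) == 1
        and skeleton_atoms[0] == "P"
        and attachments.count("hydroxy") == 2
        and attachments.count("carbonyl") == 1
    )


print(is_phosphoryl(["P"], ["hydroxy", "carbonyl", "hydroxy"]))  # True
print(is_phosphoryl(["C"], ["hydroxy", "carbonyl", "hydroxy"]))  # False
```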

Propanoate Ester moiety: Finally, the least complex of all the moieties was the propanoate ester group. Support for this group had actually already been entirely in place, although there was room for further ester group testing.

Once all four moieties had been modeled and named properly, it was time to begin modeling the three combinations of two adjacent groups. The combinations in more depth are as follows:

Pyrrole Azine and Furan Ring: Certainly the most complicated of the three combinations of two adjacent groups. The first challenge was to provide the user with a convenient way to add a skeleton attachment at a SPECIFIC location of the attachment to the existing part of the molecule in the interface. The impetus for this interface enhancement actually BEGAN with the modeling of chloroquine, but was delayed for the time being as it was not necessary for the user to create chloroquine in the interface. As the pyrrole azine ring was attached to the furan ring specifically at its number 7 locant, this combination NECESSITATED the creation of such an enhancement.

After hashing out a few different ways of specifying which atom of the new skeletal attachment should be attached to the target atom of the existing molecule in the workspace, I decided to go with handling a new event. The user now has two options when adding a skeletal attachment to the molecule:

The existing way. That is, clicking on the attachment and dragging it to a specific target atom on the molecule. By default, this will attach the atom numbered 1 of the new attachment.
If the user instead clicks and HOLDS on the attachment for one second (a long press event), the attachment will then be expanded and the user will be able to click on which specific atom of the new skeletal attachment they want to attach to the target atom of the existing molecule. The user can then drag the skeletal attachment as usual to a target atom on the molecule.
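The two options can be sketched as a small decision based on how long the press lasted before the drag began. This is only an illustration with hypothetical names (`classify_press`, `attachment_locant`); in a browser the same idea would normally be built on a timer started when the press begins.

```python
LONG_PRESS_SECONDS = 1.0  # hold time from the description above


def classify_press(press_time, drag_start_time, threshold=LONG_PRESS_SECONDS):
    """Return 'long_press' if the attachment was held for at least
    `threshold` seconds before being dragged, otherwise 'drag'."""
    held = drag_start_time - press_time
    return "long_press" if held >= threshold else "drag"


def attachment_locant(gesture, selected_locant=None):
    """Locant of the attachment atom that joins the target atom.

    A plain drag attaches atom 1. After a long press the user's selected
    atom is used, falling back to atom 1 if nothing was selected.
    """
    if gesture == "long_press" and selected_locant is not None:
        return selected_locant
    return 1


print(attachment_locant(classify_press(0.0, 0.3)))                     # 1
print(attachment_locant(classify_press(0.0, 1.2), selected_locant=2))  # 2
```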

This interface enhancement will allow creation of Remdesivir and also allow easier creation of chloroquine.

The only other challenge at this step was creating and running test cases to ensure that the stereochemistry still works properly with a radical attached not at the number one locant of the radical.

Furan Ring and Phosphoryl Group: Again, methane was used as the primary skeleton to which the phosphoryl group was attached, for the sake of only needing to develop nomenclature for the phosphoryl group. This combination was rather straightforward to model and test. The one tricky part was implementing proper use of enclosing characters (parentheses, brackets, braces) for nested and sufficiently complex side chains. The convention used was, from outermost enclosing characters to innermost: braces, brackets, parentheses. The reader who is also a coder might appreciate the importance of separating nomenclature demarcations from coding symbols!

Phosphoryl Group and Propanoate Ester Group: Fortunately, the work in modeling a phosphoryl radical with a primary skeleton of methane proved useful in this step. Otherwise, the one tricky part was handling the nomenclature convention of treating the phosphoryl radical attached to the amino group as phosphoryl)amino as opposed to N-phosphoryl-2-aminopropanoate. This was essentially handled with a special case for when such a group occurs. This case may be more generalized in the future.

And with those three combinations modeled and tested, it was time to turn our attention towards the two combinations of three adjacent moieties attached. The two combinations in more depth:

Pyrrole Azine and Furan Rings and Phosphoryl Group: The challenges for joining these three groups together were rather straightforward. One involved testing the need for doubly nested side chain enclosing characters. A number of test cases were developed to aid in getting this correct. The other challenge, while still straightforward, was rather tedious: verifying the proper stereochemistry of all the atoms in the furan ring with the complexity of the larger molecule. Many test cases and some very scrupulous debugging were required, both for the interface and the search engine.

Furan Ring, Phosphoryl Group, and Propanoate Ester Group: VERY fortunately, modeling these three groups worked immediately without the need for any additional code updates.

Remdesivir: With all the pieces in place, as well as all the pieces of all the pieces in place, it was now time to combine all four individual moieties at once into the larger, final molecule, Remdesivir. As in the previous step, modeling Remdesivir worked immediately without the need for any additional code updates. The one decision made was, since we now have TRIPLY nested side chains, to use braces again to enclose a side chain which contains braces already. This convention may change in the future, but it does not introduce any ambiguities in the full IUPAC name.

And with Remdesivir fully modeled, this update has been officially finished.

Standards: Existing IUPAC naming conventions were followed as usual. In particular, the fused ring nomenclature including naming of primary and side components as well as fusion numbering and ring numbering after the fusion naming used the following article: Rasmussen, S.C. The nomenclature of fused-ring arenes and heterocycles: a guide to an increasingly important dialect of organic chemistry. ChemTexts 2, 16 (2016).

The order of enclosing demarcations followed was from outer most side chain to inner most side chain: {}, [], () with braces being used to handle nested side chains beyond three levels.
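To make the enclosing convention concrete, here is a minimal sketch that picks the enclosing pair from the number of nesting levels already inside a side chain: parentheses for the innermost level, then brackets, then braces, with braces reused for anything deeper. The nested-list representation and the function names are assumptions made for this example.

```python
ENCLOSERS = ["()", "[]", "{}"]  # innermost to outermost


def enclose(text, inner_levels):
    """Wrap `text` in the pair matching how many nested levels it contains.
    Braces are reused once all three kinds are already in use."""
    pair = ENCLOSERS[min(inner_levels, len(ENCLOSERS) - 1)]
    return pair[0] + text + pair[1]


def render(chain):
    """Render a chain given as a list of name fragments (str) and nested
    side chains (lists). Returns (text, nesting height of the chain)."""
    text = ""
    height = 0
    for part in chain:
        if isinstance(part, str):
            text += part
        else:
            inner_text, inner_height = render(part)
            text += enclose(inner_text, inner_height)
            height = max(height, inner_height + 1)
    return text, height


# Placeholder fragments a..e stand in for real substituent names.
name, _ = render(["a", ["b", ["c", ["d", ["e"]]]]])
print(name)  # a{b{c[d(e)]}}
```

Because the pair is chosen from the inside out, a side chain never has to be re-rendered when it is later nested inside a larger one.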

The convention for naming a phosphoryl group attached to an amino group was followed per the PubChem article on Remdesivir.

Controls: The main enhancement for this update was to allow the user to specify which atom of a new alkane chain attachment to attach to the existing molecule. This was accomplished by introducing a long press event to the alkane chain attachments. The user will first press and hold on an alkane chain for one second which will cause that alkane chain to be zoomed in on. Next, the user will drag the mouse over the atom they wish to attach to the existing molecule. Finally, the user will drag the new alkane chain over the existing molecule. If the long press event is not triggered, by default the first atom of the new alkane chain will be attached to the existing molecule.

Future Considerations: Well, the FOREMOST question to ask is whether Remdesivir will continue to be used in the treatment of Covid-19 symptoms, and if so, in what way this site can most specifically aid in the production of Remdesivir. The first idea I have to continue down this path is to fully model a synthesis pathway of the drug, as was modeled for pyrimethamine. This will hopefully aid in the detection of any future more efficient or cheaper production models.

Otherwise, with the increasing complexity of molecules being modeled, it's clear the automatic zoom out detection needs to be improved.

Some more accurate zooming functionality for an alkane side chain attachment after the long press event would be helpful. A tutorial update would also be useful for users new to this task.

Finally, implementing a set of buttons for rotating clockwise and counterclockwise would be useful for examining certain parts of the more complex molecules. Work has actually already begun on this enhancement.

Tuesday, November 10, 2020

COVID-19: I. Chloroquine

2020 has been an unprecedented and disorienting year for everyone. To be honest, I had to look back through my notes to really put myself in a pre-Covid frame of mind and make a reasonable transition for this update. What were the goals, concerns and hopes for the site back in February 2020? After refreshing my memory, I recalled that a recent objective the pathway search engine had accomplished was to independently discover a synthesis pathway for Daraprim (pyrimethamine). And as always, finding ways to improve the interface and make it more user friendly was a high priority.

But when the world changed in mid-March, I decided that I would spend as much energy as I had for the site to see if I could possibly contribute to the fight against Covid-19. I knew that it might be a long shot, and of course any work here does not merit comparison with that of our front line and essential workers, but I did want to see if there is any part the site could play in helping the world solve the pandemic.

The first idea I came up with was to model one of the most promising drugs for treatment of Covid-19. In April, I considered modeling either Remdesivir or Chloroquine. Looking at the chemical structure of the two, I considered the modeling of Chloroquine far more feasible. In fact, some of the moieties of the Remdesivir structure I had not yet acquired the chemical knowledge to model or even properly name. And at the time, Hydroxychloroquine was legitimately being considered as an effective treatment.

The first step to implement support for Chloroquine was to look at the base fused ring component of the molecule. Fortunately, support in the interface was already in place for bicyclo[4.4.0]decane, so support only had to be added for the aromatic version of the fused ring, naphthalene, and then afterwards the more specific version quinoline. Support for these two mainly involved updating the IUPAC naming engine.

Next, support for tertiary amines needed to be added to handle the N,N-diethylpentan-2-yl side chain protruding from the amino group located at locant number 5 of the quinoline. This adjustment to the interface proved to be straightforward as well. I did make a mental note at the time that it would be MUCH more efficient to allow the user to select which carbon of an alkane chain addition they wished to attach to the current molecule; the process at the time of adding a pentan-2-yl side chain involved first attaching a butyl and then attaching a methyl to the head of the butyl.

Finally, I took a look at the resulting chloroquine molecule and thought to myself hmm, this is getting pretty convoluted and messy. And as the site is also optimized to work on a small screen device, cleaning it up became even more of a priority. I decided to implement a mode by which the user could view the molecule in a line structure format: where carbons are represented by a point and other atoms by their chemical symbol. After this clean up optimization, the resulting chloroquine molecule is easier to view and interact with.

Standards: Existing IUPAC naming conventions were again followed in the modeling of chloroquine. Specifically, once the interface recognized that the bicyclo[4.4.0]decane skeleton was aromatic, it named it as naphthalene. Furthermore, once it recognized a naphthalene with a nitrogen heteroatom at the 1 locant, it named it quinoline. There was some trial and error involved to ensure proper stereochemistry naming resulted.
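The recognition order described here (skeleton first, then aromaticity, then heteroatoms) can be sketched as a short chain of checks. The function below is an illustration with an assumed input format, where heteroatoms are given as a mapping from locant to element symbol; it only covers the two names mentioned in this update.

```python
def fused_ring_name(skeleton, aromatic, heteroatoms=None):
    """Retained name for the fused rings discussed in this update.

    Returns the skeleton name unchanged when no special case applies, and
    None when the ring is aromatic but its heteroatom pattern is not one
    this sketch knows about.
    """
    heteroatoms = heteroatoms or {}
    if skeleton != "bicyclo[4.4.0]decane" or not aromatic:
        return skeleton
    if not heteroatoms:
        return "naphthalene"
    if heteroatoms == {1: "N"}:
        return "quinoline"
    return None  # other heterocycles are outside the scope of this sketch


print(fused_ring_name("bicyclo[4.4.0]decane", aromatic=False))           # bicyclo[4.4.0]decane
print(fused_ring_name("bicyclo[4.4.0]decane", aromatic=True))            # naphthalene
print(fused_ring_name("bicyclo[4.4.0]decane", True, {1: "N"}))           # quinoline
```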

Numbering of locants for the naphthalene and quinoline molecules follows the rules per Organic Nomenclature.

Controls: The "View Atom Abbreviation Mode" toggle was added to the control buttons. This allows the user to toggle between viewing full ball and stick molecule representations and line structure representations of the molecules.

Future Considerations: As mentioned previously, implementing support for chloroquine made it clear that the interface would be much more effective if the user could select which carbon of an alkane chain would be attached to the existing molecule when adding a chain. This would allow the user to select the second carbon of pentane when wishing to add the radical pentan-2-yl.

AND as most of us with some knowledge of the life sciences are aware, unfortunately hydroxychloroquine proved to NOT be effective as a treatment for COVID-19. Nevertheless, I took the enhancements of the interface and search engine provided from modeling chloroquine as valuable gains for the site, and turned my attention to the drug more promising at the time: Remdesivir.

Tuesday, February 4, 2020

Quick interface update per user feedback

Release 4.2.1

A thanks to user J.G. who wrote: "'Im a software engineer, not a chemist. Approaching this website with just memory from a basic college chemistry class many years ago, so "helpful/not helpful" is more like "what parts felt natural to use/easy to understand". That said, this is so well done that I actually opted into a survey about a website. Great software. Sorry I'm not proficient enough in the subject to offer much suggestion, but a point that slowed me down starting to try to make a molecule was that I saw carbon underneath the editor and first could not figure out how to drag that in to start (rather than starting with a skeleton on the left. Tutorial cleared that right up, though. If I have to start with a skeleton, perhaps hide the "Additions" section or make it look visibly disabled until a skeleton is used."

I hope you don't mind me sharing your review! Per feedback, the additions panel now actually IS visibly disabled until the molecule has been created by first giving it a primary skeleton. This feedback is EXACTLY what we're looking for. Keep it coming!

Saturday, February 1, 2020

Support for molecules with cycloalkane side chains, nested side chains, side chains with ether attachments, side chains attached to the parent skeleton with pi bonds and with atoms other than the first Carbon in the side chain, and introduction of support for the reactions used in the synthesis of pyrimethamine

Release 4.2.0

Perhaps the longest title of any entry thus far in the organic chem master blog. This update introduces support for, in general, molecules with more complex side chains. The main impetus for this update was actually to show the potential to take the power to unjustifiably and drastically raise prices for drugs like Daraprim away from greedy biotechnology CEOs like Martin Shkreli. Which of course is a very tall task, in no small part due to chemical patent restrictions, but hopefully the pathway synthesis search engine support added in this update will show a step in that direction and that one day the tool will be able to provide alternative synthesis pathways for important life saving medicines.

With that overarching goal in mind, the particular goal of this update was to empower the search engine to independently discover the same synthesis pathway to produce the drug Daraprim (pyrimethamine) that high school students in Sydney did in 2016. This pathway can be viewed in the image here: https://en.wikipedia.org/wiki/Pyrimethamine#/media/File:Pyrimethamine_traditional_synthesis.png .

The first step in achieving the discovery was to ensure that both the intermediate molecules involved in the synthesis and of course pyrimethamine itself were supported in both the interface and the search engine. The starting molecule, 1-chloro-4-(2-cyanoethyl)benzene, was actually already supported. The next intermediate molecule, 1-chloro-4-((2Z)-1-cyano-3-hydroxypent-2-en-2-yl)benzene, required adding support for side chains (in this case the 1-cyano-3-hydroxypent-2-en-2-yl radical) that were NOT attached to their parent chain at the first Carbon of the side chain. The following intermediate molecule, the etherified 1-chloro-4-((2Z)-1-cyano-3-methoxypent-2-en-2-yl)benzene, required adding support for side chains containing ethers. This of course leads to the concept of nested side chains! That is, the parent skeleton of a molecule can contain a side skeleton that itself contains a side skeleton. This was previously not allowed in either the interface or the search engine, to keep the modeling simpler.

And, finally, support was needed for the molecule pyrimethamine itself, or as known by its IUPAC name, 5-(4-chlorophenyl)-6-ethylpyrimidine-2,4-diamine. Support for this molecule specifically required adding support for side chains that are cycloalkanes, which in turn required the introduction of an algorithm to determine which of two attached cycloalkanes should function as the primary skeleton. In particular, should the pyrimidine ring be considered the primary skeleton of the molecule, or should the chlorobenzene ring?

As a side note, support was also added for molecules containing side skeletons bonded to the primary skeleton with a pi bond, such as propylidenecyclohexane.

After support for ALL intermediate molecules and the product was added, it was time to add support for the reactions. Support for the following three reactions was added to the pathway search engine: Ethyl propionate condensation, Diazomethane etherification, and Guanidine condensation.

And once all modifications were in place, the search engine was able to successfully "rediscover" the synthesis pathway of Daraprim.

Standards: Per usual, IUPAC naming rules were followed. In particular, the style of nomenclature used for radicals attached to the parent skeleton at a Carbon atom with a locant other than 1 was to use the locant followed by "-yl" or "-ylidene", as in (propan-2-yl)cyclohexane. The radical suffix "ylidene" was used to indicate the radical was attached to the parent via a double bond. The condensation reactions were modeled after the Wikipedia article, employing the strong deactivation properties of the cyano group. The etherification via diazomethane reaction was also modeled after the Wikipedia article.

Controls: No new controls were introduced. The user can still create the molecules via the molecule design tool or by entering the IUPAC name in the interface, and then click the beaker icon to perform a synthesis pathway search.
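As a small illustration of this naming style, the sketch below builds a radical name from an unbranched alkane name, the locant of the attaching carbon, and whether the attachment is a double bond. The helper `radical_name` is hypothetical and only handles simple alkane names ending in "ane".

```python
def radical_name(alkane, locant=1, double_bond=False):
    """Name of the radical derived from an unbranched alkane.

    Attachment at carbon 1 uses the short form (propyl, propylidene);
    any other carbon uses the locant form (propan-2-yl).
    """
    if not alkane.endswith("ane"):
        raise ValueError("expected an alkane name ending in 'ane'")
    suffix = "ylidene" if double_bond else "yl"
    if locant == 1:
        return alkane[:-3] + suffix          # propane -> prop + yl
    return f"{alkane[:-1]}-{locant}-{suffix}"  # propane -> propan-2-yl


print("(" + radical_name("propane", 2) + ")cyclohexane")         # (propan-2-yl)cyclohexane
print(radical_name("propane", double_bond=True) + "cyclohexane")  # propylidenecyclohexane
print(radical_name("pentane", 2))                                 # pentan-2-yl
```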

Future Considerations: Hopefully even more power can be added to the pathway search engine via support for more complex molecules, more reactions, and more efficient search techniques in the future.