Friday, January 25, 2019

Introduction of Smart Search feature, fix to bug with the Cumene process reaction, minor stylistic and interface updates, fix for bug of adding a cycloalkane to a smaller linear chain in interface

Release 3.1.0

A pretty exciting update for the search engine with this release! First, the interface and search engine bug fixes:

  1. When using the Discover feature on the home page, the user is now greeted with a friendlier message after selecting a reaction that cannot be applied to the selected molecule.
  2. A minor stylistic fix was made to the Reactions section. For example, in the first reaction of the Calvin cycle, the arrow in the View Reaction panel now looks correct.
  3. Previously, in the interface, the user was unable to design a molecule with a cyclic alkane primary skeleton and a straight-chain alkane attachment (for example propylcyclohexane) by first dragging the straight-chain alkane into the workspace and then adding the cyclic alkane. The user was required to first add the cyclohexane and then add the propyl attachment. Now either order is possible.
  4. A bug was fixed in the search engine for the modeling of the Cumene process reaction. The hydroxylation of benzene now results in the proper search engine modeling of Phenol.

And the exciting part of the update: the introduction of a Smart Search feature for the search engine. This new feature uses a heuristic calculation to guide its pathway search from the origin molecule to the goal molecule. For this release, the heuristic used in the smart search is NOT admissible; that is, it may overestimate the cost of the synthesis pathway between a given intermediate molecule and the goal molecule. The synthesis pathway found will thus POSSIBLY be sub-optimal. This was an acceptable trade-off for the first release/iteration of the smart search. Subsequent releases will use only admissible heuristics to guarantee optimality. I am more than happy to explain more about the heuristic calculation if you send me a private or direct message.
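To make the admissibility trade-off concrete, here is a minimal sketch of a heuristic-guided search on a toy graph. It is NOT the heuristic or the code used by the search engine: the node names ("origin", "route_a", "route_b", "goal"), the step costs, and the function best_first_search are all hypothetical, made up purely for illustration. The search always expands the node with the lowest (cost so far + heuristic estimate).

```python
import heapq

# Hypothetical toy graph: node -> {neighbor: step cost}. Not real chemistry.
GRAPH = {
    "origin": {"route_a": 1, "route_b": 2},
    "route_a": {"goal": 3},   # origin -> route_a -> goal costs 4
    "route_b": {"goal": 1},   # origin -> route_b -> goal costs 3 (optimal)
}

def best_first_search(graph, start, goal, heuristic):
    """A*-style search: expand the node with the lowest cost + heuristic."""
    frontier = [(heuristic(start), 0, start, [start])]
    expanded = {}  # node -> cheapest cost at which it was expanded
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in expanded and expanded[node] <= cost:
            continue  # already expanded via a path at least as cheap
        expanded[node] = cost
        for neighbor, step in graph.get(node, {}).items():
            new_cost = cost + step
            estimate = new_cost + heuristic(neighbor)
            heapq.heappush(frontier, (estimate, new_cost, neighbor, path + [neighbor]))
    return None  # no pathway exists

# Admissible heuristic: never overestimates (zero is the trivial case).
def zero_heuristic(node):
    return 0

# Inadmissible heuristic: overestimates route_b (true remaining cost is 1).
OVERESTIMATES = {"origin": 0, "route_a": 1, "route_b": 5, "goal": 0}

def inadmissible_heuristic(node):
    return OVERESTIMATES[node]

optimal = best_first_search(GRAPH, "origin", "goal", zero_heuristic)
guided = best_first_search(GRAPH, "origin", "goal", inadmissible_heuristic)
print(optimal)  # (3, ['origin', 'route_b', 'goal'])
print(guided)   # (4, ['origin', 'route_a', 'goal'])
```

In this toy case the overestimate on "route_b" steers the search away from the cheaper route, so it returns a valid but more expensive pathway. An admissible heuristic never overestimates, which is what lets this kind of search guarantee the optimal pathway.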

A motivating example for introducing a heuristic was the search for a synthesis pathway from benzene to 2-acetoxybenzoic acid (aspirin) using ONLY the Pathway Calculations search option. This search had previously been accomplished using both the Pathway Calculations and the MolGen Reactions search options. However, removing the MolGen Reactions search option (which basically provided a very strong hint for the search engine to begin by converting benzene to phenol) caused the search to time out. The goal was to find such a synthesis pathway without the strong hint.

As I was introducing the smart search option using a heuristic, I discovered the aforementioned bug in the modeling of the Cumene process reaction. Figuring it was more urgent to fix the modeling bug, I went ahead and did so before proceeding with the smart search feature. Lo and behold, fixing this bug actually resulted in a successful synthesis pathway search from Benzene to Aspirin using ONLY the Pathway Calculations search option! There was no longer a timeout issue! Running a search for a pathway from Benzene to Aspirin using ONLY the Pathway Calculations search option (and NOT the Smart Search feature) will now successfully find the synthesis pathway.

As I had already begun working on the Smart Search feature, I went ahead and finished it as well. I ran benchmark tests in my local development environment and did indeed find that the search runs faster with the Smart Search feature turned on. The user can also run tests on www.organicchemmaster.com, comparing the search WITHOUT the Smart Search feature to the search WITH the Smart Search feature. The user SHOULD see a shorter search time for the latter, but I have less control over benchmark tests on the server that hosts www.organicchemmaster.com than I do in my local environment.

Standards: As usual, IUPAC naming rules were followed. Specifically, the interface fix that allows the user to attach a cyclic alkane to a straight-chain alkane results in the properly named molecule, with the cyclic alkane designated the parent skeleton chain and taking naming precedence. Of note, the heuristic used for the Smart Search feature is by design NOT an admissible heuristic for this release/iteration. It will thus NOT necessarily find an optimal synthesis pathway. With the feature turned off, the search is STILL optimal and complete (it will find a synthesis pathway if there is one).

Controls: The only significant update to the controls is adding the new Smart Search feature option. This option can be selected in the "Search" checkbox section under the options popup.

Future Considerations: Obviously, we will eventually want the Smart Search feature to use an admissible heuristic to guarantee optimality of the pathway search. There will be many choices to consider in improving the heuristic(s), particularly the trade-off between heuristic function calculation time and search time. That said, I am looking forward to using the additional power this update gives the search engine!

OChemdle

In light of the recent popularity of games such as Wordle and its offshoots (Worldle, Octordle, Semantle, Redactle, etc), a conversation beg...